
Commit dd391fd

yossiovadia, claude, and rootfs authored
fix: correct yaml linting hook and fix trailing spaces/comment spacing (#611)
* fix: correct yaml linting hook and fix trailing spaces/comment spacing

This PR addresses two issues:

1. **Fixed pre-commit hook configuration bug** - Changed line 57 in `.pre-commit-config.yaml` to call `make yaml-lint` instead of `make markdown-lint`
2. **Fixed simple YAML linting errors** - Applied automated fixes for:
   - Trailing whitespace in YAML files
   - Comment spacing (ensuring 2 spaces before inline comments)

## Problem

The bug in `.pre-commit-config.yaml` caused:

- ❌ YAML files not being properly linted locally
- ✅ GitHub Actions CI catching the issues
- 🤔 PRs failing in CI even though `pre-commit run --all-files` passed locally
- 😓 Contributors forced to fix pre-existing YAML issues

## Changes

1. Changed `.pre-commit-config.yaml` line 57 from `make markdown-lint` to `make yaml-lint`
2. Fixed trailing spaces and comment spacing in 22 YAML files

## Note on Remaining Issues

Some YAML files still have indentation errors that require more careful manual fixes. These can be addressed in follow-up PRs as files are modified. The important fix here is that local pre-commit checks now match CI checks.

Fixes #608

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>

* fix: resolve YAML indentation errors and exclude .venv from linting

This commit fixes the pre-commit failures from the previous commit by:

1. **Fixed YAML indentation errors** in 16 Kubernetes/OpenShift deployment files
   - Re-parsed and reformatted YAML files with proper 2-space indentation
   - Fixed wrong-indentation issues flagged by yamllint
2. **Excluded .venv from yamllint checks**
   - Added `.venv` to the ignore list in `tools/linter/yaml/.yamllint`
   - Prevents linting errors from third-party dependencies in the virtual environment

Files fixed:

- deploy/kubernetes/ai-gateway/aigw-resources/*
- deploy/kubernetes/aibrix/aigw-resources/*
- deploy/kubernetes/istio/*
- deploy/kubernetes/llmd-base/*
- deploy/openshift/observability/prometheus/deployment.yaml
- deploy/openshift/template.yaml

Pre-commit now passes successfully.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Yossi Ovadia <yovadia@redhat.com>

---------

Signed-off-by: Yossi Ovadia <yovadia@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
1 parent 4fc69b5 commit dd391fd

39 files changed: +599 -663 lines
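The `tools/linter/yaml/.yamllint` change described in the commit message is not among the hunks shown below. A minimal sketch of the relevant stanzas, assuming yamllint's standard config keys; only the `.venv` addition is confirmed by the commit message, the rest is illustrative:

```yaml
# tools/linter/yaml/.yamllint (sketch; only the .venv entry is confirmed by the commit message)
extends: default

ignore: |
  .venv/

rules:
  comments:
    min-spaces-from-content: 2  # the "2 spaces before inline comments" rule these fixes satisfy
  trailing-spaces: enable       # flags the trailing whitespace removed across 22 files
```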

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ repos:
     hooks:
       - id: yaml-and-yml-fmt
         name: yaml/yml fmt
-        entry: bash -c "make markdown-lint"
+        entry: bash -c "make yaml-lint"
         language: system
         files: \.(yaml|yml)$
         exclude: ^(\node_modules/)
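For orientation, here is how the full hook stanza plausibly reads after the fix; the enclosing `repo: local` entry is an assumption, since the hunk starts inside the hooks list:

```yaml
repos:
  - repo: local  # assumed wrapper; not visible in the hunk above
    hooks:
      - id: yaml-and-yml-fmt
        name: yaml/yml fmt
        entry: bash -c "make yaml-lint"  # previously make markdown-lint, so YAML was never linted here
        language: system                 # run the command on the host as-is
        files: \.(yaml|yml)$             # trigger only when .yaml/.yml files are staged
        exclude: ^(\node_modules/)
```

Because `entry` shells out to a Makefile target and ignores the filenames pre-commit passes it, the hook lints the whole tree whenever any matching file is staged; with the wrong target wired in, `pre-commit run --all-files` could pass locally while the YAML job in CI failed.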

config/config.yaml

Lines changed: 40 additions & 40 deletions
@@ -5,21 +5,21 @@ bert_model:
 
 semantic_cache:
   enabled: true
-  backend_type: "memory" # Options: "memory", "milvus", or "hybrid"
+  backend_type: "memory"  # Options: "memory", "milvus", or "hybrid"
   similarity_threshold: 0.8
-  max_entries: 1000 # Only applies to memory backend
+  max_entries: 1000  # Only applies to memory backend
   ttl_seconds: 3600
   eviction_policy: "fifo"
   # HNSW index configuration (for memory backend only)
-  use_hnsw: true # Enable HNSW index for faster similarity search
-  hnsw_m: 16 # Number of bi-directional links (higher = better recall, more memory)
-  hnsw_ef_construction: 200 # Construction parameter (higher = better quality, slower build)
-
+  use_hnsw: true  # Enable HNSW index for faster similarity search
+  hnsw_m: 16  # Number of bi-directional links (higher = better recall, more memory)
+  hnsw_ef_construction: 200  # Construction parameter (higher = better quality, slower build)
+
   # Hybrid cache configuration (when backend_type: "hybrid")
   # Combines in-memory HNSW for fast search with Milvus for scalable storage
   # max_memory_entries: 100000 # Max entries in HNSW index (default: 100,000)
   # backend_config_path: "config/milvus.yaml" # Path to Milvus config
-
+
   # Embedding model for semantic similarity matching
   # Options: "bert" (fast, 384-dim), "qwen3" (high quality, 1024-dim, 32K context), "gemma" (balanced, 768-dim, 8K context)
   # Default: "bert" (fastest, lowest memory)
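The commented-out settings above describe the hybrid backend; a minimal sketch of enabling it, using only the keys and defaults named in those comments:

```yaml
semantic_cache:
  enabled: true
  backend_type: "hybrid"                     # in-memory HNSW for fast search + Milvus for scalable storage
  similarity_threshold: 0.8
  max_memory_entries: 100000                 # max entries in the HNSW index (the stated default)
  backend_config_path: "config/milvus.yaml"  # Milvus connection settings live in this file
```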
@@ -46,13 +46,13 @@ prompt_guard:
 # NOT supported: domain names (example.com), protocol prefixes (http://), paths (/api), ports in address (use 'port' field)
 vllm_endpoints:
   - name: "endpoint1"
-    address: "172.28.0.20" # Static IPv4 of llm-katan within docker compose network
+    address: "172.28.0.20"  # Static IPv4 of llm-katan within docker compose network
     port: 8002
     weight: 1
 
 model_config:
   "qwen3":
-    reasoning_family: "qwen3" # This model uses Qwen-3 reasoning syntax
+    reasoning_family: "qwen3"  # This model uses Qwen-3 reasoning syntax
     preferred_endpoints: ["endpoint1"]  # Optional: omit to let upstream handle endpoint selection
     pii_policy:
       allow_by_default: true
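The comment at the top of this hunk restricts `address` to a bare IP with the port in its own field; a quick contrast restating those rules:

```yaml
vllm_endpoints:
  - name: "endpoint1"
    address: "172.28.0.20"  # OK: static IPv4, port supplied separately
    port: 8002
    weight: 1
# Rejected forms for 'address', per the comment above:
#   "example.com"         # domain names are not supported
#   "http://172.28.0.20"  # no protocol prefixes
#   "/api"                # no paths
#   "172.28.0.20:8002"    # no inline ports; use the 'port' field
```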
@@ -81,7 +81,7 @@ categories:
     model_scores:
       - model: qwen3
         score: 0.7
-        use_reasoning: false # Business performs better without reasoning
+        use_reasoning: false  # Business performs better without reasoning
   - name: law
     system_prompt: "You are a knowledgeable legal expert with comprehensive understanding of legal principles, case law, statutory interpretation, and legal procedures across multiple jurisdictions. Provide accurate legal information and analysis while clearly stating that your responses are for informational purposes only and do not constitute legal advice. Always recommend consulting with qualified legal professionals for specific legal matters."
     model_scores:
@@ -91,7 +91,7 @@ categories:
   - name: psychology
     system_prompt: "You are a psychology expert with deep knowledge of cognitive processes, behavioral patterns, mental health, developmental psychology, social psychology, and therapeutic approaches. Provide evidence-based insights grounded in psychological research and theory. When discussing mental health topics, emphasize the importance of professional consultation and avoid providing diagnostic or therapeutic advice."
     semantic_cache_enabled: true
-    semantic_cache_similarity_threshold: 0.92 # High threshold for psychology - sensitive to nuances
+    semantic_cache_similarity_threshold: 0.92  # High threshold for psychology - sensitive to nuances
     model_scores:
       - model: qwen3
         score: 0.6
@@ -107,7 +107,7 @@ categories:
     model_scores:
       - model: qwen3
         score: 0.6
-        use_reasoning: true # Enable reasoning for complex chemistry
+        use_reasoning: true  # Enable reasoning for complex chemistry
   - name: history
     system_prompt: "You are a historian with expertise across different time periods and cultures. Provide accurate historical context and analysis."
     model_scores:
@@ -117,15 +117,15 @@ categories:
   - name: other
     system_prompt: "You are a helpful and knowledgeable assistant. Provide accurate, helpful responses across a wide range of topics."
     semantic_cache_enabled: true
-    semantic_cache_similarity_threshold: 0.75 # Lower threshold for general chat - less sensitive
+    semantic_cache_similarity_threshold: 0.75  # Lower threshold for general chat - less sensitive
     model_scores:
       - model: qwen3
         score: 0.7
         use_reasoning: false
   - name: health
     system_prompt: "You are a health and medical information expert with knowledge of anatomy, physiology, diseases, treatments, preventive care, nutrition, and wellness. Provide accurate, evidence-based health information while emphasizing that your responses are for educational purposes only and should never replace professional medical advice, diagnosis, or treatment. Always encourage users to consult healthcare professionals for medical concerns and emergencies."
     semantic_cache_enabled: true
-    semantic_cache_similarity_threshold: 0.95 # High threshold for health - very sensitive to word changes
+    semantic_cache_similarity_threshold: 0.95  # High threshold for health - very sensitive to word changes
     model_scores:
       - model: qwen3
         score: 0.5
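Taken together, the category entries tune cache sensitivity around the global `similarity_threshold: 0.8` from the `semantic_cache` block; a condensed view of just the overrides touched in this file:

```yaml
# Per-category overrides of the global semantic_cache.similarity_threshold (0.8)
categories:
  - name: health
    semantic_cache_similarity_threshold: 0.95  # strictest: very sensitive to word changes
  - name: psychology
    semantic_cache_similarity_threshold: 0.92  # strict: sensitive to nuances
  - name: other
    semantic_cache_similarity_threshold: 0.75  # loosest: general chat tolerates near-matches
```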
@@ -141,13 +141,13 @@ categories:
     model_scores:
       - model: qwen3
         score: 1.0
-        use_reasoning: true # Enable reasoning for complex math
+        use_reasoning: true  # Enable reasoning for complex math
   - name: physics
     system_prompt: "You are a physics expert with deep understanding of physical laws and phenomena. Provide clear explanations with mathematical derivations when appropriate."
     model_scores:
       - model: qwen3
         score: 0.7
-        use_reasoning: true # Enable reasoning for physics
+        use_reasoning: true  # Enable reasoning for physics
   - name: computer science
     system_prompt: "You are a computer science expert with knowledge of algorithms, data structures, programming languages, and software engineering. Provide clear, practical solutions with code examples when helpful."
     model_scores:
@@ -195,24 +195,24 @@ router:
   lora_default_success_rate: 0.98
   traditional_default_success_rate: 0.95
   # Scoring weights for intelligent path selection (balanced approach)
-  multi_task_lora_weight: 0.30 # LoRA advantage for multi-task processing
-  single_task_traditional_weight: 0.30 # Traditional advantage for single tasks
-  large_batch_lora_weight: 0.25 # LoRA advantage for large batches (≥4)
-  small_batch_traditional_weight: 0.25 # Traditional advantage for single items
-  medium_batch_weight: 0.10 # Neutral weight for medium batches (2-3)
-  high_confidence_lora_weight: 0.25 # LoRA advantage for high confidence (≥0.99)
-  low_confidence_traditional_weight: 0.25 # Traditional for lower confidence (≤0.9)
-  low_latency_lora_weight: 0.30 # LoRA advantage for low latency (≤2000ms)
-  high_latency_traditional_weight: 0.10 # Traditional acceptable for relaxed timing
-  performance_history_weight: 0.20 # Historical performance comparison factor
+  multi_task_lora_weight: 0.30  # LoRA advantage for multi-task processing
+  single_task_traditional_weight: 0.30  # Traditional advantage for single tasks
+  large_batch_lora_weight: 0.25  # LoRA advantage for large batches (≥4)
+  small_batch_traditional_weight: 0.25  # Traditional advantage for single items
+  medium_batch_weight: 0.10  # Neutral weight for medium batches (2-3)
+  high_confidence_lora_weight: 0.25  # LoRA advantage for high confidence (≥0.99)
+  low_confidence_traditional_weight: 0.25  # Traditional for lower confidence (≤0.9)
+  low_latency_lora_weight: 0.30  # LoRA advantage for low latency (≤2000ms)
+  high_latency_traditional_weight: 0.10  # Traditional acceptable for relaxed timing
+  performance_history_weight: 0.20  # Historical performance comparison factor
   # Traditional model specific configurations
-  traditional_bert_confidence_threshold: 0.95 # Traditional BERT confidence threshold
-  traditional_modernbert_confidence_threshold: 0.8 # Traditional ModernBERT confidence threshold
-  traditional_pii_detection_threshold: 0.5 # Traditional PII detection confidence threshold
+  traditional_bert_confidence_threshold: 0.95  # Traditional BERT confidence threshold
+  traditional_modernbert_confidence_threshold: 0.8  # Traditional ModernBERT confidence threshold
+  traditional_pii_detection_threshold: 0.5  # Traditional PII detection confidence threshold
   traditional_token_classification_threshold: 0.9  # Traditional token classification threshold
-  traditional_dropout_prob: 0.1 # Traditional model dropout probability
-  traditional_attention_dropout_prob: 0.1 # Traditional model attention dropout probability
-  tie_break_confidence: 0.5 # Confidence value for tie-breaking situations
+  traditional_dropout_prob: 0.1  # Traditional model dropout probability
+  traditional_attention_dropout_prob: 0.1  # Traditional model attention dropout probability
+  tie_break_confidence: 0.5  # Confidence value for tie-breaking situations
 
 default_model: qwen3
 
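To make the weights concrete, a hypothetical additive walk-through (an assumption: how the router actually combines these signals lives in code, not in this file):

```yaml
# Request: 3 tasks, batch of 5, confidence 0.995, latency budget 1500 ms
#   LoRA path:        0.30 (multi-task) + 0.25 (large batch >= 4)
#                   + 0.25 (confidence >= 0.99) + 0.30 (latency <= 2000 ms) = 1.10
#   Traditional path: 0.00 from these four signals
# performance_history_weight (0.20) then credits whichever path has the better
# historical success rate (LoRA default 0.98 vs traditional 0.95).
```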
@@ -253,7 +253,7 @@ api:
 
 # Embedding Models Configuration
 # These models provide intelligent embedding generation with automatic routing:
-# - Qwen3-Embedding-0.6B: Up to 32K context, high quality, 
+# - Qwen3-Embedding-0.6B: Up to 32K context, high quality,
 # - EmbeddingGemma-300M: Up to 8K context, fast inference, Matryoshka support (768/512/256/128)
 embedding_models:
   qwen3_model_path: "models/Qwen3-Embedding-0.6B"
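Only the qwen3 path is visible in this hunk; a hypothetical companion entry for the Gemma model described in the comment above (`gemma_model_path` and its path are assumed names, not confirmed by this diff):

```yaml
embedding_models:
  qwen3_model_path: "models/Qwen3-Embedding-0.6B"  # 1024-dim, up to 32K context
  gemma_model_path: "models/EmbeddingGemma-300M"   # hypothetical key: 768-dim, 8K context, Matryoshka-truncatable to 512/256/128
```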
@@ -263,15 +263,15 @@ embedding_models:
 # Observability Configuration
 observability:
   tracing:
-    enabled: true # Enable distributed tracing for docker-compose stack
-    provider: "opentelemetry" # Provider: opentelemetry, openinference, openllmetry
+    enabled: true  # Enable distributed tracing for docker-compose stack
+    provider: "opentelemetry"  # Provider: opentelemetry, openinference, openllmetry
     exporter:
-      type: "otlp" # Export spans to Jaeger (via OTLP gRPC)
-      endpoint: "jaeger:4317" # Jaeger collector inside compose network
-      insecure: true # Use insecure connection (no TLS)
+      type: "otlp"  # Export spans to Jaeger (via OTLP gRPC)
+      endpoint: "jaeger:4317"  # Jaeger collector inside compose network
+      insecure: true  # Use insecure connection (no TLS)
     sampling:
-      type: "always_on" # Sampling: always_on, always_off, probabilistic
-      rate: 1.0 # Sampling rate for probabilistic (0.0-1.0)
+      type: "always_on"  # Sampling: always_on, always_off, probabilistic
+      rate: 1.0  # Sampling rate for probabilistic (0.0-1.0)
     resource:
       service_name: "vllm-semantic-router"
       service_version: "v0.1.0"
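The sampling comments list two alternatives to `always_on`; a sketch of the probabilistic variant, using only the options those comments name:

```yaml
observability:
  tracing:
    sampling:
      type: "probabilistic"  # sample a fraction of requests instead of all of them
      rate: 0.1              # keep 10% of traces (valid range 0.0-1.0 per the comment above)
```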
