
Commit ce99067

JaredforReal and rootfs authored
feat: make llm-katan as default in docker compose up (#426)
* feat: make llm-katan as default in docker compose up
* revert semantic-router image
* revert semantic-router image

Signed-off-by: JaredforReal <w13431838023@gmail.com>
Co-authored-by: Huamin Chen <rootfs@users.noreply.github.com>
1 parent ac1f367 commit ce99067

File tree: 7 files changed, +101 −30 lines

Dockerfile.extproc

Lines changed: 4 additions & 0 deletions

````diff
@@ -62,6 +62,10 @@ RUN mkdir -p src/semantic-router
 COPY src/semantic-router/go.mod src/semantic-router/go.sum src/semantic-router/
 COPY candle-binding/go.mod candle-binding/semantic-router.go candle-binding/
 
+# Pre-download modules to fail fast if mirrors are unreachable
+RUN cd src/semantic-router && go mod download && \
+    cd /app/candle-binding && go mod download
+
 # Copy semantic-router source code
 COPY src/semantic-router/ src/semantic-router/
 
````

README.md

Lines changed: 30 additions & 0 deletions

````diff
@@ -91,6 +91,36 @@ This command will:
 
 For detailed installation and configuration instructions, see the [Complete Documentation](https://vllm-semantic-router.com/docs/installation/).
 
+### What This Starts By Default
+
+`make docker-compose-up` now launches the full stack including a lightweight local OpenAI-compatible model server powered by **llm-katan** (serving the small model `Qwen/Qwen3-0.6B` under the alias `qwen3`). The semantic router is configured to route classification & default generations to this local endpoint out-of-the-box. This gives you an entirely self-contained experience (no external API keys required) while still letting you add remote / larger models later.
+
+### Core Mode (Without Local Model)
+
+If you only want the core semantic-router + Envoy + observability stack (and will point to external OpenAI-compatible endpoints yourself):
+
+```bash
+make docker-compose-up-core
+```
+
+### Prerequisite Model Download (Speeds Up First Run)
+
+The existing model bootstrap targets now also pre-download the small llm-katan model so the first `docker-compose-up` avoids an on-demand Hugging Face fetch.
+
+Minimal set (fast):
+
+```bash
+make models-download-minimal
+```
+
+Full set:
+
+```bash
+make models-download
+```
+
+Both create a stamp file once `Qwen/Qwen3-0.6B` is present to keep subsequent runs idempotent.
+
 ## Documentation 📖
 
 For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:
````
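The README addition promises a self-contained, OpenAI-compatible endpoint. A minimal smoke test against it might look like the following (a sketch, not part of the commit; it assumes the stack from `make docker-compose-up` is running and llm-katan is published on host port 8002):

```shell
# Validate the request body locally first, so a malformed payload is caught
# before any network call.
PAYLOAD='{"model": "qwen3", "messages": [{"role": "user", "content": "Say hi"}]}'
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload ok"

# Query the OpenAI-compatible chat completions route served by llm-katan.
# The trailing "|| echo" keeps this runnable even before the stack is up.
curl -sS http://localhost:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo "stack not running yet"
```

The model name `qwen3` matches the `--served-model-name` alias configured for llm-katan in the compose change of this commit.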

config/config.yaml

Lines changed: 19 additions & 19 deletions

````diff
@@ -32,13 +32,13 @@ prompt_guard:
 # NOT supported: domain names (example.com), protocol prefixes (http://), paths (/api), ports in address (use 'port' field)
 vllm_endpoints:
   - name: "endpoint1"
-    address: "127.0.0.1" # IPv4 address - REQUIRED format
-    port: 8000
+    address: "172.28.0.20" # Static IPv4 of llm-katan within docker compose network
+    port: 8002
     weight: 1
 
 model_config:
-  "openai/gpt-oss-20b":
-    reasoning_family: "gpt-oss" # This model uses GPT-OSS reasoning syntax
+  "qwen3":
+    reasoning_family: "qwen3" # This model uses Qwen-3 reasoning syntax
     preferred_endpoints: ["endpoint1"]
     pii_policy:
       allow_by_default: true
@@ -63,89 +63,89 @@ categories:
   - name: business
     system_prompt: "You are a senior business consultant and strategic advisor with expertise in corporate strategy, operations management, financial analysis, marketing, and organizational development. Provide practical, actionable business advice backed by proven methodologies and industry best practices. Consider market dynamics, competitive landscape, and stakeholder interests in your recommendations."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: false # Business performs better without reasoning
   - name: law
     system_prompt: "You are a knowledgeable legal expert with comprehensive understanding of legal principles, case law, statutory interpretation, and legal procedures across multiple jurisdictions. Provide accurate legal information and analysis while clearly stating that your responses are for informational purposes only and do not constitute legal advice. Always recommend consulting with qualified legal professionals for specific legal matters."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.4
         use_reasoning: false
   - name: psychology
     system_prompt: "You are a psychology expert with deep knowledge of cognitive processes, behavioral patterns, mental health, developmental psychology, social psychology, and therapeutic approaches. Provide evidence-based insights grounded in psychological research and theory. When discussing mental health topics, emphasize the importance of professional consultation and avoid providing diagnostic or therapeutic advice."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.6
         use_reasoning: false
   - name: biology
     system_prompt: "You are a biology expert with comprehensive knowledge spanning molecular biology, genetics, cell biology, ecology, evolution, anatomy, physiology, and biotechnology. Explain biological concepts with scientific accuracy, use appropriate terminology, and provide examples from current research. Connect biological principles to real-world applications and emphasize the interconnectedness of biological systems."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.9
         use_reasoning: false
   - name: chemistry
     system_prompt: "You are a chemistry expert specializing in chemical reactions, molecular structures, and laboratory techniques. Provide detailed, step-by-step explanations."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.6
         use_reasoning: true # Enable reasoning for complex chemistry
   - name: history
     system_prompt: "You are a historian with expertise across different time periods and cultures. Provide accurate historical context and analysis."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: false
   - name: other
     system_prompt: "You are a helpful and knowledgeable assistant. Provide accurate, helpful responses across a wide range of topics."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: false
   - name: health
     system_prompt: "You are a health and medical information expert with knowledge of anatomy, physiology, diseases, treatments, preventive care, nutrition, and wellness. Provide accurate, evidence-based health information while emphasizing that your responses are for educational purposes only and should never replace professional medical advice, diagnosis, or treatment. Always encourage users to consult healthcare professionals for medical concerns and emergencies."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.5
         use_reasoning: false
   - name: economics
     system_prompt: "You are an economics expert with deep understanding of microeconomics, macroeconomics, econometrics, financial markets, monetary policy, fiscal policy, international trade, and economic theory. Analyze economic phenomena using established economic principles, provide data-driven insights, and explain complex economic concepts in accessible terms. Consider both theoretical frameworks and real-world applications in your responses."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 1.0
         use_reasoning: false
   - name: math
     system_prompt: "You are a mathematics expert. Provide step-by-step solutions, show your work clearly, and explain mathematical concepts in an understandable way."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 1.0
         use_reasoning: true # Enable reasoning for complex math
   - name: physics
     system_prompt: "You are a physics expert with deep understanding of physical laws and phenomena. Provide clear explanations with mathematical derivations when appropriate."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: true # Enable reasoning for physics
   - name: computer science
     system_prompt: "You are a computer science expert with knowledge of algorithms, data structures, programming languages, and software engineering. Provide clear, practical solutions with code examples when helpful."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.6
         use_reasoning: false
   - name: philosophy
     system_prompt: "You are a philosophy expert with comprehensive knowledge of philosophical traditions, ethical theories, logic, metaphysics, epistemology, political philosophy, and the history of philosophical thought. Engage with complex philosophical questions by presenting multiple perspectives, analyzing arguments rigorously, and encouraging critical thinking. Draw connections between philosophical concepts and contemporary issues while maintaining intellectual honesty about the complexity and ongoing nature of philosophical debates."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.5
         use_reasoning: false
   - name: engineering
     system_prompt: "You are an engineering expert with knowledge across multiple engineering disciplines including mechanical, electrical, civil, chemical, software, and systems engineering. Apply engineering principles, design methodologies, and problem-solving approaches to provide practical solutions. Consider safety, efficiency, sustainability, and cost-effectiveness in your recommendations. Use technical precision while explaining concepts clearly, and emphasize the importance of proper engineering practices and standards."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: false
 
-default_model: openai/gpt-oss-20b
+default_model: "qwen3"
 
 # Reasoning family configurations
 reasoning_families:
````

deploy/docker-compose/docker-compose.yml

Lines changed: 19 additions & 6 deletions

````diff
@@ -9,9 +9,12 @@ services:
     volumes:
       - ../../config:/app/config:ro
       - ../../models:/app/models:ro
+      - ~/.cache/huggingface:/root/.cache/huggingface
     environment:
       - LD_LIBRARY_PATH=/app/lib
       - CONFIG_FILE=${CONFIG_FILE:-/app/config/config.yaml}
+      - HUGGINGFACE_HUB_CACHE=/root/.cache/huggingface
+      - HF_HUB_ENABLE_HF_TRANSFER=1
     networks:
       - semantic-network
     healthcheck:
@@ -134,18 +137,27 @@ services:
 
   # LLM Katan service for testing
   llm-katan:
-    build:
-      context: ../../e2e-tests/llm-katan
-      dockerfile: Dockerfile
+    image: ${LLM_KATAN_IMAGE:-ghcr.io/vllm-project/semantic-router/llm-katan:latest}
     container_name: llm-katan
     profiles: ["testing", "llm-katan"]
     ports:
-      - "8002:8000"
+      - "8002:8002"
     environment:
       - HUGGINGFACE_HUB_TOKEN=${HUGGINGFACE_HUB_TOKEN:-}
+      - HF_HUB_ENABLE_HF_TRANSFER=1
+    volumes:
+      - ../../models:/app/models:ro
+      - hf-cache:/home/llmkatan/.cache/huggingface
     networks:
-      - semantic-network
-    command: ["llm-katan", "--model", "Qwen/Qwen3-0.6B", "--host", "0.0.0.0", "--port", "8000"]
+      semantic-network:
+        ipv4_address: 172.28.0.20
+    command: ["llm-katan", "--model", "/app/models/Qwen/Qwen3-0.6B", "--served-model-name", "qwen3", "--host", "0.0.0.0", "--port", "8002"]
+    healthcheck:
+      test: ["CMD", "curl", "-fsS", "http://localhost:8002/health"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+      start_period: 10s
 
   # Semantic Router Dashboard
   dashboard:
@@ -202,3 +214,4 @@ volumes:
   grafana-data:
   openwebui-data:
   openwebui-pipelines:
+  hf-cache:
````
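One note on the static address: `ipv4_address: 172.28.0.20` is only valid if the `semantic-network` definition pins a matching subnet. That definition is not part of this diff; a plausible shape (an assumption, not the committed file) would be:

```yaml
# Assumed network definition (not shown in this diff): a fixed subnet is
# required before services can claim static addresses like 172.28.0.20.
networks:
  semantic-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.28.0.0/16
```

The config.yaml change in this commit relies on that stability, since it hard-codes `address: "172.28.0.20"` as the llm-katan endpoint.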

tools/make/docker.mk

Lines changed: 10 additions & 3 deletions

````diff
@@ -99,8 +99,8 @@ BUILD_FLAG=$(if $(REBUILD),--build,)
 # Docker compose shortcuts (no rebuild by default)
 docker-compose-up:
 	@$(LOG_TARGET)
-	@echo "Starting services with docker-compose (REBUILD=$(REBUILD))..."
-	@docker compose up -d $(BUILD_FLAG)
+	@echo "Starting services with docker-compose (default includes llm-katan) (REBUILD=$(REBUILD))..."
+	@docker compose --profile llm-katan up -d $(BUILD_FLAG)
 
 docker-compose-up-testing:
 	@$(LOG_TARGET)
@@ -112,6 +112,12 @@ docker-compose-up-llm-katan:
 	@echo "Starting services with llm-katan profile (REBUILD=$(REBUILD))..."
 	@docker compose --profile llm-katan up -d $(BUILD_FLAG)
 
+# Start core services only (closer to production; excludes llm-katan)
+docker-compose-up-core:
+	@$(LOG_TARGET)
+	@echo "Starting core services (no llm-katan) (REBUILD=$(REBUILD))..."
+	@docker compose up -d $(BUILD_FLAG)
+
 # Explicit rebuild targets for convenience
 docker-compose-rebuild: REBUILD=1
 docker-compose-rebuild: docker-compose-up
@@ -139,7 +145,8 @@ docker-help:
 	@echo " docker-run-llm-katan - Run llm-katan Docker image locally"
 	@echo " docker-run-llm-katan-custom SERVED_NAME=name - Run with custom served model name"
 	@echo " docker-clean - Clean up Docker images"
-	@echo " docker-compose-up - Start services (add REBUILD=1 to rebuild)"
+	@echo " docker-compose-up - Start services (default includes llm-katan; REBUILD=1 to rebuild)"
+	@echo " docker-compose-up-core - Start core services only (no llm-katan)"
 	@echo " docker-compose-up-testing - Start with testing profile (REBUILD=1 optional)"
 	@echo " docker-compose-up-llm-katan - Start with llm-katan profile (REBUILD=1 optional)"
 	@echo " docker-compose-rebuild - Force rebuild then start"
````

tools/make/linter.mk

Lines changed: 11 additions & 2 deletions

````diff
@@ -12,11 +12,20 @@ docs-lint-fix: docs-install
 
 markdown-lint:
 	@$(LOG_TARGET)
-	markdownlint -c tools/linter/markdown/markdownlint.yaml "**/*.md" --ignore node_modules --ignore website/node_modules --ignore dashboard/frontend/node_modules
+	markdownlint -c tools/linter/markdown/markdownlint.yaml "**/*.md" \
+		--ignore node_modules \
+		--ignore website/node_modules \
+		--ignore dashboard/frontend/node_modules \
+		--ignore models
 
 markdown-lint-fix:
 	@$(LOG_TARGET)
-	markdownlint -c tools/linter/markdown/markdownlint.yaml "**/*.md" --ignore node_modules --ignore website/node_modules --ignore dashboard/frontend/node_modules --fix
+	markdownlint -c tools/linter/markdown/markdownlint.yaml "**/*.md" \
+		--ignore node_modules \
+		--ignore website/node_modules \
+		--ignore dashboard/frontend/node_modules \
+		--ignore models \
+		--fix
 
 yaml-lint:
 	@$(LOG_TARGET)
````

tools/make/models.mk

Lines changed: 8 additions & 0 deletions

````diff
@@ -24,6 +24,10 @@ download-models:
 
 download-models-minimal:
 	@mkdir -p models
+	# Pre-download tiny LLM for llm-katan (optional but speeds up first start)
+	@if [ ! -f "models/Qwen/Qwen3-0.6B/.downloaded" ] || [ ! -d "models/Qwen/Qwen3-0.6B" ]; then \
+		hf download Qwen/Qwen3-0.6B --local-dir models/Qwen/Qwen3-0.6B && printf '%s\n' "$$(date -u +%Y-%m-%dT%H:%M:%SZ)" > models/Qwen/Qwen3-0.6B/.downloaded; \
+	fi
 	@if [ ! -f "models/category_classifier_modernbert-base_model/.downloaded" ] || [ ! -d "models/category_classifier_modernbert-base_model" ]; then \
 		hf download LLM-Semantic-Router/category_classifier_modernbert-base_model --local-dir models/category_classifier_modernbert-base_model && printf '%s\n' "$$(date -u +%Y-%m-%dT%H:%M:%SZ)" > models/category_classifier_modernbert-base_model/.downloaded; \
 	fi
@@ -41,6 +45,10 @@ download-models-minimal:
 
 download-models-full:
 	@mkdir -p models
+	# Pre-download tiny LLM for llm-katan (optional but speeds up first start)
+	@if [ ! -f "models/Qwen/Qwen3-0.6B/.downloaded" ] || [ ! -d "models/Qwen/Qwen3-0.6B" ]; then \
+		hf download Qwen/Qwen3-0.6B --local-dir models/Qwen/Qwen3-0.6B && printf '%s\n' "$$(date -u +%Y-%m-%dT%H:%M:%SZ)" > models/Qwen/Qwen3-0.6B/.downloaded; \
+	fi
 	@if [ ! -f "models/category_classifier_modernbert-base_model/.downloaded" ] || [ ! -d "models/category_classifier_modernbert-base_model" ]; then \
 		hf download LLM-Semantic-Router/category_classifier_modernbert-base_model --local-dir models/category_classifier_modernbert-base_model && printf '%s\n' "$$(date -u +%Y-%m-%dT%H:%M:%SZ)" > models/category_classifier_modernbert-base_model/.downloaded; \
 	fi
````
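The guard-and-stamp pattern used by both targets can be exercised in isolation. A runnable sketch, with a hypothetical `fetch_model` standing in for the real `hf download`:

```shell
# Stamp-file idempotency: fetch only when the model directory or its
# ".downloaded" marker is missing; subsequent runs become no-ops.
MODEL_DIR="$(mktemp -d)/Qwen/Qwen3-0.6B"
fetch_model() { mkdir -p "$MODEL_DIR"; echo "fetched"; }  # hypothetical stand-in

if [ ! -f "$MODEL_DIR/.downloaded" ] || [ ! -d "$MODEL_DIR" ]; then
  fetch_model && printf '%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" > "$MODEL_DIR/.downloaded"
fi

# Second pass: the stamp file short-circuits the download.
if [ ! -f "$MODEL_DIR/.downloaded" ] || [ ! -d "$MODEL_DIR" ]; then
  fetch_model
else
  echo "already present"
fi
```

Writing the stamp only after a successful fetch (the `&&`) is what keeps a failed or interrupted download from being mistaken for a completed one on the next run.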
