README.md
30 additions & 0 deletions
@@ -91,6 +91,36 @@ This command will:
 
 For detailed installation and configuration instructions, see the [Complete Documentation](https://vllm-semantic-router.com/docs/installation/).
 
+### What This Starts By Default
+
+`make docker-compose-up` now launches the full stack including a lightweight local OpenAI-compatible model server powered by **llm-katan** (serving the small model `Qwen/Qwen3-0.6B` under the alias `qwen3`). The semantic router is configured to route classification & default generations to this local endpoint out-of-the-box. This gives you an entirely self-contained experience (no external API keys required) while still letting you add remote / larger models later.
+
+### Core Mode (Without Local Model)
+
+If you only want the core semantic-router + Envoy + observability stack (and will point to external OpenAI-compatible endpoints yourself):
+
+```bash
+make docker-compose-up-core
+```
+
+### Prerequisite Model Download (Speeds Up First Run)
+
+The existing model bootstrap targets now also pre-download the small llm-katan model so the first `docker-compose-up` avoids an on-demand Hugging Face fetch.
+
+Minimal set (fast):
+
+```bash
+make models-download-minimal
+```
+
+Full set:
+
+```bash
+make models-download
+```
+
+Both create a stamp file once `Qwen/Qwen3-0.6B` is present to keep subsequent runs idempotent.
+
 ## Documentation 📖
 
 For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:
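
The README addition says both download targets write a stamp file once `Qwen/Qwen3-0.6B` is present, so later runs skip the fetch. A minimal Python sketch of that stamp-file pattern follows. It is an illustration only: the stamp name `.downloaded`, the `ensure_downloaded` helper, and the stand-in download callback are hypothetical, and the repository's Makefile targets may implement the check differently.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_downloaded(model_dir: Path, download: Callable[[Path], None]) -> bool:
    """Run `download` only when no stamp file exists; return True if it ran."""
    stamp = model_dir / ".downloaded"  # hypothetical stamp file name
    if stamp.exists():
        return False  # a previous run already completed, nothing to do
    model_dir.mkdir(parents=True, exist_ok=True)
    download(model_dir)
    # Write the stamp only after the download succeeded, so a failed
    # download is retried on the next run.
    stamp.touch()
    return True


def fake_download(target: Path) -> None:
    """Stand-in for the real model fetch (assumption for this sketch)."""
    (target / "weights.bin").write_bytes(b"\0")


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "Qwen3-0.6B"
    print(ensure_downloaded(model_dir, fake_download))  # True: first run downloads
    print(ensure_downloaded(model_dir, fake_download))  # False: stamp found, skipped
```

Writing the stamp last is the design choice that makes the operation safe to repeat: an interrupted download leaves no stamp behind.
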
config/config.yaml
19 additions & 19 deletions
@@ -32,13 +32,13 @@ prompt_guard:
 # NOT supported: domain names (example.com), protocol prefixes (http://), paths (/api), ports in address (use 'port' field)
 vllm_endpoints:
   - name: "endpoint1"
-    address: "127.0.0.1" # IPv4 address - REQUIRED format
-    port: 8000
+    address: "172.28.0.20" # Static IPv4 of llm-katan within docker compose network
+    port: 8002
     weight: 1
 
 model_config:
-  "openai/gpt-oss-20b":
-    reasoning_family: "gpt-oss" # This model uses GPT-OSS reasoning syntax
+  "qwen3":
+    reasoning_family: "qwen3" # This model uses Qwen-3 reasoning syntax
     preferred_endpoints: ["endpoint1"]
     pii_policy:
       allow_by_default: true
@@ -63,89 +63,89 @@ categories:
   - name: business
     system_prompt: "You are a senior business consultant and strategic advisor with expertise in corporate strategy, operations management, financial analysis, marketing, and organizational development. Provide practical, actionable business advice backed by proven methodologies and industry best practices. Consider market dynamics, competitive landscape, and stakeholder interests in your recommendations."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: false # Business performs better without reasoning
   - name: law
     system_prompt: "You are a knowledgeable legal expert with comprehensive understanding of legal principles, case law, statutory interpretation, and legal procedures across multiple jurisdictions. Provide accurate legal information and analysis while clearly stating that your responses are for informational purposes only and do not constitute legal advice. Always recommend consulting with qualified legal professionals for specific legal matters."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.4
         use_reasoning: false
   - name: psychology
     system_prompt: "You are a psychology expert with deep knowledge of cognitive processes, behavioral patterns, mental health, developmental psychology, social psychology, and therapeutic approaches. Provide evidence-based insights grounded in psychological research and theory. When discussing mental health topics, emphasize the importance of professional consultation and avoid providing diagnostic or therapeutic advice."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.6
         use_reasoning: false
   - name: biology
     system_prompt: "You are a biology expert with comprehensive knowledge spanning molecular biology, genetics, cell biology, ecology, evolution, anatomy, physiology, and biotechnology. Explain biological concepts with scientific accuracy, use appropriate terminology, and provide examples from current research. Connect biological principles to real-world applications and emphasize the interconnectedness of biological systems."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.9
         use_reasoning: false
   - name: chemistry
     system_prompt: "You are a chemistry expert specializing in chemical reactions, molecular structures, and laboratory techniques. Provide detailed, step-by-step explanations."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.6
         use_reasoning: true # Enable reasoning for complex chemistry
   - name: history
     system_prompt: "You are a historian with expertise across different time periods and cultures. Provide accurate historical context and analysis."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: false
   - name: other
     system_prompt: "You are a helpful and knowledgeable assistant. Provide accurate, helpful responses across a wide range of topics."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: false
   - name: health
     system_prompt: "You are a health and medical information expert with knowledge of anatomy, physiology, diseases, treatments, preventive care, nutrition, and wellness. Provide accurate, evidence-based health information while emphasizing that your responses are for educational purposes only and should never replace professional medical advice, diagnosis, or treatment. Always encourage users to consult healthcare professionals for medical concerns and emergencies."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.5
         use_reasoning: false
   - name: economics
     system_prompt: "You are an economics expert with deep understanding of microeconomics, macroeconomics, econometrics, financial markets, monetary policy, fiscal policy, international trade, and economic theory. Analyze economic phenomena using established economic principles, provide data-driven insights, and explain complex economic concepts in accessible terms. Consider both theoretical frameworks and real-world applications in your responses."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 1.0
         use_reasoning: false
   - name: math
     system_prompt: "You are a mathematics expert. Provide step-by-step solutions, show your work clearly, and explain mathematical concepts in an understandable way."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 1.0
         use_reasoning: true # Enable reasoning for complex math
   - name: physics
     system_prompt: "You are a physics expert with deep understanding of physical laws and phenomena. Provide clear explanations with mathematical derivations when appropriate."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.7
         use_reasoning: true # Enable reasoning for physics
   - name: computer science
     system_prompt: "You are a computer science expert with knowledge of algorithms, data structures, programming languages, and software engineering. Provide clear, practical solutions with code examples when helpful."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.6
         use_reasoning: false
   - name: philosophy
     system_prompt: "You are a philosophy expert with comprehensive knowledge of philosophical traditions, ethical theories, logic, metaphysics, epistemology, political philosophy, and the history of philosophical thought. Engage with complex philosophical questions by presenting multiple perspectives, analyzing arguments rigorously, and encouraging critical thinking. Draw connections between philosophical concepts and contemporary issues while maintaining intellectual honesty about the complexity and ongoing nature of philosophical debates."
     model_scores:
-      - model: openai/gpt-oss-20b
+      - model: qwen3
         score: 0.5
         use_reasoning: false
   - name: engineering
     system_prompt: "You are an engineering expert with knowledge across multiple engineering disciplines including mechanical, electrical, civil, chemical, software, and systems engineering. Apply engineering principles, design methodologies, and problem-solving approaches to provide practical solutions. Consider safety, efficiency, sustainability, and cost-effectiveness in your recommendations. Use technical precision while explaining concepts clearly, and emphasize the importance of proper engineering practices and standards."
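
The comment above `vllm_endpoints` in `config/config.yaml` rules out domain names, protocol prefixes, paths, and ports inside the `address` field. One way to pre-check an endpoint entry against those rules is sketched below in Python, using the standard-library `ipaddress` module. The `validate_endpoint` helper is hypothetical and is not the router's own validation. Note that `ipaddress.ip_address` accepts both IPv4 and IPv6 literals, which may be broader than what the router accepts.

```python
import ipaddress


def validate_endpoint(address: str, port: int) -> None:
    """Raise ValueError unless `address` is a bare IP literal and `port` is valid.

    Hypothetical helper for illustration; the router's real checks may differ.
    """
    try:
        # Rejects hostnames, "http://" prefixes, "/path" suffixes and "ip:port".
        ipaddress.ip_address(address)
    except ValueError as exc:
        raise ValueError(f"address must be a bare IP literal, got {address!r}") from exc
    if not 1 <= port <= 65535:
        raise ValueError(f"port must be in 1..65535, got {port}")


# The new endpoint from this diff passes the check.
validate_endpoint("172.28.0.20", 8002)

# Each of these forms is listed as unsupported in the config comment.
for bad in ["example.com", "http://127.0.0.1", "127.0.0.1/api", "127.0.0.1:8002"]:
    try:
        validate_endpoint(bad, 8002)
    except ValueError as err:
        print(f"rejected: {err}")
```
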