Skip to content

Commit 9ff50de

Browse files
yossiovadiaclaude
andauthored
feat: add OpenShift demo scripts and documentation (#446)
* feat: add OpenShift demo scripts and documentation Add comprehensive demo toolkit for semantic router capabilities: - Interactive demo script (demo-semantic-router.py) with menu options: - Single classification (cache demo with fixed prompt) - All classifications (10 golden prompts) - PII detection test - Jailbreak detection test - Run all tests - Live log viewers: - live-semantic-router-logs.sh: Envoy traffic with routing decisions - live-classifier-logs.sh: Classification API activity - Demo utilities: - curl-examples.sh: Quick classification examples - cache-management.sh: Cache status and clearing - Documentation: - DEMO-README.md: Complete demo guide with setup instructions - CATEGORY-MODEL-MAPPING.md: Category to model routing reference All scripts use dynamic URL discovery from OpenShift routes (requires oc login). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Yossi Ovadia <yovadia@redhat.com> * fix(envoy): use ORIGINAL_DST cluster for dynamic routing Replace static cluster routing with ORIGINAL_DST cluster type to fix routing bug where all requests were going to model_b_cluster. **Problem:** - Envoy evaluates routes BEFORE ExtProc filter runs - Header-based routing never matched because header wasn't set yet - All requests fell through to default route (model_b_cluster) - Router selected Model-A but Envoy routed to Model-B **Solution:** - Use ORIGINAL_DST cluster with use_http_header: true - Cluster reads x-gateway-destination-endpoint header set by ExtProc - Routes to correct endpoint (127.0.0.1:8000 or 8001) dynamically **Testing:** Verified with Envoy logs showing: - selected_model: Model-A, upstream_host: 127.0.0.1:8001 (WRONG - before fix) - After fix: destination determined by header value This aligns OpenShift config with local config/envoy.yaml approach. Signed-off-by: Yossi Ovadia <yovadia@redhat.com> * feat(demo): add reasoning showcase test to OpenShift demo Add interactive test showcasing Chain-of-Thought (CoT) reasoning vs standard routing: - 2 reasoning-enabled examples (math, chemistry with use_reasoning: true) - 1 reasoning-disabled example (history with use_reasoning: false) - Summary statistics showing success rates for each mode - Clear visual distinction between CoT and standard routing This helps demonstrate how the semantic router intelligently routes prompts that require multi-step reasoning vs factual queries. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Yossi Ovadia <yovadia@redhat.com> * fix: apply markdown linting fixes to demo documentation Signed-off-by: Yossi Ovadia <yovadia@redhat.com> --------- Signed-off-by: Yossi Ovadia <yovadia@redhat.com> Co-authored-by: Claude <noreply@anthropic.com>
1 parent b5e81f4 commit 9ff50de

File tree

9 files changed

+1539
-57
lines changed

9 files changed

+1539
-57
lines changed
Lines changed: 95 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,95 @@
1+
# Category to Model Mapping
2+
3+
**Configuration File:** [deploy/openshift/config-openshift.yaml](../config-openshift.yaml)
4+
5+
## Model-A Categories (Default Model)
6+
7+
Model-A handles **9 categories** (primarily science and technical topics):
8+
9+
| Category | Score | Reasoning Enabled | Description |
10+
|----------|-------|-------------------|-------------|
11+
| **math** | 1.0 | ✅ Yes | Mathematics expert with step-by-step solutions |
12+
| **economics** | 1.0 | ❌ No | Economics expert (micro, macro, policy) |
13+
| **biology** | 0.9 | ❌ No | Biology expert (molecular, genetics, ecology) |
14+
| **physics** | 0.7 | ✅ Yes | Physics expert with mathematical derivations |
15+
| **history** | 0.7 | ❌ No | Historian across different periods and cultures |
16+
| **engineering** | 0.7 | ❌ No | Engineering expert (mechanical, electrical, civil, etc.) |
17+
| **other** | 0.7 | ❌ No | General helpful assistant (fallback) |
18+
| **chemistry** | 0.6 | ✅ Yes | Chemistry expert with lab techniques |
19+
| **computer science** | 0.6 | ❌ No | Computer science expert (algorithms, programming) |
20+
21+
---
22+
23+
## Model-B Categories
24+
25+
Model-B handles **5 categories** (primarily social sciences and humanities):
26+
27+
| Category | Score | Reasoning Enabled | Description |
28+
|----------|-------|-------------------|-------------|
29+
| **business** | 0.7 | ❌ No | Business consultant and strategic advisor |
30+
| **psychology** | 0.6 | ❌ No | Psychology expert (cognitive, behavioral, mental health) |
31+
| **health** | 0.5 | ❌ No | Health and medical information expert |
32+
| **philosophy** | 0.5 | ❌ No | Philosophy expert (ethics, logic, metaphysics) |
33+
| **law** | 0.4 | ❌ No | Legal expert (case law, statutory interpretation) |
34+
35+
---
36+
37+
## Prompts Routing (Tested & Verified)
38+
39+
These prompts have **100% classification accuracy** and route as follows:
40+
41+
| Category | Example Prompt | Routes To | Confidence |
42+
|----------|---------------|-----------|------------|
43+
| **Math** | "Is 17 a prime number?" | Model-B* | ~0.326 |
44+
| **Chemistry** | "What are atoms made of?" | Model-B* | ~0.196 |
45+
| **Chemistry** | "Explain oxidation and reduction" | Model-B* | ~0.200 |
46+
| **Chemistry** | "Explain chemical equilibrium" | Model-B* | ~0.197 |
47+
| **History** | "What were the main causes of World War I?" | Model-B* | ~0.218 |
48+
| **History** | "What was the Cold War?" | Model-B* | ~0.219 |
49+
| **Psychology** | "What is the nature vs nurture debate?" | Model-B | ~0.391 |
50+
| **Psychology** | "What are the stages of grief?" | Model-B | ~0.403 |
51+
| **Health** | "How to maintain a healthy lifestyle?" | Model-B | ~0.221 |
52+
| **Health** | "What is a balanced diet?" | Model-B | ~0.268 |
53+
54+
---
55+
56+
## Reasoning Mode (Chain-of-Thought)
57+
58+
Categories with **reasoning enabled** use extended thinking for complex problems:
59+
60+
-**Math** (Model-A) - Step-by-step mathematical solutions
61+
-**Chemistry** (Model-A) - Complex chemical reactions and analysis
62+
-**Physics** (Model-A) - Mathematical derivations and proofs
63+
64+
---
65+
66+
## Default Behavior
67+
68+
- **Default Model:** Model-A
69+
- **Fallback Category:** "other" (score: 0.7)
70+
- **Unmatched queries** route to Model-A with the "other" category system prompt
71+
72+
### Key Parameters:
73+
74+
- **name:** Category identifier
75+
- **system_prompt:** Specialized prompt for this category
76+
- **model_scores.model:** Target model (Model-A or Model-B)
77+
- **model_scores.score:** Routing priority (0.0 to 1.0)
78+
- **use_reasoning:** Enable extended thinking mode
79+
80+
---
81+
82+
## Confidence Scores Explained
83+
84+
**Why are confidence scores low (0.2-0.4)?**
85+
86+
1. **Softmax across 14 categories** - Even the "winning" category may only get 20-40% probability
87+
2. **Relative, not absolute** - Scores are compared against other categories
88+
3. **Consistency matters** - Same prompt always gets same category (100% in our tests)
89+
4. **Highest score wins** - 0.326 for "math" means it beat all other 13 categories
90+
91+
**What's important:**
92+
93+
- ✅ Classification is **consistent** across multiple runs
94+
- ✅ Same prompt → same category every time
95+
- ✅ Confidence is **relative** to other categories, not absolute certainty
Lines changed: 242 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,242 @@
1+
# Demo Scripts for Semantic Router
2+
3+
This directory contains demo scripts to showcase the semantic router capabilities.
4+
5+
## Quick Demo Guide
6+
7+
### 1. Live Log Viewer (Run in Terminal 1)
8+
9+
Shows real-time classification, routing, and security decisions:
10+
11+
```bash
12+
./deploy/openshift/demo/live-semantic-router-logs.sh
13+
```
14+
15+
**What it shows:**
16+
17+
- 📨 **Incoming requests** with user prompts
18+
- 🛡️ **Security checks** (jailbreak detection)
19+
- 🔍 **Classification** (category detection with confidence)
20+
- 🎯 **Routing decisions** (which model was selected)
21+
- 💾 **Cache hits** (semantic similarity matching)
22+
- 🧠 **Reasoning mode** activation
23+
24+
**Tip:** Run this in a split terminal or separate window during your demo!
25+
26+
---
27+
28+
### 2. Interactive Demo (Run in Terminal 2)
29+
30+
Interactive menu-driven semantic router demo:
31+
32+
```bash
33+
python3 deploy/openshift/demo/demo-semantic-router.py
34+
```
35+
36+
**Features:**
37+
38+
1. **Single Classification** - Tests random prompt from golden set
39+
2. **All Classifications** - Tests all 10 golden prompts
40+
3. **PII Detection Test** - Tests personal information filtering
41+
4. **Jailbreak Detection Test** - Tests security filtering
42+
5. **Run All Tests** - Executes all tests sequentially
43+
44+
**Requirements:**
45+
46+
- ✅ Must be logged into OpenShift (`oc login`)
47+
- URLs are discovered automatically from routes
48+
49+
**What it does:**
50+
51+
- Goes through Envoy (same path as OpenWebUI)
52+
- Shows routing decisions and response previews
53+
- **Appears in Grafana dashboard!**
54+
- Interactive - choose what to test
55+
56+
---
57+
58+
## Demo Flow Suggestion
59+
60+
### Setup (Before Demo)
61+
62+
```bash
63+
# Terminal 1: Start log viewer
64+
./deploy/openshift/demo/live-semantic-router-logs.sh
65+
66+
# Terminal 2: Ready to run classification test
67+
# (don't run yet)
68+
69+
# Browser Tab 1: Open Grafana
70+
# http://grafana-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com
71+
72+
# Browser Tab 2: Open OpenWebUI
73+
# http://openwebui-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com
74+
```
75+
76+
### During Demo
77+
78+
1. **Show the system overview**
79+
- Explain semantic routing concept
80+
- Show the architecture diagram
81+
82+
2. **Run interactive demo** (Terminal 2)
83+
84+
```bash
85+
python3 deploy/openshift/demo/demo-semantic-router.py
86+
```
87+
88+
Choose option 2 (All Classifications)
89+
90+
3. **Point to live logs** (Terminal 1)
91+
- Show real-time classification
92+
- Highlight security checks (jailbreak: BENIGN)
93+
- Show routing decisions (Model-A vs Model-B)
94+
- Point out cache hits
95+
96+
4. **Switch to Grafana** (Browser Tab 1)
97+
- Show request metrics appearing
98+
- Show classification category distribution
99+
- Show model usage breakdown
100+
101+
5. **Show OpenWebUI integration** (Browser Tab 2)
102+
- Type one of the golden prompts
103+
- Watch it appear in logs (Terminal 1)
104+
- Show the same routing happening
105+
106+
---
107+
108+
## Key Talking Points
109+
110+
### Classification Accuracy
111+
112+
- **10 golden prompts** with 100% accuracy
113+
- Categories: Chemistry, History, Psychology, Health, Math
114+
- Shows consistent classification behavior
115+
116+
### Security Features
117+
118+
- **Jailbreak detection** on every request
119+
- Shows "BENIGN" for safe requests
120+
- Confidence scores displayed
121+
122+
### Smart Routing
123+
124+
- Automatic model selection based on content
125+
- Load balancing across Model-A and Model-B
126+
- Routing decisions visible in logs
127+
128+
### Performance
129+
130+
- **Semantic caching** reduces latency
131+
- Cache hits shown in logs with similarity scores
132+
- Sub-second response times
133+
134+
### Observability
135+
136+
- Real-time logs with structured JSON
137+
- Grafana metrics and dashboards
138+
- Request tracing and debugging
139+
140+
---
141+
142+
## Troubleshooting
143+
144+
### Log viewer shows no output
145+
146+
```bash
147+
# Check if semantic-router pod is running
148+
oc get pods -n vllm-semantic-router-system | grep semantic-router
149+
150+
# Check logs manually
151+
oc logs -n vllm-semantic-router-system deployment/semantic-router --tail=20
152+
```
153+
154+
### Classification test fails
155+
156+
```bash
157+
# Verify Envoy route is accessible
158+
curl http://envoy-http-vllm-semantic-router-system.apps.cluster-pbd96.pbd96.sandbox5333.opentlc.com/v1/models
159+
160+
# Check if models are ready
161+
oc get pods -n vllm-semantic-router-system
162+
```
163+
164+
### Grafana doesn't show metrics
165+
166+
- Wait 15-30 seconds for metrics to appear
167+
- Refresh the dashboard
168+
- Check the time range (last 5 minutes)
169+
170+
---
171+
172+
## Cache Management
173+
174+
### Check Cache Status
175+
176+
```bash
177+
./deploy/openshift/demo/cache-management.sh status
178+
```
179+
180+
Shows recent cache activity and cached queries.
181+
182+
### Clear Cache (for demo)
183+
184+
```bash
185+
./deploy/openshift/demo/cache-management.sh clear
186+
```
187+
188+
Restarts semantic-router deployment to clear in-memory cache (~30 seconds).
189+
190+
### Demo Cache Feature
191+
192+
**Workflow to show caching in action:**
193+
194+
1. Clear the cache:
195+
196+
```bash
197+
./deploy/openshift/demo/cache-management.sh clear
198+
```
199+
200+
2. Run classification test (first time - no cache):
201+
202+
```bash
203+
python3 deploy/openshift/demo/demo-semantic-router.py
204+
```
205+
206+
Choose option 2 (All Classifications)
207+
- Processing time: ~3-4 seconds per query
208+
- Logs show queries going to model
209+
210+
3. Run classification test again (second time - with cache):
211+
212+
```bash
213+
python3 deploy/openshift/demo/demo-semantic-router.py
214+
```
215+
216+
Choose option 2 (All Classifications) again
217+
- Processing time: ~400ms per query (10x faster!)
218+
- Logs show "💾 CACHE HIT" for all queries
219+
- Similarity scores ~0.99999
220+
221+
**Key talking point:** Cache uses **semantic similarity**, not exact string matching!
222+
223+
---
224+
225+
## Files
226+
227+
- `live-semantic-router-logs.sh` - Envoy traffic log viewer (security, cache, routing)
228+
- `live-classifier-logs.sh` - Classification API log viewer
229+
- `demo-semantic-router.py` - Interactive demo with multiple test options
230+
- `curl-examples.sh` - Quick classification examples (direct API)
231+
- `cache-management.sh` - Cache management helper
232+
- `CATEGORY-MODEL-MAPPING.md` - Category to model routing reference
233+
- `demo-classification-results.json` - Test results (auto-generated)
234+
235+
---
236+
237+
## Notes
238+
239+
- The log viewer uses `oc logs --follow`, so it will run indefinitely until you press Ctrl+C
240+
- Classification test takes ~60 seconds (10 prompts with 0.5s delay between each)
241+
- All requests go through Envoy, triggering the full routing pipeline
242+
- Grafana metrics update in real-time (with slight delay)

0 commit comments

Comments
 (0)