
Commit f203719

szedan-rh and claude authored
feat(openshift): Split vllm-katan-a and vllm-katan-b to run on separate pods rather than the same semantic router pod. (#593)
* feat(openshift): consolidate deployment with dynamic IP configuration

This commit consolidates the OpenShift deployment into a single unified script with automatic ClusterIP discovery for cross-cluster portability.

Changes:
- Enhanced deploy-to-openshift.sh with dynamic IP discovery
  - Auto-discovers vLLM service ClusterIPs at deployment time
  - Generates configuration with actual IPs (portable across clusters)
  - Fallback sed replacement for robustness
- Updated deployment.yaml for split architecture
  - Separate pods for vllm-model-a, vllm-model-b, and semantic-router
  - Each vLLM model with dedicated cache PVC
  - semantic-router + envoy-proxy in single pod (2 containers)
- Updated config-openshift.yaml with placeholder IPs
  - Comments indicate dynamic replacement by deploy script
  - Template IPs: 172.30.64.134 (model-a), 172.30.116.177 (model-b)
- Added comprehensive documentation
  - README-DYNAMIC-IPS.md: Technical details on dynamic IP feature
  - Updated README.md: Reflects consolidated script usage
- Removed single-namespace/ directory (consolidation complete)

Architecture:
- 3 Deployments: vllm-model-a, vllm-model-b, semantic-router
- Dynamic service discovery using oc get svc -o jsonpath
- llm-katan image built from Dockerfile via OpenShift BuildConfig
- gp3-csi storage class for all PVCs

Tested on OpenShift cluster with successful deployment verification:
- Model-A ClusterIP: 172.30.89.145:8000 ✓
- Model-B ClusterIP: 172.30.255.34:8001 ✓
- Both models responding to health checks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: szedan <szedan@redhat.com>

* chore: fix markdown linting issues in OpenShift docs

Applied markdownlint auto-fixes to ensure documentation follows project style guidelines:
- Added blank lines around lists (MD032)
- Added blank lines around fenced code blocks (MD031)

Files fixed:
- deploy/openshift/README.md
- deploy/openshift/README-DYNAMIC-IPS.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: szedan <szedan@redhat.com>

* fix(openshift): add missing vLLM model PVCs to deployment script

Add PersistentVolumeClaim creation for vllm-model-a-cache and vllm-model-b-cache to ensure all required storage is provisioned during automated deployment. This fixes pods being stuck in Pending state when running the deploy-to-openshift.sh script.

Changes:
- Add vllm-model-a-cache PVC (10Gi)
- Add vllm-model-b-cache PVC (10Gi)
- Ensures full automation without manual PVC creation

Signed-off-by: szedan <szedan@redhat.com>

* fix: remove useless cat in deploy script to fix shellcheck SC2002

Signed-off-by: szedan <szedan@redhat.com>

---------

Signed-off-by: szedan <szedan@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
1 parent e59c52e commit f203719

5 files changed (+769, -1106 lines)

Lines changed: 229 additions & 0 deletions

# Dynamic IP Configuration for Cross-Cluster Deployments

## Overview

This deployment uses **dynamic IP configuration** to ensure portability across different OpenShift/Kubernetes clusters. Instead of hardcoding ClusterIPs, the deployment script automatically discovers service IPs at deployment time.

## Architecture

### Pod Structure

1. **semantic-router Pod**:
   - Container 1: `semantic-router` (ExtProc service)
   - Container 2: `envoy-proxy` (Proxy)

2. **vllm-model-a Pod**:
   - Container: `model-a` (llm-katan serving Qwen3-0.6B)

3. **vllm-model-b Pod**:
   - Container: `model-b` (llm-katan serving Qwen3-0.6B)

All pods run in the same namespace: `vllm-semantic-router-system`

### Dynamic IP Discovery Process

The `deploy-to-openshift.sh` script implements dynamic IP configuration. In simplified form:

```bash
NAMESPACE=vllm-semantic-router-system

# 1. Deploy the vLLM model services first
oc apply -f deployment.yaml -n "$NAMESPACE"

# 2. Look up the ClusterIPs assigned to the services
MODEL_A_IP=$(oc get svc vllm-model-a -n "$NAMESPACE" -o jsonpath='{.spec.clusterIP}')
MODEL_B_IP=$(oc get svc vllm-model-b -n "$NAMESPACE" -o jsonpath='{.spec.clusterIP}')

# 3. Generate the config, replacing BOTH placeholder IPs with the actual ones
#    (dots are escaped so sed matches them literally)
sed -e "s/172\.30\.64\.134/$MODEL_A_IP/g" \
    -e "s/172\.30\.116\.177/$MODEL_B_IP/g" \
    config-openshift.yaml > dynamic-config.yaml

# 4. Create the ConfigMap from the generated config
oc create configmap semantic-router-config --from-file=dynamic-config.yaml -n "$NAMESPACE"
```
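
The substitution step can also be written more defensively. The sketch below assumes the two template placeholders (172.30.64.134 for Model-A, 172.30.116.177 for Model-B). It replaces both of them and refuses to render a config when discovery returned an empty or malformed value. The helpers `is_ipv4` and `render_config` are hypothetical names used for illustration. They are not functions from `deploy-to-openshift.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers sketching the "generate config with actual IPs" step.

# Template placeholders, with dots escaped so sed matches them literally.
PLACEHOLDER_A_RE='172\.30\.64\.134'    # Model-A placeholder
PLACEHOLDER_B_RE='172\.30\.116\.177'   # Model-B placeholder

# Succeeds only for a dotted-quad IPv4 string (a format check, not a range check).
is_ipv4() {
  [[ "$1" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]]
}

# render_config TEMPLATE MODEL_A_IP MODEL_B_IP
# Writes the rendered config to stdout; returns 1 if either IP is empty or malformed.
render_config() {
  local template="$1" ip_a="$2" ip_b="$3" ip
  for ip in "$ip_a" "$ip_b"; do
    if ! is_ipv4 "$ip"; then
      echo "render_config: invalid or empty ClusterIP: '$ip'" >&2
      return 1
    fi
  done
  # Both expressions are applied to every line of the template.
  sed -e "s/${PLACEHOLDER_A_RE}/${ip_a}/g" \
      -e "s/${PLACEHOLDER_B_RE}/${ip_b}/g" \
      "$template"
}
```

Example usage: `render_config config-openshift.yaml "$MODEL_A_IP" "$MODEL_B_IP" > dynamic-config.yaml || exit 1`. One caveat comes from using real-looking IPs as placeholders. The two `sed` expressions run in sequence on each line, so if the discovered Model-A IP happened to equal the Model-B placeholder, the second expression would rewrite it. Non-IP tokens in the template, such as a hypothetical `__MODEL_A_IP__`, would avoid that edge case.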

## Benefits

### ✅ Cross-Cluster Portability

- Works on any OpenShift/Kubernetes cluster
- No manual IP configuration needed
- IPs are discovered automatically

### ✅ Service-Based Routing

- Uses Kubernetes ClusterIP services
- Automatic service discovery
- Load balancing handled by Kubernetes

### ✅ Separation of Concerns

- vLLM models in separate pods
- Independent scaling
- Better resource isolation

## Deployment

### Quick Deploy

```bash
cd deploy/openshift
./deploy-to-openshift.sh
```

### What Happens

1. **Namespace Creation**: `vllm-semantic-router-system`
2. **Image Build**: `llm-katan` image (if it does not already exist)
3. **PVC Creation**: Persistent volumes for models and cache
4. **Service Deployment**: vLLM model services are created first
5. **IP Discovery**: The script queries the ClusterIPs dynamically
6. **Config Generation**: Creates the config with the actual IPs
7. **Router Deployment**: semantic-router is deployed with the generated config
8. **Route Creation**: OpenShift routes for external access

### Verification

```bash
# Check that all pods are running
oc get pods -n vllm-semantic-router-system

# Verify that the services have ClusterIPs
oc get svc -n vllm-semantic-router-system

# Test the Model-A endpoint (the $(...) lookup runs locally, before oc exec)
oc exec deployment/semantic-router -c semantic-router -n vllm-semantic-router-system -- \
  curl -s http://$(oc get svc vllm-model-a -n vllm-semantic-router-system -o jsonpath='{.spec.clusterIP}'):8000/v1/models

# Test the Model-B endpoint
oc exec deployment/semantic-router -c semantic-router -n vllm-semantic-router-system -- \
  curl -s http://$(oc get svc vllm-model-b -n vllm-semantic-router-system -o jsonpath='{.spec.clusterIP}'):8001/v1/models
```

## Configuration Files

### Template: config-openshift.yaml

Contains **placeholder IPs** that get replaced at deploy time:

```yaml
vllm_endpoints:
  - name: "model-a-endpoint"
    address: "172.30.64.134" # PLACEHOLDER - replaced at deploy time
    port: 8000
  - name: "model-b-endpoint"
    address: "172.30.116.177" # PLACEHOLDER - replaced at deploy time
    port: 8001
```

### Generated: ConfigMap

Contains the **actual ClusterIPs** discovered during deployment. For example, on the cluster where this setup was tested:

```yaml
vllm_endpoints:
  - name: "model-a-endpoint"
    address: "172.30.89.145" # Actual ClusterIP from cluster
    port: 8000
  - name: "model-b-endpoint"
    address: "172.30.255.34" # Actual ClusterIP from cluster
    port: 8001
```

## Testing on Different Clusters

### Scenario: Deploy to New Cluster

```bash
# 1. Log in to the new cluster
oc login https://new-cluster-api.example.com:6443

# 2. Run the deploy script (IPs are discovered automatically)
cd deploy/openshift
./deploy-to-openshift.sh

# 3. Verify the new ClusterIPs
oc get svc -n vllm-semantic-router-system
# vllm-model-a   ClusterIP   10.96.10.50   <none>   8000/TCP
# vllm-model-b   ClusterIP   10.96.20.80   <none>   8001/TCP

# 4. Check that the config contains the new IPs
oc get configmap semantic-router-config -n vllm-semantic-router-system -o yaml | grep address:
#   address: "10.96.10.50" # New cluster IP for Model-A
#   address: "10.96.20.80" # New cluster IP for Model-B
```
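
Step 4 relies on reading the `grep` output by eye. A small check can make it scriptable. The sketch below uses a hypothetical helper, `verify_addresses`. It extracts the quoted `address:` values from a rendered config file and confirms that each expected ClusterIP is present. It matches line by line rather than parsing YAML, so it assumes one `address: "..."` entry per line, as in the examples above.

```bash
#!/usr/bin/env bash
# verify_addresses FILE EXPECTED_IP...   (hypothetical helper)
# Returns 1 if any expected IP is missing from the file's address: values.
verify_addresses() {
  local rendered="$1"; shift
  local addresses ip
  # Collect the quoted values of lines like:  address: "172.30.89.145"  # comment
  addresses=$(sed -n 's/^[[:space:]]*address:[[:space:]]*"\([^"]*\)".*/\1/p' "$rendered")
  for ip in "$@"; do
    # -x: whole-line match, -F: treat the IP as a fixed string (dots are literal)
    if ! grep -qxF "$ip" <<<"$addresses"; then
      echo "expected address $ip not found in $rendered" >&2
      return 1
    fi
  done
}
```

For example, run `verify_addresses dynamic-config.yaml "$MODEL_A_IP" "$MODEL_B_IP"` right after the config is generated.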

## Troubleshooting

### Issue: Classification Errors

If you see classification errors, verify connectivity from the router to the models:

```bash
# From the semantic-router pod, test Model-A
oc exec deployment/semantic-router -c semantic-router -n vllm-semantic-router-system -- \
  curl http://$(oc get svc vllm-model-a -n vllm-semantic-router-system -o jsonpath='{.spec.clusterIP}'):8000/health

# Test Model-B
oc exec deployment/semantic-router -c semantic-router -n vllm-semantic-router-system -- \
  curl http://$(oc get svc vllm-model-b -n vllm-semantic-router-system -o jsonpath='{.spec.clusterIP}'):8001/health
```

### Issue: IP Discovery Fails

If the script fails to get the ClusterIPs:

```bash
# Check that the services exist
oc get svc -n vllm-semantic-router-system

# Manually verify the ClusterIPs
oc get svc vllm-model-a -n vllm-semantic-router-system -o jsonpath='{.spec.clusterIP}'
oc get svc vllm-model-b -n vllm-semantic-router-system -o jsonpath='{.spec.clusterIP}'
```
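
Sometimes discovery fails only because the services are still being created. In that case, retrying the lookup for a short while can help. A minimal sketch, assuming a hypothetical `retry_nonempty` helper that re-runs any command until it prints a non-empty value:

```bash
#!/usr/bin/env bash
# retry_nonempty ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Prints the first non-empty output of COMMAND; returns 1 if every attempt is empty.
retry_nonempty() {
  local attempts="$1" delay="$2"; shift 2
  local i out
  for ((i = 1; i <= attempts; i++)); do
    # Treat a failing command (e.g. the service does not exist yet) as empty output.
    out=$("$@" 2>/dev/null) || out=""
    if [[ -n "$out" ]]; then
      printf '%s\n' "$out"
      return 0
    fi
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  echo "no value after $attempts attempts: $*" >&2
  return 1
}
```

Example usage: `MODEL_A_IP=$(retry_nonempty 10 3 oc get svc vllm-model-a -n vllm-semantic-router-system -o jsonpath='{.spec.clusterIP}')`. The wrapped command's stderr is discarded on each attempt so transient errors don't clutter the output. The final failure message still names the command.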

### Issue: ConfigMap Not Updated

Restart semantic-router to pick up new config:

```bash
oc rollout restart deployment/semantic-router -n vllm-semantic-router-system
oc rollout status deployment/semantic-router -n vllm-semantic-router-system
```
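
Re-running `oc create configmap` fails once the ConfigMap already exists. A common pattern is to render the ConfigMap client-side with `--dry-run=client -o yaml`, pipe it to `oc apply`, and then restart the router as above. The sketch below wraps this in a hypothetical `update_router_config` function. The `OC` variable override exists only so the function can be exercised without a cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create or update the router ConfigMap, then restart the router.
NAMESPACE="${NAMESPACE:-vllm-semantic-router-system}"

# update_router_config CONFIG_FILE
update_router_config() {
  local config_file="$1"
  # Defaults to the real CLI. Left unquoted below so OC may hold a command
  # with arguments (e.g. OC="echo oc" for a dry run).
  local oc_cmd="${OC:-oc}"

  # Render the ConfigMap locally and apply it: this creates it on the first run
  # and updates it on later runs instead of failing because it already exists.
  $oc_cmd create configmap semantic-router-config \
      --from-file="$config_file" -n "$NAMESPACE" \
      --dry-run=client -o yaml | $oc_cmd apply -n "$NAMESPACE" -f - || return 1

  # As noted above, restart so that semantic-router picks up the new config.
  $oc_cmd rollout restart deployment/semantic-router -n "$NAMESPACE" || return 1
  $oc_cmd rollout status deployment/semantic-router -n "$NAMESPACE"
}
```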

## Comparison: Alternative Approaches

### ❌ Hardcoded IPs (Original)

```yaml
address: "172.30.64.134" # Works only on a specific cluster
```

### ❌ Localhost (Sidecar Pattern)

```yaml
address: "127.0.0.1" # Requires all containers in the same pod
```

### ✅ Dynamic IPs (Current Solution)

```yaml
address: "$DISCOVERED_IP" # Works on any cluster
```

### 🚀 DNS Names (Future Enhancement)

```yaml
address: "vllm-model-a.vllm-semantic-router-system.svc.cluster.local"
```

**Note**: Requires Go code changes to accept DNS names (see `src/semantic-router/pkg/config/validator.go`)

## Future Improvements

1. **DNS-Based Routing**: Modify the validator to accept Kubernetes service DNS names
2. **Multi-Cluster Support**: Deploy across multiple clusters with federation
3. **Auto-Scaling**: Horizontal pod autoscaling based on traffic
4. **Health Checks**: Enhanced health probes for better reliability

## Related Files

- `deploy-to-openshift.sh`: Main deployment script with the dynamic IP logic (deploy/openshift/deploy-to-openshift.sh)
- `config-openshift.yaml`: Configuration template with placeholder IPs
- `deployment.yaml`: Kubernetes manifests for the split architecture
- `validator.go`: IP validation code (requires modification for DNS support) (src/semantic-router/pkg/config/validator.go:20-51)

deploy/openshift/README.md

Lines changed: 26 additions & 11 deletions

````diff
@@ -1,6 +1,6 @@
 # OpenShift Deployment for Semantic Router
 
-This directory contains OpenShift-specific deployment manifests for the vLLM Semantic Router.
+This directory contains OpenShift-specific deployment manifests for the vLLM Semantic Router with **dynamic IP configuration** for cross-cluster portability.
 
 ## Quick Deployment
 
@@ -10,34 +10,49 @@ This directory contains OpenShift-specific deployment manifests for the vLLM Sem
 - `oc` CLI tool configured and logged in
 - Cluster admin privileges (or permissions to create namespaces and routes)
 
-### One-Command Deployment
+### Automated Deployment (Recommended)
+
+The deployment script automatically handles everything including dynamic IP configuration:
 
 ```bash
-oc apply -k deploy/openshift/
+cd deploy/openshift
+./deploy-to-openshift.sh
 ```
 
-### Step-by-Step Deployment
+This script will:
+
+- ✅ Build the llm-katan image from Dockerfile
+- ✅ Create namespace and PVCs
+- ✅ Deploy vLLM model services (model-a and model-b)
+- ✅ Auto-discover Kubernetes service ClusterIPs
+- ✅ Generate configuration with actual IPs (portable across clusters)
+- ✅ Deploy semantic-router with Envoy proxy
+- ✅ Create OpenShift routes for external access
+
+### Manual Deployment (Advanced)
+
+If you prefer manual deployment or need to customize:
 
 1. **Create namespace:**
 
 ```bash
-oc apply -f deploy/openshift/namespace.yaml
+oc create namespace vllm-semantic-router-system
 ```
 
-2. **Deploy core resources:**
+2. **Build llm-katan image:**
 
 ```bash
-oc apply -f deploy/openshift/pvc.yaml
-oc apply -f deploy/openshift/deployment.yaml
-oc apply -f deploy/openshift/service.yaml
+oc new-build --dockerfile - --name llm-katan -n vllm-semantic-router-system < Dockerfile.llm-katan
 ```
 
-3. **Create external routes:**
+3. **Deploy resources:**
 
 ```bash
-oc apply -f deploy/openshift/routes.yaml
+oc apply -f deployment.yaml -n vllm-semantic-router-system
 ```
 
+4. **Note:** You'll need to manually configure ClusterIPs in `config-openshift.yaml`
+
 ## Accessing Services
 
 After deployment, the services will be accessible via OpenShift Routes:
````
