Commit e9d38c0

expand pvc size & fix inference-pool selector error
Signed-off-by: JaredforReal <w13431838023@gmail.com>
1 parent 97840a9 commit e9d38c0

4 files changed: 52 additions and 33 deletions

deploy/kubernetes/README.md

Lines changed: 21 additions & 11 deletions
@@ -7,7 +7,7 @@ This directory contains Kubernetes manifests for deploying the Semantic Router u
 The deployment consists of:
 
 - **ConfigMap**: Contains `config.yaml` and `tools_db.json` configuration files
-- **PersistentVolumeClaim**: 10Gi storage for model files
+- **PersistentVolumeClaim**: 30Gi storage for model files (adjust based on models you enable)
 - **Deployment**:
   - **Init Container**: Downloads/copies model files to persistent volume
   - **Main Container**: Runs the semantic router service
@@ -29,11 +29,11 @@ The deployment consists of:
 kubectl apply -k deploy/kubernetes/
 
 # Check deployment status
-kubectl get pods -l app=semantic-router -n semantic-router
-kubectl get services -l app=semantic-router -n semantic-router
+kubectl get pods -l app=semantic-router -n vllm-semantic-router-system
+kubectl get services -l app=semantic-router -n vllm-semantic-router-system
 
 # View logs
-kubectl logs -l app=semantic-router -n semantic-router -f
+kubectl logs -l app=semantic-router -n vllm-semantic-router-system -f
 ```
 
 ### Kind (Kubernetes in Docker) Deployment
@@ -86,20 +86,20 @@ kubectl wait --for=condition=Ready nodes --all --timeout=300s
 kubectl apply -k deploy/kubernetes/
 
 # Wait for deployment to be ready
-kubectl wait --for=condition=Available deployment/semantic-router -n semantic-router --timeout=600s
+kubectl wait --for=condition=Available deployment/semantic-router -n vllm-semantic-router-system --timeout=600s
 ```
 
 **Step 3: Check deployment status**
 
 ```bash
 # Check pods
-kubectl get pods -n semantic-router -o wide
+kubectl get pods -n vllm-semantic-router-system -o wide
 
 # Check services
-kubectl get services -n semantic-router
+kubectl get services -n vllm-semantic-router-system
 
 # View logs
-kubectl logs -l app=semantic-router -n semantic-router -f
+kubectl logs -l app=semantic-router -n vllm-semantic-router-system -f
 ```
 
 #### Resource Requirements for Kind
@@ -137,13 +137,13 @@ Or using kubectl directly:
 
 ```bash
 # Access Classification API (HTTP REST)
-kubectl port-forward -n semantic-router svc/semantic-router 8080:8080
+kubectl port-forward -n vllm-semantic-router-system svc/semantic-router 8080:8080
 
 # Access gRPC API
-kubectl port-forward -n semantic-router svc/semantic-router 50051:50051
+kubectl port-forward -n vllm-semantic-router-system svc/semantic-router 50051:50051
 
 # Access metrics
-kubectl port-forward -n semantic-router svc/semantic-router-metrics 9190:9190
+kubectl port-forward -n vllm-semantic-router-system svc/semantic-router-metrics 9190:9190
 ```
 
 #### Testing the Deployment
@@ -195,6 +195,11 @@ kubectl delete -k deploy/kubernetes/
 kind delete cluster --name semantic-router-cluster
 ```
 
+## Notes on dependencies
+
+- Gateway API Inference Extension CRDs are required only when using the Envoy AI Gateway integration in `deploy/kubernetes/ai-gateway/`. Follow the installation steps in `website/docs/installation/kubernetes.md` if you plan to use the gateway path.
+- The core kustomize deployment in this folder does not install Envoy Gateway or AI Gateway; those are optional components documented separately.
+
 ## Make Commands Reference
 
 The project provides comprehensive make targets for managing kind clusters and deployments:
@@ -293,6 +298,11 @@ kubectl top pods -n semantic-router
 # Adjust resource limits in deployment.yaml if needed
 ```
 
+### Storage sizing
+
+- The default PVC is 30Gi. If the enabled models are small, you can reduce it; otherwise reserve at least 2–3x the total model size.
+- If your cluster's default StorageClass isn't named `standard`, change `storageClassName` in `pvc.yaml` accordingly or remove the field to use the default class.
+
 ### Resource Optimization
 
 For different environments, you can adjust resource requirements:

deploy/kubernetes/ai-gateway/inference-pool/inference-pool.yaml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ spec:
     - number: 50051
   selector:
     matchLabels:
-      app: vllm-semantic-router
+      app: semantic-router
   endpointPickerRef:
     name: semantic-router
     port:
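
The selector change above matters because an InferencePool selects pods by the labels listed under `selector.matchLabels`. As a rough sanity check, the sketch below counts the pods that a given label selector matches. It assumes the router pods run in the `vllm-semantic-router-system` namespace used elsewhere in this commit; the function name `count_matching_pods` is hypothetical and not part of the repository.

```bash
# Hypothetical helper (not part of the repository): count the pods that
# match a label selector in a namespace. A count of 0 suggests the
# selector does not match the labels on the running pods.
count_matching_pods() {
  namespace="$1"
  selector="$2"
  # --no-headers prints one line per pod, so counting lines counts pods.
  kubectl get pods -n "$namespace" -l "$selector" --no-headers 2>/dev/null \
    | wc -l | tr -d ' '
}

# Example (requires cluster access):
#   count_matching_pods vllm-semantic-router-system app=semantic-router
```

If the old label `app=vllm-semantic-router` returns 0 while `app=semantic-router` returns a positive count, that is consistent with the selector error this commit fixes.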

deploy/kubernetes/pvc.yaml

Lines changed: 1 addition & 1 deletion
@@ -9,5 +9,5 @@ spec:
     - ReadWriteOnce
   resources:
     requests:
-      storage: 10Gi
+      storage: 30Gi
   storageClassName: standard
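
The sizing guidance in this commit (reserve at least 2–3x the total model size) can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `recommend_pvc_gi` is hypothetical, sizes are whole GiB, and the default multiplier of 3 is simply the upper end of the documented range.

```bash
# Hypothetical helper: recommend a PVC size from the total model footprint.
# Usage: recommend_pvc_gi <total_model_size_gib> [multiplier]
recommend_pvc_gi() {
  model_gib="$1"
  multiplier="${2:-3}"   # assumption: default to 3x, the upper end of 2-3x
  # Shell arithmetic is integer-only, so pass whole GiB values.
  echo "$((model_gib * multiplier))Gi"
}

recommend_pvc_gi 8      # prints 24Gi
recommend_pvc_gi 8 2    # prints 16Gi
```

With the roughly 6–8 GiB model footprint mentioned in the installation notes, a 3x multiplier gives 18–24Gi, which fits within the new 30Gi default.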

website/docs/installation/kubernetes.md

Lines changed: 29 additions & 20 deletions
@@ -37,6 +37,13 @@ kubectl wait --for=condition=Ready nodes --all --timeout=300s
 
 Configure the semantic router by editing `deploy/kubernetes/config.yaml`. This file contains the vLLM configuration, including model config, endpoints, and policies.
 
+Important notes before you apply manifests:
+
+- `vllm_endpoints.address` must be an IP address (not hostname) reachable from inside the cluster. If your LLM backends run as K8s Services, use the ClusterIP (for example `10.96.0.10`) and set `port` accordingly. Do not include protocol or path.
+- The PVC in `deploy/kubernetes/pvc.yaml` uses `storageClassName: standard`. On some clouds or local clusters, the default StorageClass name may differ (e.g., `standard-rwo`, `gp2`, or a provisioner like local-path). Adjust as needed.
+- Default PVC size is 30Gi. Size it to at least 2–3x of your total model footprint to leave room for indexes and updates.
+- The initContainer downloads several models from Hugging Face on first run and writes them into the PVC. Ensure outbound egress to Hugging Face is allowed and there is at least ~6–8 GiB free space for the models specified.
+
 Deploy the semantic router service with all required components:
 
 ```bash
@@ -135,26 +142,28 @@ Expected output should show the inference pool in `Accepted` state:
 ```yaml
 status:
   parent:
-  - conditions:
-    - lastTransitionTime: "2025-09-27T09:27:32Z"
-      message: 'InferencePool has been Accepted by controller ai-gateway-controller:
-        InferencePool reconciled successfully'
-      observedGeneration: 1
-      reason: Accepted
-      status: "True"
-      type: Accepted
-    - lastTransitionTime: "2025-09-27T09:27:32Z"
-      message: 'Reference resolution by controller ai-gateway-controller: All references
-        resolved successfully'
-      observedGeneration: 1
-      reason: ResolvedRefs
-      status: "True"
-      type: ResolvedRefs
-    parentRef:
-      group: gateway.networking.k8s.io
-      kind: Gateway
-      name: vllm-semantic-router
-      namespace: vllm-semantic-router-system
+    - conditions:
+        - lastTransitionTime: "2025-09-27T09:27:32Z"
+          message:
+            "InferencePool has been Accepted by controller ai-gateway-controller:
+            InferencePool reconciled successfully"
+          observedGeneration: 1
+          reason: Accepted
+          status: "True"
+          type: Accepted
+        - lastTransitionTime: "2025-09-27T09:27:32Z"
+          message:
+            "Reference resolution by controller ai-gateway-controller: All references
+            resolved successfully"
+          observedGeneration: 1
+          reason: ResolvedRefs
+          status: "True"
+          type: ResolvedRefs
+      parentRef:
+        group: gateway.networking.k8s.io
+        kind: Gateway
+        name: vllm-semantic-router
+        namespace: vllm-semantic-router-system
 ```
 
 ## Testing the Deployment
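
A note on the `vllm_endpoints.address` requirement added in this file: the value must be a bare IP address, with no hostname, protocol, or path. A minimal sketch of a pre-apply check follows. It assumes IPv4 only (matching the `10.96.0.10` example), and the function name `is_ipv4` is hypothetical, not part of the repository.

```bash
# Hypothetical helper: succeed only if the argument looks like a
# dotted-quad IPv4 address (no hostname, protocol, or path).
is_ipv4() {
  # Require exactly four dot-separated groups of 1-3 digits.
  printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
  # Split on dots and check that each octet is at most 255.
  old_ifs="$IFS"; IFS=.
  set -- $1
  IFS="$old_ifs"
  for octet in "$@"; do
    [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# To look up a Service's ClusterIP (placeholders are yours to fill in):
#   kubectl get svc <service-name> -n <namespace> -o jsonpath='{.spec.clusterIP}'
```

For example, `is_ipv4 10.96.0.10` succeeds, while `is_ipv4 http://10.96.0.10` and `is_ipv4 my-backend.default.svc` both fail.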
