+- Gateway API Inference Extension CRDs are required only when using the Envoy AI Gateway integration in `deploy/kubernetes/ai-gateway/`. Follow the installation steps in `website/docs/installation/kubernetes.md` if you plan to use the gateway path.
+- The core kustomize deployment in this folder does not install Envoy Gateway or AI Gateway; those are optional components documented separately.
+
 ## Make Commands Reference

 The project provides comprehensive make targets for managing kind clusters and deployments:
@@ -293,6 +298,11 @@ kubectl top pods -n semantic-router
 # Adjust resource limits in deployment.yaml if needed
 ```

+### Storage sizing
+
+- The default PVC is 30Gi. If the enabled models are small, you can reduce it; otherwise reserve at least 2–3x the total model size.
+- If your cluster's default StorageClass isn't named `standard`, change `storageClassName` in `pvc.yaml` accordingly or remove the field to use the default class.
+
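Review note: the 2–3x sizing rule in the added bullets can be sketched as a small shell helper. The function name `recommend_pvc_gib` and the 8 GiB input are hypothetical, chosen only for illustration; the helper takes a whole number of GiB and does nothing beyond the multiplication the text describes.

```shell
# Hypothetical helper (not part of the repository): suggest a PVC size
# range from the total size of the enabled models, using the 2-3x rule
# of thumb from the notes above. Input is a whole number of GiB.
recommend_pvc_gib() {
  total_gib="$1"
  # Reject empty or non-numeric input instead of guessing.
  case "$total_gib" in
    ''|*[!0-9]*)
      echo "usage: recommend_pvc_gib <total model size in GiB>" >&2
      return 1
      ;;
  esac
  echo "$((total_gib * 2))Gi to $((total_gib * 3))Gi"
}

# Example with an assumed 8 GiB of models.
recommend_pvc_gib 8   # prints "16Gi to 24Gi"
```

Under that assumption the 30Gi default leaves some headroom. To see which StorageClass names exist before editing `storageClassName`, `kubectl get storageclass` lists them and marks the default class with `(default)`.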
 ### Resource Optimization

 For different environments, you can adjust resource requirements:
 Configure the semantic router by editing `deploy/kubernetes/config.yaml`. This file contains the vLLM configuration, including model config, endpoints, and policies.

+Important notes before you apply manifests:
+
+- `vllm_endpoints.address` must be an IP address (not hostname) reachable from inside the cluster. If your LLM backends run as K8s Services, use the ClusterIP (for example `10.96.0.10`) and set `port` accordingly. Do not include protocol or path.
+- The PVC in `deploy/kubernetes/pvc.yaml` uses `storageClassName: standard`. On some clouds or local clusters, the default StorageClass name may differ (e.g., `standard-rwo`, `gp2`, or a provisioner like local-path). Adjust as needed.
+- Default PVC size is 30Gi. Size it to at least 2–3x of your total model footprint to leave room for indexes and updates.
+- The initContainer downloads several models from Hugging Face on first run and writes them into the PVC. Ensure outbound egress to Hugging Face is allowed and there is at least ~6–8 GiB free space for the models specified.
+
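Review note: the constraint on `vllm_endpoints.address` (a bare IP, no hostname, protocol, or path) can be checked before applying manifests. The sketch below is a hypothetical pre-flight helper, not something the project ships. It assumes IPv4 only, which is narrower than the "IP address" wording in the note, and it does not test reachability from inside the cluster.

```shell
# Hypothetical pre-flight check (not part of the repository): succeed only
# if the argument is a bare dotted-quad IPv4 address.
is_ipv4_address() {
  addr="$1"
  # Only digits and dots are allowed, which already rules out hostnames,
  # "http://" prefixes, ":port" suffixes, and "/path" suffixes.
  case "$addr" in
    ''|*[!0-9.]*|*.) return 1 ;;
  esac
  # Split on dots into the function's positional parameters.
  old_ifs="$IFS"
  IFS='.'
  set -- $addr
  IFS="$old_ifs"
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    # Each octet must be 1 to 3 digits and at most 255.
    case "$octet" in
      ''|????*) return 1 ;;
    esac
    [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Example inputs; the hostname and URL forms are rejected.
for candidate in 10.96.0.10 my-llm.default.svc http://10.96.0.10 10.96.0.10/v1; do
  if is_ipv4_address "$candidate"; then
    echo "ok:     $candidate"
  else
    echo "reject: $candidate"
  fi
done
```

For a backend exposed as a Service, `kubectl get svc <name> -n <namespace> -o jsonpath='{.spec.clusterIP}'` prints the ClusterIP to put in `address`; the Service name and namespace are placeholders to fill in.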
 Deploy the semantic router service with all required components:

 ```bash
@@ -135,26 +142,28 @@ Expected output should show the inference pool in `Accepted` state:
 ```yaml
 status:
   parent:
-  - conditions:
-    - lastTransitionTime: "2025-09-27T09:27:32Z"
-      message: 'InferencePool has been Accepted by controller ai-gateway-controller:
-        InferencePool reconciled successfully'
-      observedGeneration: 1
-      reason: Accepted
-      status: "True"
-      type: Accepted
-    - lastTransitionTime: "2025-09-27T09:27:32Z"
-      message: 'Reference resolution by controller ai-gateway-controller: All references
-        resolved successfully'
-      observedGeneration: 1
-      reason: ResolvedRefs
-      status: "True"
-      type: ResolvedRefs
-    parentRef:
-      group: gateway.networking.k8s.io
-      kind: Gateway
-      name: vllm-semantic-router
-      namespace: vllm-semantic-router-system
+    - conditions:
+        - lastTransitionTime: "2025-09-27T09:27:32Z"
+          message:
+            "InferencePool has been Accepted by controller ai-gateway-controller:
+            InferencePool reconciled successfully"
+          observedGeneration: 1
+          reason: Accepted
+          status: "True"
+          type: Accepted
+        - lastTransitionTime: "2025-09-27T09:27:32Z"
+          message:
+            "Reference resolution by controller ai-gateway-controller: All references