Commit fcdddaa

[SLO Routing] Add Latency Predictor sidecars and EPP tools (#1791)
* Add latency predictor sidecars and epp tools
* Fix nits and whitespace
* Break out latencypredictor_async into multiple files, clean up lints
* Replace fmt.Errorf with errors.New for lint
* Fix boilerplate, move WaitGroup add back out of backgroundLoop as it causes a race condition error during unit tests
* Fix boilerplate
* Move latencypredictor_async into sidecars in root, remove dev project dependencies and unnecessary read lock, fix gitignore to ignore pycache again
1 parent cbb8928 commit fcdddaa

19 files changed: 9841 additions, 0 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@
 bin/*
 Dockerfile.cross
 artifacts
+latencypredictor/__pycache__

 # Test binary, built with `go test -c`
 *.test

latencypredictor/Dockerfile-prediction

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
# Use an official Python runtime as a parent image
FROM python:3.11-slim

# Set the working directory in the container
WORKDIR /app

# Copy the requirements file and install dependencies
# (It's good practice to manage dependencies in a requirements.txt file)


RUN apt-get update && apt-get install -y \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code
COPY . .

# Expose the port the app runs on
EXPOSE 8001

# Command to run the application using uvicorn
# We use 0.0.0.0 to bind to all network interfaces inside the container
CMD ["uvicorn", "prediction_server:app", "--host", "0.0.0.0", "--port", "8001"]

latencypredictor/Dockerfile-test

Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
# Dockerfile-test
FROM python:3.9-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    curl \
    wget \
    jq \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Install additional testing dependencies
RUN pip install --no-cache-dir \
    pytest \
    pytest-asyncio \
    requests \
    httpx \
    aiohttp

# Copy test files
COPY test_dual_server_client.py .


# Create test results directory
RUN mkdir -p /test-results

# Set environment variables
ENV PYTHONPATH=/app
ENV PYTHONUNBUFFERED=1

# Default command runs the specific test
CMD ["pytest", "-v", "-s", "test_dual_server_client.py"]

latencypredictor/Dockerfile-training

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
# Use an official Python runtime as a parent image
FROM python:3.11-slim

# Set the working directory in the container
WORKDIR /app

# Copy the requirements file and install dependencies
# (It's good practice to manage dependencies in a requirements.txt file)


RUN apt-get update && apt-get install -y \
    libgomp1 \
    && rm -rf /var/lib/apt/lists/*


COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code
COPY . .

# Expose the port the app runs on
EXPOSE 8000

# Command to run the application using uvicorn
# We use 0.0.0.0 to bind to all network interfaces inside the container
CMD ["uvicorn", "training_server:app", "--host", "0.0.0.0", "--port", "8000"]

latencypredictor/README.md

Lines changed: 220 additions & 0 deletions
@@ -0,0 +1,220 @@
# Latency Predictor - Build Guide

This directory contains the Latency Predictor component with dual server architecture (training and prediction servers). Use the provided `build-deploy.sh` script to build and deploy container images to Google Cloud Platform.

## Prerequisites

- Docker (latest version)
- Google Cloud SDK (`gcloud`) configured and authenticated
- Required files in directory:
  - `training_server.py`
  - `prediction_server.py`
  - `requirements.txt`
  - `Dockerfile-training`
  - `Dockerfile-prediction`
  - `dual-server-deployment.yaml`

**Optional (for deployment and testing):**
- kubectl configured for GKE cluster access

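The required-files list above can be verified before starting a build. The following is a minimal Python sketch of such a check. It is not the logic of `./build-deploy.sh check`, whose implementation is not shown in this guide; the `REQUIRED_FILES` constant and the `missing_files` helper are hypothetical names introduced only for illustration.

```python
from pathlib import Path

# File names copied from the prerequisites list in this guide.
REQUIRED_FILES = (
    "training_server.py",
    "prediction_server.py",
    "requirements.txt",
    "Dockerfile-training",
    "Dockerfile-prediction",
    "dual-server-deployment.yaml",
)


def missing_files(directory=".", required=REQUIRED_FILES):
    """Return the required names that are not regular files in `directory`.

    Hypothetical helper: the order of the result follows `required`.
    """
    root = Path(directory)
    return [name for name in required if not (root / name).is_file()]
```

Calling `missing_files()` from this directory returns an empty list when everything is in place, so a wrapper can stop early and print the missing names otherwise.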
## Configuration

Before running the script, update the configuration variables in `build-deploy.sh`:

```bash
# Edit these values in the script
PROJECT_ID="your-gcp-project-id"
REGION="your-gcp-region"
REPOSITORY="your-artifact-registry-repo"
TRAINING_IMAGE="latencypredictor-training-server"
PREDICTION_IMAGE="latencypredictor-prediction-server"
TAG="latest"
```

## Usage

### Build Images Only

```bash
# Make script executable
chmod +x build-deploy.sh

# Build and push images to registry
./build-deploy.sh build
./build-deploy.sh push
```

### Complete Build and Deploy (Optional)

```bash
# Run complete process (build, push, deploy, test)
# Note: This requires GKE cluster access
./build-deploy.sh all
```

### Individual Commands

```bash
# Check if all required files exist
./build-deploy.sh check

# Build Docker images only
./build-deploy.sh build

# Push images to Google Artifact Registry
./build-deploy.sh push

# Optional: Deploy to GKE cluster (requires cluster access)
./build-deploy.sh deploy

# Optional: Get service information and IPs
./build-deploy.sh info

# Optional: Test the deployed services
./build-deploy.sh test
```

## What the Script Does

### Build Phase (`./build-deploy.sh build`)
- Builds training server image from `Dockerfile-training`
- Builds prediction server image from `Dockerfile-prediction`
- Tags images for Google Artifact Registry
- Images created:
  - `latencypredictor-training-server:latest`
  - `latencypredictor-prediction-server:latest`

### Push Phase (`./build-deploy.sh push`)
- Configures Docker for Artifact Registry authentication
- Pushes both images to:
  - `us-docker.pkg.dev/PROJECT_ID/REPOSITORY/latencypredictor-training-server:latest`
  - `us-docker.pkg.dev/PROJECT_ID/REPOSITORY/latencypredictor-prediction-server:latest`

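The push targets above follow the pattern `HOST/PROJECT_ID/REPOSITORY/IMAGE:TAG`. As a rough illustration of how the configuration values combine into those paths, here is a small Python sketch. The `image_uri` function is a hypothetical helper, not part of `build-deploy.sh`, and the `us-docker.pkg.dev` default is simply copied from the paths listed in this guide; how the script itself uses `REGION` is not shown here.

```python
def image_uri(project_id, repository, image, tag="latest", host="us-docker.pkg.dev"):
    """Compose an image reference of the form HOST/PROJECT_ID/REPOSITORY/IMAGE:TAG.

    Hypothetical helper for illustration; it only checks that no part is empty.
    """
    parts = (
        ("host", host),
        ("project_id", project_id),
        ("repository", repository),
        ("image", image),
        ("tag", tag),
    )
    for name, value in parts:
        if not value:
            raise ValueError(f"{name} must not be empty")
    return f"{host}/{project_id}/{repository}/{image}:{tag}"


# Image names as configured in build-deploy.sh according to this guide.
TRAINING_IMAGE = "latencypredictor-training-server"
PREDICTION_IMAGE = "latencypredictor-prediction-server"

for image in (TRAINING_IMAGE, PREDICTION_IMAGE):
    # The project and repository values here are the placeholders from the guide.
    print(image_uri("your-gcp-project-id", "your-artifact-registry-repo", image))
```

Keeping the composition in one place makes it easier to spot a mismatch between the tag applied at build time and the path used at push time.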
### Deploy Phase (`./build-deploy.sh deploy`) - Optional
- Applies Kubernetes manifests from `dual-server-deployment.yaml`
- Waits for deployments to be ready (5-minute timeout)
- Creates services:
  - `training-service-external` (LoadBalancer)
  - `prediction-service` (LoadBalancer)

### Test Phase (`./build-deploy.sh test`) - Optional
- Tests health endpoint: `/healthz`
- Tests prediction endpoint: `/predict` with sample data
- Sample prediction request:
  ```json
  {
    "kv_cache_percentage": 0.3,
    "input_token_length": 100,
    "num_request_waiting": 2,
    "num_request_running": 1,
    "num_tokens_generated": 50
  }
  ```

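To try the sample request outside the script, the payload above can be sent as a JSON `POST` to `/predict`. The sketch below uses only the Python standard library. The function names `build_predict_request` and `predict` are hypothetical, and the response schema of `/predict` is not described in this guide, so the sketch returns the decoded JSON without interpreting it.

```python
import json
import urllib.request

# Sample payload copied from the test phase description in this guide.
SAMPLE_FEATURES = {
    "kv_cache_percentage": 0.3,
    "input_token_length": 100,
    "num_request_waiting": 2,
    "num_request_running": 1,
    "num_tokens_generated": 50,
}


def build_predict_request(base_url, features):
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, features, timeout=10.0):
    """Send the request and return the decoded JSON response body.

    The shape of the response is an unknown here; callers should inspect it.
    """
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `predict("http://PREDICTION_IP", SAMPLE_FEATURES)` would target the prediction service once `PREDICTION_IP` is replaced with the address reported by `./build-deploy.sh info`.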
## Setup Instructions

1. **Configure GCP Authentication**:
   ```bash
   gcloud auth login
   gcloud config set project YOUR_PROJECT_ID
   ```

2. **Configure kubectl for GKE (Optional - only needed for deployment)**:
   ```bash
   gcloud container clusters get-credentials CLUSTER_NAME --zone ZONE
   ```

3. **Update Script Configuration**:
   ```bash
   # Edit build-deploy.sh with your project details
   nano build-deploy.sh
   ```

4. **Build Images**:
   ```bash
   ./build-deploy.sh build
   ./build-deploy.sh push
   ```

5. **Optional: Deploy and Test**:
   ```bash
   ./build-deploy.sh deploy
   ./build-deploy.sh test
   # Or run everything at once
   ./build-deploy.sh all
   ```

## Troubleshooting

### Permission Issues
```bash
chmod +x build-deploy.sh
```

### GCP Authentication
```bash
gcloud auth configure-docker us-docker.pkg.dev
```

### Check Cluster Access
```bash
kubectl cluster-info
kubectl get nodes
```

### View Service Status
```bash
./build-deploy.sh info
kubectl get services
kubectl get pods
```

### Check Logs
```bash
# Training server logs
kubectl logs -l app=training-server

# Prediction server logs
kubectl logs -l app=prediction-server
```

## Development Workflow

1. **Make code changes** to `training_server.py` or `prediction_server.py`
2. **Test locally** (optional):
   ```bash
   python training_server.py
   python prediction_server.py
   ```
3. **Build and push images**:
   ```bash
   ./build-deploy.sh build
   ./build-deploy.sh push
   ```

4. **Optional: Deploy and test**:
   ```bash
   ./build-deploy.sh deploy
   ./build-deploy.sh test
   ```

## Service Endpoints

After successful deployment:

- **Training Service**: External LoadBalancer IP (check with `./build-deploy.sh info`)
- **Prediction Service**: External LoadBalancer IP (check with `./build-deploy.sh info`)
- **Health Check**: `http://PREDICTION_IP/healthz`
- **Prediction API**: `http://PREDICTION_IP/predict` (POST)

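A LoadBalancer IP can take a while to start answering, so a client may want to poll the health check before calling the prediction API. The following Python sketch shows one way to do that with the standard library. The helpers `is_healthy` and `wait_until_healthy` are hypothetical, and the 300-second default is an assumption that merely mirrors the 5-minute deployment timeout mentioned earlier; it is not taken from the script.

```python
import http.client
import time
import urllib.request


def is_healthy(base_url, timeout=5.0):
    """Return True if GET <base_url>/healthz answers with HTTP 200."""
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(
    base_url,
    deadline_seconds=300.0,  # assumed default, see the note above
    interval_seconds=5.0,
    probe=is_healthy,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Poll `probe(base_url)` until it succeeds or the deadline has passed."""
    deadline = clock() + deadline_seconds
    while True:
        if probe(base_url):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)
```

The `probe`, `sleep`, and `clock` parameters exist so the loop can be exercised in tests without a network or real delays; in normal use only `base_url` (for example `http://PREDICTION_IP`) needs to be supplied.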
## Manual Build (Alternative)

If you need to build manually:

```bash
# Build training server
docker build -f Dockerfile-training -t training-server .

# Build prediction server
docker build -f Dockerfile-prediction -t prediction-server .
```
