Commit 77585b4: Update README.md
1 parent: c563c8c

File tree: 1 file changed (35 additions, 0 deletions)


optillm/plugins/proxy/README.md

```diff
@@ -33,13 +33,15 @@ providers:
     base_url: https://api.openai.com/v1
     api_key: ${OPENAI_API_KEY}
     weight: 2
+    max_concurrent: 5  # Optional: limit this provider to 5 concurrent requests
     model_map:
       gpt-4: gpt-4-turbo-preview  # Optional: map model names
 
   - name: backup
     base_url: https://api.openai.com/v1
     api_key: ${OPENAI_API_KEY_BACKUP}
     weight: 1
+    max_concurrent: 2  # Optional: limit this provider to 2 concurrent requests
 
 routing:
   strategy: weighted  # Options: weighted, round_robin, failover
```
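For illustration, `strategy: weighted` means providers are chosen in proportion to their `weight` values (2:1 for `primary` vs `backup` in the hunk above). A minimal sketch of weighted selection, using hypothetical names rather than optillm's actual code:

```python
import random

def pick_provider(providers):
    # Weighted random selection: a provider with weight 2 is chosen
    # about twice as often as one with weight 1.
    weights = [p["weight"] for p in providers]
    return random.choices(providers, weights=weights, k=1)[0]

providers = [
    {"name": "primary", "weight": 2},  # weight: 2 in the config above
    {"name": "backup", "weight": 1},   # weight: 1
]

random.seed(0)  # reproducible demo
counts = {"primary": 0, "backup": 0}
for _ in range(3000):
    counts[pick_provider(providers)["name"]] += 1
# "primary" ends up selected roughly twice as often as "backup"
```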
````diff
@@ -189,6 +191,39 @@ queue:
 - **Automatic Failover**: When a provider times out, it's marked unhealthy and the request automatically fails over to the next available provider.
 - **Protection**: Prevents slow backends from causing queue buildup that can crash the proxy server.
 
+### Per-Provider Concurrency Limits
+
+Control the maximum number of concurrent requests each provider can handle:
+
+```yaml
+providers:
+  - name: slow_server
+    base_url: http://192.168.1.100:8080/v1
+    api_key: dummy
+    max_concurrent: 1  # This server can only handle 1 request at a time
+
+  - name: fast_server
+    base_url: https://api.fast.com/v1
+    api_key: ${API_KEY}
+    max_concurrent: 10  # This server can handle 10 concurrent requests
+
+  - name: unlimited_server
+    base_url: https://api.unlimited.com/v1
+    api_key: ${API_KEY}
+    # No max_concurrent means no limit for this provider
+```
+
+**Use Cases:**
+- **Hardware-limited servers**: Set `max_concurrent: 1` for servers that can't handle parallel requests
+- **Rate limiting**: Prevent overwhelming providers with too many concurrent requests
+- **Resource management**: Balance load across providers with different capacities
+- **Cost control**: Limit expensive providers while allowing more requests to cheaper ones
+
+**Behavior:**
+- If a provider is at max capacity, the proxy tries the next available provider
+- Requests wait briefly (0.5s) for a slot before moving to the next provider
+- Works with all routing strategies (weighted, round_robin, failover)
````
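The documented behavior (wait briefly for a slot, then fall through to the next provider) can be sketched with one semaphore per provider. This is an illustrative sketch with hypothetical names (`ProviderSlot`, `send_with_limits`), not optillm's actual internals:

```python
import threading

class ProviderSlot:
    """A provider with an optional per-provider concurrency limit."""
    def __init__(self, name, max_concurrent=None):
        self.name = name
        # No max_concurrent in the config means no semaphore, i.e. no limit.
        self.sem = threading.Semaphore(max_concurrent) if max_concurrent else None

    def try_acquire(self, timeout=0.5):
        if self.sem is None:
            return True  # unlimited provider: always has capacity
        # Wait briefly for a slot; give up so the caller can try the next provider.
        return self.sem.acquire(timeout=timeout)

    def release(self):
        if self.sem is not None:
            self.sem.release()

def send_with_limits(providers, request_fn):
    """Try providers in order; skip any that stays at max capacity."""
    for p in providers:
        if p.try_acquire():
            try:
                return request_fn(p)
            finally:
                p.release()
    raise RuntimeError("all providers at capacity")
```

For example, with a `slow_server` at `max_concurrent: 1` whose only slot is occupied, `send_with_limits` waits up to 0.5 s and then routes the request to `fast_server` instead.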
### Environment Variables

The configuration supports flexible environment variable interpolation:
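The diff context cuts off here, so the README's own interpolation examples are not shown. As an illustrative aside, `${VAR}` substitution of the kind used for `${OPENAI_API_KEY}` above can be done like this (a sketch, not necessarily how optillm implements it):

```python
import os
import re

def interpolate_env(value):
    # Replace ${VAR} references in a config string with os.environ values,
    # leaving unknown variables untouched.
    def sub(match):
        return os.environ.get(match.group(1), match.group(0))
    return re.sub(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}", sub, value)

os.environ["OPENAI_API_KEY"] = "sk-demo"
print(interpolate_env("api_key: ${OPENAI_API_KEY}"))  # api_key: sk-demo
```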

0 commit comments
