@@ -33,13 +33,15 @@ providers:
     base_url: https://api.openai.com/v1
     api_key: ${OPENAI_API_KEY}
     weight: 2
+    max_concurrent: 5  # Optional: limit this provider to 5 concurrent requests
     model_map:
       gpt-4: gpt-4-turbo-preview  # Optional: map model names
 
   - name: backup
     base_url: https://api.openai.com/v1
     api_key: ${OPENAI_API_KEY_BACKUP}
     weight: 1
+    max_concurrent: 2  # Optional: limit this provider to 2 concurrent requests
 
 routing:
   strategy: weighted  # Options: weighted, round_robin, failover
@@ -189,6 +191,39 @@ queue:
 - **Automatic Failover**: When a provider times out, it's marked unhealthy and the request automatically fails over to the next available provider.
 - **Protection**: Prevents slow backends from causing queue buildup that can crash the proxy server.
 
+### Per-Provider Concurrency Limits
+
+Cap the number of requests the proxy sends to each provider at the same time:
+
+```yaml
+providers:
+  - name: slow_server
+    base_url: http://192.168.1.100:8080/v1
+    api_key: dummy
+    max_concurrent: 1  # This server can only handle 1 request at a time
+
+  - name: fast_server
+    base_url: https://api.fast.com/v1
+    api_key: ${API_KEY}
+    max_concurrent: 10  # This server can handle 10 concurrent requests
+
+  - name: unlimited_server
+    base_url: https://api.unlimited.com/v1
+    api_key: ${API_KEY}
+    # No max_concurrent means no limit for this provider
+```
+
+**Use Cases:**
+- **Hardware-limited servers**: Set `max_concurrent: 1` for servers that can't handle parallel requests
+- **Load protection**: Avoid overwhelming a provider with too many simultaneous requests
+- **Resource management**: Balance load across providers with different capacities
+- **Cost control**: Limit expensive providers while allowing more requests to cheaper ones
+
+**Behavior:**
+- If a provider is at capacity, the request waits briefly (up to 0.5s) for a free slot
+- If no slot frees up in time, the proxy moves on to the next available provider
+- Works with all routing strategies (weighted, round_robin, failover)
+
 ### Environment Variables
 
 The configuration supports flexible environment variable interpolation:
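The "Behavior" bullets in this diff describe a bounded wait for a slot followed by fallback to the next provider. Below is a minimal sketch of that logic. It assumes an asyncio-based proxy where each provider with `max_concurrent` is backed by an `asyncio.Semaphore`. `Provider`, `dispatch`, `send` and `SLOT_WAIT_SECONDS` are hypothetical names for illustration, not the proxy's actual code.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

SLOT_WAIT_SECONDS = 0.5  # the brief wait described under "Behavior"


class Provider:
    """Hypothetical provider record; only the fields needed for this sketch."""

    def __init__(self, name: str, max_concurrent: Optional[int] = None) -> None:
        self.name = name
        # Omitted max_concurrent -> no semaphore -> no concurrency limit.
        self._slots = (
            asyncio.Semaphore(max_concurrent) if max_concurrent is not None else None
        )

    async def try_acquire(self, wait: float) -> bool:
        """Wait up to `wait` seconds for a free slot; return True if one was taken."""
        if self._slots is None:
            return True
        try:
            await asyncio.wait_for(self._slots.acquire(), timeout=wait)
            return True
        except asyncio.TimeoutError:
            return False

    def release(self) -> None:
        if self._slots is not None:
            self._slots.release()


async def dispatch(
    ordered: List[Provider],
    send: Callable[[Provider], Awaitable[str]],
    wait: float = SLOT_WAIT_SECONDS,
) -> str:
    """Forward a request to the first provider in `ordered` with a free slot.

    `ordered` stands in for whatever candidate order the routing strategy
    (weighted, round_robin or failover) produced; `send` is a hypothetical
    coroutine that performs the upstream call.
    """
    for provider in ordered:
        if not await provider.try_acquire(wait):
            continue  # still at capacity after the wait: try the next provider
        try:
            return await send(provider)
        finally:
            provider.release()  # free the slot whether send succeeded or failed
    raise RuntimeError("all candidate providers are at capacity")
```

In this sketch, `dispatch` only consumes a candidate list that has already been ordered. The capacity check is therefore independent of how that order was chosen, which is one plausible way the "works with all routing strategies" property could hold.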