Autocomplete now makes use of the default-max-batch-size (#120)
* autocomplete now makes use of the default-max-batch-size
* the check that determines the dynamic batch scheduler now runs after the new batch size is set
* added a default-max-batch-size section
* when default_max_batch_size=0, max_batch_size is set to 0
* `memory.enable_memory_arena_shrinkage`: See [this](https://github.com/microsoft/onnxruntime/blob/master/include/onnxruntime/core/session/onnxruntime_run_options_config_keys.h) for more information.
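For reference, a minimal sketch of how such a run-option parameter could be set in a model's `config.pbtxt`, assuming it is passed through the standard Triton `parameters` field; the `cpu:0` arena list is only an illustrative value following the format described in the linked header:

```
parameters {
  key: "memory.enable_memory_arena_shrinkage"
  # Illustrative value: shrink the CPU arena for device 0 after each run.
  # The accepted value format (e.g. "cpu:0;gpu:0") is documented in the
  # linked onnxruntime_run_options_config_keys.h header.
  value: { string_value: "cpu:0" }
}
```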
### Command line options
#### Thread Pools
When the intra and inter op thread counts are set to 0 or to a value higher than 1, ORT by default creates a threadpool per session. This may not be ideal in every scenario, so ORT also supports global threadpools. When global threadpools are enabled, ORT creates a single global threadpool that is shared by every session. Use the backend config to enable the global threadpool. When the global threadpool is enabled, the intra and inter op thread counts must also be provided via the backend config; values provided in the model config will be ignored.
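For example, a hypothetical server invocation that enables the global threadpool and supplies the thread counts through the backend config; the key names used here (`enable-global-threadpool`, `intra_op_thread_count`, `inter_op_thread_count`) are assumptions and should be verified against the backend documentation:

```
# Hypothetical key names; verify against the backend documentation.
tritonserver --model-repository=/models \
    --backend-config=onnxruntime,enable-global-threadpool=1 \
    --backend-config=onnxruntime,intra_op_thread_count=4 \
    --backend-config=onnxruntime,inter_op_thread_count=2
```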
#### Default Max Batch Size

The default-max-batch-size value is used for max_batch_size during [Autocomplete](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#auto-generated-model-configuration) when no other value is found. If the `--strict-model-config=false` command-line option is used, the onnxruntime backend will set the max_batch_size of the model to this default value under the following conditions:

1. Autocomplete has determined the model is capable of batching requests.

2. max_batch_size is 0 in the model configuration or max_batch_size is omitted from the model configuration.

If max_batch_size > 1 and no [scheduler](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#scheduling-and-batching) is provided, the dynamic batch scheduler will be used.
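As an illustration, a sketch of a server launch that exercises this behavior, assuming default-max-batch-size is supplied through Triton's `--backend-config` option (the exact flag form shown here is an assumption; consult the server documentation):

```
# Enable autocomplete and assume a default max batch size of 8.
tritonserver --model-repository=/models \
    --strict-model-config=false \
    --backend-config=default-max-batch-size=8
```

For a batching-capable ONNX model whose `config.pbtxt` omits `max_batch_size`, the auto-generated configuration would then be equivalent to setting `max_batch_size: 8`, and, since no scheduler is specified, the dynamic batch scheduler would be selected.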