Commit d43022f

[doc]fix readme for kv pool user guide (#4271)
### What this PR does / why we need it?

Add the parameter "register_buffer" for PD Aggregated Scenario in the given example.

- vLLM version: v0.11.0
- vLLM main: vllm-project/vllm@2918c1b

Signed-off-by: Pz1116 <zpbzpb123123@gmail.com>
1 parent 2938bd5 commit d43022f

1 file changed: +3 -0 lines changed

docs/source/user_guide/feature_guide/kv_pool_mooncake.md

Lines changed: 3 additions & 0 deletions
@@ -266,12 +266,15 @@ python3 -m vllm.entrypoints.openai.api_server \
 "kv_connector": "MooncakeConnectorStoreV1",
 "kv_role": "kv_both",
 "kv_connector_extra_config": {
+"register_buffer": true,
 "use_layerwise": false,
 "mooncake_rpc_port":"0"
 }
 }' > mix.log 2>&1
 ```
 
+`register_buffer` is set to `false` by default and needs to be set to `true` only in the PD-mixed scenario.
+
 ### 2. Run Inference
 
 Configure the localhost, port, and model weight path in the command to your own settings. The requests sent will only go to the port where the mixed deployment script is located, and there is no need to start a separate proxy.
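
The `register_buffer` change in this hunk can be sketched as a small helper that assembles the connector settings and serializes them to JSON. This is a minimal sketch and not part of the commit: the helper name `build_mixed_kv_transfer_config` is hypothetical, and handing the resulting string to vLLM's `--kv-transfer-config` option is an assumption, since the flag itself lies outside the lines shown in the diff.

```python
import json
import shlex


def build_mixed_kv_transfer_config(register_buffer: bool = True) -> dict:
    """Assemble the connector settings shown in the diff (hypothetical helper).

    The guide states that `register_buffer` defaults to false in the connector
    and should be true only in the PD-mixed scenario, so this helper, which
    targets that scenario, defaults the argument to True.
    """
    return {
        "kv_connector": "MooncakeConnectorStoreV1",
        "kv_role": "kv_both",
        "kv_connector_extra_config": {
            "register_buffer": register_buffer,
            "use_layerwise": False,
            # Kept as the string "0", exactly as written in the guide.
            "mooncake_rpc_port": "0",
        },
    }


if __name__ == "__main__":
    config_json = json.dumps(build_mixed_kv_transfer_config())
    # Assumption: the JSON is passed via --kv-transfer-config; quote it for the shell.
    print("--kv-transfer-config " + shlex.quote(config_json))
```

Serializing with `json.dumps` turns Python's `True`/`False` into the lowercase `true`/`false` that the JSON in the guide uses, which avoids hand-editing the quoted string in the launch command.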
