Commit 3c3a7a9

Update finished KV transfer state after every step (#532)
In the P/D disaggregation scenario, decode instances spend most of their time running decode forward passes, so the finished KV transfer state needs to be updated after decode forward passes as well, not only after prefill forward passes. Otherwise, even when the KV transfer has already finished on the prefill instance, the decode instance does not see the finished state in time to switch the request from `WAITING_FOR_REMOTE_KVS` to `WAITING`, which increases TTFT.

Signed-off-by: Wuxun Zhang <wuxun.zhang@intel.com>
1 parent 257dada commit 3c3a7a9
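
The effect described in the commit message can be illustrated with a small toy simulation. This is only a sketch under stated assumptions: `ToyWorker`, `steps_until_schedulable`, and the step numbering are hypothetical and are not part of vllm-gaudi; the only names taken from the commit are the two request states. It compares a worker that reports finished KV transfers only on prefill steps with one that reports them after every step.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum, auto


class RequestStatus(Enum):
    WAITING_FOR_REMOTE_KVS = auto()
    WAITING = auto()


@dataclass
class ToyWorker:
    """Hypothetical stand-in for a decode-side model runner (not the real class)."""

    poll_after_decode: bool
    # req_id -> step index at which the remote KV transfer completes
    transfer_done_at: dict[str, int]
    reported: set[str] = field(default_factory=set)

    def get_finished_recving(self, step: int) -> set[str]:
        # Report each completed transfer exactly once.
        done = {
            req_id
            for req_id, done_step in self.transfer_done_at.items()
            if done_step <= step and req_id not in self.reported
        }
        self.reported |= done
        return done

    def execute_step(self, step: int, is_prefill_step: bool) -> set[str]:
        # Assumed old behavior: poll only on prefill steps.
        # Assumed new behavior: poll after every step.
        if is_prefill_step or self.poll_after_decode:
            return self.get_finished_recving(step)
        return set()


def steps_until_schedulable(
    poll_after_decode: bool,
    prefill_steps: set[int],
    transfer_done_at: int,
    max_steps: int = 100,
) -> int | None:
    """Return the step at which "req-0" leaves WAITING_FOR_REMOTE_KVS."""
    worker = ToyWorker(poll_after_decode, {"req-0": transfer_done_at})
    status = RequestStatus.WAITING_FOR_REMOTE_KVS
    for step in range(max_steps):
        finished = worker.execute_step(step, step in prefill_steps)
        if "req-0" in finished:
            status = RequestStatus.WAITING  # scheduler can now schedule it
            assert status is RequestStatus.WAITING
            return step
    return None


if __name__ == "__main__":
    # Transfer completes at step 2; prefill steps happen only at 0 and 10.
    print(steps_until_schedulable(False, {0, 10}, 2))
    print(steps_until_schedulable(True, {0, 10}, 2))
```

In this toy setup the request stays in `WAITING_FOR_REMOTE_KVS` until the next prefill step when polling is tied to prefill, and leaves it on the step where the transfer completes when polling happens after every step. The real delay depends on the actual scheduler and workload.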

1 file changed: +3 -1 lines changed

vllm_gaudi/v1/worker/hpu_model_runner.py

Lines changed: 3 additions & 1 deletion
@@ -3137,7 +3137,6 @@ def execute_model(
             self.profiler.record_counter(self.event_start, counters)
         if not warmup_mode:
             self.maybe_wait_for_kv_save()
-            finished_sending, finished_recving = (self.get_finished_kv_transfers(scheduler_output))
 
         if self.is_driver_worker and self.profiler.enabled:
             self.profiler_counter_helper.reset_prompt_seq_stats()
@@ -3377,6 +3376,9 @@ def execute_model(
         all_req_ids = pd_info.decode_req_ids + pd_info.prompt_req_ids
         logprobs = None
 
+        if not warmup_mode:
+            finished_sending, finished_recving = self.get_finished_kv_transfers(scheduler_output)
+
         if self.use_async_scheduling:
             model_runner_output = ModelRunnerOutput(
                 req_ids=req_ids_output_copy,  # CHECK
