[Core] Send kv events from worker side to scheduler side #28309

hickeyma · 2025-11-07T17:19:34Z

Purpose

This is required for when worker side operations from a connector generates KV cache events. This PR enables events to be passed to the scheduler side so that they can be published by the engine.

Related to #28252 and from feedback by @markmc in #28252 (comment)

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

gemini-code-assist

Code Review

This pull request introduces a mechanism to send KV cache events from the worker side to the scheduler side, which is a crucial feature for connectors that generate these events. The changes are logical, but I've identified a critical bug in the lmcache_connector.py that could lead to event loss, and an incorrect type hint in kv_connector_model_runner_mixin.py. Addressing these issues will improve the correctness and maintainability of the new functionality.

vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py

vllm/v1/worker/kv_connector_model_runner_mixin.py

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

@markmc

This commit follows recommendation from @markmc to use a specific event property instead of piggybacking on stats. PR vllm-project#28309 adds the events property to KVConnectorOutput and this commit picks up the new property and uses it to pass the events from worker side to scheduler side. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

njhill · 2025-11-07T19:56:51Z

@orozery

orozery · 2025-11-09T08:01:58Z

@hickeyma I think you're missing aggregation code in KVOutputAggregator.
This means adding more state to KVOutputAggregator to aggregate per-block KVEvents.

@njhill Let's think how this change fits to the longer-term future.
Right now, scheduler can send arbitrary metadata to the workers (KVConnectorMetadata).
On the other hand, workers are limited to an explicit KVConnectorOutput, which includes:

request-level completions (finished_sending, finished_recving)
abstract stats
block-level errors
(just noticed this) expected_finished_count - looks like a niche bootstrap configuration param

At the time when I introduced worker->scheduler output aggregation (#19555), my initial suggestion was to allow abstract metadata to flow from worker connectors to the scheduler connector.
I think that the current approach where we limit the worker output has 2 drawbacks:

It makes it harder to implement connector logic (e.g. the current PR is needed).
For those connectors who does succeed in pushing their logic to KVConnectorOutput, the result is a growing-over-time KVConnectorOutput which contains a scattered list of fields which are practically connector-specific.

My view is that we should rename KVConnectorStats to KVConnectorWorkerMeta (which could support get_stats() -> KVConnectorStats).
This will allow free us from maintaining the list of connector-specific worker output fields, and delegate it entirely to the connector implementations themselves.

cc @sdavidbd

markmc · 2025-11-10T11:07:08Z

@njhill Let's think how this change fits to the longer-term future. Right now, scheduler can send arbitrary metadata to the workers (KVConnectorMetadata). On the other hand, workers are limited to an explicit KVConnectorOutput,

I agree with this in general - e.g. in an earlier iteration of #26172 I found myself adding NIXL-specific semantics to the contents of finished_recving, which was only possible because I happened to be able to use the update_connector_output() to extract the NIXL-specific data and translate the contents to what the scheduler expected. An obscure example, but I did think it was a weird contrast with the complete flexibility of connector-specific KVConnectorMetadata subclasses

My view is that we should rename KVConnectorStats to KVConnectorWorkerMeta (which could support get_stats() -> KVConnectorStats).

I'm not sure I agree with this, though:

Although KVConnectorStats is a recent addition, we should by default assume that we need to maintain compatibility and break any potential external connectors that might have already adopted
KVConnectorStats seems like a useful abstraction that should probably be adopted by all connectors over time
KVConnectorStats is very specifically about metrics and integration with Prometheus/console stat logging - for example the aggregate() and reduce() methods - so not easily repurposed as a generic worker->scheduler channel
KVEventsBatch also seems like a useful, connector-independent abstraction, that ties in nicely with the existing public KV events publishing API, so if connectors have to use a connector-specific metadata channel for them, we'd probably see a lot of duplication

But I do think it could be positive to add KVConnectorWorkerMetdata anyway ...

NickLucche · 2025-11-10T11:39:42Z

Thanks for the great breakdown @orozery.
Still, I agree with @markmc as in I think the KVConnectorStats is a connector-agnostic abstraction with a specific purpose.
I also like that the design direction that was followed for worker->scheduler comm was to have a clearer interface instead of a loose container.

the result is a growing-over-time KVConnectorOutput

I understand your point here, but I believe that adding more fields to KVConnectorOutput was kinda part of the original design where it was expected that it would naturally grow to contain more stuff (I think this PR is a good example of that).
Naturally nobody would want the struct to grow indefinitely if we sense that this is the direction in which we're heading.

orozery · 2025-11-10T12:13:46Z

I agree with the usefulness of KVConnectorStats for multiple connectors.
And I think that any field which may be useful for multiple connectors has a good reason to stay explicit (including KVConnectorStats).

My main point is actually on the other hand, for fields that are useful only for a single connector (and I think this may be the case here?), we should have an abstract field that each connector can use freely (similar to KVConnectorMetadata).

markmc · 2025-11-10T12:30:02Z

for fields that are useful only for a single connector, we should have an abstract field that each connector can use freely (similar to KVConnectorMetadata).

Agree 👍

(and I think this may be the case here?),

Honestly, I don't yet fully understand the motivation for the lmcache connector to emit its own events (as per #28252 and LMCache/LMCache#1846) in addition to the KV events already emitted by vLLM ... so I don't have a strong sense either way whether this need is highly-specific to lmcache

hickeyma · 2025-11-12T18:59:15Z

Thanks @njhill @orozery @markmc @NickLucche for the feedback. I will get to it in due course as bit delayed this week because I'm at KubeCon.

Honestly, I don't yet fully understand the motivation for the lmcache connector to emit its own events (as per #28252 and LMCache/LMCache#1846) in addition to the KV events already emitted by vLLM ... so I don't have a strong sense either way whether this need is highly-specific to lmcache

@markmc The reason that a connector like LMCache would need to generate their own events is down to specific details or information. How LMcache stores the tokens/hashes (e.g. LMcache stores tokens with a granularity of 256 as a block and vLLM granularity is 16 as a block) and where it stores them (e.g. LMCache might store them to disk) will differ to vLLM. Therefore, a consumer of the events would be missing the full context of the cache if it only consumed the vLLM events.

hickeyma · 2025-11-12T19:23:06Z

I understand your point here, but I believe that adding more fields to KVConnectorOutput was kinda part of the original design where it was expected that it would naturally grow to contain more stuff (I think this PR is a good example of that).
Naturally nobody would want the struct to grow indefinitely if we sense that this is the direction in which we're heading.

I agree with @NickLucche about having a separate field. @orozery I understand you reservation about only 1 connector possibly using it but still think the separation is best to clearly identify it.

KuntaiDu · 2025-11-12T20:23:10Z

Honestly, I don't yet fully understand the motivation for the lmcache connector to emit its own events (as per #28252 and LMCache/LMCache#1846) in addition to the KV events already emitted by vLLM ... so I don't have a strong sense either way whether this need is highly-specific to lmcache

On LMCache-side, the motivation for LMCache to emit KVEvents is to help the KV-cache-aware routing logics make best decisions, based on not only KV caches in GPU (where current vLLM does), but also KV caches in CPU and local disk (where only LMCache has visibility).

KuntaiDu · 2025-11-12T20:38:20Z

@ApostaC Can you also take a look?

ApostaC · 2025-11-12T22:04:19Z

I like the idea of separating the KVEvents. My understanding is that KVEvents is becoming a standard format of fine-grained KV cache monitoring, and there will be other components using it in the future. Therefore, leaving a standard interface for it makes a lot of sense to me.

Regarding KVConnectorWorkerMeta, I would say it's more like the runtime statistics generated during the inference rather than "metadata" (which sounds more like static stuff). Thus, I would prefer to use output or stats in the name.

cc @orozery @hickeyma @markmc

vllm/distributed/kv_transfer/kv_connector/v1/base.py

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

ApostaC · 2025-11-17T21:54:56Z

Quick follow-up regarding the KV event aggregator:

@hickeyma There is a class that combines worker processes' the connector outputs to a single output and send it back to the scheduler class:

vllm/vllm/distributed/kv_transfer/kv_connector/utils.py

Lines 123 to 125 in d8874c6

    
           class KVOutputAggregator: 
        
               """Utility class to aggregate the output of all workers into a single 
        
               output corresponding to Rank 0 for scheduler."""

IIUC, since KV events should work for all the connectors, it's better to add such aggregation logic for KV events in KVOutputAggregator as well. It should be almost the same as how it deals with kv_connector_stats now.

cc @orozery @NickLucche @njhill (please correct me if my understanding is wrong).

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma · 2025-11-20T15:59:52Z

I stand corrected.

@sdavidbd Not a bother, totally understandable. I divided it out because I thought it was independent of whatever connector provides the events. I can integrate #28252 into this PR if people want.

Thanks @sdavidbd for the deep reviews and feedback.

Review comments: - vllm-project#28309 (comment) - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

mergify · 2025-11-20T20:09:51Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @hickeyma.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Clarify that it is the workser side that calls the method. Review comment: - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Review comments: - vllm-project#28309 (comment) - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma · 2025-11-24T18:11:51Z

I chatted with @orozery today and he raised 2 concerns as follows:

On LMCache side, he wonders when workers generate events for vLLM does this clearly show that the operation has completed for that KV cache? In other words, does an events need to be generated from all workers to be certain.

Conclusion: I spoke with @ApostaC about this and he agrees that events needs to be checked against the number of workers.

On vLLM side, he would like to look at extensibility and maintainability with connector data going from worker to server side. He would like to abstract this to avoid connectors having to modify it each time it has different data to transfer.

Conclusion: With that said, he is ok to allow the events field in KVConnectorOutput for now and work on the extensibility afterwards outside of the events PR.

@orozery Let me know if I captured your thoughts correctly.

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Clarify that it is the workser side that calls the method. Review comment: - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Review comments: - vllm-project#28309 (comment) - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Clarify that it is the workser side that calls the method. Review comment: - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Review comments: - vllm-project#28309 (comment) - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

This is required for when worker side operations like CPU offloading generate KV cache events. This commit enables theses events to be passed to the scheduler side so that they can be published by the engine. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

The changes to the connector is for a separate PR and this PR is independent of it for now. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

LMCache connector changes are not part of this PR as this PR deals with adding ability to pass KV events between worker and scheduler sides and not a specific connector usage of this. The connector implementation is in a separate PR. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

It is part of the aggregation of kv_connector_output from all workers. For KV cache events, this means combining events from all workers, remvoing any duplications. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Clarify that it is the workser side that calls the method. Review comment: - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Equality methods are unnecessary as its provided by the parent class 'msgspec.Struct' which they extend from. Review comment: https://github.com/vllm-project/vllm/pull/28309/files#r2539044714 Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Review comments: - vllm-project#28309 (comment) - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Workers will generate duplicate kv events in LMCache. This commit adds capability to aggregate on events from workers by returning only those that were emitted by all workers. It also provide an abstract class KVConnectorKVEevnts that is implemented by connectors to handle how they emit events. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma · 2025-11-26T16:07:31Z

Updated the PR to handle duplicate KV events emitted by LMCache workers (as per pt 1. in #28309 (comment)). I have added an abstract class KVConnectorKVEvents to handle KV events between worker and scheduler side. This is implemented per connector as per the policy they emit events. I also re-added the LMCache Connector changes to better show how it all integrates with the connector.

@njhill @orozery @markmc @NickLucche @ApostaC @KuntaiDu @sdavidbd I would appreciate if you could review again. Thanks.

hickeyma requested review from ApostaC and NickLucche as code owners November 7, 2025 17:19

mergify bot added v1 kv-connector labels Nov 7, 2025

gemini-code-assist bot reviewed Nov 7, 2025

View reviewed changes

vllm/distributed/kv_transfer/kv_connector/v1/lmcache_connector.py Outdated Show resolved Hide resolved

vllm/v1/worker/kv_connector_model_runner_mixin.py Outdated Show resolved Hide resolved

hickeyma force-pushed the send-events-worker-to-scheduler branch from fc375f8 to 499a425 Compare November 7, 2025 17:27

hickeyma added a commit to hickeyma/vllm that referenced this pull request Nov 7, 2025

Update after review from gemini-code

5ec45be

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma mentioned this pull request Nov 7, 2025

[WIP][KVConnector] Retrive KV events from LMCache #28252

Draft

5 tasks

ApostaC reviewed Nov 12, 2025

View reviewed changes

vllm/distributed/kv_transfer/kv_connector/v1/base.py Outdated Show resolved Hide resolved

hickeyma force-pushed the send-events-worker-to-scheduler branch from 1763194 to d6a23a6 Compare November 17, 2025 10:42

hickeyma added a commit to hickeyma/vllm that referenced this pull request Nov 17, 2025

Update after review from gemini-code

a906ef0

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma force-pushed the send-events-worker-to-scheduler branch from 7bcaa13 to 7ad9600 Compare November 18, 2025 10:15

hickeyma added a commit to hickeyma/vllm that referenced this pull request Nov 18, 2025

Update after review from gemini-code

e6d572b

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma requested a review from markmc as a code owner November 18, 2025 12:13

hickeyma added a commit to hickeyma/vllm that referenced this pull request Nov 18, 2025

Update after review from gemini-code

9f42a20

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma force-pushed the send-events-worker-to-scheduler branch 2 times, most recently from 7a2e362 to 6487bda Compare November 18, 2025 14:48

hickeyma added a commit to hickeyma/vllm that referenced this pull request Nov 20, 2025

Add improvements from code review comments

fc498d8

Review comments: - vllm-project#28309 (comment) - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

mergify bot added the needs-rebase label Nov 20, 2025

hickeyma added a commit to hickeyma/vllm that referenced this pull request Nov 20, 2025

Update after review from gemini-code

426823f

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma added a commit to hickeyma/vllm that referenced this pull request Nov 20, 2025

Add improvements from code review comments

577abd2

Review comments: - vllm-project#28309 (comment) - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma force-pushed the send-events-worker-to-scheduler branch from fc498d8 to 577abd2 Compare November 20, 2025 20:18

mergify bot removed the needs-rebase label Nov 20, 2025

hickeyma force-pushed the send-events-worker-to-scheduler branch from 577abd2 to 7cbff40 Compare November 25, 2025 09:35

hickeyma added a commit to hickeyma/vllm that referenced this pull request Nov 25, 2025

Update after review from gemini-code

98c3aa0

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma added a commit to hickeyma/vllm that referenced this pull request Nov 25, 2025

Add improvements from code review comments

7cbff40

Review comments: - vllm-project#28309 (comment) - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma added a commit to hickeyma/vllm that referenced this pull request Nov 25, 2025

Update after review from gemini-code

4f2bbef

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma added a commit to hickeyma/vllm that referenced this pull request Nov 25, 2025

Add improvements from code review comments

6973123

Review comments: - vllm-project#28309 (comment) - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma force-pushed the send-events-worker-to-scheduler branch from 7cbff40 to 6973123 Compare November 25, 2025 10:02

hickeyma added 9 commits November 26, 2025 15:59

Update after review from gemini-code

2da163b

Update comments: - vllm-project#28309 (review) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Remove change to the LMCache connector

2c1008f

The changes to the connector is for a separate PR and this PR is independent of it for now. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Combine events from different worker connectors

19f97cb

It is part of the aggregation of kv_connector_output from all workers. For KV cache events, this means combining events from all workers, remvoing any duplications. Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Add comment to get_kv_connector_kv_cache_events for clarity

0965792

Clarify that it is the workser side that calls the method. Review comment: - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

Add improvements from code review comments

430e970

Review comments: - vllm-project#28309 (comment) - vllm-project#28309 (comment) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>

hickeyma force-pushed the send-events-worker-to-scheduler branch from fdf5784 to db43f44 Compare November 26, 2025 16:00

hickeyma mentioned this pull request Nov 26, 2025

[Core][vLLM]: Add KV event generation LMCache/LMCache#1959

Open

2 tasks

Uh oh!

[Core] Send kv events from worker side to scheduler side #28309

Are you sure you want to change the base?

[Core] Send kv events from worker side to scheduler side #28309

Conversation

hickeyma commented Nov 7, 2025 • edited by github-actions bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Purpose

Test Plan

Test Result

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

njhill commented Nov 7, 2025

Uh oh!

orozery commented Nov 9, 2025

Uh oh!

markmc commented Nov 10, 2025

Uh oh!

NickLucche commented Nov 10, 2025

Uh oh!

orozery commented Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

markmc commented Nov 10, 2025

Uh oh!

hickeyma commented Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hickeyma commented Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

KuntaiDu commented Nov 12, 2025

Uh oh!

KuntaiDu commented Nov 12, 2025

Uh oh!

ApostaC commented Nov 12, 2025

Uh oh!

Uh oh!

ApostaC commented Nov 17, 2025

Uh oh!

hickeyma commented Nov 20, 2025

Uh oh!

mergify bot commented Nov 20, 2025

Uh oh!

hickeyma commented Nov 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hickeyma commented Nov 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

8 participants

hickeyma commented Nov 7, 2025 •

edited by github-actions bot

Loading

orozery commented Nov 10, 2025 •

edited

Loading

hickeyma commented Nov 12, 2025 •

edited

Loading

hickeyma commented Nov 12, 2025 •

edited

Loading

hickeyma commented Nov 24, 2025 •

edited

Loading