Commit d4acf51

[Metrics] Fix KV cache usage percent metric multiproc (vllm-project#28792)
The `vllm:kv_cache_usage_perc` Gauge metric is missing `multiprocess_mode="mostrecent"` and ends up returning

```
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="277"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="275"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="273"} 0.6530455880475035
...
```

The deprecated `vllm:gpu_cache_usage_perc` Gauge metric has `multiprocess_mode="mostrecent"`.

Signed-off-by: Jae-Won Chung <jwnchung@umich.edu>
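The symptom in the sample output is one series per worker `pid` for what should be a single per-engine gauge. A minimal sketch of how one might detect that in scraped metrics text follows. The helper name `pid_split_metrics` is hypothetical, and the regex only handles the simple `name{labels} value` line shape shown in the commit message, not the full Prometheus text format.

```python
import re
from collections import defaultdict

# Matches simple exposition lines of the form: name{label="v",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def pid_split_metrics(text):
    """Return {metric_name: sorted pid strings} for metrics exposed under >1 pid.

    Hypothetical helper: samples are grouped by metric name only, so
    different label sets of the same metric are merged together.
    """
    pids_by_name = defaultdict(set)
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # skip comments, blank lines, and unlabeled samples
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if "pid" in labels:
            pids_by_name[match.group("name")].add(labels["pid"])
    return {
        name: sorted(pids)
        for name, pids in pids_by_name.items()
        if len(pids) > 1
    }


# The three sample lines quoted in the commit message.
SCRAPE = """\
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="277"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="275"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="273"} 0.6530455880475035
"""

print(pid_split_metrics(SCRAPE))
```

A non-empty result for a gauge that is meant to be reported once per engine suggests its multiprocess mode is still the per-process default.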
1 parent ab01cd1 commit d4acf51

1 file changed: +1 -0 lines changed

vllm/v1/metrics/loggers.py

Lines changed: 1 addition & 0 deletions
@@ -494,6 +494,7 @@ def __init__(
         gauge_kv_cache_usage = self._gauge_cls(
             name="vllm:kv_cache_usage_perc",
             documentation="KV-cache usage. 1 means 100 percent usage.",
+            multiprocess_mode="mostrecent",
             labelnames=labelnames,
         )
         self.gauge_kv_cache_usage = make_per_engine(
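To illustrate why the one-line change collapses the per-`pid` series into a single value, here is a simplified, stdlib-only model of two gauge multiprocess modes. It is not `prometheus_client`'s implementation. The `Sample` class, the `aggregate` function, and the timestamps are assumptions made for illustration; only the pids and values come from the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One process's last-written gauge value (simplified model)."""
    pid: int
    value: float
    timestamp: float  # assumed: time of this process's last set(); 0.0 if never set


def aggregate(samples, mode):
    """Model two gauge multiprocess modes as a list of (labels, value) pairs.

    "all":        one series per process, distinguished by a pid label.
    "mostrecent": a single series carrying the most recently written value.
    """
    if mode == "all":
        return [({"pid": str(s.pid)}, s.value) for s in samples]
    if mode == "mostrecent":
        if not samples:
            return []
        latest = max(samples, key=lambda s: s.timestamp)
        return [({}, latest.value)]
    raise ValueError(f"mode not covered by this sketch: {mode!r}")


# Values and pids from the commit message; timestamps are made up.
SAMPLES = [
    Sample(pid=277, value=0.0, timestamp=0.0),
    Sample(pid=275, value=0.0, timestamp=0.0),
    Sample(pid=273, value=0.6530455880475035, timestamp=1700000000.0),
]

for mode in ("all", "mostrecent"):
    print(mode, aggregate(SAMPLES, mode))
```

Under this model, `"all"` reproduces the three per-process series reported in the commit message, while `"mostrecent"` yields only the value written by the process that actually updates the gauge.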
