[Attention Metadata Overhaul 2/N] Move metadata processing outside HPUModelAdapter, prepare biases on CPU #530
base: main
Conversation
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Pull Request Overview
This PR moves HPU attention metadata processing from the HpuModelAdapter into a dedicated HPUAttentionMetadataProcessor class, allowing metadata biases to be computed on CPU and copied asynchronously to HPU. This refactoring removes metadata processing logic from the model forward path and handles it at input preparation time instead.
Key Changes:
- Extracted metadata processing into a standalone HPUAttentionMetadataProcessor class
- Moved metadata processing to occur during input preparation (prefill/decode batch formation) rather than in the model forward pass
- Added support for processing metadata on CPU with async copy to HPU device
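The CPU-side bias preparation with an asynchronous copy to the device can be sketched as follows. This is a minimal illustration, not the PR's actual implementation: the helper names and the causal-bias layout are assumptions, and the async-copy path simply falls back to a no-op on a CPU-only device.

```python
import torch


def build_causal_bias_on_cpu(seq_lens: list[int], max_len: int) -> torch.Tensor:
    """Build a padded, per-sequence causal attention bias on CPU (hypothetical helper).

    Positions outside the valid prefix and future positions get -inf;
    valid causal positions get 0.
    """
    bias = torch.full((len(seq_lens), max_len, max_len), float("-inf"))
    for i, n in enumerate(seq_lens):
        # Lower-triangular (causal) mask over the valid prefix of length n.
        bias[i, :n, :n] = torch.triu(
            torch.full((n, n), float("-inf")), diagonal=1)
    return bias


def copy_async(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Copy a host tensor to the device without blocking the host (sketch)."""
    if device.type == "cpu":
        return t
    # Pinning host memory lets the host-to-device copy overlap with compute;
    # non_blocking=True makes the copy asynchronous on CUDA/HPU-like devices.
    return t.pin_memory().to(device, non_blocking=True)
```

Preparing the bias on CPU at input-preparation time means the model forward pass no longer has to build it on the device, and the copy can overlap with other work.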
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
Pull Request Overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 5 comments.
def metadata_update_with_trim(obj: object, typename: str, trim: bool, **to_override):
    if trim:
        return custom_tuple_replace(obj, typename, **to_override)
    for key in to_override:
        assert hasattr(obj, key), f"Field {key} must exist in untrimmed metadata."
        setattr(obj, key, to_override[key])
    return obj
Copilot AI · Nov 5, 2025
The function metadata_update_with_trim lacks a docstring explaining its purpose, parameters, return value, and the distinction between trimmed and untrimmed metadata handling. This is especially important given the conditional logic and the use of setattr for dynamic attribute modification.
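A hedged sketch of what the requested docstring could look like, based only on the behavior visible in the diff (custom_tuple_replace is assumed to be defined in the surrounding module and is only reached on the trimmed path):

```python
def metadata_update_with_trim(obj: object, typename: str, trim: bool, **to_override):
    """Return metadata with the given fields overridden.

    Trimmed metadata is treated as an immutable namedtuple-like object, so a
    new instance is produced via custom_tuple_replace. Untrimmed metadata is
    mutated in place with setattr; every overridden field must already exist
    on the object, otherwise an AssertionError is raised.
    """
    if trim:
        # Trimmed path: delegate to the module's tuple-replacement helper.
        return custom_tuple_replace(obj, typename, **to_override)
    for key in to_override:
        assert hasattr(obj, key), f"Field {key} must exist in untrimmed metadata."
        setattr(obj, key, to_override[key])
    return obj
```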
assert seq_lens_t is not None, "seq_lens_tensor is required to build attn_bias"
context_lens_t = prefill_metadata.context_lens_tensor
assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias"
Copilot AI · Nov 5, 2025
The error message should be more specific by indicating which phase (prefill) or operation is being performed when this assertion fails, to help with debugging.
Suggested change:
- assert seq_lens_t is not None, "seq_lens_tensor is required to build attn_bias"
+ assert seq_lens_t is not None, "seq_lens_tensor is required to build attn_bias during prefill (prompt) phase"
  context_lens_t = prefill_metadata.context_lens_tensor
- assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias"
+ assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias during prefill (prompt) phase"
seq_lens_t = prefill_metadata.seq_lens_tensor
assert seq_lens_t is not None, "seq_lens_tensor is required to build attn_bias"
context_lens_t = prefill_metadata.context_lens_tensor
assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias"
Copilot AI · Nov 5, 2025
The error message should be more specific by indicating which phase (prefill) or operation is being performed when this assertion fails, to help with debugging.
Suggested change:
- assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias"
+ assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias during prefill phase"
if self.prefill_use_fusedsdpa and attn_metadata.block_list is not None:
    context_lens_t = prefill_metadata.context_lens_tensor
    assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias"
Copilot AI · Nov 5, 2025
The error message should be more specific by indicating this is for sliding window attention to aid debugging when this assertion fails.
Suggested change:
- assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias"
+ assert context_lens_t is not None, "context_lens_tensor is required to build attn_bias for sliding window attention"
# NOTE(kzawora): I'm not sure why we set block mapping twice for sliding window
# - we should check if that can be reduced to a single call.
Copilot AI · Nov 5, 2025
This TODO-style comment expresses uncertainty about the implementation. Either investigate and resolve this concern, or rephrase as a clearer explanation if the double call is intentional (e.g., for separate window and non-window blocks).
Suggested change:
- # NOTE(kzawora): I'm not sure why we set block mapping twice for sliding window
- # - we should check if that can be reduced to a single call.
+ # For sliding window, we set block mapping twice: once for the base mapping and once for the sliding window mapping.
+ # This ensures both standard and sliding window block mappings are correctly applied.
Requires #526. This is the next logical step: we remove usage of the metadata postprocessor inside HpuModelAdapter and run it at input preparation time instead, on CPU, copying the data asynchronously to HPU. I also needed to rework the processor to accept untrimmed metadata. This works as-is, but unfortunately I've noticed a fairly significant end-to-end performance drop on small models.