Enable conditional visual token pruning configuration for WWB #2928

yangwang201911 · 2025-10-28T03:15:23Z

Add configuration for WWB to enable conditional visual token pruning based on PR#2714.

Ticket: CVS-173849

…in_percentage" - Updated Python scripts to reflect the corrected parameter name in argument parsing and configuration settings. 2. Added unit tests for the FastGreedyDPP class to ensure proper functionality and selection behavior based on the visual tokens retention percentage.

Copilot

Pull Request Overview

This PR enables conditional visual token pruning configuration for WWB (WhoWhatBench) by integrating the CDPruner feature from PR#2714. The implementation adds configuration options for visual token pruning in the WWB benchmark tool and provides comprehensive C++ infrastructure for the CDPruner functionality.

Key Changes

Added --generate-config argument to WWB for pruning configuration via JSON
Exposed pruning_ratio and relevance_weight parameters in GenerationConfig for Python/C++ APIs
Implemented CDPruner infrastructure with OpenCL GPU acceleration support and CPU fallback
Extended Qwen2VL model to support visual token pruning with proper position ID adjustments
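As a rough illustration of how a JSON file passed through --generate-config might yield the two pruning parameters, here is a minimal sketch. The helper name `load_pruning_config`, the flat JSON layout, and the treatment of 0 as "no pruning" are assumptions for illustration, not WWB's actual code. Only the key names `pruning_ratio` and `relevance_weight` and the 0-100 percentage range come from this PR and its review.

```python
import json
from typing import Any, Dict

# Keys this PR adds to GenerationConfig; everything else in the file is ignored here.
PRUNING_KEYS = ("pruning_ratio", "relevance_weight")


def load_pruning_config(text: str) -> Dict[str, Any]:
    """Hypothetical helper: extract pruning settings from a JSON generation config.

    Assumes a flat JSON object, e.g. {"pruning_ratio": 50, "relevance_weight": 0.5}.
    """
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("generation config must be a JSON object")
    config = {key: raw[key] for key in PRUNING_KEYS if key in raw}
    # Assumption: pruning_ratio is a percentage in [0, 100] and 0 means "no pruning".
    ratio = config.get("pruning_ratio", 0)
    if not 0 <= ratio <= 100:
        raise ValueError(f"pruning_ratio must be in [0, 100], got {ratio}")
    return config


if __name__ == "__main__":
    example = '{"pruning_ratio": 50, "relevance_weight": 0.5, "max_new_tokens": 128}'
    print(load_pruning_config(example))  # {'pruning_ratio': 50, 'relevance_weight': 0.5}
```

Because the PR exposes both fields in the Python bindings, the extracted values could then be assigned to a GenerationConfig object; the exact WWB call path is not shown in this sketch.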

Reviewed Changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 3 comments.

File	Description
tools/who_what_benchmark/whowhatbench/wwb.py	Added CLI argument for generation config and renamed config reading function
tools/who_what_benchmark/whowhatbench/visualtext_evaluator.py	Updated function signature to accept generation_config parameter
tests/cpp/test_cdpruner_dpp.cpp	New comprehensive test suite for CDPruner DPP algorithm
tests/cpp/CMakeLists.txt	Added OpenCL support for CDPruner tests
src/python/py_generation_config.cpp	Exposed pruning parameters to Python bindings
src/python/openvino_genai/py_openvino_genai.pyi	Added type hints for pruning parameters
src/cpp/src/visual_language/vision_encoder.{hpp,cpp}	Added CDPruner integration and configuration methods
src/cpp/src/visual_language/qwen2vl/classes.{hpp,cpp}	Implemented text feature extraction and pruning logic for Qwen2VL
src/cpp/src/visual_language/pipeline_base.hpp	Fixed namespace qualifier for utility function
src/cpp/src/visual_language/pipeline.cpp	Added warning for pruning with non-PA backends
src/cpp/src/visual_language/inputs_embedder.{hpp,cpp}	Added pruning configuration interface
src/cpp/src/visual_language/cdpruner/*	New CDPruner implementation files (config, kernel builder, DPP, relevance calculator)
src/cpp/src/continuous_batching/pipeline_base.cpp	Applied pruning config from generation parameters
src/cpp/include/openvino/genai/generation_config.hpp	Added pruning_ratio and relevance_weight to GenerationConfig
src/cpp/CMakeLists.txt	Added OpenCL detection and SIMD compilation flags

Comments suppressed due to low confidence (1)

src/cpp/src/visual_language/cdpruner/cdpruner_config.hpp:1

In the multi-frame overview, the output says 'Keep {100 - pruning_ratio}%' but pruning_ratio is already a percentage value (0-100), not a ratio (0.0-1.0). This should be 'Keep {pruning_ratio}%' without the subtraction, as shown at line 399 in the single-frame version.

// Copyright (C) 2023-2025 Intel Corporation

Copilot · 2025-10-28T03:16:36Z

tools/who_what_benchmark/whowhatbench/wwb.py

                processor=processor,
                crop_question=crop_question,
+                generation_config=gen_config,
+                seqs_per_request=getattr(args, "seqs_per_request", 1)  # Default to 1 if not set; make configurable to avoid magic number


The comment suggests making seqs_per_request configurable, but the code only reads it through getattr, which returns the attribute if it exists and falls back to 1 otherwise. Since seqs_per_request is not defined as an argument anywhere in the diff, the fallback of 1 will presumably always be used. Either add the argument definition to the parser, or, if the fallback is intentional, remove the comment about making it configurable.
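One way to act on the first option is to declare the argument explicitly, so the default is documented in the parser instead of hidden in a getattr call. The flag spelling `--seqs-per-request`, its help text, and the lower-bound check are hypothetical; this sketch only illustrates the pattern with the standard argparse module.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="WWB argument sketch (hypothetical subset)")
    # Hypothetical flag: declaring it makes the default explicit and visible in --help,
    # instead of relying on getattr(args, "seqs_per_request", 1).
    parser.add_argument(
        "--seqs-per-request",
        type=int,
        default=1,
        help="Number of sequences to generate per request (default: 1).",
    )
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # argparse maps --seqs-per-request to the attribute seqs_per_request.
    if args.seqs_per_request < 1:
        parser.error("--seqs-per-request must be at least 1")  # exits with SystemExit
    return args


if __name__ == "__main__":
    print(parse_args([]).seqs_per_request)                           # 1
    print(parse_args(["--seqs-per-request", "4"]).seqs_per_request)  # 4
```

With the argument declared, the call site can read args.seqs_per_request directly.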

Copilot · 2025-10-28T03:16:37Z

src/cpp/src/visual_language/qwen2vl/classes.cpp

+    size_t num_patches = original_shape[0];
+    size_t embedding_dim = original_shape[1];
+    size_t new_patches = num_patches / image_num;
+    OPENVINO_ASSERT(original_shape[0] == new_patches * image_num, "Inconsistent number of patches per image");


The error message 'Inconsistent number of patches per image' is unclear. Consider providing more detail such as: "Expected total patches ({expected}) to equal patches per image ({new_patches}) * number of images ({image_num}), but got {original_shape[0]}".

Suggested change

-    OPENVINO_ASSERT(original_shape[0] == new_patches * image_num, "Inconsistent number of patches per image");
+    OPENVINO_ASSERT(
+        original_shape[0] == new_patches * image_num,
+        "Expected total patches (" + std::to_string(new_patches * image_num) +
+        ") to equal patches per image (" + std::to_string(new_patches) +
+        ") * number of images (" + std::to_string(image_num) +
+        "), but got " + std::to_string(original_shape[0])
+    );
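The check relies on integer division: new_patches is num_patches / image_num rounded down, so multiplying back reproduces the total only when the patches split evenly across images. A small Python sketch of the same check with the more detailed message follows; the function name `split_patches_per_image` and the extra guard against image_num <= 0 are hypothetical additions, not part of the PR.

```python
def split_patches_per_image(total_patches: int, image_num: int) -> int:
    """Hypothetical mirror of the patches-per-image check in qwen2vl/classes.cpp.

    Returns patches per image, or raises ValueError when total_patches is not
    an exact multiple of image_num.
    """
    if image_num <= 0:
        # Extra guard: the C++ snippet would divide by zero here.
        raise ValueError(f"image_num must be positive, got {image_num}")
    new_patches = total_patches // image_num  # rounds down, like size_t division
    expected = new_patches * image_num
    if total_patches != expected:
        raise ValueError(
            f"Expected total patches ({expected}) to equal patches per image "
            f"({new_patches}) * number of images ({image_num}), but got {total_patches}"
        )
    return new_patches


if __name__ == "__main__":
    print(split_patches_per_image(1024, 4))  # 256
    try:
        split_patches_per_image(10, 3)
    except ValueError as err:
        print(err)
```

Because the expected value is derived from the rounded-down quotient, 10 patches over 3 images reports an expected total of 9 rather than 10.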

Copilot · 2025-10-28T03:16:37Z

src/cpp/src/visual_language/pipeline.cpp

+
+        // Add text prompt to vision config for CDPruner
+        if (generation_config.pruning_ratio != 0) {
+            std::cout << "[CDPruner] Warning: Pruning is disabled. It is only supported when using PA as the attention "


The warning message uses 'PA' as an acronym without defining it. Consider spelling out 'PagedAttention' or providing more context about what PA means for better clarity.

Suggested change

-            std::cout << "[CDPruner] Warning: Pruning is disabled. It is only supported when using PA as the attention "
+            std::cout << "[CDPruner] Warning: Pruning is disabled. It is only supported when using PagedAttention as the attention "
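The logic behind this warning amounts to a simple gate: a non-zero pruning_ratio takes effect only on the PagedAttention backend (the continuous batching path where this PR applies the pruning config), and elsewhere it is reported and ignored. A Python sketch of that gate follows; the function name, the `uses_paged_attention` flag, and the warning text are hypothetical stand-ins for whatever the C++ pipeline actually checks and prints.

```python
import warnings


def effective_pruning_ratio(pruning_ratio: int, uses_paged_attention: bool) -> int:
    """Hypothetical helper: return the pruning ratio that will actually take effect.

    Pruning is honored only with the PagedAttention backend; otherwise a
    non-zero ratio triggers a warning and is replaced by 0 (no pruning).
    """
    if pruning_ratio != 0 and not uses_paged_attention:
        # Spell out PagedAttention rather than the bare "PA" acronym.
        warnings.warn(
            "[CDPruner] pruning_ratio is ignored: pruning requires the PagedAttention backend",
            RuntimeWarning,
        )
        return 0
    return pruning_ratio


if __name__ == "__main__":
    print(effective_pruning_ratio(50, uses_paged_attention=True))   # 50
    print(effective_pruning_ratio(50, uses_paged_attention=False))  # 0, plus a RuntimeWarning
```

Using the warnings module instead of writing to stdout (as the C++ code does with std::cout) lets callers filter the message or escalate it to an error.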

liangali and others added 30 commits August 1, 2025 10:06

[POC] implement cdpruner for qwen2.5-vl

3afb35b

Enhance CDPruner and RelevanceCalculator to support negative relevance configuration.

38879b5

Update CDPruner configuration to enable negative relevance for CLIP-based models

5bedef4

Add support for subgraph in CDPruner and ConditionalKernelBuilder

c81af98

Update L2 normalization function

4c7e1c0

Skip updating marginal gains for already selected tokens in FastGreedyDPP

5c1b678

Enhance ConditionalKernelBuilder to precompile models and create infer requests for performance optimization

4ac2a1c

Enhance CDPruner and RelevanceCalculator to support negative relevance configuration. Refactor CDPruner to use visual tokens percentage instead of count for pruning configuration

1d2ff66

Add CDPruner configuration parameters to GenerationConfig

95c243f

Implement GPU model compilation in constructor.

221456b

Refactor CDPruner configuration: rename debug_mode to pruning_debug_mode and remove unused visual token pruning methods

79d7955

Enhance CDPruner configuration: add pruning parameters to command-line arguments and update GenerationConfig structure

79529fa

Merge remote-tracking branch 'upstream' into ywang2/enable_cdpruner_config

99f55b0

Refactor CDPruner configuration: remove unused settings and streamline vision config handling

6a4a332

update format

5ba4d7d

Merge branch 'master' of https://github.com/openvinotoolkit/openvino.genai into ywang2/vlm-cdpruner

b618463

Merge branch 'ywang2/enable_cdpruner_config' into ywang2/vlm-cdpruner

e63d071

Refactor pruning debug mode checks and enable ops model by default

ebf1a18

Add logging for CDPruner configuration

087d1c8

Add logging for CDPruner configuration settings

81fcf68

Rename visual_tokens_percentage to viusal_tokens_retain_percentage across codebase for consistency in CDPruner configuration

cc89a26

Initialize CDPruner with default configuration in VisionEncoder constructor

05e7e65

Add debug logging for conditional kernel matrix and marginal gains in FastGreedyDPP

c1e1f45

update.

2cb1e8f

Merge branch 'master' of https://github.com/openvinotoolkit/openvino.genai into ywang2/vlm-cdpruner

572b251

[visual_language_chat] Add CDPruner options and update usage instructions

26b29f7

Enhance CDPruner functionality with new ops model option and update related configurations

c0280eb

Refactor CDPruner debug output for consistency and clarity in logging

b2f2601

Optimize orthogonal vector computation: reduce memory access and improve performance

9452f2f

yangwang201911 and others added 5 commits October 22, 2025 21:18

Fix help message for pruning ratio argument and update relevance weight assignment in visual language generation

344e5d4

Update CDPruner configuration comments and simplify argument handling in model_utils

8d5a38d

Merge branch 'master' into ywang2/vlm-cdpruner

3156976

Merge branch 'master' of https://github.com/openvinotoolkit/openvino.genai into ywang2/vlm-cdpruner

9cda093

Merge branch 'master' into ywang2/vlm-cdpruner

62a6aca

Copilot AI review requested due to automatic review settings October 28, 2025 03:15

yangwang201911 requested a review from peterchen-intel October 28, 2025 03:15

Copilot AI reviewed Oct 28, 2025

peterchen-intel mentioned this pull request Oct 28, 2025

Conditional visual token pruning for QWen-VL models. #2714

Closed

enable cdpruner for wwb.

0892c4b

peterchen-intel added the do_not_merge label Oct 28, 2025

peterchen-intel added the Code Freeze label Oct 29, 2025

Remove CDPruner integration

16af778

peterchen-intel removed the Code Freeze label Nov 4, 2025

yangwang201911 mentioned this pull request Nov 28, 2025

Conditional visual token pruning for QWen-VL models. #3084

Open


yangwang201911 commented Oct 28, 2025 • edited by peterchen-intel


6 participants
