yangwang201911 commented Oct 28, 2025

Add configuration for WWB to enable conditional visual token pruning based on PR#2714.

Ticket: CVS-173849

liangali and others added 30 commits August 1, 2025 10:06
…e configuration.

Refactor CDPruner to use visual tokens percentage instead of count for pruning configuration
…ode and remove unused visual token pruning methods
…e arguments and update GenerationConfig structure
…ross codebase for consistency in CDPruner configuration
…in_percentage"

- Updated Python scripts to reflect the corrected parameter name in argument parsing and configuration settings.
- Added unit tests for the FastGreedyDPP class to ensure proper functionality and selection behavior based on the visual tokens retention percentage.
Copilot AI review requested due to automatic review settings October 28, 2025 03:15
github-actions bot added labels on Oct 28, 2025: category: WWB, category: visual language, category: continuous batching, category: sampling, category: cmake / build, category: Python API, category: CPP API, no-match-files

Copilot AI left a comment

Pull Request Overview

This PR enables conditional visual token pruning configuration for WWB (WhoWhatBench) by integrating the CDPruner feature from PR#2714. The implementation adds configuration options for visual token pruning in the WWB benchmark tool and provides comprehensive C++ infrastructure for the CDPruner functionality.

Key Changes

  • Added --generate-config argument to WWB for pruning configuration via JSON
  • Exposed pruning_ratio and relevance_weight parameters in GenerationConfig for Python/C++ APIs
  • Implemented CDPruner infrastructure with OpenCL GPU acceleration support and CPU fallback
  • Extended Qwen2VL model to support visual token pruning with proper position ID adjustments
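
A pruning configuration of the kind passed through --generate-config might look like the sketch below. The keys pruning_ratio and relevance_weight are the names this PR exposes in GenerationConfig; the file name, the example values, the helper load_generation_config, and the range check are assumptions made for illustration and are not the WWB implementation. Whether the option takes a file path or inline JSON is not stated here; the sketch assumes a file.

```python
import json
import tempfile
from pathlib import Path


def load_generation_config(path):
    """Read a JSON file and return its contents as a dict (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    if not isinstance(config, dict):
        raise ValueError("generation config must be a JSON object")
    # Assumption: pruning_ratio is a percentage in [0, 100] and 0 means no pruning.
    ratio = config.get("pruning_ratio", 0)
    if not 0 <= ratio <= 100:
        raise ValueError(f"pruning_ratio must be within 0..100, got {ratio}")
    return config


# Example values are illustrative only.
example = {"pruning_ratio": 30, "relevance_weight": 0.5}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "generate_config.json"
    config_path.write_text(json.dumps(example), encoding="utf-8")
    loaded = load_generation_config(config_path)

print(loaded)
```

Validating the range at load time keeps a malformed file from reaching the pipeline, where the failure would be harder to trace back to the configuration.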

Reviewed Changes

Copilot reviewed 27 out of 27 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tools/who_what_benchmark/whowhatbench/wwb.py | Added CLI argument for generation config and renamed config reading function |
| tools/who_what_benchmark/whowhatbench/visualtext_evaluator.py | Updated function signature to accept generation_config parameter |
| tests/cpp/test_cdpruner_dpp.cpp | New comprehensive test suite for CDPruner DPP algorithm |
| tests/cpp/CMakeLists.txt | Added OpenCL support for CDPruner tests |
| src/python/py_generation_config.cpp | Exposed pruning parameters to Python bindings |
| src/python/openvino_genai/py_openvino_genai.pyi | Added type hints for pruning parameters |
| src/cpp/src/visual_language/vision_encoder.{hpp,cpp} | Added CDPruner integration and configuration methods |
| src/cpp/src/visual_language/qwen2vl/classes.{hpp,cpp} | Implemented text feature extraction and pruning logic for Qwen2VL |
| src/cpp/src/visual_language/pipeline_base.hpp | Fixed namespace qualifier for utility function |
| src/cpp/src/visual_language/pipeline.cpp | Added warning for pruning with non-PA backends |
| src/cpp/src/visual_language/inputs_embedder.{hpp,cpp} | Added pruning configuration interface |
| src/cpp/src/visual_language/cdpruner/* | New CDPruner implementation files (config, kernel builder, DPP, relevance calculator) |
| src/cpp/src/continuous_batching/pipeline_base.cpp | Applied pruning config from generation parameters |
| src/cpp/include/openvino/genai/generation_config.hpp | Added pruning_ratio and relevance_weight to GenerationConfig |
| src/cpp/CMakeLists.txt | Added OpenCL detection and SIMD compilation flags |

Comments suppressed due to low confidence (1)

src/cpp/src/visual_language/cdpruner/cdpruner_config.hpp:1

  • In the multi-frame overview, the output says 'Keep {100 - pruning_ratio}%' but pruning_ratio is already a percentage value (0-100), not a ratio (0.0-1.0). This should be 'Keep {pruning_ratio}%' without the subtraction, as shown at line 399 in the single-frame version.
// Copyright (C) 2023-2025 Intel Corporation
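
The two message forms discussed in this comment differ only in whether the percentage is subtracted from 100. The sketch below shows both readings side by side for a percentage-valued pruning_ratio. The function keep_message and its flag value_is_pruned_share are hypothetical names used for illustration; which reading matches the actual CDPruner semantics is not settled by this comment.

```python
def keep_message(pruning_ratio, value_is_pruned_share):
    """Format the 'Keep N%' text for a pruning_ratio given as a percentage (0-100)."""
    if not 0 <= pruning_ratio <= 100:
        raise ValueError("pruning_ratio is expected to be a percentage in 0..100")
    # If the value names the share of tokens removed, the kept share is the
    # complement; if it already names the kept share, it is printed unchanged.
    kept = 100 - pruning_ratio if value_is_pruned_share else pruning_ratio
    return f"Keep {kept}%"


print(keep_message(30, value_is_pruned_share=True))   # Keep 70%
print(keep_message(30, value_is_pruned_share=False))  # Keep 30%
```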

processor=processor,
crop_question=crop_question,
generation_config=gen_config,
seqs_per_request=getattr(args, "seqs_per_request", 1) # Default to 1 if not set; make configurable to avoid magic number

Copilot AI Oct 28, 2025

The comment suggests making seqs_per_request configurable, but it appears to use getattr which would return the attribute if it exists. However, seqs_per_request is not defined anywhere in the diff as an argument. Either add the argument definition to the parser or remove the comment about making it configurable if this is intentional fallback behavior.
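
The two options named in this comment can be compared in a small argparse sketch. The flag name --seqs-per-request, its default, and its help text are assumptions for illustration; the diff itself does not define such an argument.

```python
import argparse


def build_parser():
    """Build a minimal parser that declares the option explicitly (hypothetical flag)."""
    parser = argparse.ArgumentParser(description="Sketch of a WWB-style CLI")
    parser.add_argument(
        "--seqs-per-request",
        type=int,
        default=1,
        help="Number of sequences per request (assumed meaning).",
    )
    return parser


# Option 1: the argument is declared, so the attribute always exists.
defaults = build_parser().parse_args([])
explicit = build_parser().parse_args(["--seqs-per-request", "4"])
print(defaults.seqs_per_request, explicit.seqs_per_request)

# Option 2: nothing is declared, so getattr always takes the fallback value.
bare = argparse.Namespace()
fallback = getattr(bare, "seqs_per_request", 1)
print(fallback)
```

With the argument declared, the default is documented in --help and the getattr fallback becomes unnecessary; without it, the fallback is the only value the call can ever receive.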

size_t num_patches = original_shape[0];
size_t embedding_dim = original_shape[1];
size_t new_patches = num_patches / image_num;
OPENVINO_ASSERT(original_shape[0] == new_patches * image_num, "Inconsistent number of patches per image");

Copilot AI Oct 28, 2025

The error message 'Inconsistent number of patches per image' is unclear. Consider providing more detail such as: "Expected total patches ({expected}) to equal patches per image ({new_patches}) * number of images ({image_num}), but got {original_shape[0]}".

Suggested change
- OPENVINO_ASSERT(original_shape[0] == new_patches * image_num, "Inconsistent number of patches per image");
+ OPENVINO_ASSERT(
+     original_shape[0] == new_patches * image_num,
+     "Expected total patches (" + std::to_string(new_patches * image_num) +
+     ") to equal patches per image (" + std::to_string(new_patches) +
+     ") * number of images (" + std::to_string(image_num) +
+     "), but got " + std::to_string(original_shape[0])
+ );
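
The check under discussion divides the total patch count by the number of images and then multiplies back, which for positive integers is the same as requiring a zero remainder. A Python analogue of that check with a more descriptive message could look like the sketch below; the function name patches_per_image and the message wording are assumptions for illustration, not the code in this PR.

```python
def patches_per_image(total_patches, image_num):
    """Split a total patch count evenly across images, or raise with details."""
    if image_num <= 0:
        raise ValueError(f"image_num must be positive, got {image_num}")
    per_image, remainder = divmod(total_patches, image_num)
    # A non-zero remainder is exactly the case where
    # per_image * image_num != total_patches in the integer-division form.
    if remainder != 0:
        raise ValueError(
            f"Expected total patches ({total_patches}) to be divisible by "
            f"number of images ({image_num}); {remainder} patches left over"
        )
    return per_image


print(patches_per_image(12, 3))  # 4
```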

// Add text prompt to vision config for CDPruner
if (generation_config.pruning_ratio != 0) {
std::cout << "[CDPruner] Warning: Pruning is disabled. It is only supported when using PA as the attention "

Copilot AI Oct 28, 2025

The warning message uses 'PA' as an acronym without defining it. Consider spelling out 'PagedAttention' or providing more context about what PA means for better clarity.

Suggested change
- std::cout << "[CDPruner] Warning: Pruning is disabled. It is only supported when using PA as the attention "
+ std::cout << "[CDPruner] Warning: Pruning is disabled. It is only supported when using PagedAttention as the attention "
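
The guard around this warning can be sketched in Python as follows. The helper effective_pruning_ratio, the backend strings "PA" and "SDPA", the full message wording, and the choice to reset the ratio to 0 are all assumptions for illustration; the snippet above only shows that a warning is printed when pruning_ratio is non-zero.

```python
import warnings


def effective_pruning_ratio(pruning_ratio, attention_backend):
    """Return the ratio that would apply for a given backend (hypothetical helper)."""
    # Assumption: pruning is only honored with the PagedAttention ("PA") backend.
    if pruning_ratio != 0 and attention_backend != "PA":
        warnings.warn(
            "[CDPruner] Pruning is disabled. It is only supported when using "
            "PagedAttention (PA) as the attention backend."
        )
        return 0
    return pruning_ratio


print(effective_pruning_ratio(30, "PA"))
```

Spelling out the acronym once and keeping the short form in parentheses lets the message stay searchable under either name.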

github-actions bot removed labels on Oct 29, 2025: category: visual language, category: continuous batching, category: sampling, category: cmake / build, category: Python API, category: CPP API, no-match-files