Enable conditional visual token pruning configuration for WWB #2928
base: master
…r requests for performance optimization
…e configuration. Refactor CDPruner to use visual tokens percentage instead of count for pruning configuration
…ode and remove unused visual token pruning methods
…e arguments and update GenerationConfig structure
…e vision config handling
…genai into ywang2/vlm-cdpruner
…ross codebase for consistency in CDPruner configuration
…in_percentage" - Updated Python scripts to reflect the corrected parameter name in argument parsing and configuration settings. 2. Added unit tests for the FastGreedyDPP class to ensure proper functionality and selection behavior based on the visual tokens retention percentage.
…genai into ywang2/vlm-cdpruner
…elated configurations
…ht assignment in visual language generation
…genai into ywang2/vlm-cdpruner
Pull Request Overview
This PR enables conditional visual token pruning configuration for WWB (WhoWhatBench) by integrating the CDPruner feature from PR#2714. The implementation adds configuration options for visual token pruning in the WWB benchmark tool and provides comprehensive C++ infrastructure for the CDPruner functionality.
Key Changes
- Added `--generate-config` argument to WWB for pruning configuration via JSON
- Exposed `pruning_ratio` and `relevance_weight` parameters in GenerationConfig for Python/C++ APIs
- Implemented CDPruner infrastructure with OpenCL GPU acceleration support and CPU fallback
- Extended Qwen2VL model to support visual token pruning with proper position ID adjustments
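As a rough illustration of the JSON-based configuration described above, the sketch below pulls the two pruning parameters out of a JSON string and range-checks `pruning_ratio`. The helper name `parse_pruning_options` is hypothetical and is not taken from the PR. The 0 to 100 percentage range (with 0 meaning "off") is an assumption based on the review comments, and the sketch does not validate `relevance_weight` because its valid range is not stated here.

```python
import json

# Keys named in this PR; everything else in the JSON is ignored by this sketch.
PRUNING_KEYS = ("pruning_ratio", "relevance_weight")


def parse_pruning_options(config_text):
    """Return only the pruning-related entries of a JSON generation config."""
    config = json.loads(config_text)
    if not isinstance(config, dict):
        raise ValueError("generation config must be a JSON object")

    options = {key: config[key] for key in PRUNING_KEYS if key in config}

    # Assumption: pruning_ratio is a percentage in [0, 100], with 0 meaning "off".
    ratio = options.get("pruning_ratio", 0)
    is_number = isinstance(ratio, (int, float)) and not isinstance(ratio, bool)
    if not is_number or not 0 <= ratio <= 100:
        raise ValueError(f"pruning_ratio must be a number in [0, 100], got {ratio!r}")
    return options


example = '{"pruning_ratio": 30, "relevance_weight": 0.5, "max_new_tokens": 128}'
print(parse_pruning_options(example))
```

Validating at parse time means a bad value fails before any model is loaded, rather than deep inside the pipeline.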
Reviewed Changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tools/who_what_benchmark/whowhatbench/wwb.py | Added CLI argument for generation config and renamed config reading function |
| tools/who_what_benchmark/whowhatbench/visualtext_evaluator.py | Updated function signature to accept generation_config parameter |
| tests/cpp/test_cdpruner_dpp.cpp | New comprehensive test suite for CDPruner DPP algorithm |
| tests/cpp/CMakeLists.txt | Added OpenCL support for CDPruner tests |
| src/python/py_generation_config.cpp | Exposed pruning parameters to Python bindings |
| src/python/openvino_genai/py_openvino_genai.pyi | Added type hints for pruning parameters |
| src/cpp/src/visual_language/vision_encoder.{hpp,cpp} | Added CDPruner integration and configuration methods |
| src/cpp/src/visual_language/qwen2vl/classes.{hpp,cpp} | Implemented text feature extraction and pruning logic for Qwen2VL |
| src/cpp/src/visual_language/pipeline_base.hpp | Fixed namespace qualifier for utility function |
| src/cpp/src/visual_language/pipeline.cpp | Added warning for pruning with non-PA backends |
| src/cpp/src/visual_language/inputs_embedder.{hpp,cpp} | Added pruning configuration interface |
| src/cpp/src/visual_language/cdpruner/* | New CDPruner implementation files (config, kernel builder, DPP, relevance calculator) |
| src/cpp/src/continuous_batching/pipeline_base.cpp | Applied pruning config from generation parameters |
| src/cpp/include/openvino/genai/generation_config.hpp | Added pruning_ratio and relevance_weight to GenerationConfig |
| src/cpp/CMakeLists.txt | Added OpenCL detection and SIMD compilation flags |
Comments suppressed due to low confidence (1)
src/cpp/src/visual_language/cdpruner/cdpruner_config.hpp:1
- In the multi-frame overview, the output says 'Keep {100 - pruning_ratio}%' but pruning_ratio is already a percentage value (0-100), not a ratio (0.0-1.0). This should be 'Keep {pruning_ratio}%' without the subtraction, as shown at line 399 in the single-frame version.
// Copyright (C) 2023-2025 Intel Corporation
    processor=processor,
    crop_question=crop_question,
    generation_config=gen_config,
    seqs_per_request=getattr(args, "seqs_per_request", 1) # Default to 1 if not set; make configurable to avoid magic number
Copilot AI, Oct 28, 2025
The comment suggests making seqs_per_request configurable, but it appears to use getattr which would return the attribute if it exists. However, seqs_per_request is not defined anywhere in the diff as an argument. Either add the argument definition to the parser or remove the comment about making it configurable if this is intentional fallback behavior.
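The two options raised in this comment can be contrasted in a small standalone sketch. The flag name `--seqs-per-request` and the parser below are hypothetical and are not part of the diff; only `argparse` and `getattr` are real. Declaring the argument gives the same default of 1 but also makes it visible in `--help` and settable by the user, whereas the bare `getattr` fallback always yields 1 when the parser never defined the attribute.

```python
import argparse


def build_parser():
    """Build a tiny parser that declares the (hypothetical) flag explicitly."""
    parser = argparse.ArgumentParser(prog="wwb-sketch")
    parser.add_argument(
        "--seqs-per-request",
        type=int,
        default=1,
        help="Number of sequences per request (hypothetical flag for this sketch).",
    )
    return parser


declared = build_parser().parse_args(["--seqs-per-request", "4"])
defaulted = build_parser().parse_args([])
undeclared = argparse.Namespace()  # stands in for a parser that never defined the flag

print(declared.seqs_per_request)                   # value supplied on the command line
print(defaulted.seqs_per_request)                  # parser default
print(getattr(undeclared, "seqs_per_request", 1))  # silent fallback, not user-settable
```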
    size_t num_patches = original_shape[0];
    size_t embedding_dim = original_shape[1];
    size_t new_patches = num_patches / image_num;
    OPENVINO_ASSERT(original_shape[0] == new_patches * image_num, "Inconsistent number of patches per image");
Copilot AI, Oct 28, 2025
The error message 'Inconsistent number of patches per image' is unclear. Consider providing more detail such as: "Expected total patches ({expected}) to equal patches per image ({new_patches}) * number of images ({image_num}), but got {original_shape[0]}".
Suggested change:
    - OPENVINO_ASSERT(original_shape[0] == new_patches * image_num, "Inconsistent number of patches per image");
    + OPENVINO_ASSERT(
    +     original_shape[0] == new_patches * image_num,
    +     "Expected total patches (" + std::to_string(new_patches * image_num) +
    +     ") to equal patches per image (" + std::to_string(new_patches) +
    +     ") * number of images (" + std::to_string(image_num) +
    +     "), but got " + std::to_string(original_shape[0])
    + );
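To make the purpose of this assertion concrete: the integer division truncates, so multiplying back only reproduces the total when the patch count divides evenly across images. The following is a Python sketch of that same check with a more descriptive message. It is not the PR's C++ code, and the function name `patches_per_image` is made up for illustration.

```python
def patches_per_image(num_patches, image_num):
    """Split a total patch count evenly across images, or fail with details."""
    if image_num <= 0:
        raise ValueError(f"number of images must be positive, got {image_num}")

    # Floor division mirrors the truncating size_t division in the C++ snippet.
    per_image = num_patches // image_num

    # Multiplying back only reproduces the total when the division was exact.
    if per_image * image_num != num_patches:
        raise ValueError(
            f"Expected total patches ({per_image * image_num}) to equal "
            f"patches per image ({per_image}) * number of images ({image_num}), "
            f"but got {num_patches}"
        )
    return per_image


print(patches_per_image(12, 3))
try:
    patches_per_image(13, 3)
except ValueError as err:
    print(err)
```

Reporting all three quantities lets a reader see at a glance which image count or tensor shape was unexpected.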
    // Add text prompt to vision config for CDPruner
    if (generation_config.pruning_ratio != 0) {
    std::cout << "[CDPruner] Warning: Pruning is disabled. It is only supported when using PA as the attention "
Copilot AI, Oct 28, 2025
The warning message uses 'PA' as an acronym without defining it. Consider spelling out 'PagedAttention' or providing more context about what PA means for better clarity.
Suggested change:
    - std::cout << "[CDPruner] Warning: Pruning is disabled. It is only supported when using PA as the attention "
    + std::cout << "[CDPruner] Warning: Pruning is disabled. It is only supported when using PagedAttention as the attention "
Add configuration for WWB to enable conditional visual token pruning based on PR#2714.
Ticket: CVS-173849