Conversation

@krzyczar (Contributor) commented Nov 6, 2025

Description

CVS-173846

Fixes #(issue)

Checklist:

  • Tests have been updated or added to cover the new code.
  • This patch fully addresses the ticket.
  • I have made corresponding changes to the documentation.

github-actions bot added the `category: llm_bench` label on Nov 6, 2025
@krzyczar force-pushed the kcz/support-for-video-in-benchmark branch 6 times, most recently from 25aab2d to f179410, on November 6, 2025 14:37
@krzyczar force-pushed the kcz/support-for-video-in-benchmark branch 3 times, most recently from 6b04a70 to aa756f2, on November 7, 2025 16:04
xipingyan and others added 2 commits November 8, 2025 09:25
1. Enable video preprocessing for the Qwen VL model.
    Add ov::Property<std::vector<ov::Tensor>> videos{"videos"};
2. Support mixed image and video input.
3. The main updates for the Qwen-VL series models (see the sketch below):
-- For video: with 2-in-1 merging, if 9 frames are input, only 5 merged frames are actually processed.
-- For images: with 2-in-1 merging, each image is simply doubled, so 9 input images still yield 9 processed images.
-- Introduce an "`If`" node to merge the video and image preprocessing into one OV subgraph.
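A minimal sketch of the frame-count arithmetic described above, assuming the 2-in-1 merge simply pairs consecutive frames and pads an odd count by repeating the last frame; the function names and the padding rule are illustrative, not the actual GenAI implementation:

```python
import math

def merged_video_frame_count(n_frames: int) -> int:
    # Each merged unit covers two consecutive frames, so
    # 9 input frames become ceil(9 / 2) = 5 processed frames.
    return math.ceil(n_frames / 2)

def processed_image_count(n_images: int) -> int:
    # Each image is duplicated before the 2-in-1 merge, so pairing a
    # copy with its original leaves the processed count unchanged:
    # 9 input images -> 9 processed images.
    return (n_images * 2) // 2

assert merged_video_frame_count(9) == 5
assert processed_image_count(9) == 9
```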

**tickets**:  CVS-173219

---------

Signed-off-by: xipingya <xiping.yan@intel.com>
Signed-off-by: xiping.yan <xiping.yan@intel.com>
Co-authored-by: Wanglei Shen <wanglei.shen@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
Co-authored-by: Artur Paniukov <chgk1101@gmail.com>
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Wovchena requested a review from Copilot on November 10, 2025 07:30
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds support for video input in the LLM benchmark tool, enabling benchmarking of visual language models with video data alongside existing image support.

Key Changes:

  • Added video processing capability through a new make_video_tensor function that reads video files and converts them to frame tensors
  • Extended benchmark functions to accept and process video inputs in addition to images
  • Updated JSON parsing to handle video file paths in prompt configurations
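For illustration, a prompt file entry using the new video key might look like the following; the keys prompt, media, and video come from this PR, while the surrounding layout and file paths are assumptions:

```json
[
  {
    "prompt": "Describe what happens in this video.",
    "video": "./videos/sample.mp4"
  },
  {
    "prompt": "What is shown in this picture?",
    "media": "./images/sample.png"
  }
]
```

Note that, per the validation added in this PR, media and video must not appear together in a single entry.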

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Summary per file:

| File | Description |
| --- | --- |
| tools/llm_bench/task/visual_language_generation.py | Added video input handling, parameter passing for frame control, and validation to prevent simultaneous media/video specification |
| tools/llm_bench/requirements.txt | Added opencv-python dependency for video processing |
| tools/llm_bench/llm_bench_utils/prompt_utils.py | Implemented video frame extraction and decimation logic with new make_video_tensor function |
| tools/llm_bench/llm_bench_utils/parse_json_data.py | Refactored JSON parsing with shared validation logic and added video key support |
| tools/llm_bench/benchmark.py | Added command-line argument for video frame control |
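As a usage illustration, the new -vf/--video_frames flag could be combined with a prompt file as below; the model, device, and prompt-file flags are assumptions based on the tool's existing interface, not part of this PR:

```sh
python benchmark.py -m ./models/qwen2-vl -d CPU \
    --prompt_file video_prompts.json -vf 16
```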



@print_video_frames_number_and_convert_to_tensor
def make_video_tensor(video_path, decym_frames=None):
supported_files = set([".mp4"])
Copilot AI commented Nov 10, 2025
[nitpick] The set literal syntax {'.mp4'} is more idiomatic than set(['.mp4']) for a single element set.

Suggested change
supported_files = set([".mp4"])
supported_files = {".mp4"}

supported_files = set([".mp4"])

assert os.path.exists(video_path), f"no input video file: {video_path}"
assert video_path.suffix.lower() in supported_files, "no supported video file"
Copilot AI commented Nov 10, 2025
Error messages should be more descriptive and grammatically correct. Consider: 'Input video file not found: {video_path}' and 'Unsupported video file format. Supported formats: .mp4'

Suggested change
assert video_path.suffix.lower() in supported_files, "no supported video file"
assert video_path.suffix.lower() in supported_files, (
f"Unsupported video file format for input: {video_path}. Supported formats: {', '.join(supported_files)}"
)

Comment on lines 59 to 65
new_frame = np.zeros(shape, dtype)

width, height = pil_image.size
log.info(f"Video size: {width}x{height}")
for x in range(0, width):
for y in range(0, height):
new_frame[y, x] = frame_rgb[y, x]
Copilot AI commented Nov 10, 2025
Lines 55-65 create a new_frame array and manually copy pixels but then discard it by appending np.array(pil_image) instead. This entire block (lines 55-65) serves no purpose and should be removed, keeping only line 66.

Suggested change
new_frame = np.zeros(shape, dtype)
width, height = pil_image.size
log.info(f"Video size: {width}x{height}")
for x in range(0, width):
for y in range(0, height):
new_frame[y, x] = frame_rgb[y, x]
width, height = pil_image.size
log.info(f"Video size: {width}x{height}")

Contributor commented:

agree, new_frame is not used

Contributor (author) replied:

ok, done

raise RuntimeError('== key word "prompt" does not exist ==')
prompt_data = create_base_prompt(json_data)
if ("media" in json_data) and ("video" in json_data):
raise ValueError("only one key is avaialble from media & video")
Copilot AI commented Nov 10, 2025
Corrected spelling of 'avaialble' to 'available'.

Suggested change
raise ValueError("only one key is avaialble from media & video")
raise ValueError("only one key is available from media & video")

vlm_file['media'] = model_utils.resolve_media_file_path(vlm_file.get("media"), args['prompt_file'][0])
if args['prompt_file'] is not None and len(args['prompt_file']) > 0 and 'media' in vlm_file:
if 'video' in vlm_file:
raise ValueError('media and video cannot be specify in a single prompt file')
Copilot AI commented Nov 10, 2025
Corrected grammar: 'specify' should be 'specified'.

Suggested change
raise ValueError('media and video cannot be specify in a single prompt file')
raise ValueError('media and video cannot be specified in a single prompt file')

parser.add_argument("--vocoder_path", type=str, default=None,
help="Path to vocoder for text to speech scenarios")
parser.add_argument("-vf", "--video_frames", type=int, default=None,
help="controler of video frames to process")
Copilot AI commented Nov 10, 2025
Corrected spelling of 'controler' to 'controller'.

Suggested change
help="controler of video frames to process")
help="controller of video frames to process")

iter_data_list, pretrain_time, iter_timestamp = CASE_TO_BENCH[model_args['use_case'].task](
model_path, framework, args.device, model_args, args.num_iters, memory_data_collector)
model_path, framework, args.device, model_args, args.num_iters,
memory_data_collector, args.video_frames)
help="Path to .bin or .pt file with speaker embeddings for text to speech scenarios")
parser.add_argument("--vocoder_path", type=str, default=None,
help="Path to vocoder for text to speech scenarios")
parser.add_argument("-vf", "--video_frames", type=int, default=None,
Contributor commented:
we also need a "--video" argument so that llm_bench can be run with video from the command line

new_frame[y, x] = frame_rgb[y, x]
output_frames.append(np.array(pil_image))

if decym_frames is None:
Contributor commented:
Suggested change
if decym_frames is None:
if decym_frames is None or int(decym_frames) == 0:
return output_frames

Contributor commented:
and I suggest checking decym_frames at the stage of collecting and analyzing the input args, so that decym_frames can't be negative or zero (see the sketch below)
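A minimal sketch of that validation at argument-parsing time; positive_int is a hypothetical helper name, and the surrounding parser setup is assumed from the snippet quoted above:

```python
import argparse

def positive_int(value):
    # Reject zero and negative frame counts before any video is decoded.
    ivalue = int(value)
    if ivalue <= 0:
        raise argparse.ArgumentTypeError(
            f"--video_frames must be a positive integer, got {value!r}")
    return ivalue

parser = argparse.ArgumentParser()
parser.add_argument("-vf", "--video_frames", type=positive_int, default=None,
                    help="number of video frames to process")
```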

import llm_bench_utils.output_file
import llm_bench_utils.gen_output_data as gen_output_data
import llm_bench_utils.parse_json_data as parse_json_data
import llm_bench_utils.prompt_utils as pu
Contributor commented:
Suggested change
import llm_bench_utils.prompt_utils as pu
import llm_bench_utils.prompt_utils as prompt_utils

cap = cv2.VideoCapture(video_path)

output_frames = []
while True:
Contributor commented:
do we need to read all frames if decym_frames is set?
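A sketch of that early exit, assuming decym_frames means "take the first N frames"; read_frames is an illustrative name, and cv2 is the opencv-python dependency added in this PR:

```python
import cv2

def read_frames(video_path, max_frames=None):
    cap = cv2.VideoCapture(str(video_path))
    frames = []
    # Stop reading as soon as enough frames are collected.
    while max_frames is None or len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:  # end of stream
            break
        # OpenCV decodes to BGR; convert to the RGB layout models expect.
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames
```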


shape = np.array(pil_image).shape
dtype = np.array(pil_image).dtype
log.info(f"Video shape: {shape}")
Contributor commented:
please log once; otherwise, if there are 1000 frames in the video, it will be printed 1000 times

# or decimation factor if negative

decym_frames = int(decym_frames)
if decym_frames > 0:
Contributor commented:
"controler of video frames to process" is not very clear. I expected that it's a number of frames and we can just take output_frames[:decym_factor:].
Do we really need that subsampling? If yes, let's clarify in the help text that it means every n-th frame; the two readings are sketched below.
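The two readings of the flag differ as below; the variable names are illustrative:

```python
frames = list(range(10))  # stand-in for 10 decoded video frames

# Reading 1: the flag is a target frame count -> keep the first N frames.
first_n = frames[:4]     # [0, 1, 2, 3]

# Reading 2: the flag is a decimation step -> keep every n-th frame.
every_nth = frames[::4]  # [0, 4, 8]
```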

if videos:
kwargs["videos"] = videos
prefix = '[warm-up]' if num == 0 else '[{}]'.format(num)
log.info(f'{prefix}[P{prompt_index}] Input image nums:{len(images)}')
Contributor commented:
move it under the `if images` and `if videos` branches, and name the specific type (image/video) in the log message

if input_data.get("video", None):
entry = Path(input_data["video"])
video_tensor = pu.make_video_tensor(entry, required_frames)
videos.append(video_tensor)
Contributor commented:
np.array is not supported by GenAI

Suggested change
videos.append(video_tensor)
videos.append(ov.Tensor(video_tensor))

Comment on lines +50 to +60
if videos:
input_data["videos"] = videos
Contributor commented:
remove it

for bs_index, in_text in enumerate(prompts):
llm_bench_utils.output_file.output_input_text(in_text, args, model_precision, prompt_index, bs_index, proc_id)
tok_encode_start = time.perf_counter()
input_data = model.preprocess_inputs(text=prompts[0], image=images[0] if images else None, **processor)
Contributor commented:
Suggested change
input_data = model.preprocess_inputs(text=prompts[0], image=images[0] if images else None, **processor)
input_data = model.preprocess_inputs(text=prompts[0], image=images[0] if images else None, video=videos[0] if videos else None, **processor)

peterchen-intel and others added 5 commits November 11, 2025 00:23
…2985)

Port openvinotoolkit#2514
1. Enable video preprocessing for the Qwen VL model. Add
ov::Property<std::vector<ov::Tensor>> videos{"videos"};
2. Support mixed image and video input.
3. The main updates for the Qwen-VL series models:
-- For video: with 2-in-1 merging, if 9 frames are input, only 5 merged frames are actually processed.
-- For images: with 2-in-1 merging, each image is simply doubled, so 9 input images still yield 9 processed images.
-- Introduce an "`If`" node to merge the video and image preprocessing into one OV subgraph.

**tickets**:  CVS-173219

…nvinotoolkit#2979)

## Description
This PR updates the preprocessor condition for the deprecated
`ALTERNATE` enum value in `KVCrushAnchorPointMode` to avoid conflicts
with Windows headers and exclude it on all Windows platforms.

CVS-175618


## Checklist:
- [ ] Tests have been updated or added to cover the new code - N/A
- [x] This patch fully addresses the ticket
- [ ] I have made corresponding changes to the documentation - N/A
This reverts commit 96d778b.
…rs (openvinotoolkit#2997)

## Description
This is a port of openvinotoolkit#2979 to the 2025.4 release branch.

This PR updates the preprocessor condition for the deprecated
`ALTERNATE` enum value in `KVCrushAnchorPointMode` to avoid conflicts
with Windows headers and exclude it on all Windows platforms.

CVS-175618

## Checklist:
- [ ] Tests have been updated or added to cover the new code - N/A
- [x] This patch fully addresses the ticket
- [ ] I have made corresponding changes to the documentation - N/A
@krzyczar force-pushed the kcz/support-for-video-in-benchmark branch from a13f999 to 7b05528 on November 12, 2025 10:24
@krzyczar force-pushed the kcz/support-for-video-in-benchmark branch 5 times, most recently from f72acc5 to ff2eca8, on November 12, 2025 15:27
@krzyczar force-pushed the kcz/support-for-video-in-benchmark branch from ff2eca8 to 5dcca63 on November 12, 2025 16:34
@krzyczar changed the base branch from master to releases/2025/4 on November 12, 2025 16:39
@krzyczar changed the base branch from releases/2025/4 to releases/2024/5 on November 12, 2025 16:41
@krzyczar changed the base branch from releases/2024/5 to releases/2025/4 on November 12, 2025 16:42
@krzyczar marked this pull request as draft on November 12, 2025 16:48
@krzyczar changed the base branch from releases/2025/4 to master on November 12, 2025 19:29