Update `Llava15ChatHandler` to accept `use_gpu`, `image_min_tokens`, and `image_max_tokens`.
The `image_min_tokens` parameter can now be passed to `Qwen3VLChatHandler` to support bbox grounding tasks.
Add validation to ensure `image_max_tokens` is not less than `image_min_tokens`.
+        if (self.image_max_tokens < self.image_min_tokens) and self.image_max_tokens > 0:
+            raise ValueError(f"image_max_tokens {self.image_max_tokens} is less than image_min_tokens {self.image_min_tokens}")

         # Initialize mtmd context
         self.mtmd_ctx = self._mtmd_cpp.mtmd_init_from_file(
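The validation above can be sketched as a standalone function; `validate_image_token_bounds` is a hypothetical name for illustration, not part of the library. The key behavior is that a non-positive `image_max_tokens` means "unset", so the check only fires when a positive maximum is actually configured.

```python
def validate_image_token_bounds(image_min_tokens: int, image_max_tokens: int) -> None:
    # Only enforce the ordering when a positive max is actually configured;
    # a value <= 0 means "use the model's defaults", so no check applies.
    if (image_max_tokens < image_min_tokens) and image_max_tokens > 0:
        raise ValueError(
            f"image_max_tokens {image_max_tokens} is less than "
            f"image_min_tokens {image_min_tokens}"
        )
```

With this shape, `validate_image_token_bounds(1024, 2048)` and `validate_image_token_bounds(1024, -1)` pass silently, while `validate_image_token_bounds(1024, 512)` raises.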
@@ -3791,6 +3800,7 @@ def __init__(
         self,
         force_reasoning: bool = False,
         add_vision_id: bool = True,
+        image_min_tokens: int = -1,
         **kwargs,
     ):
         """
@@ -3801,11 +3811,15 @@ def __init__(
         - add_vision_id (bool):
             - True (default): Count all the images. Recommended for multi-image.
             - False: Doesn't count the images. Can save tokens with single-image.
+        - image_min_tokens (int):
+            It only takes effect when the value is greater than zero. The default is -1 (i.e., use the default parameters from the model's preprocessor_config.json).
+            Note: Qwen-VL models require at least 1024 image tokens to function correctly on bbox grounding tasks.
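The "-1 means use the model's default" convention described in the docstring can be sketched as follows; `resolve_image_min_tokens` and its `preprocessor_default` argument are illustrative assumptions, not identifiers from the library.

```python
def resolve_image_min_tokens(image_min_tokens: int, preprocessor_default: int) -> int:
    # A positive value overrides the model's preprocessor_config.json default;
    # anything <= 0 (including the -1 sentinel) falls back to that default.
    if image_min_tokens > 0:
        return image_min_tokens
    return preprocessor_default
```

Under this convention, a user targeting bbox grounding with a Qwen-VL model would construct the handler with `image_min_tokens=1024` (or higher), and leaving the default `-1` keeps the preprocessor's own minimum.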