-
Notifications
You must be signed in to change notification settings - Fork 296
[RAG] Test qwen3 reranker #2933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[RAG] Test qwen3 reranker #2933
Conversation
|
Windows tests pass: Linux and ManyLinux fail: Cannot reproduce locally with the same packages versions. |
…nto as/rag_qwen3_reranker_bad_sim
|
Not reproducible any more |
|
The issue seems to be with the padding. On win padding is left, on lin - right. Padding needs to be fixed as left. |
…nto as/rag_qwen3_reranker_bad_sim
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds support for configurable padding side (left/right) in the Qwen3 reranker and includes extensive debugging output for testing. The workflow files are temporarily modified to focus test execution on the specific reranker tests.
- Adds
padding_sideconfiguration parameter toTextRerankPipeline::Config - Adds debug logging throughout the reranking pipeline
- Modifies CI workflows to run only Qwen3 reranker tests with verbose output
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/python_tests/test_rag.py | Adds test cases for left/right padding and debug print statements |
| src/python/py_rag.cpp | Exposes padding_side config field to Python bindings |
| src/python/openvino_genai/py_openvino_genai.pyi | Updates type hints to include padding_side field |
| src/cpp/src/rag/text_rerank_pipeline.cpp | Implements padding_side handling and adds debug output |
| src/cpp/src/debug_utils.hpp | Enhances print_tensor utility with full-print option |
| src/cpp/include/openvino/genai/rag/text_rerank_pipeline.hpp | Adds padding_side field to Config struct |
| .github/workflows/windows.yml | Comments out most tests, focuses on Qwen3 reranker |
| .github/workflows/manylinux_2_28.yml | Comments out most tests, focuses on Qwen3 reranker |
| .github/workflows/linux.yml | Comments out most tests, focuses on Qwen3 reranker |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| truncation=True, | ||
| return_tensors="pt", | ||
| ) | ||
| print(inputs["attention_mask"]) |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug print statement should be removed before merging to production. Consider using proper logging with configurable levels instead.
| print(inputs["attention_mask"]) |
| for opt, genai in zip(opt_result, genai_result): | ||
| print(f"Optimum: {opt}") | ||
| print(f"GenAI: {genai}") | ||
| print("-----") |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug print statements should be removed before merging to production. Consider using proper logging with configurable levels instead.
| for opt, genai in zip(opt_result, genai_result): | ||
| print(f"Optimum: {opt}") | ||
| print(f"GenAI: {genai}") | ||
| print("-----") |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug print statements should be removed before merging to production. Consider using proper logging with configurable levels instead.
| "top_n=4, padding_side=left", | ||
| "top_n=4, padding_side=right", |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent spacing in test ID. Line 548 uses no space after comma ("top_n=4,padding_side=left"), but this line uses a space. Test IDs should follow consistent formatting.
| "top_n=4, padding_side=left", | |
| "top_n=4, padding_side=right", | |
| "top_n=4,padding_side=left", | |
| "top_n=4,padding_side=right", |
| std::cout << "Start reranking..." << std::endl; | ||
| std::cout << "Query: " << query << std::endl; | ||
| for (size_t i = 0; i < texts.size(); i++) { | ||
| std::cout << "Text[" << i << "]: " << texts[i] << std::endl; | ||
| } |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug output statements should be removed before merging to production. Consider using proper logging mechanisms instead of direct stdout writes.
|
|
||
| print_tensor("scores", scores_tensor); |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug tensor printing should be removed before merging to production. Consider using conditional compilation or logging levels for debug output.
| print_tensor("scores", scores_tensor); | |
| #ifdef DEBUG | |
| print_tensor("scores", scores_tensor); | |
| #endif |
| std::string padding_side_val; | ||
| ov::genai::utils::read_anymap_param(m_tokenization_params, padding_side.name(), padding_side_val); | ||
|
|
||
| std::cout << "Requested padding_side: " << (padding_side_val.empty() ? "not set" : padding_side_val) << std::endl; |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Debug output statement should be removed before merging to production. Consider using proper logging mechanisms instead of direct stdout writes.
| std::cout << "Requested padding_side: " << (padding_side_val.empty() ? "not set" : padding_side_val) << std::endl; | |
| DEBUG_LOG("Requested padding_side: {}", (padding_side_val.empty() ? "not set" : padding_side_val)); |
| # timeout: 60 | ||
| - name: 'Rag tests' | ||
| cmd: 'python -m pytest -s -v tests/python_tests/test_rag.py' | ||
| cmd: 'pip list; python -m pytest -s -v tests/python_tests/test_rag.py -k test_qwen3_rerank_documents' |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Workflow configuration has been modified to only run specific tests. This change should be reverted before merging to ensure full test coverage in CI.
| # timeout: 60 | ||
| - name: 'Rag tests' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_rag.py' | ||
| cmd: 'python -m pytest -s -v ./tests/python_tests/test_rag.py -k test_qwen3_rerank_documents' |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Workflow configuration has been modified to only run specific tests. This change should be reverted before merging to ensure full test coverage in CI.
| # timeout: 60 | ||
| - name: 'Rag tests' | ||
| cmd: 'python -m pytest -v ./tests/python_tests/test_rag.py' | ||
| cmd: 'pip list && python -m pytest -s -v ./tests/python_tests/test_rag.py -k test_qwen3_rerank_documents' |
Copilot
AI
Nov 6, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Workflow configuration has been modified to only run specific tests. This change should be reverted before merging to ensure full test coverage in CI.
|
@apaniukov On linux hf optimum default padding is left, for genai tokenizer default padding is right. |
Description
Ticket:
Fixes #(issue)
Checklist: