[wwb] Update reranker/embedder tests #2983

sbalandi · 2025-11-07T20:12:37Z

Description

update tests for embedding/reranking pipeline
moved from consin_simularity from torch to sklearn to avoid problems with 1 (item() can cast it to 0.99 or add different umbers after dot)

CVS-###

Fixes #(issue)

Checklist:

Tests have been updated or added to cover the new code.
This patch fully addresses the ticket.
I have made corresponding changes to the documentation.

sbalandi · 2025-11-10T18:14:12Z

@as-suvorov could you please take a look ?

as-suvorov · 2025-11-11T16:26:18Z

tools/who_what_benchmark/tests/test_cli_embeddings.py

+    similarity = get_similarity(outputs)
+    assert similarity >= SIMILARITY_THRESHOLD
+
+    remove_artifacts(outputs_path.as_posix())


Why .as_posix() needed here?

tools/who_what_benchmark/tests/test_cli_embeddings.py

tools/who_what_benchmark/tests/test_cli_reranking.py

as-suvorov · 2025-11-12T17:40:07Z

tools/who_what_benchmark/tests/test_cli_reranking.py

-        subprocess.run(["optimum-cli", "export", "openvino", "--model", model_id, MODEL_PATH, "--task", task, "--trust-remote-code"],
-                       capture_output=True,
-                       text=True)
+def remove_artifacts(artifacts_path: Path, file_type="outputs"):


Use file_type or remove from arguments

Suggested change

def remove_artifacts(artifacts_path: Path, file_type="outputs"):

def remove_artifacts(artifacts_path: Path):

Copilot

Pull Request Overview

This PR updates the embedding and reranking test pipelines, primarily switching from PyTorch's cosine similarity to scikit-learn's implementation to avoid floating-point precision issues that can occur when converting tensors to scalars.

Key changes:

Replaced PyTorch's F.cosine_similarity with sklearn's cosine_similarity for more consistent numerical results
Restructured test files to add similarity threshold checks and improved artifact cleanup
Updated logging format for similarity metrics to use consistent precision formatting

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
whowhatbench/wwb.py	Updated similarity metric logging to use formatted precision (.5) and renamed score list output field
whowhatbench/whowhat_metrics.py	Switched from PyTorch to sklearn for cosine similarity calculation and added `.item()` call for score extraction
tests/test_cli_reranking.py	Refactored test structure with new helper functions, added similarity threshold assertions, and improved artifact cleanup
tests/test_cli_embeddings.py	Added similarity threshold checks, artifact cleanup helper, and additional output validation assertions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-11-12T17:42:00Z

tools/who_what_benchmark/whowhatbench/whowhat_metrics.py

                if document_idx in prediction_scores:
                    scores_diff = abs(gold_score - prediction_scores[document_idx])
-                per_query_text.append(scores_diff)
+                per_query_text.append(scores_diff.item())


Calling .item() on scores_diff assumes it's a tensor, but scores_diff is set to self.MISSING_DOCUMENT_PENALTY when the document is missing, which is likely a numeric constant (int/float). This will raise an AttributeError. Only call .item() when scores_diff comes from prediction_scores[document_idx].

Suggested change

per_query_text.append(scores_diff.item())

if isinstance(scores_diff, torch.Tensor):

per_query_text.append(scores_diff.item())

else:

per_query_text.append(scores_diff)

Copilot · 2025-11-12T17:42:01Z

tools/who_what_benchmark/whowhatbench/wwb.py

-        logger.info("## Similarity:\n%s\n", e["similarity"])
-        logger.info("## Top_n scores:\n%s\n", e["per_text_score_list"])
+        logger.info(f"## Similarity:\n{e['similarity']:.5}\n")
+        logger.info("## Difference in scores pre texts:\n%s\n", e['per_text_scores_diff'])


Typo in log message: 'pre texts' should be 'per texts'.

Suggested change

logger.info("## Difference in scores pre texts:\n%s\n", e['per_text_scores_diff'])

logger.info("## Difference in scores per texts:\n%s\n", e['per_text_scores_diff'])

github-actions bot added the category: WWB PR changes WWB label Nov 7, 2025

sbalandi force-pushed the qwen-cls-wwb branch from a1c00ca to b148c37 Compare November 7, 2025 20:42

[wwb] Update reranker/embedder tests

8331259

sbalandi force-pushed the qwen-cls-wwb branch from b148c37 to 8331259 Compare November 10, 2025 11:42

sbalandi marked this pull request as ready for review November 10, 2025 11:42

sbalandi requested a review from as-suvorov November 10, 2025 11:42

as-suvorov reviewed Nov 11, 2025

View reviewed changes

fixes

6b18943

sbalandi force-pushed the qwen-cls-wwb branch from c1008b5 to 6b18943 Compare November 12, 2025 14:59

as-suvorov reviewed Nov 12, 2025

View reviewed changes

as-suvorov requested a review from Copilot November 12, 2025 17:41

Copilot AI reviewed Nov 12, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[wwb] Update reranker/embedder tests #2983

[wwb] Update reranker/embedder tests #2983

sbalandi commented Nov 7, 2025

Uh oh!

sbalandi commented Nov 10, 2025

Uh oh!

as-suvorov Nov 11, 2025

Uh oh!

sbalandi Nov 12, 2025

Uh oh!

Uh oh!

Uh oh!

as-suvorov Nov 12, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Nov 12, 2025

Uh oh!

Copilot AI Nov 12, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	def remove_artifacts(artifacts_path: Path, file_type="outputs"):
	def remove_artifacts(artifacts_path: Path):

	logger.info("## Difference in scores pre texts:\n%s\n", e['per_text_scores_diff'])
	logger.info("## Difference in scores per texts:\n%s\n", e['per_text_scores_diff'])

[wwb] Update reranker/embedder tests #2983

Are you sure you want to change the base?

[wwb] Update reranker/embedder tests #2983

Conversation

sbalandi commented Nov 7, 2025

Description

Checklist:

Uh oh!

sbalandi commented Nov 10, 2025

Uh oh!

as-suvorov Nov 11, 2025

Choose a reason for hiding this comment

Uh oh!

sbalandi Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

as-suvorov Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants