
Commit 71cd6ae

[LM Eval] Update / fix testing (#1992)
SUMMARY:
- Bump LM Eval to support chartqa. #1953 landed without updating lm-eval, which is required to support chartqa for VL models.
- Update recovery metrics for two cases to use different thresholds instead of 95%.
1 parent 3f6cfd6 commit 71cd6ae

3 files changed, +7 -1 lines changed


setup.py

Lines changed: 1 addition & 1 deletion
@@ -151,7 +151,7 @@ def localversion_func(version: ScmVersion) -> str:
     "pytest>=6.0.0",
     "pytest-mock>=3.6.0",
     "pytest-rerunfailures>=13.0",
-    "lm_eval==0.4.5",
+    "lm_eval==0.4.9",
     # test dependencies
     "beautifulsoup4~=4.12.3",
     "cmarkgfm>=2024.1.14",

tests/lmeval/configs/w4a16_actorder_weight.yaml

Lines changed: 3 additions & 0 deletions
@@ -5,6 +5,9 @@ recipe: tests/e2e/vLLM/recipes/actorder/recipe_w4a16_actorder_weight.yaml
 dataset_id: HuggingFaceH4/ultrachat_200k
 dataset_split: train_sft
 lmeval:
+  recovery_threshold:
+    exact_match,strict-match: 0.94
+    exact_match,flexible-extract: 0.94
   metrics:
     exact_match,flexible-extract: 0.72
     exact_match,strict-match: 0.72

tests/lmeval/configs/w4a16_awq_sym.yaml

Lines changed: 3 additions & 0 deletions
@@ -5,6 +5,9 @@ recipe: tests/e2e/vLLM/recipes/WNA16/recipe_w4a16_awq_sym.yaml
 dataset_id: HuggingFaceH4/ultrachat_200k
 dataset_split: train_sft
 lmeval:
+  recovery_threshold:
+    exact_match,strict-match: 0.92
+    exact_match,flexible-extract: 0.93
   metrics:
     exact_match,flexible-extract: 0.70
     exact_match,strict-match: 0.70
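
Note: the sketch below shows one way a per-metric recovery_threshold mapping like the ones added above could be applied, where recovery is the compressed model's score divided by the base model's score. It is an illustration only, not the test harness's actual code. The function name check_recovery, the fallback to a single 0.95 value (taken from the "95%" in the commit summary), and the base/compressed scores are all assumptions made for the example.

```python
from typing import Dict, Union

# Assumed fallback, based on the "95%" mentioned in the commit summary.
DEFAULT_RECOVERY_THRESHOLD = 0.95


def check_recovery(
    base_metrics: Dict[str, float],
    compressed_metrics: Dict[str, float],
    recovery_threshold: Union[float, Dict[str, float]] = DEFAULT_RECOVERY_THRESHOLD,
) -> Dict[str, float]:
    """Return {metric: recovery} for every metric that falls below its threshold.

    Hypothetical helper: recovery is compressed / base, and the threshold may be
    a single float or a per-metric mapping like the YAML key above.
    """
    failures = {}
    for name, base_value in base_metrics.items():
        if isinstance(recovery_threshold, dict):
            # Metrics missing from the mapping fall back to the assumed default.
            threshold = recovery_threshold.get(name, DEFAULT_RECOVERY_THRESHOLD)
        else:
            threshold = recovery_threshold
        recovery = compressed_metrics[name] / base_value
        if recovery < threshold:
            failures[name] = recovery
    return failures


# Made-up scores for illustration; they are not results from this commit.
base = {"exact_match,strict-match": 0.76, "exact_match,flexible-extract": 0.76}
compressed = {"exact_match,strict-match": 0.72, "exact_match,flexible-extract": 0.72}

# Per-metric thresholds shaped like the w4a16_actorder_weight.yaml addition.
thresholds = {"exact_match,strict-match": 0.94, "exact_match,flexible-extract": 0.94}

print(check_recovery(base, compressed))              # single 0.95 threshold
print(check_recovery(base, compressed, thresholds))  # per-metric thresholds
```

With these made-up scores the recovery is roughly 0.947 for both metrics, so the run fails a flat 0.95 threshold but passes once the per-metric value is 0.94.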
