Commit 51ff37d
authored
[tests] Update lm_eval VL tests to qwen 3 (#1953)
SUMMARY:
Upgrade the lm_eval vision languge tests from Qwen 2.5 to Qwen 3. After
updating to include `apply_chat_template`, the scores closely align with
what was achieved with Qwen 2.5
- [x] switch to `neuralmagic/calibration` dataset, based on suggestion
[here](#1941 (comment)),
to avoid tracing issues related to VL dataset.
- [x] switch to `chartqa` task, to increase number of samples to 500 and
reduce variance in accuracy.
- [x] pruned unused datasets (slimorca and llm_compression_calibration)
TEST PLAN:
The 3 lm_eval VL tests were run, and the accuracies were updated
- vl_fp8_dynamic_per_token.yaml runs in ~29m
- vl_int8_w8a8_dynamic_per_token.yaml runs in ~37m
- vl_w4a16_actorder_weight.yaml runs in ~34m
---------
Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>1 parent 1aa196f commit 51ff37d
File tree
6 files changed
+85
-61
lines changed- tests
- e2e
- lmeval
- configs
6 files changed
+85
-61
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
62 | 62 | | |
63 | 63 | | |
64 | 64 | | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
65 | 80 | | |
66 | 81 | | |
67 | 82 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | | - | |
3 | | - | |
| 2 | + | |
| 3 | + | |
4 | 4 | | |
5 | 5 | | |
6 | 6 | | |
7 | 7 | | |
8 | 8 | | |
9 | 9 | | |
10 | | - | |
11 | 10 | | |
12 | | - | |
| 11 | + | |
| 12 | + | |
13 | 13 | | |
14 | | - | |
15 | | - | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
16 | 19 | | |
17 | | - | |
18 | | - | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
Lines changed: 15 additions & 10 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | | - | |
3 | | - | |
| 2 | + | |
| 3 | + | |
4 | 4 | | |
5 | 5 | | |
6 | | - | |
7 | | - | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
8 | 9 | | |
9 | 10 | | |
10 | 11 | | |
11 | 12 | | |
12 | | - | |
13 | 13 | | |
14 | | - | |
| 14 | + | |
| 15 | + | |
15 | 16 | | |
16 | | - | |
17 | | - | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
18 | 22 | | |
19 | | - | |
20 | | - | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | | - | |
3 | | - | |
| 2 | + | |
| 3 | + | |
4 | 4 | | |
5 | 5 | | |
6 | | - | |
7 | | - | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
8 | 9 | | |
9 | 10 | | |
10 | 11 | | |
11 | 12 | | |
12 | | - | |
13 | 13 | | |
14 | | - | |
| 14 | + | |
| 15 | + | |
15 | 16 | | |
16 | | - | |
17 | | - | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
18 | 22 | | |
19 | | - | |
20 | | - | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
| 28 | + | |
28 | 29 | | |
29 | 30 | | |
30 | 31 | | |
| |||
160 | 161 | | |
161 | 162 | | |
162 | 163 | | |
| 164 | + | |
163 | 165 | | |
164 | 166 | | |
165 | 167 | | |
| |||
190 | 192 | | |
191 | 193 | | |
192 | 194 | | |
| 195 | + | |
193 | 196 | | |
194 | 197 | | |
195 | 198 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
218 | 218 | | |
219 | 219 | | |
220 | 220 | | |
221 | | - | |
222 | | - | |
223 | | - | |
224 | | - | |
225 | | - | |
226 | | - | |
227 | | - | |
228 | | - | |
229 | | - | |
230 | | - | |
231 | | - | |
232 | | - | |
233 | | - | |
234 | | - | |
235 | 221 | | |
236 | 222 | | |
237 | 223 | | |
| |||
246 | 232 | | |
247 | 233 | | |
248 | 234 | | |
249 | | - | |
250 | | - | |
251 | | - | |
252 | | - | |
253 | | - | |
254 | | - | |
255 | | - | |
256 | | - | |
257 | | - | |
258 | | - | |
259 | | - | |
260 | | - | |
261 | | - | |
262 | | - | |
263 | | - | |
264 | | - | |
265 | | - | |
266 | | - | |
267 | | - | |
268 | 235 | | |
269 | 236 | | |
270 | 237 | | |
| |||
285 | 252 | | |
286 | 253 | | |
287 | 254 | | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
288 | 280 | | |
289 | 281 | | |
290 | 282 | | |
| |||
0 commit comments