Commit d9c4ea4
Interleave 8 rows (Q8_0, IQ4_XS) (#178)
* Try interleaving 8 rows for iq4_xs
On Zen4, PP-512 goes up from ~260 t/s to 288 t/s for L3-8B.
TG-128 reaches max. performance at 2 threads and is slightly
higher than 4 interleaved rows (14.48 t/s vs 13.11 t/s @ 2 threads
and 14/28 t/s @ 4 threads).
* Try interleaving 8 iq4_xs rows
It is also faster on AVX2.
This is the NEON implementation. It is tiny bit faster than
4 interleaved rows (~0.5%).
So, this looks like a winner given the Zen4/AVX2 improvement
without associated NEON egression.
* Cleanup
* 8-rows interleaved q8_0 (AVX2)
* 8-rows interleaved q8_0 (Zen4)
* 8-rows interleaved q8_0 (Zen4) - slightly better
PP-512 is now 284 t/s compared to 257 t/s for 4-rows interleaved.
TG-128 reaches peak of 8.16 t/s at just 2 threads compared
to 7.95 t/s @ 4 threads before.
* 8-rows interleaved q8_0 (NEON)
PP-512 is slightly better (138 t/s vs 132.5 t/s), TG-128 is about the
same.
* FA: repack Q8_0 to Q8_0_R8
* Remove special purpose mul_mat_q8_0_r4_q8_1_128 (Zen4)
* FA: repack Q8_0 to Q8_0_R8 (NEON)
Very slightly faster than the general purpose gemm, slightly
slower than the D = 128 special case gemm mul_mat_q8_0_r4_q8_0_128.
Still removing mul_mat_q8_0_r4_q8_0_128 as we simply don't have
enough vector registers to hold 8 interleaved rows, so there is
no point to have the special purpose implementation.
---------
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>1 parent 814d3e0 commit d9c4ea4
File tree
6 files changed
+437
-431
lines changed- ggml/src
- iqk
- src
6 files changed
+437
-431
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
236 | 236 | | |
237 | 237 | | |
238 | 238 | | |
| 239 | + | |
| 240 | + | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
239 | 244 | | |
240 | 245 | | |
241 | 246 | | |
| |||
534 | 539 | | |
535 | 540 | | |
536 | 541 | | |
537 | | - | |
538 | | - | |
539 | | - | |
540 | | - | |
| 542 | + | |
| 543 | + | |
| 544 | + | |
| 545 | + | |
541 | 546 | | |
542 | | - | |
| 547 | + | |
543 | 548 | | |
544 | 549 | | |
545 | 550 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
936 | 936 | | |
937 | 937 | | |
938 | 938 | | |
939 | | - | |
940 | 939 | | |
941 | 940 | | |
942 | 941 | | |
| |||
0 commit comments