Commit 82085eb
Fix preemption handling (#524)
This PR fixes a multitude of bugs in our preemption handling:
- Fixed the output token update of `CachedRequestState`: it ran twice
per iteration, doubling the output tokens, which broke preemption when
the request was re-added to the input batch
- Batch preparation now uses input+output tokens in prefill for
preempted sequences (both non-unified and unified attention)
- Preempted sequences are now correctly recognized as prefills after
they exceed their original prefill length (e.g. if the prompt was 3
tokens and 1024 tokens were generated before preemption, the sequence
was previously treated as decode after the first 3 tokens)
- Removed some incorrect assumptions about prefills (e.g. that they
have no pre-existing output tokens)
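The prefill classification issue described in the list above can be illustrated with a minimal sketch. It is not the code from this PR: `RequestState`, `pending_tokens`, `is_prefill_naive` and `is_prefill` are hypothetical names, and the sketch assumes that a preempted request loses its cached tokens and must recompute prompt and output tokens together.

```python
from dataclasses import dataclass, field


@dataclass
class RequestState:
    """Hypothetical stand-in for a cached request (not the real vLLM class)."""
    prompt_token_ids: list[int]
    output_token_ids: list[int] = field(default_factory=list)
    # Number of tokens whose KV cache entries currently exist (assumption).
    num_computed_tokens: int = 0

    @property
    def all_token_ids(self) -> list[int]:
        # Prompt followed by everything generated so far.
        return self.prompt_token_ids + self.output_token_ids


def pending_tokens(req: RequestState) -> list[int]:
    # Tokens that still have to be run through the model. For a preempted
    # request this spans prompt AND previously generated output tokens.
    return req.all_token_ids[req.num_computed_tokens:]


def is_prefill_naive(req: RequestState) -> bool:
    # Flawed rule: prefill is assumed to end at the original prompt length.
    return req.num_computed_tokens < len(req.prompt_token_ids)


def is_prefill(req: RequestState) -> bool:
    # Sketch of a safer rule: a request is in prefill while more than one
    # token is pending, or while it has not produced any output yet.
    return len(pending_tokens(req)) > 1 or not req.output_token_ids


# Example from the description: 3 prompt tokens, 1024 generated tokens,
# then preemption; the first 3 tokens have been recomputed so far.
req = RequestState(
    prompt_token_ids=[1, 2, 3],
    output_token_ids=list(range(1024)),
    num_computed_tokens=3,
)
print(is_prefill_naive(req), is_prefill(req))
```

Under these assumptions the naive rule reports decode while 1024 tokens are still pending, whereas the second rule keeps the request in prefill until only the newest token remains.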
Scenarios with preemptions now yield proper accuracy. This can be tested
with a very low `gpu_memory_utilization` and a relatively high `max_num_seqs`:
```
PT_HPU_LAZY_MODE=1 VLLM_SKIP_WARMUP=true lm_eval --model vllm --model_args pretrained=/mnt/weka/data/pytorch/llama3.1/Meta-Llama-3.1-8B-Instruct/,enforce_eager=False,dtype=bfloat16,max_num_seqs=128,gpu_memory_utilization=0.05,max_model_len=4096,enable_prefix_caching=True,add_bos_token=false,tensor_parallel_size=1,max_gen_toks=2048 --tasks gsm8k_cot_llama --batch_size auto --trust_remote_code --apply_chat_template --fewshot_as_multiturn --num_fewshot 8
| Tasks |Version| Filter |n-shot| Metric | |Value | |Stderr|
|---------------|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k_cot_llama| 3|flexible-extract| 8|exact_match|↑ |0.8408|± |0.0101|
| | |strict-match | 8|exact_match|↑ |0.8415|± |0.0101|
```
---------
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
1 parent f4aeae8, commit 82085eb
2 files changed (+25, -18) under vllm_gaudi/v1/worker