Commit 1cdf9ff
[Bugfix] fix hang in async scheduling (#4233)
### What this PR does / why we need it?
After #4113, there is no
synchronization between steps. However, in async scheduling with
aclgraph, it is possible that the CPU's record event for the current
iteration completes before the previous iteration's graph execution has
finished.
If cpu is fast enough, device will hang on event_wait in interation i+1
(assume that event_record is executed immediately on update stream of
device):
<img width="1812" height="489" alt="image"
src="https://github.com/user-attachments/assets/373fe655-afe5-4d7d-807e-b0aacf24a543"
/>
after add synchonization, record is launched after graph replay:
<img width="1803" height="466" alt="image"
src="https://github.com/user-attachments/assets/a8a68053-bd7d-49f5-a79c-9a26ef1285cc"
/>
bubble time caused by synchronization is about 85 us on G8600:
<img width="1491" height="804" alt="image"
src="https://github.com/user-attachments/assets/968611ee-f39a-4329-8150-1c4adba25dd1"
/>
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- vLLM version: v0.11.0
- vLLM main:
vllm-project/vllm@2918c1b
---------
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
Co-authored-by: hwhaokun <haokun0405@163.com>1 parent 91b6ba8 commit 1cdf9ff
File tree
2 files changed
+29
-1
lines changed- tests/e2e/singlecard
- vllm_ascend/compilation
2 files changed
+29
-1
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
128 | 128 | | |
129 | 129 | | |
130 | 130 | | |
131 | | - | |
| 131 | + | |
132 | 132 | | |
133 | 133 | | |
134 | 134 | | |
| |||
148 | 148 | | |
149 | 149 | | |
150 | 150 | | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
186 | 186 | | |
187 | 187 | | |
188 | 188 | | |
| 189 | + | |
| 190 | + | |
| 191 | + | |
| 192 | + | |
| 193 | + | |
| 194 | + | |
189 | 195 | | |
190 | 196 | | |
191 | 197 | | |
| |||
0 commit comments