Closed
304 commits
d381eb9
Multi turn benchmark progress bar for synthetic conversation generati…
segevido Nov 11, 2025
2e78150
[CI] Add mergify rules for `nvidia` label (#28417)
mgoin Nov 11, 2025
b30dfa0
[Attention] Refactor CUDA attention backend selection logic (#24794)
MatthewBonanni Nov 11, 2025
7dbe6d8
Fix Fused MoE LoRA Triton kernel bug (#28450)
chaojun-zhang Nov 11, 2025
afffd3c
[Model] Pass `mm_features` directly into `get_mrope_input_positions` …
DarkLight1337 Nov 11, 2025
3380543
Add request timeout override for multi-turn benchmarks (#28386)
segevido Nov 11, 2025
fa19702
[Docs] Fix grammar in CPU installation guide (#28461)
maryamtahhan Nov 11, 2025
a1448b4
[Kernels] Split up fused_moe/layer.py, isolate more modular kernel co…
bnellnm Nov 11, 2025
533b018
[BugFix] Fix Failing Ruff Check (#28469)
jvlunteren Nov 11, 2025
a90ad7d
Add @markmc to CODEOWNERS for Observability (#28457)
markmc Nov 11, 2025
b886068
[BugFix] Fix RuntimeError in PixtralHFAttention on CPU/XPU (#28444)
faaany Nov 11, 2025
3143eb2
[BugFix] Add test_outputs.py to CI pipeline (#28466)
usberkeley Nov 11, 2025
287bbbe
[Doc] Fix typo in serving docs (#28474)
the-codeboy Nov 11, 2025
f9a4087
Remove weight_scale.T special case for SM90 Block FP8 CUTLASS kernel …
mgoin Nov 11, 2025
a7ef3eb
[NIXL] Generalize block-first backend layouts (FlashInfer-like) (#28282)
NickLucche Nov 11, 2025
68c09ef
[Kernel][Perf] fuse QK Norm and RoPE into one cuda kernel for Qwen Mo…
izhuhaoran Nov 11, 2025
05576df
[ROCm][Quantization] extend AMD Quark to support mixed-precision quan…
xuebwang-amd Nov 11, 2025
5a1271d
[Quantization] fix attention quantization of gpt_oss model (#27334)
xuebwang-amd Nov 11, 2025
e553424
[CI/Build] Refactor Attention backend for test_prefix_prefill from xf…
zhewenl Nov 11, 2025
684f254
Prefer FlashAttention MLA as default over FlashMLA (#27363)
MatthewBonanni Nov 11, 2025
6c3c0f8
[Kernel] Optimize rms_norm kernel (#27931)
xyang16 Nov 11, 2025
d5edcb8
[BugFix] Fix Siglip2Attention on XPU (#28448)
faaany Nov 11, 2025
76e4dcf
[Misc] Remove unused attention prefix prefill ops functions (#26971)
lgeiger Nov 11, 2025
4228be7
[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhea…
Jialin Nov 11, 2025
de120bc
[V0 deprecation] Clean up num_prefill_tokens logic for V0 (#28203)
gcanlin Nov 11, 2025
8c32c6e
[Misc] fix typo in DCP comment (#28389)
Livinfly Nov 11, 2025
9d1c474
[LoRA][1/N]Remove LoRA extra vocab (#28382)
jeejeelee Nov 11, 2025
df4d3a4
[TPU] Rename path to tpu platform (#28452)
kyuyeunk Nov 11, 2025
d4902ba
[Misc] Cleanup Executor interface (#28441)
wangxiyuan Nov 11, 2025
28534b9
Add Zurich vLLM Meetup (#28488)
mgoin Nov 11, 2025
e5f599d
[Bugfix] Disable shared expert overlap if Marlin MoE is used (#28410)
mgoin Nov 11, 2025
412e153
[Feature] Allow configuring FlashInfer workspace size (#28269)
maxyanghu Nov 11, 2025
d235395
Use FLASHINFER MLA backend when testing fp8_kv_scale_compile (#28491)
adabeyta Nov 12, 2025
1788aa1
[BugFix] Graceful handling of torch symm mem errors. (#27671)
ilmarkov Nov 12, 2025
48c8793
[Frontend] Change CompilationMode to a proper Enum (#28165)
gmagogsfm Nov 12, 2025
3f770f4
[Performance] Cache loaded custom logitsprocs to avoid overheads (#28…
Isotr0py Nov 12, 2025
e171039
[[V0 deprecation]]Remove VLLM_USE_V1 env (#28204)
wangxiyuan Nov 12, 2025
7f829be
[CPU] Refactor CPU attention backend (#27954)
bigPYJ1151 Nov 12, 2025
9f0247c
`VLLM_USE_TRITON_FLASH_ATTN` V0 variable deprecation (#27611)
AndreasKaratzas Nov 12, 2025
cbb799e
[Model][Qwen3VL] Simplify `get_mrope_input_positions` using numpy (#2…
lgeiger Nov 12, 2025
4ccffe5
[Core] Encoder separation for Encode-Prefill-Decode Disaggregation (#…
fake0fan Nov 12, 2025
b9ce9a3
[BugFix] Add fallback path in `apply_rotary_pos_emb_flashattn` for no…
faaany Nov 12, 2025
f31419e
[Benchmark] Add retry support to fix workload bias in multi-turn benc…
ai-jz Nov 12, 2025
ac0bb2c
[Core] Cache `vllm_is_batch_invariant` (#28304)
lgeiger Nov 12, 2025
91864b7
[CI/Build] Fix crash due to removed VLLM_USE_V1 attribute in EPD (#28…
fake0fan Nov 12, 2025
c748355
[CI] Introduce autorun_on_main feature (#27836)
hl475 Nov 12, 2025
1761dea
[BugFix]: --enable-lora with model granite-4.0-micro crash (#27733)
yyzxw Nov 12, 2025
d3ade61
[Model] fix glm4_moe_mtp load weights with GLM-4.6 checkpoint. (#27597)
wuyaoxuehun Nov 12, 2025
a4730c1
[XPU]Fix crash due to removed VLLM_USE_V1 attribute (#28520)
chaojun-zhang Nov 12, 2025
d143152
[KVConnector] Enable get_block_ids_with_load_errors() in LMCache conn…
ziruiliu Nov 12, 2025
c5f10cc
add cpu option for p/d in nixl_connector (#28356)
ZhengHongming888 Nov 12, 2025
edb59a9
[ROCm] [Bugfix] Fix `fused_qknorm_rope_kernel` rocm compatibility (#2…
tjtanaa Nov 12, 2025
a9d18b5
[Bugfix] Fix gpt_oss packed_modules_mapping (#28536)
jeejeelee Nov 12, 2025
10138c9
[V0 deprecation] Deprecate use_v1 parameter (#28112)
wangxiyuan Nov 12, 2025
54aecd9
Fix pre-commit (and XPU) on `main` (#28556)
hmellor Nov 12, 2025
f76e85c
[Performance][Hopper] Avoid M dim padding to 4x for most cases (due t…
alexm-redhat Nov 12, 2025
bc5bd45
[Refactor] Remove redundant TP gather/split in split_qkv in QwenVL (#…
gcanlin Nov 12, 2025
728a9eb
[Misc] Refactor Attention kv transfer methods into decorator (#27816)
NickLucche Nov 12, 2025
a742134
Remove deprecated fields from `CompilationConfig` (#27593)
hmellor Nov 12, 2025
3044195
[Perf] Refactor cudagraph_support to enable full CUDA graphs for spec…
benchislett Nov 12, 2025
bac9045
Implement ARC KV cache eviction policy for CPU offloader (#27039)
albertoperdomo2 Nov 12, 2025
a1e7fa3
[EPLB][ROCm]: support EPBL for ROCm backend (#27731)
PerryZhang01 Nov 12, 2025
64d57c3
[Model] [Config] Correctly identify granite-4.0-micro as non-hybrid m…
tdoublep Nov 12, 2025
319abd5
Remove dynamic shape
ilmarkov Nov 12, 2025
a39dd7b
[CI] Skip "Multi-Modal Models Test (Extended) 3" test that's broken i…
hmellor Nov 12, 2025
94a9ebc
[KV connector][WIP] KV cache proxy based on LMCache multi-process mod…
ApostaC Nov 12, 2025
58ce8d1
[BugFix] Priority scheduling and spec tokens preemption (#28558)
andylolu2 Nov 12, 2025
478ee51
[Misc]Fix typo in llm_engine.py (#28584)
frank-wei Nov 12, 2025
74a9a9f
[Performance][B200] Fix deepgemm prologue (#27897)
varun-sundar-rabindranath Nov 12, 2025
d8140b9
[ROCM] Fix ROCm warnings, environment flag access, and GEMM kernel na…
vllmellm Nov 12, 2025
3eb0c26
[TPU] Support GCS path in VLLM_TORCH_PROFILER_DIR (#28487)
QiliangCui Nov 12, 2025
10f01d5
[Bugfix] Adjust Marlin CUDA arch selection to 8.0+PTX;9.0+PTX (#28294)
mgoin Nov 12, 2025
4ca5cd5
[Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#1…
HollowMan6 Nov 12, 2025
69d0e90
[MoE][Kernel][Perf] Improve Shared Expert Stream Overlap (#28406)
alexm-redhat Nov 12, 2025
51c599f
Skip models that cannot currently init on Transformers v5 (#28471)
hmellor Nov 12, 2025
52eadce
[Docs] Update meetups.md description (#28583)
mgoin Nov 13, 2025
d75ad04
[ROCm][Bugfix] Revert removing setuptools version restriction (#28592)
gshtras Nov 13, 2025
2dacd57
[platform] Move get_cu_count to utils (#27005)
wangxiyuan Nov 13, 2025
a543e67
[Bugfix] Fix SM100 gpt-oss regression due to faulty attn sink support…
mgoin Nov 13, 2025
8832fff
[BugFix] Fix `mm_encoder_attn_backend` arg type checking (#28599)
njhill Nov 13, 2025
3226283
[Docs] Add some details about what the MoE block needs for the Transf…
hmellor Nov 13, 2025
97d1c99
Rename clashing method names for vLLM model protocol (#27583)
hmellor Nov 13, 2025
a1d3866
[n-gen] DO NOT repeatedly return finished child requests (#28591)
Jialin Nov 13, 2025
7c38ed0
[Frontend] split append tool output (#28333)
qandrew Nov 13, 2025
1a0b157
[Frontend][responsesAPI][1/n] convert responses API tool input to cha…
qandrew Nov 13, 2025
7dca0c9
[BugFix][ROCm] Fix `get_cu_count` missing variable error (#28608)
ganyi1996ppo Nov 13, 2025
dbbe0c7
[XPU] Support Triton path for LoRA operations on XPU (#28511)
faaany Nov 13, 2025
7e082bc
Support DeepEP for Kimi-k2-thinking through enabling gemm selection f…
luccafong Nov 13, 2025
d44fbba
[build][cmake]: Bundle static ACL and torch libgomp for CPU extension…
Radu2k Nov 13, 2025
ca00b1b
[ROCm][BugFix] Remove the usage of `device_info` from aiter (#28383)
ganyi1996ppo Nov 13, 2025
4504e80
[Bugfix] Prevent crash on empty grammar string (#28210)
tjandy98 Nov 13, 2025
c33b87e
Use official xformers-0.0.33 built for PT 2.9 (#28600)
huydhn Nov 13, 2025
4ab34f6
Add NUMA node validation for CPU thread binding (#28555)
usberkeley Nov 13, 2025
fa183e9
[Bugfix] fix kimi-linear crash (#28445)
ZJY0516 Nov 13, 2025
5c9ad13
[Frontend] supports interleaved thinking (#28531)
chaunceyjiang Nov 13, 2025
11ac9dd
Support all interleaved layer types (#28485)
sarckk Nov 13, 2025
d168de0
Make ranges inclusive-inclusive
ilmarkov Nov 13, 2025
e63fd44
Fix: Correctly filter special tokens in benchmark_prefix_caching (#28…
dw2761 Nov 13, 2025
5e97320
[BugFix] Fix type error when assign a trition kernel tensor to a torc…
liuzijing2014 Nov 13, 2025
c428e8d
Fix io processor pooling #28273 (#28484)
baonudesifeizhai Nov 13, 2025
c47b6c8
[XPU] add sym params to IPEXConfig (#28611)
zufangzhu Nov 13, 2025
c9fe6ab
[Bugfix] Fix FPS value type for Qwen2.5-Omni video processing (#28630)
faaany Nov 13, 2025
86d15bf
[Hardware][PowerPC] Fix fp16 compilation error for Power in cpu atten…
Akashcodes732 Nov 13, 2025
8da2f28
[ROCm][BugFix]Fix `get_cu_count` in rocm_aiter_fa.py (#28618)
ganyi1996ppo Nov 13, 2025
a7791ea
[CI/Build] Install uv for AMD MI300: Language Models Tests (Hybrid) %…
amdfaa Nov 13, 2025
07a606a
[CI Failure] Fix backend selection for encoder-only models (#28534)
hl475 Nov 13, 2025
3035d1a
[BugFix] DeepSeek-OCR: apply NoRepeatNGramLogitsProcessor to greedy p…
YuanpingSong Nov 13, 2025
b230286
Fix `get_num_experts` when config sets it explicitly to `None` (#28652)
hmellor Nov 13, 2025
d338775
[Misc] Turn off encoder torch compile by default (#28634)
ywang96 Nov 13, 2025
06c4873
Rewrite C++ meta funcs to Python (#28595)
janeyx99 Nov 13, 2025
327c0a9
[BugFix] Ensure `EngineArgs.create_engine_config` is idempotent (#28515)
njhill Nov 13, 2025
fdfd507
[TPU] patch TPU wheel build script to resolve metadata issue (#27279)
jcyang43 Nov 13, 2025
fe1cd77
[Performance][B200] silu_mul_quant: pack scales in int32 (#28358)
varun-sundar-rabindranath Nov 13, 2025
119c492
[Bugfix] Fix validate model input for decoder models (#27099)
yannicks1 Nov 13, 2025
f9f3b59
[Attention][Bugfix] Fix FA sink support (#28660)
MatthewBonanni Nov 13, 2025
5d6ce2b
[Perf] Support stream interval for reducing host overhead (#27869)
elvischenv Nov 13, 2025
968060c
[bugfix] correct local_chunk_len for DCP in reorg_kvcache with long c…
pisceskkk Nov 13, 2025
262d263
[Bugfix] Eliminate tuple inputs to submodules in graph partitioning (…
gmagogsfm Nov 13, 2025
faed7bf
[Bugfix] [CPU] bump torch to 2.9.0 for Darwin to fix segmentation fau…
kebe7jun Nov 13, 2025
1b622de
[Misc] Update CODEOWNERS for simon-mo and comaniac (#28675)
simon-mo Nov 13, 2025
e64011f
[CI] Bug: Fix ci entrypoint pooling (#28684)
yewentao256 Nov 13, 2025
6e25b1c
[KV Connector] Test async mode in scheduler tests (#28550)
markmc Nov 13, 2025
f2b8e1c
Mirrored test group definitions for AMD (2025-11-11) (#28573)
Alexei-V-Ivanov-AMD Nov 14, 2025
4d5943b
[quantization][config] enable override existing quant_config (#28510)
ILikeIneine Nov 14, 2025
2aa75c7
[ROCm] Bump up the version of amd-smi to 6.4.3 (#28680)
SageMoore Nov 14, 2025
622e610
[CPU][Bugfix] Fix Apple Silicon M1 compilation failure (#28681)
mgoin Nov 14, 2025
b39a502
[ci][amd] fix basic models extra init test (#28676)
bradleyhd Nov 14, 2025
01bea11
[Misc] Remove `warn_for_unimplemented_methods` (#28613)
DarkLight1337 Nov 14, 2025
da14ae0
[XPU][CI]disable lm cache uts (#28696)
jikunshang Nov 14, 2025
0aecd91
[Misc] Update xformers to 0.33.0.post1 (#28678)
ywang96 Nov 14, 2025
0b25498
[Misc] add ignore mapper for quark quantization (#28275)
haoyangli-amd Nov 14, 2025
15ae8e0
[Bugfix][CI/Test][Spec Decode] Fix illegal memory access in offline_i…
rasmith Nov 14, 2025
9310357
[BugFix][CI/Build][ROCM] Fix import error and apply assert in appropr…
rasmith Nov 14, 2025
529cea3
use default CCL_ZE_IPC_EXCHANGE (#28700)
yma11 Nov 14, 2025
b65e752
Merge branch 'main' into imarkov/conditional_compilation_ranges
ilmarkov Nov 14, 2025
c36bcfe
[Bugfix] fix dots.ocr pp support (#28705)
ZJY0516 Nov 14, 2025
bc3e430
[BugFix] Fix multi-modal async scheduling race condition (#28706)
njhill Nov 14, 2025
c9a3a02
Add output token counting to gsm8k eval (#28594)
mgoin Nov 14, 2025
fd75d3e
[Minor] avoid register new custom and just import silly_attn (#28578)
BoyuanFeng Nov 14, 2025
8cfbe89
[Misc] fix comment in test_envs (#28529)
xingliu14 Nov 14, 2025
ecf8230
[Metrics] Log number of preempted requests (#28522)
610lyn Nov 14, 2025
360bd87
[Frontend] Added chat-style multimodal support to /classify. (#27516)
WorldExplored Nov 14, 2025
41b92f7
[Model][MM] Extract conv layer as CustomOp (#28455)
shen-shanshan Nov 14, 2025
4516d44
[DCP] Support Decode Context Parallel (DCP) for GQA with Flashinfer (…
gjc0824 Nov 14, 2025
9324e10
Fix KV sharing fast prefill with cudagraph enabled (#28537)
sarckk Nov 14, 2025
db56a59
[BugFix] Fix FA3 IMA with FULL_AND_PIECEWISE and cascade attention (d…
LucasWilkinson Nov 14, 2025
8d3748d
[Doc] Fix macOS installation dependency resolution issue (#26721)
shahfasal Nov 14, 2025
433c0f8
[Model] Fix bailing_moe accuracy problem (#28277)
zhaozx-cn Nov 14, 2025
96b23b8
[Bugfix][Nixl] Fix kernel physical<>logical block_size issue (#28677)
NickLucche Nov 14, 2025
511a6b6
[Config] Clean up SchedulerConfig initialization (#28665)
DarkLight1337 Nov 14, 2025
3f8a874
[Kernels] Enable FlashInfer FP8 Blockscale on SM90 (for TEP DSR1) (#2…
djmmoss Nov 14, 2025
c934cae
[Fix] improve aspect ratio in dummy image generation and add common …
dongbo910220 Nov 14, 2025
5f3cd7f
[Docs] Update the name of `Transformers backend` -> `Transformers mod…
hmellor Nov 14, 2025
d54a18a
[CI][CPU] Smoke test for Apple Silicon using GHA MacOS runner (#28688)
mgoin Nov 14, 2025
6f1e7f7
[DisaggEverything] Tokens in<>out `/generate` endpoint (#24261)
NickLucche Nov 14, 2025
8cc40f8
[Attention] Bump FA for removed method (#28429)
MatthewBonanni Nov 14, 2025
a17e36f
Fix typo in comment: existance -> existence (#28737)
OthmanMohammad Nov 14, 2025
0854248
Remove audio optional dependency for mistral-common (#28722)
juliendenize Nov 14, 2025
cdd7025
[kernel] Improve FP8 PTPC on Hopper for larger shapes (#28692)
czhu-cohere Nov 14, 2025
9261eb3
docs(lora_resolvers): clarify multi-resolver order and storage path r…
wangchen615 Nov 14, 2025
964d65d
LLaMA4 LoRA Adapter Enablement (#28602)
kfhfar Nov 14, 2025
a425dc2
[Bugfix] [ROCm] [AITER]: Fix aiter block quant not compatible with to…
tjtanaa Nov 14, 2025
6718755
[Docs] Enable some more markdown lint rules for the docs (#28731)
hmellor Nov 14, 2025
e2741f6
[Chore] Rename `SchedulerConfig.chunked_prefill_enabled` (#28735)
DarkLight1337 Nov 14, 2025
cec275e
[Bugfix] resolve Qwen3-VL GPTQModel quantized model loading failure (…
GuanH Nov 14, 2025
fd45550
[BugFix] Fix misprint introduced by modular_kernel refactoring. (#28728)
halyavin Nov 14, 2025
8977ffb
[ROCm][Bugfix] Fix compilation errors with fused_qknorm_rope_kernel.c…
SageMoore Nov 14, 2025
f08eab2
[CI] Fix macos smoke test uv cache issue (#28736)
mgoin Nov 14, 2025
0de4f21
[Bugfix] TypeError: 'NoneType' object is not callable (#27410)
mostrowskix Nov 14, 2025
5a84b76
[ROCm][CI/Build] Change install location of uv (#28741)
gshtras Nov 14, 2025
2e0ad62
Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch…
laithsakka Nov 14, 2025
e5c7895
[Bugfix] Fix incorrect use of hidden_states for shared_experts due to…
alexm-redhat Nov 14, 2025
bf3ffb6
[Bugfix] Fix ChunkedLocalAttention CUDA Graph setting (#28739)
benchislett Nov 14, 2025
e0c910b
[Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 …
tdoublep Nov 14, 2025
ba041d9
[Log] Save profiler results to file instead of stdout (#28144)
rasmith Nov 14, 2025
75f01b9
[ROCm][CI/Build] Upgrade to ROCm 7.1 and AITER main (#28753)
gshtras Nov 14, 2025
58e61e5
[Test] Rework e2e async scheduling tests (#28744)
njhill Nov 15, 2025
186352b
[Core] Performance: Use list[np.ndarray] instead of list[list[int]] f…
Jialin Nov 15, 2025
9fc81ec
[TPU] Fix import error in tpu launch (#28758)
QiliangCui Nov 15, 2025
f05d474
[Model][Qwen3VL] Use `mm_position` to compute mrope positions (#28730)
lgeiger Nov 15, 2025
edfe498
[Bugfix] Build hadacore kernels on >SM90 (#28748)
mgoin Nov 15, 2025
ac86bff
Revert "[Core] Performance: Use list[np.ndarray] instead of list[list…
njhill Nov 15, 2025
363aaee
Fix IntermediateTensors initialization and add type hints (#28743)
OthmanMohammad Nov 15, 2025
c9e6658
[NIXL] heterogeneous block_size support (#26759)
xuechendi Nov 15, 2025
6965ef4
[Performance][DeepGEMM] Estimate expected_m (#28694)
varun-sundar-rabindranath Nov 15, 2025
98b4d38
[Redo] #26368 (#28771)
DarkLight1337 Nov 15, 2025
dd6ac1c
[RL] [V1] Remove unused device argument from reset_kv_cache (#28766)
zhuohan123 Nov 15, 2025
74b5267
Use narrow over indexing in `hadacore_transform` to prep for ABI stab…
janeyx99 Nov 15, 2025
1ec978c
[Kernel][Moe Configs] llama4 maverick fp8 moe config tp8 on mi325 (#2…
zhewenl Nov 15, 2025
638e419
[Misc] Make `SchedulerConfig.max_model_len` init-only (#28733)
DarkLight1337 Nov 15, 2025
173b356
[PERF] Remove TRTLLM Gen attn kernel limitation `max_seq_len <=131072…
vadiklyutiy Nov 15, 2025
f36292d
[compile] Enable sequence parallelism matching w/o custom ops enabled…
angelayi Nov 15, 2025
cb15ee2
Allow Gemma3 to take image embeddings (#28483)
tingtingtangmeta Nov 15, 2025
89d3679
[Doc] Fix failing doc build (#28772)
DarkLight1337 Nov 15, 2025
085a525
[Model] Fix lmhead init bug of bailing_moe (#28777)
hwhaokun Nov 15, 2025
e439c78
Add support for Eagle with separate lm-head and embed_tokens layers (…
eldarkurtic Nov 15, 2025
637f292
[CI] Fix broken pipeline (#28781)
njhill Nov 15, 2025
07cadab
[Model][Qwen3VL] Cache positional embedding indices (#28475)
lgeiger Nov 15, 2025
2bb4435
[Doc]: fix typos in various files (#28567)
didier-durand Nov 15, 2025
be263f7
[BugFix] Fix `AssertionError: DCP not support reorder_batch_threshold…
LucasWilkinson Nov 15, 2025
f849ee7
Adding a benchmark for batch invariance (#28161)
bwasti Nov 16, 2025
d231876
[Benchmark] Fix client seed synchronization in multi-turn benchmark (…
ai-jz Nov 16, 2025
a55b646
[Model] Allow users to control skip reading cache per request. (#28194)
noooop Nov 16, 2025
b316ac6
[V1] Support MP Executor for multi node distributed inference (#23691)
luccafong Nov 16, 2025
af02c40
Fixed gpt-oss _load_weights_other() parameter position bug (#28715)
River12 Nov 16, 2025
3bc1175
[Bugfix] Fix host and port join for ipv6 in bench serve (#28679)
scottzh8 Nov 16, 2025
8d259fa
Fix gpt oss weight loading with EP + bf16 (#28765)
ashors1 Nov 16, 2025
63fed55
[Doc]: fix typos in various files (#28811)
didier-durand Nov 16, 2025
ac1daf3
fix comment typo (#28802)
andyxning Nov 16, 2025
5a87076
[Model][QwenVL] Optimize `Qwen2_5_VisionAttention` q,k preparation (#…
lgeiger Nov 16, 2025
03ee481
Feature: Support Relu2 in FusedMoE fp8 cutlass path (#27261)
amirkl94 Nov 16, 2025
80b6080
[BugFix] Fix async scheduling + chunked prefill + preemption (#28787)
njhill Nov 16, 2025
561253b
[Performance][Fix] update nvfp4 code to support renorm routing (#28569)
jiahanc Nov 17, 2025
d64429b
[NIXL][XPU] update install script of NIXL (#28778)
zhenwei-intel Nov 17, 2025
60e089f
[ROCm][Qwen3-32B] Fix AITER MHA accuracy issue cause by #25763 (#28670)
sammysun0711 Nov 17, 2025
6f37419
[Bugfix][Model] Prevent special token leakage in KimiK2ToolParser str…
jscaldwell55 Nov 17, 2025
3380ed5
[Doc] Add llama4 LoRA tag (#28825)
jeejeelee Nov 17, 2025
577bb34
[CPU][Bugfix] Fix _to_list in CPU model runner (#28824)
bigPYJ1151 Nov 17, 2025
ab01cd1
[BugFix] Fix glm4_moe_mtp load weights bug (#28805)
wuyaoxuehun Nov 17, 2025
d4acf51
[Metrics] Fix KV cache usage percent metric multiproc (#28792)
jaywonchung Nov 17, 2025
1b82fb0
[XPU] work around for sp, avoid custom op import error (#28822)
jikunshang Nov 17, 2025
64e39d6
[BugFix] Temporary fix for IMA with MTP = 2 and full-cg (#28315)
LucasWilkinson Nov 17, 2025
7f06449
[Bugfix][Perf] Revert applying HF processor on text-only inputs for m…
ywang96 Nov 17, 2025
e42bd8c
Cast return value to int64_t for cache size (#28814)
tiehexue Nov 17, 2025
f8b19c0
[Bugfix] Fix GPT-OSS on AMD after #28603 (#28816)
zhewenl Nov 17, 2025
d8874c6
[Core] Async Scheduling X Spec Decoding Compatibility (#24799)
Ronald1995 Nov 17, 2025
7765e5b
[BugFix] Fix PP performance and PP kv connector output regression (#…
njhill Nov 17, 2025
95ae50b
[Quantization] [Eagle] Add complete quantization support to the draft…
shreyas269 Nov 17, 2025
a289cc1
[Test] Batch Invariant: Rename and organize tests (#27421)
yewentao256 Nov 17, 2025
f77bce0
[Model] Add Afmoe architecture implementation (#28332)
pranav4501 Nov 17, 2025
6148584
[BugFix] Corner case that could cause out-of-sync with external launc…
bangshengtang Nov 17, 2025
552cac9
[Misc] Fix wrong comment in scheduler (#28880)
zhuohan123 Nov 17, 2025
b6e0439
[Bugfix] Fix Kimi-K2 tool parser concatenated tool calls parsing (#28…
bbartels Nov 18, 2025
88ab591
Run macos smoke test workflow on main commit (#28752)
mgoin Nov 18, 2025
d0a7362
[ROCm][Quantization] add apply_vllm_mapper in quark config for models…
xuebwang-amd Nov 18, 2025
3ddcf46
[Refactor] Remove Unused Func in Batch Invariant (#28881)
yewentao256 Nov 18, 2025
bf9e1e8
[Bugfix] Fix wrong CLI defaults for dynamic `SchedulerConfig` fields …
DarkLight1337 Nov 18, 2025
083cf32
[Doc]: fix typos in various files (#28863)
didier-durand Nov 18, 2025
0168f69
[Misc] Remove unnecessary parentheses from log statements (#28897)
andyxning Nov 18, 2025
5bdd155
[CI] Fix async scheduling + spec decoding test flake (#28902)
njhill Nov 18, 2025
5bb1da5
[MISC] Remove format.sh (#28906)
KuntaiDu Nov 18, 2025
896e41a
[CI/Build] Replace wikipedia url with local server ones (#28908)
Isotr0py Nov 18, 2025
4393684
[BugFix] Fix PP/async scheduling with pooling models (#28899)
njhill Nov 18, 2025
285eaa4
[Bugfix] Safeguard against missing backend in AttentionBackendEnum (#…
jesse996 Nov 18, 2025
b9489f5
[Model][Perf] Use cos and sin cache in QwenVL (#28798)
gcanlin Nov 18, 2025
184b12f
[Bugfix][NIXL] Fix `block_size_ratio` when logical !=physical blocks …
NickLucche Nov 18, 2025
f6aa122
[CI Sprint] Quantization CI Cleanup (#24130)
killershrimp Nov 18, 2025
49a986e
[Benchmark] multi_turn: Report warmup-inclusive runtime (#28937)
segevido Nov 18, 2025
c261237
[Model] Add Gemma3 GGUF multimodal support (#27772)
lucianommartins Nov 18, 2025
af10400
Merge branch 'main' into imarkov/conditional_compilation_ranges
ilmarkov Nov 18, 2025
2 changes: 1 addition & 1 deletion .buildkite/release-pipeline.yaml
@@ -132,7 +132,7 @@ steps:
queue: cpu_queue_postmerge
commands:
- "aws ecr-public get-login-password --region us-east-1 | docker login --username AWS --password-stdin public.ecr.aws/q9t5s3a7"
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg GIT_REPO_CHECK=1 --build-arg VLLM_CPU_AVX512BF16=true --build-arg VLLM_CPU_AVX512VNNI=true --tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version) --tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:latest --progress plain --target vllm-openai -f docker/Dockerfile.cpu ."
- "DOCKER_BUILDKIT=1 docker build --build-arg max_jobs=16 --build-arg GIT_REPO_CHECK=1 --build-arg VLLM_CPU_AVX512BF16=true --build-arg VLLM_CPU_AVX512VNNI=true --build-arg VLLM_CPU_AMXBF16=true --tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version) --tag public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:latest --progress plain --target vllm-openai -f docker/Dockerfile.cpu ."
- "docker push public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:latest"
- "docker push public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:$(buildkite-agent meta-data get release-version)"
env:
18 changes: 7 additions & 11 deletions .buildkite/scripts/hardware_ci/run-amd-test.sh
@@ -59,7 +59,7 @@ while true; do
fi
done

echo "--- Pulling container"
echo "--- Pulling container"
image_name="rocm/vllm-ci:${BUILDKITE_COMMIT}"
container_name="rocm_${BUILDKITE_COMMIT}_$(tr -dc A-Za-z0-9 < /dev/urandom | head -c 10; echo)"
docker pull "${image_name}"
@@ -78,17 +78,13 @@ HF_MOUNT="/root/.cache/huggingface"
commands=$@
echo "Commands:$commands"

if [[ $commands == *"pytest -v -s basic_correctness/test_basic_correctness.py"* ]]; then
commands=${commands//"pytest -v -s basic_correctness/test_basic_correctness.py"/"VLLM_USE_TRITON_FLASH_ATTN=0 pytest -v -s basic_correctness/test_basic_correctness.py"}
fi
commands=${commands//"pytest -v -s basic_correctness/test_basic_correctness.py"/"pytest -v -s basic_correctness/test_basic_correctness.py"}

if [[ $commands == *"pytest -v -s models/test_registry.py"* ]]; then
commands=${commands//"pytest -v -s models/test_registry.py"/"pytest -v -s models/test_registry.py -k 'not BambaForCausalLM and not GritLM and not Mamba2ForCausalLM and not Zamba2ForCausalLM'"}
fi

if [[ $commands == *"pytest -v -s compile/test_basic_correctness.py"* ]]; then
commands=${commands//"pytest -v -s compile/test_basic_correctness.py"/"VLLM_USE_TRITON_FLASH_ATTN=0 pytest -v -s compile/test_basic_correctness.py"}
fi
commands=${commands//"pytest -v -s compile/test_basic_correctness.py"/"pytest -v -s compile/test_basic_correctness.py"}

if [[ $commands == *"pytest -v -s lora"* ]]; then
commands=${commands//"pytest -v -s lora"/"VLLM_ROCM_CUSTOM_PAGED_ATTN=0 pytest -v -s lora"}
@@ -181,13 +177,13 @@ if [[ -z "$render_gid" ]]; then
exit 1
fi

-# check if the command contains shard flag, we will run all shards in parallel because the host have 8 GPUs.
+# check if the command contains shard flag, we will run all shards in parallel because the host have 8 GPUs.
if [[ $commands == *"--shard-id="* ]]; then
-  # assign job count as the number of shards used
-  commands=${commands//"--num-shards= "/"--num-shards=${PARALLEL_JOB_COUNT} "}
+  # assign job count as the number of shards used
+  commands=$(echo "$commands" | sed -E "s/--num-shards[[:blank:]]*=[[:blank:]]*[0-9]*/--num-shards=${PARALLEL_JOB_COUNT} /g" | sed 's/ \\ / /g')
for GPU in $(seq 0 $(($PARALLEL_JOB_COUNT-1))); do
# assign shard-id for each shard
-    commands_gpu=${commands//"--shard-id= "/"--shard-id=${GPU} "}
+    commands_gpu=$(echo "$commands" | sed -E "s/--shard-id[[:blank:]]*=[[:blank:]]*[0-9]*/--shard-id=${GPU} /g" | sed 's/ \\ / /g')
echo "Shard ${GPU} commands:$commands_gpu"
echo "Render devices: $BUILDKITE_AGENT_META_DATA_RENDER_DEVICES"
docker run \
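The sharding rewrite above replaces a fixed-string substitution (which only matched the literal `--num-shards= ` with no digits) with `sed -E` patterns that rewrite any existing numeric value. A minimal standalone sketch of that technique, assuming illustrative values for `PARALLEL_JOB_COUNT`, `GPU`, and the sample pytest command string (none of these are taken from the CI config):

```shell
#!/bin/sh
# Illustrative values; the real CI script derives these from the Buildkite job.
PARALLEL_JOB_COUNT=8
GPU=3

commands='pytest -v -s tests --shard-id=0 --num-shards=2'

# Rewrite whatever numeric value currently follows --num-shards=,
# tolerating optional blanks around the equals sign.
commands=$(echo "$commands" | sed -E "s/--num-shards[[:blank:]]*=[[:blank:]]*[0-9]*/--num-shards=${PARALLEL_JOB_COUNT} /g")

# Make a per-GPU copy of the command with its own shard id.
commands_gpu=$(echo "$commands" | sed -E "s/--shard-id[[:blank:]]*=[[:blank:]]*[0-9]*/--shard-id=${GPU} /g")

echo "$commands_gpu"
```

The diff's version additionally pipes through `sed 's/ \\ / /g'` to collapse stray line-continuation backslashes left behind in multi-line command strings; that step is omitted here for clarity.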
5 changes: 3 additions & 2 deletions .buildkite/scripts/hardware_ci/run-cpu-test.sh
@@ -49,6 +49,7 @@ function cpu_tests() {
# Run kernel tests
docker exec cpu-test-"$NUMA_NODE" bash -c "
set -e
+    pytest -x -v -s tests/kernels/attention/test_cpu_attn.py
pytest -x -v -s tests/kernels/test_onednn.py"

# Run basic model test
@@ -76,7 +77,7 @@ function cpu_tests() {
# Run AWQ test
# docker exec cpu-test-"$NUMA_NODE" bash -c "
# set -e
-  #   VLLM_USE_V1=0 pytest -x -s -v \
+  #   pytest -x -s -v \
# tests/quantization/test_ipex_quant.py"

# Run multi-lora tests
@@ -116,4 +117,4 @@ function cpu_tests() {

# All of CPU tests are expected to be finished less than 40 mins.
export -f cpu_tests
-timeout 2h bash -c "cpu_tests $CORE_RANGE $NUMA_NODE"
+timeout 2.5h bash -c "cpu_tests $CORE_RANGE $NUMA_NODE"
2 changes: 1 addition & 1 deletion .buildkite/scripts/hardware_ci/run-xpu-test.sh
@@ -46,6 +46,6 @@ docker run \
pytest -v -s v1/worker --ignore=v1/worker/test_gpu_model_runner.py
pytest -v -s v1/structured_output
pytest -v -s v1/spec_decode --ignore=v1/spec_decode/test_max_len.py --ignore=v1/spec_decode/test_tree_attention.py --ignore=v1/spec_decode/test_speculators_eagle3.py
-  pytest -v -s v1/kv_connector/unit --ignore=v1/kv_connector/unit/test_multi_connector.py --ignore=v1/kv_connector/unit/test_nixl_connector.py --ignore=v1/kv_connector/unit/test_shared_storage_connector.py
+  pytest -v -s v1/kv_connector/unit --ignore=v1/kv_connector/unit/test_multi_connector.py --ignore=v1/kv_connector/unit/test_nixl_connector.py --ignore=v1/kv_connector/unit/test_shared_storage_connector.py --ignore=v1/kv_connector/unit/test_lmcache_integration.py
pytest -v -s v1/test_serial_utils.py
'