[GPT-OSS] Load MXFP4 and BF16 weights directly and enable online requantization #992
Description
Adds MXFP4 load support to enable loading the original OpenAI checkpoints, e.g. openai/gpt-oss-120b. Includes another axis swap, since the original OpenAI checkpoints store weights with [output, input] dimensions while the previous BF16 checkpoints use [input, output]. Requantization can be performed on the weights after load by passing an additional config, for both the MXFP4 and BF16 models.
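For reference, a minimal sketch of what the MXFP4 decode plus axis swap looks like (not the PR's actual loader; `blocks`/`scales` names, shapes, and the nibble order are assumptions — MXFP4 packs two E2M1 values per byte with one shared E8M0 scale per 32-element block):

```python
import jax.numpy as jnp

# E2M1 (FP4) code points, indexed by the 4-bit value.
_FP4_VALUES = jnp.array(
    [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
     -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
    dtype=jnp.float32,
)

def dequant_mxfp4(blocks: jnp.ndarray, scales: jnp.ndarray) -> jnp.ndarray:
    """blocks: uint8 [output, input // 2] (two FP4 nibbles per byte),
    scales: uint8 [output, input // 32] (E8M0 exponents, bias 127)."""
    # Unpack the two FP4 values per byte (low-nibble-first order assumed here).
    lo = _FP4_VALUES[blocks & 0x0F]
    hi = _FP4_VALUES[blocks >> 4]
    vals = jnp.stack([lo, hi], axis=-1).reshape(blocks.shape[0], -1)
    # Apply one power-of-two scale per 32-element block.
    scale = jnp.exp2(scales.astype(jnp.float32) - 127.0)
    vals = vals.reshape(scales.shape[0], scales.shape[1], -1) * scale[..., None]
    w = vals.reshape(blocks.shape[0], -1)               # [output, input]
    # Swap to the [input, output] layout the existing BF16 path expects.
    return jnp.swapaxes(w, 0, 1).astype(jnp.bfloat16)
```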
The sharding had to be passed to RMSNorm, wrapped in PartitionSpecs, to avoid Qwix-related issues. The expert-combine op was also factored out of the MoE forward so that Qwix can ignore it; see the sketch below. The alternative is to avoid einsums and filter the quantized modules to einsum only, but when two ops of the same type live in the same module, Qwix does not currently seem able to target one of them specifically.
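A rough illustration of the factoring, assuming a Flax-style MoE layer (the function name, einsum subscripts, and shapes are placeholders, not the PR's code): the combine contraction lives in its own function/submodule path, so it can be excluded by name rather than by op type.

```python
import jax.numpy as jnp

def combine_experts(expert_outputs: jnp.ndarray,
                    combine_weights: jnp.ndarray) -> jnp.ndarray:
    """Weighted sum of per-expert outputs, kept separate from the quantized
    expert einsums inside the MoE forward so a module-level quantization
    rule can skip it."""
    # expert_outputs: [tokens, experts, model_dim]
    # combine_weights: [tokens, experts]
    return jnp.einsum("tem,te->tm", expert_outputs, combine_weights)
```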
Tests
I evaluated accuracy with scripts/vllm/benchmarking/benchmark_serving.py on MMLU across the four variants: MXFP4, BF16, BF16 requantized (subchannel size 64), and MXFP4 requantized. All give similar accuracy on 10% of the MMLU benchmark.