[model_free_ptq] NVFP4A16
#1988
base: main
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.
@kylesayrs is it expected that models built using this pull request will currently fail to load in vLLM due to a vLLM bug (vllm-project/vllm#21977)?
@loqs Yes, thank you for the clarification. I mentioned in the linked issue that this is relatively trivial to enable on vLLM, I'll make sure to share my branches soon.
Purpose
Add NVFP4A16 support to `model_free_ptq`.

```shell
llmcompressor.reindex_fused_weights \
    unsloth/Kimi-K2-Thinking-BF16 \
    Kimi-K2-Thinking-BF16-reindexed \
    --num_workers=10
```

Changes
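Among the helpers added in this change set is `invert_mapping`. As context for the list that follows, here is a minimal sketch of such a utility (a hypothetical implementation; the exact signature and behavior in the PR may differ):

```python
from collections import defaultdict

def invert_mapping(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Invert a many-to-one mapping (e.g. safetensors weight name -> file)
    into a one-to-many mapping (file -> list of weight names)."""
    inverted: defaultdict[str, list[str]] = defaultdict(list)
    for key, value in mapping.items():
        inverted[value].append(key)
    return dict(inverted)

# Example: a toy safetensors-style weight map (names are illustrative only)
weight_map = {
    "model.layers.0.q_proj.weight": "model-00001.safetensors",
    "model.layers.0.k_proj.weight": "model-00002.safetensors",
    "model.layers.0.v_proj.weight": "model-00001.safetensors",
}
files_to_names = invert_mapping(weight_map)
```

This kind of inversion is what lets a per-file worker ask "which weights live in this shard?" given only the index's weight-name-to-file map.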
- Move `validate_scheme` to `validate.py`
- Move `find_safetensors_index_path`, `find_config_path`, `find_safetensors_index_file` to `helpers.py`
- Move `process_file` to `process.py`
- Separate `calibrate_weights` into `calibrate_global_scale` and `calibrate_scale_zp`
- Add `match_names_set_eager`, `invert_mapping`, `is_microscale_scheme`, `get_fused_names`, and `process_file_microscale_scheme` to separate the FP4 lifecycle from the regular lifecycle (this script should be very trustworthy; by separating the functions, an FP8 user does not have to trust anything about FP4)
- Add `llmcompressor.reindex_fused_weights` script, which reindexes a model's weights so that fused modules are in the same files

Testing
- `test_model_free_ptq_matches_oneshot`
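As a rough illustration of what reindexing fused weights involves conceptually, the sketch below regroups a safetensors-style weight map so that all members of a fused module point at the same shard. The fused patterns, function name, and grouping strategy here are assumptions for illustration, not the PR's actual implementation:

```python
from collections import Counter

# Hypothetical fused groups: parameters of fused attention/MLP modules
# that are loaded together and so benefit from living in one file.
FUSED_PATTERNS = [
    ("q_proj", "k_proj", "v_proj"),
    ("gate_proj", "up_proj"),
]

def reindex_fused_weights(weight_map: dict[str, str]) -> dict[str, str]:
    """Return a new weight-name -> file mapping in which every member of a
    fused group points at the same file (the group's most common file)."""
    new_map = dict(weight_map)
    groups: dict[tuple, list[str]] = {}
    for name in weight_map:
        for pattern in FUSED_PATTERNS:
            for member in pattern:
                if f".{member}." in name:
                    # Key the group by the name with the fused member masked out,
                    # so q/k/v of the same layer land in the same group.
                    key = (name.replace(f".{member}.", ".<fused>."), pattern)
                    groups.setdefault(key, []).append(name)
    for names in groups.values():
        if len(names) < 2:
            continue
        target = Counter(weight_map[n] for n in names).most_common(1)[0][0]
        for n in names:
            new_map[n] = target
    return new_map

# Example: q/k/v of layer 0 are split across two shards before reindexing
weight_map = {
    "model.layers.0.self_attn.q_proj.weight": "model-00001.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00002.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001.safetensors",
}
reindexed = reindex_fused_weights(weight_map)
```

A real implementation would also move the tensor bytes between shard files and rewrite `model.safetensors.index.json`; this sketch only recomputes the mapping.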