[model_free_ptq] NVFP4A16
#1988
base: main
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.
@kylesayrs is it expected that models built using this pull request will currently fail to load in vLLM due to a vLLM bug (vllm-project/vllm#21977)?
@loqs Yes, thank you for the clarification. I mentioned in the linked issue that this is relatively trivial to enable on vLLM, I'll make sure to share my branches soon.
Purpose
Add NVFP4A16 support to `model_free_ptq`.

```shell
llmcompressor.reindex_fused_weights \
    unsloth/Kimi-K2-Thinking-BF16 \
    Kimi-K2-Thinking-BF16-reindexed \
    --num_workers=10
```

Changes
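Among the helpers added in this change set is `invert_mapping`. As context for the list that follows, here is a minimal sketch of such a utility (a hypothetical implementation; the exact signature and behavior in the PR may differ):

```python
from collections import defaultdict

def invert_mapping(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Invert a many-to-one mapping (e.g. safetensors weight name -> file)
    into a one-to-many mapping (file -> list of weight names)."""
    inverted: defaultdict[str, list[str]] = defaultdict(list)
    for key, value in mapping.items():
        inverted[value].append(key)
    return dict(inverted)

# Example: a toy safetensors-style weight map (names are illustrative only)
weight_map = {
    "model.layers.0.q_proj.weight": "model-00001.safetensors",
    "model.layers.0.k_proj.weight": "model-00002.safetensors",
    "model.layers.0.v_proj.weight": "model-00001.safetensors",
}
files_to_names = invert_mapping(weight_map)
```

This kind of inversion is what lets a per-file worker ask "which weights live in this shard?" given only the index's weight-name-to-file map.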
- Move `validate_scheme` to `validate.py`
- Move `find_safetensors_index_path`, `find_config_path`, `find_safetensors_index_file` to `helpers.py`
- Move `process_file` to `process.py`
- Separate `calibrate_weights` into `calibrate_global_scale` and `calibrate_scale_zp`
- Add `match_names_set_eager`, `invert_mapping`, `is_microscale_scheme`, `get_fused_names`, and `process_file_microscale_scheme` to separate the FP4 lifecycle from the regular lifecycle (this script should be very trustworthy; by separating the functions, an FP8 user does not have to trust anything about FP4)
- Add `llmcompressor.reindex_fused_weights` script, which reindexes a model's weights so that fused modules are in the same files

Testing
- `test_model_free_ptq_matches_oneshot`
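As a rough illustration of what reindexing fused weights involves conceptually, the sketch below regroups a safetensors-style weight map so that all members of a fused module point at the same shard. The fused patterns, function name, and grouping strategy here are assumptions for illustration, not the PR's actual implementation:

```python
from collections import Counter

# Hypothetical fused groups: parameters of fused attention/MLP modules
# that are loaded together and so benefit from living in one file.
FUSED_PATTERNS = [
    ("q_proj", "k_proj", "v_proj"),
    ("gate_proj", "up_proj"),
]

def reindex_fused_weights(weight_map: dict[str, str]) -> dict[str, str]:
    """Return a new weight-name -> file mapping in which every member of a
    fused group points at the same file (the group's most common file)."""
    new_map = dict(weight_map)
    groups: dict[tuple, list[str]] = {}
    for name in weight_map:
        for pattern in FUSED_PATTERNS:
            for member in pattern:
                if f".{member}." in name:
                    # Key the group by the name with the fused member masked out,
                    # so q/k/v of the same layer land in the same group.
                    key = (name.replace(f".{member}.", ".<fused>."), pattern)
                    groups.setdefault(key, []).append(name)
    for names in groups.values():
        if len(names) < 2:
            continue
        target = Counter(weight_map[n] for n in names).most_common(1)[0][0]
        for n in names:
            new_map[n] = target
    return new_map

# Example: q/k/v of layer 0 are split across two shards before reindexing
weight_map = {
    "model.layers.0.self_attn.q_proj.weight": "model-00001.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00002.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00001.safetensors",
}
reindexed = reindex_fused_weights(weight_map)
```

A real implementation would also move the tensor bytes between shard files and rewrite `model.safetensors.index.json`; this sketch only recomputes the mapping.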