Conversation

yiliu30 (Contributor) commented Nov 5, 2025

Resolves #1968 (RFC: Add Intel AutoRound Quantization Algorithm Support).

Highlights

  • Introduced AutoRoundModifier to enable AutoRound quantization for wNa16 schemes (a minimal usage sketch is included after the results below).
  • Added an end-to-end example and unit tests.
  • Verified functionality with local accuracy tests (GSM8K, limit 1000; results may fluctuate slightly due to non-determinism):
- LLMC-AutoRound
vllm (pretrained=/storage/yiliu7/Meta-Llama-3-8B-Instruct-W4A16-G128-disbale-shuffule,tensor_parallel_size=1,max_model_len=8192,max_num_batched_tokens=32768,max_num_seqs=128,add_bos_token=True,gpu_memory_utilization=0.8,dtype=bfloat16,max_gen_toks=2048,enable_prefix_caching=False), gen_kwargs: (None), limit: 1000.0, num_fewshot: None, batch_size: 128
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match||0.737|±  |0.0139|
|     |       |strict-match    |     5|exact_match||0.736|±  |0.0139|

- AutoRound result for reference
vllm (pretrained=/storage/yiliu7/meta-llama/Meta-Llama-3-8B-Instruct-ar/Meta-Llama-3-8B-Instruct-w4g128/,tensor_parallel_size=1,max_model_len=8192,max_num_batched_tokens=32768,max_num_seqs=128,add_bos_token=True,gpu_memory_utilization=0.8,dtype=bfloat16,max_gen_toks=2048,enable_prefix_caching=False), gen_kwargs: (None), limit: 1000.0, num_fewshot: None, batch_size: 128
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match||0.739|±  |0.0139|
|     |       |strict-match    |     5|exact_match||0.740|±  |0.0139|
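
For a quick sense of the intended API, here is a minimal usage sketch. The import path and argument names follow my reading of the example added in this PR, so treat the specifics (scheme string, ignore list, calibration settings) as assumptions rather than the authoritative interface:

```python
# Minimal sketch: applying AutoRoundModifier through oneshot.
# Assumptions: the modifier lives under llmcompressor.modifiers.autoround
# and accepts targets/ignore/scheme like the other quantization modifiers.
from llmcompressor import oneshot
from llmcompressor.modifiers.autoround import AutoRoundModifier

recipe = AutoRoundModifier(
    targets="Linear",    # quantize all Linear layers...
    ignore=["lm_head"],  # ...except the output head
    scheme="W4A16",      # wNa16: 4-bit weights, 16-bit activations
)

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset="open_platypus",  # calibration data (any HF dataset id)
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```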

The eval command is attached below for reference.
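
The flags below are reconstructed from the logged model_args above, so treat the exact invocation as an approximation (the task, limit, and batch size come from the run headers; n-shot 5 is the gsm8k default; the quantized model path is elided):

lm_eval --model vllm --model_args pretrained=<quantized-model-path>,tensor_parallel_size=1,max_model_len=8192,max_num_batched_tokens=32768,max_num_seqs=128,add_bos_token=True,gpu_memory_utilization=0.8,dtype=bfloat16,max_gen_toks=2048,enable_prefix_caching=False --tasks gsm8k --limit 1000 --batch_size 128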

Next stage (in later PRs)

  • Extend support for additional data types.
  • Add recipe mappings between LLMC and AutoRound for group-wise quantization.
  • Add end-to-end tests.

cc @hshen14 @thuang6 @wenhuach21

Signed-off-by: yiliu30 <yi4.liu@intel.com>
brian-dellabetta (Collaborator) left a comment

Thanks for addressing my comments! A few more small things:

yiliu30 and others added 2 commits November 7, 2025 19:55
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Signed-off-by: Yi Liu <yi4.liu@intel.com>
yiliu30 (Contributor, Author) commented Nov 8, 2025

> Hi @yiliu30, do you have an estimate for when the next version of autoround will be released? Does it have the appropriate licensing to avoid issues like vllm-project/compressed-tensors#468?

Hi @brian-dellabetta, we're planning to release the next version within the next 1–2 weeks; hope that works for you!
As for AutoRound, it's licensed under the Apache License 2.0, so there shouldn't be any licensing concerns.

Signed-off-by: yiliu30 <yi4.liu@intel.com>
kylesayrs (Collaborator) previously approved these changes Nov 11, 2025 and left a comment

Really awesome job, thanks for the contribution!

brian-dellabetta (Collaborator) left a comment

Thanks for adding this! We can pin autoround in a follow-up, then.
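
For illustration, such a pin would presumably land in the package's optional dependencies; a minimal sketch, where the "autoround" extra name and the version floor are placeholders rather than anything decided here:

```python
# Hypothetical setup.py fragment; the extra name and the auto-round
# version floor below are placeholders, not actual project decisions.
from setuptools import setup, find_packages

setup(
    name="llmcompressor",
    packages=find_packages(),
    extras_require={
        # install with: pip install llmcompressor[autoround]
        "autoround": ["auto-round>=0.4"],
    },
)
```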

dsikka merged commit 63c175b into vllm-project:main on Nov 13, 2025
9 checks passed
dsikka pushed a commit that referenced this pull request Nov 21, 2025
The follow-up to #1994.
The rendered versions:

https://github.com/yiliu30/llm-compressor-fork/blob/autoround-doc/README.md

https://github.com/yiliu30/llm-compressor-fork/tree/autoround-doc/examples/autoround

https://github.com/yiliu30/llm-compressor-fork/blob/autoround-doc/docs/getting-started/compress.md

cc @hshen14 @thuang6 @wenhuach21

---------

Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: HDCharles <39544797+HDCharles@users.noreply.github.com>