Conversation

yiliu30 (Contributor) commented Nov 5, 2025

Resolves #1968 (RFC: Add Intel AutoRound Quantization Algorithm Support).

Highlights

  • Introduced AutoRoundModifier to enable AutoRound quantization for wNa16 schemes (a minimal usage sketch is included after the results below).
  • Added an end-to-end example and unit tests.
  • Verified functionality with local accuracy tests (GSM8K, limit 1000; results may fluctuate slightly due to non-determinism):
- LLMC-AutoRound
vllm (pretrained=/storage/yiliu7/Meta-Llama-3-8B-Instruct-W4A16-G128-disbale-shuffule,tensor_parallel_size=1,max_model_len=8192,max_num_batched_tokens=32768,max_num_seqs=128,add_bos_token=True,gpu_memory_utilization=0.8,dtype=bfloat16,max_gen_toks=2048,enable_prefix_caching=False), gen_kwargs: (None), limit: 1000.0, num_fewshot: None, batch_size: 128
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match||0.737|±  |0.0139|
|     |       |strict-match    |     5|exact_match||0.736|±  |0.0139|

- AutoRound result for reference
vllm (pretrained=/storage/yiliu7/meta-llama/Meta-Llama-3-8B-Instruct-ar/Meta-Llama-3-8B-Instruct-w4g128/,tensor_parallel_size=1,max_model_len=8192,max_num_batched_tokens=32768,max_num_seqs=128,add_bos_token=True,gpu_memory_utilization=0.8,dtype=bfloat16,max_gen_toks=2048,enable_prefix_caching=False), gen_kwargs: (None), limit: 1000.0, num_fewshot: None, batch_size: 128
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match||0.739|±  |0.0139|
|     |       |strict-match    |     5|exact_match||0.740|±  |0.0139|
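
For a quick sense of the intended API, here is a minimal usage sketch. The import path and argument names follow my reading of the example added in this PR, so treat the specifics (scheme string, ignore list, calibration settings) as assumptions rather than the authoritative interface:

```python
# Minimal sketch: applying AutoRoundModifier through oneshot.
# Assumptions: the modifier lives under llmcompressor.modifiers.autoround
# and accepts targets/ignore/scheme like the other quantization modifiers.
from llmcompressor import oneshot
from llmcompressor.modifiers.autoround import AutoRoundModifier

recipe = AutoRoundModifier(
    targets="Linear",    # quantize all Linear layers...
    ignore=["lm_head"],  # ...except the output head
    scheme="W4A16",      # wNa16: 4-bit weights, 16-bit activations
)

oneshot(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    dataset="open_platypus",  # calibration data (any HF dataset id)
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)
```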

The eval command is attached below for reference.
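
The flags below are reconstructed from the logged model_args above, so treat the exact invocation as an approximation (the task, limit, and batch size come from the run headers; n-shot 5 is the gsm8k default; the quantized model path is elided):

lm_eval --model vllm --model_args pretrained=<quantized-model-path>,tensor_parallel_size=1,max_model_len=8192,max_num_batched_tokens=32768,max_num_seqs=128,add_bos_token=True,gpu_memory_utilization=0.8,dtype=bfloat16,max_gen_toks=2048,enable_prefix_caching=False --tasks gsm8k --limit 1000 --batch_size 128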

Next stage (in later PRs)

  • Extend support for additional data types.
  • Add recipe mappings between LLMC and AutoRound for group-wise quantization.
  • Add end-to-end tests.

cc @hshen14 @thuang6 @wenhuach21

Signed-off-by: yiliu30 <yi4.liu@intel.com>
brian-dellabetta (Collaborator) left a comment

Thanks for addressing my comments! A few more small things:

yiliu30 and others added 2 commits November 7, 2025 19:55
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: Brian Dellabetta <brian-dellabetta@users.noreply.github.com>
Signed-off-by: Yi Liu <yi4.liu@intel.com>
yiliu30 (Contributor, Author) commented Nov 8, 2025

> Hi @yiliu30, do you have an estimate for when the next version of autoround will be released? Does it have the appropriate licensing to avoid issues like vllm-project/compressed-tensors#468?

Hi @brian-dellabetta, we're planning to release the next version within the next 1–2 weeks; hope that works for you!
As for AutoRound, it's licensed under the Apache License 2.0, so there shouldn't be any licensing concerns.

Signed-off-by: yiliu30 <yi4.liu@intel.com>
kylesayrs (Collaborator) previously approved these changes Nov 11, 2025 and left a comment

Really awesome job, thanks for the contribution!

brian-dellabetta (Collaborator) left a comment

Thanks for adding this! We can pin autoround in a follow-up, then.
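
For illustration, such a pin would presumably land in the package's optional dependencies; a minimal sketch, where the "autoround" extra name and the version floor are placeholders rather than anything decided here:

```python
# Hypothetical setup.py fragment; the extra name and the auto-round
# version floor below are placeholders, not actual project decisions.
from setuptools import setup, find_packages

setup(
    name="llmcompressor",
    packages=find_packages(),
    extras_require={
        # install with: pip install llmcompressor[autoround]
        "autoround": ["auto-round>=0.4"],
    },
)
```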

dsikka merged commit 63c175b into vllm-project:main on Nov 13, 2025
9 checks passed
dsikka pushed a commit that referenced this pull request Nov 21, 2025
The follow-up to #1994.
The rendered versions:

https://github.com/yiliu30/llm-compressor-fork/blob/autoround-doc/README.md

https://github.com/yiliu30/llm-compressor-fork/tree/autoround-doc/examples/autoround

https://github.com/yiliu30/llm-compressor-fork/blob/autoround-doc/docs/getting-started/compress.md

cc @hshen14 @thuang6 @wenhuach21

---------

Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: HDCharles <39544797+HDCharles@users.noreply.github.com>