src/diffusers/pipelines/pipeline_loading_utils.py (20 additions, 0 deletions)

@@ -867,6 +867,26 @@ def load_sub_model(
    )
    if model_quant_config is not None:
        loading_kwargs["quantization_config"] = model_quant_config

Copilot AI commented on Dec 6, 2025:

Trailing whitespace detected. Please remove the trailing spaces on this line.
    # When using bitsandbytes quantization with device_map on transformers models,
    # we must disable low_cpu_mem_usage to avoid meta tensors. Meta tensors cannot
    # be materialized properly when bitsandbytes tries to move quantization state
    # (which includes tensors like code and absmax) to the target device.
    # This issue occurs because quantization state is created during model loading
    # and needs actual tensors, not meta placeholders.
    # See: https://github.com/huggingface/diffusers/issues/12719
    if (
        is_transformers_model
        and device_map is not None
        and hasattr(model_quant_config, "quant_method")
    ):
        quant_method = getattr(model_quant_config.quant_method, "value", model_quant_config.quant_method)
        if quant_method in ["llm_int8", "fp4", "nf4"]:  # bitsandbytes quantization methods
Copilot AI commented on Dec 6, 2025:

The logic for detecting bitsandbytes quantization is incorrect. The quant_method attribute is set to QuantizationMethod.BITS_AND_BYTES (which has the value "bitsandbytes"), not to the specific quantization method strings ["llm_int8", "fp4", "nf4"]. This condition will always be False, meaning the workaround will never be applied.

The fix should check if quant_method equals "bitsandbytes" (or QuantizationMethod.BITS_AND_BYTES):

    quant_method = getattr(model_quant_config.quant_method, "value", model_quant_config.quant_method)
    if quant_method == "bitsandbytes":  # or quant_method == QuantizationMethod.BITS_AND_BYTES

Alternatively, if you want to check the specific quantization type, you should call the quantization_method() method instead:

    if hasattr(model_quant_config, "quantization_method"):
        quant_method = model_quant_config.quantization_method()
        if quant_method in ["llm_int8", "fp4", "nf4"]:

Reference: The quant_method attribute is defined in BitsAndBytesConfig.__init__ at line 248 of quantization_config.py as self.quant_method = QuantizationMethod.BITS_AND_BYTES. The specific method names are returned by the quantization_method() method (lines 365-377).

Suggested change:
-        if quant_method in ["llm_int8", "fp4", "nf4"]:  # bitsandbytes quantization methods
+        if quant_method == "bitsandbytes":  # bitsandbytes quantization
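To make the distinction concrete, here is a minimal sketch (not part of this PR) that assumes the transformers BitsAndBytesConfig API the comment above references:

```python
# Minimal sketch, not part of the PR: illustrates why the membership test in the
# diff never matches, assuming transformers' BitsAndBytesConfig as described above.
from transformers import BitsAndBytesConfig

config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")

# quant_method is the backend enum; its .value is the string "bitsandbytes".
quant_method = getattr(config.quant_method, "value", config.quant_method)
print(quant_method)                                # bitsandbytes
print(quant_method in ["llm_int8", "fp4", "nf4"])  # False -> the workaround branch would never run

# quantization_method() is what returns the specific scheme name.
print(config.quantization_method())                # nf4
```

Either suggested fix above makes the condition match for all three bitsandbytes schemes.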
            loading_kwargs["low_cpu_mem_usage"] = False
            logger.info(
                f"Disabling low_cpu_mem_usage for {name} because bitsandbytes quantization "
                f"with device_map requires materialized tensors, not meta tensors."
            )

    # check if the module is in a subdirectory
    if dduf_entries:
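For context, the sketch below shows the kind of pipeline call that reaches this code path: a bitsandbytes quantization config combined with a device_map. The checkpoint id, component names, and keyword arguments are illustrative only and are not taken from this PR or from issue #12719.

```python
# Illustrative sketch only; assumes diffusers' pipeline-level quantization API
# (PipelineQuantizationConfig) in a recent release. Names below are placeholders.
import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # placeholder checkpoint
    quantization_config=PipelineQuantizationConfig(
        quant_backend="bitsandbytes_4bit",
        quant_kwargs={
            "load_in_4bit": True,
            "bnb_4bit_quant_type": "nf4",
            "bnb_4bit_compute_dtype": torch.bfloat16,
        },
        components_to_quantize=["transformer", "text_encoder_2"],
    ),
    device_map="balanced",  # device_map + bitsandbytes is the combination the new branch targets
    torch_dtype=torch.bfloat16,
)
```

With this combination, the new branch in load_sub_model would set low_cpu_mem_usage=False for the quantized transformers components, subject to the quant_method check discussed in the review comment above.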