Commit 61ddcdf

Merge branch 'main' into unified-SP-attention

2 parents: 9ebcff5 + 6290fdf

File tree: 214 files changed, +24111 additions, −3411 deletions


docs/source/en/_toctree.yml

Lines changed: 20 additions & 0 deletions

````diff
@@ -349,6 +349,8 @@
   title: DiTTransformer2DModel
 - local: api/models/easyanimate_transformer3d
   title: EasyAnimateTransformer3DModel
+- local: api/models/flux2_transformer
+  title: Flux2Transformer2DModel
 - local: api/models/flux_transformer
   title: FluxTransformer2DModel
 - local: api/models/hidream_image_transformer
@@ -357,6 +359,8 @@
   title: HunyuanDiT2DModel
 - local: api/models/hunyuanimage_transformer_2d
   title: HunyuanImageTransformer2DModel
+- local: api/models/hunyuan_video15_transformer_3d
+  title: HunyuanVideo15Transformer3DModel
 - local: api/models/hunyuan_video_transformer_3d
   title: HunyuanVideoTransformer3DModel
 - local: api/models/latte_transformer3d
@@ -371,6 +375,8 @@
   title: MochiTransformer3DModel
 - local: api/models/omnigen_transformer
   title: OmniGenTransformer2DModel
+- local: api/models/ovisimage_transformer2d
+  title: OvisImageTransformer2DModel
 - local: api/models/pixart_transformer2d
   title: PixArtTransformer2DModel
 - local: api/models/prior_transformer
@@ -395,6 +401,8 @@
   title: WanAnimateTransformer3DModel
 - local: api/models/wan_transformer_3d
   title: WanTransformer3DModel
+- local: api/models/z_image_transformer2d
+  title: ZImageTransformer2DModel
   title: Transformers
 - sections:
 - local: api/models/stable_cascade_unet
@@ -431,6 +439,8 @@
   title: AutoencoderKLHunyuanImageRefiner
 - local: api/models/autoencoder_kl_hunyuan_video
   title: AutoencoderKLHunyuanVideo
+- local: api/models/autoencoder_kl_hunyuan_video15
+  title: AutoencoderKLHunyuanVideo15
 - local: api/models/autoencoderkl_ltx_video
   title: AutoencoderKLLTXVideo
 - local: api/models/autoencoderkl_magvit
@@ -525,6 +535,8 @@
   title: EasyAnimate
 - local: api/pipelines/flux
   title: Flux
+- local: api/pipelines/flux2
+  title: Flux2
 - local: api/pipelines/control_flux_inpaint
   title: FluxControlInpaint
 - local: api/pipelines/hidream
@@ -541,6 +553,8 @@
   title: Kandinsky 2.2
 - local: api/pipelines/kandinsky3
   title: Kandinsky 3
+- local: api/pipelines/kandinsky5_image
+  title: Kandinsky 5.0 Image
 - local: api/pipelines/kolors
   title: Kolors
 - local: api/pipelines/latent_consistency_models
@@ -559,6 +573,8 @@
   title: MultiDiffusion
 - local: api/pipelines/omnigen
   title: OmniGen
+- local: api/pipelines/ovis_image
+  title: Ovis-Image
 - local: api/pipelines/pag
   title: PAG
 - local: api/pipelines/paint_by_example
@@ -634,6 +650,8 @@
   title: VisualCloze
 - local: api/pipelines/wuerstchen
   title: Wuerstchen
+- local: api/pipelines/z_image
+  title: Z-Image
   title: Image
 - sections:
 - local: api/pipelines/allegro
@@ -648,6 +666,8 @@
   title: Framepack
 - local: api/pipelines/hunyuan_video
   title: HunyuanVideo
+- local: api/pipelines/hunyuan_video15
+  title: HunyuanVideo1.5
 - local: api/pipelines/i2vgenxl
   title: I2VGen-XL
 - local: api/pipelines/kandinsky5_video
````

docs/source/en/api/cache.md

Lines changed: 6 additions & 0 deletions

````diff
@@ -34,3 +34,9 @@ Cache methods speedup diffusion transformers by storing and reusing intermediate
 [[autodoc]] FirstBlockCacheConfig

 [[autodoc]] apply_first_block_cache
+
+### TaylorSeerCacheConfig
+
+[[autodoc]] TaylorSeerCacheConfig
+
+[[autodoc]] apply_taylorseer_cache
````

docs/source/en/api/loaders/lora.md

Lines changed: 11 additions & 1 deletion

````diff
@@ -30,7 +30,9 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
 - [`CogView4LoraLoaderMixin`] provides similar functions for [CogView4](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogview4).
 - [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
 - [`HiDreamImageLoraLoaderMixin`] provides similar functions for [HiDream Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/hidream)
-- [`QwenImageLoraLoaderMixin`] provides similar functions for [Qwen Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwen)
+- [`QwenImageLoraLoaderMixin`] provides similar functions for [Qwen Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/qwen).
+- [`ZImageLoraLoaderMixin`] provides similar functions for [Z-Image](https://huggingface.co/docs/diffusers/main/en/api/pipelines/zimage).
+- [`Flux2LoraLoaderMixin`] provides similar functions for [Flux2](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux2).
 - [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.

 > [!TIP]
@@ -56,6 +58,10 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi

 [[autodoc]] loaders.lora_pipeline.FluxLoraLoaderMixin

+## Flux2LoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.Flux2LoraLoaderMixin
+
 ## CogVideoXLoraLoaderMixin

 [[autodoc]] loaders.lora_pipeline.CogVideoXLoraLoaderMixin
@@ -107,6 +113,10 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi

 [[autodoc]] loaders.lora_pipeline.QwenImageLoraLoaderMixin

+## ZImageLoraLoaderMixin
+
+[[autodoc]] loaders.lora_pipeline.ZImageLoraLoaderMixin
+
 ## KandinskyLoraLoaderMixin
 [[autodoc]] loaders.lora_pipeline.KandinskyLoraLoaderMixin
````

Lines changed: 36 additions & 0 deletions (new file)

<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLHunyuanVideo15

The 3D variational autoencoder (VAE) model with KL loss used in [HunyuanVideo1.5](https://github.com/Tencent/HunyuanVideo-1.5) by Tencent.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import AutoencoderKLHunyuanVideo15

vae = AutoencoderKLHunyuanVideo15.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v", subfolder="vae", torch_dtype=torch.float32
)

# make sure to enable tiling to avoid OOM
vae.enable_tiling()
```

## AutoencoderKLHunyuanVideo15

[[autodoc]] AutoencoderKLHunyuanVideo15
- decode
- encode
- all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
Lines changed: 19 additions & 0 deletions (new file, Apache-2.0 license header as above)

# Flux2Transformer2DModel

A Transformer model for image-like data from [Flux2](https://hf.co/black-forest-labs/FLUX.2-dev).

## Flux2Transformer2DModel

[[autodoc]] Flux2Transformer2DModel
Lines changed: 30 additions & 0 deletions (new file, Apache-2.0 license header as above)

# HunyuanVideo15Transformer3DModel

A Diffusion Transformer model for 3D video-like data used in [HunyuanVideo1.5](https://github.com/Tencent/HunyuanVideo-1.5).

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import HunyuanVideo15Transformer3DModel

transformer = HunyuanVideo15Transformer3DModel.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v", subfolder="transformer", torch_dtype=torch.bfloat16
)
```

## HunyuanVideo15Transformer3DModel

[[autodoc]] HunyuanVideo15Transformer3DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
Lines changed: 24 additions & 0 deletions (new file, Apache-2.0 license header as above)

# OvisImageTransformer2DModel

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import OvisImageTransformer2DModel

transformer = OvisImageTransformer2DModel.from_pretrained(
    "AIDC-AI/Ovis-Image-7B", subfolder="transformer", torch_dtype=torch.bfloat16
)
```

## OvisImageTransformer2DModel

[[autodoc]] OvisImageTransformer2DModel
Lines changed: 19 additions & 0 deletions (new file, Apache-2.0 license header as above)

# ZImageTransformer2DModel

A Transformer model for image-like data from [Z-Image](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo).

## ZImageTransformer2DModel

[[autodoc]] ZImageTransformer2DModel

docs/source/en/api/pipelines/bria_fibo.md

Lines changed: 6 additions & 6 deletions

````diff
@@ -21,9 +21,10 @@ With only 8 billion parameters, FIBO provides a new level of image quality, prom
 FIBO is trained exclusively on a structured prompt and will not work with freeform text prompts.
 you can use the [FIBO-VLM-prompt-to-JSON](https://huggingface.co/briaai/FIBO-VLM-prompt-to-JSON) model or the [FIBO-gemini-prompt-to-JSON](https://huggingface.co/briaai/FIBO-gemini-prompt-to-JSON) to convert your freeform text prompt to a structured JSON prompt.

-its not recommended to use freeform text prompts directly with FIBO, as it will not produce the best results.
+> [!NOTE]
+> Avoid using freeform text prompts directly with FIBO because it does not produce the best results.

-you can learn more about FIBO in [Bria Fibo Hugging Face page](https://huggingface.co/briaai/FIBO).
+Refer to the Bria Fibo Hugging Face [page](https://huggingface.co/briaai/FIBO) to learn more.


 ## Usage
@@ -37,9 +38,8 @@ hf auth login
 ```

-## BriaPipeline
+## BriaFiboPipeline

-[[autodoc]] BriaPipeline
+[[autodoc]] BriaFiboPipeline
 - all
-- __call__
-
+- __call__
````
Lines changed: 39 additions & 0 deletions (new file, Apache-2.0 license header as above)

# Flux2

<div class="flex flex-wrap space-x-1">
  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
  <img alt="MPS" src="https://img.shields.io/badge/MPS-000000?style=flat&logo=apple&logoColor=white%22">
</div>

Flux.2 is the latest series of image generation models from Black Forest Labs, preceded by the [Flux.1](./flux.md) series. It is an entirely new model with a new architecture, pre-trained from scratch.

Original model checkpoints for Flux.2 can be found [here](https://huggingface.co/black-forest-labs). Original inference code can be found [here](https://github.com/black-forest-labs/flux2).

> [!TIP]
> Flux.2 can be quite expensive to run on consumer hardware. However, you can apply a suite of optimizations to run it faster and in a more memory-friendly manner. Check out [this section](https://huggingface.co/blog/sd3#memory-optimizations-for-sd3) for more details. Additionally, Flux.2 can benefit from quantization for memory efficiency, with a trade-off in inference latency. Refer to [this blog post](https://huggingface.co/blog/quanto-diffusers) to learn more.
>
> [Caching](../../optimization/cache) may also speed up inference by storing and reusing intermediate outputs.
## Caption upsampling

Flux.2 can potentially generate better outputs with better prompts. You can "upsample" an input prompt by setting the `caption_upsample_temperature` argument in the pipeline call. The [official implementation](https://github.com/black-forest-labs/flux2/blob/5a5d316b1b42f6b59a8c9194b77c8256be848432/src/flux2/text_encoder.py#L140) recommends setting this value to 0.15.
## Flux2Pipeline

[[autodoc]] Flux2Pipeline
- all
- __call__
