[Bug] Wan2.2 T2V run fails with Vulkan #942

@wszgrcy

Git commit

8f6c5c2
Release: https://github.com/leejet/stable-diffusion.cpp/releases/tag/master-348-8f6c5c2

Operating System & Version

Windows 10 22H2 (build 19045.4717)

GGML backends

Vulkan

Command-line arguments used

./sd.exe -M vid_gen --diffusion-model ./Wan2.2-TI2V-5B-Q8_0.gguf --vae ./wan2.2_vae.safetensors --t5xxl ./umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作 品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG 压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 480 -H 832 --diffusion-fa --offload-to-cpu --video-frames 33 --flow-shift 3.0 -v

Steps to reproduce

  • Run the command shown under "Command-line arguments used".

The run fails with the following error:

[ERROR] ggml_extend.hpp:75 - ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 21773256964
[ERROR] ggml_extend.hpp:1588 - wan_vae: failed to allocate the compute buffer

What you expected to happen

The video is generated successfully.

What actually happened

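Sampling completes, but the VAE stage fails to allocate its Vulkan compute buffer and the process then crashes with a segmentation fault.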

Logs / error messages / stack trace

➜  vulkan ./sd.exe -M vid_gen --diffusion-model ./Wan2.2-TI2V-5B-Q8_0.gguf --vae ./wan2.2_vae.safetensors --t5xxl ./umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作 品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG 压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走" -W 480 -H 832 --diffusion-fa --offload-to-cpu --video-frames 33 --flow-shift 3.0 -v
Option:
    n_threads:                         12
    mode:                              vid_gen
    model_path:
    wtype:                             unspecified
    clip_l_path:
    clip_g_path:
    clip_vision_path:
    t5xxl_path:                        ./umt5-xxl-encoder-Q8_0.gguf
    qwen2vl_path:
    qwen2vl_vision_path:
    diffusion_model_path:              ./Wan2.2-TI2V-5B-Q8_0.gguf
    high_noise_diffusion_model_path:
    vae_path:                          ./wan2.2_vae.safetensors
    taesd_path:
    esrgan_path:
    control_net_path:
    embedding_dir:
    photo_maker_path:
    pm_id_images_dir:
    pm_id_embed_path:
    pm_style_strength:                 20.00
    output_path:                       output.png
    init_image_path:
    end_image_path:
    mask_image_path:
    control_image_path:
    ref_images_paths:
    control_video_path:
    auto_resize_ref_image:             true
    increase_ref_index:                false
    offload_params_to_cpu:             true
    clip_on_cpu:                       false
    control_net_cpu:                   false
    vae_on_cpu:                        false
    diffusion flash attention:         true
    diffusion Conv2d direct:           false
    vae_conv_direct:                   false
    control_strength:                  0.90
    prompt:                            a lovely cat
    negative_prompt:                   色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作 品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG 压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走
    clip_skip:                         -1
    width:                             480
    height:                            832
    sample_params:                     (txt_cfg: 6.00, img_cfg: 6.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: euler, sample_steps: 20, eta: 0.00, shifted_timestep: 0)
    high_noise_sample_params:          (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
    moe_boundary:                      0.875
    prediction:                        default
    flow_shift:                        3.00
    strength(img2img):                 0.75
    rng:                               cuda
    seed:                              42
    batch_count:                       1
    vae_tiling:                        false
    force_sdxl_vae_conv_scale:         false
    upscale_repeats:                   1
    chroma_use_dit_mask:               true
    chroma_use_t5_mask:                false
    chroma_t5_mask_pad:                1
    video_frames:                      33
    vace_strength:                     1.00
    fps:                               16
System Info:
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:147  - Using Vulkan backend
[DEBUG] ggml_extend.hpp:66   - ggml_vulkan: Found 1 Vulkan devices:
[DEBUG] ggml_extend.hpp:66   - ggml_vulkan: 0 = AMD Radeon RX 7900 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
[INFO ] stable-diffusion.cpp:203  - loading diffusion model from './Wan2.2-TI2V-5B-Q8_0.gguf'
[INFO ] model.cpp:1001 - load ./Wan2.2-TI2V-5B-Q8_0.gguf using gguf format
[DEBUG] model.cpp:1018 - init from './Wan2.2-TI2V-5B-Q8_0.gguf'
[ERROR] ggml_extend.hpp:75   - gguf_init_from_file_impl: tensor 'patch_embedding.weight' has invalid number of dimensions: 5 > 4
[ERROR] ggml_extend.hpp:75   - gguf_init_from_file_impl: failed to read tensor info
[ERROR] model.cpp:1027 - failed to open './Wan2.2-TI2V-5B-Q8_0.gguf' with gguf_init_from_file. Try to open it with GGUFReader.
[DEBUG] gguf_reader.hpp:198  - GGUF v3, tensor_count=825, metadata_kv_count=3
[DEBUG] model.cpp:1739 - patch_embedding_channels 147456
[INFO ] stable-diffusion.cpp:243  - loading t5xxl from './umt5-xxl-encoder-Q8_0.gguf'
[INFO ] model.cpp:1001 - load ./umt5-xxl-encoder-Q8_0.gguf using gguf format
[DEBUG] model.cpp:1018 - init from './umt5-xxl-encoder-Q8_0.gguf'
[INFO ] stable-diffusion.cpp:264  - loading vae from './wan2.2_vae.safetensors'
[INFO ] model.cpp:1004 - load ./wan2.2_vae.safetensors using safetensors format
[DEBUG] model.cpp:1109 - init from './wan2.2_vae.safetensors', prefix = 'vae.'
[DEBUG] model.cpp:1739 - patch_embedding_channels 147456
[INFO ] stable-diffusion.cpp:285  - Version: Wan 2.2 TI2V
[INFO ] stable-diffusion.cpp:312  - Weight type stat:                      f32: 74   |     f16: 720  |    q8_0: 469
[INFO ] stable-diffusion.cpp:313  - Conditioner weight type stat:          f32: 73   |    q8_0: 169
[INFO ] stable-diffusion.cpp:314  - Diffusion model weight type stat:      f32: 1    |     f16: 524  |    q8_0: 300
[INFO ] stable-diffusion.cpp:315  - VAE weight type stat:                  f16: 196
[DEBUG] stable-diffusion.cpp:317  - ggml tensor size = 400 bytes
[INFO ] wan.hpp:2123 - Wan2.2-TI2V-5B
[INFO ] stable-diffusion.cpp:451  - Using flash attention in the diffusion model
[DEBUG] ggml_extend.hpp:1783 - t5 params backend buffer size =  5757.05 MB(RAM) (242 tensors)
[DEBUG] ggml_extend.hpp:1783 - Wan2.2-TI2V-5B params backend buffer size =  5153.43 MB(RAM) (825 tensors)
[DEBUG] ggml_extend.hpp:1783 - wan_vae params backend buffer size =  1344.24 MB(RAM) (196 tensors)
[DEBUG] stable-diffusion.cpp:592  - loading weights
[DEBUG] model.cpp:1920 - using 12 threads for model loading
[DEBUG] model.cpp:1942 - loading tensors from ./Wan2.2-TI2V-5B-Q8_0.gguf
  |================================>                 | 825/1263 - 428.57it/s
[DEBUG] model.cpp:1942 - loading tensors from ./umt5-xxl-encoder-Q8_0.gguf
  |==========================================>       | 1067/1263 - 219.91it/s
[DEBUG] model.cpp:1942 - loading tensors from ./wan2.2_vae.safetensors
  |==================================================| 1263/1263 - 230.52it/s
[INFO ] model.cpp:2151 - loading tensors completed, taking 5.48s (process: 0.00s, read: 4.12s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:690  - total params memory size = 12254.72MB (VRAM 12254.72MB, RAM 0.00MB): text_encoders 5757.05MB(VRAM), diffusion_model 5153.43MB(VRAM), vae 1344.24MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:769  - running in FLOW mode
[DEBUG] stable-diffusion.cpp:801  - finished loaded file
[INFO ] stable-diffusion.cpp:2745 - generate_video 480x832x33
[INFO ] stable-diffusion.cpp:947  - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:967  - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:968  - prompt after extract and remove lora: "a lovely cat"
[DEBUG] conditioner.hpp:1415 - parse 'a lovely cat' to [['a lovely cat', 1], ]
[DEBUG] t5.hpp:402  - token length: 512
[INFO ] ggml_extend.hpp:1698 - t5 offload params (5757.05 MB, 242 tensors) to runtime backend (Vulkan0), taking 1.35s
[DEBUG] ggml_extend.hpp:1598 - t5 compute buffer size: 297.00 MB(VRAM)
[DEBUG] conditioner.hpp:1515 - computing condition graph completed, taking 1728 ms
[DEBUG] conditioner.hpp:1415 - parse '色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作 品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG 压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走' to [['色调艳丽,过曝,静态,细节模糊不清,字幕,风格,作 品,画作,画面,静止,整体发灰,最差质量,低质量,JPEG 压缩残留,丑陋的,残缺的,多余的手指,画得不好的手部,画得不好的脸部,畸形的,毁容的,形态畸形的肢体,手指融合,静止不动的画面,杂乱的背景,三条腿,背景人很多,倒着走', 1], ]
[DEBUG] t5.hpp:402  - token length: 512
[INFO ] ggml_extend.hpp:1698 - t5 offload params (5757.05 MB, 242 tensors) to runtime backend (Vulkan0), taking 1.28s
[DEBUG] ggml_extend.hpp:1598 - t5 compute buffer size: 297.00 MB(VRAM)
[DEBUG] conditioner.hpp:1515 - computing condition graph completed, taking 1667 ms
[INFO ] stable-diffusion.cpp:2999 - get_learned_condition completed, taking 3412 ms
[DEBUG] stable-diffusion.cpp:3055 - sample 30x52x9
[INFO ] ggml_extend.hpp:1698 - Wan2.2-TI2V-5B offload params (5153.43 MB, 825 tensors) to runtime backend (Vulkan0), taking 1.69s
[DEBUG] ggml_extend.hpp:1598 - Wan2.2-TI2V-5B compute buffer size: 335.35 MB(VRAM)
  |==================================================| 20/20 - 4.17s/it
[INFO ] stable-diffusion.cpp:3082 - sampling completed, taking 83.52s
[INFO ] stable-diffusion.cpp:3103 - generating latent video completed, taking 84.18s
[INFO ] ggml_extend.hpp:1698 - wan_vae offload params (1344.24 MB, 196 tensors) to runtime backend (Vulkan0), taking 0.23s
ggml_vulkan: Device memory allocation of size 2760376320 failed.
ggml_vulkan: Requested buffer size exceeds device buffer size limit: ErrorOutOfDeviceMemory
[ERROR] ggml_extend.hpp:75   - ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 21773256964
[ERROR] ggml_extend.hpp:1588 - wan_vae: failed to allocate the compute buffer
[1]    1506 segmentation fault  ./sd.exe -M vid_gen --diffusion-model ./Wan2.2-TI2V-5B-Q8_0.gguf --vae    -p

Additional context / environment details

CPU: AMD Ryzen 9 5900X
GPU: AMD Radeon RX 7900 XT (20 GB VRAM, 32 GB shared memory)
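
For scale, the byte counts in the log can be converted to GiB and compared with the card's 20 GB of VRAM. The sketch below only does that arithmetic. It also checks the latent size reported as "sample 30x52x9" against the requested 480x832x33, assuming 16x spatial and 4x temporal compression. Those factors are an assumption inferred from the log line, not taken from the stable-diffusion.cpp source, and the helper names are made up for illustration.

```python
# Sizes copied from the log above (bytes).
COMPUTE_BUFFER_BYTES = 21_773_256_964  # wan_vae compute buffer that failed to reserve
FAILED_CHUNK_BYTES = 2_760_376_320     # single device allocation that failed

GIB = 1024 ** 3


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


def latent_shape(width: int, height: int, frames: int,
                 spatial: int = 16, temporal: int = 4) -> tuple:
    """Latent (width, height, frames) under ASSUMED compression factors.

    spatial=16 and temporal=4 are assumptions chosen to match the
    'sample 30x52x9' line in the log.
    """
    return width // spatial, height // spatial, (frames - 1) // temporal + 1


if __name__ == "__main__":
    print(f"VAE compute buffer: {to_gib(COMPUTE_BUFFER_BYTES):.2f} GiB")  # 20.28 GiB
    print(f"Failed allocation:  {to_gib(FAILED_CHUNK_BYTES):.2f} GiB")    # 2.57 GiB
    print("Latent shape:", latent_shape(480, 832, 33))                    # (30, 52, 9)
```

The requested compute buffer (about 20.28 GiB) is by itself larger than the 20 GB of VRAM on this card, before counting the 1344.24 MB of VAE weights that were already offloaded to Vulkan0.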

Labels

bug (Something isn't working)