[Bug] Wan2.2 T2V run fails with Vulkan

### Git commit

8f6c5c217b1f6f27a8aa5fb78d3390fa849fc96a
Version: https://github.com/leejet/stable-diffusion.cpp/releases/tag/master-348-8f6c5c2


### Operating System & Version

Windows 10 22H2 (build 19045.4717)

### GGML backends

Vulkan

### Command-line arguments used

./sd.exe -M vid_gen --diffusion-model ./Wan2.2-TI2V-5B-Q8_0.gguf --vae ./wan2.2_vae.safetensors --t5xxl ./umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作 品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG 压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走" -W 480 -H 832 --diffusion-fa --offload-to-cpu --video-frames 33 --flow-shift 3.0 -v

### Steps to reproduce

- Run the command from "Command-line arguments used" above.

Sampling completes, but VAE decoding then fails with these errors:

[ERROR] ggml_extend.hpp:75   - ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 21773256964
[ERROR] ggml_extend.hpp:1588 - wan_vae: failed to allocate the compute buffer
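
For scale, the requested buffer can be converted to GiB and compared with the card's memory. This is only a rough sketch: it assumes the "20G" of VRAM means 20 GiB, and it ignores the model weights that are already resident on the GPU at that point.

```python
GIB = 1024 ** 3

# Size taken from the ggml_gallocr_reserve_n error message.
requested_bytes = 21_773_256_964
# Assumption: the card's "20G" of VRAM is treated as 20 GiB.
vram_gib = 20

requested_gib = requested_bytes / GIB
print(f"requested compute buffer: {requested_gib:.2f} GiB")
print(f"larger than VRAM: {requested_gib > vram_gib}")
```

Under that assumption the VAE compute buffer alone is larger than the dedicated VRAM, before counting the 1344.24 MB of VAE weights.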

### What you expected to happen

The video is generated successfully.

### What actually happened

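Sampling finishes, but the VAE decode step cannot allocate its Vulkan compute buffer (21,773,256,964 bytes requested, about 20.3 GiB), and the process then exits with a segmentation fault.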

### Logs / error messages / stack trace

```log
➜  vulkan ./sd.exe -M vid_gen --diffusion-model ./Wan2.2-TI2V-5B-Q8_0.gguf --vae ./wan2.2_vae.safetensors --t5xxl ./umt5-xxl-encoder-Q8_0.gguf -p "a lovely cat" --cfg-scale 6.0 --sampling-method euler -v -n "色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作 品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG 压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走" -W 480 -H 832 --diffusion-fa --offload-to-cpu --video-frames 33 --flow-shift 3.0 -v
Option:
    n_threads:                         12
    mode:                              vid_gen
    model_path:
    wtype:                             unspecified
    clip_l_path:
    clip_g_path:
    clip_vision_path:
    t5xxl_path:                        ./umt5-xxl-encoder-Q8_0.gguf
    qwen2vl_path:
    qwen2vl_vision_path:
    diffusion_model_path:              ./Wan2.2-TI2V-5B-Q8_0.gguf
    high_noise_diffusion_model_path:
    vae_path:                          ./wan2.2_vae.safetensors
    taesd_path:
    esrgan_path:
    control_net_path:
    embedding_dir:
    photo_maker_path:
    pm_id_images_dir:
    pm_id_embed_path:
    pm_style_strength:                 20.00
    output_path:                       output.png
    init_image_path:
    end_image_path:
    mask_image_path:
    control_image_path:
    ref_images_paths:
    control_video_path:
    auto_resize_ref_image:             true
    increase_ref_index:                false
    offload_params_to_cpu:             true
    clip_on_cpu:                       false
    control_net_cpu:                   false
    vae_on_cpu:                        false
    diffusion flash attention:         true
    diffusion Conv2d direct:           false
    vae_conv_direct:                   false
    control_strength:                  0.90
    prompt:                            a lovely cat
    negative_prompt:                   色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作 品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG 压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走
    clip_skip:                         -1
    width:                             480
    height:                            832
    sample_params:                     (txt_cfg: 6.00, img_cfg: 6.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: euler, sample_steps: 20, eta: 0.00, shifted_timestep: 0)
    high_noise_sample_params:          (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 3, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: default, sample_method: default, sample_steps: -1, eta: 0.00, shifted_timestep: 0)
    moe_boundary:                      0.875
    prediction:                        default
    flow_shift:                        3.00
    strength(img2img):                 0.75
    rng:                               cuda
    seed:                              42
    batch_count:                       1
    vae_tiling:                        false
    force_sdxl_vae_conv_scale:         false
    upscale_repeats:                   1
    chroma_use_dit_mask:               true
    chroma_use_t5_mask:                false
    chroma_t5_mask_pad:                1
    video_frames:                      33
    vace_strength:                     1.00
    fps:                               16
System Info:
    SSE3 = 1
    AVX = 1
    AVX2 = 1
    AVX512 = 0
    AVX512_VBMI = 0
    AVX512_VNNI = 0
    FMA = 1
    NEON = 0
    ARM_FMA = 0
    F16C = 1
    FP16_VA = 0
    WASM_SIMD = 0
    VSX = 0
[DEBUG] stable-diffusion.cpp:147  - Using Vulkan backend
[DEBUG] ggml_extend.hpp:66   - ggml_vulkan: Found 1 Vulkan devices:
[DEBUG] ggml_extend.hpp:66   - ggml_vulkan: 0 = AMD Radeon RX 7900 XT (AMD proprietary driver) | uma: 0 | fp16: 1 | bf16: 1 | warp size: 64 | shared memory: 32768 | int dot: 1 | matrix cores: KHR_coopmat
[INFO ] stable-diffusion.cpp:203  - loading diffusion model from './Wan2.2-TI2V-5B-Q8_0.gguf'
[INFO ] model.cpp:1001 - load ./Wan2.2-TI2V-5B-Q8_0.gguf using gguf format
[DEBUG] model.cpp:1018 - init from './Wan2.2-TI2V-5B-Q8_0.gguf'
[ERROR] ggml_extend.hpp:75   - gguf_init_from_file_impl: tensor 'patch_embedding.weight' has invalid number of dimensions: 5 > 4
[ERROR] ggml_extend.hpp:75   - gguf_init_from_file_impl: failed to read tensor info
[ERROR] model.cpp:1027 - failed to open './Wan2.2-TI2V-5B-Q8_0.gguf' with gguf_init_from_file. Try to open it with GGUFReader.
[DEBUG] gguf_reader.hpp:198  - GGUF v3, tensor_count=825, metadata_kv_count=3
[DEBUG] model.cpp:1739 - patch_embedding_channels 147456
[INFO ] stable-diffusion.cpp:243  - loading t5xxl from './umt5-xxl-encoder-Q8_0.gguf'
[INFO ] model.cpp:1001 - load ./umt5-xxl-encoder-Q8_0.gguf using gguf format
[DEBUG] model.cpp:1018 - init from './umt5-xxl-encoder-Q8_0.gguf'
[INFO ] stable-diffusion.cpp:264  - loading vae from './wan2.2_vae.safetensors'
[INFO ] model.cpp:1004 - load ./wan2.2_vae.safetensors using safetensors format
[DEBUG] model.cpp:1109 - init from './wan2.2_vae.safetensors', prefix = 'vae.'
[DEBUG] model.cpp:1739 - patch_embedding_channels 147456
[INFO ] stable-diffusion.cpp:285  - Version: Wan 2.2 TI2V
[INFO ] stable-diffusion.cpp:312  - Weight type stat:                      f32: 74   |     f16: 720  |    q8_0: 469
[INFO ] stable-diffusion.cpp:313  - Conditioner weight type stat:          f32: 73   |    q8_0: 169
[INFO ] stable-diffusion.cpp:314  - Diffusion model weight type stat:      f32: 1    |     f16: 524  |    q8_0: 300
[INFO ] stable-diffusion.cpp:315  - VAE weight type stat:                  f16: 196
[DEBUG] stable-diffusion.cpp:317  - ggml tensor size = 400 bytes
[INFO ] wan.hpp:2123 - Wan2.2-TI2V-5B
[INFO ] stable-diffusion.cpp:451  - Using flash attention in the diffusion model
[DEBUG] ggml_extend.hpp:1783 - t5 params backend buffer size =  5757.05 MB(RAM) (242 tensors)
[DEBUG] ggml_extend.hpp:1783 - Wan2.2-TI2V-5B params backend buffer size =  5153.43 MB(RAM) (825 tensors)
[DEBUG] ggml_extend.hpp:1783 - wan_vae params backend buffer size =  1344.24 MB(RAM) (196 tensors)
[DEBUG] stable-diffusion.cpp:592  - loading weights
[DEBUG] model.cpp:1920 - using 12 threads for model loading
[DEBUG] model.cpp:1942 - loading tensors from ./Wan2.2-TI2V-5B-Q8_0.gguf
  |================================>                 | 825/1263 - 428.57it/s
[DEBUG] model.cpp:1942 - loading tensors from ./umt5-xxl-encoder-Q8_0.gguf
  |==========================================>       | 1067/1263 - 219.91it/s
[DEBUG] model.cpp:1942 - loading tensors from ./wan2.2_vae.safetensors
  |==================================================| 1263/1263 - 230.52it/s
[INFO ] model.cpp:2151 - loading tensors completed, taking 5.48s (process: 0.00s, read: 4.12s, memcpy: 0.00s, convert: 0.00s, copy_to_backend: 0.00s)
[INFO ] stable-diffusion.cpp:690  - total params memory size = 12254.72MB (VRAM 12254.72MB, RAM 0.00MB): text_encoders 5757.05MB(VRAM), diffusion_model 5153.43MB(VRAM), vae 1344.24MB(VRAM), controlnet 0.00MB(VRAM), pmid 0.00MB(VRAM)
[INFO ] stable-diffusion.cpp:769  - running in FLOW mode
[DEBUG] stable-diffusion.cpp:801  - finished loaded file
[INFO ] stable-diffusion.cpp:2745 - generate_video 480x832x33
[INFO ] stable-diffusion.cpp:947  - attempting to apply 0 LoRAs
[INFO ] stable-diffusion.cpp:967  - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:968  - prompt after extract and remove lora: "a lovely cat"
[DEBUG] conditioner.hpp:1415 - parse 'a lovely cat' to [['a lovely cat', 1], ]
[DEBUG] t5.hpp:402  - token length: 512
[INFO ] ggml_extend.hpp:1698 - t5 offload params (5757.05 MB, 242 tensors) to runtime backend (Vulkan0), taking 1.35s
[DEBUG] ggml_extend.hpp:1598 - t5 compute buffer size: 297.00 MB(VRAM)
[DEBUG] conditioner.hpp:1515 - computing condition graph completed, taking 1728 ms
[DEBUG] conditioner.hpp:1415 - parse '色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作 品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG 压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走' to [['色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作 品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG 压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走', 1], ]
[DEBUG] t5.hpp:402  - token length: 512
[INFO ] ggml_extend.hpp:1698 - t5 offload params (5757.05 MB, 242 tensors) to runtime backend (Vulkan0), taking 1.28s
[DEBUG] ggml_extend.hpp:1598 - t5 compute buffer size: 297.00 MB(VRAM)
[DEBUG] conditioner.hpp:1515 - computing condition graph completed, taking 1667 ms
[INFO ] stable-diffusion.cpp:2999 - get_learned_condition completed, taking 3412 ms
[DEBUG] stable-diffusion.cpp:3055 - sample 30x52x9
[INFO ] ggml_extend.hpp:1698 - Wan2.2-TI2V-5B offload params (5153.43 MB, 825 tensors) to runtime backend (Vulkan0), taking 1.69s
[DEBUG] ggml_extend.hpp:1598 - Wan2.2-TI2V-5B compute buffer size: 335.35 MB(VRAM)
  |==================================================| 20/20 - 4.17s/it
[INFO ] stable-diffusion.cpp:3082 - sampling completed, taking 83.52s
[INFO ] stable-diffusion.cpp:3103 - generating latent video completed, taking 84.18s
[INFO ] ggml_extend.hpp:1698 - wan_vae offload params (1344.24 MB, 196 tensors) to runtime backend (Vulkan0), taking 0.23s
ggml_vulkan: Device memory allocation of size 2760376320 failed.
ggml_vulkan: Requested buffer size exceeds device buffer size limit: ErrorOutOfDeviceMemory
[ERROR] ggml_extend.hpp:75   - ggml_gallocr_reserve_n: failed to allocate Vulkan0 buffer of size 21773256964
[ERROR] ggml_extend.hpp:1588 - wan_vae: failed to allocate the compute buffer
[1]    1506 segmentation fault  ./sd.exe -M vid_gen --diffusion-model ./Wan2.2-TI2V-5B-Q8_0.gguf --vae    -p
```
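
The log reports `generate_video 480x832x33` followed by `sample 30x52x9`. Those figures are consistent with a 16x spatial and 4x temporal compression in the VAE. The sketch below only checks that arithmetic: the factors are inferred from the log, not taken from the source code, and `latent_shape` is a hypothetical helper name.

```python
def latent_shape(width, height, frames, spatial=16, temporal=4):
    """Hypothetical helper: map a video size to a latent size.

    The compression factors are assumptions inferred from the log output.
    """
    latent_w = width // spatial
    latent_h = height // spatial
    # The first frame is kept; the remaining frames are grouped by `temporal`.
    latent_t = (frames - 1) // temporal + 1
    return latent_w, latent_h, latent_t

print(latent_shape(480, 832, 33))
```

If that reading is right, the VAE has to expand 9 latent frames back to 33 frames at 480x832, which may explain why its compute buffer is so much larger than the 335.35 MB the diffusion model needs.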

### Additional context / environment details

- CPU: AMD Ryzen 9 5900X
- GPU: AMD Radeon RX 7900 XT (20 GB VRAM, 32 GB shared memory)

