Description
Feature Summary
Add a toggle that keeps each model in memory only while it is needed.
Detailed Description
Hello! As you may know, Apple Silicon MacBooks have a unified memory architecture. This means that moving models out of VRAM doesn't free anything: when the models "return to RAM" on these devices, they still occupy the same unified memory.
Given how fast the SSDs in these machines are, it would be great to be able to load the CLIP models, encode the text, and unload CLIP; then load, for example, WAN, create the latent image, and unload WAN; and finally load the VAE and generate the output.
Unloading models completely would be a game changer for MacBooks. Instead of needing all your models combined, plus their contexts, to fit in the available memory, your only limit would be the current model, its context, and the input from the previous model.
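To make the memory argument concrete, here is a minimal C++ sketch. It does not touch the real codebase: `MemoryTracker`, `ScopedModel`, `StageCost` and the unit sizes are hypothetical stand-ins that only simulate allocations. It compares the peak footprint when every model stays resident with loading each model right before its stage and freeing it (through RAII scope) right after.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for unified memory usage: tracks current and peak bytes.
struct MemoryTracker {
    std::size_t current = 0;
    std::size_t peak = 0;
    void alloc(std::size_t n) { current += n; peak = std::max(peak, current); }
    void release(std::size_t n) { assert(n <= current); current -= n; }
};

// Hypothetical model handle: constructing it "loads" the model, destroying it unloads it.
class ScopedModel {
public:
    ScopedModel(MemoryTracker& mem, std::size_t bytes) : mem_(mem), bytes_(bytes) { mem_.alloc(bytes_); }
    ~ScopedModel() { mem_.release(bytes_); }
    ScopedModel(const ScopedModel&) = delete;
    ScopedModel& operator=(const ScopedModel&) = delete;
private:
    MemoryTracker& mem_;
    std::size_t bytes_;
};

// One pipeline stage (e.g. CLIP, WAN, VAE), in arbitrary units.
struct StageCost {
    std::size_t model_bytes;   // weights + working context
    std::size_t output_bytes;  // tensor handed to the next stage
};

// Current behaviour on unified memory: every model stays resident for the whole run.
std::size_t peak_keep_all(const std::vector<StageCost>& stages) {
    MemoryTracker mem;
    std::vector<std::unique_ptr<ScopedModel>> models;
    for (const auto& s : stages) models.push_back(std::make_unique<ScopedModel>(mem, s.model_bytes));
    std::size_t carried = 0;  // output of the previous stage
    for (const auto& s : stages) {
        mem.alloc(s.output_bytes);  // run the stage
        mem.release(carried);       // previous output has been consumed
        carried = s.output_bytes;
    }
    return mem.peak;
}

// Requested behaviour: load a model just before its stage, unload it right after.
std::size_t peak_load_on_demand(const std::vector<StageCost>& stages) {
    MemoryTracker mem;
    std::size_t carried = 0;
    for (const auto& s : stages) {
        ScopedModel model(mem, s.model_bytes);  // load from SSD
        mem.alloc(s.output_bytes);              // run the stage
        mem.release(carried);                   // previous output has been consumed
        carried = s.output_bytes;
    }  // `model` is destroyed here, so it is unloaded before the next stage loads
    return mem.peak;
}
```

With three hypothetical stages costing {4, 10, 1} units of weights plus context and producing {1, 2, 5} units of output, keeping everything resident peaks at 22 units, while on-demand loading peaks at 13: the largest single stage plus its input and output.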
I would tackle this myself, but I'm afraid I don't really use or know C++. You can still point me in the right direction, and I can try to do something on my own if nobody else has time for this.
Alternatives you considered
No response
Additional context
No response