🖼️ AI Image generation

Marcos Junior edited this page Oct 3, 2025 · 7 revisions

Tools to run existing models

Sites to get existing workflows and LoRAs

Good models

  • Wan 2.2

    • I2V = Image to Video
    • T2V = Text to Video
  • Flux dev

    • T2I = Text to Image

Configurations that can really improve the generation

  • Steps = Number of denoising iterations.
    • The useful range depends on the model, but in general:
      • Low -> fast, but less detail.
      • Medium -> sweet spot.
      • High -> can improve realism, but diminishing returns (and slower).

Too high sometimes makes images look “overcooked.”
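The diminishing returns are easy to see in a toy sketch. This is not any real model's sampler, just an illustration: each "step" removes a fixed fraction of the remaining noise, so early steps help a lot and later ones barely move the result.

```python
import numpy as np

def denoise(noise, target, steps):
    """Toy denoising loop: each step removes 25% of the remaining
    error, mimicking how diffusion samplers refine iteratively."""
    x = noise.copy()
    for _ in range(steps):
        x += (target - x) / 4
    return x

rng = np.random.default_rng(0)
target = rng.standard_normal(8)  # stand-in for a "clean" image
noise = rng.standard_normal(8)   # stand-in for initial latent noise

for steps in (4, 20, 50):
    err = np.abs(denoise(noise, target, steps) - target).mean()
    print(steps, "steps -> mean error", round(err, 5))
```

Going from 4 to 20 steps shrinks the error dramatically; going from 20 to 50 barely changes it, which is the "diminishing returns" above.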

  • Sampler = The algorithm that controls how noise is removed step by step.

    • Euler a -> fast, creative, less deterministic.
    • DPM++ 2M Karras -> smooth, detailed, highly recommended.
    • DDIM -> older, predictable, fast for testing.
  • CFG = Classifier-Free Guidance

    • Used to control how strongly the model follows your text prompt versus just producing random plausible images.
    • The useful range depends on the model, but in general:
      • Low CFG -> The model has a lot of freedom.
        • Output may look more "natural" but not match your text closely.
      • Medium CFG -> Balanced. This is the sweet spot for most models.
      • High CFG -> The model is forced to follow the prompt exactly.
        • Can cause overbaked, distorted, or ugly results (like extra fingers or harsh edges).
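Under the hood, classifier-free guidance combines two predictions per step: one with the prompt and one without, then pushes the result toward the prompted one by the CFG scale. A minimal sketch with made-up numbers:

```python
import numpy as np

def apply_cfg(uncond_pred, cond_pred, cfg_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the prompt-conditioned one by cfg_scale."""
    return uncond_pred + cfg_scale * (cond_pred - uncond_pred)

uncond = np.array([0.0, 0.0])  # prediction without the prompt
cond = np.array([1.0, -1.0])   # prediction with the prompt

for scale in (1.0, 7.5, 20.0):
    print("cfg", scale, "->", apply_cfg(uncond, cond, scale))
```

At scale 1 you just get the conditioned prediction; higher scales extrapolate past it, which is exactly the "overbaked" effect high CFG causes.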

How generative models work

Next step