ComfyUI Tutorial: Generate Your First Image & Understand Core Nodes

In this tutorial, you will learn how to generate your first image in ComfyUI and understand the core nodes that power all workflows. By the end, you’ll know how to customize your own pipelines and reproduce consistent results.


Step 1: Load a Text-to-Image Workflow

In the new ComfyUI version, you can start with the built-in templates:

Template → Getting Started → Text-to-Image Workflow (New)

Tip: If you don’t see this template, update your ComfyUI to the latest version. Alternatively, you can use the purple Image-to-Image workflow; the steps are similar.

[Image: templates]


Step 2: Resolve Missing Model Errors

On first launch, ComfyUI usually shows a “missing model” error. This is normal: ComfyUI does not ship with model files by default.

How to fix it:

  1. Click the Download button next to the missing model.
  2. If your GPU has 8 GB of VRAM or less, use the FP8 version of the model (covered later in this tutorial).
  3. Wait for ComfyUI to download the required files automatically.

[Image: missing-model-error]

Notes:

  • If auto-download fails, manually place the models in the correct folders:
    • Diffusion models → ComfyUI/models/diffusion_models
    • CLIP models → ComfyUI/models/clip
    • VAE models → ComfyUI/models/vae

[Image: download-models]


Step 3: Run Your Workflow

  1. Select the Diffusion model
  2. Select the VAE model
  3. Select the CLIP model

Click the blue “Run” button to generate your first image.


Step 4: Access Generated Images

ComfyUI saves images automatically in two locations:

  • Assets panel in the left-side navigator
  • Local folder: ComfyUI/output

Quick Tips for Beginners

  • Run button is grayed out: Check that all required models are loaded and there are no red errors.
  • Speed: NVIDIA GPUs with CUDA are fastest. AMD GPUs need ROCm. CPU-only generation is very slow.
  • FP8 models: Recommended for low VRAM GPUs. They reduce memory usage without a big loss in quality.

Understanding Core Nodes

Now that you’ve generated your first image, let’s explore core nodes and their functions.

Nodes are the rectangular blocks on the canvas. Each node contains:

  • Inputs: Data or parameters (prompts, model, vectors)
  • Outputs: Processed data sent to other nodes
  • Parameters: Settings like image resolution or CFG

Nodes are connected using colored lines to form a pipeline.


Core Nodes and Their Functions

  • Load Diffusion Model: Loads the main diffusion model used for image generation.
  • Load CLIP Model: Loads the CLIP text encoder that converts prompts into vectors.
  • Load VAE Model: Loads the VAE decoder for turning latent vectors into visible images.
  • EmptySD3LatentImage: Creates a blank latent canvas. Set width/height (e.g., 512×512, 1024×1024).
  • CLIP Text Encoder (Prompt): Converts human-readable prompts into numeric vectors for the model.
  • ConditioningZeroOut: Provides an empty negative-prompt input when required.
  • ModelSamplingAuraFlow: Adjusts the model’s sampling schedule for AuraFlow-style models.
  • KSampler: Runs the iterative denoising process that generates the image latent.
  • VAE Decode: Converts latent vectors into visible images.
  • Save Images: Saves the final image locally.

How Nodes Work Together (Simplified)

  1. Load Diffusion Model → provides the AI model
  2. Load CLIP Model → provides the text encoder
  3. Load VAE Model → provides the decoder
  4. CLIP Text Encoder → converts prompts to vectors
  5. Empty Latent Image → creates blank latent canvas
  6. KSampler → generates latent vectors
  7. VAE Decoder → converts latent vectors to an image
  8. Save Images → stores the image locally
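The data flow above can be sketched as plain Python function composition. Everything here is an illustrative placeholder to show the order in which nodes pass data, not ComfyUI’s actual API:

```python
# Illustrative sketch of the text-to-image pipeline.
# All functions are placeholders, not ComfyUI's real node implementations.

def clip_encode(prompt):
    # CLIP Text Encoder: prompt -> conditioning vectors (stand-in embedding)
    return [float(ord(c)) for c in prompt]

def empty_latent(width, height):
    # Empty Latent Image: blank latent canvas (latents are 8x smaller than pixels)
    return [[0.0] * (width // 8) for _ in range(height // 8)]

def ksampler(model, conditioning, latent, steps=20):
    # KSampler: iteratively denoises the latent, guided by the conditioning
    for _ in range(steps):
        latent = model(latent, conditioning)
    return latent

def vae_decode(latent):
    # VAE Decode: latent vectors -> visible image pixels (clamped to 0..1)
    return [[max(0.0, min(1.0, v)) for v in row] for row in latent]

# Wire the "nodes" together, mirroring steps 1-8 above:
model = lambda latent, cond: latent            # stand-in diffusion model
cond = clip_encode("a cat in the snow")        # step 4
latent = empty_latent(512, 512)                # step 5
latent = ksampler(model, cond, latent)         # step 6
image = vae_decode(latent)                     # step 7
print(len(image), len(image[0]))               # 64 64
```

The key point is the shape of the graph: the three loaders feed the sampler, and only the VAE turns the latent back into pixels.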

Dive Deeper: KSampler Node

Seed

  • Determines starting noise.
  • Same seed → reproducible image.
  • Increment/Decrement → small variations for creative exploration.
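The reproducibility property is just seeded random number generation. A minimal sketch using Python’s standard `random` module (the real starting noise is a seeded tensor, but the principle is identical):

```python
import random

def starting_noise(seed, n=4):
    # Same seed -> identical starting noise -> reproducible image
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

a = starting_noise(42)
b = starting_noise(42)   # same seed, same noise
c = starting_noise(43)   # "increment": nearby seed, different noise

print(a == b)  # True  (reproducible)
print(a == c)  # False (small seed change -> a different image)
```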

Steps

  • Number of denoising iterations per image.
  • Higher steps → more detail, slower generation.
  • Lower steps → faster, less detail.
  • Modern models may generate high-quality images with fewer steps.

CFG (Classifier-Free Guidance)

  • Controls how strongly the model follows your prompt.
  • High CFG → closely matches prompt
  • Low CFG → more creative variation
  • Distilled models may behave differently; experimentation is key.
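Under the hood, CFG blends two noise predictions: one with your prompt and one without. A minimal numeric sketch of that blend (the values are placeholders, not real model outputs):

```python
def cfg_combine(uncond, cond, cfg_scale):
    # Classifier-free guidance: push the prediction away from the
    # unconditional result and toward the prompt-conditioned one.
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]

uncond = [0.0, 0.5]    # noise prediction with an empty prompt (placeholder)
cond   = [1.0, 0.25]   # noise prediction with your prompt (placeholder)

print(cfg_combine(uncond, cond, 1.0))  # [1.0, 0.25]  -> follows cond exactly
print(cfg_combine(uncond, cond, 7.0))  # [7.0, -1.25] -> strongly exaggerated
```

At cfg_scale = 1 the result is just the conditioned prediction; larger values exaggerate the difference, which is why very high CFG can over-saturate or distort images.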

Sampler

  • Defines denoising algorithm (e.g., Euler, Euler Ancestral, DPM++ 2M SDE, LCM, Turbo).
  • Affects speed, stability, and detail quality.

Scheduler

  • Controls noise progression per step.
  • A popular choice is Karras, which is stable and reliable.

Denoise

  • Strength of regeneration, from 0 to 1.
  • 1 → full generation from noise (text-to-image).
  • 0 → no change to the input.
  • Between 0 and 1 → partial redraw (image-to-image, refinement, or style transfer).
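One way to picture denoise strength is as a mix between the source latent and fresh noise, where the sampler only redraws what was noised. This is a simplification of what real schedulers do, with placeholder values:

```python
def apply_denoise(source_latent, noise, denoise):
    # denoise = 1.0: start from pure noise (text-to-image)
    # denoise = 0.0: keep the source latent untouched
    # in between:    partially re-noise, so the sampler redraws only that much
    return [(1.0 - denoise) * s + denoise * n
            for s, n in zip(source_latent, noise)]

src   = [0.5, 0.5, 0.5]    # latent of an existing image (placeholder)
noise = [1.0, -1.0, 0.0]   # fresh random noise (placeholder)

print(apply_denoise(src, noise, 0.0))  # [0.5, 0.5, 0.5]   -> unchanged
print(apply_denoise(src, noise, 1.0))  # [1.0, -1.0, 0.0]  -> full regeneration
print(apply_denoise(src, noise, 0.5))  # [0.75, -0.25, 0.25] -> partial redraw
```

This is why image-to-image workflows typically use denoise values around 0.4 to 0.7: enough to restyle, not enough to lose the original composition.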

FAQ

Q: What models are required for ComfyUI?
A: Diffusion, CLIP, and VAE models. Auto-download works for most templates, or place them manually in the ComfyUI/models folder.

Q: Why is the Run button disabled?
A: Make sure all required models are loaded and no red error messages are present.

Q: Can I reproduce the same image?
A: Yes, set the same seed in KSampler. Use Increment/Decrement for slight variations.

Q: Can I run ComfyUI without an NVIDIA GPU?
A: Yes, but CPU-only generation is very slow. AMD GPUs require a ROCm setup.