recent updates

2026-06-09 18:31:59 +02:00
parent 398b94965c
commit 94f7a1f72a
42 changed files with 8686 additions and 0 deletions
@@ -0,0 +1,100 @@
+# 01 — ComfyUI Setup
+
+ComfyUI is installed at `~/ComfyUI/` on the Arising Media workstation.
+Python venv is at `~/ComfyUI/venv/`.
+
+## Starting ComfyUI
+
+```bash
+tmux new-session -d -s comfyui \
+  "cd ~/ComfyUI && HSA_OVERRIDE_GFX_VERSION=10.3.0 venv/bin/python main.py --listen 0.0.0.0 --port 8188 2>&1 | tee ~/comfyui.log"
+```
+
+**Do NOT use `--cpu`.** The GPU is the integrated graphics of the AMD Ryzen 9 9950X
+(gfx1036, RDNA 2 iGPU) with 30,942 MB unified VRAM (shares system RAM).
+All models fit: FLUX (12GB), Wan 2.2 (3.2GB), T5-XXL (4.6GB).
+
+`HSA_OVERRIDE_GFX_VERSION=10.3.0` is required — gfx1036 (iGPU) is not in
+the PyTorch ROCm kernel list, but gfx1030 (RDNA 2 dGPU) is compatible.
+Without the override: `HIP error: invalid device function` on first compute op.
+
+Previous SOP said 2GB VRAM — that was wrong. It was reading the dedicated
+VRAM pool, not the full unified memory PyTorch allocates via ROCm.
+
+Verify it's up:
+```bash
+curl -s -o /dev/null -w "%{http_code}" http://localhost:8188/system_stats
+# should return 200 within 30 seconds
+```
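
The same readiness check can be scripted so a generation script waits for ComfyUI instead of failing on a cold start. This is a minimal sketch, assuming the `/system_stats` endpoint and port 8188 shown above; `http_status` and `wait_until_ready` are hypothetical helper names, not part of ComfyUI.

```python
import time
import urllib.error
import urllib.request


def http_status(url: str) -> int:
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, timeout, DNS failure, ...


def wait_until_ready(url, timeout_s=30.0, interval_s=1.0, probe=http_status):
    """Poll url until it answers 200 or timeout_s has elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_until_ready("http://localhost:8188/system_stats")` returns `True` once the server answers 200 and `False` if 30 seconds pass without one. The `probe` parameter exists only so the loop can be exercised without a running server.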
+
+Check the log for node load errors:
+```bash
+tmux attach -t comfyui
+```
+
+## Required custom nodes
+
+Both installed at `~/ComfyUI/custom_nodes/`:
+
+- `ComfyUI-GGUF` — loads GGUF quantized models (FLUX, Wan 2.2)
+- `ComfyUI-Detail-Daemon` — optional, detail enhancement
+
+If `ComfyUI-GGUF` fails to load, check for missing Python packages:
+```bash
+~/ComfyUI/venv/bin/pip install gguf sqlalchemy
+```
+
+## Known dependency gaps (fix if ComfyUI fails to start)
+
+```bash
+~/ComfyUI/venv/bin/pip install sqlalchemy gguf
+```
+
+Audio nodes (`nodes_audio.py`, `nodes_lt_audio.py`) will fail to import
+because `torchaudio` is not installed. This is safe to ignore — audio
+nodes are not used in this pipeline.
+
+## GPU note
+
+GPU: AMD Ryzen 9 9950X integrated graphics (gfx1036, RDNA 2 iGPU)
+Unified memory: 30,942 MB available to PyTorch via ROCm (shares system RAM)
+
+```bash
+# Verify ROCm sees the GPU
+~/ComfyUI/venv/bin/python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"
+# returns True / AMD Ryzen 9 9950X 16-Core Processor
+
+# Verify arch override works
+HSA_OVERRIDE_GFX_VERSION=10.3.0 ~/ComfyUI/venv/bin/python -c "
+import torch; x=torch.tensor([1.0]).cuda(); print('GPU OK:', x.device)
+"
+```
+
+gfx1036 requires `HSA_OVERRIDE_GFX_VERSION=10.3.0` — always set this env var
+before starting ComfyUI or running any Python that loads GPU tensors.
+Without it: `HIP error: invalid device function` immediately on first op.
+
+## Model folder structure
+
+```
+~/ComfyUI/models/
+├── unet/
+│   └── flux1-schnell-Q8_0.gguf          (12GB, FLUX image)
+├── clip/
+│   ├── clip_l.safetensors               (235MB, FLUX CLIP-L)
+│   ├── t5xxl_fp8_e4m3fn.safetensors     (4.6GB, FLUX T5-XXL)
+│   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors  (6.3GB, Wan text encoder)
+├── vae/
+│   ├── ae.safetensors                   (108MB, FLUX VAE)
+│   └── wan_2.1_vae.safetensors          (243MB, Wan VAE)
+└── diffusion_models/
+    └── Wan2.2-TI2V-5B-Q4_K_M.gguf      (3.2GB, Wan 2.2 video)
+```
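
As a rough cross-check of the "all models fit" claim, the file sizes in the listing can be summed per pipeline and compared with the unified memory figure. This is only a sketch: it assumes 1 GB = 1024 MB, and file size is a lower bound on memory use because activations and intermediate tensors need additional room.

```python
# Sizes in GB, copied from the model folder listing.
FLUX_STACK = {
    "flux1-schnell-Q8_0.gguf": 12.0,
    "t5xxl_fp8_e4m3fn.safetensors": 4.6,
    "clip_l.safetensors": 0.235,
    "ae.safetensors": 0.108,
}
WAN_STACK = {
    "Wan2.2-TI2V-5B-Q4_K_M.gguf": 3.2,
    "umt5_xxl_fp8_e4m3fn_scaled.safetensors": 6.3,
    "wan_2.1_vae.safetensors": 0.243,
}
UNIFIED_MEMORY_MB = 30_942  # figure reported to PyTorch via ROCm


def stack_gb(stack: dict) -> float:
    """Total on-disk size of one pipeline's model files, in GB."""
    return sum(stack.values())


def headroom_gb(stack: dict, total_mb: int = UNIFIED_MEMORY_MB) -> float:
    """Memory left after loading every file at once (assumes 1 GB = 1024 MB)."""
    return total_mb / 1024 - stack_gb(stack)


for name, stack in (("FLUX", FLUX_STACK), ("Wan 2.2", WAN_STACK)):
    print(f"{name}: {stack_gb(stack):.1f} GB of weights, "
          f"{headroom_gb(stack):.1f} GB headroom")
```

The two stacks are budgeted separately on the assumption that the image and video pipelines do not run at the same time.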
+
+## Stopping ComfyUI
+
+```bash
+tmux send-keys -t comfyui C-c
+# or kill the session:
+tmux kill-session -t comfyui
+```
@@ -0,0 +1,99 @@
+# 02 — FLUX.1 Schnell Image Pipeline
+
+## Why FLUX over SDXL
+
+FLUX is a 12B-parameter transformer model. SDXL (RealVisXL) is 3.5B.
+FLUX has significantly better:
+- Spatial depth and perspective (lens simulation)
+- Scene geometry (vanishing points, depth-of-field)
+- Prompt following (T5-XXL understands long, detailed prompts)
+
+SDXL was tested on lahrcarpetcleaning.com and rejected: flat angles, no depth,
+poor spatial coherence. FLUX replaced it entirely.
+
+## Model stack
+
+| File | Size | Notes |
+|---|---|---|
+| flux1-schnell-Q8_0.gguf | 12GB | GGUF Q8, needs ComfyUI-GGUF node |
+| t5xxl_fp8_e4m3fn.safetensors | 4.6GB | T5-XXL text encoder, fp8 quantized |
+| clip_l.safetensors | 235MB | CLIP-L, short prompt encoder |
+| ae.safetensors | 108MB | Official FLUX VAE from Black Forest Labs |
+
+## Download (one-time)
+
+FLUX GGUF (public, no auth):
+```bash
+wget "https://huggingface.co/city96/FLUX.1-schnell-gguf/resolve/main/flux1-schnell-Q8_0.gguf" \
+  -O ~/ComfyUI/models/unet/flux1-schnell-Q8_0.gguf
+
+wget "https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp8_e4m3fn.safetensors" \
+  -O ~/ComfyUI/models/clip/t5xxl_fp8_e4m3fn.safetensors
+
+wget "https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors" \
+  -O ~/ComfyUI/models/clip/clip_l.safetensors
+```
+
+FLUX VAE (gated — requires HF login and license acceptance):
+```bash
+hf auth login   # paste read token
+HF_TOKEN=$(cat ~/.cache/huggingface/token)
+wget --header="Authorization: Bearer $HF_TOKEN" \
+  "https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/ae.safetensors" \
+  -O ~/ComfyUI/models/vae/ae.safetensors
+```
+
+## ComfyUI workflow (what gen-images-flux.py sends)
+
+```
+UnetLoaderGGUF    → flux1-schnell-Q8_0.gguf
+DualCLIPLoader    → t5xxl_fp8_e4m3fn + clip_l (type=flux)
+VAELoader         → ae.safetensors
+CLIPTextEncode    → prompt
+EmptyLatentImage  → 1024×576, batch=1
+KSampler          → steps=4, cfg=1.0, euler, simple
+VAEDecode
+SaveImage
+```
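
In ComfyUI's API format, the node list above becomes a JSON object keyed by node id, where each link is a `[source_node_id, output_index]` pair. The following is a sketch of how such a graph could be built and queued, not the actual contents of `gen-images-flux.py`. The node ids, the `filename_prefix`, the `seed` handling, and the empty negative prompt wired into `KSampler` are assumptions.

```python
import json
import urllib.request

COMFYUI_URL = "http://localhost:8188"  # port from the start command


def build_flux_graph(prompt: str, seed: int = 0) -> dict:
    """Build an API-format graph mirroring the FLUX node list."""
    return {
        "1": {"class_type": "UnetLoaderGGUF",
              "inputs": {"unet_name": "flux1-schnell-Q8_0.gguf"}},
        "2": {"class_type": "DualCLIPLoader",
              "inputs": {"clip_name1": "t5xxl_fp8_e4m3fn.safetensors",
                         "clip_name2": "clip_l.safetensors",
                         "type": "flux"}},
        "3": {"class_type": "VAELoader",
              "inputs": {"vae_name": "ae.safetensors"}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["2", 0]}},
        # Assumption: KSampler needs a negative input, so an empty prompt is wired in.
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "", "clip": ["2", 0]}},
        "6": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 576, "batch_size": 1}},
        "7": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["4", 0],
                         "negative": ["5", 0], "latent_image": ["6", 0],
                         "seed": seed, "steps": 4, "cfg": 1.0,
                         "sampler_name": "euler", "scheduler": "simple",
                         "denoise": 1.0}},
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["7", 0], "vae": ["3", 0]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"images": ["8", 0], "filename_prefix": "flux"}},
    }


def queue_prompt(graph: dict, base_url: str = COMFYUI_URL) -> dict:
    """POST the graph to ComfyUI's /prompt endpoint and return its JSON reply."""
    body = json.dumps({"prompt": graph}).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With ComfyUI running, `queue_prompt(build_flux_graph("low-angle 35mm ..."))` queues one image. Building the graph separately from sending it makes the wiring easy to inspect before anything is queued.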
+
+## Settings
+
+| Setting | Value | Why |
+|---|---|---|
+| Steps | 4 | Schnell is distilled — 4 steps is optimal |
+| CFG | 1.0 | Distilled model, higher CFG degrades quality |
+| Sampler | euler | Best for FLUX |
+| Scheduler | simple | Matches FLUX training |
+| Negative prompt | none | Distilled model ignores it |
+| Resolution | 1024×576 | 16:9 hero format |
+
+## Running generation
+
+```bash
+# ComfyUI must be running first (see 01-comfyui-setup.md)
+cd /home/sirdrez/arisingmedia-websites/{domain}
+python3 tools/gen-images-flux.py 2>&1 | tee tools/flux-gen.log
+```
+
+Monitor:
+```bash
+tmux attach -t comfyui     # step progress bars
+tail -f tools/flux-gen.log  # per-image OK/FAIL
+```
+
+Speed: ~4 min/image, measured with CPU inference (`--cpu`) under the old 2GB VRAM
+assumption that 01-comfyui-setup.md has since corrected. At that rate, 28 images = ~1h50m.
+
+## After generation
+
+```bash
+python3 tools/convert-to-webp.py          # resize + convert to WebP
+find assets/images -name '*.jpg' -delete   # delete source JPGs at any depth (rm with ** needs bash globstar)
+docker compose build --no-cache web        # bake WebP into image
+docker compose up -d
+```
+
+Verify:
+```bash
+curl -s -o /dev/null -w "%{http_code}" http://localhost:{port}/assets/images/hero/hero-carpet-cleaning.webp
+# must return 200
+```
@@ -0,0 +1,159 @@
+# 03 — Wan 2.2 Video Pipeline (Image-to-Video)
+
+## Default policy: local generation
+
+Video generation is done locally with Wan 2.2 by default. Google Veo (via
+Vertex AI / Gemini API) is NOT used unless the client has explicit budget
+allocated for it. Reasons:
+
+- Google Veo costs money per second of video generated (billed per request)
+- Local Wan 2.2 is free after one-time model download (~10GB total)
+- Quality from Wan 2.2 at 832x480 is sufficient for hero reels
+- No API key, no quota limits, no vendor dependency
+
+Use Google Veo only when the client approves a paid media budget, or when the
+local workstation is unavailable and a deadline cannot wait for local generation.
+
+## Purpose
+
+Takes FLUX-generated hero stills and animates each into a 3-5 second clip.
+Clips are stitched with ffmpeg into a marketing reel for the hero section.
+
+## Model stack
+
+| File | Size | Notes |
+|---|---|---|
+| Wan2.2-TI2V-5B-Q4_K_M.gguf | 3.2GB | Text+Image to Video, 5B Q4 GGUF |
+| umt5_xxl_fp8_e4m3fn_scaled.safetensors | 6.3GB | UMT5-XXL text encoder, fp8 |
+| wan_2.1_vae.safetensors | 243MB | Wan VAE (compatible with 2.2) |
+
+## Download (one-time, all public)
+
+```bash
+# Wan 2.2 model
+wget "https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/resolve/main/Wan2.2-TI2V-5B-Q4_K_M.gguf" \
+  -O ~/ComfyUI/models/diffusion_models/Wan2.2-TI2V-5B-Q4_K_M.gguf
+
+# Text encoder
+wget "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors" \
+  -O ~/ComfyUI/models/clip/umt5_xxl_fp8_e4m3fn_scaled.safetensors
+
+# VAE
+wget "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors" \
+  -O ~/ComfyUI/models/vae/wan_2.1_vae.safetensors
+```
+
+## Critical: WanImageToVideo is a conditioning node, NOT a sampler
+
+This is the most important thing to understand about the Wan pipeline. The node
+name is misleading. `WanImageToVideo` does NOT run diffusion — it sets up the
+conditioning and empty latent. A separate `KSampler` runs the actual diffusion.
+
+Wrong mental model (what most tutorials imply):
+```
+LoadImage → WanImageToVideo → SaveAnimatedWEBP
+```
+
+Correct node graph:
+```
+UnetLoaderGGUF  ─────────────────────────────────────→ KSampler.model
+CLIPLoader ──→ CLIPTextEncode (positive) ─→ WanImageToVideo.positive ──→ KSampler.positive
+           └→ CLIPTextEncode (negative) ─→ WanImageToVideo.negative ──→ KSampler.negative
+VAELoader ──→ WanImageToVideo.vae                  WanImageToVideo.latent ──→ KSampler.latent_image
+LoadImage ──→ WanImageToVideo.start_image (optional)
+                                                   KSampler.samples ──→ VAEDecode ──→ SaveAnimatedWEBP
+```
+
+WanImageToVideo outputs three things (in order):
+- output[0] = positive CONDITIONING (enhanced with image)
+- output[1] = negative CONDITIONING
+- output[2] = latent LATENT (sized for video: width × height × frames)
+
+The `start_image` input (optional IMAGE) anchors the first frame. Without it,
+video starts from noise. Always pass it for image-to-video.
+
+## Workflow
+
+Correct ComfyUI API node graph (as sent by `gen-video-wan.py`):
+
+```
+node 1: UnetLoaderGGUF    → Wan2.2-TI2V-5B-Q4_K_M.gguf
+node 2: CLIPLoader        → umt5_xxl_fp8_e4m3fn_scaled.safetensors (type=wan)
+node 3: VAELoader         → wan_2.1_vae.safetensors
+node 4: LoadImage         → FLUX hero still (.webp)
+node 5: CLIPTextEncode    → motion prompt text (positive)
+node 6: CLIPTextEncode    → negative prompt text
+node 7: WanImageToVideo   → positive=[5,0], negative=[6,0], vae=[3,0],
+                            start_image=[4,0], width=832, height=480,
+                            length=25 (or 49), batch_size=1
+node 8: KSampler          → model=[1,0], positive=[7,0], negative=[7,1],
+                            latent_image=[7,2], steps=20, cfg=6.0,
+                            sampler_name=uni_pc, scheduler=simple, denoise=1.0
+node 9: VAEDecode         → samples=[8,0], vae=[3,0]
+node 10: SaveAnimatedWEBP → images=[9,0], fps=12
+```
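
Expressed in ComfyUI's API format, the ten nodes above map to a dictionary keyed by node id, with each link written as `[source_node_id, output_index]`. This is a sketch rather than the actual contents of `gen-video-wan.py`. The `seed`, `filename_prefix`, and the `lossless`/`quality`/`method` values on `SaveAnimatedWEBP` are assumptions, and `image_name` is assumed to be a file that `LoadImage` can already find in ComfyUI's input folder.

```python
def build_wan_graph(image_name: str, motion_prompt: str,
                    negative_prompt: str = "", length: int = 49,
                    seed: int = 0) -> dict:
    """Build an API-format graph using the node numbering from the list above."""
    return {
        "1": {"class_type": "UnetLoaderGGUF",
              "inputs": {"unet_name": "Wan2.2-TI2V-5B-Q4_K_M.gguf"}},
        "2": {"class_type": "CLIPLoader",
              "inputs": {"clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
                         "type": "wan"}},
        "3": {"class_type": "VAELoader",
              "inputs": {"vae_name": "wan_2.1_vae.safetensors"}},
        "4": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"text": motion_prompt, "clip": ["2", 0]}},
        "6": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative_prompt, "clip": ["2", 0]}},
        # Conditioning node: outputs positive (0), negative (1), latent (2).
        "7": {"class_type": "WanImageToVideo",
              "inputs": {"positive": ["5", 0], "negative": ["6", 0],
                         "vae": ["3", 0], "start_image": ["4", 0],
                         "width": 832, "height": 480,
                         "length": length, "batch_size": 1}},
        # The sampler consumes all three WanImageToVideo outputs.
        "8": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["7", 0],
                         "negative": ["7", 1], "latent_image": ["7", 2],
                         "seed": seed, "steps": 20, "cfg": 6.0,
                         "sampler_name": "uni_pc", "scheduler": "simple",
                         "denoise": 1.0}},
        "9": {"class_type": "VAEDecode",
              "inputs": {"samples": ["8", 0], "vae": ["3", 0]}},
        "10": {"class_type": "SaveAnimatedWEBP",
               "inputs": {"images": ["9", 0], "filename_prefix": "clip",
                          "fps": 12.0, "lossless": False,
                          "quality": 80, "method": "default"}},
    }
```

The key detail is node 8: its `positive`, `negative`, and `latent_image` inputs all come from node 7, not from the text encoders or an empty latent directly.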
+
+## Settings
+
+| Setting | Value |
+|---|---|
+| Resolution | 832×480 (16:9 ~480p) |
+| Frames | 49 (~4 seconds at 12fps) |
+| Steps | 20 |
+| CFG | 6.0 |
+| Sampler | uni_pc |
+
+**Frame count constraint:** `length` must follow the pattern 1, 5, 9, 13, 17, 21, 25, 29 ... (step of 4).
+ComfyUI enforces this. 49 is valid (1 + 4×12). 50 is not.
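
The constraint amounts to `(length - 1) % 4 == 0`. A small sketch for checking a frame count before queuing a job, or rounding it down to the nearest valid value; the function names are hypothetical:

```python
def is_valid_length(frames: int) -> bool:
    """True if frames fits the 1, 5, 9, 13, ... pattern (4k + 1)."""
    return frames >= 1 and (frames - 1) % 4 == 0


def snap_length(frames: int) -> int:
    """Round frames down to the nearest valid length (minimum 1)."""
    if frames < 1:
        return 1
    return frames - (frames - 1) % 4
```

For example, `snap_length(50)` gives 49, matching the valid count cited above, while 25 and 49 pass through unchanged.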
+
+**Speed on Arising Media workstation (measured with CPU inference, under the old 2GB VRAM assumption corrected in 01-comfyui-setup.md):**
+- ~415 seconds per diffusion step
+- 20 steps × 415s = ~2.3 hours per clip
+- 6 clips = ~14 hours total for a full reel
+- Use 25 frames (not 49) for test runs to halve generation time
+- Full reel generation: start before leaving for the day, check next morning
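
The estimate is simple multiplication, which makes it easy to redo when the step time changes. A sketch using the figures above as defaults (415 seconds per step is the measured value quoted here, not a constant):

```python
def clip_hours(steps: int = 20, seconds_per_step: float = 415.0) -> float:
    """Hours to generate one clip: steps x seconds per step."""
    return steps * seconds_per_step / 3600


def reel_hours(clips: int = 6, steps: int = 20,
               seconds_per_step: float = 415.0) -> float:
    """Hours to generate every clip in a reel, one after another."""
    return clips * clip_hours(steps, seconds_per_step)


print(f"per clip: {clip_hours():.1f} h, full reel: {reel_hours():.1f} h")
```

Re-measure `seconds_per_step` from the ComfyUI progress bar after any hardware or launch-flag change and pass it in to get an updated schedule.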
+
+**CLIPVision note:** No CLIPVision models are installed at `~/ComfyUI/models/clip_vision/`.
+The `clip_vision_output` input on WanImageToVideo is optional and currently unused.
+Image conditioning comes from `start_image` only (VAE-encoded first frame).
+This is sufficient for smooth motion — CLIPVision would add semantic image
+understanding but is not required.
+
+## Running video generation
+
+```bash
+# ComfyUI must be running, FLUX images must be converted to WebP first
+cd /home/sirdrez/arisingmedia-websites/{domain}
+python3 tools/gen-video-wan.py 2>&1 | tee tools/wan-gen.log
+```
+
+Output goes to `assets/videos/clips/` as `.webp` animation files.
+
+## Stitching the reel
+
+```bash
+# Convert WebP animations to MP4 first (if needed)
+for f in assets/videos/clips/*.webp; do
+  ffmpeg -y -i "$f" "${f%.webp}.mp4"
+done
+
+# Build the concat list from the MP4 clips (globs expand in sorted order)
+for f in assets/videos/clips/*.mp4; do
+  echo "file '$PWD/$f'"
+done > tools/clip-list.txt
+
+# Stitch without re-encoding
+ffmpeg -f concat -safe 0 -i tools/clip-list.txt -c copy assets/videos/hero/hero-reel-flux.mp4
+```
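
The concat list is a plain text file with one `file '<path>'` line per clip, in playback order. If the list is produced from a Python tool instead of the shell, a minimal sketch could look like this; `write_concat_list` is a hypothetical helper, and it assumes clip paths contain no single quotes:

```python
from pathlib import Path


def write_concat_list(clips_dir: str, list_path: str) -> list:
    """Write an ffmpeg concat list of every .mp4 in clips_dir, sorted by name."""
    clips = sorted(Path(clips_dir).resolve().glob("*.mp4"))
    # Absolute paths, matching the -safe 0 flag used by the stitch command.
    lines = [f"file '{clip}'" for clip in clips]
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return clips
```

Calling `write_concat_list("assets/videos/clips", "tools/clip-list.txt")` yields a list that the `ffmpeg -f concat` command can consume. Sorting by filename means names like `clip-01` through `clip-06` determine the reel order.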
+
+## Reel shot list (lahrcarpetcleaning.com)
+
+| Clip | Source still | Motion prompt |
+|---|---|---|
+| clip-01 | hero-carpet-cleaning | slow dolly forward across carpet |
+| clip-02 | hero-stairs | slow pan upward along staircase |
+| clip-03 | hero-upholstery | gentle push in toward sofa |
+| clip-04 | hero-commercial | tracking shot down lobby |
+| clip-05 | hero-floors | floor-level drift forward |
+| clip-06 | hero-clean-result | rack focus across carpet fibers |
+
+6 clips × ~4s = ~24 seconds total reel.
@@ -0,0 +1,105 @@
+# 04 — Prompt Guide (Interior / Carpet Photography)
+
+## The core pattern
+
+All image prompts follow this structure:
+
+```
+{camera angle} {lens} {subject description},
+{foreground detail} sharp in foreground, {background} receding into bokeh,
+{lighting description}, {style tag}, no people, ultra-realistic {type} photography
+```
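
One way to keep prompts consistent is to fill the pattern from named fields instead of writing each prompt freehand. A sketch, with a hypothetical `Shot` dataclass whose field names mirror the placeholders in the pattern:

```python
from dataclasses import dataclass


@dataclass
class Shot:
    camera_angle: str
    lens: str
    subject: str
    foreground: str
    background: str
    lighting: str
    style_tag: str
    photo_type: str


def build_prompt(shot: Shot) -> str:
    """Fill the core prompt pattern from the fields of one shot."""
    return (
        f"{shot.camera_angle} {shot.lens} {shot.subject}, "
        f"{shot.foreground} sharp in foreground, "
        f"{shot.background} receding into bokeh, "
        f"{shot.lighting}, {shot.style_tag}, no people, "
        f"ultra-realistic {shot.photo_type} photography"
    )
```

Because every field is required, a shot with a forgotten lens or lighting description fails at construction time instead of producing an underspecified prompt.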
+
+## Why this works
+
+FLUX.1 Schnell uses T5-XXL as the primary text encoder (4.6GB in fp8), which
+understands natural language photography concepts deeply. Specifying lens
+focal length, depth of field, and spatial relationships produces images with
+correct depth, perspective, and scene geometry.
+
+SDXL models lack this — their text encoders (CLIP-L/CLIP-G) top out at
+77 tokens and don't understand spatial concepts reliably.
+
+## Lens vocabulary
+
+| Lens | Effect | Use for |
+|---|---|---|
+| 24mm wide-angle | Strong perspective distortion, exaggerated depth | Corridors, lobbies, open spaces |
+| 35mm | Natural perspective, slight depth emphasis | Most interior shots |
+| 50mm prime | Near-natural perspective, shallow DoF | Close-ups, furniture, details |
+| macro | Extreme close-up, very shallow DoF | Carpet fiber detail, texture |
+
+## Camera position vocabulary
+
+- `low-angle` / `low 35mm angle` — camera near floor level, looking across surface
+- `floor level` — pressed to the floor, extreme low angle
+- `corner angle` — shot from room corner, wide coverage
+- `looking up` — camera below subject, looking upward
+- `looking down` — camera above, bird's-eye (avoid for carpet — looks flat)
+
+## Depth of field vocabulary
+
+- `shallow depth of field` — subject sharp, background blurred
+- `razor sharp in foreground ... receding into bokeh` — specific foreground/background split
+- `raking light` — light hitting surface at low angle, reveals texture
+- `vanishing point perspective` — strong linear convergence (corridors, offices)
+
+## Lighting vocabulary
+
+- `warm afternoon window light` — residential, golden hour feel
+- `raking natural light` — reveals carpet texture and pile height
+- `recessed ceiling lights creating depth` — commercial/corporate
+- `warm wall sconces` — hotel corridors
+- `crisp morning light` — bedrooms, bright and clean
+
+## Full prompt examples
+
+Carpet hero (residential):
+```
+low-angle 35mm lens perspective looking across thick plush cream carpet
+in an upstate New York living room, carpet fibers razor sharp in foreground,
+couch and coffee table receding into shallow bokeh background,
+warm afternoon window light raking across carpet texture,
+Finger Lakes farmhouse interior, no people,
+ultra-realistic architectural photography, 16:9
+```
+
+Hotel corridor:
+```
+low 24mm lens looking down a long hotel corridor from floor level,
+patterned burgundy carpet runner sharp in extreme foreground receding to vanishing point,
+warm wall sconces lining white walls, numbered doors converging in perspective,
+no people, ultra-realistic hospitality photography, 16:9
+```
+
+Hardwood floor:
+```
+low 24mm angle pressed to gleaming light oak hardwood floor,
+floor grain razor sharp in extreme foreground receding to hallway vanishing point,
+white walls, natural light streaming in, shallow depth of field,
+no people, ultra-realistic interior photography, 16:9
+```
+
+## What NOT to include
+
+- No people, no faces, no hands, no feet, no shoes/boots
+- No cleaning machines, vacuums, steam equipment, hoses
+- No text, logos, watermarks in the scene
+- No "before state" (dirty carpet, stains) — only clean result
+- No "wide shot" without camera angle qualifier — produces flat frontal views
+
+## Video motion prompts (Wan 2.2)
+
+For animating stills, describe the camera motion, not the scene content:
+
+```
+slow dolly forward across {subject}, gentle camera push toward the far wall,
+{lighting}, cinematic, smooth motion
+```
+
+Motion types:
+- `slow dolly forward` — push toward subject
+- `slow pan {direction}` — lateral camera rotation
+- `tracking shot moving forward` — camera travels through space
+- `rack focus` — lens focus shifts from foreground to background
+- `gentle push in` — subtle zoom/move toward subject
@@ -0,0 +1,87 @@
+# 05 — Quality Improvement Levers
+
+Three levers control FLUX output quality, in order of impact:
+
+## 1. Prompt (highest impact, zero cost)
+
+Incoherent objects in the frame are almost always prompt bleed — the model fills
+empty or ambiguous space with training-data defaults. Fix by naming every part of
+the frame explicitly.
+
+**Background** — name it, don't imply it:
+- Bad: "living room" (model invents furniture, decor, wall art)
+- Good: "plain cream painted wall with a single frosted sliding glass door"
+
+**Floor material** — always explicit:
+- "plush cream berber carpet" or "light oak hardwood floor"
+- Ambiguous floor → random floor type generated
+
+**Ceiling** — if visible, name it; if not wanted, push it out of frame with a
+lower camera angle:
+- "white drop ceiling with recessed can lights"
+- Or: lower the angle until ceiling exits the frame entirely
+
+**Negative scene elements** — add inline, not as a separate negative prompt
+(FLUX Schnell ignores negative prompts):
+- "no furniture clutter, no decorative objects, no picture frames, no signage"
+- "no cleaning equipment, no machines, no people"
+
+**What not to use:**
+- "wide shot" without a camera angle qualifier — produces flat frontal views
+- Vague room names ("office", "lobby") without specifying what fills the space
+
+## 2. Steps (marginal gain, up to 2x slower)
+
+FLUX Schnell is distilled to 4 steps. The distillation process compresses
+a full diffusion model's quality into very few steps.
+
+| Steps | Quality change | Time impact |
+|---|---|---|
+| 4 (default) | Baseline | ~4 min/image |
+| 6 | Slightly sharper edges, cleaner fine detail | ~6 min/image |
+| 8 | Diminishing returns past 6 | ~8 min/image |
+
+Not recommended as a first fix. The distillation ceiling is the constraint,
+not step count. Step increases help texture detail but will not fix scene
+incoherence — that requires prompt changes.
+
+KSampler in `gen-images-flux.py`:
+```python
+"steps": 4,   # increase to 6 for detail passes
+```
+
+## 3. Model size (real quality jump, 6x slower on CPU)
+
+| Model | Steps | Quality | CPU time/image |
+|---|---|---|---|
+| FLUX.1 Schnell (current) | 4 | Good depth, some coherence gaps | ~4 min |
+| FLUX.1 Dev (full, non-distilled) | 20-30 | Better coherence, sharper geometry | ~20-30 min |
+
+FLUX Dev would fix most coherence issues. At the measured CPU-only speed (taken
+before the 2GB VRAM figure was corrected in 01-comfyui-setup.md), a full
+28-image batch would take 9+ hours.
+
+**Practical path to FLUX Dev:**
+- Cloud GPU: RunPod or Vast.ai A100 runs FLUX Dev in ~90 seconds/image
+- Same prompts, same ComfyUI workflow — only model file and step count change
+- Switch `flux1-schnell-Q8_0.gguf` → FLUX Dev GGUF, set steps to 20, cfg to 3.5
+
+## Decision matrix
+
+| Issue | Fix |
+|---|---|
+| Objects that shouldn't be in frame | Prompt: name every surface explicitly |
+| Wrong floor/wall material | Prompt: be specific about material |
+| Flat angle despite prompt | Prompt: add "low-angle", lens mm, "foreground sharp" |
+| Soft edges on carpet fibers | Steps: increase 4 → 6 |
+| Incoherent room geometry | Model: switch to FLUX Dev on cloud GPU |
+| Overall composition wrong | Prompt: camera position + lens + foreground/bokeh split |
+
+## Re-running specific images
+
+To re-run only the problem frames without regenerating all 28:
+
+1. Edit `tools/gen-images-flux.py`
+2. Change the `IMAGES` list to include only the failed image keys
+3. Run: `python3 tools/gen-images-flux.py 2>&1 | tee tools/flux-gen.log`
+4. Run: `python3 tools/convert-to-webp.py` (converts only new JPGs)
+5. Rebuild: `docker compose build --no-cache web && docker compose up -d`
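
Step 2 can also be done with a filter instead of hand-editing the list. The real shape of `IMAGES` in `gen-images-flux.py` is not shown here, so this sketch assumes a list of `(key, prompt)` tuples; `select_images` is a hypothetical helper.

```python
def select_images(images, failed_keys):
    """Return only the (key, prompt) entries whose key is in failed_keys."""
    wanted = set(failed_keys)
    known = {key for key, _ in images}
    unknown = wanted - known
    if unknown:
        # Fail loudly on typos rather than silently regenerating nothing.
        raise KeyError(f"unknown image keys: {sorted(unknown)}")
    return [(key, prompt) for key, prompt in images if key in wanted]
```

Keeping the full list intact and filtering at run time avoids having to restore deleted entries before the next full batch.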
@@ -0,0 +1,46 @@
+# Local Image Generation — SOPs
+
+Complete reference for generating site images locally using ComfyUI.
+No cloud API required. No per-image cost. Runs on the Arising Media workstation.
+
+## Index
+
+1. [01-comfyui-setup.md](01-comfyui-setup.md) — Installing ComfyUI, venv, GGUF node
+2. [02-flux-images.md](02-flux-images.md) — FLUX.1 Schnell image generation pipeline
+3. [03-wan-video.md](03-wan-video.md) — Wan 2.2 image-to-video pipeline
+4. [04-prompt-guide.md](04-prompt-guide.md) — Prompt patterns for interior/carpet photography
+5. [05-quality-levers.md](05-quality-levers.md) — Prompt, steps, model size: what to adjust and when
+
+## Quick start (models already set up)
+
+```bash
+# 1. Start ComfyUI
+tmux new-session -d -s comfyui \
+  "cd ~/ComfyUI && HSA_OVERRIDE_GFX_VERSION=10.3.0 venv/bin/python main.py --listen 0.0.0.0 --port 8188 2>&1 | tee ~/comfyui.log"
+
+# 2. Wait ~30s, then generate images
+cd /home/sirdrez/arisingmedia-websites/{domain}
+python3 tools/gen-images-flux.py 2>&1 | tee tools/flux-gen.log
+
+# 3. Convert to WebP and deploy
+python3 tools/convert-to-webp.py
+find assets/images -name '*.jpg' -delete   # rm with ** needs bash globstar
+docker compose build --no-cache web && docker compose up -d
+```
+
+## Model files (installed at ~/ComfyUI/models/)
+
+| Purpose | File | Size | Location |
+|---|---|---|---|
+| FLUX image UNet | flux1-schnell-Q8_0.gguf | 12GB | models/unet/ |
+| FLUX T5 encoder | t5xxl_fp8_e4m3fn.safetensors | 4.6GB | models/clip/ |
+| FLUX CLIP-L | clip_l.safetensors | 235MB | models/clip/ |
+| FLUX VAE | ae.safetensors | 108MB | models/vae/ |
+| Wan 2.2 video | Wan2.2-TI2V-5B-Q4_K_M.gguf | 3.2GB | models/diffusion_models/ |
+| Wan UMT5 encoder | umt5_xxl_fp8_e4m3fn_scaled.safetensors | 6.3GB | models/clip/ |
+| Wan VAE | wan_2.1_vae.safetensors | 243MB | models/vae/ |
+
+## Reference project
+
+`lahrcarpetcleaning.com` — first project using this full pipeline.
+Scripts: `tools/gen-images-flux.py`, `tools/gen-video-wan.py`, `tools/convert-to-webp.py`