recent updates
@@ -0,0 +1,100 @@
# 01 - ComfyUI Setup

ComfyUI is installed at `~/ComfyUI/` on the Arising Media workstation.
Python venv is at `~/ComfyUI/venv/`.

## Starting ComfyUI

```bash
tmux new-session -d -s comfyui \
  "cd ~/ComfyUI && HSA_OVERRIDE_GFX_VERSION=10.3.0 venv/bin/python main.py --listen 0.0.0.0 --port 8188 2>&1 | tee ~/comfyui.log"
```

**Do NOT use `--cpu`.** The GPU is the AMD Ryzen 9 9950X integrated graphics
(gfx1036, RDNA 2 iGPU) with 30,942 MB unified VRAM (shares system RAM).
All models fit: FLUX (12GB), Wan 2.2 (3.2GB), T5-XXL (4.6GB).

`HSA_OVERRIDE_GFX_VERSION=10.3.0` is required: gfx1036 (iGPU) is not in
the PyTorch ROCm kernel list, but gfx1030 (RDNA 2 dGPU) is compatible.
Without the override: `HIP error: invalid device function` on first compute op.

The previous SOP said 2GB VRAM. That was wrong: it was reading the dedicated
VRAM pool, not the full unified memory PyTorch allocates via ROCm.

Verify it's up:
```bash
curl -s -o /dev/null -w "%{http_code}" http://localhost:8188/system_stats
# should return 200 within 30 seconds
```
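
The 30-second readiness check can also be scripted so that a generation run waits for ComfyUI instead of failing on the first request. The following is a minimal sketch using only the Python standard library; `probe` and `wait_until_ready` are illustrative names, not functions from ComfyUI or from the repo's tools.

```python
import time
import urllib.error
import urllib.request


def probe(url: str) -> bool:
    """Return True if the URL answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


def wait_until_ready(check, timeout_s=30.0, interval_s=1.0,
                     sleep=time.sleep, clock=time.monotonic) -> bool:
    """Call check() repeatedly until it returns True or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example call (assumes the default port used in this SOP):
#   ok = wait_until_ready(lambda: probe("http://localhost:8188/system_stats"))
```

Passing `sleep` and `clock` as parameters keeps the loop testable without a running server.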

Check the log for node load errors:
```bash
tmux attach -t comfyui
```

## Required custom nodes

Both installed at `~/ComfyUI/custom_nodes/`:

- `ComfyUI-GGUF`: loads GGUF quantized models (FLUX, Wan 2.2)
- `ComfyUI-Detail-Daemon`: optional, detail enhancement

If `ComfyUI-GGUF` fails to load, check for missing Python packages:
```bash
~/ComfyUI/venv/bin/pip install gguf sqlalchemy
```

## Known dependency gaps (fix if ComfyUI fails to start)

```bash
~/ComfyUI/venv/bin/pip install sqlalchemy gguf
```

Audio nodes (`nodes_audio.py`, `nodes_lt_audio.py`) will fail to import
because `torchaudio` is not installed. This is safe to ignore: audio
nodes are not used in this pipeline.

## GPU note

- GPU: AMD Ryzen 9 9950X integrated graphics (gfx1036, RDNA 2 iGPU)
- Unified memory: 30,942 MB available to PyTorch via ROCm (shares system RAM)

```bash
# Verify ROCm sees the GPU
~/ComfyUI/venv/bin/python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"
# returns True / AMD Ryzen 9 9950X 16-Core Processor

# Verify arch override works
HSA_OVERRIDE_GFX_VERSION=10.3.0 ~/ComfyUI/venv/bin/python -c "
import torch; x=torch.tensor([1.0]).cuda(); print('GPU OK:', x.device)
"
```

gfx1036 requires `HSA_OVERRIDE_GFX_VERSION=10.3.0`. Always set this env var
before starting ComfyUI or running any Python that loads GPU tensors.
Without it: `HIP error: invalid device function` immediately on first op.
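
Because a missing override only surfaces on the first GPU op, a script can check the environment up front and fail with a clearer message. This is a small sketch; `override_problem` is a hypothetical helper, and the expected value `10.3.0` is taken from this SOP.

```python
import os

REQUIRED_GFX_OVERRIDE = "10.3.0"  # value this SOP requires for gfx1036


def override_problem(env):
    """Return a human-readable problem string, or None if the override is set correctly."""
    value = env.get("HSA_OVERRIDE_GFX_VERSION")
    if value is None:
        return "HSA_OVERRIDE_GFX_VERSION is not set"
    if value != REQUIRED_GFX_OVERRIDE:
        return f"HSA_OVERRIDE_GFX_VERSION is {value!r}, expected {REQUIRED_GFX_OVERRIDE!r}"
    return None


# Example call, placed before any `import torch` in a GPU script:
#   problem = override_problem(os.environ)
#   if problem:
#       raise SystemExit(problem)
```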

## Model folder structure

```
~/ComfyUI/models/
├── unet/
│   └── flux1-schnell-Q8_0.gguf                  (12GB, FLUX image)
├── clip/
│   ├── clip_l.safetensors                       (235MB, FLUX CLIP-L)
│   ├── t5xxl_fp8_e4m3fn.safetensors             (4.6GB, FLUX T5-XXL)
│   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors   (6.3GB, Wan text encoder)
├── vae/
│   ├── ae.safetensors                           (108MB, FLUX VAE)
│   └── wan_2.1_vae.safetensors                  (243MB, Wan VAE)
└── diffusion_models/
    └── Wan2.2-TI2V-5B-Q4_K_M.gguf               (3.2GB, Wan 2.2 video)
```

## Stopping ComfyUI

```bash
tmux send-keys -t comfyui C-c
# or kill the session:
tmux kill-session -t comfyui
```
@@ -0,0 +1,99 @@
# 02 - FLUX.1 Schnell Image Pipeline

## Why FLUX over SDXL

FLUX is a 12B-parameter transformer model. SDXL (RealVisXL) is 3.5B.
FLUX is significantly better at:
- Spatial depth and perspective (lens simulation)
- Scene geometry (vanishing points, depth-of-field)
- Prompt following (T5-XXL understands long, detailed prompts)

SDXL was tested on lahrcarpetcleaning.com and rejected: flat angles, no depth,
poor spatial coherence. FLUX replaced it entirely.

## Model stack

| File | Size | Notes |
|---|---|---|
| flux1-schnell-Q8_0.gguf | 12GB | GGUF Q8, needs ComfyUI-GGUF node |
| t5xxl_fp8_e4m3fn.safetensors | 4.6GB | T5-XXL text encoder, fp8 quantized |
| clip_l.safetensors | 235MB | CLIP-L, short prompt encoder |
| ae.safetensors | 108MB | Official FLUX VAE from Black Forest Labs |

## Download (one-time)

FLUX GGUF and text encoders (public, no auth):
```bash
wget "https://huggingface.co/city96/FLUX.1-schnell-gguf/resolve/main/flux1-schnell-Q8_0.gguf" \
  -O ~/ComfyUI/models/unet/flux1-schnell-Q8_0.gguf

wget "https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/t5xxl_fp8_e4m3fn.safetensors" \
  -O ~/ComfyUI/models/clip/t5xxl_fp8_e4m3fn.safetensors

wget "https://huggingface.co/comfyanonymous/flux_text_encoders/resolve/main/clip_l.safetensors" \
  -O ~/ComfyUI/models/clip/clip_l.safetensors
```

FLUX VAE (gated: requires HF login and license acceptance):
```bash
hf auth login   # paste read token
HF_TOKEN=$(cat ~/.cache/huggingface/token)
wget --header="Authorization: Bearer $HF_TOKEN" \
  "https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/ae.safetensors" \
  -O ~/ComfyUI/models/vae/ae.safetensors
```

## ComfyUI workflow (what gen-images-flux.py sends)

```
UnetLoaderGGUF   → flux1-schnell-Q8_0.gguf
DualCLIPLoader   → t5xxl_fp8_e4m3fn + clip_l (type=flux)
VAELoader        → ae.safetensors
CLIPTextEncode   → prompt
EmptyLatentImage → 1024×576, batch=1
KSampler         → steps=4, cfg=1.0, euler, simple
VAEDecode
SaveImage
```
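
For orientation, the node list above could be expressed in ComfyUI's API (JSON) workflow format roughly as follows. This is a sketch, not the contents of `gen-images-flux.py`: the function names, node IDs, the empty negative-prompt node (KSampler needs a `negative` input even at cfg 1.0), and the input names are assumptions based on the stock ComfyUI and ComfyUI-GGUF nodes, so check them against `/object_info` on the running server.

```python
import json
import urllib.request


def build_flux_workflow(prompt, seed=0, prefix="flux"):
    """Build an API-format workflow dict mirroring the node list in this SOP."""
    return {
        "1": {"class_type": "UnetLoaderGGUF",
              "inputs": {"unet_name": "flux1-schnell-Q8_0.gguf"}},
        "2": {"class_type": "DualCLIPLoader",
              "inputs": {"clip_name1": "t5xxl_fp8_e4m3fn.safetensors",
                         "clip_name2": "clip_l.safetensors", "type": "flux"}},
        "3": {"class_type": "VAELoader", "inputs": {"vae_name": "ae.safetensors"}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["2", 0]}},
        # Assumption: an empty prompt feeds the required negative input.
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "", "clip": ["2", 0]}},
        "6": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 576, "batch_size": 1}},
        "7": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["4", 0], "negative": ["5", 0],
                         "latent_image": ["6", 0], "seed": seed, "steps": 4, "cfg": 1.0,
                         "sampler_name": "euler", "scheduler": "simple", "denoise": 1.0}},
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["7", 0], "vae": ["3", 0]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"images": ["8", 0], "filename_prefix": prefix}},
    }


def prompt_request(workflow, host="http://localhost:8188"):
    """Prepare (but do not send) the POST request that queues a workflow."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(f"{host}/prompt", data=body,
                                  headers={"Content-Type": "application/json"},
                                  method="POST")


# To queue it: urllib.request.urlopen(prompt_request(build_flux_workflow("...")))
```

Links such as `["2", 0]` mean "output 0 of node 2", which is how the API format wires nodes together.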

## Settings

| Setting | Value | Why |
|---|---|---|
| Steps | 4 | Schnell is distilled; 4 steps is optimal |
| CFG | 1.0 | Distilled model, higher CFG degrades quality |
| Sampler | euler | Best for FLUX |
| Scheduler | simple | Matches FLUX training |
| Negative prompt | none | Distilled model ignores it |
| Resolution | 1024×576 | 16:9 hero format |

## Running generation

```bash
# ComfyUI must be running first (see 01-comfyui-setup.md)
cd /home/sirdrez/arisingmedia-websites/{domain}
python3 tools/gen-images-flux.py 2>&1 | tee tools/flux-gen.log
```

Monitor:
```bash
tmux attach -t comfyui        # step progress bars
tail -f tools/flux-gen.log    # per-image OK/FAIL
```

Speed: ~4 min/image with CPU inference; 28 images = ~1h50m. This was measured
under the earlier 2GB VRAM assumption, which 01-comfyui-setup.md corrects, so
timings on the GPU may differ.

## After generation

```bash
python3 tools/convert-to-webp.py           # resize + convert to WebP
find assets/images -name '*.jpg' -delete   # delete source JPGs (works without bash globstar)
docker compose build --no-cache web        # bake WebP into image
docker compose up -d
```

Verify:
```bash
curl -s -o /dev/null -w "%{http_code}" http://localhost:{port}/assets/images/hero/hero-carpet-cleaning.webp
# must return 200
```
@@ -0,0 +1,159 @@
# 03 - Wan 2.2 Video Pipeline (Image-to-Video)

## Default policy: local generation

Video generation is done locally with Wan 2.2 by default. Google Veo (via
Vertex AI / Gemini API) is NOT used unless the client has explicit budget
allocated for it. Reasons:

- Google Veo costs money per second of video generated (billed per request)
- Local Wan 2.2 is free after one-time model download (~10GB total)
- Quality from Wan 2.2 at 832x480 is sufficient for hero reels
- No API key, no quota limits, no vendor dependency

Use Google Veo only when: client approves a paid media budget, OR the local
workstation is unavailable and a deadline cannot wait for local generation time.

## Purpose

Takes FLUX-generated hero stills and animates each into a 3-5 second clip.
Clips are stitched with ffmpeg into a marketing reel for the hero section.

## Model stack

| File | Size | Notes |
|---|---|---|
| Wan2.2-TI2V-5B-Q4_K_M.gguf | 3.2GB | Text+Image to Video, 5B Q4 GGUF |
| umt5_xxl_fp8_e4m3fn_scaled.safetensors | 6.3GB | UMT5-XXL text encoder, fp8 |
| wan_2.1_vae.safetensors | 243MB | Wan VAE (compatible with 2.2) |

## Download (one-time, all public)

```bash
# Wan 2.2 model
wget "https://huggingface.co/QuantStack/Wan2.2-TI2V-5B-GGUF/resolve/main/Wan2.2-TI2V-5B-Q4_K_M.gguf" \
  -O ~/ComfyUI/models/diffusion_models/Wan2.2-TI2V-5B-Q4_K_M.gguf

# Text encoder
wget "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors" \
  -O ~/ComfyUI/models/clip/umt5_xxl_fp8_e4m3fn_scaled.safetensors

# VAE
wget "https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/resolve/main/split_files/vae/wan_2.1_vae.safetensors" \
  -O ~/ComfyUI/models/vae/wan_2.1_vae.safetensors
```

## Critical: WanImageToVideo is a conditioning node, NOT a sampler

This is the most important thing to understand about the Wan pipeline. The node
name is misleading. `WanImageToVideo` does NOT run diffusion. It sets up the
conditioning and empty latent. A separate `KSampler` runs the actual diffusion.

Wrong mental model (what most tutorials imply):
```
LoadImage → WanImageToVideo → SaveAnimatedWEBP
```

Correct node graph:
```
UnetLoaderGGUF ─────────────────────────────────────→ KSampler.model
CLIPLoader ──→ CLIPTextEncode (positive) ─→ WanImageToVideo.positive ──→ KSampler.positive
           └→ CLIPTextEncode (negative) ─→ WanImageToVideo.negative ──→ KSampler.negative
VAELoader ──→ WanImageToVideo.vae          WanImageToVideo.latent ──→ KSampler.latent_image
LoadImage ──→ WanImageToVideo.start_image (optional)
KSampler.samples ──→ VAEDecode ──→ SaveAnimatedWEBP
```

WanImageToVideo outputs three things (in order):
- output[0] = positive CONDITIONING (enhanced with image)
- output[1] = negative CONDITIONING
- output[2] = latent LATENT (sized for video: width × height × frames)

The `start_image` input (optional IMAGE) anchors the first frame. Without it,
video starts from noise. Always pass it for image-to-video.

## Workflow

Correct ComfyUI API node graph (as sent by `gen-video-wan.py`):

```
node 1:  UnetLoaderGGUF   → Wan2.2-TI2V-5B-Q4_K_M.gguf
node 2:  CLIPLoader       → umt5_xxl_fp8_e4m3fn_scaled.safetensors (type=wan)
node 3:  VAELoader        → wan_2.1_vae.safetensors
node 4:  LoadImage        → FLUX hero still (.webp)
node 5:  CLIPTextEncode   → motion prompt text (positive)
node 6:  CLIPTextEncode   → negative prompt text
node 7:  WanImageToVideo  → positive=[5,0], negative=[6,0], vae=[3,0],
                            start_image=[4,0], width=832, height=480,
                            length=25 (or 49), batch_size=1
node 8:  KSampler         → model=[1,0], positive=[7,0], negative=[7,1],
                            latent_image=[7,2], steps=20, cfg=6.0,
                            sampler_name=uni_pc, scheduler=simple, denoise=1.0
node 9:  VAEDecode        → samples=[8,0], vae=[3,0]
node 10: SaveAnimatedWEBP → images=[9,0], fps=12
```
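
The ten nodes above map onto an API-format workflow dict roughly as sketched here. `build_wan_workflow` is a hypothetical helper, not the actual `gen-video-wan.py`; the `seed`, `filename_prefix`, `lossless`, `quality`, and `method` values are assumptions added because the stock KSampler and SaveAnimatedWEBP nodes expect them, and the input names should be verified against `/object_info`.

```python
def build_wan_workflow(still, motion_prompt, negative_prompt="",
                       length=49, seed=0, prefix="wan-clip"):
    """Build the 10-node image-to-video graph; `still` is a filename in ComfyUI's input dir."""
    if length < 1 or (length - 1) % 4 != 0:
        raise ValueError("length must be 1, 5, 9, ... (1 + 4k)")
    return {
        "1": {"class_type": "UnetLoaderGGUF",
              "inputs": {"unet_name": "Wan2.2-TI2V-5B-Q4_K_M.gguf"}},
        "2": {"class_type": "CLIPLoader",
              "inputs": {"clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
                         "type": "wan"}},
        "3": {"class_type": "VAELoader",
              "inputs": {"vae_name": "wan_2.1_vae.safetensors"}},
        "4": {"class_type": "LoadImage", "inputs": {"image": still}},
        "5": {"class_type": "CLIPTextEncode",
              "inputs": {"text": motion_prompt, "clip": ["2", 0]}},
        "6": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative_prompt, "clip": ["2", 0]}},
        # Conditioning node: produces positive, negative, and the video latent.
        "7": {"class_type": "WanImageToVideo",
              "inputs": {"positive": ["5", 0], "negative": ["6", 0], "vae": ["3", 0],
                         "start_image": ["4", 0], "width": 832, "height": 480,
                         "length": length, "batch_size": 1}},
        # The sampler consumes all three outputs of node 7.
        "8": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["7", 0], "negative": ["7", 1],
                         "latent_image": ["7", 2], "seed": seed, "steps": 20, "cfg": 6.0,
                         "sampler_name": "uni_pc", "scheduler": "simple", "denoise": 1.0}},
        "9": {"class_type": "VAEDecode",
              "inputs": {"samples": ["8", 0], "vae": ["3", 0]}},
        "10": {"class_type": "SaveAnimatedWEBP",
               "inputs": {"images": ["9", 0], "filename_prefix": prefix, "fps": 12,
                          "lossless": False, "quality": 90, "method": "default"}},
    }
```

Note how node 8 takes its conditioning from node 7 rather than directly from the text encoders, which is the point of the "conditioning node, not a sampler" warning.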

## Settings

| Setting | Value |
|---|---|
| Resolution | 832×480 (16:9 ~480p) |
| Frames | 49 (~4 seconds at 12fps) |
| Steps | 20 |
| CFG | 6.0 |
| Sampler | uni_pc |

**Frame count constraint:** `length` must follow the pattern 1, 5, 9, 13, 17, 21, 25, 29 ... (step of 4).
ComfyUI enforces this. 49 is valid (1 + 4×12). 50 is not.
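
The constraint is simply `length = 1 + 4k` for a non-negative integer `k`. A short sketch for validating a requested frame count and converting it to clip duration follows; the helper names are illustrative only.

```python
def is_valid_length(n: int) -> bool:
    """True if n fits the 1, 5, 9, ... pattern (1 + 4k)."""
    return n >= 1 and (n - 1) % 4 == 0


def floor_to_valid_length(n: int) -> int:
    """Largest valid length that does not exceed n (minimum 1)."""
    if n < 1:
        return 1
    return 1 + 4 * ((n - 1) // 4)


def clip_seconds(length: int, fps: int = 12) -> float:
    """Playback duration of a clip with `length` frames at `fps`."""
    return length / fps
```

For example, a request for 50 frames would be floored to 49, which plays for roughly 4.1 seconds at 12 fps.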

**CPU inference speed on the Arising Media workstation (measured under the earlier 2GB VRAM assumption; see 01-comfyui-setup.md for the corrected GPU setup):**
- ~415 seconds per diffusion step
- 20 steps × 415s = ~2.3 hours per clip
- 6 clips = ~14 hours total for a full reel
- Use 25 frames (not 49) for test runs to roughly halve generation time
- Full reel generation: start before leaving for the day, check next morning
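
The arithmetic behind these estimates can be reproduced for other step counts or clip counts. A minimal sketch, assuming generation time scales linearly with steps and clips (the 415 s/step figure is the CPU measurement quoted above):

```python
def clip_hours(steps: int = 20, seconds_per_step: float = 415.0) -> float:
    """Estimated hours to generate one clip."""
    return steps * seconds_per_step / 3600.0


def reel_hours(clips: int = 6, steps: int = 20, seconds_per_step: float = 415.0) -> float:
    """Estimated hours to generate a full reel of `clips` clips."""
    return clips * clip_hours(steps, seconds_per_step)
```

With the defaults, `clip_hours()` is about 2.3 and `reel_hours()` about 13.8, matching the "~14 hours" figure.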

**CLIPVision note:** No CLIPVision models are installed at `~/ComfyUI/models/clip_vision/`.
The `clip_vision_output` input on WanImageToVideo is optional and currently unused.
Image conditioning comes from `start_image` only (VAE-encoded first frame).
This is sufficient for smooth motion. CLIPVision would add semantic image
understanding but is not required.

## Running video generation

```bash
# ComfyUI must be running, FLUX images must be converted to WebP first
cd /home/sirdrez/arisingmedia-websites/{domain}
python3 tools/gen-video-wan.py 2>&1 | tee tools/wan-gen.log
```

Output goes to `assets/videos/clips/` as `.webp` animation files.

## Stitching the reel

```bash
# Convert WebP animations to MP4 first (if needed); -y must precede the output file
for f in assets/videos/clips/*.webp; do
  ffmpeg -y -i "$f" "${f%.webp}.mp4"
done

# Create the concat file list (glob results are already sorted)
for f in assets/videos/clips/*.mp4; do
  echo "file '$PWD/$f'"
done > tools/clip-list.txt

# Stitch
ffmpeg -f concat -safe 0 -i tools/clip-list.txt -c copy assets/videos/hero/hero-reel-flux.mp4
```
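
The concat list is a plain text file with one `file '<path>'` line per clip. If the list is ever built from Python instead of the shell, the only subtle part is quoting paths that contain a single quote. A sketch with illustrative function names:

```python
def concat_entry(path) -> str:
    """One line of an ffmpeg concat list, with single quotes in the path escaped."""
    escaped = str(path).replace("'", "'\\''")  # close quote, escaped quote, reopen
    return f"file '{escaped}'"


def build_concat_list(clips) -> str:
    """Concat-list text for the given clip paths, in sorted order."""
    return "\n".join(concat_entry(p) for p in sorted(clips, key=str)) + "\n"
```

Sorting by name keeps `clip-01` through `clip-06` in shot-list order, the same ordering the shell glob produces.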

## Reel shot list (lahrcarpetcleaning.com)

| Clip | Source still | Motion prompt |
|---|---|---|
| clip-01 | hero-carpet-cleaning | slow dolly forward across carpet |
| clip-02 | hero-stairs | slow pan upward along staircase |
| clip-03 | hero-upholstery | gentle push in toward sofa |
| clip-04 | hero-commercial | tracking shot down lobby |
| clip-05 | hero-floors | floor-level drift forward |
| clip-06 | hero-clean-result | rack focus across carpet fibers |

6 clips × ~4s = ~24 seconds total reel.
@@ -0,0 +1,105 @@
# 04 - Prompt Guide (Interior / Carpet Photography)

## The core pattern

All image prompts follow this structure:

```
{camera angle} {lens} {subject description},
{foreground detail} sharp in foreground, {background} receding into bokeh,
{lighting description}, {style tag}, no people, ultra-realistic {type} photography
```
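
One way to keep prompts consistent with this pattern is to fill it from named fields. The sketch below is illustrative: the `Shot` dataclass and its field names are assumptions, not something defined in `gen-images-flux.py`.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    camera: str      # e.g. "low-angle"
    lens: str        # e.g. "35mm lens"
    subject: str
    foreground: str
    background: str
    lighting: str
    style: str
    kind: str        # e.g. "architectural", "hospitality", "interior"


def build_prompt(shot: Shot) -> str:
    """Fill the core pattern as a single-line prompt."""
    return (
        f"{shot.camera} {shot.lens} {shot.subject}, "
        f"{shot.foreground} sharp in foreground, {shot.background} receding into bokeh, "
        f"{shot.lighting}, {shot.style}, no people, "
        f"ultra-realistic {shot.kind} photography"
    )
```

Keeping every slot mandatory makes it harder to leave part of the frame unspecified.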

## Why this works

FLUX.1 Schnell uses T5-XXL as its primary text encoder (4.6GB in the fp8 build
used here), which understands natural-language photography concepts deeply.
Specifying lens focal length, depth of field, and spatial relationships produces
images with correct depth, perspective, and scene geometry.

SDXL models lack this: their text encoders (CLIP-L/CLIP-G) top out at
77 tokens and don't understand spatial concepts reliably.

## Lens vocabulary

| Lens | Effect | Use for |
|---|---|---|
| 24mm wide-angle | Strong perspective distortion, exaggerated depth | Corridors, lobbies, open spaces |
| 35mm | Natural perspective, slight depth emphasis | Most interior shots |
| 50mm prime | Near-natural perspective, shallow DoF | Close-ups, furniture, details |
| macro | Extreme close-up, very shallow DoF | Carpet fiber detail, texture |

## Camera position vocabulary

- `low-angle` / `low 35mm angle`: camera near floor level, looking across surface
- `floor level`: pressed to the floor, extreme low angle
- `corner angle`: shot from room corner, wide coverage
- `looking up`: camera below subject, looking upward
- `looking down`: camera above, bird's-eye (avoid for carpet, it looks flat)

## Depth of field vocabulary

- `shallow depth of field`: subject sharp, background blurred
- `razor sharp in foreground ... receding into bokeh`: specific foreground/background split
- `raking light`: light hitting surface at low angle, reveals texture
- `vanishing point perspective`: strong linear convergence (corridors, offices)

## Lighting vocabulary

- `warm afternoon window light`: residential, golden hour feel
- `raking natural light`: reveals carpet texture and pile height
- `recessed ceiling lights creating depth`: commercial/corporate
- `warm wall sconces`: hotel corridors
- `crisp morning light`: bedrooms, bright and clean

## Full prompt examples

Carpet hero (residential):
```
low-angle 35mm lens perspective looking across thick plush cream carpet
in an upstate New York living room, carpet fibers razor sharp in foreground,
couch and coffee table receding into shallow bokeh background,
warm afternoon window light raking across carpet texture,
Finger Lakes farmhouse interior, no people,
ultra-realistic architectural photography, 16:9
```

Hotel corridor:
```
low 24mm lens looking down a long hotel corridor from floor level,
patterned burgundy carpet runner sharp in extreme foreground receding to vanishing point,
warm wall sconces lining white walls, numbered doors converging in perspective,
no people, ultra-realistic hospitality photography, 16:9
```

Hardwood floor:
```
low 24mm angle pressed to gleaming light oak hardwood floor,
floor grain razor sharp in extreme foreground receding to hallway vanishing point,
white walls, natural light streaming in, shallow depth of field,
no people, ultra-realistic interior photography, 16:9
```

## What NOT to include

- No people, no faces, no hands, no feet, no shoes/boots
- No cleaning machines, vacuums, steam equipment, hoses
- No text, logos, watermarks in the scene
- No "before state" (dirty carpet, stains): only the clean result
- No "wide shot" without a camera angle qualifier (it produces flat frontal views)

## Video motion prompts (Wan 2.2)

For animating stills, describe the camera motion, not the scene content:

```
slow dolly forward across {subject}, gentle camera push toward the far wall,
{lighting}, cinematic, smooth motion
```

Motion types:
- `slow dolly forward`: push toward subject
- `slow pan {direction}`: lateral camera rotation
- `tracking shot moving forward`: camera travels through space
- `rack focus`: lens focus shifts from foreground to background
- `gentle push in`: subtle zoom/move toward subject
@@ -0,0 +1,87 @@
# 05 - Quality Improvement Levers

Three levers control FLUX output quality, in order of impact:

## 1. Prompt (highest impact, zero cost)

Incoherent objects in the frame are almost always prompt bleed: the model fills
empty or ambiguous space with training-data defaults. Fix by naming every part of
the frame explicitly.

**Background**: name it, don't imply it:
- Bad: "living room" (model invents furniture, decor, wall art)
- Good: "plain cream painted wall with a single frosted sliding glass door"

**Floor material**: always explicit:
- "plush cream berber carpet" or "light oak hardwood floor"
- Ambiguous floor → random floor type generated

**Ceiling**: if visible, name it; if not wanted, push it out of frame with a
lower camera angle:
- "white drop ceiling with recessed can lights"
- Or: lower the angle until ceiling exits the frame entirely

**Negative scene elements**: add inline, not as a separate negative prompt
(FLUX Schnell ignores negative prompts):
- "no furniture clutter, no decorative objects, no picture frames, no signage"
- "no cleaning equipment, no machines, no people"

**What not to use:**
- "wide shot" without a camera angle qualifier (it produces flat frontal views)
- Vague room names ("office", "lobby") without specifying what fills the space

## 2. Steps (marginal gain, up to 2x slower)

FLUX Schnell is distilled to 4 steps. The distillation process compresses
a full diffusion model's quality into very few steps.

| Steps | Quality change | Time impact |
|---|---|---|
| 4 (default) | Baseline | ~4 min/image |
| 6 | Slightly sharper edges, cleaner fine detail | ~6 min/image |
| 8 | Diminishing returns past 6 | ~8 min/image |

Not recommended as a first fix. The distillation ceiling is the constraint,
not step count. Step increases help texture detail but will not fix scene
incoherence; that requires prompt changes.

KSampler in `gen-images-flux.py`:
```python
"steps": 4,  # increase to 6 for detail passes
```

## 3. Model size (real quality jump, 6x slower on CPU)

| Model | Steps | Quality | CPU time/image |
|---|---|---|---|
| FLUX.1 Schnell (current) | 4 | Good depth, some coherence gaps | ~4 min |
| FLUX.1 Dev (guidance-distilled, not step-distilled) | 20-30 | Better coherence, sharper geometry | ~20-30 min |

FLUX Dev would fix most coherence issues. At the CPU-only speeds in the table,
a full 28-image batch would take 9+ hours. (Those speeds were measured under the
earlier 2GB VRAM assumption, which 01-comfyui-setup.md corrects.)

**Practical path to FLUX Dev:**
- Cloud GPU: RunPod or Vast.ai A100 runs FLUX Dev in ~90 seconds/image
- Same prompts, same ComfyUI workflow; only model file and step count change
- Switch `flux1-schnell-Q8_0.gguf` → FLUX Dev GGUF, set steps to 20 and guidance to 3.5
  (in ComfyUI, FLUX Dev guidance is normally set with the `FluxGuidance` node while KSampler cfg stays at 1.0)

## Decision matrix

| Issue | Fix |
|---|---|
| Objects that shouldn't be in frame | Prompt: name every surface explicitly |
| Wrong floor/wall material | Prompt: be specific about material |
| Flat angle despite prompt | Prompt: add "low-angle", lens mm, "foreground sharp" |
| Soft edges on carpet fibers | Steps: increase 4 → 6 |
| Incoherent room geometry | Model: switch to FLUX Dev on cloud GPU |
| Overall composition wrong | Prompt: camera position + lens + foreground/bokeh split |

## Re-running specific images

To re-run only the problem frames without regenerating all 28:

1. Edit `tools/gen-images-flux.py`
2. Change the `IMAGES` list to include only the failed image keys
3. Run: `python3 tools/gen-images-flux.py 2>&1 | tee tools/flux-gen.log`
4. Run: `python3 tools/convert-to-webp.py` (converts only new JPGs)
5. Rebuild: `docker compose build --no-cache web && docker compose up -d`
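
Step 2 could be done without deleting entries by filtering the full set down to the failed keys. The sketch below assumes, purely for illustration, that `IMAGES` maps an image key to its prompt; the real structure in `gen-images-flux.py` is not shown in this SOP and may differ.

```python
def select_images(images: dict, failed_keys: list) -> dict:
    """Return only the entries named in failed_keys, preserving their order."""
    missing = [k for k in failed_keys if k not in images]
    if missing:
        # Fail early on typos rather than silently skipping an image.
        raise KeyError(f"unknown image keys: {missing}")
    return {k: images[k] for k in failed_keys}


# Hypothetical example data (keys borrowed from the reel shot list):
#   IMAGES = {"hero-stairs": "...", "hero-floors": "...", "hero-commercial": "..."}
#   rerun = select_images(IMAGES, ["hero-stairs"])
```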
@@ -0,0 +1,46 @@
# Local Image Generation - SOPs

Complete reference for generating site images locally using ComfyUI.
No cloud API required. No per-image cost. Runs on the Arising Media workstation.

## Index

1. [01-comfyui-setup.md](01-comfyui-setup.md): Installing ComfyUI, venv, GGUF node
2. [02-flux-images.md](02-flux-images.md): FLUX.1 Schnell image generation pipeline
3. [03-wan-video.md](03-wan-video.md): Wan 2.2 image-to-video pipeline
4. [04-prompt-guide.md](04-prompt-guide.md): Prompt patterns for interior/carpet photography
5. [05-quality-levers.md](05-quality-levers.md): Prompt, steps, model size: what to adjust and when

## Quick start (images already set up)

```bash
# 1. Start ComfyUI on the GPU (per 01-comfyui-setup.md: set the override, do not pass --cpu)
tmux new-session -d -s comfyui \
  "cd ~/ComfyUI && HSA_OVERRIDE_GFX_VERSION=10.3.0 venv/bin/python main.py --listen 0.0.0.0 --port 8188 2>&1 | tee ~/comfyui.log"

# 2. Wait ~30s, then generate images
cd /home/sirdrez/arisingmedia-websites/{domain}
python3 tools/gen-images-flux.py 2>&1 | tee tools/flux-gen.log

# 3. Convert to WebP and deploy
python3 tools/convert-to-webp.py
find assets/images -name '*.jpg' -delete   # delete source JPGs (works without bash globstar)
docker compose build --no-cache web && docker compose up -d
```

## Model files (installed at ~/ComfyUI/models/)

| Purpose | File | Size | Location |
|---|---|---|---|
| FLUX image UNet | flux1-schnell-Q8_0.gguf | 12GB | models/unet/ |
| FLUX T5 encoder | t5xxl_fp8_e4m3fn.safetensors | 4.6GB | models/clip/ |
| FLUX CLIP-L | clip_l.safetensors | 235MB | models/clip/ |
| FLUX VAE | ae.safetensors | 108MB | models/vae/ |
| Wan 2.2 video | Wan2.2-TI2V-5B-Q4_K_M.gguf | 3.2GB | models/diffusion_models/ |
| Wan UMT5 encoder | umt5_xxl_fp8_e4m3fn_scaled.safetensors | 6.3GB | models/clip/ |
| Wan VAE | wan_2.1_vae.safetensors | 243MB | models/vae/ |

## Reference project

`lahrcarpetcleaning.com`: first project using this full pipeline.
Scripts: `tools/gen-images-flux.py`, `tools/gen-video-wan.py`, `tools/convert-to-webp.py`