recent updates

2026-06-09 18:31:59 +02:00
parent 398b94965c
commit 94f7a1f72a
42 changed files with 8686 additions and 0 deletions
@@ -0,0 +1,94 @@
# 00 — WP + Divi to AM Stack A Pipeline — Overview
Converts a .wpress archive (All-in-One WP Migration) into a Stack A deployment:
PHP router + SQLite databases + vanilla JS/CSS. Never a 1:1 Divi copy.
Every migration is a content extraction and redesign, not a port.
## Stack A output (what this pipeline produces)
```
src/api/router.php URL dispatcher
src/api/contact.php form handler (Resend via curl)
src/api/templates/*.php home | static | classes | schedule | glossary | blog
src/api/components/_header.php nav from nav.sqlite
src/api/components/_footer.php
src/api/data/*.sqlite one DB per content domain (see 09-stack-a-output.md)
build/seed_databases.py creates + seeds all SQLite DBs — THE source of truth
assets/ vanilla CSS/JS/images
infra/nginx.conf, supervisord.conf, php-fpm-pool.conf
Dockerfile (php:8.3-fpm-alpine)
docker-compose.yml
```
## Why NOT static HTML
Any site with a glossary, blog, schedule, or recurring content model gets Stack A.
Editing content = edit seed_databases.py → reseed → rebuild. No PHP file edits.
## Divi is the data source, not the design target
Extract from Divi:
- Page content (headings, body copy, CTAs)
- Navigation menus (menus in wp_terms + wp_term_taxonomy; menu items are `nav_menu_item` rows in wp_posts + wp_postmeta)
- Header logo + tagline (wp_options: blogname, blogdescription, et_divi)
- Media (uploads/ → WebP → assets/images/)
- Design tokens (colors, fonts → tokens.css)
- SEO (Yoast or Rank Math wp_postmeta → pages.sqlite meta_description)
- Blog posts (wp_posts where post_type=post)
- Custom post types (testimonials, FAQs, glossary terms if present)
Do NOT replicate:
- Divi section/row/column grid structure
- Divi module types (blurbs, toggles, CTAs, pricing tables)
- WordPress page slugs (map to clean slugs per nginx.conf pattern)
- WordPress menu item IDs
## Pipeline phases
```
Phase 0 Setup Point pipeline at .wpress file; create working dirs
Phase 1 Extract Unpack .wpress → wpress-extract/
Phase 2 DB Analysis Parse SQL dump; detect Divi version; inventory pages, posts, menus
Phase 3 Content Extract page sections + nav menus + blog posts from Divi
Phase 4 Design Pull colors + fonts → tokens.css draft
Phase 5 Media Catalog uploads/; convert to WebP; build media-manifest.json
Phase 6 Staging Map extracted JSON → seed_databases.py skeleton (content on standby)
Phase 7 Fill Agent fills each SQLite table row by row from staged JSON
Phase 8 Templates Scaffold PHP templates + components from AM reference
Phase 9 SEO Port titles, metas, canonicals, schema.org, redirect map
Phase 10 Build docker compose build && docker compose up -d
Phase 11 QA Lighthouse, protection check, grep for Divi residue
```
## CLI launcher
```
python3 scripts/migrate.py --wpress /path/to/backup.wpress --domain example.com
```
Runs phases 0-6 automatically, then prints agent breadcrumbs for phases 7-11.
## Key missed items from prior migrations (REQUIRED fixes)
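1. **NAV MENUS**: Must extract menus from wp_terms + wp_term_taxonomy (taxonomy=nav_menu) and their items from wp_posts (post_type=nav_menu_item) + wp_postmeta (`_menu_item_url` etc.) for label/URL/order.
Output: nav.json → seeded into nav.sqlite (label, href, display_order, is_cta).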
2. **DIVI HEADER**: Must extract et_divi options from wp_options for logo, header layout, colors.
The _header.php must be written from scratch using AM design tokens, not copied from Divi.
3. **MEDIA**: All uploads/ files must be: cataloged → copied to assets/images/ → converted to WebP.
Every image reference in content JSON must be updated to /assets/images/{filename}.webp.
4. **SECTION REMAPPING**: Divi modules must be remapped to AM section types.
- blurb_module → feature_cards item
- toggle_module → accordion item
- cta_module → cta_band section
- pricing_module → booking_options section
- testimonial_mod → testimonials.sqlite row
- text_module → text_block section
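The remapping list above could be held as a simple lookup, sketched below. The keys and targets come from the list; the names `DIVI_TO_AM` and `remap_module`, and the choice to return `None` for unmapped modules, are illustrative assumptions rather than part of the pipeline scripts.

```python
# Hypothetical lookup built from the remapping list above.
DIVI_TO_AM = {
    "blurb_module": "feature_cards",        # one item inside a feature_cards section
    "toggle_module": "accordion",           # one item inside an accordion section
    "cta_module": "cta_band",
    "pricing_module": "booking_options",
    "testimonial_mod": "testimonials.sqlite",  # becomes a row, not a section
    "text_module": "text_block",
}


def remap_module(divi_module):
    """Return the AM target for a Divi module label, or None if it is unmapped."""
    return DIVI_TO_AM.get(divi_module)
```

Returning `None` instead of raising lets a caller collect every unmapped module in one pass and review them together.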
## Related SOPs
- **09-stack-a-output.md** — SQLite schema + sections_json spec
- **10-agent-breadcrumbs.md** — Step-by-step ordered checklist for agent execution
- **00-stack-philosophy.md** — Stack A vs Stack B decision rationale
@@ -0,0 +1,120 @@
# 01 — .wpress Extraction
Unpack the All-in-One WP Migration `.wpress` archive into the project's
`.planning/wpress-extract/` directory.
## .wpress binary format
NOT a standard zip or tar. Custom sequential binary format:
```
[HEADER 4377 bytes] [FILE DATA n bytes] [HEADER] [FILE DATA] ...
```
Header breakdown:
```
Offset Length Field
0 255 Filename (null-padded)
255 14 File size in bytes (ASCII decimal, null-padded)
269 12 mtime unix timestamp (ASCII decimal, null-padded)
281 4096 Relative path (null-padded)
4377 n Raw file bytes (size from header)
```
The archive ends when a header of all null bytes is encountered, or EOF.
## Extraction script
Script: `.am-webdesign-sops/wp-divi-pipeline/scripts/extract_wpress.py`
```bash
python3 ~/.am-webdesign-sops-path/scripts/extract_wpress.py \
.planning/vibrantyou-yoga-YYYYMMDD-*.wpress \
.planning/wpress-extract/
```
Or from the SOP scripts directory directly:
```bash
python3 /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/wp-divi-pipeline/scripts/extract_wpress.py \
/home/sirdrez/arisingmedia-websites/{domain}/.planning/{file}.wpress \
/home/sirdrez/arisingmedia-websites/{domain}/.planning/wpress-extract/
```
Progress prints every 200 files. A 300-400MB archive typically extracts in
2-5 minutes and produces 1,000-5,000 files.
## Expected archive contents
After extraction, `wpress-extract/` contains the following. All-in-One WP Migration extracts flat, with no `wp-content/` wrapper:
```
wpress-extract/
├── package.json ← archive metadata (domain, WP version, plugin list)
├── database.sql ← full MySQL dump (the most important file)
├── uploads/ ← all media (images, PDFs, videos)
│   └── YYYY/MM/ ← WordPress date-organized subdirs
├── themes/
│   ├── Divi/ ← Divi 4 theme files (if Divi 4)
│   └── divi-5/ ← Divi 5 theme files (if Divi 5)
└── plugins/ ← installed plugins (useful for form schema)
    ├── gravityforms/
    └── contact-form-7/
```
## Verify extraction
After the script completes, confirm the key files exist:
```bash
# Database dump present?
ls -lh .planning/wpress-extract/database.sql
# Uploads present?
find .planning/wpress-extract/uploads -name "*.jpg" | wc -l
find .planning/wpress-extract/uploads -name "*.png" | wc -l
# Archive metadata
cat .planning/wpress-extract/package.json
```
`package.json` contains the site URL, WordPress version, Divi version, and
plugin list — read it before proceeding to Phase 2.
## Common issues
**"Not a zip file" error** — Expected. The .wpress format is not zip.
The `extract_wpress.py` script handles it correctly.
**Missing database.sql** — The archive may name it differently. Check:
```bash
find .planning/wpress-extract -name "*.sql" 2>/dev/null
```
**Partial extraction** — If the script stops early, check disk space:
```bash
df -h .planning/wpress-extract/
```
A 378MB .wpress typically expands to 1-3GB uncompressed.
**Path traversal in filenames** — The script strips leading `/` and `.` from
paths. If files land in unexpected locations, check the raw path field with:
```bash
python3 -c "
import sys
HEADER_SIZE=4377; NAME_LEN=255; SIZE_LEN=14; MTIME_LEN=12; PATH_LEN=4096
with open(sys.argv[1],'rb') as f:
    for i in range(5):  # inspect the first five headers only
        h = f.read(HEADER_SIZE)
        if len(h) < HEADER_SIZE: break  # EOF or truncated archive
        name = h[:NAME_LEN].split(b'\x00',1)[0].decode(errors='replace')
        size = int(h[NAME_LEN:NAME_LEN+SIZE_LEN].split(b'\x00',1)[0] or 0)
        path = h[NAME_LEN+SIZE_LEN+MTIME_LEN:].split(b'\x00',1)[0].decode(errors='replace')
        print(f' [{i}] path={repr(path)} name={repr(name)} size={size}')
        f.seek(size, 1)  # skip the file body to reach the next header
" .planning/file.wpress
```
## Next step
Proceed to `02-database-analysis.md` to inventory pages and detect Divi version.
@@ -0,0 +1,151 @@
# 02 — Database Analysis
Parse the WordPress MySQL dump to inventory pages, detect Divi version,
extract design settings, and build the data JSON files that drive the AM build.
## Script
```bash
python3 /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/wp-divi-pipeline/scripts/analyze_db.py \
{domain}/.planning/wpress-extract/ \
{domain}/.planning/data/
```
Outputs three files into `.planning/data/`:
- `pages.json` — all published pages/posts with content and SEO meta
- `design-system.json` — colors, fonts, Divi settings
- `site-info.json` — domain, plugin list, WP version, Divi version
## Divi version detection
The script auto-detects Divi version by scanning `database.sql`:
| Signal in SQL | Divi version |
|---------------|-------------|
| `wp:divi/` in post_content | Divi 5 |
| `[et_pb_section` in post_content | Divi 4 |
**This determines the content extraction path.** Divi 4 → use `extract_divi4.py`.
Divi 5 → use `extract_divi5.py`. See `03-divi-content-extraction.md`.
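The two signals in the table translate into a short check, sketched here. The function name is hypothetical, and checking the Divi 5 marker first (so a dump containing both markers is reported as Divi 5) is an assumption, not confirmed behaviour of `analyze_db.py`.

```python
def detect_divi_version(sql_text):
    """Return "5", "4", or None based on the markers listed in the table above."""
    if "wp:divi/" in sql_text:
        return "5"  # block-comment markup
    if "[et_pb_section" in sql_text:
        return "4"  # shortcode markup
    return None     # no Divi signal found
```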
## Key WordPress tables
| Table | Contents | Used for |
|-------|----------|---------|
| `wp_posts` | All pages, posts, attachments, layouts | Page inventory, content |
| `wp_postmeta` | Per-post metadata | ACF fields, Rank Math SEO, Divi layout JSON |
| `wp_options` | Site-wide settings | Divi theme settings, colors, fonts |
| `wp_gf_forms` | Gravity Forms definitions | Form field schema |
| `wp_gf_entries` | Gravity Form submissions | Not needed for migration |
| `wp_rank_math_seo_meta` | Rank Math SEO per page | SEO titles, descriptions |
## Reading pages.json
Each entry in `pages.json`:
```json
{
"id": "42",
"post_type": "page",
"slug": "about",
"title": "About VibrantYou Yoga",
"status": "publish",
"date": "2026-03-15",
"modified": "2026-04-10",
"content_raw": "<!-- wp:divi/section ... -->...",
"excerpt": "",
"parent_id": "0",
"menu_order": "3",
"seo_title": "About VibrantYou Yoga | Mindful Movement in [City]",
"seo_description": "...",
"seo_keywords": "yoga studio, mindful movement",
"acf": {
"vyy_hero_headline": "Move With Intention",
"vyy_hero_subhead": "..."
}
}
```
`content_raw` holds the raw Divi block markup. Pass it to the extractor scripts.
`acf` holds Advanced Custom Fields values — often cleaner than block content.
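As a rough illustration of working with these entries, the sketch below filters published pages and orders them by `menu_order`. The helper name is hypothetical; the one detail taken from the sample above is that `menu_order` arrives as a string and needs converting before sorting.

```python
def published_pages(entries):
    """Return published `page` entries ordered by their numeric menu_order."""
    pages = [
        e for e in entries
        if e.get("status") == "publish" and e.get("post_type") == "page"
    ]
    # menu_order is stored as a string in pages.json, so convert before sorting
    return sorted(pages, key=lambda e: int(e.get("menu_order") or 0))
```

It would typically be called with `json.load(...)` output from `.planning/data/pages.json`.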
## Reading design-system.json
Contains extracted Divi theme settings. Key fields:
```json
{
"primary_color": "#1a8a7a",
"body_font": "DM Sans",
"header_font": "DM Serif Display",
"body_font_size": "16",
"body_line_height": "1.7",
"divi_version": "5",
"wp_version": "6.9.4",
"site_url": "https://vibrantyou.yoga",
"site_name": "VibrantYou Yoga"
}
```
Use these values to seed the AM `main.css` CSS custom properties block.
## Manual inspection (when script output is sparse)
Sometimes the Divi theme options are stored as PHP-serialized data.
Use grep to find and eyeball the raw values:
```bash
DB=.planning/wpress-extract/database.sql
# Divi global colors
grep -o "'et_divi[^']*','[^']*'" $DB | head -30
# Site name + URL
grep -E "'(siteurl|blogname|admin_email)','[^']*'" $DB
# Rank Math SEO meta for a specific post
grep "rank_math_title\|rank_math_description" $DB | head -20
# All published page slugs
grep -o "post_name','[^']*'" $DB | grep -v "revision\|auto-draft" | sort | uniq
```
## Gravity Forms schema (for form replacement)
Find form field definitions:
```bash
grep "INSERT INTO \`wp_gf_forms\`" .planning/wpress-extract/database.sql | \
python3 -c "
import sys, json, re
for line in sys.stdin:
    m = re.search(r\"'([^']+)'\s*\)\s*;\", line)
    if m:
        try: print(json.dumps(json.loads(m.group(1).replace('\\\\\"','\"')), indent=2)[:2000])
        except ValueError: pass  # last column is not valid JSON; skip this row
" 2>/dev/null | head -100
```
Field types seen in Gravity Forms: text, email, phone, textarea, select, checkbox, radio, name, address, fileupload. Map each to a plain HTML input equivalent.
## Archive directory layout note
All-in-One WP Migration extracts the .wpress archive flat, with no `wp-content/` wrapper:
```
wpress-extract/
├── database.sql ← NOT in wp-content/
├── package.json
├── uploads/ ← NOT wp-content/uploads/
├── themes/ ← NOT wp-content/themes/
├── plugins/ ← NOT wp-content/plugins/
└── et-cache/
```
Scripts must reference `uploads/`, `themes/`, `plugins/` directly under
`wpress-extract/`, not `wpress-extract/wp-content/`.
## Next step
Once `pages.json` is written, proceed to `03-divi-content-extraction.md`
to parse `content_raw` for each page into structured AM-ready HTML.
@@ -0,0 +1,157 @@
# 03 — Divi Content Extraction
Parse raw Divi page content from `pages.json` into clean, structured HTML
sections ready to map into AM templates.
## Divi 4 vs Divi 5 — critical difference
### Divi 4 (shortcode-based)
Content is stored as shortcodes in `wp_posts.post_content`:
```
[et_pb_section fb_built="1" admin_label="Hero" _builder_version="4.27.4"
background_color="#0f5f53" ...]
[et_pb_row ...]
[et_pb_column type="4_4" ...]
[et_pb_text ...]<h1>Move With Intention</h1>[/et_pb_text]
[et_pb_button button_url="/contact" button_text="Book a Class" /]
[/et_pb_column]
[/et_pb_row]
[/et_pb_section]
```
Use `extract_divi4.py` → parses shortcode tree into section/row/module JSON.
### Divi 5 (block-based)
Content is stored as Gutenberg-style block comments:
```html
<!-- wp:divi/section {"id":"section-abc123","attrs":{"backgroundColor":{"value":"#0f5f53"}}} -->
<div class="et_pb_section ...">
<!-- wp:divi/row ... -->
<!-- wp:divi/column ... -->
<!-- wp:divi/text ... -->
<div class="et_pb_text_inner"><h1>Move With Intention</h1></div>
<!-- /wp:divi/text -->
<!-- /wp:divi/column -->
<!-- /wp:divi/row -->
</div>
<!-- /wp:divi/section -->
```
Use `extract_divi5.py` → strips block wrapper, extracts inner HTML per module.
## Divi 5 extraction script
```bash
python3 /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/wp-divi-pipeline/scripts/extract_divi5.py \
{domain}/.planning/data/pages.json \
{domain}/.planning/data/content/
```
Produces one JSON file per page: `content/{slug}.json`
```json
{
"slug": "about",
"title": "About VibrantYou Yoga",
"seo_title": "About VibrantYou Yoga | ...",
"seo_description": "...",
"sections": [
{
"type": "hero",
"background_color": "#0f5f53",
"modules": [
{ "module": "text", "html": "<h1>Move With Intention</h1>" },
{ "module": "button", "text": "Book a Class", "url": "/contact/" }
]
},
{
"type": "standard",
"modules": [
{ "module": "text", "html": "<h2>Our Story</h2><p>...</p>" },
{ "module": "image", "src": "/assets/images/studio.webp", "alt": "..." }
]
}
]
}
```
## ACF fields take priority
If a page has ACF fields (in `pages.json[].acf`), use those over block content.
ACF fields are typically cleaner, pre-authored copy without Divi wrapper noise.
Convention for VYY-specific ACF keys:
- `vyy_hero_headline` → `<h1>` in hero section
- `vyy_hero_subhead` → `<p class="hero-lead">` in hero
- `vyy_hero_cta_text` → primary CTA button label
- `vyy_hero_cta_url` → primary CTA button href
Always check `acf` keys before parsing `content_raw`.
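The priority rule can be sketched as a small helper that reads the ACF keys first and only falls back to a value parsed from `content_raw` when a key is missing or empty. The function name, the returned dictionary shape, and the `fallback_headline` parameter are illustrative assumptions.

```python
def hero_fields(page, fallback_headline=""):
    """Build hero content from ACF keys, falling back to parsed block content."""
    acf = page.get("acf") or {}
    return {
        # An empty ACF value counts as missing, so the fallback applies
        "headline": acf.get("vyy_hero_headline") or fallback_headline,
        "subhead": acf.get("vyy_hero_subhead") or "",
        "cta_text": acf.get("vyy_hero_cta_text") or "",
        "cta_url": acf.get("vyy_hero_cta_url") or "",
    }
```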
## Stripping Divi class/attribute noise
After extraction, run every HTML snippet through the `clean_divi_html()`
function from `divi_to_html.py`:
```python
from divi_to_html import clean_divi_html, rewrite_internal_links
cleaned = clean_divi_html(raw_html)
cleaned = rewrite_internal_links(cleaned, staging_hosts=("vibrantyou.yoga",))
```
This removes:
- `<!-- wp:divi/... -->` block comments
- `data-et-*`, `data-builder-*` attributes
- `et_pb_*`, `divi-builder-*`, `d5_*` class tokens
- Empty `class=""` attributes
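For orientation, here is a regex-based sketch of the kind of cleanup listed above. It is not the implementation of `clean_divi_html()` in `divi_to_html.py`; the name `strip_divi_noise` and the patterns are assumptions, and a regex pass like this only suits well-formed extractor output.

```python
import re

BLOCK_COMMENT = re.compile(r"<!--\s*/?wp:divi/.*?-->", re.DOTALL)
DATA_ATTR = re.compile(r'\s+data-(?:et|builder)-[\w-]*(?:="[^"]*")?')
CLASS_ATTR = re.compile(r'\sclass="([^"]*)"')
DIVI_TOKEN = re.compile(r"^(?:et_pb_|divi-builder-|d5_)")


def strip_divi_noise(html):
    """Remove Divi block comments, data attributes, and Divi class tokens."""
    html = BLOCK_COMMENT.sub("", html)
    html = DATA_ATTR.sub("", html)

    def fix_class(match):
        kept = [t for t in match.group(1).split() if not DIVI_TOKEN.match(t)]
        # Drop the attribute entirely when no non-Divi class remains
        return ' class="' + " ".join(kept) + '"' if kept else ""

    return CLASS_ATTR.sub(fix_class, html)
```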
## What to extract per section type
| Divi module | Extract | Map to AM element |
|-------------|---------|-------------------|
| `divi/text` | inner HTML | `<section>`, `<p>`, headings as-is |
| `divi/button` | `text`, `url` | `<a class="btn-primary">` |
| `divi/image` | `src`, `alt`, `title` | `<img>` → rewrite to WebP path |
| `divi/blurb` | icon, title, body | `.am-card` component |
| `divi/testimonial` | quote, author, company | `.am-testimonial` component |
| `divi/video` | `src`, poster | `<video>` or YouTube embed |
| `divi/contact_form` | field list | → replace with AM form, see `08` |
| `divi/accordion` | Q+A pairs | `<details><summary>` |
| `divi/fullwidth_header` | title, subhead, CTA | hero section |
## Section background colors → AM section modifiers
Divi 5 stores `backgroundColor` in the block `attrs` JSON.
Map to AM CSS modifier classes:
| Divi background | AM class modifier |
|----------------|------------------|
| `#0f5f53` (dark teal) | `.section--dark` |
| `#1a8a7a` (mid teal) | `.section--brand` |
| `#f5f5f5` / `#fafafa` | `.section--light` |
| `#ffffff` / none | `.section--white` |
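The table above amounts to a lookup with a default. In this sketch the hex values and class names come from the table, while the function name and the decision to treat any unlisted colour as `section--white` are assumptions.

```python
SECTION_MODIFIERS = {
    "#0f5f53": "section--dark",
    "#1a8a7a": "section--brand",
    "#f5f5f5": "section--light",
    "#fafafa": "section--light",
    "#ffffff": "section--white",
}


def section_modifier(background_color):
    """Map a Divi backgroundColor value to an AM section modifier class."""
    if not background_color:
        return "section--white"  # no background set
    key = background_color.strip().lower()
    return SECTION_MODIFIERS.get(key, "section--white")
```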
## Content quality pass (required before HTML build)
After extraction, review every page's content for:
1. **Cut bloated copy** — WordPress sites often have 3x more text than needed.
Target 30-50% reduction. One clear idea per paragraph.
2. **Remove stale metrics** — "Over 500 students" only stays if it's verifiable.
Otherwise remove or mark `DRAFT NEEDED`.
3. **Remove plugin artifacts** — Gravity Forms shortcodes `[gravityforms id="1"]`,
Events Manager tags, Divi shortcode residue that survived extraction.
4. **Improve CTAs** — Replace generic "Learn More" with action-specific text:
"Book a Free Class", "View the Schedule", "Start Your Practice".
5. **Flag images** — Note every `<img>` that needs a real photo vs stock.
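Item 3 in the list above lends itself to an automated check. The sketch below flags Gravity Forms and Divi shortcodes left in extracted HTML; the function name and pattern are assumptions, and Events Manager tags are not covered because their exact syntax is not given here.

```python
import re

# Matches [gravityform ...] / [gravityforms ...] and any [et_pb_*] open or close tag
SHORTCODE = re.compile(r"\[/?(?:gravityforms?|et_pb_\w+)\b[^\]]*\]")


def find_shortcode_residue(html):
    """Return every leftover Gravity Forms or Divi shortcode found in html."""
    return SHORTCODE.findall(html)
```

An empty list means the snippet is clean with respect to these two shortcode families only.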
## Next step
Proceed to `04-design-system-extraction.md` to convert Divi theme settings
into AM CSS custom properties, then `05-content-migration.md` to build the
HTML templates.
@@ -0,0 +1,172 @@
# 04 — Design System Extraction
Convert Divi theme settings into AM CSS custom properties.
The goal is to ENHANCE the design — cleaner, more modern — not replicate it.
## Input
`design-system.json` produced by `analyze_db.py`. Key fields:
```json
{
"primary_color": "#1a8a7a",
"body_font": "DM Sans",
"header_font": "DM Serif Display",
"body_font_size": "16",
"body_line_height": "1.7",
"site_name": "VibrantYou Yoga"
}
```
## Color palette strategy
Never lift the Divi palette 1:1. Use extracted colors as the base and build a
full token set around the primary hue:
| Token | Derived from | Role |
|-------|-------------|------|
| `--color-primary` | Divi accent_color | Buttons, links, active states |
| `--color-primary-dark` | Darken primary 15% | Hover states, section backgrounds |
| `--color-primary-light` | Lighten primary 40% | Subtle tints, borders |
| `--color-surface` | Always `#fafafa` | Page background |
| `--color-surface-alt` | `#f3f3f3` | Alternating sections |
| `--color-text` | Always `#1a1a1a` | Body copy |
| `--color-text-muted` | `#666` | Subheadings, captions |
| `--color-border` | 10% primary or `#e0e0e0` | Dividers, inputs |
| `--color-white` | `#ffffff` | Card backgrounds, hero text |
For VibrantYou Yoga (primary `#1a8a7a`, dark `#0f5f53`):
```css
:root {
--color-primary: #1a8a7a;
--color-primary-dark: #0f5f53;
--color-primary-light: #d4f0eb;
--color-surface: #fafafa;
--color-surface-alt: #f0f7f6;
--color-text: #1a1a1a;
--color-text-muted: #5a6e6b;
--color-border: #c8dedd;
--color-white: #ffffff;
}
```
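The "Darken primary 15%" and "Lighten primary 40%" rows can be approximated by blending each RGB channel toward black or white. This is a sketch under that assumption; the function names are hypothetical, and the VibrantYou palette above appears hand-tuned, so these functions will not reproduce its values exactly.

```python
def _mix(hex_color, target, amount):
    """Blend each RGB channel of hex_color toward target (0-255) by amount (0-1)."""
    value = hex_color.lstrip("#")
    channels = [int(value[i:i + 2], 16) for i in (0, 2, 4)]
    mixed = [round(c + (target - c) * amount) for c in channels]
    return "#" + "".join(f"{c:02x}" for c in mixed)


def darken(hex_color, amount):
    """Move a 6-digit hex colour toward black."""
    return _mix(hex_color, 0, amount)


def lighten(hex_color, amount):
    """Move a 6-digit hex colour toward white."""
    return _mix(hex_color, 255, amount)
```

Treat the output as a starting point for `--color-primary-dark` and `--color-primary-light`, then adjust by eye.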
## Typography strategy
Use the extracted fonts but upgrade the type scale.
Divi's default type scale is too small and too flat. Aim for a modular ratio between 1.25 and 1.333.
```css
:root {
/* Fonts from design-system.json */
--font-body: 'DM Sans', system-ui, sans-serif;
--font-heading: 'DM Serif Display', Georgia, serif;
/* Type scale from a 16px base (adjacent steps are roughly 1.1 to 1.33 apart) */
--text-xs: 0.75rem; /* 12px */
--text-sm: 0.875rem; /* 14px */
--text-base: 1rem; /* 16px */
--text-lg: 1.125rem; /* 18px */
--text-xl: 1.25rem; /* 20px */
--text-2xl: 1.5rem; /* 24px */
--text-3xl: 1.875rem; /* 30px */
--text-4xl: 2.25rem; /* 36px */
--text-5xl: 3rem; /* 48px */
--text-6xl: 3.75rem; /* 60px */
/* Line heights */
--leading-tight: 1.2;
--leading-normal: 1.6;
--leading-loose: 1.8;
/* Font weights */
--weight-normal: 400;
--weight-medium: 500;
--weight-semibold: 600;
--weight-bold: 700;
}
```
## Spacing and layout
Divi uses pixel-based margins/paddings that must be converted to a consistent
rem-based spacing scale:
```css
:root {
--space-1: 0.25rem; /* 4px */
--space-2: 0.5rem; /* 8px */
--space-3: 0.75rem; /* 12px */
--space-4: 1rem; /* 16px */
--space-5: 1.25rem; /* 20px */
--space-6: 1.5rem; /* 24px */
--space-8: 2rem; /* 32px */
--space-10: 2.5rem; /* 40px */
--space-12: 3rem; /* 48px */
--space-16: 4rem; /* 64px */
--space-20: 5rem; /* 80px */
--space-24: 6rem; /* 96px */
--space-32: 8rem; /* 128px */
/* Section vertical padding */
--section-py: var(--space-20); /* 80px default */
--section-py-sm: var(--space-12); /* 48px mobile */
/* Container */
--container-max: 1200px;
--container-px: var(--space-6);
/* Border radius */
--radius-sm: 4px;
--radius-md: 8px;
--radius-lg: 12px;
--radius-xl: 20px;
--radius-full: 9999px;
/* Shadows */
--shadow-sm: 0 1px 3px rgba(0,0,0,.08);
--shadow-md: 0 4px 16px rgba(0,0,0,.1);
--shadow-lg: 0 12px 40px rgba(0,0,0,.12);
}
```
## Google Fonts import
For DM Sans + DM Serif Display:
```html
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=DM+Sans:ital,opsz,wght@0,9..40,300;0,9..40,400;0,9..40,500;0,9..40,600;0,9..40,700;1,9..40,400&family=DM+Serif+Display:ital@0;1&display=swap" rel="stylesheet">
```
## Enhancement rules (required)
These upgrades apply to every AM migration regardless of source:
1. **Increase contrast** — body text must be #1a1a1a on white (WCAG AA minimum).
Never use the grey-on-grey color schemes that Divi themes commonly use.
2. **Whitespace is content** — section padding must be at minimum 80px vertical
on desktop. Divi often uses 40-60px which feels cramped.
3. **One weight per heading level** — h1 at 700, h2 at 600, h3 at 500.
Divi often leaves all headings at the same weight.
4. **Max-width prose** — body copy containers max 680px wide. Divi stretches
copy to full column width on 1200px screens, which is unreadable.
5. **Brand color is a highlight, not a wallpaper** — primary color should
appear on buttons, links, and 1-2 hero sections only. Divi sites often
paint every other section in the primary color.
## Output: main.css variables block
Write the complete `:root {}` block into `src/assets/css/main.css` as the
first section. All other CSS rules reference only `var(--token-name)`.
Never hard-code a color, font, or spacing value outside of `:root`.
## Next step
Proceed to `05-content-migration.md` to map extracted content into AM HTML
templates using this design system.
@@ -0,0 +1,246 @@
# 05 — Content Migration
Map extracted Divi content into AM HTML templates. This is the build phase.
Follow `01-project-structure.md` for directory layout and `03-build-pipeline.md`
for JSON + template stamping.
## Source files
After running Phase 2-4 scripts, `.planning/data/` contains:
```
.planning/data/
├── pages.json ← all published pages (from analyze_db.py)
├── site-info.json ← domain, plugin list, Divi version
├── design-system.json ← colors, fonts, spacing tokens
└── content/
├── home.json ← parsed sections for home page
├── about.json ← parsed sections for about page
├── services.json
└── ... ← one file per published page
```
## Information architecture for yoga sites
Standard AM structure for a yoga studio / wellness site:
```
/ home (hero, classes preview, testimonials, CTA)
/about/ about / story / instructors
/classes/ class schedule index
/classes/{slug}.html one page per class type (hatha, vinyasa, yin, etc.)
/private-sessions/ 1:1 session offerings
/workshops/ workshops + retreats index
/contact/ contact + booking form
/blog/ optional blog index
/blog/{slug}.html individual blog posts
/404.html
/500.html
/robots.txt
/sitemap.xml
```
Map every WP page slug to this structure first. Some WP slugs may need to be
consolidated, renamed, or dropped. Document the redirect map in
`.planning/redirect-map.txt` (old slug → new path).
## Build order
Build in this sequence. Each page uses the previous as a reference:
1. `src/assets/css/main.css` — design tokens, reset, typography, layout grid
2. `src/assets/css/components.css` — header, footer, hero, cards, forms, nav
3. `src/components/header.html` — navigation
4. `src/components/footer.html` — footer links, contact info
5. `src/assets/js/components.js` — fetch + inject header/footer
6. `src/assets/js/main.js` — scroll animations, intersection observer
7. `src/index.html` — home page (this IS the design system in working form)
8. `src/about/index.html`
9. `src/classes/index.html` + individual class pages (from JSON template if 4+)
10. `src/contact/index.html` + AM form
11. `src/blog/index.html` + individual posts
12. `src/robots.txt`, `src/sitemap.xml`, `src/404.html`, `src/500.html`
## HTML page skeleton
Every page uses the same skeleton. Copy from 06-seo-meta.md for the full
`<head>` requirements. Shell:
```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="site-root" content="/">
<title>{{seo_title}}</title>
<meta name="description" content="{{seo_description}}">
<link rel="canonical" href="{{canonical}}">
<!-- og, twitter, schema — see 06-seo-meta.md -->
<link rel="stylesheet" href="/assets/css/main.css">
<link rel="stylesheet" href="/assets/css/components.css">
</head>
<body>
<div id="header-placeholder"></div>
<main>
<!-- page sections go here -->
</main>
<div id="footer-placeholder"></div>
<script src="/assets/js/components.js"></script>
<script src="/assets/js/main.js"></script>
</body>
</html>
```
## Section HTML patterns
Map each `content/{slug}.json` section to one of these AM patterns:
### Hero (type: "hero")
```html
<section class="hero hero--dark">
<div class="container">
<div class="hero__content">
<h1 class="hero__title">Move With Intention</h1>
<p class="hero__lead">Discover yoga classes for all levels in [City].</p>
<div class="hero__actions">
<a href="/classes/" class="btn btn--primary">Explore Classes</a>
<a href="/contact/" class="btn btn--outline">Book a Session</a>
</div>
</div>
</div>
</section>
```
### Feature grid (4-col blurb modules)
```html
<section class="section section--light">
<div class="container">
<h2 class="section__title text-center">Why VibrantYou Yoga</h2>
<div class="grid grid--4">
<div class="feature-card">
<div class="feature-card__icon"><!-- SVG icon --></div>
<h3 class="feature-card__title">All Levels Welcome</h3>
<p class="feature-card__body">From first-timers to advanced practitioners.</p>
</div>
<!-- repeat -->
</div>
</div>
</section>
```
### Testimonials (3-col)
```html
<section class="section section--white">
<div class="container">
<h2 class="section__title text-center">What Students Say</h2>
<div class="grid grid--3">
<blockquote class="testimonial">
<p class="testimonial__quote">"..."</p>
<footer class="testimonial__author">
<strong>Jane D.</strong>
<span>Student since 2024</span>
</footer>
</blockquote>
</div>
</div>
</section>
```
### CTA section
```html
<section class="section section--brand">
<div class="container text-center">
<h2 class="section__title">Ready to Begin?</h2>
<p class="section__lead">Your first class is on us.</p>
<a href="/contact/" class="btn btn--white btn--lg">Book a Free Class</a>
</div>
</section>
```
## Class pages — JSON template build
If there are 4+ class types (Hatha, Vinyasa, Yin, Meditation, etc.), use the
build pipeline:
```
src/classes/
├── _template.html ← class detail page template
├── hatha.html ← generated from classes.json
├── vinyasa.html
├── yin.html
└── meditation.html
.planning/data/
└── classes.json ← array of class objects
```
`classes.json` schema:
```json
[
{
"slug": "hatha",
"name": "Hatha Yoga",
"title": "Hatha Yoga Classes | VibrantYou Yoga",
"meta_description": "...",
"canonical": "https://vibrantyou.yoga/classes/hatha.html",
"hero_h1": "Hatha Yoga",
"hero_lead": "A grounding practice for all experience levels.",
"description": "<p>...</p>",
"duration": "60 min",
"level": "All levels",
"schedule": "Mon, Wed, Fri — 9:00 AM",
"instructor": "Sarah M.",
"faqs": [
{ "q": "Do I need prior experience?", "a": "No." }
]
}
]
```
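Stamping a template from one `classes.json` object can be sketched as a placeholder substitution. This is not the pipeline from `03-build-pipeline.md`; the `{{key}}` syntax follows the page skeleton above, while the function name and the choice to leave unknown placeholders in place (so the post-build grep for `{{` catches them) are assumptions.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def stamp(template, record):
    """Fill {{key}} placeholders from record; return (html, missing_keys)."""
    missing = []

    def fill(match):
        key = match.group(1)
        if key not in record:
            missing.append(key)
            return match.group(0)  # leave the placeholder visible
        return str(record[key])    # no escaping: fields like description hold HTML

    return PLACEHOLDER.sub(fill, template), missing
```

List-valued fields such as `faqs` would need their own rendering step before stamping.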
## Events Manager → static schedule
If the site uses the Events Manager plugin, apply these rules for the static migration:
- Extract recurring class schedule from the database (`wp_em_events` table)
- Convert to a static schedule table / cards in `src/classes/index.html`
- Do NOT recreate a dynamic booking system unless explicitly requested
- Link the "Book" button to the contact form or an external booking URL
## Image remapping
Every `<img src="...">` extracted from Divi content will have a WordPress
upload URL like `/wp-content/uploads/2026/03/image.jpg`.
Remap to AM path:
- Source: `wpress-extract/uploads/2026/03/image.jpg`
- AM dest: `src/assets/images/image.webp` (after WebP conversion)
- HTML: `<img src="/assets/images/image.webp" alt="..." loading="lazy" width="800" height="600">`
Always include `width`, `height`, `loading="lazy"`, and `alt` on every `<img>`.
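The four required attributes can be audited with the standard-library HTML parser. The class and function names below are hypothetical, and the sketch treats an empty value (for example `alt=""`) as missing, which is stricter than HTML itself requires.

```python
from html.parser import HTMLParser

REQUIRED = ("alt", "width", "height", "loading")


class ImgAudit(HTMLParser):
    """Collect <img> tags that lack any of the required attributes."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        present = {name for name, value in attrs if value}
        missing = [a for a in REQUIRED if a not in present]
        if missing:
            self.problems.append((dict(attrs).get("src"), missing))


def audit_images(html):
    """Return a list of (src, missing_attributes) for incomplete <img> tags."""
    parser = ImgAudit()
    parser.feed(html)
    parser.close()
    return parser.problems
```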
## After build — verify
```bash
# Zero unreplaced template placeholders
grep -rn --include='*.html' '{{' src/
# All pages have canonical
grep -rL --include='*.html' 'rel="canonical"' src/
# All images have alt text
grep -rn --include='*.html' '<img' src/ | grep -v 'alt="[^"]'
# Protection check (after deploy)
bash /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/tools/verify-protection.sh https://{domain}
```
## Next step
Proceed to `06-media-assets.md` for image migration and WebP conversion,
then `07-seo-preservation.md` for redirect map and meta tag audit.
@@ -0,0 +1,177 @@
# 06 — Media Assets
Migrate WordPress uploads to AM `/assets/images/`, convert to WebP, and
generate a media manifest for URL remapping during HTML build.
## Source location in wpress-extract
All-in-One WP Migration archives extract flat, so uploads are at:
```
wpress-extract/uploads/ NOT wpress-extract/wp-content/uploads/
```
Organized by WordPress date-upload subdirs:
```
uploads/
├── 2026/
│ ├── 03/
│ │ ├── VibrantYouYogaLogo.png
│ │ ├── hero-studio.jpg
│ │ └── ...
│ └── 04/
│ └── ...
└── woocommerce-placeholder.png ← skip
```
## Step 1 — Catalog all media
```bash
find .planning/wpress-extract/uploads -type f \
\( -name "*.jpg" -o -name "*.jpeg" -o -name "*.png" -o -name "*.gif" -o -name "*.webp" -o -name "*.svg" \) \
| sort > .planning/data/media-raw-list.txt
wc -l .planning/data/media-raw-list.txt
```
## Step 2 — Skip WordPress-generated size variants
WordPress auto-generates resized variants: `-150x150`, `-300x200`, `-768x512`, etc.
Skip these — they are redundant once we have the originals.
```bash
grep -v -E "\-[0-9]+x[0-9]+\.(jpg|jpeg|png|webp)$" \
.planning/data/media-raw-list.txt > .planning/data/media-originals.txt
echo "Originals: $(wc -l < .planning/data/media-originals.txt)"
```
## Step 3 — Copy originals to src/assets/images/
Flatten the date-organized subdirs into a single flat directory.
Preserve filenames exactly (except extension will change to .webp).
```bash
mkdir -p src/assets/images/
while IFS= read -r src_path; do
filename=$(basename "$src_path")
cp "$src_path" "src/assets/images/$filename"
done < .planning/data/media-originals.txt
echo "Copied: $(ls src/assets/images/ | wc -l) files"
```
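Flattening date-organized subdirs means two files with the same name in different months overwrite each other during the copy. A pre-flight check along these lines can surface that before Step 3 runs; the function name is hypothetical, and it only compares exact filenames.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_basename_collisions(paths):
    """Group source paths by filename and return only names used more than once."""
    by_name = defaultdict(list)
    for p in paths:
        by_name[PurePosixPath(p).name].append(p)
    return {name: srcs for name, srcs in by_name.items() if len(srcs) > 1}
```

Feed it the lines of `.planning/data/media-originals.txt` and rename or drop any reported duplicates by hand.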
## Step 4 — Convert to WebP
Use the project's standard WebP conversion script (see `12-image-assets.md`).
If cwebp is available:
```bash
cd src/assets/images/
for img in *.jpg *.jpeg *.png; do
[ -f "$img" ] || continue
base="${img%.*}"
cwebp -q 82 "$img" -o "${base}.webp" 2>/dev/null && rm "$img"
done
echo "WebP conversion done. Count: $(ls *.webp | wc -l)"
```
Or use the Python Pillow batch script if cwebp is not installed:
```bash
python3 /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/wp-divi-pipeline/scripts/convert_images.py \
src/assets/images/
```
## Step 5 — Generate media manifest
After conversion, build the URL remap table used during HTML build:
```bash
python3 -c "
import os, json, re
from pathlib import Path
uploads_dir = Path('.planning/wpress-extract/uploads')
site_url = 'https://vibrantyou.yoga'
am_path = '/assets/images'
manifest = []
for root, dirs, files in os.walk(uploads_dir):
    for f in files:
        full = Path(root) / f
        rel = full.relative_to(uploads_dir).as_posix()
        # WordPress URL for this file
        wp_url = f'{site_url}/wp-content/uploads/{rel}'
        # Strip the -WxH suffix so every size variant maps to the original
        stem_clean = re.sub(r'-\d+x\d+$', '', Path(f).stem)
        am_url = f'{am_path}/{stem_clean}.webp'
        manifest.append({'wp_url': wp_url, 'am_url': am_url, 'original': f})
Path('.planning/data/media-manifest.json').write_text(
    json.dumps(manifest, indent=2))
print(f'Manifest: {len(manifest)} entries')
"
```
## Step 6 — Apply manifest during HTML build
When writing HTML from extracted content, use the manifest to rewrite
every WordPress upload URL:
```python
import json, re
manifest = json.loads(open('.planning/data/media-manifest.json').read())
url_map = {m['wp_url']: m['am_url'] for m in manifest}
def _relative_to_am(m: re.Match) -> str:
    # Keep the bare filename, then drop the extension and any -WxH size suffix
    stem = m.group(1).split('/')[-1].rsplit('.', 1)[0]
    stem = re.sub(r'-\d+x\d+$', '', stem)
    return f"/assets/images/{stem}.webp"
def rewrite_media_urls(html: str) -> str:
    for wp_url, am_url in url_map.items():
        html = html.replace(wp_url, am_url)
    # Also rewrite relative /wp-content/uploads/ paths
    html = re.sub(
        r'/wp-content/uploads/\d{4}/\d{2}/([^"\'>\s]+)',
        _relative_to_am,
        html
    )
    return html
```
## Files to skip
Do not migrate these WordPress system images to `src/assets/images/`:
- `woocommerce-placeholder.png` and variants
- `wp-includes/` images (WordPress core UI)
- Plugin admin icons (anything from `plugins/` in uploads)
- Files in `wc-logs/`, `ithemes-security/`, `amcu-chunks/` subdirs
## Logo handling
The logo is typically at:
```
uploads/YYYY/MM/VibrantYouYogaLogo.png
```
Place the logo at:
- `src/assets/images/logo.webp` — standard WebP version
- `src/assets/svg/logo.svg` — if an SVG version exists (preferred)
- `src/assets/images/logo.png` — keep PNG fallback for email/OG use
Reference in header.html:
```html
<a href="/" class="nav__logo">
<img src="/assets/images/logo.webp" alt="VibrantYou Yoga" width="160" height="48">
</a>
```
## OG image
Generate one 1200×630px OG image per `06-seo-meta.md` requirements.
Place at: `src/assets/images/og-default.jpg`
## Next step
Proceed to `07-seo-preservation.md` to build the redirect map and audit
every page's title, description, and canonical before the HTML build.
@@ -0,0 +1,182 @@
# 07 — SEO Preservation
Before building HTML, map every WordPress page URL to its new AM URL and
ensure title, description, canonical, and schema.org are preserved or improved.
## Step 1 — Inventory all WP URLs
Extract every published page slug from `pages.json`:
```bash
python3 -c "
import json
pages = json.load(open('.planning/data/pages.json'))
for p in pages:
    slug = p['slug']
    ptype = p['post_type']
    print(f'/{slug}/ ({ptype}) title={p[\"title\"]!r}')
" | tee .planning/data/wp-url-inventory.txt
```
## Step 2 — Build redirect map
Map each WP URL to the new AM URL. Write to `.planning/data/redirect-map.txt`:
Format: `OLD_PATH -> NEW_PATH`
Common mapping patterns for yoga sites:
| Old WP URL | New AM URL | Action |
|-----------|-----------|--------|
| `/` | `/` | Same |
| `/about/` | `/about/` | Same |
| `/classes/` | `/classes/` | Same |
| `/yoga-class-name/` | `/classes/yoga-class-name.html` | Restructure |
| `/private-yoga-sessions/` | `/private-sessions/` | Rename |
| `/contact-us/` | `/contact/` | Simplify |
| `/?page_id=42` | `/about/` | WP ID → slug |
| `/blog/post-title/` | `/blog/post-title.html` | Flatten |
| `/events/event-name/` | `/classes/` | Consolidate into schedule |
Redirects go into `infra/nginx.conf`:
```nginx
# Exact-match redirects
location = /contact-us/ { return 301 /contact/; }
location = /private-yoga-sessions/ { return 301 /private-sessions/; }
# WP page ID redirects
location = / {
if ($arg_page_id = "42") { return 301 /about/; }
if ($arg_p) { return 301 /blog/; }
}
# WP upload URLs → AM asset paths (drops the date dirs and any -WxH size suffix)
location ~* ^/wp-content/uploads/(?:.+/)?([^/]+?)(?:-\d+x\d+)?\.(?:jpe?g|png|webp)$ {
    return 301 /assets/images/$1.webp;
}
# Block all WP URLs
location ~ ^/wp-(admin|login|json|cron|includes|content/plugins|content/themes) {
return 410;
}
```
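The exact-match redirects can be generated from `.planning/data/redirect-map.txt` rather than typed by hand. The sketch below assumes the `OLD_PATH -> NEW_PATH` format described above, plus two conventions that are not in the original: blank lines and `#` comment lines are ignored. The function names are hypothetical.

```python
def parse_redirect_map(text):
    """Parse 'OLD_PATH -> NEW_PATH' lines; skip blank lines and # comments."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        old, sep, new = line.partition("->")
        if not sep:
            raise ValueError(f"not a redirect line: {line!r}")
        pairs.append((old.strip(), new.strip()))
    return pairs


def to_nginx_locations(pairs):
    """Emit exact-match 301 lines for every path that actually changes."""
    out = []
    for old, new in pairs:
        # Unchanged paths need no redirect; query-string URLs such as
        # /?page_id=42 cannot be matched by `location` and need an `if` on $arg_*.
        if old == new or "?" in old:
            continue
        out.append(f"location = {old} {{ return 301 {new}; }}")
    return out
```

Paste the emitted lines into `infra/nginx.conf`; query-string entries still have to be added to the `location = /` block manually.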
## Step 3 — Rank Math SEO extraction
Rank Math stores titles and descriptions in `wp_postmeta`.
`analyze_db.py` already extracts these into `pages.json` as `seo_title` and `seo_description`.
For each page, the priority order for SEO fields:
1. `seo_title` from Rank Math (if not empty and not a template like `%title% - %sitename%`)
2. `post_title` with AM format appended: `{Title} | VibrantYou Yoga`
3. Never leave title as the raw WP default
Rank Math title templates use `%` tokens — strip them and rebuild:
```python
import re
def clean_rm_title(rm_title: str, post_title: str, site_name: str) -> str:
    # Fall back to the post title when Rank Math has no title or only a %token% template
    if not rm_title or "%" in rm_title:
        return f"{post_title} | {site_name}"
    return rm_title
def clean_rm_desc(rm_desc: str) -> str:
    # Strip %token% placeholders
    return re.sub(r"%[a-z_]+%", "", rm_desc).strip(" -|")
```
## Step 4 — Per-page SEO checklist
For every page in `pages.json`, fill in this record before writing HTML:
```json
{
"slug": "about",
"new_path": "/about/",
"canonical": "https://vibrantyou.yoga/about/",
"title": "About VibrantYou Yoga | Mindful Movement in [City], [State]",
"description": "Meet the instructors and story behind VibrantYou Yoga. [150-160 chars, include city]",
"keywords": "yoga studio [city], yoga instructor, mindful movement",
"og_image": "/assets/images/about-studio.webp",
"schema_type": "AboutPage",
"h1": "Our Story"
}
```
Write to `.planning/data/seo-map.json`. The HTML build reads this file to
stamp `<head>` tags.
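Before the build reads `seo-map.json`, each record can be checked mechanically. This is a minimal sketch: the set of required keys and the rule that `canonical` equals the site URL plus `new_path` are assumptions drawn from the example record above, and `check_seo_record` is a hypothetical helper name.

```python
def check_seo_record(rec, site_url):
    """Return a list of problems for one seo-map.json record (empty list = OK)."""
    problems = []
    for key in ("slug", "new_path", "canonical", "title", "description", "h1"):
        if not rec.get(key):
            problems.append(f"missing {key}")
    # Bracketed text such as [City] means a template placeholder was never filled in
    if "[" in rec.get("title", "") or "[" in rec.get("description", ""):
        problems.append("unfilled [placeholder] in title or description")
    expected = site_url.rstrip("/") + rec.get("new_path", "")
    if rec.get("canonical") and rec["canonical"] != expected:
        problems.append(f"canonical should be {expected}")
    desc_len = len(rec.get("description", ""))
    if desc_len and not 150 <= desc_len <= 160:
        problems.append(f"description is {desc_len} chars, target 150-160")
    return problems
```

Run it over every record and block the HTML build while any list is non-empty.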
## Step 5 — Schema.org per page type
| Page | Schema type | Required fields |
|------|------------|----------------|
| Home | `LocalBusiness` | name, url, telephone, address, areaServed, openingHours |
| About | `AboutPage` + `Organization` | name, description, founders |
| Classes index | `ItemList` of `Course` | name, url, description per class |
| Class detail | `Course` | name, description, provider, educationalLevel |
| Contact | `ContactPage` | name, url, telephone, email, address |
| Blog post | `Article` | headline, datePublished, author, image |
| 404 | none | — |
LocalBusiness schema for vibrantyou.yoga (seed from `site-info.json`):
```json
{
  "@context": "https://schema.org",
  "@type": ["LocalBusiness", "HealthAndBeautyBusiness"],
  "@id": "https://vibrantyou.yoga/#business",
  "name": "VibrantYou Yoga",
  "url": "https://vibrantyou.yoga",
  "telephone": "",
  "priceRange": "$$",
  "currenciesAccepted": "USD",
  "paymentAccepted": "Cash, Credit Card",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "",
    "addressLocality": "",
    "addressRegion": "",
    "postalCode": "",
    "addressCountry": "US"
  }
}
```
Mark address fields `DRAFT NEEDED` — do not fabricate. Pull from `wp_options`
(`admin_email`, Events Manager location settings) or ask client.
## Step 6 — Pre-launch SEO audit commands
Run these before declaring the build complete:
```bash
SITE=src
# Every page has a <title>
find $SITE -name "*.html" | xargs grep -L '<title>' | grep -v "_template"
# Every page has meta description
find $SITE -name "*.html" | xargs grep -L 'name="description"' | grep -v "_template"
# Every page has canonical
find $SITE -name "*.html" | xargs grep -L 'rel="canonical"' | grep -v "_template"
# Every page has JSON-LD
find $SITE -name "*.html" | xargs grep -L 'application/ld+json' | grep -v "_template"
# No WP URLs leaked into HTML
grep -r "wp-content\|wp-admin\|wordpress\|?p=\|?page_id=" $SITE --include="*.html"
# No unreplaced template placeholders
grep -r "{{" $SITE --include="*.html"
# No Divi class residue
grep -r "et_pb_\|divi-builder" $SITE --include="*.html"
```
All seven commands must return zero results before launch.
## Next step
Proceed to `10-agent-breadcrumbs.md` for the complete execution sequence
(it supersedes the deprecated `08-run-order.md`), then
`../02-wordpress-to-html-migration.md` Phase 7 for DNS cutover.
@@ -0,0 +1,230 @@
# 08 — Run Order (DEPRECATED)
> **Superseded by `10-agent-breadcrumbs.md`.**
> This file described the WP → static HTML (Stack B) run order.
> The pipeline now targets Stack A (PHP router + SQLite).
> Use `10-agent-breadcrumbs.md` for the current ordered execution checklist.
---
Step-by-step execution sequence for a complete .wpress → AM HTML migration.
Run each command, verify the output, then proceed to the next.
## Prerequisites
```bash
# Python 3.8+ required
python3 --version
# cwebp for image conversion (optional — Python fallback available)
which cwebp || echo "cwebp not installed — will use Python Pillow fallback"
# Set project domain variable (use throughout)
export DOMAIN="vibrantyou.yoga"
export PROJECT="/home/sirdrez/arisingmedia-websites/$DOMAIN"
export SOPS="/home/sirdrez/arisingmedia-websites/.am-webdesign-sops"
export WPRESS=$(ls $PROJECT/.planning/*.wpress | head -1)
echo "Domain: $DOMAIN"
echo "Project: $PROJECT"
echo "Archive: $WPRESS"
```
---
## Phase 0 — Setup
```bash
# Create directory structure
mkdir -p $PROJECT/{src/{about,services,contact,blog,classes,components,assets/{css,js,images,svg,fonts}},build,infra,api,.planning/{data/content,scripts,wpress-extract}}
# Verify archive
ls -lh $WPRESS
file $WPRESS
```
---
## Phase 1 — Extract archive
```bash
python3 $SOPS/wp-divi-pipeline/scripts/extract_wpress.py \
"$WPRESS" \
"$PROJECT/.planning/wpress-extract/"
# Verify
ls $PROJECT/.planning/wpress-extract/
cat $PROJECT/.planning/wpress-extract/package.json | python3 -m json.tool | head -20
ls -lh $PROJECT/.planning/wpress-extract/database.sql
```
Expected output: `DONE: N files | X MB`
---
## Phase 2 — Database analysis
```bash
python3 $SOPS/wp-divi-pipeline/scripts/analyze_db.py \
"$PROJECT/.planning/wpress-extract/" \
"$PROJECT/.planning/data/"
# Verify
cat $PROJECT/.planning/data/site-info.json
echo "Pages: $(python3 -c "import json; print(len(json.load(open('$PROJECT/.planning/data/pages.json'))))")"
cat $PROJECT/.planning/data/design-system.json
```
Expected output: `pages.json (N pages/posts)`
If pages = 0, check the SQL prefix detection in the script output.
---
## Phase 3 — Content extraction
### Divi 5 (most common; check `divi_version` in site-info.json first)
```bash
python3 $SOPS/wp-divi-pipeline/scripts/extract_divi5.py \
"$PROJECT/.planning/data/pages.json" \
"$PROJECT/.planning/data/content/"
# Verify
ls $PROJECT/.planning/data/content/
cat $PROJECT/.planning/data/content/home.json | python3 -m json.tool | head -40
```
---
## Phase 4 — Design system
Read `$PROJECT/.planning/data/design-system.json` and seed `main.css`:
```bash
cat $PROJECT/.planning/data/design-system.json
```
Manually translate to CSS custom properties per `04-design-system-extraction.md`.
Write to: `$PROJECT/src/assets/css/main.css`
Key values for vibrantyou.yoga:
- Primary: #1a8a7a Dark: #0f5f53
- Body font: DM Sans Heading font: DM Serif Display
---
## Phase 5 — Media migration
```bash
# Catalog originals (skip WP-generated size variants)
find $PROJECT/.planning/wpress-extract/uploads -type f \
\( -name "*.jpg" -o -name "*.jpeg" -o -name "*.png" -o -name "*.webp" \) | \
grep -v -E "\-[0-9]+x[0-9]+\.(jpg|jpeg|png|webp)$" | \
sort > $PROJECT/.planning/data/media-originals.txt
echo "Original images: $(wc -l < $PROJECT/.planning/data/media-originals.txt)"
# Copy to src/assets/images/
while IFS= read -r src; do
cp "$src" "$PROJECT/src/assets/images/$(basename "$src")"
done < $PROJECT/.planning/data/media-originals.txt
# Convert to WebP (cwebp path)
cd $PROJECT/src/assets/images/
for img in *.jpg *.jpeg *.png; do
[ -f "$img" ] || continue
base="${img%.*}"
cwebp -q 82 "$img" -o "${base}.webp" 2>/dev/null && rm "$img"
done
echo "WebP count: $(ls *.webp 2>/dev/null | wc -l)"
cd $PROJECT
```
---
## Phase 6 — Build HTML
Per `05-content-migration.md`, build pages in this order:
```bash
# 1. Write src/assets/css/main.css (design tokens — manual)
# 2. Write src/assets/css/components.css (manual)
# 3. Write src/components/header.html (manual)
# 4. Write src/components/footer.html (manual)
# 5. Write src/assets/js/components.js (fetch + inject)
# 6. Write src/assets/js/main.js (scroll, animations)
# 7. Write src/index.html (home page — first, establishes design)
# 8. Write remaining pages
# After build, verify zero unreplaced placeholders
grep -r "{{" $PROJECT/src --include="*.html" && echo "FAIL: placeholders found" || echo "OK"
# Verify no Divi residue
grep -rn "et_pb_\|wp:divi\|\[et_pb" $PROJECT/src --include="*.html" && echo "FAIL: Divi residue" || echo "OK"
```
---
## Phase 7 — SEO audit
```bash
cd $PROJECT/src
# All pages have title
find . -name "*.html" | grep -v "_template" | xargs grep -L '<title>' | head
# All pages have canonical
find . -name "*.html" | grep -v "_template" | xargs grep -L 'rel="canonical"' | head
# All pages have JSON-LD
find . -name "*.html" | grep -v "_template" | xargs grep -L 'ld+json' | head
cd $PROJECT
```
All commands must return empty output.
---
## Phase 8 — Infra (Docker)
```bash
# Copy infra from reference project
cp /home/sirdrez/arisingmedia-websites/vibrantyoucoaching.com/Dockerfile $PROJECT/
cp /home/sirdrez/arisingmedia-websites/vibrantyoucoaching.com/docker-compose.yml $PROJECT/
# Phase 0 already created infra/, so copy the directory contents, not the directory
cp -r /home/sirdrez/arisingmedia-websites/vibrantyoucoaching.com/infra/. $PROJECT/infra/
# Update nginx.conf: set server_name to $DOMAIN, add redirects from 07-seo-preservation.md
# Update docker-compose.yml: set container_name and port
# Test build
docker compose -f $PROJECT/docker-compose.yml build 2>&1 | tail -5
docker compose -f $PROJECT/docker-compose.yml up -d
curl -I http://localhost:PORT/ 2>&1 | head -5
```
---
## Phase 9 — Protection check
```bash
# Run after deploy
bash $SOPS/tools/verify-protection.sh https://$DOMAIN
# Must return exit 0 with no FAIL lines
```
---
## Checklist summary
- [ ] Phase 0: Directories created
- [ ] Phase 1: .wpress extracted, database.sql present
- [ ] Phase 2: pages.json > 0 entries, design-system.json has colors + fonts
- [ ] Phase 3: content/ dir has one JSON per page
- [ ] Phase 4: main.css written with full :root{} token block
- [ ] Phase 5: WebP images in src/assets/images/
- [ ] Phase 6: All HTML pages built, zero {{ placeholders, zero Divi residue
- [ ] Phase 7: All SEO audit commands return empty
- [ ] Phase 8: Docker container up, curl returns 200
- [ ] Phase 9: verify-protection.sh exits 0
@@ -0,0 +1,370 @@
# 09 — Stack A Output Spec (SQLite Schema + sections_json)
## SQLite databases produced by seed_databases.py
### pages.sqlite
```sql
CREATE TABLE pages (
id INTEGER PRIMARY KEY,
slug TEXT UNIQUE NOT NULL,
template TEXT NOT NULL, -- home | static | classes | schedule | glossary | blog
title TEXT NOT NULL,
meta_description TEXT,
canonical_url TEXT,
og_image TEXT,
schema_json TEXT,
hero_eyebrow TEXT,
hero_h1 TEXT,
hero_lead TEXT,
sections_json TEXT, -- JSON array of section objects
updated_at TEXT
);
```
### nav.sqlite
```sql
CREATE TABLE nav_items (
id INTEGER PRIMARY KEY,
label TEXT NOT NULL,
href TEXT NOT NULL,
display_order INTEGER DEFAULT 0,
is_cta INTEGER DEFAULT 0 -- 1 = render as button
);
```
### blog.sqlite
```sql
CREATE TABLE posts (
id INTEGER PRIMARY KEY,
slug TEXT UNIQUE NOT NULL,
title TEXT NOT NULL,
excerpt TEXT,
body_html TEXT,
author TEXT DEFAULT 'Admin',
published_at TEXT,
og_image TEXT,
tags TEXT
);
```
### testimonials.sqlite
```sql
CREATE TABLE testimonials (
id INTEGER PRIMARY KEY,
quote TEXT NOT NULL,
author_name TEXT NOT NULL,
author_role TEXT,
is_featured INTEGER DEFAULT 0,
display_order INTEGER DEFAULT 0
);
```
### glossary.sqlite (if site has a glossary)
```sql
CREATE TABLE terms (
id INTEGER PRIMARY KEY,
slug TEXT UNIQUE NOT NULL,
term TEXT NOT NULL,
pronunciation TEXT,
definition TEXT NOT NULL,
category TEXT NOT NULL,
level TEXT NOT NULL,
display_order INTEGER DEFAULT 0
);
```
### faq.sqlite (if site has FAQs)
```sql
CREATE TABLE faqs (
id INTEGER PRIMARY KEY,
question TEXT NOT NULL,
answer TEXT NOT NULL,
category TEXT NOT NULL,
display_order INTEGER DEFAULT 0
);
```
## sections_json section types
Each page row's sections_json is a JSON array. Each element is a typed object:
### text_split
Two-column: text on one side, image on the other. CTAs optional.
```json
{
"type": "text_split",
"eyebrow": "",
"h2": "",
"body": "",
"img": "/assets/images/x.webp",
"img_alt": "",
"cta_label": "",
"cta_href": "",
"reverse": false
}
```
### feature_cards
Grid of 3-4 cards, each with icon + title + body.
```json
{
"type": "feature_cards",
"eyebrow": "",
"h2": "",
"lead": "",
"cards": [
{"icon": "", "title": "", "body": ""}
]
}
```
### accordion
Collapsible question/answer pairs.
```json
{
"type": "accordion",
"eyebrow": "",
"h2": "",
"items": [
{"q": "", "a": ""}
]
}
```
### cta_band
Full-width call-to-action with headline + button.
```json
{
"type": "cta_band",
"eyebrow": "",
"h2": "",
"lead": "",
"btn_label": "",
"btn_href": "",
"variant": "forest"
}
```
### text_block
Simple text heading + body.
```json
{
"type": "text_block",
"eyebrow": "",
"h2": "",
"body": ""
}
```
### stats_strip
Grid of stat + label pairs.
```json
{
"type": "stats_strip",
"stats": [
{"value": "", "label": ""}
]
}
```
### topic_pills
Row of clickable topic/tag items.
```json
{
"type": "topic_pills",
"eyebrow": "",
"h2": "",
"items": [
{"label": "", "href": ""}
]
}
```
### form_contact
Embedded contact form.
```json
{
"type": "form_contact",
"h2": "",
"lead": ""
}
```
### booking_options
Pricing table or service options grid.
```json
{
"type": "booking_options",
"eyebrow": "",
"h2": "",
"options": [
{"name": "", "price": "", "features": [], "cta_label": "", "cta_href": ""}
]
}
```
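The section specs above can double as a lint table for `sections_json`. The sketch below flags unknown section types and keys that are not in the spec (usually typos). It deliberately does not require every key, since the spec marks some fields, such as CTAs, as optional. `SECTION_KEYS` and `validate_sections` are hypothetical names.

```python
import json

# Top-level keys allowed per section type, copied from the specs above
SECTION_KEYS = {
    "text_split": {"eyebrow", "h2", "body", "img", "img_alt", "cta_label", "cta_href", "reverse"},
    "feature_cards": {"eyebrow", "h2", "lead", "cards"},
    "accordion": {"eyebrow", "h2", "items"},
    "cta_band": {"eyebrow", "h2", "lead", "btn_label", "btn_href", "variant"},
    "text_block": {"eyebrow", "h2", "body"},
    "stats_strip": {"stats"},
    "topic_pills": {"eyebrow", "h2", "items"},
    "form_contact": {"h2", "lead"},
    "booking_options": {"eyebrow", "h2", "options"},
}


def validate_sections(sections_json):
    """Return a list of problems found in one page's sections_json string."""
    problems = []
    for i, section in enumerate(json.loads(sections_json)):
        stype = section.get("type")
        if stype not in SECTION_KEYS:
            problems.append(f"section {i}: unknown type {stype!r}")
            continue
        extra = set(section) - SECTION_KEYS[stype] - {"type"}
        if extra:
            problems.append(f"section {i} ({stype}): unexpected keys {sorted(extra)}")
    return problems
```

An empty list means every section uses a known type and only spec-defined keys; nested items (cards, options, stats) are not checked here.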
## Divi module → section type mapping
| Divi Module | AM Section Type | Notes |
|---|---|---|
| et_pb_blurb | feature_cards item | Extract icon, title, body |
| et_pb_toggle | accordion item | Extract q/a pairs |
| et_pb_cta | cta_band | Extract headline, button text, href |
| et_pb_pricing_table | booking_options | Extract plan names, prices, features |
| et_pb_testimonial | testimonials.sqlite row | Extract quote, author, role |
| et_pb_text | text_block | Extract body copy |
| et_pb_code | text_block (sanitized) | Extract HTML, remove script tags |
| et_pb_number_counter | stats_strip item | Extract number, label |
| et_pb_button | cta_band (minimal) | Extract button text, href |
| et_pb_menu / header | nav.sqlite rows | Extract label, URL, menu order |
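In code, the mapping table reduces to a lookup plus a list of modules that still need a human decision. This is a sketch only: the `("section" | "table", target)` tuple convention and the function names are assumptions, not taken from `stage_seed.py`.

```python
# (kind, target): "section" targets are sections_json types,
# "table" targets are SQLite files that receive rows instead.
DIVI_TO_AM = {
    "et_pb_blurb": ("section", "feature_cards"),
    "et_pb_toggle": ("section", "accordion"),
    "et_pb_cta": ("section", "cta_band"),
    "et_pb_pricing_table": ("section", "booking_options"),
    "et_pb_testimonial": ("table", "testimonials.sqlite"),
    "et_pb_text": ("section", "text_block"),
    "et_pb_code": ("section", "text_block"),
    "et_pb_number_counter": ("section", "stats_strip"),
    "et_pb_button": ("section", "cta_band"),
    "et_pb_menu": ("table", "nav.sqlite"),
}


def route_module(module_slug):
    """Return (kind, target) for a Divi module slug, or None when it needs manual review."""
    return DIVI_TO_AM.get(module_slug)


def unmapped_modules(module_slugs):
    """List, in first-seen order, the slugs the table above does not cover."""
    seen = []
    for slug in module_slugs:
        if slug not in DIVI_TO_AM and slug not in seen:
            seen.append(slug)
    return seen
```

Anything returned by `unmapped_modules` has no row in the table and should be redesigned by hand rather than forced into an existing section type.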
## seed_databases.py structure
Every migration generates a seed_databases.py at `build/seed_databases.py`.
Template structure:
```python
import sqlite3
import json
from pathlib import Path
pages_path = Path('src/api/data/pages.sqlite')
nav_path = Path('src/api/data/nav.sqlite')
blog_path = Path('src/api/data/blog.sqlite')
testimonials_path = Path('src/api/data/testimonials.sqlite')
def seed_pages(conn):
    """INSERT all pages with sections_json and hero data."""
    cursor = conn.cursor()
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS pages (
            id INTEGER PRIMARY KEY,
            slug TEXT UNIQUE NOT NULL,
            template TEXT NOT NULL,
            title TEXT NOT NULL,
            meta_description TEXT,
            canonical_url TEXT,
            og_image TEXT,
            schema_json TEXT,
            hero_eyebrow TEXT,
            hero_h1 TEXT,
            hero_lead TEXT,
            sections_json TEXT,
            updated_at TEXT
        )
    ''')
    pages = [
        # Replace the empty list with the page's section objects
        ('home', 'home', 'Home', 'Home meta', '/home', '', '{}',
         '', 'Welcome', 'Lead text', json.dumps([])),
        # ... more rows
    ]
    for page in pages:
        # Single-quoted 'now' is a SQL string literal; double quotes denote identifiers
        cursor.execute(
            "INSERT INTO pages (slug, template, title, meta_description, canonical_url, "
            "og_image, schema_json, hero_eyebrow, hero_h1, hero_lead, sections_json, updated_at) "
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, datetime('now'))",
            page
        )
def seed_nav(conn):
    """INSERT navigation items from nav.json."""
    cursor = conn.cursor()
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS nav_items (
            id INTEGER PRIMARY KEY,
            label TEXT NOT NULL,
            href TEXT NOT NULL,
            display_order INTEGER DEFAULT 0,
            is_cta INTEGER DEFAULT 0
        )
    ''')
    items = [
        ('Home', '/', 0, 0),
        ('About', '/about', 1, 0),
        ('Contact', '/contact', 2, 1),
        # ... more rows
    ]
    for item in items:
        cursor.execute(
            'INSERT INTO nav_items (label, href, display_order, is_cta) VALUES (?, ?, ?, ?)',
            item
        )
def seed_blog(conn):
    """INSERT blog posts if site has a blog."""
    cursor = conn.cursor()
    cursor.execute('''
        CREATE TABLE IF NOT EXISTS posts (
            id INTEGER PRIMARY KEY,
            slug TEXT UNIQUE NOT NULL,
            title TEXT NOT NULL,
            excerpt TEXT,
            body_html TEXT,
            author TEXT DEFAULT 'Admin',
            published_at TEXT,
            og_image TEXT,
            tags TEXT
        )
    ''')
    # ... INSERT rows
def seed_testimonials(conn):
    """INSERT testimonials if present."""
    # ... CREATE TABLE + INSERT rows
if __name__ == '__main__':
    for db_path, seeder_fn in [
        (pages_path, seed_pages),
        (nav_path, seed_nav),
        (blog_path, seed_blog),
        (testimonials_path, seed_testimonials),
    ]:
        # sqlite3.connect() fails if the data directory does not exist yet
        db_path.parent.mkdir(parents=True, exist_ok=True)
        if db_path.exists():
            db_path.unlink()  # clear if re-running
        conn = sqlite3.connect(db_path)
        seeder_fn(conn)
        conn.commit()
        conn.close()
        print(f"seeded: {db_path.name}")
    print("All databases seeded successfully.")
```
## Content validation checklist
After staging seed_databases.py and before running it:
- [ ] No raw Divi shortcode residue: `[et_pb_`, `[vc_`, etc.
- [ ] No em-dashes (—): replace with commas, periods, or spaces
- [ ] No "Netherlands" or other location-specific copy (unless intentional)
- [ ] hero_h1 is 5-10 words (brand voice, not generic)
- [ ] Each section type matches the spec above (no custom types)
- [ ] All images are `/assets/images/{name}.webp` (not absolute URLs)
- [ ] All CTAs point to correct slugs (`/about`, `/contact`, etc.)
- [ ] Nav items include at least 3 menu links
- [ ] At least one nav item has `is_cta=1` (usually Contact or Book)
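Several checklist items are mechanical and can be linted before a human reads the prose. The sketch below covers only shortcode residue, em-dashes, absolute image URLs, and leftover WordPress upload paths; the check labels and the function name `lint_seed_source` are hypothetical.

```python
import re

# (label, pattern) pairs; each pattern is tested against every source line
CHECKS = [
    ("shortcode residue", re.compile(r"\[/?(?:et_pb_|vc_)")),
    ("em-dash", re.compile("\u2014")),
    ("absolute image URL", re.compile(r"https?://[^\s'\"]+\.(?:jpe?g|png|gif|webp)", re.I)),
    ("WordPress upload path", re.compile(r"/wp-content/uploads/")),
]


def lint_seed_source(source):
    """Return (line_number, label) pairs for every checklist violation found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in CHECKS:
            if pattern.search(line):
                hits.append((lineno, label))
    return hits
```

Pass it the text of `build/seed_databases.py`; an empty result clears only these four checks, and the voice and location items still need a manual read.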
@@ -0,0 +1,249 @@
# 10 — Agent Execution Breadcrumbs
Step-by-step ordered checklist for an agent executing a .wpress migration to Stack A.
Each step has: input, command, expected output, verification. Complete each before next.
## Pre-flight
- [ ] .wpress file confirmed at `$PROJECT/.planning/*.wpress`
- [ ] python3 --version >= 3.8
- [ ] docker compose version confirmed
- [ ] DOMAIN, PROJECT, SOPS, and WPRESS env vars set
## Step 1 — Extract archive
**INPUT:** `$WPRESS` (path to .wpress file)
**CMD:**
```bash
python3 $SOPS/wp-divi-pipeline-to-am-stack/scripts/extract_wpress.py "$WPRESS" "$PROJECT/.planning/wpress-extract/"
```
**VERIFY:**
```bash
ls $PROJECT/.planning/wpress-extract/
```
Expected: `database.sql` and `uploads/` present (the archive layout is flat, with no `wp-content/` wrapper)
**BLOCK:** If database.sql missing, .wpress format differs — check extract_wpress.py logs.
---
## Step 2 — Analyze database
**INPUT:** `$PROJECT/.planning/wpress-extract/database.sql`
**CMD:**
```bash
python3 $SOPS/wp-divi-pipeline-to-am-stack/scripts/analyze_db.py "$PROJECT/.planning/wpress-extract/" "$PROJECT/.planning/data/"
```
**VERIFY:**
```bash
cat $PROJECT/.planning/data/pages.json | python3 -m json.tool | head -20
cat $PROJECT/.planning/data/site-info.json
```
Expected: page objects with slug + title visible; divi_version: 4 or 5
**BLOCK:** If pages.json empty, check table prefix detection in analyze_db.py output.
---
## Step 3 — Extract nav menus
**INPUT:** `$PROJECT/.planning/wpress-extract/database.sql`
**CMD:**
```bash
python3 $SOPS/wp-divi-pipeline-to-am-stack/scripts/extract_nav.py "$PROJECT/.planning/wpress-extract/" "$PROJECT/.planning/data/"
```
**VERIFY:**
```bash
cat $PROJECT/.planning/data/nav.json | python3 -m json.tool
```
Expected: array of `{label, href, display_order, is_cta}` objects. At least 3 items.
**NOTE:** `is_cta=1` for "Book", "Get Started", "Contact", "Sign Up" type items.
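The `is_cta` rule in the note can be applied with a keyword match on the label. This is a sketch under stated assumptions: the keyword list is taken from the note, matching on whole words is a design choice made here (so "Facebook" does not count as "Book"), and `flag_cta` is a hypothetical name rather than the logic inside `extract_nav.py`.

```python
import re

# Whole-word match so labels like "Facebook" are not mistaken for "Book"
CTA_PATTERN = re.compile(r"\b(?:book|get started|contact|sign up)\b", re.IGNORECASE)


def flag_cta(items):
    """Return copies of nav items with is_cta set to 1 when the label looks like a CTA."""
    flagged = []
    for item in items:
        is_cta = 1 if CTA_PATTERN.search(item["label"]) else 0
        flagged.append({**item, "is_cta": is_cta})
    return flagged
```

Review the result by hand: the checklist in `09-stack-a-output.md` only requires that at least one item has `is_cta=1`, so a menu with both "Contact" and "Book" may need one of them demoted.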
---
## Step 4 — Extract page content
**INPUT:** `$PROJECT/.planning/data/pages.json` + `wpress-extract/`
**CMD:** (choose based on Divi version from Step 2)
Divi 5:
```bash
python3 $SOPS/wp-divi-pipeline-to-am-stack/scripts/extract_divi5.py "$PROJECT/.planning/data/pages.json" "$PROJECT/.planning/data/content/"
```
Divi 4:
```bash
python3 $SOPS/wp-divi-pipeline-to-am-stack/scripts/extract_divi4.py "$PROJECT/.planning/data/pages.json" "$PROJECT/.planning/data/content/"
```
**VERIFY:**
```bash
ls $PROJECT/.planning/data/content/
cat $PROJECT/.planning/data/content/home.json | python3 -m json.tool | head -40
```
Expected: one .json file per page (home.json, about.json, etc.); sections array with type fields visible.
---
## Step 5 — Extract media
**INPUT:** `$PROJECT/.planning/wpress-extract/uploads/`
**CMD:**
```bash
python3 $SOPS/wp-divi-pipeline-to-am-stack/scripts/extract_media.py "$PROJECT/.planning/wpress-extract/" "$PROJECT/.planning/data/" "$PROJECT/assets/images/"
```
**VERIFY:**
```bash
ls $PROJECT/assets/images/ | head -10
cat $PROJECT/.planning/data/media-manifest.json | python3 -m json.tool | head -20
```
Expected: .webp files present; media-manifest.json shows `original_url → /assets/images/x.webp` mapping.
---
## Step 6 — Stage seed_databases.py skeleton
**INPUT:** All .json files in `$PROJECT/.planning/data/content/` + `nav.json` + `media-manifest.json`
**CMD:**
```bash
python3 $SOPS/wp-divi-pipeline-to-am-stack/scripts/stage_seed.py "$PROJECT/.planning/data/" "$PROJECT/build/seed_databases.py" --domain "$DOMAIN"
```
**VERIFY:**
```bash
python3 -c "import ast; ast.parse(open('$PROJECT/build/seed_databases.py').read()); print('syntax OK')"
grep "def seed_pages" $PROJECT/build/seed_databases.py
```
Expected: seed_databases.py is valid Python; contains seed_pages, seed_nav functions.
**NOTE:** Content stubs are in place. Human/agent reviews + fills in prose before running.
---
## Step 7 — Review and fill content
**MANUAL:** Open `$PROJECT/build/seed_databases.py`
For each page's `sections_json`:
- [ ] Confirm `hero_h1` and `hero_lead` match the brand (not raw Divi copy-paste)
- [ ] Confirm each section has correct type (see 09-stack-a-output.md mapping)
- [ ] Replace any em-dashes (—) with commas or periods
- [ ] Remove any Divi shortcode residue (`[et_pb_`, `[vc_`, etc.)
- [ ] Ensure no "Netherlands" or location-specific copy if site is global
- [ ] Confirm nav items in `seed_nav()` match final site IA
- [ ] Verify all image paths are `/assets/images/{name}.webp`
- [ ] Verify all CTAs point to correct slugs (`/about`, `/contact`, etc.)
---
## Step 8 — Run seed_databases.py
**CMD:**
```bash
cd $PROJECT && python3 build/seed_databases.py
```
**VERIFY:**
```bash
ls -lh src/api/data/
```
Expected: Output line shows counts > 0: `seeded: pages=N nav=N blog=N ...`. Database files exist.
**BLOCK:** Any count=0 means that seeder function has an error — fix before continuing.
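If the seeder prints only file names, as the template in `09-stack-a-output.md` does, row counts can be read straight from the databases. The helper below is a minimal sketch; `count_rows` is a hypothetical name and is not one of the pipeline scripts.

```python
import sqlite3
from pathlib import Path


def count_rows(data_dir):
    """Return {db_filename: {table: row_count}} for every .sqlite file in data_dir."""
    counts = {}
    for db_path in sorted(Path(data_dir).glob("*.sqlite")):
        conn = sqlite3.connect(db_path)
        try:
            tables = [row[0] for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type='table' AND name NOT LIKE 'sqlite_%'"
            )]
            counts[db_path.name] = {
                # Table names come from sqlite_master, so quoting them is enough here
                t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
                for t in tables
            }
        finally:
            conn.close()
    return counts
```

Call `count_rows("src/api/data")` from `$PROJECT`; any table reporting 0 rows points at the seeder function to fix.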
---
## Step 9 — Scaffold PHP templates
**CMD:** Copy reference templates from vibrantyou.yoga as starting point:
```bash
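VYOGA="/home/sirdrez/arisingmedia-websites/vibrantyou.yoga"
# cp does not create missing target directories
mkdir -p $PROJECT/src/api/{templates,components} $PROJECT/assets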
cp $VYOGA/src/api/router.php $PROJECT/src/api/router.php
cp $VYOGA/src/api/contact.php $PROJECT/src/api/contact.php
cp $VYOGA/src/api/templates/static.php $PROJECT/src/api/templates/static.php
cp $VYOGA/src/api/templates/home.php $PROJECT/src/api/templates/home.php
cp $VYOGA/src/api/components/_header.php $PROJECT/src/api/components/_header.php
cp $VYOGA/src/api/components/_footer.php $PROJECT/src/api/components/_footer.php
cp -r $VYOGA/assets/css $PROJECT/assets/
cp -r $VYOGA/assets/js $PROJECT/assets/
cp $VYOGA/Dockerfile $PROJECT/
cp $VYOGA/docker-compose.yml $PROJECT/
cp -r $VYOGA/infra $PROJECT/
```
**VERIFY:**
```bash
php -l $PROJECT/src/api/router.php
```
Expected: `No syntax errors detected`
**NOTE:** Update brand name, colors, and any site-specific logic in templates.
**NOTE:** `_header.php` reads from nav.sqlite — no hardcoded nav needed.
---
## Step 10 — Build and test
**CMD:**
```bash
cd $PROJECT && docker compose build --no-cache && docker compose up -d
```
**VERIFY:**
```bash
sleep 5
curl -I http://localhost:8000/
curl -s http://localhost:8000/ | grep -i "title\|h1" | head -3
```
Expected: HTTP 200; site name visible in page.
---
## Step 11 — Protection + SEO check
**CMD:**
```bash
bash /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/tools/verify-protection.sh http://localhost:8000
```
**VERIFY:** Exit 0, no FAIL lines
---
## Step 12 — Lighthouse + cleanup
**MANUAL:**
- Open Chrome or Chromium at `http://localhost:8000/` (the Lighthouse panel is part of Chrome DevTools)
- Run Lighthouse (DevTools > Lighthouse)
**TARGET:**
- Performance >= 90
- SEO >= 95
- Accessibility >= 90
**CLEANUP:**
```bash
cd $PROJECT && docker compose down
```
@@ -0,0 +1,81 @@
# WP + Divi to AM Stack A Pipeline — SOP Index
End-to-end playbook for converting any WordPress / Divi site backup (.wpress)
into an Arising Media Stack A deployment: PHP router + SQLite + vanilla JS/CSS.
## Quick start (CLI launcher)
```bash
python3 scripts/migrate.py --wpress /path/to/backup.wpress --domain example.com
```
Runs phases 0-6 automatically (extract, analyze, nav, content, media, stage seed).
Prints agent breadcrumbs for phases 7-11. See `10-agent-breadcrumbs.md` for the
complete ordered execution checklist.
## SOPs in this folder
| File | Phase | Description |
|------|-------|-------------|
| `00-overview.md` | — | Pipeline overview, philosophy, what to extract vs not replicate |
| `01-wpress-extraction.md` | 1 | .wpress binary format, extraction script, verification |
| `02-database-analysis.md` | 2 | MySQL dump parsing, page inventory, Divi version detection |
| `03-divi-content-extraction.md` | 3 | Divi 4 shortcodes vs Divi 5 blocks, extraction scripts |
| `04-design-system-extraction.md` | 4 | Colors, fonts, spacing → tokens.css |
| `05-content-migration.md` | 5-6 | Section remapping, content staging, seed_databases.py |
| `06-media-assets.md` | 5 | Upload migration, WebP conversion, media manifest |
| `07-seo-preservation.md` | 7 | Redirect map, Rank Math extraction, schema.org |
| `08-run-order.md` | — | DEPRECATED — superseded by `10-agent-breadcrumbs.md` |
| `09-stack-a-output.md` | — | SQLite schemas, sections_json spec, Divi→AM module mapping |
| `10-agent-breadcrumbs.md` | 0-11 | Ordered agent execution checklist (.wpress → live Docker) |
## Scripts in scripts/
| Script | Purpose |
|--------|---------|
| `migrate.py` | CLI launcher — runs phases 0-6, prints breadcrumbs for 7-11 |
| `run_pipeline.sh` | Legacy shell wrapper (pre-migrate.py) |
| `extract_wpress.py` | Unpack .wpress binary archive |
| `analyze_db.py` | Parse SQL dump → pages.json + design-system.json |
| `extract_divi5.py` | Parse Divi 5 blocks → per-page content JSON |
| `extract_nav.py` | Extract WordPress nav menus → nav.json |
| `stage_seed.py` | Map extracted JSON → seed_databases.py skeleton (Phase 6) |
## Key facts about .wpress archives
- Format: Custom sequential binary (NOT zip/tar) — 4377-byte headers
- Table prefix in SQL dump: `SERVMASK_PREFIX_` (placeholder, NOT `wp_`)
- Directory layout: flat — `uploads/`, `themes/`, `plugins/` at archive root (no `wp-content/` wrapper)
- Divi 5 stores theme settings in `et_divi` option as PHP-serialized array
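To make the 4377-byte header concrete, here is a sketch of a reader. The field layout (255-byte name, 14-byte size, 12-byte mtime, 4096-byte path prefix, all NUL-padded, with an all-NUL header marking the end of the archive) is an assumption based on the publicly documented All-in-One WP Migration format, not on this SOP; verify it against `extract_wpress.py` before relying on it.

```python
import io

HEADER_SIZE = 4377  # 255 + 14 + 12 + 4096
FIELDS = (("name", 255), ("size", 14), ("mtime", 12), ("prefix", 4096))


def read_header(stream):
    """Read one header block; return None at the all-NUL end-of-archive marker."""
    block = stream.read(HEADER_SIZE)
    if len(block) < HEADER_SIZE or block == b"\x00" * HEADER_SIZE:
        return None
    header, offset = {}, 0
    for field, width in FIELDS:
        raw = block[offset:offset + width]
        # Each field is NUL-padded text; keep everything before the first NUL
        header[field] = raw.split(b"\x00", 1)[0].decode("utf-8", "replace")
        offset += width
    header["size"] = int(header["size"])
    return header


def list_entries(stream):
    """Yield (path, size) for each archived file, skipping over the file bodies."""
    while True:
        header = read_header(stream)
        if header is None:
            return
        prefix = header["prefix"]
        path = header["name"] if prefix in ("", ".") else f"{prefix}/{header['name']}"
        yield path, header["size"]
        stream.seek(header["size"], io.SEEK_CUR)  # jump past the file body
```

Listing entries this way is a cheap way to confirm the flat layout (`uploads/`, `themes/`, `plugins/` at the root) before running a full extraction.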
## vibrantyou.yoga — extracted data reference
Site: Vibrant You Yoga (instructor: Meghan)
Domain: https://vibrantyou.yoga
Divi version: 5.0.3
WP version: 6.9.4
Design system:
- Primary: #1a8a7a Dark: #0f5f53 Secondary: #2ea3f2
- Body: #5a6b68 Headings: #2d2d2d
- Body font: DM Sans 17px / 1.6 lh
- Heading font: DM Serif Display 600 / 36px / 1.2 lh
Pages to migrate (22 published):
- home, about, classes, schedule, instructors, contact, blog, faq
- book (private sessions), online-yoga, donate
- Drop: video-category, video-tag, search-videos, user-videos, player-embed,
categories, tags, my-bookings (all plugin-generated archive pages)
Plugins requiring AM replacements:
- Gravity Forms + Stripe → AM HTML form + `src/api/contact.php` + Resend
- Events Manager → static schedule table in /schedule/
- All-in-One Video Gallery → embed YouTube/Vimeo directly or drop
## Related SOPs
- `../01-project-structure.md` — AM deployment directory layout
- `../02-wordpress-to-html-migration.md` — Original 8-phase WP migration playbook
- `../03-build-pipeline.md` — JSON + template stamping for repeated pages
- `../06-seo-meta.md` — Full `<head>` requirements, schema.org per page type
- `../tools/verify-protection.sh` — Post-deploy security audit
@@ -0,0 +1,368 @@
#!/usr/bin/env python3
"""Analyze WordPress MySQL dump from a .wpress extract.
Parses database.sql and outputs:
- pages.json : all published pages with title, slug, content, SEO meta
- design-system.json : colors, fonts from wp_options (Divi theme settings)
- site-info.json : domain, WP version, detected Divi version, plugin list
Usage:
python3 analyze_db.py <extract_dir> <output_data_dir>
extract_dir : path to wpress-extract/ (contains database.sql)
output_data_dir : where to write JSON output files (e.g. .planning/data/)
"""
from __future__ import annotations
import json
import os
import re
import sys
from pathlib import Path
from typing import Any
# ---------------------------------------------------------------------------
# SQL parsing helpers
# ---------------------------------------------------------------------------
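# Escape letters written by mysqldump; any other escaped character maps to itself.
_SQL_ESCAPES = {"0": "\0", "n": "\n", "r": "\r", "t": "\t", "b": "\b", "Z": "\x1a"}


def _unescape_sql(s: str) -> str:
    """Undo MySQL string escaping in a single left-to-right pass.

    Chained str.replace() calls mis-decode an escaped backslash followed by
    a plain letter such as n, so each escape is resolved exactly once here.
    """
    return re.sub(
        r"\\(.)",
        lambda m: _SQL_ESCAPES.get(m.group(1), m.group(1)),
        s,
        flags=re.DOTALL,
    )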
def _parse_values_block(sql_block: str) -> list[list[str]]:
"""Extract rows from a multi-row INSERT VALUES block.
Handles commas inside quoted strings via a simple state machine.
Returns list of rows; each row is a list of raw string values.
"""
rows: list[list[str]] = []
# Find VALUES section
m = re.search(r"VALUES\s*", sql_block, re.IGNORECASE)
if not m:
return rows
rest = sql_block[m.end():]
i = 0
n = len(rest)
while i < n:
# Skip to '('
while i < n and rest[i] != '(':
i += 1
if i >= n:
break
i += 1 # skip '('
row: list[str] = []
field = []
in_quote = False
quote_char = ''
while i < n:
c = rest[i]
if not in_quote:
if c in ("'", '"'):
in_quote = True
quote_char = c
i += 1
continue
elif c == ',' :
row.append("".join(field))
field = []
i += 1
continue
elif c == ')':
row.append("".join(field))
field = []
rows.append(row)
i += 1
break
elif c == 'N' and rest[i:i+4] == 'NULL':
field.append('\x00NULL\x00')
i += 4
continue
else:
field.append(c)
i += 1
else:
if c == '\\' and i + 1 < n:
field.append(c)
field.append(rest[i + 1])
i += 2
continue
elif c == quote_char:
in_quote = False
i += 1
continue
else:
field.append(c)
i += 1
return rows
def load_table(sql_text: str, table_name: str) -> list[dict]:
"""Return all rows for table_name as list of dicts."""
# Find column definition
col_re = re.compile(
rf"CREATE TABLE `{re.escape(table_name)}`\s*\((.*?)\)\s*ENGINE",
re.DOTALL | re.IGNORECASE,
)
m = col_re.search(sql_text)
if not m:
return []
col_block = m.group(1)
cols = re.findall(r"`([^`]+)`\s+(?:bigint|int|mediumint|smallint|tinyint|varchar|text|mediumtext|longtext|char|datetime|date|float|double|decimal|enum|set|blob|mediumblob|longblob)", col_block, re.IGNORECASE)
# Find INSERT blocks for this table
insert_re = re.compile(
rf"INSERT INTO `{re.escape(table_name)}`\s+VALUES\s*\(.+?\);",
re.DOTALL | re.IGNORECASE,
)
rows_out: list[dict] = []
for block in insert_re.finditer(sql_text):
parsed = _parse_values_block(block.group(0))
for row in parsed:
d: dict[str, Any] = {}
for idx, col in enumerate(cols):
val = row[idx] if idx < len(row) else ""
if val == "\x00NULL\x00":
d[col] = None
else:
d[col] = _unescape_sql(val)
rows_out.append(d)
return rows_out
# ---------------------------------------------------------------------------
# Divi version detection
# ---------------------------------------------------------------------------
def detect_divi_version(sql_text: str) -> str:
if "wp:divi/" in sql_text:
return "5"
if "[et_pb_section" in sql_text:
return "4"
# Check et_theme_builder version in options
m = re.search(r"'et_theme_builder_api_version','([^']+)'", sql_text)
if m:
return "5"
return "unknown"
# ---------------------------------------------------------------------------
# Options extraction
# ---------------------------------------------------------------------------
def load_options(sql_text: str, prefix: str = "wp_") -> dict[str, str]:
table = f"{prefix}options"
rows = load_table(sql_text, table)
return {r["option_name"]: r["option_value"] for r in rows if r.get("option_name")}
def _parse_php_serialized_pairs(raw: str) -> dict[str, str]:
"""Extract key/value string pairs from a PHP-serialized array.
Handles both escaped (SQL-dump) and unescaped forms.
Only returns s->s pairs (string key, string value).
"""
result: dict[str, str] = {}
# SQL dumps escape double-quotes as \\", giving patterns like:
# s:9:\\"body_font\\";s:7:\\"DM Sans\\";
# Also handle unescaped form: s:9:"body_font";s:7:"DM Sans";
pat = re.compile(
r's:\d+:\\"([^"\\]+)\\";s:\d+:\\"([^"\\]*)\\"' # SQL-escaped
r'|s:\d+:"([^"]+)";s:\d+:"([^"]*)"', # plain
)
for m in pat.finditer(raw):
if m.group(1) is not None:
k, v = m.group(1), m.group(2)
else:
k, v = m.group(3), m.group(4)
result[k] = v
return result
def extract_design_system(options: dict[str, str]) -> dict:
"""Pull Divi theme colors, fonts, and spacing from wp_options."""
raw = options.get("et_divi", "") or options.get("et_divi_options", "")
design: dict[str, Any] = {}
# Parse PHP-serialized et_divi option (Divi 4 + 5 store settings here)
if raw:
pairs = _parse_php_serialized_pairs(raw)
# Map Divi option keys to design-system keys
key_map = {
"accent_color": "primary_color_dark",
"link_color": "primary_color",
"body_font": "body_font",
"heading_font": "heading_font",
"header_font": "heading_font", # Divi 4 alias
"body_font_size": "body_font_size",
"body_line_height": "body_line_height",
"heading_font_weight": "heading_font_weight",
"header_text_size": "heading_font_size",
"header_line_height": "heading_line_height",
"header_color": "heading_color",
"font_color": "body_color",
"secondary_accent_color": "secondary_color",
}
for divi_key, design_key in key_map.items():
if divi_key in pairs:
design.setdefault(design_key, pairs[divi_key])
# Site info
design["site_url"] = options.get("siteurl", "")
design["site_name"] = options.get("blogname", "")
return design
# ---------------------------------------------------------------------------
# Page extraction
# ---------------------------------------------------------------------------
def extract_pages(sql_text: str, prefix: str = "wp_") -> list[dict]:
    """Return all published pages, posts, and events with Rank Math SEO meta."""
posts = load_table(sql_text, f"{prefix}posts")
postmeta = load_table(sql_text, f"{prefix}postmeta")
# Build postmeta lookup: post_id -> {meta_key: meta_value}
meta_map: dict[str, dict[str, str]] = {}
for row in postmeta:
pid = str(row.get("post_id", ""))
meta_map.setdefault(pid, {})[row.get("meta_key", "")] = row.get("meta_value", "")
pages = []
for p in posts:
if p.get("post_status") not in ("publish",):
continue
post_type = p.get("post_type", "")
if post_type not in ("page", "post", "event"):
continue
pid = str(p.get("ID", ""))
meta = meta_map.get(pid, {})
# Rank Math SEO fields
rm_title = meta.get("rank_math_title", "")
rm_desc = meta.get("rank_math_description", "")
rm_focus = meta.get("rank_math_focus_keyword", "")
entry = {
"id": pid,
"post_type": post_type,
"slug": p.get("post_name", ""),
"title": p.get("post_title", ""),
"status": p.get("post_status", ""),
"date": p.get("post_date", "")[:10],
"modified": p.get("post_modified", "")[:10],
"content_raw": p.get("post_content", ""),
"excerpt": p.get("post_excerpt", ""),
"parent_id": p.get("post_parent", "0"),
"menu_order": p.get("menu_order", "0"),
"seo_title": rm_title,
"seo_description": rm_desc,
"seo_keywords": rm_focus,
"acf": {k: v for k, v in meta.items() if not k.startswith("_") and not k.startswith("rank_math") and not k.startswith("et_")},
}
pages.append(entry)
pages.sort(key=lambda x: int(x["menu_order"] or 0))
return pages
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
if len(sys.argv) < 3:
print(f"Usage: {sys.argv[0]} <extract_dir> <output_data_dir>")
sys.exit(1)
extract_dir = Path(sys.argv[1])
out_dir = Path(sys.argv[2])
out_dir.mkdir(parents=True, exist_ok=True)
sql_file = extract_dir / "database.sql"
if not sql_file.exists():
# Search for it
found = list(extract_dir.rglob("*.sql"))
if not found:
print(f"ERROR: No .sql file found under {extract_dir}")
sys.exit(1)
sql_file = found[0]
print(f"Found SQL at: {sql_file}")
print(f"Loading {sql_file} ({sql_file.stat().st_size / 1024 / 1024:.1f} MB)...")
sql_text = sql_file.read_text(encoding="utf-8", errors="replace")
# Detect Divi version
divi_version = detect_divi_version(sql_text)
print(f"Divi version detected: {divi_version}")
# Load wp_options
pkg = {}
pkg_file = extract_dir / "package.json"
if pkg_file.exists():
pkg = json.loads(pkg_file.read_text())
    # All-in-One WP Migration dumps use SERVMASK_PREFIX_ as a placeholder in the SQL file.
# Detect which prefix the dump actually uses.
if "SERVMASK_PREFIX_" in sql_text:
sql_prefix = "SERVMASK_PREFIX_"
else:
sql_prefix = pkg.get("Database", {}).get("Prefix", "wp_")
runtime_prefix = pkg.get("Database", {}).get("Prefix", "wp_")
print(f"SQL prefix: {sql_prefix!r} (runtime prefix: {runtime_prefix!r})")
options = load_options(sql_text, sql_prefix)
print(f"Loaded {len(options)} options")
# Design system
design = extract_design_system(options)
design["divi_version"] = divi_version
design["wp_version"] = pkg.get("WordPress", {}).get("Version", "")
design["plugins"] = pkg.get("Plugins", [])
(out_dir / "design-system.json").write_text(json.dumps(design, indent=2, ensure_ascii=False))
print(f"Wrote design-system.json ({len(design)} keys)")
# Pages
pages = extract_pages(sql_text, sql_prefix)
(out_dir / "pages.json").write_text(json.dumps(pages, indent=2, ensure_ascii=False))
print(f"Wrote pages.json ({len(pages)} pages/posts)")
# Site info summary
site_info = {
"domain": pkg.get("SiteURL", options.get("siteurl", "")),
"name": options.get("blogname", ""),
"tagline": options.get("blogdescription", ""),
"admin_email": options.get("admin_email", ""),
"wp_version": pkg.get("WordPress", {}).get("Version", ""),
"divi_version": divi_version,
"plugins": pkg.get("Plugins", []),
"prefix": runtime_prefix,
"total_pages": len([p for p in pages if p["post_type"] == "page"]),
"total_posts": len([p for p in pages if p["post_type"] == "post"]),
}
(out_dir / "site-info.json").write_text(json.dumps(site_info, indent=2, ensure_ascii=False))
print(f"Wrote site-info.json")
print(f"\nDone. Output in: {out_dir}")
print(f" pages.json : {len(pages)} entries")
print(f" design-system.json: {len(design)} keys")
print(f" site-info.json : done")
if __name__ == "__main__":
main()
@@ -0,0 +1,271 @@
#!/usr/bin/env python3
"""Extract content from Divi 5 block markup in pages.json.
Reads .planning/data/pages.json (produced by analyze_db.py) and for each page
parses the `content_raw` Divi 5 block structure into a clean per-page JSON
under .planning/data/content/{slug}.json.
Usage:
python3 extract_divi5.py <pages_json> <output_dir>
pages_json : path to .planning/data/pages.json
output_dir : directory to write {slug}.json files (created if missing)
"""
from __future__ import annotations
import json
import re
import sys
from pathlib import Path
from html.parser import HTMLParser
# ---------------------------------------------------------------------------
# HTML inner-text extractor
# ---------------------------------------------------------------------------
class _TextExtractor(HTMLParser):
def __init__(self):
super().__init__()
self.parts: list[str] = []
def handle_data(self, data: str):
self.parts.append(data)
def get_text(self) -> str:
return " ".join(self.parts).strip()
def _text(html: str) -> str:
p = _TextExtractor()
p.feed(html)
return p.get_text()
# ---------------------------------------------------------------------------
# Divi block parsing
# ---------------------------------------------------------------------------
# Matches opening block comment: <!-- wp:divi/MODULE {JSON} -->
_BLOCK_OPEN = re.compile(r"<!--\s*wp:(divi/[a-z0-9_-]+)\s*(.*?)--?>", re.DOTALL)
# Matches closing block comment: <!-- /wp:divi/MODULE -->
_BLOCK_CLOSE = re.compile(r"<!--\s*/wp:(divi/[a-z0-9_-]+)\s*-->")
# Strip et_pb_* class tokens and data-et-* attributes
_ET_CLASS = re.compile(r"\b(et_pb_[a-z0-9_-]+|divi-[a-z0-9_-]+-[a-z0-9_-]+|d5_[a-z0-9_-]+)\b", re.IGNORECASE)
_ET_ATTR = re.compile(r'\s+data-(?:et|builder|module-id|module-class|d5)-[a-z0-9_-]+\s*=\s*"[^"]*"', re.IGNORECASE)
_EMPTY_CL = re.compile(r'\s+class="\s*"')
def _clean(html: str) -> str:
"""Strip Divi noise from an HTML fragment."""
out = _BLOCK_OPEN.sub("", html)
out = _BLOCK_CLOSE.sub("", out)
out = _ET_ATTR.sub("", out)
out = _ET_CLASS.sub("", out)
out = _EMPTY_CL.sub("", out)
out = re.sub(r"\n{3,}", "\n\n", out)
return out.strip()
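def _parse_attrs(raw_json: str) -> dict:
    """Parse the JSON attrs blob from a block comment (may be empty).

    Self-closing blocks (``<!-- wp:divi/x {...} /-->``) leave a trailing
    ``/`` in the captured text, so it is stripped before decoding.
    """
    raw_json = raw_json.strip()
    if raw_json.endswith("/"):
        raw_json = raw_json[:-1].rstrip()
    if not raw_json:
        return {}
    try:
        parsed = json.loads(raw_json)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}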
def _extract_inner(content: str, block_type: str) -> str:
"""Return the raw inner HTML of the first matching block."""
open_pat = re.compile(rf"<!--\s*wp:{re.escape(block_type)}[^>]*-->", re.DOTALL)
close_pat = re.compile(rf"<!--\s*/wp:{re.escape(block_type)}\s*-->")
m = open_pat.search(content)
if not m:
return ""
start = m.end()
m2 = close_pat.search(content, start)
end = m2.start() if m2 else len(content)
return content[start:end]
def _bg_color(attrs: dict) -> str:
"""Extract background colour from Divi 5 attrs dict."""
bg = attrs.get("backgroundColor", {})
if isinstance(bg, dict):
return bg.get("value", bg.get("color", ""))
return str(bg) if bg else ""
def _section_type(bg: str) -> str:
"""Classify section by background colour."""
dark_colors = {"#0f5f53", "#1a3a34", "#0d4d42"}
brand_colors = {"#1a8a7a", "#20a090"}
light_colors = {"#f5f5f5", "#fafafa", "#f0f0f0", "#efefef"}
bg_lower = bg.lower().strip()
if bg_lower in dark_colors:
return "dark"
if bg_lower in brand_colors:
return "brand"
if bg_lower in light_colors:
return "light"
if bg_lower in ("#ffffff", "#fff", ""):
return "white"
return "custom"
# ---------------------------------------------------------------------------
# Section/module extraction
# ---------------------------------------------------------------------------
def _extract_modules(section_html: str) -> list[dict]:
"""Walk block comments inside a section and extract module data."""
modules: list[dict] = []
pos = 0
content = section_html
for m in _BLOCK_OPEN.finditer(content):
block_type = m.group(1) # e.g. "divi/text"
attrs = _parse_attrs(m.group(2))
inner_start = m.end()
# Find matching close tag
close_pat = re.compile(rf"<!--\s*/wp:{re.escape(block_type)}\s*-->")
close_m = close_pat.search(content, inner_start)
inner_html = content[inner_start : close_m.start() if close_m else len(content)]
clean_inner = _clean(inner_html)
module_type = block_type.split("/")[-1] # "text", "button", "image", etc.
mod: dict = {"module": module_type}
if module_type == "text":
mod["html"] = clean_inner
mod["text"] = _text(clean_inner)
elif module_type in ("button", "cta"):
mod["text"] = attrs.get("buttonText", _text(clean_inner))
mod["url"] = attrs.get("buttonUrl", attrs.get("url", "#"))
elif module_type == "image":
src = attrs.get("src", attrs.get("url", ""))
mod["src"] = src
mod["alt"] = attrs.get("altText", attrs.get("alt", ""))
mod["caption"] = attrs.get("caption", "")
elif module_type == "blurb":
mod["title"] = attrs.get("title", "")
mod["icon"] = attrs.get("iconName", "")
mod["html"] = clean_inner
mod["text"] = _text(clean_inner)
elif module_type == "testimonial":
mod["quote"] = attrs.get("content", _text(clean_inner))
mod["author"] = attrs.get("authorName", "")
mod["company"] = attrs.get("authorJobTitle", "")
elif module_type == "video":
mod["src"] = attrs.get("src", "")
mod["poster"] = attrs.get("poster", attrs.get("image", ""))
elif module_type in ("accordion", "toggle"):
items = re.findall(r"<dt[^>]*>(.*?)</dt>\s*<dd[^>]*>(.*?)</dd>", clean_inner, re.DOTALL)
mod["items"] = [{"q": q.strip(), "a": a.strip()} for q, a in items]
elif module_type == "contact_form":
mod["form_id"] = attrs.get("formId", "")
mod["note"] = "REPLACE with AM vanilla form — see 08-forms.md"
else:
mod["html"] = clean_inner
mod["attrs"] = attrs
modules.append(mod)
return modules
def parse_page_content(content_raw: str) -> list[dict]:
"""Parse Divi 5 block content into a list of section dicts."""
sections: list[dict] = []
section_pat = re.compile(r"<!--\s*wp:divi/section(.*?)-->", re.DOTALL)
section_close = re.compile(r"<!--\s*/wp:divi/section\s*-->")
for sm in section_pat.finditer(content_raw):
attrs = _parse_attrs(sm.group(1).strip())
start = sm.end()
close_m = section_close.search(content_raw, start)
sec_html = content_raw[start : close_m.start() if close_m else len(content_raw)]
bg = _bg_color(attrs)
sec_type = _section_type(bg)
modules = _extract_modules(sec_html)
# Determine semantic role from first module
role = "content"
if modules and modules[0]["module"] in ("fullwidth_header", "text"):
first_html = modules[0].get("html", "")
if "<h1" in first_html:
role = "hero"
sections.append({
"role": role,
"section_type": sec_type,
"background_color": bg,
"attrs": attrs,
"modules": modules,
})
return sections
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
def main():
if len(sys.argv) < 3:
print(f"Usage: {sys.argv[0]} <pages_json> <output_dir>")
sys.exit(1)
pages_path = Path(sys.argv[1])
out_dir = Path(sys.argv[2])
out_dir.mkdir(parents=True, exist_ok=True)
pages = json.loads(pages_path.read_text(encoding="utf-8"))
print(f"Processing {len(pages)} pages...")
for page in pages:
slug = page.get("slug") or f"page-{page['id']}"
content = page.get("content_raw", "")
sections = parse_page_content(content) if content.strip() else []
output = {
"id": page["id"],
"slug": slug,
"title": page["title"],
"post_type": page["post_type"],
"seo_title": page.get("seo_title", ""),
"seo_description": page.get("seo_description", ""),
"seo_keywords": page.get("seo_keywords", ""),
"acf": page.get("acf", {}),
"date": page.get("date", ""),
"modified": page.get("modified", ""),
"sections": sections,
"section_count": len(sections),
}
out_file = out_dir / f"{slug}.json"
out_file.write_text(json.dumps(output, indent=2, ensure_ascii=False))
print(f" {slug}.json ({len(sections)} sections)")
print(f"\nDone. {len(pages)} content files in {out_dir}")
if __name__ == "__main__":
main()
@@ -0,0 +1,99 @@
#!/usr/bin/env python3
"""
extract_nav.py — Extract WordPress navigation menus from database.sql dump.
Outputs nav.json: [{label, href, display_order, is_cta}]
Usage: python3 extract_nav.py <wpress-extract-dir> <output-data-dir>
"""
import sys, re, json, os
CTA_KEYWORDS = {'book', 'get started', 'contact', 'sign up', 'register', 'join', 'buy', 'shop'}
def extract_nav(extract_dir: str, data_dir: str):
sql_path = os.path.join(extract_dir, 'database.sql')
if not os.path.exists(sql_path):
print(f"ERROR: {sql_path} not found", file=sys.stderr)
sys.exit(1)
with open(sql_path, encoding='utf-8', errors='replace') as f:
sql = f.read()
# Detect table prefix
prefix_match = re.search(r"INSERT INTO `(\w+)options`", sql)
prefix = prefix_match.group(1) if prefix_match else 'wp_'
# Find nav menu items: post_type = 'nav_menu_item'
# Extract INSERT rows from wp_posts
posts_pattern = re.compile(
r"INSERT INTO `%sposts`[^;]+?;" % re.escape(prefix),
re.DOTALL | re.IGNORECASE
)
postmeta_pattern = re.compile(
r"INSERT INTO `%spostmeta`[^;]+?;" % re.escape(prefix),
re.DOTALL | re.IGNORECASE
)
nav_posts = {}
for m in posts_pattern.finditer(sql):
rows = re.findall(r"\((\d+),[^,]*,'[^']*','[^']*','([^']*)'[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,'([^']*)'[^,]*,[^,]*,\d+,'nav_menu_item'", m.group())
for post_id, post_title, post_status in rows:
if post_status == 'publish':
nav_posts[post_id] = {'label': post_title, 'href': '/', 'menu_order': 0}
if not nav_posts:
# Fallback: simpler pattern
for m in posts_pattern.finditer(sql):
block = m.group()
ids = re.findall(r"\((\d+),", block)
titles = re.findall(r"'([^']{1,60})'", block)
for i, post_id in enumerate(ids):
if i < len(titles) and titles[i]:
nav_posts[post_id] = {'label': titles[i], 'href': '/', 'menu_order': i}
# Extract menu item URLs from postmeta (_menu_item_url or _menu_item_object_id)
for m in postmeta_pattern.finditer(sql):
block = m.group()
# _menu_item_url
url_matches = re.findall(r"\((\d+),\s*\d+,\s*'_menu_item_url',\s*'([^']*)'\)", block)
for post_id, url in url_matches:
if post_id in nav_posts and url:
nav_posts[post_id]['href'] = url
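        # _menu_item_menu_order (stock WordPress keeps menu order in the posts
        # table's menu_order column, so this meta key may match nothing)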
order_matches = re.findall(r"\((\d+),\s*\d+,\s*'_menu_item_menu_order',\s*'(\d+)'\)", block)
for post_id, order in order_matches:
if post_id in nav_posts:
nav_posts[post_id]['menu_order'] = int(order)
# Clean up hrefs: make relative if same domain
items = []
for idx, (post_id, item) in enumerate(sorted(nav_posts.items(), key=lambda x: x[1].get('menu_order', 0))):
label = item['label'].strip()
href = item['href'].strip()
if not label:
continue
# Make relative
href = re.sub(r'https?://[^/]+', '', href) or '/'
if not href.startswith('/'):
href = '/' + href
is_cta = 1 if any(kw in label.lower() for kw in CTA_KEYWORDS) else 0
items.append({
'label': label,
'href': href,
'display_order': idx + 1,
'is_cta': is_cta
})
os.makedirs(data_dir, exist_ok=True)
out_path = os.path.join(data_dir, 'nav.json')
with open(out_path, 'w', encoding='utf-8') as f:
json.dump(items, f, indent=2, ensure_ascii=False)
print(f"nav.json: {len(items)} items → {out_path}")
for item in items:
        print(f"  {'[CTA]' if item['is_cta'] else '     '} {item['label']} -> {item['href']}")
if __name__ == '__main__':
if len(sys.argv) != 3:
print("Usage: python3 extract_nav.py <wpress-extract-dir> <output-data-dir>")
sys.exit(1)
extract_nav(sys.argv[1], sys.argv[2])
@@ -0,0 +1,110 @@
#!/usr/bin/env python3
"""Extract All-in-One WP Migration .wpress archive.
Usage:
python3 extract_wpress.py <path/to/file.wpress> <output/directory>
The .wpress format is a sequential binary archive with 4377-byte headers:
255 bytes filename (null-padded)
14 bytes file size in bytes (ASCII digits, null-padded)
12 bytes mtime unix timestamp (ASCII digits, null-padded)
4096 bytes relative path (null-padded)
Followed immediately by the raw file bytes, then the next header.
"""
import os
import sys
import argparse
from pathlib import Path
HEADER_SIZE = 4377
NAME_LEN = 255
SIZE_LEN = 14
MTIME_LEN = 12
PATH_LEN = 4096
def _parse_int(b: bytes) -> int:
s = b.split(b"\x00", 1)[0].decode(errors="replace").strip()
return int(s) if s else 0
def _parse_str(b: bytes) -> str:
return b.split(b"\x00", 1)[0].decode(errors="replace")
def extract(wpress_path: str, out_dir: str, verbose: bool = True) -> dict:
out = Path(out_dir)
out.mkdir(parents=True, exist_ok=True)
count = 0
total_bytes = 0
skipped = 0
with open(wpress_path, "rb") as f:
while True:
header = f.read(HEADER_SIZE)
if not header or len(header) < HEADER_SIZE:
break
if header == b"\x00" * HEADER_SIZE:
break
name = _parse_str(header[0:NAME_LEN])
size = _parse_int(header[NAME_LEN : NAME_LEN + SIZE_LEN])
mtime = _parse_int(header[NAME_LEN + SIZE_LEN : NAME_LEN + SIZE_LEN + MTIME_LEN])
path = _parse_str(header[NAME_LEN + SIZE_LEN + MTIME_LEN : NAME_LEN + SIZE_LEN + MTIME_LEN + PATH_LEN])
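            # Sanitise path traversal: normalise separators and drop empty,
            # "." and ".." components so the target stays inside out_dir.
            parts = [seg for seg in path.replace("\\", "/").split("/")
                     if seg not in ("", ".", "..")]
            safe_name = os.path.basename(name.replace("\\", "/"))

            # Skip entries without a usable filename before creating anything.
            if not safe_name or safe_name in (".", ".."):
                skipped += 1
                f.seek(size, 1)
                continue

            dest_dir = out.joinpath(*parts)
            dest_dir.mkdir(parents=True, exist_ok=True)
            dest_file = dest_dir / safe_name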
with open(dest_file, "wb") as o:
remaining = size
while remaining > 0:
chunk = f.read(min(65536, remaining))
if not chunk:
break
o.write(chunk)
remaining -= len(chunk)
try:
if mtime > 0:
os.utime(dest_file, (mtime, mtime))
except Exception:
pass
count += 1
total_bytes += size
if verbose and count % 200 == 0:
print(f" [{count} files | {total_bytes / 1024 / 1024:.1f} MB extracted]", flush=True)
result = {
"files": count,
"bytes": total_bytes,
"mb": round(total_bytes / 1024 / 1024, 1),
"skipped": skipped,
"out_dir": str(out),
}
print(f"DONE: {count} files | {result['mb']} MB -> {out_dir} (skipped {skipped})")
return result
def main():
p = argparse.ArgumentParser(description="Extract .wpress archive")
p.add_argument("wpress", help="Path to .wpress file")
p.add_argument("outdir", help="Destination directory")
p.add_argument("-q", "--quiet", action="store_true", help="Suppress progress output")
args = p.parse_args()
extract(args.wpress, args.outdir, verbose=not args.quiet)
if __name__ == "__main__":
main()
@@ -0,0 +1,149 @@
#!/usr/bin/env python3
"""
migrate.py — AM Stack A migration launcher.
Points at a .wpress file and runs all extraction phases automatically.
Phases 7+ require human/agent review of staged seed_databases.py.
Usage:
python3 migrate.py --wpress /path/to/backup.wpress --domain example.com [--project /path/to/project]
Output:
Runs phases 0-6, then prints agent breadcrumbs for phases 7-11.
"""
import argparse, os, sys, subprocess, json
SOPS = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
SCRIPTS = os.path.join(SOPS, 'scripts')
def run(cmd: list, label: str) -> bool:
print(f"\n[{label}] Running: {' '.join(cmd)}")
result = subprocess.run(cmd, capture_output=False)
if result.returncode != 0:
print(f"[{label}] FAILED (exit {result.returncode})")
return False
print(f"[{label}] OK")
return True
def phase_header(n: int, title: str):
print(f"\n{'='*60}")
    print(f"  Phase {n}: {title}")
print(f"{'='*60}")
def main():
parser = argparse.ArgumentParser(description='AM Stack A migration launcher')
parser.add_argument('--wpress', required=True, help='Path to .wpress backup file')
parser.add_argument('--domain', required=True, help='Target domain (e.g. example.com)')
parser.add_argument('--project', help='Project directory (default: ~/arisingmedia-websites/{domain})')
args = parser.parse_args()
wpress = os.path.abspath(args.wpress)
domain = args.domain
project = args.project or os.path.expanduser(f'~/arisingmedia-websites/{domain}')
extract_dir = os.path.join(project, '.planning', 'wpress-extract')
data_dir = os.path.join(project, '.planning', 'data')
content_dir = os.path.join(data_dir, 'content')
if not os.path.exists(wpress):
print(f"ERROR: .wpress file not found: {wpress}")
sys.exit(1)
print(f"\nAM Stack A Migration Pipeline")
print(f" Domain: {domain}")
print(f" Project: {project}")
print(f" Archive: {wpress}")
# Phase 0 — Setup
phase_header(0, 'Setup')
for d in [extract_dir, data_dir, content_dir,
os.path.join(project, 'assets', 'images'),
os.path.join(project, 'build'),
os.path.join(project, 'src', 'api', 'data'),
os.path.join(project, 'src', 'api', 'templates'),
os.path.join(project, 'src', 'api', 'components')]:
os.makedirs(d, exist_ok=True)
print(f" mkdir {d}")
# Phase 1 — Extract
phase_header(1, 'Extract .wpress archive')
if not run(['python3', os.path.join(SCRIPTS, 'extract_wpress.py'), wpress, extract_dir], 'Phase 1'):
sys.exit(1)
# Phase 2 — DB Analysis
phase_header(2, 'Database analysis')
if not run(['python3', os.path.join(SCRIPTS, 'analyze_db.py'), extract_dir, data_dir], 'Phase 2'):
sys.exit(1)
# Detect Divi version
site_info_path = os.path.join(data_dir, 'site-info.json')
divi_version = 5
if os.path.exists(site_info_path):
with open(site_info_path) as f:
info = json.load(f)
divi_version = info.get('divi_version', 5)
print(f" Divi version detected: {divi_version}")
# Phase 3 — Nav extraction
phase_header(3, 'Extract navigation menus')
run(['python3', os.path.join(SCRIPTS, 'extract_nav.py'), extract_dir, data_dir], 'Phase 3 (nav)')
# Phase 3 — Content extraction
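    # analyze_db.py can report "unknown"; fall back to the Divi 5 extractor
    # rather than building a script name that does not exist.
    if str(divi_version) not in ('4', '5'):
        print(f"  WARNING: Divi version {divi_version!r} not recognised, using the Divi 5 extractor")
        divi_version = 5
    extract_script = f'extract_divi{divi_version}.py'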
pages_json = os.path.join(data_dir, 'pages.json')
if not run(['python3', os.path.join(SCRIPTS, extract_script), pages_json, content_dir], f'Phase 3 (divi{divi_version})'):
print(f" WARNING: content extraction had errors — review {content_dir}")
# Phase 5 — Media
phase_header(5, 'Extract and convert media')
run(['python3', os.path.join(SCRIPTS, 'extract_media.py'), extract_dir, data_dir,
os.path.join(project, 'assets', 'images')], 'Phase 5')
# Phase 6 — Stage seed_databases.py
phase_header(6, 'Stage seed_databases.py skeleton')
seed_path = os.path.join(project, 'build', 'seed_databases.py')
# Check if stage_seed.py exists
stage_script = os.path.join(SCRIPTS, 'stage_seed.py')
if os.path.exists(stage_script):
run(['python3', stage_script, data_dir, seed_path, '--domain', domain], 'Phase 6')
else:
print(f" WARNING: stage_seed.py not found — seed_databases.py must be written manually")
print(f" Reference: /home/sirdrez/arisingmedia-websites/vibrantyou.yoga/build/seed_databases.py")
# Print agent breadcrumbs for remaining phases
print(f"\n{'='*60}")
print(" EXTRACTION COMPLETE — Manual/Agent phases follow")
print(f"{'='*60}")
print(f"""
Phases 0-6 complete. Staged content is at:
{data_dir}/content/ ← extracted page sections (JSON)
{data_dir}/nav.json ← navigation items
{data_dir}/media-manifest.json ← image URL mappings
{seed_path} ← seed_databases.py skeleton
Next steps (see 10-agent-breadcrumbs.md for full detail):
Phase 7 — REVIEW seed_databases.py
Open: {seed_path}
For each page: verify sections_json has correct section types
Replace em-dashes. Remove Divi shortcode residue. Review nav items.
Phase 8 — RUN seed_databases.py
cd {project} && python3 build/seed_databases.py
Verify: output shows all counts > 0
Phase 9 — SCAFFOLD PHP templates
Copy from reference: vibrantyou.yoga/src/api/
Update brand name and colors in _header.php + _footer.php
Phase 10 — BUILD
cd {project} && docker compose build --no-cache && docker compose up -d
Verify: curl -I http://localhost:PORT/
Phase 11 — QA
bash {SOPS}/../tools/verify-protection.sh http://localhost:PORT
Lighthouse in Firefox
Reference: {SOPS}/wp-divi-pipeline-to-am-stack/10-agent-breadcrumbs.md
""")
if __name__ == '__main__':
main()
@@ -0,0 +1,175 @@
#!/usr/bin/env bash
# run_pipeline.sh — AM WP+Divi to HTML pipeline master script
# Usage: bash run_pipeline.sh <domain>
# Example: bash run_pipeline.sh vibrantyou.yoga
set -euo pipefail
DOMAIN="${1:-}"
if [ -z "$DOMAIN" ]; then
echo "Usage: $0 <domain>"
echo " Example: $0 vibrantyou.yoga"
exit 1
fi
PROJECT="/home/sirdrez/arisingmedia-websites/$DOMAIN"
SOPS="/home/sirdrez/arisingmedia-websites/.am-webdesign-sops"
SCRIPTS="$SOPS/wp-divi-pipeline/scripts"
# `|| true` keeps `set -e` + `pipefail` from aborting before the error message below.
WPRESS=$(ls "$PROJECT/.planning/"*.wpress 2>/dev/null | head -1 || true)
if [ -z "$WPRESS" ]; then
echo "ERROR: No .wpress file found in $PROJECT/.planning/"
exit 1
fi
echo "================================================"
echo " AM WP+Divi Pipeline"
echo " Domain: $DOMAIN"
echo "  Archive: $(basename "$WPRESS")"
echo "================================================"
echo ""
# ---------------------------------------------------------------------------
# Phase 0 — Directory structure
# ---------------------------------------------------------------------------
echo "[Phase 0] Creating directory structure..."
mkdir -p "$PROJECT"/{src/{about,services,contact,blog,classes,components,assets/{css,js,images,svg,fonts}},build,infra,api}
mkdir -p "$PROJECT/.planning"/{data/content,scripts,wpress-extract}
echo " OK: directories created"
echo ""
# ---------------------------------------------------------------------------
# Phase 1 — Extract .wpress archive
# ---------------------------------------------------------------------------
EXTRACT_DIR="$PROJECT/.planning/wpress-extract"
if [ -f "$EXTRACT_DIR/database.sql" ]; then
echo "[Phase 1] Archive already extracted — skipping"
echo " Found: $EXTRACT_DIR/database.sql"
else
echo "[Phase 1] Extracting archive (this may take a few minutes)..."
python3 "$SCRIPTS/extract_wpress.py" "$WPRESS" "$EXTRACT_DIR"
echo " OK: extraction complete"
fi
echo ""
# ---------------------------------------------------------------------------
# Phase 2 — Database analysis
# ---------------------------------------------------------------------------
DATA_DIR="$PROJECT/.planning/data"
echo "[Phase 2] Analyzing database..."
python3 "$SCRIPTS/analyze_db.py" "$EXTRACT_DIR" "$DATA_DIR"
PAGE_COUNT=$(python3 -c "import json; print(len(json.load(open('$DATA_DIR/pages.json'))))" 2>/dev/null || echo 0)
echo " OK: $PAGE_COUNT pages extracted"
echo ""
# ---------------------------------------------------------------------------
# Phase 3 — Content extraction (Divi 5)
# ---------------------------------------------------------------------------
echo "[Phase 3] Extracting Divi 5 content..."
python3 "$SCRIPTS/extract_divi5.py" \
"$DATA_DIR/pages.json" \
"$DATA_DIR/content/"
echo " OK: content JSON files written"
echo ""
# ---------------------------------------------------------------------------
# Phase 4 — Design system (manual step)
# ---------------------------------------------------------------------------
echo "[Phase 4] Design system (MANUAL STEP REQUIRED)"
echo " Read: $DATA_DIR/design-system.json"
echo " Write: $PROJECT/src/assets/css/main.css"
echo " Ref: $SOPS/wp-divi-pipeline/04-design-system-extraction.md"
echo ""
# ---------------------------------------------------------------------------
# Phase 5 — Media migration
# ---------------------------------------------------------------------------
UPLOADS_DIR="$EXTRACT_DIR/uploads"
IMAGES_DIR="$PROJECT/src/assets/images"
if [ -d "$UPLOADS_DIR" ]; then
echo "[Phase 5] Migrating media..."
# Catalog originals (skip WP-generated size variants)
find "$UPLOADS_DIR" -type f \( -name "*.jpg" -o -name "*.jpeg" -o -name "*.png" -o -name "*.gif" -o -name "*.webp" \) \
| grep -v -E "\-[0-9]+x[0-9]+\.(jpg|jpeg|png|webp|gif)$" \
| sort > "$DATA_DIR/media-originals.txt"
MEDIA_COUNT=$(wc -l < "$DATA_DIR/media-originals.txt")
echo " Found: $MEDIA_COUNT original images"
# Copy to src/assets/images/
while IFS= read -r src_img; do
fname=$(basename "$src_img")
cp "$src_img" "$IMAGES_DIR/$fname"
done < "$DATA_DIR/media-originals.txt"
# Convert to WebP if cwebp available
if command -v cwebp &>/dev/null; then
echo " Converting to WebP..."
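# Subshell: the directory change stays local, so there is no need to cd back afterwards
(
cd "$IMAGES_DIR" || exit 1
for img in *.jpg *.jpeg *.png; do
[ -f "$img" ] || continue
base="${img%.*}"
cwebp -q 82 "$img" -o "${base}.webp" 2>/dev/null && rm "$img"
done
# Count with find rather than parsing ls output
WEBP_COUNT=$(find . -maxdepth 1 -type f -name '*.webp' | wc -l)
echo " WebP files: $WEBP_COUNT"
)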
else
echo " WARN: cwebp not found — images copied as-is (convert manually)"
fi
echo " OK: media migrated to $IMAGES_DIR"
else
echo "[Phase 5] No uploads/ directory found — skipping media migration"
fi
echo ""
# ---------------------------------------------------------------------------
# Phase 6 — HTML build (manual step)
# ---------------------------------------------------------------------------
echo "[Phase 6] HTML Build (MANUAL STEP REQUIRED)"
echo " Ref: $SOPS/wp-divi-pipeline/05-content-migration.md"
echo " Build order:"
echo " 1. src/assets/css/main.css"
echo " 2. src/assets/css/components.css"
echo " 3. src/components/header.html"
echo " 4. src/components/footer.html"
echo " 5. src/assets/js/components.js"
echo " 6. src/assets/js/main.js"
echo " 7. src/index.html (home — design system anchor)"
echo " 8. Remaining pages"
echo ""
# ---------------------------------------------------------------------------
# Phase 7 — SEO audit
# ---------------------------------------------------------------------------
echo "[Phase 7] SEO audit (run after HTML build):"
echo " grep -rL '<title>' $PROJECT/src --include='*.html' | grep -v _template"
echo " grep -rL 'canonical' $PROJECT/src --include='*.html' | grep -v _template"
echo " grep -rL 'ld+json' $PROJECT/src --include='*.html' | grep -v _template"
echo " grep -r '{{' $PROJECT/src --include='*.html'"
echo ""
# ---------------------------------------------------------------------------
# Phase 8 — Infra
# ---------------------------------------------------------------------------
echo "[Phase 8] Infra setup:"
echo " Copy Dockerfile + docker-compose.yml from vibrantyoucoaching.com"
echo " Update server_name in infra/nginx.conf to: $DOMAIN"
echo " Run: docker compose up -d --build"
echo ""
# ---------------------------------------------------------------------------
# Phase 9 — Protection check
# ---------------------------------------------------------------------------
echo "[Phase 9] After deploy, run:"
echo " bash $SOPS/tools/verify-protection.sh https://$DOMAIN"
echo ""
echo "================================================"
echo " Pipeline setup complete."
echo " Phases 0-3 + 5 executed automatically."
echo " Phases 4, 6, 7, 8, 9 require manual steps."
echo " See $SOPS/wp-divi-pipeline/ for all SOPs."
echo "================================================"
@@ -0,0 +1,574 @@
#!/usr/bin/env python3
"""
stage_seed.py — Phase 6 of WP/Divi → Stack A migration pipeline.
Reads the JSON extracted by a prior pipeline run and generates a seed_databases.py
skeleton for the target project. A human or agent reviews the [FILL] markers and
fills the gaps before running the seeder.
Usage:
python3 stage_seed.py <data_dir> <seed_path> --domain <domain> [--force]
Example:
python3 stage_seed.py /path/to/.planning/data build/seed_databases.py --domain example.com
"""
import argparse
import json
import os
import re
from datetime import datetime
def slugify(text):
"""Convert text to URL-safe slug."""
return re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')
def infer_template(slug):
"""Infer template type from page slug."""
slug_lower = slug.lower()
if slug_lower == 'home':
return 'home'
elif slug_lower in ('classes', 'class'):
return 'classes'
elif slug_lower == 'schedule':
return 'schedule'
elif slug_lower == 'glossary':
return 'glossary'
elif slug_lower in ('blog', 'posts', 'articles'):
return 'blog'
else:
return 'static'
def load_json_file(path):
    """Load a JSON file; return None if it is missing or cannot be parsed."""
    if not os.path.exists(path):
        return None
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return json.load(f)
    except (OSError, ValueError) as e:
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses
        print(f"Warning: Failed to load {path}: {e}")
        return None
def generate_seed_script(data_dir, domain, design_system, pages, glossary, nav):
"""Generate the seed_databases.py script content."""
now = datetime.now().isoformat()
# Build pages_data list in outer scope
pages_list = []
for page in pages:
if page.get('status') != 'publish' or page.get('post_type') != 'page':
continue
slug = page.get('slug', '')
title = page.get('title', '[FILL] Title needed')
meta_desc = page.get('seo_description', '')
if not meta_desc:
meta_desc = f"[FILL] Meta description for {slug}"
canonical = f"https://{domain}/{slug}/" if slug != 'home' else f"https://{domain}/"
date_str = page.get('date', datetime.now().isoformat())
        # Infer template via infer_template() so the slug rules live in one place
        template = infer_template(slug)
pages_list.append({
'slug': slug,
'template': template,
'title': title,
'meta_description': meta_desc,
'canonical_url': canonical,
'hero_h1': f"[FILL] {title}",
'sections_json': '[]',
'updated_at': date_str
})
# Build pages_data JSON string
pages_json_str = json.dumps(pages_list, indent=8)
script = f'''#!/usr/bin/env python3
"""
seed_databases.py — generated by stage_seed.py on {now}
Source: {data_dir}
Domain: {domain}
EDIT THIS FILE then run: python3 build/seed_databases.py
Content marked [FILL] needs human/agent review before seeding.
"""
import sqlite3
import json
import os
from datetime import datetime
DB_DIR = os.path.join(os.path.dirname(__file__), '..', 'src', 'api', 'data')
os.makedirs(DB_DIR, exist_ok=True)
def slugify(text):
"""Convert text to URL-safe slug."""
import re
return re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')
def seed_pages():
"""Create pages.sqlite and populate with published pages."""
db_path = os.path.join(DB_DIR, 'pages.sqlite')
conn = sqlite3.connect(db_path)
c = conn.cursor()
c.execute("""
CREATE TABLE IF NOT EXISTS pages (
id INTEGER PRIMARY KEY,
slug TEXT UNIQUE NOT NULL,
template TEXT NOT NULL,
title TEXT NOT NULL,
meta_description TEXT,
canonical_url TEXT,
og_image TEXT,
schema_json TEXT,
hero_eyebrow TEXT,
hero_h1 TEXT,
hero_lead TEXT,
sections_json TEXT,
updated_at TEXT
)
""")
pages_data = {pages_json_str}
for page in pages_data:
c.execute("""
INSERT OR REPLACE INTO pages
(slug, template, title, meta_description, canonical_url, hero_h1, sections_json, updated_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""", (
page['slug'],
page['template'],
page['title'],
page['meta_description'],
page['canonical_url'],
page['hero_h1'],
page['sections_json'],
page['updated_at']
))
conn.commit()
conn.close()
print(f"✓ pages.sqlite created with {{len(pages_data)}} pages")
def seed_nav():
"""Create nav.sqlite and populate navigation items."""
db_path = os.path.join(DB_DIR, 'nav.sqlite')
conn = sqlite3.connect(db_path)
c = conn.cursor()
c.execute("""
CREATE TABLE IF NOT EXISTS nav_items (
id INTEGER PRIMARY KEY,
label TEXT NOT NULL,
href TEXT NOT NULL,
display_order INTEGER DEFAULT 0,
is_cta INTEGER DEFAULT 0
)
""")
'''
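    if nav:
        # pformat() emits a Python literal, so JSON null/true/false become
        # None/True/False instead of raising NameError in the generated script.
        from pprint import pformat
        nav_literal = pformat(nav, indent=4, width=100, sort_dicts=False)
        script += f'''
    nav_items = {nav_literal}
    # Clear existing rows first so reseeding does not duplicate the menu
    c.execute("DELETE FROM nav_items")
    for item in nav_items:
        c.execute("""
            INSERT INTO nav_items (label, href, display_order, is_cta)
            VALUES (?, ?, ?, ?)
        """, (item['label'], item['href'], item.get('display_order', 0), item.get('is_cta', 0)))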
conn.commit()
conn.close()
print(f"✓ nav.sqlite created with {{len(nav_items)}} nav items")
'''
else:
script += '''
# [FILL] nav.json not found — add navigation items manually
# Example:
# nav_items = [
# {"label": "Home", "href": "/", "display_order": 1, "is_cta": 0},
# {"label": "Classes", "href": "/classes", "display_order": 2, "is_cta": 0},
# {"label": "Schedule", "href": "/schedule", "display_order": 3, "is_cta": 0},
# {"label": "Get Started", "href": "/contact", "display_order": 4, "is_cta": 1},
# ]
# Then uncomment and insert rows
conn.commit()
conn.close()
print("✓ nav.sqlite created (empty — [FILL] navigation items)")
'''
# Seed glossary
    if glossary:
        # pformat() emits a Python literal, so JSON null/true/false become
        # None/True/False instead of raising NameError in the generated script.
        from pprint import pformat
        glossary_literal = pformat(glossary, indent=4, width=100, sort_dicts=False)
        script += f'''
def seed_glossary():
    """Create glossary.sqlite and populate terms."""
    db_path = os.path.join(DB_DIR, 'glossary.sqlite')
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    c.execute("""
        CREATE TABLE IF NOT EXISTS terms (
            id INTEGER PRIMARY KEY,
            slug TEXT UNIQUE NOT NULL,
            term TEXT NOT NULL,
            pronunciation TEXT,
            definition TEXT NOT NULL,
            category TEXT NOT NULL,
            level TEXT NOT NULL,
            display_order INTEGER DEFAULT 0
        )
    """)
    glossary_items = {glossary_literal}
for idx, item in enumerate(glossary_items):
fields = item.get('fields', {{}})
term = fields.get('sanskrit_name', '[FILL] Term needed')
slug = slugify(term)
pronunciation = fields.get('pronunciation', '')
definition = fields.get('definition', '[FILL] Definition needed')
category = fields.get('category', 'yoga')
level = fields.get('level', 'beginner')
c.execute("""
INSERT OR REPLACE INTO terms
(slug, term, pronunciation, definition, category, level, display_order)
VALUES (?, ?, ?, ?, ?, ?, ?)
""", (slug, term, pronunciation, definition, category, level, idx))
conn.commit()
conn.close()
print(f"✓ glossary.sqlite created with {{len(glossary_items)}} terms")
'''
else:
script += '''
def seed_glossary():
"""Create glossary.sqlite (empty — no glossary.json found)."""
db_path = os.path.join(DB_DIR, 'glossary.sqlite')
conn = sqlite3.connect(db_path)
c = conn.cursor()
c.execute("""
CREATE TABLE IF NOT EXISTS terms (
id INTEGER PRIMARY KEY,
slug TEXT UNIQUE NOT NULL,
term TEXT NOT NULL,
pronunciation TEXT,
definition TEXT NOT NULL,
category TEXT NOT NULL,
level TEXT NOT NULL,
display_order INTEGER DEFAULT 0
)
""")
conn.commit()
conn.close()
print("✓ glossary.sqlite created (empty)")
'''
script += '''
def seed_testimonials():
"""Create testimonials.sqlite (empty stub)."""
db_path = os.path.join(DB_DIR, 'testimonials.sqlite')
conn = sqlite3.connect(db_path)
c = conn.cursor()
c.execute("""
CREATE TABLE IF NOT EXISTS testimonials (
id INTEGER PRIMARY KEY,
quote TEXT NOT NULL,
author_name TEXT NOT NULL,
author_role TEXT,
is_featured INTEGER DEFAULT 0
)
""")
# [FILL] Add testimonials extracted from Divi testimonial modules or client-provided
# rows = [
# {"quote": "...", "author_name": "...", "author_role": "...", "is_featured": 0},
# ]
conn.commit()
conn.close()
print("✓ testimonials.sqlite created (empty — [FILL] add testimonials)")
def seed_blog():
"""Create blog.sqlite (empty stub)."""
db_path = os.path.join(DB_DIR, 'blog.sqlite')
conn = sqlite3.connect(db_path)
c = conn.cursor()
c.execute("""
CREATE TABLE IF NOT EXISTS posts (
id INTEGER PRIMARY KEY,
slug TEXT UNIQUE NOT NULL,
title TEXT NOT NULL,
excerpt TEXT,
content TEXT,
author TEXT,
published_at TEXT,
is_featured INTEGER DEFAULT 0
)
""")
# [FILL] Add blog posts extracted from WP posts table
# rows = [
# {"slug": "...", "title": "...", "excerpt": "...", "content": "...", "author": "...", "published_at": "..."},
# ]
conn.commit()
conn.close()
print("✓ blog.sqlite created (empty — [FILL] add blog posts)")
def seed_videos():
"""Create videos.sqlite (empty stub)."""
db_path = os.path.join(DB_DIR, 'videos.sqlite')
conn = sqlite3.connect(db_path)
c = conn.cursor()
c.execute("""
CREATE TABLE IF NOT EXISTS videos (
id INTEGER PRIMARY KEY,
slug TEXT UNIQUE NOT NULL,
title TEXT NOT NULL,
duration TEXT,
embed_url TEXT,
thumbnail TEXT,
category TEXT,
level TEXT,
is_free INTEGER DEFAULT 1
)
""")
# [FILL] Add on-demand video entries if site has video content
# rows = [
# {"slug": "...", "title": "...", "duration": "12:34", "embed_url": "...", "category": "...", "level": "..."},
# ]
conn.commit()
conn.close()
print("✓ videos.sqlite created (empty — [FILL] add videos)")
def seed_events():
"""Create events.sqlite (empty stub)."""
db_path = os.path.join(DB_DIR, 'events.sqlite')
conn = sqlite3.connect(db_path)
c = conn.cursor()
c.execute("""
CREATE TABLE IF NOT EXISTS events (
id INTEGER PRIMARY KEY,
slug TEXT UNIQUE NOT NULL,
title TEXT NOT NULL,
event_date TEXT,
time_cet TEXT,
format TEXT,
capacity INTEGER,
price_eur REAL,
status TEXT DEFAULT 'open'
)
""")
# [FILL] Add workshop/event entries
# rows = [
# {"slug": "...", "title": "...", "event_date": "2026-06-15", "time_cet": "10:00", "format": "online", "capacity": 20, "price_eur": 29.99},
# ]
conn.commit()
conn.close()
print("✓ events.sqlite created (empty — [FILL] add events)")
def seed_schedule():
"""Create schedule.sqlite (empty stub)."""
db_path = os.path.join(DB_DIR, 'schedule.sqlite')
conn = sqlite3.connect(db_path)
c = conn.cursor()
c.execute("""
CREATE TABLE IF NOT EXISTS classes (
id INTEGER PRIMARY KEY,
day_of_week TEXT NOT NULL,
day_order INTEGER NOT NULL,
time_cet TEXT NOT NULL,
class_name TEXT NOT NULL,
level TEXT NOT NULL,
format TEXT NOT NULL,
duration_min INTEGER NOT NULL,
badge_variant TEXT DEFAULT ''
)
""")
# [FILL] Add recurring class schedule rows
# rows = [
# {"day_of_week": "Monday", "day_order": 1, "time_cet": "10:00", "class_name": "Hatha Yoga", "level": "beginner", "format": "online", "duration_min": 60, "badge_variant": "featured"},
# ]
conn.commit()
conn.close()
print("✓ schedule.sqlite created (empty — [FILL] add class schedule)")
def seed_instructors():
"""Create instructors.sqlite (empty stub)."""
db_path = os.path.join(DB_DIR, 'instructors.sqlite')
conn = sqlite3.connect(db_path)
c = conn.cursor()
c.execute("""
CREATE TABLE IF NOT EXISTS instructors (
id INTEGER PRIMARY KEY,
slug TEXT UNIQUE NOT NULL,
name TEXT NOT NULL,
title TEXT,
bio TEXT,
certifications TEXT,
image TEXT,
is_primary INTEGER DEFAULT 0
)
""")
# [FILL] Add instructor rows
# rows = [
# {"slug": "alice-johnson", "name": "Alice Johnson", "title": "Lead Instructor", "bio": "...", "certifications": "...", "is_primary": 1},
# ]
conn.commit()
conn.close()
print("✓ instructors.sqlite created (empty — [FILL] add instructors)")
def seed_packages():
"""Create packages.sqlite (empty stub)."""
db_path = os.path.join(DB_DIR, 'packages.sqlite')
conn = sqlite3.connect(db_path)
c = conn.cursor()
c.execute("""
CREATE TABLE IF NOT EXISTS packages (
id INTEGER PRIMARY KEY,
slug TEXT UNIQUE NOT NULL,
name TEXT NOT NULL,
price_eur REAL,
sessions_count INTEGER,
validity_days INTEGER,
is_featured INTEGER DEFAULT 0
)
""")
# [FILL] Add class pack/package options
# rows = [
# {"slug": "starter", "name": "Starter Pack", "price_eur": 49.99, "sessions_count": 5, "validity_days": 30, "is_featured": 0},
# {"slug": "unlimited", "name": "Unlimited Monthly", "price_eur": 99.99, "sessions_count": None, "validity_days": 30, "is_featured": 1},
# ]
conn.commit()
conn.close()
print("✓ packages.sqlite created (empty — [FILL] add packages)")
if __name__ == '__main__':
seed_pages()
seed_nav()
seed_glossary()
seed_testimonials()
seed_blog()
seed_videos()
seed_events()
seed_schedule()
seed_instructors()
seed_packages()
print("\\nSeeding complete. Review [FILL] markers before running in production.")
'''
return script
def main():
parser = argparse.ArgumentParser(
description='Generate seed_databases.py from extracted WP/Divi JSON data'
)
parser.add_argument('data_dir', help='Path to extracted data directory (.planning/data/)')
parser.add_argument('seed_path', help='Output path for seed_databases.py')
parser.add_argument('--domain', required=True, help='Domain name (e.g., example.com)')
parser.add_argument('--force', action='store_true', help='Overwrite existing seed_databases.py')
args = parser.parse_args()
# Validate inputs
if not os.path.isdir(args.data_dir):
print(f"Error: data_dir not found: {args.data_dir}")
return 1
if os.path.exists(args.seed_path) and not args.force:
print(f"Error: seed_databases.py already exists at {args.seed_path}")
print("Use --force to overwrite")
return 1
# Load required data files
pages = load_json_file(os.path.join(args.data_dir, 'pages.json'))
if not pages:
print("Error: pages.json not found or invalid")
return 1
design_system = load_json_file(os.path.join(args.data_dir, 'design-system.json'))
glossary = load_json_file(os.path.join(args.data_dir, 'glossary.json'))
nav = load_json_file(os.path.join(args.data_dir, 'nav.json'))
# Generate script
script_content = generate_seed_script(
args.data_dir,
args.domain,
design_system,
pages,
glossary,
nav
)
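    # Write output. dirname() is '' when seed_path has no directory component,
    # and os.makedirs('') raises, so resolve to an absolute path first.
    out_dir = os.path.dirname(os.path.abspath(args.seed_path))
    os.makedirs(out_dir, exist_ok=True)
    with open(args.seed_path, 'w', encoding='utf-8') as f:
        f.write(script_content)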
# Make executable
os.chmod(args.seed_path, 0o755)
print(f"✓ Generated: {args.seed_path}")
print(f" Pages: {len([p for p in pages if p.get('status') == 'publish' and p.get('post_type') == 'page'])}")
print(f" Glossary terms: {len(glossary) if glossary else 0}")
print(f" Nav items: {len(nav) if nav else 0}")
print("\nNext: Review [FILL] markers, then run: python3 " + args.seed_path)
return 0
if __name__ == '__main__':
    # exit() is injected by the site module for interactive use; SystemExit always works
    raise SystemExit(main())