4.5 KiB
02 — Database Analysis
Parse the WordPress MySQL dump to inventory pages, detect Divi version, extract design settings, and build the data JSON files that drive the AM build.
Script
python3 /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/wp-divi-pipeline/scripts/analyze_db.py \
{domain}/.planning/wpress-extract/ \
{domain}/.planning/data/
Outputs three files into .planning/data/:
pages.json— all published pages/posts with content and SEO metadesign-system.json— colors, fonts, Divi settingssite-info.json— domain, plugin list, WP version, Divi version
Divi version detection
The script auto-detects Divi version by scanning database.sql:
| Signal in SQL | Divi version |
|---|---|
wp:divi/ in post_content |
Divi 5 |
[et_pb_section in post_content |
Divi 4 |
This determines the content extraction path. Divi 4 → use extract_divi4.py.
Divi 5 → use extract_divi5.py. See 03-divi-content-extraction.md.
Key WordPress tables
| Table | Contents | Used for |
|---|---|---|
wp_posts |
All pages, posts, attachments, layouts | Page inventory, content |
wp_postmeta |
Per-post metadata | ACF fields, Rank Math SEO, Divi layout JSON |
wp_options |
Site-wide settings | Divi theme settings, colors, fonts |
wp_gf_forms |
Gravity Forms definitions | Form field schema |
wp_gf_entries |
Gravity Form submissions | Not needed for migration |
wp_rank_math_seo_meta |
Rank Math SEO per page | SEO titles, descriptions |
Reading pages.json
Each entry in pages.json:
{
"id": "42",
"post_type": "page",
"slug": "about",
"title": "About VibrantYou Yoga",
"status": "publish",
"date": "2026-03-15",
"modified": "2026-04-10",
"content_raw": "<!-- wp:divi/section ... -->...",
"excerpt": "",
"parent_id": "0",
"menu_order": "3",
"seo_title": "About VibrantYou Yoga | Mindful Movement in [City]",
"seo_description": "...",
"seo_keywords": "yoga studio, mindful movement",
"acf": {
"vyy_hero_headline": "Move With Intention",
"vyy_hero_subhead": "..."
}
}
content_raw holds the raw Divi block markup. Pass it to the extractor scripts.
acf holds Advanced Custom Fields values — often cleaner than block content.
Reading design-system.json
Contains extracted Divi theme settings. Key fields:
{
"primary_color": "#1a8a7a",
"body_font": "DM Sans",
"header_font": "DM Serif Display",
"body_font_size": "16",
"body_line_height": "1.7",
"divi_version": "5",
"wp_version": "6.9.4",
"site_url": "https://vibrantyou.yoga",
"site_name": "VibrantYou Yoga"
}
Use these values to seed the AM main.css CSS custom properties block.
Manual inspection (when script output is sparse)
Sometimes the Divi theme options are stored as PHP-serialized data. Use grep to find and eyeball the raw values:
DB=.planning/wpress-extract/database.sql
# Divi global colors
grep -o "'et_divi[^']*','[^']*'" $DB | head -30
# Site name + URL
grep -E "'(siteurl|blogname|admin_email)','[^']*'" $DB
# Rank Math SEO meta for a specific post
grep "rank_math_title\|rank_math_description" $DB | head -20
# All published page slugs
grep -o "post_name','[^']*'" $DB | grep -v "revision\|auto-draft" | sort | uniq
Gravity Forms schema (for form replacement)
Find form field definitions:
grep "INSERT INTO \`wp_gf_forms\`" .planning/wpress-extract/database.sql | \
python3 -c "
import sys, json, re
for line in sys.stdin:
m = re.search(r\"'([^']+)'\s*\)\s*;\", line)
if m:
try: print(json.dumps(json.loads(m.group(1).replace('\\\\\"','\"')), indent=2)[:2000])
except: pass
" 2>/dev/null | head -100
Field types seen in Gravity Forms: text, email, phone, textarea, select, checkbox, radio, name, address, fileupload. Map each to a plain HTML input equivalent.
Archive directory layout note
The AIOIM .wpress format extracts flat — no wp-content/ wrapper:
wpress-extract/
├── database.sql ← NOT in wp-content/
├── package.json
├── uploads/ ← NOT wp-content/uploads/
├── themes/ ← NOT wp-content/themes/
├── plugins/ ← NOT wp-content/plugins/
└── et-cache/
Scripts must reference uploads/, themes/, plugins/ directly under
wpress-extract/, not wpress-extract/wp-content/.
Next step
Once pages.json is written, proceed to 03-divi-content-extraction.md
to parse content_raw for each page into structured AM-ready HTML.