arisingmedia-web-sops/wp-divi-pipeline-to-am-stack/02-database-analysis.md
2026-06-09 18:31:59 +02:00


02 — Database Analysis

Parse the WordPress MySQL dump to inventory pages, detect Divi version, extract design settings, and build the data JSON files that drive the AM build.

Script

python3 /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/wp-divi-pipeline/scripts/analyze_db.py \
  {domain}/.planning/wpress-extract/ \
  {domain}/.planning/data/

Outputs three files into .planning/data/:

  • pages.json — all published pages/posts with content and SEO meta
  • design-system.json — colors, fonts, Divi settings
  • site-info.json — domain, plugin list, WP version, Divi version

Divi version detection

The script auto-detects Divi version by scanning database.sql:

| Signal in SQL | Divi version |
|---|---|
| wp:divi/ in post_content | Divi 5 |
| [et_pb_section in post_content | Divi 4 |

The detected version determines the content extraction path: use extract_divi4.py for Divi 4 and extract_divi5.py for Divi 5. See 03-divi-content-extraction.md.
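The detection rule in the table can be sketched as a small function. This is an illustration only, not the actual logic inside analyze_db.py: the name `detect_divi_version` is hypothetical, and the precedence (the Divi 5 marker wins when both markers are present) is an assumption.

```python
from typing import Optional


def detect_divi_version(sql_text: str) -> Optional[str]:
    """Return "5", "4", or None based on the two markers listed above.

    Assumption: if both markers appear (for example on a partially
    migrated site), the Divi 5 block marker takes precedence.
    """
    if "wp:divi/" in sql_text:
        return "5"
    if "[et_pb_section" in sql_text:
        return "4"
    return None
```

For a large database.sql it may be preferable to scan the file line by line and stop at the first marker instead of reading the whole dump into memory.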

Key WordPress tables

| Table | Contents | Used for |
|---|---|---|
| wp_posts | All pages, posts, attachments, layouts | Page inventory, content |
| wp_postmeta | Per-post metadata | ACF fields, Rank Math SEO, Divi layout JSON |
| wp_options | Site-wide settings | Divi theme settings, colors, fonts |
| wp_gf_forms | Gravity Forms definitions | Form field schema |
| wp_gf_entries | Gravity Form submissions | Not needed for migration |
| wp_rank_math_seo_meta | Rank Math SEO per page | SEO titles, descriptions |
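Before relying on a table, it can help to confirm it actually exists in the dump. The sketch below lists the table names found in `CREATE TABLE` statements. It assumes a mysqldump-style file with backtick-quoted table names; `tables_in_dump` is a hypothetical helper, not part of the pipeline scripts.

```python
import re

# Assumption: table names are backtick-quoted, as in mysqldump-style output.
CREATE_TABLE = re.compile(r"CREATE TABLE(?: IF NOT EXISTS)? `([^`]+)`")


def tables_in_dump(sql_text: str) -> list:
    """Return the sorted, de-duplicated table names created in the dump."""
    return sorted(set(CREATE_TABLE.findall(sql_text)))
```

Comparing the result with the table names above shows quickly whether a plugin's tables (Gravity Forms, Rank Math) are present under the expected names and prefix.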

Reading pages.json

Each entry in pages.json:

{
  "id": "42",
  "post_type": "page",
  "slug": "about",
  "title": "About VibrantYou Yoga",
  "status": "publish",
  "date": "2026-03-15",
  "modified": "2026-04-10",
  "content_raw": "<!-- wp:divi/section ... -->...",
  "excerpt": "",
  "parent_id": "0",
  "menu_order": "3",
  "seo_title": "About VibrantYou Yoga | Mindful Movement in [City]",
  "seo_description": "...",
  "seo_keywords": "yoga studio, mindful movement",
  "acf": {
    "vyy_hero_headline": "Move With Intention",
    "vyy_hero_subhead": "..."
  }
}

content_raw holds the raw Divi block markup. Pass it to the extractor scripts. acf holds Advanced Custom Fields values — often cleaner than block content.
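A minimal sketch of working with these entries follows. It assumes pages.json is a JSON array of objects shaped like the example above; the helper names `published_pages` and `content_source` are hypothetical. Numeric fields are stored as strings in the example, so `menu_order` is converted before sorting.

```python
def published_pages(entries: list) -> list:
    """Keep published entries of type "page", ordered by menu_order."""
    pages = [
        e for e in entries
        if e.get("post_type") == "page" and e.get("status") == "publish"
    ]
    # menu_order is a string in pages.json ("3"), so cast it for sorting.
    return sorted(pages, key=lambda e: int(e.get("menu_order") or 0))


def content_source(entry: dict) -> str:
    """Prefer ACF values when present, otherwise fall back to content_raw."""
    return "acf" if entry.get("acf") else "content_raw"
```

Typical use would be `published_pages(json.load(open(".planning/data/pages.json")))`, then routing each entry to ACF handling or to the extractor scripts according to `content_source`.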

Reading design-system.json

Contains extracted Divi theme settings. Key fields:

{
  "primary_color": "#1a8a7a",
  "body_font": "DM Sans",
  "header_font": "DM Serif Display",
  "body_font_size": "16",
  "body_line_height": "1.7",
  "divi_version": "5",
  "wp_version": "6.9.4",
  "site_url": "https://vibrantyou.yoga",
  "site_name": "VibrantYou Yoga"
}

Use these values to seed the AM main.css CSS custom properties block.
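One way to seed that block is to generate it from design-system.json. The sketch below is an assumption-heavy illustration: the custom property names (`--color-primary`, `--font-body`, and so on), the `px` unit for `body_font_size`, and the generic fallback font families are all hypothetical and should be adapted to whatever main.css actually expects.

```python
def css_custom_properties(design: dict) -> str:
    """Render a :root block from design-system.json values."""
    # Hypothetical mapping: JSON key -> (custom property name, value template).
    mapping = {
        "primary_color": ("--color-primary", "{}"),
        "body_font": ("--font-body", '"{}", sans-serif'),
        "header_font": ("--font-heading", '"{}", serif'),
        "body_font_size": ("--font-size-body", "{}px"),  # assumes pixel units
        "body_line_height": ("--line-height-body", "{}"),
    }
    lines = [":root {"]
    for key, (prop, template) in mapping.items():
        value = design.get(key)
        if value:  # skip keys the script could not extract
            lines.append(f"  {prop}: {template.format(value)};")
    lines.append("}")
    return "\n".join(lines)
```

Missing keys are skipped rather than written as empty declarations, so sparse script output still yields valid CSS.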

Manual inspection (when script output is sparse)

Sometimes the Divi theme options are stored as PHP-serialized data. Use grep to find and eyeball the raw values:

DB=.planning/wpress-extract/database.sql

# Divi global colors
grep -o "'et_divi[^']*','[^']*'" $DB | head -30

# Site name + URL
grep -E "'(siteurl|blogname|admin_email)','[^']*'" $DB

# Rank Math SEO meta (first 20 matching lines, not filtered by post)
grep "rank_math_title\|rank_math_description" $DB | head -20

# Page slugs (drops revision and auto-draft names; does not filter on post_status)
grep -o "post_name','[^']*'" $DB | grep -v "revision\|auto-draft" | sort | uniq
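When a grep hit turns out to be a PHP-serialized blob, a rough way to read it is to pull out adjacent string pairs of the form `s:<len>:"key";s:<len>:"value";`. The sketch below is deliberately lossy and is not a real PHP unserializer: it ignores the declared lengths, skips non-string values, and can mispair entries in nested arrays. The helper name and the sample keys used with it are hypothetical.

```python
import re

# Two consecutive serialized strings: s:<len>:"key";s:<len>:"value";
STRING_PAIR = re.compile(r's:\d+:"([^"]+)";s:\d+:"([^"]*)";')


def serialized_string_pairs(blob: str) -> dict:
    """Extract string key/value pairs from a PHP-serialized blob (lossy)."""
    # Assumption: inside a SQL dump, double quotes may appear escaped as \".
    unescaped = blob.replace('\\"', '"')
    return dict(STRING_PAIR.findall(unescaped))
```

This is enough to eyeball colors and font names; values that contain double quotes or semicolons need a length-aware parser.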

Gravity Forms schema (for form replacement)

Find form field definitions:

grep "INSERT INTO \`wp_gf_forms\`" .planning/wpress-extract/database.sql | \
  python3 -c "
import sys, json, re
for line in sys.stdin:
    m = re.search(r\"'([^']+)'\s*\)\s*;\", line)
    if m:
        try: print(json.dumps(json.loads(m.group(1).replace('\\\\\"','\"')), indent=2)[:2000])
        except: pass
" 2>/dev/null | head -100

Field types seen in Gravity Forms: text, email, phone, textarea, select, checkbox, radio, name, address, fileupload. Map each to a plain HTML input equivalent.
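The mapping can be sketched as follows. The specific HTML chosen for each type is an assumption (for example `phone` rendered as `type="tel"`), and `render_field` is a hypothetical helper. The composite types `name` and `address` are left out on purpose, since they need to be split into several inputs by hand.

```python
import html

# Assumed mapping from Gravity Forms field type to an <input> type attribute.
INPUT_TYPES = {"text": "text", "email": "email", "phone": "tel", "fileupload": "file"}


def render_field(field_type: str, name: str, choices=()) -> str:
    """Return plain HTML for one form field; choices apply to select/checkbox/radio."""
    safe = html.escape(name, quote=True)
    if field_type in INPUT_TYPES:
        return f'<input type="{INPUT_TYPES[field_type]}" name="{safe}">'
    if field_type == "textarea":
        return f'<textarea name="{safe}"></textarea>'
    if field_type == "select":
        options = "".join(f"<option>{html.escape(c)}</option>" for c in choices)
        return f'<select name="{safe}">{options}</select>'
    if field_type in ("checkbox", "radio"):
        return "\n".join(
            f'<label><input type="{field_type}" name="{safe}" '
            f'value="{html.escape(c, quote=True)}"> {html.escape(c)}</label>'
            for c in choices
        )
    # name and address are composite fields and are not handled in this sketch.
    raise ValueError(f"no plain HTML mapping for field type: {field_type}")
```

Raising on unknown types makes unmapped fields visible during the build instead of silently dropping them from the form.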

Archive directory layout note

The All-in-One WP Migration (AIOIM) .wpress format extracts flat, with no wp-content/ wrapper:

wpress-extract/
├── database.sql          ← NOT in wp-content/
├── package.json
├── uploads/              ← NOT wp-content/uploads/
├── themes/               ← NOT wp-content/themes/
├── plugins/              ← NOT wp-content/plugins/
└── et-cache/

Scripts must reference uploads/, themes/, plugins/ directly under wpress-extract/, not wpress-extract/wp-content/.
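A small check along these lines can catch path mistakes early. It is a sketch under assumptions: `missing_entries` and `resolve_asset` are hypothetical helpers, and the list of expected entries is limited to the ones the layout above marks as relocated, plus database.sql.

```python
from pathlib import Path

# Entries expected directly under wpress-extract/ (per the layout above).
EXPECTED = ("database.sql", "uploads", "themes", "plugins")


def missing_entries(extract_dir) -> list:
    """Return the expected top-level entries that are absent from extract_dir."""
    root = Path(extract_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


def resolve_asset(extract_dir, wp_path: str) -> Path:
    """Map a WordPress-style path to its location in the flat extract.

    Example: "/wp-content/uploads/2026/03/hero.jpg" resolves to
    <extract_dir>/uploads/2026/03/hero.jpg.
    """
    relative = wp_path.lstrip("/")
    if relative.startswith("wp-content/"):
        relative = relative[len("wp-content/"):]
    return Path(extract_dir) / relative
```

Running `missing_entries` before the analysis step gives a clear failure when an archive was extracted to the wrong directory or only partially.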

Next step

Once pages.json is written, proceed to 03-divi-content-extraction.md to parse content_raw for each page into structured AM-ready HTML.