
# 02 — Database Analysis
Parse the WordPress MySQL dump to inventory pages, detect the Divi version,
extract design settings, and build the JSON data files that drive the AM build.
## Script
```bash
python3 /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/wp-divi-pipeline/scripts/analyze_db.py \
{domain}/.planning/wpress-extract/ \
{domain}/.planning/data/
```
The script writes three files to `{domain}/.planning/data/`:
- `pages.json` — all published pages/posts with content and SEO meta
- `design-system.json` — colors, fonts, Divi settings
- `site-info.json` — domain, plugin list, WP version, Divi version
## Divi version detection
The script auto-detects the Divi version by scanning `database.sql` for these markers:
| Signal in SQL | Divi version |
|---------------|-------------|
| `wp:divi/` in post_content | Divi 5 |
| `[et_pb_section` in post_content | Divi 4 |

**This determines the content extraction path.** Divi 4 → use `extract_divi4.py`.
Divi 5 → use `extract_divi5.py`. See `03-divi-content-extraction.md`.
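
The detection rule in the table can be sketched in a few lines of Python. This is an illustration only, not the code in `analyze_db.py`: the names `detect_divi_version` and `EXTRACTORS` are hypothetical, and preferring Divi 5 when both markers are present is an assumption.

```python
from typing import Optional

# Extractor script per detected version, as named in this SOP.
EXTRACTORS = {"4": "extract_divi4.py", "5": "extract_divi5.py"}


def detect_divi_version(sql_text: str) -> Optional[str]:
    """Return "5", "4", or None based on the markers listed in the table."""
    if "wp:divi/" in sql_text:        # Divi 5 block markup
        return "5"
    if "[et_pb_section" in sql_text:  # Divi 4 shortcode markup
        return "4"
    return None                       # no Divi marker found
```

A caller would read `database.sql` as text, pass it to `detect_divi_version`, and look up the extractor with `EXTRACTORS.get(version)`.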
## Key WordPress tables
| Table | Contents | Used for |
|-------|----------|---------|
| `wp_posts` | All pages, posts, attachments, layouts | Page inventory, content |
| `wp_postmeta` | Per-post metadata | ACF fields, Rank Math SEO, Divi layout JSON |
| `wp_options` | Site-wide settings | Divi theme settings, colors, fonts |
| `wp_gf_forms` | Gravity Forms definitions | Form field schema |
| `wp_gf_entries` | Gravity Form submissions | Not needed for migration |
| `wp_rank_math_seo_meta` | Rank Math SEO per page | SEO titles, descriptions |
## Reading pages.json
Each entry in `pages.json`:
```json
{
  "id": "42",
  "post_type": "page",
  "slug": "about",
  "title": "About VibrantYou Yoga",
  "status": "publish",
  "date": "2026-03-15",
  "modified": "2026-04-10",
  "content_raw": "<!-- wp:divi/section ... -->...",
  "excerpt": "",
  "parent_id": "0",
  "menu_order": "3",
  "seo_title": "About VibrantYou Yoga | Mindful Movement in [City]",
  "seo_description": "...",
  "seo_keywords": "yoga studio, mindful movement",
  "acf": {
    "vyy_hero_headline": "Move With Intention",
    "vyy_hero_subhead": "..."
  }
}
```
`content_raw` holds the raw Divi markup (`wp:divi/` blocks for Divi 5, `[et_pb_section` shortcodes for Divi 4). Pass it to the matching extractor script.
`acf` holds Advanced Custom Fields values — often cleaner than block content.
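
As a sketch of how these entries might be consumed, the Python below filters to published pages and orders them by `menu_order`. It assumes `pages.json` is a top-level JSON array of entries shaped like the example; `load_pages` and `published_pages` are hypothetical helper names, not part of the pipeline scripts.

```python
import json
from pathlib import Path


def load_pages(path):
    """Read pages.json; assumed here to be a top-level JSON array."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def published_pages(entries):
    """Keep published pages and order them by menu_order, then title."""
    pages = [
        e for e in entries
        if e.get("post_type") == "page" and e.get("status") == "publish"
    ]
    # Numeric fields are stored as strings, so cast before sorting:
    # compared as strings, "10" would sort ahead of "3".
    return sorted(
        pages,
        key=lambda e: (int(e.get("menu_order") or 0), e.get("title", "")),
    )
```

For example, `published_pages(load_pages(".planning/data/pages.json"))` would yield the page inventory in menu order.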
## Reading design-system.json
Contains extracted Divi theme settings. Key fields:
```json
{
  "primary_color": "#1a8a7a",
  "body_font": "DM Sans",
  "header_font": "DM Serif Display",
  "body_font_size": "16",
  "body_line_height": "1.7",
  "divi_version": "5",
  "wp_version": "6.9.4",
  "site_url": "https://vibrantyou.yoga",
  "site_name": "VibrantYou Yoga"
}
```
Use these values to seed the AM `main.css` CSS custom properties block.
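
One way to do that seeding is sketched below in Python. The custom property names (`--color-primary`, `--font-body`, and so on) are placeholders rather than the names AM's `main.css` actually uses, and treating `body_font_size` as a pixel value is an assumption.

```python
def css_root_block(design):
    """Render a :root block from the design-system.json fields shown above."""
    props = {
        "--color-primary": design["primary_color"],
        "--font-body": '"{}"'.format(design["body_font"]),
        "--font-heading": '"{}"'.format(design["header_font"]),
        # Assumption: the stored size is a bare pixel count.
        "--font-size-body": design["body_font_size"] + "px",
        "--line-height-body": design["body_line_height"],
    }
    lines = ["  {}: {};".format(name, value) for name, value in props.items()]
    return ":root {\n" + "\n".join(lines) + "\n}"
```

Rename the properties to match the existing `main.css` block before pasting the output in.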
## Manual inspection (when script output is sparse)
Sometimes the Divi theme options are stored as PHP-serialized data.
Use grep to find and eyeball the raw values:
```bash
DB=.planning/wpress-extract/database.sql
# Divi option rows (global colors and other theme settings)
grep -o "'et_divi[^']*','[^']*'" "$DB" | head -30
# Site URL, site name, and admin email
grep -E "'(siteurl|blogname|admin_email)','[^']*'" "$DB"
# Rank Math SEO titles and descriptions (first 20 matching lines)
grep "rank_math_title\|rank_math_description" "$DB" | head -20
# Page slugs, excluding revisions and auto-drafts
grep -o "post_name','[^']*'" "$DB" | grep -v "revision\|auto-draft" | sort -u
```
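
When a value found this way is PHP-serialized, a small decoder can make it readable. The Python sketch below handles only the scalar and array types of PHP's `serialize()` format (`N`, `b`, `i`, `d`, `s`, `a`) and assumes the SQL escaping (for example `\"`) has already been undone. The function names are hypothetical and this is not part of `analyze_db.py`.

```python
def php_unserialize(data: bytes):
    """Decode a subset of PHP serialize() output: N, b, i, d, s, and a."""
    value, _ = _parse(data, 0)
    return value


def _parse(data: bytes, pos: int):
    """Parse one value starting at pos; return (value, next position)."""
    kind = data[pos:pos + 1]
    if kind == b"N":                          # N;
        return None, pos + 2
    if kind in (b"b", b"i", b"d"):            # b:1;  i:42;  d:1.7;
        end = data.index(b";", pos)
        raw = data[pos + 2:end].decode("ascii")
        if kind == b"b":
            return raw == "1", end + 1
        return (int(raw) if kind == b"i" else float(raw)), end + 1
    if kind == b"s":                          # s:<byte length>:"<bytes>";
        colon = data.index(b":", pos + 2)
        length = int(data[pos + 2:colon])     # length counts bytes, not characters
        start = colon + 2                     # skip the colon and opening quote
        text = data[start:start + length].decode("utf-8")
        return text, start + length + 2       # skip the closing quote and semicolon
    if kind == b"a":                          # a:<count>:{<key><value>...}
        colon = data.index(b":", pos + 2)
        count = int(data[pos + 2:colon])
        pos = colon + 2                       # skip the colon and opening brace
        result = {}
        for _ in range(count):
            key, pos = _parse(data, pos)
            result[key], pos = _parse(data, pos)
        return result, pos + 1                # skip the closing brace
    raise ValueError("unsupported type %r at offset %d" % (kind, pos))
```

For instance, `php_unserialize(b'a:1:{s:12:"accent_color";s:7:"#1a8a7a";}')` returns a dict with one entry; the key name in that sample is made up for illustration.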
## Gravity Forms schema (for form replacement)
Find form field definitions:
```bash
grep "INSERT INTO \`wp_gf_forms\`" .planning/wpress-extract/database.sql | \
python3 -c "
import sys, json, re
# Take the last quoted value of each INSERT line and try to decode it as JSON.
for line in sys.stdin:
    m = re.search(r\"'([^']+)'\s*\)\s*;\", line)
    if m:
        try:
            print(json.dumps(json.loads(m.group(1).replace('\\\\\"','\"')), indent=2)[:2000])
        except ValueError:
            pass  # not valid JSON, skip this row
" 2>/dev/null | head -100
```
Field types seen in Gravity Forms: `text`, `email`, `phone`, `textarea`, `select`, `checkbox`, `radio`, `name`, `address`, `fileupload`. Map each to a plain HTML input equivalent.
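
The mapping could be captured as a lookup table, sketched here in Python. The template choices are assumptions rather than an established AM convention. In particular, the composite `name` and `address` fields are collapsed to a single control each, and `select`, `checkbox`, and `radio` still need their choices filled in from the form definition. `FIELD_HTML` and `render_field` are hypothetical names.

```python
from html import escape

# One HTML template per Gravity Forms field type listed above.
FIELD_HTML = {
    "text": '<input type="text" name="{name}">',
    "email": '<input type="email" name="{name}">',
    "phone": '<input type="tel" name="{name}">',
    "textarea": '<textarea name="{name}"></textarea>',
    "select": '<select name="{name}"></select>',
    "checkbox": '<input type="checkbox" name="{name}">',
    "radio": '<input type="radio" name="{name}">',
    # Assumption: composite fields are collapsed to one control each.
    "name": '<input type="text" name="{name}" autocomplete="name">',
    "address": '<textarea name="{name}" autocomplete="street-address"></textarea>',
    "fileupload": '<input type="file" name="{name}">',
}


def render_field(field_type, name):
    """Return the HTML for one field, or raise ValueError for unmapped types."""
    try:
        template = FIELD_HTML[field_type]
    except KeyError:
        raise ValueError("no HTML mapping for field type: " + field_type) from None
    return template.format(name=escape(name, quote=True))
```

Raising on unknown types makes any field type missing from the list surface during the build instead of being dropped silently.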
## Archive directory layout note
The All-in-One WP Migration (AIOIM) `.wpress` archive extracts flat, with no `wp-content/` wrapper:
```
wpress-extract/
├── database.sql    ← NOT in wp-content/
├── package.json
├── uploads/        ← NOT wp-content/uploads/
├── themes/         ← NOT wp-content/themes/
├── plugins/        ← NOT wp-content/plugins/
└── et-cache/
```
Scripts must reference `uploads/`, `themes/`, `plugins/` directly under
`wpress-extract/`, not `wpress-extract/wp-content/`.
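
A script can guard against the wrong layout with a small check before it starts, sketched below in Python. The helper name `extract_paths` is hypothetical, and treating all four entries as required is an assumption; relax the list if a given archive legitimately lacks one.

```python
from pathlib import Path

# Entries expected directly under wpress-extract/ (no wp-content/ wrapper).
EXPECTED = ("database.sql", "uploads", "themes", "plugins")


def extract_paths(root):
    """Return the expected paths under root, or raise if any is missing."""
    root = Path(root)
    missing = [name for name in EXPECTED if not (root / name).exists()]
    if missing:
        raise FileNotFoundError(
            "missing under {}: {}".format(root, ", ".join(missing))
        )
    return {name: root / name for name in EXPECTED}
```

Calling `extract_paths("{domain}/.planning/wpress-extract")` with the real domain directory substituted in returns a dict keyed by entry name, so downstream code never has to build a `wp-content/` path by hand.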
## Next step
Once `pages.json` is written, proceed to `03-divi-content-extraction.md`
to parse `content_raw` for each page into structured AM-ready HTML.