03 — Divi Content Extraction
Parse raw Divi page content from pages.json into clean, structured HTML
sections ready to map into AM templates.
Divi 4 vs Divi 5 — critical difference
Divi 4 (shortcode-based)
Content is stored as shortcodes in wp_posts.post_content:
[et_pb_section fb_built="1" admin_label="Hero" _builder_version="4.27.4"
background_color="#0f5f53" ...]
[et_pb_row ...]
[et_pb_column type="4_4" ...]
[et_pb_text ...]<h1>Move With Intention</h1>[/et_pb_text]
[et_pb_button button_url="/contact" button_text="Book a Class" /]
[/et_pb_column]
[/et_pb_row]
[/et_pb_section]
Use extract_divi4.py → parses shortcode tree into section/row/module JSON.
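As an illustration of what parsing a shortcode tree involves, here is a minimal sketch. It is not the code in extract_divi4.py, whose internals are not shown here; the function name `parse_shortcodes` and the node shape are assumptions made for this example, and attribute values containing `]` or escaped quotes are not handled.

```python
import re

# Opening, closing and self-closing et_pb_* tags, e.g. [et_pb_row ...], [/et_pb_row], [et_pb_button ... /]
TAG_RE = re.compile(r"\[(/?)(et_pb_\w+)([^\]]*?)(/?)\]")
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_shortcodes(content):
    """Build a nested dict tree from Divi 4 shortcode markup (hypothetical helper)."""
    root = {"tag": "root", "attrs": {}, "children": [], "html": ""}
    stack = [root]
    pos = 0
    for m in TAG_RE.finditer(content):
        # Text between two tags belongs to the innermost open node.
        text = content[pos:m.start()].strip()
        if text:
            stack[-1]["html"] += text
        pos = m.end()
        closing, tag, raw_attrs, self_closing = m.groups()
        if closing:
            if len(stack) > 1:
                stack.pop()
            continue
        node = {"tag": tag, "attrs": dict(ATTR_RE.findall(raw_attrs)),
                "children": [], "html": ""}
        stack[-1]["children"].append(node)
        if not self_closing:
            stack.append(node)
    return root
```

Calling `parse_shortcodes(post_content)` on the Hero example above would yield a root node whose first child is the `et_pb_section`, with the row, column, text and button nested beneath it.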
Divi 5 (block-based)
Content is stored as Gutenberg-style block comments:
<!-- wp:divi/section {"id":"section-abc123","attrs":{"backgroundColor":{"value":"#0f5f53"}}} -->
<div class="et_pb_section ...">
<!-- wp:divi/row ... -->
<!-- wp:divi/column ... -->
<!-- wp:divi/text ... -->
<div class="et_pb_text_inner"><h1>Move With Intention</h1></div>
<!-- /wp:divi/text -->
<!-- /wp:divi/column -->
<!-- /wp:divi/row -->
</div>
<!-- /wp:divi/section -->
Use extract_divi5.py → strips block wrapper, extracts inner HTML per module.
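The block-comment format can be walked with a regular expression and a small amount of state. The sketch below is an assumption about how such a pass might look, not the contents of extract_divi5.py; `extract_modules` and the output shape are hypothetical, and it assumes modules are leaf blocks that are not nested inside other modules.

```python
import json
import re

# <!-- wp:divi/name {json} -->, <!-- /wp:divi/name -->, or self-closing <!-- wp:divi/name {json} /-->
BLOCK_RE = re.compile(r"<!--\s*(/?)wp:divi/([\w-]+)\s*(\{.*?\})?\s*(/?)-->", re.S)
STRUCTURAL = {"section", "row", "column"}  # wrappers that carry no content of their own


def extract_modules(content):
    """Return a list of sections, each with its block attrs and leaf modules."""
    sections, current, open_module = [], None, None
    for m in BLOCK_RE.finditer(content):
        closing, name, raw_attrs, self_closing = m.groups()
        if closing:
            if open_module and open_module["module"] == name:
                # Inner HTML is everything between the opening and closing comments.
                start = open_module.pop("_start")
                open_module["html"] = content[start:m.start()].strip()
                open_module = None
            continue
        attrs = json.loads(raw_attrs) if raw_attrs else {}
        if name == "section":
            current = {"attrs": attrs, "modules": []}
            sections.append(current)
        elif name not in STRUCTURAL and current is not None:
            module = {"module": name, "attrs": attrs, "html": ""}
            current["modules"].append(module)
            if not self_closing:
                module["_start"] = m.end()
                open_module = module
    return sections
```

The section's `attrs` dict is kept verbatim so that later steps can read values such as the background color from it.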
Divi 5 extraction script
python3 /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/wp-divi-pipeline/scripts/extract_divi5.py \
{domain}/.planning/data/pages.json \
{domain}/.planning/data/content/
Produces one JSON file per page: content/{slug}.json
{
  "slug": "about",
  "title": "About VibrantYou Yoga",
  "seo_title": "About VibrantYou Yoga | ...",
  "seo_description": "...",
  "sections": [
    {
      "type": "hero",
      "background_color": "#0f5f53",
      "modules": [
        { "module": "text", "html": "<h1>Move With Intention</h1>" },
        { "module": "button", "text": "Book a Class", "url": "/contact/" }
      ]
    },
    {
      "type": "standard",
      "modules": [
        { "module": "text", "html": "<h2>Our Story</h2><p>...</p>" },
        { "module": "image", "src": "/assets/images/studio.webp", "alt": "..." }
      ]
    }
  ]
}
ACF fields take priority
If a page has ACF fields (in pages.json[].acf), use those over block content.
ACF fields are typically cleaner, pre-authored copy without Divi wrapper noise.
Convention for VYY-specific ACF keys:
- vyy_hero_headline → <h1> in hero section
- vyy_hero_subhead → <p class="hero-lead"> in hero
- vyy_hero_cta_text → primary CTA button label
- vyy_hero_cta_url → primary CTA button href
Always check acf keys before parsing content_raw.
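The ACF-first rule can be sketched as a small helper that builds the hero section from the VYY keys and signals when the caller should fall back to block content. The function name `build_hero_from_acf` is hypothetical, and the sketch assumes the ACF values are plain text (hence the escaping) and that a missing headline means there is no usable ACF hero.

```python
from html import escape


def build_hero_from_acf(page):
    """Return a hero section dict from ACF fields, or None to fall back to content_raw."""
    acf = page.get("acf") or {}
    headline = acf.get("vyy_hero_headline")
    if not headline:
        return None  # no ACF hero: the caller parses content_raw instead

    modules = [{"module": "text", "html": f"<h1>{escape(headline)}</h1>"}]

    subhead = acf.get("vyy_hero_subhead")
    if subhead:
        modules.append(
            {"module": "text", "html": f'<p class="hero-lead">{escape(subhead)}</p>'}
        )

    cta_text = acf.get("vyy_hero_cta_text")
    cta_url = acf.get("vyy_hero_cta_url")
    if cta_text and cta_url:
        modules.append({"module": "button", "text": cta_text, "url": cta_url})

    return {"type": "hero", "modules": modules}
```

A caller would use it as `hero = build_hero_from_acf(page)` and only parse `page["content_raw"]` for the hero when the result is `None`.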
Stripping Divi class/attribute noise
After extraction, run every HTML snippet through the clean_divi_html()
function from divi_to_html.py:
from divi_to_html import clean_divi_html, rewrite_internal_links
cleaned = clean_divi_html(raw_html)
cleaned = rewrite_internal_links(cleaned, staging_hosts=("vibrantyou.yoga",))
This removes:
- <!-- wp:divi/... --> block comments
- data-et-*, data-builder-* attributes
- et_pb_*, divi-builder-*, d5_* class tokens
- Empty class="" attributes
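For readers who want to see what this kind of cleanup amounts to, the sketch below approximates the four removals with regular expressions. It is not the implementation of clean_divi_html() in divi_to_html.py; `strip_divi_noise` is a hypothetical stand-in, and a regex pass like this assumes well-formed, double-quoted attributes.

```python
import re

COMMENT_RE = re.compile(r"<!--\s*/?wp:divi/.*?-->", re.S)
DATA_ATTR_RE = re.compile(r'\s+data-(?:et|builder)-[\w-]*="[^"]*"')
CLASS_ATTR_RE = re.compile(r'\s+class="([^"]*)"')
NOISE_TOKEN_RE = re.compile(r"^(?:et_pb_|divi-builder-|d5_)")


def _filter_classes(match):
    # Keep only class tokens that are not Divi builder noise.
    kept = [t for t in match.group(1).split() if not NOISE_TOKEN_RE.match(t)]
    if not kept:
        return ""  # drop the attribute entirely rather than leave class=""
    return ' class="' + " ".join(kept) + '"'


def strip_divi_noise(html):
    html = COMMENT_RE.sub("", html)      # block comments
    html = DATA_ATTR_RE.sub("", html)    # data-et-* / data-builder-* attributes
    html = CLASS_ATTR_RE.sub(_filter_classes, html)  # noisy and empty class attributes
    return html.strip()
```

The wrapper elements themselves (such as the inner `<div>`) are left in place here; whether they should also be unwrapped is a decision for the real cleaning function.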
What to extract per section type
| Divi module | Extract | Map to AM element |
|---|---|---|
| divi/text | inner HTML | <section>, <p>, headings as-is |
| divi/button | text, url | <a class="btn-primary"> |
| divi/image | src, alt, title | <img> → rewrite to WebP path |
| divi/blurb | icon, title, body | .am-card component |
| divi/testimonial | quote, author, company | .am-testimonial component |
| divi/video | src, poster | <video> or YouTube embed |
| divi/contact_form | field list | → replace with AM form, see 08 |
| divi/accordion | Q+A pairs | <details><summary> |
| divi/fullwidth_header | title, subhead, CTA | hero section |
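A few rows of this mapping can be sketched as a renderer over the extracted module dicts. The names `render_module` and `to_webp` are hypothetical, the accordion input shape (`items` with `question` and `answer`) is an assumption, and the WebP rewrite simply swaps the file extension, which presumes the converted assets already exist at that path.

```python
import re
from html import escape


def to_webp(src):
    # Swap common raster extensions for .webp (assumed asset convention).
    return re.sub(r"\.(?:jpe?g|png)$", ".webp", src, flags=re.IGNORECASE)


def render_module(module):
    """Render one extracted module dict to AM markup (covers four module types only)."""
    kind = module["module"]
    if kind == "text":
        return module["html"]  # inner HTML passes through as-is
    if kind == "button":
        href = escape(module["url"], quote=True)
        return f'<a class="btn-primary" href="{href}">{escape(module["text"])}</a>'
    if kind == "image":
        src = escape(to_webp(module["src"]), quote=True)
        alt = escape(module.get("alt", ""), quote=True)
        return f'<img src="{src}" alt="{alt}">'
    if kind == "accordion":
        # Assumed shape: module["items"] is a list of {"question", "answer"} dicts.
        return "".join(
            f"<details><summary>{escape(i['question'])}</summary>{i['answer']}</details>"
            for i in module["items"]
        )
    raise ValueError(f"no AM mapping for module type: {kind}")
```

Raising on unmapped types is a deliberate choice in this sketch: a module that silently disappears from the page is harder to notice than a failed build.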
Section background colors → AM section modifiers
Divi 5 stores backgroundColor in the block attrs JSON.
Map to AM CSS modifier classes:
| Divi background | AM class modifier |
|---|---|
| #0f5f53 (dark teal) | .section--dark |
| #1a8a7a (mid teal) | .section--brand |
| #f5f5f5 / #fafafa | .section--light |
| #ffffff / none | .section--white |
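The lookup above is small enough to express as a dictionary. In this sketch the attribute path (`attrs` → `backgroundColor` → `value`) follows the Divi 5 block example earlier in this document; the function name `section_modifier` and the choice to return `None` for unmapped colors are assumptions.

```python
SECTION_MODIFIERS = {
    "#0f5f53": "section--dark",   # dark teal
    "#1a8a7a": "section--brand",  # mid teal
    "#f5f5f5": "section--light",
    "#fafafa": "section--light",
    "#ffffff": "section--white",
}


def section_modifier(block_attrs):
    """Map a Divi 5 section's block attrs to an AM modifier class name."""
    value = (
        block_attrs.get("attrs", {})
        .get("backgroundColor", {})
        .get("value")
    )
    if not value:
        return "section--white"  # no background set
    # None signals a color outside the table, to be reviewed by hand.
    return SECTION_MODIFIERS.get(value.strip().lower())
```

Returning `None` rather than guessing a nearest match keeps off-palette colors visible during the content quality pass.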
Content quality pass (required before HTML build)
After extraction, review every page's content for:
- Cut bloated copy: WordPress sites often have 3x more text than needed. Target 30-50% reduction. One clear idea per paragraph.
- Remove stale metrics: "Over 500 students" only stays if it's verifiable. Otherwise remove or mark DRAFT NEEDED.
- Remove plugin artifacts: Gravity Forms shortcodes such as [gravityforms id="1"], Events Manager tags, and Divi shortcode residue that survived extraction.
- Improve CTAs: Replace generic "Learn More" with action-specific text: "Book a Free Class", "View the Schedule", "Start Your Practice".
- Flag images: Note every <img> that needs a real photo vs stock.
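Parts of this review can be pre-screened automatically before a human reads the copy. The sketch below flags leftover shortcodes and generic CTA labels in an extracted page dict; `audit_page` is a hypothetical helper, the shortcode pattern is a heuristic that will also match bracketed words in ordinary prose, and every CTA phrase other than "Learn More" is an assumption added for illustration.

```python
import re

# Heuristic: [name], [name attr="..."] or [/name]; bracketed numbers like [1] are ignored.
SHORTCODE_RE = re.compile(r"\[/?[a-z][\w-]*(?:\s[^\]]*)?\]", re.IGNORECASE)
# "learn more" comes from the checklist; the other phrases are assumed examples.
GENERIC_CTAS = {"learn more", "read more", "click here"}


def audit_page(page):
    """Return (section index, issue, offending text) tuples for manual review."""
    issues = []
    for s_idx, section in enumerate(page.get("sections", [])):
        for module in section.get("modules", []):
            for residue in SHORTCODE_RE.findall(module.get("html", "")):
                issues.append((s_idx, "shortcode residue", residue))
            label = module.get("text", "")
            if module.get("module") == "button" and label.strip().lower() in GENERIC_CTAS:
                issues.append((s_idx, "generic CTA", label))
    return issues
```

This only narrows down where to look; cutting copy, verifying metrics and judging images still need a person.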
Next step
Proceed to 04-design-system-extraction.md to convert Divi theme settings
into AM CSS custom properties, then 05-content-migration.md to build the
HTML templates.