# 03 — Divi Content Extraction

Parse raw Divi page content from `pages.json` into clean, structured HTML sections ready to map into AM templates.

## Divi 4 vs Divi 5 — critical difference

### Divi 4 (shortcode-based)

Content is stored as shortcodes in `wp_posts.post_content`:

```
[et_pb_section fb_built="1" admin_label="Hero" _builder_version="4.27.4" background_color="#0f5f53" ...]
  [et_pb_row ...]
    [et_pb_column type="4_4" ...]
      [et_pb_text ...]
        <h1>Move With Intention</h1>
      [/et_pb_text]
      [et_pb_button button_url="/contact" button_text="Book a Class" /]
    [/et_pb_column]
  [/et_pb_row]
[/et_pb_section]
```

Use `extract_divi4.py` → parses the shortcode tree into section/row/module JSON.

### Divi 5 (block-based)

Content is stored as Gutenberg-style block comments:

```html
<!-- wp:divi/text -->
<h1>Move With Intention</h1>
<!-- /wp:divi/text -->
```

Use `extract_divi5.py` → strips the block wrapper and extracts the inner HTML of each module.

## Divi 5 extraction script

```bash
python3 /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/wp-divi-pipeline/scripts/extract_divi5.py \
  {domain}/.planning/data/pages.json \
  {domain}/.planning/data/content/
```

Produces one JSON file per page: `content/{slug}.json`

```json
{
  "slug": "about",
  "title": "About VibrantYou Yoga",
  "seo_title": "About VibrantYou Yoga | ...",
  "seo_description": "...",
  "sections": [
    {
      "type": "hero",
      "background_color": "#0f5f53",
      "modules": [
        { "module": "text", "html": "<h1>Move With Intention</h1>" },
        { "module": "button", "text": "Book a Class", "url": "/contact/" }
      ]
    },
    {
      "type": "standard",
      "modules": [
        { "module": "text", "html": "<h2>Our Story</h2><p>...</p>" },
        { "module": "image", "src": "/assets/images/studio.webp", "alt": "..." }
      ]
    }
  ]
}
```

## ACF fields take priority

If a page has ACF fields (in `pages.json[].acf`), use them instead of the block content. ACF fields are typically cleaner, pre-authored copy without Divi wrapper noise.

Convention for VYY-specific ACF keys:

- `vyy_hero_headline` → `<h1>` in the hero section
- `vyy_hero_subhead` → `<p>` in the hero
- `vyy_hero_cta_text` → primary CTA button label
- `vyy_hero_cta_url` → primary CTA button `href`

Always check the `acf` keys before parsing `content_raw`.

## Stripping Divi class/attribute noise

After extraction, run every HTML snippet through the `clean_divi_html()` function from `divi_to_html.py`:

```python
from divi_to_html import clean_divi_html, rewrite_internal_links

# raw_html: one module's "html" string from content/{slug}.json
cleaned = clean_divi_html(raw_html)
cleaned = rewrite_internal_links(cleaned, staging_hosts=("vibrantyou.yoga",))
```

This removes:

- `<!-- wp:divi/* -->` block comments
- `data-et-*` and `data-builder-*` attributes
- `et_pb_*`, `divi-builder-*`, and `d5_*` class tokens
- Empty `class=""` attributes

## What to extract per section type

| Divi module | Extract | Map to AM element |
|-------------|---------|-------------------|
| `divi/text` | inner HTML | `<p>` and other body markup; headings as-is |
| `divi/button` | `text`, `url` | `<a>` button link |
| `divi/image` | `src`, `alt`, `title` | `<img>` → rewrite to WebP path |
| `divi/blurb` | icon, title, body | `.am-card` component |
| `divi/testimonial` | quote, author, company | `.am-testimonial` component |
| `divi/video` | `src`, poster | `<video>` |
| `divi/fullwidth_header` | title, subhead, CTA | hero section |

## Section background colors → AM section modifiers

Divi 5 stores `backgroundColor` in the block `attrs` JSON. Map it to an AM CSS modifier class:

| Divi background | AM class modifier |
|-----------------|-------------------|
| `#0f5f53` (dark teal) | `.section--dark` |
| `#1a8a7a` (mid teal) | `.section--brand` |
| `#f5f5f5` / `#fafafa` | `.section--light` |
| `#ffffff` / none | `.section--white` |

## Content quality pass (required before HTML build)

After extraction, review every page's content for:

1. **Cut bloated copy** — WordPress sites often carry 3x more text than needed. Target a 30-50% reduction, with one clear idea per paragraph.
2. **Remove stale metrics** — "Over 500 students" stays only if it's verifiable. Otherwise remove it or mark it `DRAFT NEEDED`.
3. **Remove plugin artifacts** — Gravity Forms shortcodes (`[gravityforms id="1"]`), Events Manager tags, and any Divi shortcode residue that survived extraction.
4. **Improve CTAs** — Replace generic "Learn More" with action-specific text: "Book a Free Class", "View the Schedule", "Start Your Practice".
5. **Flag images** — Note every `<img>` that needs a real photo rather than stock.

## Next step

Proceed to `04-design-system-extraction.md` to convert Divi theme settings into AM CSS custom properties, then to `05-content-migration.md` to build the HTML templates.
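For the Divi 4 (shortcode-based) path, here is a minimal sketch of parsing a shortcode string into a nested section/row/module tree. It is not `extract_divi4.py`, whose code isn't shown in this SOP. `parse_shortcodes` and the node dict shape are hypothetical. The sketch assumes double-quoted attributes and no `]` inside attribute values, and real Divi content may violate either assumption.

```python
import re

# Opening tag [et_pb_x attrs], self-closing [et_pb_x attrs /], or closing [/et_pb_x].
# Assumes no "]" appears inside attribute values.
TAG_RE = re.compile(r"\[(/?)(et_pb_\w+)([^\]]*?)(/?)\]")
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_shortcodes(text):
    """Parse Divi 4 shortcodes into nested node dicts (hypothetical shape)."""
    root = {"tag": None, "attrs": {}, "children": [], "content": ""}
    stack = [root]
    pos = 0
    for match in TAG_RE.finditer(text):
        # Text between tags belongs to the innermost open node.
        stack[-1]["content"] += text[pos:match.start()]
        pos = match.end()
        closing, tag, attr_text, self_closing = match.groups()
        if closing:
            # Pop back to the matching open tag; stray closers are ignored.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i]["tag"] == tag:
                    del stack[i:]
                    break
            continue
        node = {
            "tag": tag,
            "attrs": dict(ATTR_RE.findall(attr_text)),
            "children": [],
            "content": "",
        }
        stack[-1]["children"].append(node)
        if not self_closing:
            stack.append(node)
    stack[-1]["content"] += text[pos:]
    return root["children"]
```

Self-closing tags such as `[et_pb_button ... /]` become leaf nodes. The inner HTML of `[et_pb_text]` is kept in `content` so it can go through the same cleaning step as Divi 5 output.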
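For the Divi 5 (block-based) path, here is a minimal sketch of stripping block-comment wrappers and collecting inner HTML per module. It is not `extract_divi5.py`. It follows this SOP's model of Gutenberg-style `<!-- wp:divi/NAME {json attrs} -->` ... `<!-- /wp:divi/NAME -->` comments wrapped around inner HTML. The container names (`section`, `row`, `column`) and the function name `extract_blocks` are assumptions.

```python
import json
import re

# Opening: <!-- wp:divi/NAME {optional JSON attrs} -->  (or self-closing "/-->")
# Closing: <!-- /wp:divi/NAME -->
BLOCK_RE = re.compile(
    r"<!--\s+(/?)wp:divi/([\w-]+)(?:\s+(\{.*?\}))?\s+(/?)-->", re.DOTALL
)
CONTAINERS = {"section", "row", "column"}  # assumed container block names


def extract_blocks(content):
    """Return leaf modules as dicts with "module", "attrs", and "html" keys."""
    modules = []
    stack = []  # open blocks: (name, attrs, index where inner HTML starts)
    for m in BLOCK_RE.finditer(content):
        closing, name, attr_json, self_closing = m.groups()
        if closing:
            if stack and stack[-1][0] == name:
                open_name, attrs, start = stack.pop()
                if open_name not in CONTAINERS:
                    html = content[start:m.start()].strip()
                    modules.append({"module": name, "attrs": attrs, "html": html})
            continue
        attrs = json.loads(attr_json) if attr_json else {}
        if self_closing:
            modules.append({"module": name, "attrs": attrs, "html": ""})
        else:
            stack.append((name, attrs, m.end()))
    return modules
```

Section-level attrs such as `backgroundColor` are discarded here. A fuller version would keep them per section for the background-modifier mapping.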
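The "ACF fields take priority" rule can be sketched as a lookup that falls back to content parsing only when ACF is empty. This assumes each `pages.json` entry is a dict with optional `acf` and `content_raw` keys, as named in that section. `hero_from_acf`, `hero_for_page`, the output field names, and the `parse_content_raw` callback are all hypothetical.

```python
# ACF key -> hero field name (keys from the VYY convention; field names are hypothetical).
HERO_ACF_KEYS = {
    "vyy_hero_headline": "headline",
    "vyy_hero_subhead": "subhead",
    "vyy_hero_cta_text": "cta_text",  # primary CTA button label
    "vyy_hero_cta_url": "cta_url",    # primary CTA button href
}


def hero_from_acf(page):
    """Return hero fields from ACF, or None if no hero ACF value is filled in."""
    acf = page.get("acf") or {}  # assumed: acf may be absent or empty/falsy
    hero = {}
    for key, field in HERO_ACF_KEYS.items():
        value = acf.get(key)
        if isinstance(value, str) and value.strip():
            hero[field] = value.strip()
    return hero or None


def hero_for_page(page, parse_content_raw):
    """ACF first; fall back to parsing content_raw only when ACF has nothing."""
    return hero_from_acf(page) or parse_content_raw(page.get("content_raw", ""))
```

Passing the parser in as a callback keeps the ACF-first decision in one place, whichever extractor (Divi 4 or Divi 5) handles the fallback.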
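`clean_divi_html()` lives in `divi_to_html.py`, and its implementation isn't shown in this SOP. For illustration only, here is a regex-based stand-in, `clean_divi_html_sketch` (hypothetical), that removes the four kinds of noise listed under "Stripping Divi class/attribute noise". It assumes double-quoted attributes and is not a substitute for a real HTML parser on messy markup.

```python
import re

BLOCK_COMMENT_RE = re.compile(r"<!--\s*/?wp:divi/.*?-->", re.DOTALL)
# data-et-* / data-builder-* attributes, with or without a quoted value.
DATA_ATTR_RE = re.compile(r'\s+data-(?:et|builder)-[\w-]*(?:="[^"]*")?')
CLASS_ATTR_RE = re.compile(r'\s+class="([^"]*)"')
NOISE_CLASS_RE = re.compile(r"^(?:et_pb_|divi-builder-|d5_)")


def _clean_class(match):
    # Keep only class tokens without a Divi prefix.
    tokens = [t for t in match.group(1).split() if not NOISE_CLASS_RE.match(t)]
    # Drop the attribute entirely when nothing is left (also covers class="").
    return f' class="{" ".join(tokens)}"' if tokens else ""


def clean_divi_html_sketch(html):
    html = BLOCK_COMMENT_RE.sub("", html)
    html = DATA_ATTR_RE.sub("", html)
    html = CLASS_ATTR_RE.sub(_clean_class, html)
    return html.strip()
```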
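The simpler rows of the "What to extract per section type" table (text, button, image) can be rendered from the `content/{slug}.json` module dicts with a sketch like the one below. `render_module` and `webp_path` are hypothetical. The `am-btn` class is a placeholder, not a class this SOP defines. The `/assets/images/<name>.webp` layout is inferred from the JSON example, and converting the image file itself is a separate step.

```python
from html import escape
from pathlib import PurePosixPath
from urllib.parse import urlparse


def webp_path(src, asset_dir="/assets/images"):
    """Map an image URL to /assets/images/<name>.webp (assumed asset layout)."""
    stem = PurePosixPath(urlparse(src).path).stem
    return f"{asset_dir}/{stem}.webp"


def render_module(module):
    """Render one extracted module dict to HTML (text, button, image only)."""
    kind = module.get("module")
    if kind == "text":
        # Inner HTML is assumed to be cleaned already; pass it through.
        return module["html"]
    if kind == "button":
        # "am-btn" is a placeholder class name.
        return (f'<a class="am-btn" href="{escape(module["url"])}">'
                f'{escape(module["text"])}</a>')
    if kind == "image":
        return (f'<img src="{escape(webp_path(module["src"]))}" '
                f'alt="{escape(module.get("alt", ""))}">')
    # Blurb, testimonial, video, etc. need their own component markup.
    raise ValueError(f"no renderer for module type {kind!r}")
```

Raising on unknown module types, rather than silently skipping them, makes modules that still need a renderer show up during the build.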
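The background-color table can be applied as a small lookup. This minimal sketch assumes the color arrives as a hex string, either from `background_color` in the extracted JSON or `backgroundColor` in the block attrs. `section_modifier` is hypothetical. Returning `None` for colors the table doesn't list is also an assumption, so that they get a manual review instead of a silent default. The function returns bare class names, without the CSS selector dot.

```python
BG_MODIFIERS = {
    "#0f5f53": "section--dark",   # dark teal
    "#1a8a7a": "section--brand",  # mid teal
    "#f5f5f5": "section--light",
    "#fafafa": "section--light",
    "#ffffff": "section--white",
}


def section_modifier(color):
    """Return the AM modifier class for a Divi background color.

    No color maps to section--white; an unmapped color returns None
    so it can be reviewed by hand.
    """
    if not color:
        return "section--white"
    value = color.strip().lower()
    # Expand 3-digit shorthand such as "#fff" to "#ffffff".
    if len(value) == 4 and value.startswith("#"):
        value = "#" + "".join(ch * 2 for ch in value[1:])
    return BG_MODIFIERS.get(value)
```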
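Parts of the content quality pass can be pre-screened before the manual review. This minimal sketch flags leftover shortcodes (item 3), generic CTA text (item 4), and every `<img>` (item 5) in one HTML snippet. `audit_snippet` is hypothetical. "Read more" and "click here" are assumed extra examples of generic CTAs alongside "Learn More". The sketch only flags: cutting copy and verifying metrics stay manual.

```python
import re

# Leftover [shortcode ...] or [/shortcode], e.g. [gravityforms id="1"] or [/et_pb_text].
SHORTCODE_RE = re.compile(r"\[/?[a-z_][\w-]*(?:\s[^\]]*)?\]", re.IGNORECASE)
# Link or button text that is only a generic phrase (extra phrases are assumptions).
GENERIC_CTA_RE = re.compile(r">\s*(learn more|read more|click here)\s*<", re.IGNORECASE)
IMG_RE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def audit_snippet(html):
    """Return (issue, matched_text) tuples for manual review."""
    issues = [("shortcode", m.group(0)) for m in SHORTCODE_RE.finditer(html)]
    issues += [("generic_cta", m.group(1)) for m in GENERIC_CTA_RE.finditer(html)]
    issues += [("image", m.group(0)) for m in IMG_RE.finditer(html)]
    return issues
```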