# 03 — Divi Content Extraction

Parse raw Divi page content from `pages.json` into clean, structured HTML
sections ready to map into AM templates.

## Divi 4 vs Divi 5 — critical difference

### Divi 4 (shortcode-based)

Content is stored as shortcodes in `wp_posts.post_content`:

```
[et_pb_section fb_built="1" admin_label="Hero" _builder_version="4.27.4"
  background_color="#0f5f53" ...]
  [et_pb_row ...]
    [et_pb_column type="4_4" ...]
      [et_pb_text ...]<h1>Move With Intention</h1>[/et_pb_text]
      [et_pb_button button_url="/contact" button_text="Book a Class" /]
    [/et_pb_column]
  [/et_pb_row]
[/et_pb_section]
```

Use `extract_divi4.py` → parses shortcode tree into section/row/module JSON.
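
To make the shortcode tree idea concrete, here is a minimal sketch of a stack-based parser. It is not the code in `extract_divi4.py`; the function name `parse_shortcodes` and the node shape (`tag`, `attrs`, `children`, `html`) are assumptions for illustration, and the regex does not handle `]` inside attribute values.

```python
import re

# Opening, closing, or self-closing et_pb_* shortcode tag.
TAG_RE = re.compile(r"\[(/?)(et_pb_\w+)([^\]]*?)(/?)\]")
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_shortcodes(content):
    """Return top-level nodes as dicts: tag, attrs, children, html (hypothetical shape)."""
    root = {"tag": "root", "attrs": {}, "children": [], "html": ""}
    stack = [root]
    pos = 0
    for m in TAG_RE.finditer(content):
        # Text between two tags belongs to the innermost open shortcode.
        stack[-1]["html"] += content[pos:m.start()].strip()
        pos = m.end()
        closing, tag, raw_attrs, self_closing = m.groups()
        if closing:
            # Pop only when the closer matches the open tag; ignore strays.
            if len(stack) > 1 and stack[-1]["tag"] == tag:
                stack.pop()
            continue
        node = {
            "tag": tag,
            "attrs": dict(ATTR_RE.findall(raw_attrs)),
            "children": [],
            "html": "",
        }
        stack[-1]["children"].append(node)
        if not self_closing:
            stack.append(node)
    return root["children"]
```

Calling `parse_shortcodes(post_content)` on the example above would yield one `et_pb_section` node whose nested `children` follow the row/column/module hierarchy.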

### Divi 5 (block-based)

Content is stored as Gutenberg-style block comments:

```html
<!-- wp:divi/section {"id":"section-abc123","attrs":{"backgroundColor":{"value":"#0f5f53"}}} -->
<div class="et_pb_section ...">
  <!-- wp:divi/row ... -->
    <!-- wp:divi/column ... -->
      <!-- wp:divi/text ... -->
      <div class="et_pb_text_inner"><h1>Move With Intention</h1></div>
      <!-- /wp:divi/text -->
    <!-- /wp:divi/column -->
  <!-- /wp:divi/row -->
</div>
<!-- /wp:divi/section -->
```

Use `extract_divi5.py` → strips block wrapper, extracts inner HTML per module.
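
As a rough illustration of walking the block comments, the sketch below builds a nested tree and keeps the HTML found directly inside each block. It is an assumption-laden stand-in, not the contents of `extract_divi5.py`: `parse_blocks` and the node keys are hypothetical, and attrs that are not valid JSON (such as the elided `...` in the example) fall back to an empty dict.

```python
import json
import re

# One Divi block comment: optional leading "/", block name, then the rest.
BLOCK_RE = re.compile(r"<!--\s*(/?)wp:divi/([\w-]+)(.*?)-->", re.DOTALL)


def parse_blocks(content):
    """Return top-level blocks as dicts: name, attrs, children, html (hypothetical shape)."""
    root = {"name": "root", "attrs": {}, "children": [], "html": ""}
    stack = [root]
    pos = 0
    for m in BLOCK_RE.finditer(content):
        # Markup between two comments belongs to the innermost open block.
        stack[-1]["html"] += content[pos:m.start()].strip()
        pos = m.end()
        closing, name, body = m.groups()
        if closing:
            if len(stack) > 1 and stack[-1]["name"] == name:
                stack.pop()
            continue
        body = body.strip()
        self_closing = body.endswith("/")
        body = body.rstrip("/").strip()
        try:
            attrs = json.loads(body) if body else {}
        except json.JSONDecodeError:
            attrs = {}  # attrs were elided or malformed
        node = {"name": name, "attrs": attrs, "children": [], "html": ""}
        stack[-1]["children"].append(node)
        if not self_closing:
            stack.append(node)
    return root["children"]
```

Wrapper markup such as `<div class="et_pb_text_inner">` is still present in each node's `html` at this point, so the snippets would still need the cleaning pass described later.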

## Divi 5 extraction script

```bash
python3 /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/wp-divi-pipeline/scripts/extract_divi5.py \
  {domain}/.planning/data/pages.json \
  {domain}/.planning/data/content/
```

Produces one JSON file per page: `content/{slug}.json`

```json
{
  "slug": "about",
  "title": "About VibrantYou Yoga",
  "seo_title": "About VibrantYou Yoga | ...",
  "seo_description": "...",
  "sections": [
    {
      "type": "hero",
      "background_color": "#0f5f53",
      "modules": [
        { "module": "text", "html": "<h1>Move With Intention</h1>" },
        { "module": "button", "text": "Book a Class", "url": "/contact/" }
      ]
    },
    {
      "type": "standard",
      "modules": [
        { "module": "text", "html": "<h2>Our Story</h2><p>...</p>" },
        { "module": "image", "src": "/assets/images/studio.webp", "alt": "..." }
      ]
    }
  ]
}
```
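
A quick way to sanity-check an extracted file is to list the module types found in each section. The helper below is a sketch under the assumption that every file follows the shape shown above; `summarize_page` is a hypothetical name, not part of the pipeline scripts.

```python
import json


def summarize_page(page):
    """Return (section type, [module names]) pairs for one extracted page dict."""
    return [
        (
            section.get("type", "standard"),
            [module["module"] for module in section.get("modules", [])],
        )
        for section in page.get("sections", [])
    ]


def summarize_file(path):
    """Load one content/{slug}.json file and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_page(json.load(handle))
```

For the `about` example, `summarize_page` would report a `hero` section with `text` and `button` modules, followed by a `standard` section with `text` and `image`.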

## ACF fields take priority

If a page has ACF fields (in `pages.json[].acf`), use those over block content.
ACF fields are typically cleaner, pre-authored copy without Divi wrapper noise.

Convention for VYY-specific ACF keys:
- `vyy_hero_headline` → `<h1>` in hero section
- `vyy_hero_subhead` → `<p class="hero-lead">` in hero
- `vyy_hero_cta_text` → primary CTA button label
- `vyy_hero_cta_url` → primary CTA button href

Always check `acf` keys before parsing `content_raw`.
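
The lookup order can be sketched as follows. This assumes each `pages.json` entry is a dict with optional `acf` and `content_raw` keys, as referenced above; the function name `hero_from_page` and the returned dict shape are hypothetical.

```python
def hero_from_page(page):
    """Build hero fields, preferring ACF values over parsed block content."""
    # An empty ACF payload may arrive as None, False, or [] rather than {}.
    acf = page.get("acf") or {}
    if acf.get("vyy_hero_headline"):
        return {
            "source": "acf",
            "headline": acf["vyy_hero_headline"],
            "subhead": acf.get("vyy_hero_subhead", ""),
            "cta_text": acf.get("vyy_hero_cta_text", ""),
            "cta_url": acf.get("vyy_hero_cta_url", ""),
        }
    # No usable ACF copy: fall back to the raw Divi content for parsing.
    return {"source": "content_raw", "raw": page.get("content_raw", "")}
```

Treating a missing headline as "no ACF hero" is a design choice of this sketch; a stricter version could fall back field by field.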

## Stripping Divi class/attribute noise

After extraction, run every HTML snippet through the `clean_divi_html()`
function from `divi_to_html.py`, then rewrite internal links:

```python
from divi_to_html import clean_divi_html, rewrite_internal_links

cleaned = clean_divi_html(raw_html)
cleaned = rewrite_internal_links(cleaned, staging_hosts=("vibrantyou.yoga",))
```

`clean_divi_html()` removes:
- `<!-- wp:divi/... -->` block comments
- `data-et-*`, `data-builder-*` attributes
- `et_pb_*`, `divi-builder-*`, `d5_*` class tokens
- Empty `class=""` attributes
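
For readers without access to `divi_to_html.py`, the four removal rules can be approximated with regular expressions. This is only an illustrative stand-in, deliberately named `strip_divi_noise` so it is not mistaken for the real `clean_divi_html()`; it assumes double-quoted attributes, and an HTML parser would be more robust.

```python
import re

COMMENT_RE = re.compile(r"<!--\s*/?wp:divi/.*?-->", re.DOTALL)
DATA_ATTR_RE = re.compile(r'\s+data-(?:et|builder)-[\w-]+(?:="[^"]*")?')
CLASS_RE = re.compile(r'class="([^"]*)"')
NOISE_TOKEN_RE = re.compile(r"^(?:et_pb_|divi-builder-|d5_)")


def _clean_class(match):
    """Drop Divi class tokens and keep everything else in the attribute."""
    kept = [t for t in match.group(1).split() if not NOISE_TOKEN_RE.match(t)]
    return 'class="{}"'.format(" ".join(kept))


def strip_divi_noise(html):
    """Hypothetical approximation of the cleanup rules listed above."""
    html = COMMENT_RE.sub("", html)            # block comments
    html = DATA_ATTR_RE.sub("", html)          # data-et-*, data-builder-*
    html = CLASS_RE.sub(_clean_class, html)    # et_pb_*, divi-builder-*, d5_*
    html = re.sub(r'\s+class=""', "", html)    # now-empty class attributes
    return html.strip()
```

Non-Divi classes survive the pass, so a token such as `lead` on a Divi wrapper would be preserved while `et_pb_text_inner` is dropped.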

## What to extract per section type

| Divi module | Extract | Map to AM element |
|-------------|---------|-------------------|
| `divi/text` | inner HTML | `<section>`, `<p>`, headings as-is |
| `divi/button` | `text`, `url` | `<a class="btn-primary">` |
| `divi/image` | `src`, `alt`, `title` | `<img>` → rewrite to WebP path |
| `divi/blurb` | icon, title, body | `.am-card` component |
| `divi/testimonial` | quote, author, company | `.am-testimonial` component |
| `divi/video` | `src`, poster | `<video>` or YouTube embed |
| `divi/contact_form` | field list | → replace with AM form, see `08` |
| `divi/accordion` | Q+A pairs | `<details><summary>` |
| `divi/fullwidth_header` | title, subhead, CTA | hero section |
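
A few rows of this table can be expressed as a small renderer. The sketch covers only `text`, `button`, `image`, and `accordion`; the function name `render_module` and the accordion dict shape (`items` with `question` and `answer`) are assumptions, and the image branch expects `src` to already point at the WebP path.

```python
from html import escape


def render_module(module):
    """Render one extracted module dict as AM markup (partial mapping)."""
    kind = module["module"]
    if kind == "text":
        return module["html"]  # inner HTML passes through as-is
    if kind == "button":
        return '<a class="btn-primary" href="{}">{}</a>'.format(
            escape(module["url"]), escape(module["text"])
        )
    if kind == "image":
        return '<img src="{}" alt="{}">'.format(
            escape(module["src"]), escape(module.get("alt", ""))
        )
    if kind == "accordion":
        # Hypothetical shape: {"items": [{"question": ..., "answer": ...}]}
        return "\n".join(
            "<details><summary>{}</summary>{}</details>".format(
                escape(item["question"]), item["answer"]
            )
            for item in module["items"]
        )
    # Fail loudly so unmapped module types are noticed during migration.
    raise ValueError("no AM mapping for module type: {}".format(kind))
```

Raising on unknown types is a choice of this sketch; it keeps modules such as `contact_form` from slipping through without the manual replacement the table calls for.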

## Section background colors → AM section modifiers

Divi 5 stores `backgroundColor` in the block `attrs` JSON.
Map to AM CSS modifier classes:

| Divi background | AM class modifier |
|----------------|------------------|
| `#0f5f53` (dark teal) | `.section--dark` |
| `#1a8a7a` (mid teal) | `.section--brand` |
| `#f5f5f5` / `#fafafa` | `.section--light` |
| `#ffffff` / none | `.section--white` |
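
The table translates directly into a lookup. The sketch below assumes the inner `attrs` object has the shape shown in the Divi 5 example (`{"backgroundColor": {"value": "#0f5f53"}}`); `section_modifier` is a hypothetical helper, and it returns class names without the leading dot.

```python
SECTION_MODIFIERS = {
    "#0f5f53": "section--dark",
    "#1a8a7a": "section--brand",
    "#f5f5f5": "section--light",
    "#fafafa": "section--light",
    "#ffffff": "section--white",
}


def section_modifier(attrs):
    """Map a block's attrs dict to an AM modifier class, or None if unmapped."""
    value = (attrs.get("backgroundColor") or {}).get("value")
    if not value:
        return "section--white"  # no background set
    # Normalize case so "#0F5F53" and "#0f5f53" resolve to the same class.
    return SECTION_MODIFIERS.get(value.strip().lower())
```

Returning `None` for colors outside the table leaves the decision to a human reviewer instead of guessing a modifier.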

## Content quality pass (required before HTML build)

After extraction, make the following passes over every page's content:

1. **Cut bloated copy**: WordPress sites often have 3x more text than needed.
   Target 30-50% reduction. One clear idea per paragraph.
2. **Remove stale metrics**: a claim like "Over 500 students" stays only if it is
   verifiable. Otherwise remove it or mark it `DRAFT NEEDED`.
3. **Remove plugin artifacts**: Gravity Forms shortcodes `[gravityforms id="1"]`,
   Events Manager tags, Divi shortcode residue that survived extraction.
4. **Improve CTAs**: replace generic "Learn More" with action-specific text:
   "Book a Free Class", "View the Schedule", "Start Your Practice".
5. **Flag images**: note every `<img>` that needs a real photo rather than stock.
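
Items 3 and 4 lend themselves to an automated first pass before the manual review. The sketch below scans an extracted page dict for leftover shortcodes and generic CTA labels. `quality_flags` is a hypothetical helper; only "Learn More" comes from the list above, and the other phrases in `GENERIC_CTAS` are assumed additions. The shortcode pattern is broad, so bracketed words in normal copy may be flagged too.

```python
import re

# Anything that looks like [name ...] or [/name], e.g. [gravityforms id="1"].
SHORTCODE_RE = re.compile(r"\[/?[a-z][a-z0-9_-]*(?:\s[^\]]*)?\]", re.IGNORECASE)
# "learn more" is from the checklist; the rest are assumed examples.
GENERIC_CTAS = {"learn more", "read more", "click here"}


def quality_flags(page):
    """Return (kind, text) tuples for items that need manual review."""
    flags = []
    for section in page.get("sections", []):
        for module in section.get("modules", []):
            for match in SHORTCODE_RE.findall(module.get("html", "")):
                flags.append(("shortcode", match))
            if module.get("module") == "button":
                label = module.get("text", "")
                if label.strip().lower() in GENERIC_CTAS:
                    flags.append(("generic_cta", label))
    return flags
```

The output is only a worklist. Copy length, metric verification, and image decisions still need a human pass.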

## Next step

Proceed to `04-design-system-extraction.md` to convert Divi theme settings
into AM CSS custom properties, then `05-content-migration.md` to build the
HTML templates.