# 07 — SEO Preservation

Before building HTML, map every WordPress page URL to its new AM URL and
ensure title, description, canonical, and schema.org are preserved or improved.

## Step 1 — Inventory all WP URLs

Extract every published page slug from `pages.json`:

```bash
python3 -c "
import json
pages = json.load(open('.planning/data/pages.json'))
for p in pages:
    slug = p['slug']
    ptype = p['post_type']
    print(f'/{slug}/  ({ptype})  title={p[\"title\"]!r}')
" | tee .planning/data/wp-url-inventory.txt
```

## Step 2 — Build redirect map

Map each WP URL to the new AM URL. Write to `.planning/data/redirect-map.txt`:

Format: `OLD_PATH -> NEW_PATH`

Common mapping patterns for yoga sites:

| Old WP URL | New AM URL | Action |
|-----------|-----------|--------|
| `/` | `/` | Same |
| `/about/` | `/about/` | Same |
| `/classes/` | `/classes/` | Same |
| `/yoga-class-name/` | `/classes/yoga-class-name.html` | Restructure |
| `/private-yoga-sessions/` | `/private-sessions/` | Rename |
| `/contact-us/` | `/contact/` | Simplify |
| `/?page_id=42` | `/about/` | WP ID → slug |
| `/blog/post-title/` | `/blog/post-title.html` | Flatten |
| `/events/event-name/` | `/classes/` | Consolidate into schedule |

Redirects go into `infra/nginx.conf`:

```nginx
# Exact-match redirects
location = /contact-us/ { return 301 /contact/; }
location = /private-yoga-sessions/ { return 301 /private-sessions/; }

# WP page ID redirects
location = / {
  if ($arg_page_id = "42") { return 301 /about/; }
  if ($arg_p) { return 301 /blog/; }
}

# WP upload URLs → AM asset paths (catch-all)
location ^~ /wp-content/uploads/ {
  return 301 /assets/images/$uri;
}

# Block all WP URLs
location ~ ^/wp-(admin|login|json|cron|includes|content/plugins|content/themes) {
  return 410;
}
```

## Step 3 — Rank Math SEO extraction

Rank Math stores titles and descriptions in `wp_postmeta`.
`analyze_db.py` already extracts these into `pages.json` as `seo_title` and `seo_description`.

For each page, the priority order for SEO fields:
1. `seo_title` from Rank Math (if not empty and not a template like `%title% - %sitename%`)
2. `post_title` with AM format appended: `{Title} | VibrantYou Yoga`
3. Never leave title as the raw WP default

Rank Math title templates use `%` tokens — strip them and rebuild:
```python
import re

def clean_rm_title(rm_title: str, post_title: str, site_name: str) -> str:
    if not rm_title or "%" in rm_title:
        return f"{post_title} | {site_name}"
    return rm_title

def clean_rm_desc(rm_desc: str) -> str:
    # Strip %token% placeholders
    return re.sub(r"%[a-z_]+%", "", rm_desc).strip(" -|")
```

## Step 4 — Per-page SEO checklist

For every page in `pages.json`, fill in this record before writing HTML:

```json
{
  "slug":         "about",
  "new_path":     "/about/",
  "canonical":    "https://vibrantyou.yoga/about/",
  "title":        "About VibrantYou Yoga | Mindful Movement in [City], [State]",
  "description":  "Meet the instructors and story behind VibrantYou Yoga. [150-160 chars, include city]",
  "keywords":     "yoga studio [city], yoga instructor, mindful movement",
  "og_image":     "/assets/images/about-studio.webp",
  "schema_type":  "AboutPage",
  "h1":           "Our Story"
}
```

Write to `.planning/data/seo-map.json`. The HTML build reads this file to
stamp `<head>` tags.

## Step 5 — Schema.org per page type

| Page | Schema type | Required fields |
|------|------------|----------------|
| Home | `LocalBusiness` | name, url, telephone, address, areaServed, openingHours |
| About | `AboutPage` + `Organization` | name, description, founders |
| Classes index | `ItemList` of `Course` | name, url, description per class |
| Class detail | `Course` | name, description, provider, educationalLevel |
| Contact | `ContactPage` | name, url, telephone, email, address |
| Blog post | `Article` | headline, datePublished, author, image |
| 404 | none | — |

LocalBusiness schema for vibrantyou.yoga (seed from `site-info.json`):
```json
{
  "@context": "https://schema.org",
  "@type": ["LocalBusiness", "HealthAndBeautyBusiness"],
  "@id": "https://vibrantyou.yoga/#business",
  "name": "VibrantYou Yoga",
  "url": "https://vibrantyou.yoga",
  "telephone": "",
  "priceRange": "$$",
  "servesCuisine": null,
  "currenciesAccepted": "USD",
  "paymentAccepted": "Cash, Credit Card",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "",
    "addressLocality": "",
    "addressRegion": "",
    "postalCode": "",
    "addressCountry": "US"
  }
}
```
Mark address fields `DRAFT NEEDED` — do not fabricate. Pull from `wp_options`
(`admin_email`, Events Manager location settings) or ask client.

## Step 6 — Pre-launch SEO audit commands

Run these before declaring the build complete:

```bash
SITE=src

# Every page has a <title>
find $SITE -name "*.html" | xargs grep -L '<title>' | grep -v "_template"

# Every page has meta description
find $SITE -name "*.html" | xargs grep -L 'name="description"' | grep -v "_template"

# Every page has canonical
find $SITE -name "*.html" | xargs grep -L 'rel="canonical"' | grep -v "_template"

# Every page has JSON-LD
find $SITE -name "*.html" | xargs grep -L 'application/ld+json' | grep -v "_template"

# No WP URLs leaked into HTML
grep -r "wp-content\|wp-admin\|wordpress\|?p=\|?page_id=" $SITE --include="*.html"

# No unreplaced template placeholders
grep -r "{{" $SITE --include="*.html"

# No Divi class residue
grep -r "et_pb_\|divi-builder" $SITE --include="*.html"
```

All six commands must return zero results before launch.

## Next step

Proceed to `08-run-order.md` for the complete execution sequence,
then `02-wordpress-to-html-migration.md` Phase 7 for DNS cutover.