# 01 — .wpress Extraction

Unpack the All-in-One WP Migration `.wpress` archive into the project's `.planning/wpress-extract/` directory.

## .wpress binary format

NOT a standard zip or tar. Custom sequential binary format:

```
[HEADER 4377 bytes] [FILE DATA n bytes] [HEADER] [FILE DATA] ...
```

Header breakdown:

```
Offset  Length  Field
0       255     Filename (null-padded)
255     14      File size in bytes (ASCII decimal, null-padded)
269     12      mtime unix timestamp (ASCII decimal, null-padded)
281     4096    Relative path (null-padded)
4377    n       Raw file bytes (size from header)
```

The archive ends when a header of all null bytes is encountered, or EOF.

## Extraction script

Script: `.am-webdesign-sops/wp-divi-pipeline/scripts/extract_wpress.py`

```bash
python3 ~/.am-webdesign-sops-path/scripts/extract_wpress.py \
  .planning/vibrantyou-yoga-YYYYMMDD-*.wpress \
  .planning/wpress-extract/
```

Or from the SOP scripts directory directly:

```bash
python3 /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/wp-divi-pipeline/scripts/extract_wpress.py \
  /home/sirdrez/arisingmedia-websites/{domain}/.planning/{file}.wpress \
  /home/sirdrez/arisingmedia-websites/{domain}/.planning/wpress-extract/
```

Progress prints every 200 files. A 300-400MB archive typically extracts in 2-5 minutes and produces 1,000-5,000 files.

## Expected archive contents

After extraction, `wpress-extract/` contains:

```
wpress-extract/
├── package.json          ← archive metadata (domain, WP version, plugin list)
├── database.sql          ← full MySQL dump (the most important file)
└── wp-content/
    ├── uploads/          ← all media (images, PDFs, videos)
    │   └── YYYY/MM/      ← WordPress date-organized subdirs
    ├── themes/
    │   ├── Divi/         ← Divi 4 theme files (if Divi 4)
    │   └── divi-5/       ← Divi 5 theme files (if Divi 5)
    └── plugins/          ← installed plugins (useful for form schema)
        ├── gravityforms/
        └── contact-form-7/
```

## Verify extraction

After the script completes, confirm the key files exist:

```bash
# Database dump present?
ls -lh .planning/wpress-extract/database.sql

# Uploads present?
find .planning/wpress-extract/wp-content/uploads -name "*.jpg" | wc -l
find .planning/wpress-extract/wp-content/uploads -name "*.png" | wc -l

# Archive metadata
cat .planning/wpress-extract/package.json
```

`package.json` contains the site URL, WordPress version, Divi version, and plugin list. Read it before proceeding to Phase 2.

## Common issues

**"Not a zip file" error**: Expected. The .wpress format is not zip. The `extract_wpress.py` script handles it correctly.

**Missing database.sql**: The archive may name it differently. Check:

```bash
find .planning/wpress-extract -name "*.sql" 2>/dev/null
```

**Partial extraction**: If the script stops early, check disk space:

```bash
df -h .planning/wpress-extract/
```

A 378MB .wpress typically expands to 1-3GB uncompressed.

**Path traversal in filenames**: The script strips leading `/` and `.` from paths. If files land in unexpected locations, print the raw path field of the first five entries with:

```bash
python3 -c "
import sys
HEADER_SIZE=4377; NAME_LEN=255; SIZE_LEN=14; MTIME_LEN=12; PATH_LEN=4096
with open(sys.argv[1],'rb') as f:
    for i in range(5):
        h = f.read(HEADER_SIZE)
        # Stop at EOF or at the all-null end-of-archive header
        if len(h) < HEADER_SIZE or not h.strip(b'\x00'):
            break
        name = h[:NAME_LEN].split(b'\x00',1)[0].decode(errors='replace')
        size = int(h[NAME_LEN:NAME_LEN+SIZE_LEN].split(b'\x00',1)[0] or 0)
        path = h[NAME_LEN+SIZE_LEN+MTIME_LEN:].split(b'\x00',1)[0].decode(errors='replace')
        print(f' [{i}] path={repr(path)} name={repr(name)} size={size}')
        # Skip this entry's file data to reach the next header
        f.seek(size, 1)
" .planning/file.wpress
```

## Next step

Proceed to `02-database-analysis.md` to inventory pages and detect the Divi version.
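
## Appendix: header parsing sketch

For reference, the header layout in the binary format section and the path-traversal concern under Common issues can be sketched in Python as below. This is a minimal illustration, not the contents of `extract_wpress.py`: the names `Entry`, `parse_header`, `iter_entries`, and `safe_target` are hypothetical, UTF-8 decoding of the name and path fields is an assumption, and the traversal check shown here (resolve the path, then refuse anything outside the destination) is a stricter alternative to stripping leading characters.

```python
import os
from typing import BinaryIO, Iterator, NamedTuple, Optional

# Field widths taken from the header breakdown table.
NAME_LEN, SIZE_LEN, MTIME_LEN, PATH_LEN = 255, 14, 12, 4096
HEADER_SIZE = NAME_LEN + SIZE_LEN + MTIME_LEN + PATH_LEN  # 4377


class Entry(NamedTuple):  # hypothetical container, for illustration only
    name: str
    size: int
    mtime: int
    path: str


def _field(raw: bytes) -> bytes:
    """Return the bytes before the first null (fields are null-padded)."""
    return raw.split(b"\x00", 1)[0]


def parse_header(header: bytes) -> Optional[Entry]:
    """Decode one header; None signals end of archive (short read or all nulls)."""
    if len(header) < HEADER_SIZE or not header.strip(b"\x00"):
        return None
    size_end = NAME_LEN + SIZE_LEN
    mtime_end = size_end + MTIME_LEN
    return Entry(
        # Assumption: text fields are UTF-8; undecodable bytes are replaced.
        name=_field(header[:NAME_LEN]).decode("utf-8", errors="replace"),
        size=int(_field(header[NAME_LEN:size_end]) or 0),
        mtime=int(_field(header[size_end:mtime_end]) or 0),
        path=_field(header[mtime_end:]).decode("utf-8", errors="replace"),
    )


def iter_entries(archive: BinaryIO) -> Iterator[Entry]:
    """List entries in order, seeking past each file's data without reading it."""
    while True:
        entry = parse_header(archive.read(HEADER_SIZE))
        if entry is None:
            return
        yield entry
        archive.seek(entry.size, os.SEEK_CUR)


def safe_target(dest_dir: str, entry: Entry) -> str:
    """Resolve the output path and refuse anything that escapes dest_dir."""
    root = os.path.realpath(dest_dir)
    target = os.path.realpath(
        os.path.join(root, entry.path.lstrip("/"), entry.name)
    )
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"unsafe path in archive: {entry.path!r} / {entry.name!r}")
    return target
```

Opening an archive in binary mode and looping over `iter_entries(f)` lists every entry without writing anything to disk, which makes it a cheap way to inspect an archive before a full extraction. Note that `iter_entries` only lists: a real extractor would read `entry.size` bytes after each header instead of seeking past them.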