arisingmedia-web-sops/wp-divi-pipeline-to-am-stack/01-wpress-extraction.md
2026-06-09 18:31:59 +02:00

01 — .wpress Extraction

Unpack the All-in-One WP Migration .wpress archive into the project's .planning/wpress-extract/ directory.

.wpress binary format

A .wpress file is NOT a standard zip or tar archive. It uses a custom sequential binary format:

[HEADER 4377 bytes] [FILE DATA n bytes] [HEADER] [FILE DATA] ...

Header breakdown:

Offset   Length  Field
0        255     Filename (null-padded)
255      14      File size in bytes (ASCII decimal, null-padded)
269      12      mtime unix timestamp (ASCII decimal, null-padded)
281      4096    Relative path (null-padded)
4377     n       Raw file bytes (size from header)

The archive ends at the first header that consists entirely of null bytes, or at EOF.
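
The layout above can be sketched as a small Python reader. This is a minimal illustration built only from the header table, not the code in extract_wpress.py; the names Entry, _field, and iter_entries are hypothetical, and UTF-8 decoding of the name and path fields is an assumption.

```python
from typing import BinaryIO, Iterator, NamedTuple

# Field widths from the header table; they sum to 4377 bytes.
NAME_LEN, SIZE_LEN, MTIME_LEN, PATH_LEN = 255, 14, 12, 4096
HEADER_SIZE = NAME_LEN + SIZE_LEN + MTIME_LEN + PATH_LEN


class Entry(NamedTuple):
    name: str
    size: int
    mtime: int
    path: str
    data_offset: int  # absolute offset of the file's raw bytes


def _field(header: bytes, start: int, length: int) -> bytes:
    """Return a null-padded field's content, cut at the first null byte."""
    return header[start:start + length].split(b"\x00", 1)[0]


def iter_entries(f: BinaryIO) -> Iterator[Entry]:
    """Yield one Entry per header, skipping over each file's data."""
    while True:
        header = f.read(HEADER_SIZE)
        # Stop at EOF (short read) or at the all-null end-of-archive header.
        if len(header) < HEADER_SIZE or not header.strip(b"\x00"):
            return
        name = _field(header, 0, NAME_LEN).decode("utf-8", errors="replace")
        size = int(_field(header, NAME_LEN, SIZE_LEN) or 0)
        mtime = int(_field(header, NAME_LEN + SIZE_LEN, MTIME_LEN) or 0)
        path = _field(header, NAME_LEN + SIZE_LEN + MTIME_LEN, PATH_LEN).decode(
            "utf-8", errors="replace"
        )
        data_offset = f.tell()
        yield Entry(name, size, mtime, path, data_offset)
        # Seek absolutely so the caller may read the data between yields.
        f.seek(data_offset + size)
```

Opening an archive with open(path, "rb") and looping over iter_entries(f) lists every entry without writing anything to disk, which is a cheap way to sanity-check an archive before a full extraction.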

Extraction script

Script: .am-webdesign-sops/wp-divi-pipeline/scripts/extract_wpress.py

python3 ~/.am-webdesign-sops-path/scripts/extract_wpress.py \
  .planning/vibrantyou-yoga-YYYYMMDD-*.wpress \
  .planning/wpress-extract/

Or from the SOP scripts directory directly:

python3 /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/wp-divi-pipeline/scripts/extract_wpress.py \
  /home/sirdrez/arisingmedia-websites/{domain}/.planning/{file}.wpress \
  /home/sirdrez/arisingmedia-websites/{domain}/.planning/wpress-extract/

The script prints progress every 200 files. A 300-400 MB archive typically extracts in 2-5 minutes and produces 1,000-5,000 files.

Expected archive contents

After extraction, wpress-extract/ contains:

wpress-extract/
├── package.json              ← archive metadata (domain, WP version, plugin list)
├── database.sql              ← full MySQL dump (the most important file)
└── wp-content/
    ├── uploads/              ← all media (images, PDFs, videos)
    │   └── YYYY/MM/          ← WordPress date-organized subdirs
    ├── themes/
    │   ├── Divi/             ← Divi 4 theme files (if Divi 4)
    │   └── divi-5/           ← Divi 5 theme files (if Divi 5)
    └── plugins/              ← installed plugins (useful for form schema)
        ├── gravityforms/
        └── contact-form-7/

Verify extraction

After the script completes, confirm the key files exist:

# Database dump present?
ls -lh .planning/wpress-extract/database.sql

# Uploads present?
find .planning/wpress-extract/wp-content/uploads -name "*.jpg" | wc -l
find .planning/wpress-extract/wp-content/uploads -name "*.png" | wc -l

# Archive metadata
cat .planning/wpress-extract/package.json

package.json contains the site URL, WordPress version, Divi version, and plugin list — read it before proceeding to Phase 2.
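
For a quicker overview than the raw cat output, a short Python sketch can list the top-level keys of package.json. It deliberately assumes no key names, since the exact schema depends on the plugin version; the only assumption is that the file holds a JSON object at the top level. The function name summarize_package is hypothetical.

```python
import json
from pathlib import Path


def summarize_package(path):
    """Return {top-level key: short preview} without assuming key names."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    summary = {}
    for key, value in data.items():
        if isinstance(value, (dict, list)):
            # Nested structures are summarized by type and length only.
            summary[key] = f"{type(value).__name__} with {len(value)} items"
        else:
            summary[key] = repr(value)
    return summary


if __name__ == "__main__":
    pkg = Path(".planning/wpress-extract/package.json")
    if pkg.exists():
        for key, preview in summarize_package(pkg).items():
            print(f"{key}: {preview}")
```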

Common issues

"Not a zip file" error — Expected. The .wpress format is not zip. The extract_wpress.py script handles it correctly.

Missing database.sql — The archive may name it differently. Check:

find .planning/wpress-extract -name "*.sql" 2>/dev/null

Partial extraction — If the script stops early, check disk space:

df -h .planning/wpress-extract/

A 378MB .wpress typically expands to 1-3GB uncompressed.
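
The same check can be run before extraction. The sketch below compares free space on the destination filesystem against the archive size times a safety factor. The helper name has_room is hypothetical, and the default factor of 8 is an assumption chosen to cover the upper end of the range quoted above (3 GB from 378 MB is roughly 8x).

```python
import shutil
from pathlib import Path


def has_room(archive, dest, factor=8.0):
    """Return True if dest's filesystem has factor x archive size free.

    dest must already exist, because shutil.disk_usage needs a real path.
    The factor is an assumed safety margin; adjust it for your archives.
    """
    needed = Path(archive).stat().st_size * factor
    free = shutil.disk_usage(dest).free
    return free >= needed
```

For example, has_room(".planning/file.wpress", ".planning/") returning False is a signal to free up space before starting the script.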

Path traversal in filenames — The script strips leading / and . from paths. If files land in unexpected locations, check the raw path field with:

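python3 -c "
import sys
# Field widths from the header table; they sum to HEADER_SIZE (4377).
HEADER_SIZE=4377; NAME_LEN=255; SIZE_LEN=14; MTIME_LEN=12; PATH_LEN=4096
with open(sys.argv[1],'rb') as f:
    for i in range(5):  # inspect the first five entries only
        h = f.read(HEADER_SIZE)
        # Stop at EOF (short read) or at the all-null end-of-archive header.
        if len(h) < HEADER_SIZE or not h.strip(b'\x00'):
            break
        name = h[:NAME_LEN].split(b'\x00',1)[0].decode(errors='replace')
        size = int(h[NAME_LEN:NAME_LEN+SIZE_LEN].split(b'\x00',1)[0] or 0)
        path = h[NAME_LEN+SIZE_LEN+MTIME_LEN:].split(b'\x00',1)[0].decode(errors='replace')
        print(f'  [{i}] path={repr(path)} name={repr(name)} size={size}')
        f.seek(size, 1)  # skip the file data to reach the next header
" .planning/file.wpress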
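
If the raw path fields do look suspicious, a stricter mapping from archive paths to output paths can help. The following is a minimal sketch of one defensive approach, not the logic used by extract_wpress.py: it drops empty and "." components, treats leading slashes as relative, and rejects any ".." component outright. The function name safe_target is hypothetical.

```python
from pathlib import Path


def safe_target(dest_root, raw_path, name):
    """Map an archive (path, name) pair to a location inside dest_root.

    Raises ValueError if the entry is empty or contains a '..' component.
    """
    root = Path(dest_root).resolve()
    joined = (raw_path + "/" + name).replace("\\", "/")
    # Dropping '' and '.' also neutralizes leading slashes such as '/etc'.
    pieces = [p for p in joined.split("/") if p not in ("", ".")]
    if not pieces:
        raise ValueError("entry has no usable path components")
    if ".." in pieces:
        raise ValueError(f"refusing path with '..': {joined!r}")
    return root.joinpath(*pieces)
```

Rejecting ".." instead of silently stripping it makes a malformed or hostile archive visible as an error, at the cost of refusing some unusual but harmless paths.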

Next step

Proceed to 02-database-analysis.md to inventory pages and detect Divi version.