recent updates

2026-06-09 18:31:59 +02:00
parent 398b94965c
commit 94f7a1f72a
42 changed files with 8686 additions and 0 deletions
@@ -0,0 +1,120 @@
+# 01 — .wpress Extraction
+
+Unpack the All-in-One WP Migration `.wpress` archive into the project's
+`.planning/wpress-extract/` directory.
+
+## .wpress binary format
+
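+This is NOT a standard zip or tar file. It is a custom sequential binary
+format in which each file is stored as a fixed-size header followed by its
+raw bytes: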
+
+```
+[HEADER 4377 bytes] [FILE DATA n bytes] [HEADER] [FILE DATA] ...
+```
+
+Header breakdown:
+```
+Offset   Length  Field
+0        255     Filename (null-padded)
+255      14      File size in bytes (ASCII decimal, null-padded)
+269      12      mtime unix timestamp (ASCII decimal, null-padded)
+281      4096    Relative path (null-padded)
+4377     n       Raw file bytes (size from header)
+```
+
+The archive ends at the first header that consists entirely of null bytes,
+or at EOF if no such header is present.
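
The header layout above can be decoded with a few slices. The following is a minimal sketch, not the contents of `extract_wpress.py`: the names `Entry`, `parse_header`, and `iter_entries` are hypothetical, and UTF-8 decoding of the name and path fields is an assumption.

```python
from dataclasses import dataclass
from typing import BinaryIO, Iterator, Optional

# Field widths from the header breakdown; they sum to 4377 bytes.
NAME_LEN, SIZE_LEN, MTIME_LEN, PATH_LEN = 255, 14, 12, 4096
HEADER_SIZE = NAME_LEN + SIZE_LEN + MTIME_LEN + PATH_LEN


@dataclass
class Entry:  # hypothetical container for one decoded header
    name: str
    size: int
    mtime: int
    path: str


def _field(raw: bytes) -> bytes:
    """Return the bytes before the first null in a null-padded field."""
    return raw.split(b"\x00", 1)[0]


def parse_header(header: bytes) -> Optional[Entry]:
    """Decode one header; return None on a short read or an all-null header."""
    if len(header) < HEADER_SIZE or not header.strip(b"\x00"):
        return None
    size_start = NAME_LEN
    mtime_start = size_start + SIZE_LEN
    path_start = mtime_start + MTIME_LEN
    return Entry(
        # UTF-8 is an assumption; undecodable bytes are replaced, not fatal.
        name=_field(header[:size_start]).decode("utf-8", errors="replace"),
        size=int(_field(header[size_start:mtime_start]) or 0),
        mtime=int(_field(header[mtime_start:path_start]) or 0),
        path=_field(header[path_start:HEADER_SIZE]).decode("utf-8", errors="replace"),
    )


def iter_entries(stream: BinaryIO) -> Iterator[Entry]:
    """Yield headers in order, skipping over each file's data."""
    while True:
        entry = parse_header(stream.read(HEADER_SIZE))
        if entry is None:
            return
        yield entry
        stream.seek(entry.size, 1)  # jump past the raw file bytes
```

Opening the archive with `open(path, "rb")` and looping over `iter_entries(f)` lists every entry without writing anything to disk, which can be a quick sanity check before a full extraction.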
+
+## Extraction script
+
+Script: `.am-webdesign-sops/wp-divi-pipeline/scripts/extract_wpress.py`
+
+```bash
+python3 ~/.am-webdesign-sops-path/scripts/extract_wpress.py \
+  .planning/vibrantyou-yoga-YYYYMMDD-*.wpress \
+  .planning/wpress-extract/
+```
+
+Or from the SOP scripts directory directly:
+
+```bash
+python3 /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/wp-divi-pipeline/scripts/extract_wpress.py \
+  /home/sirdrez/arisingmedia-websites/{domain}/.planning/{file}.wpress \
+  /home/sirdrez/arisingmedia-websites/{domain}/.planning/wpress-extract/
+```
+
+Progress prints every 200 files. A 300-400MB archive typically extracts in
+2-5 minutes and produces 1,000-5,000 files.
+
+## Expected archive contents
+
+After extraction, `wpress-extract/` contains:
+
+```
+wpress-extract/
+├── package.json              ← archive metadata (domain, WP version, plugin list)
+├── database.sql              ← full MySQL dump (the most important file)
+└── wp-content/
+    ├── uploads/              ← all media (images, PDFs, videos)
+    │   └── YYYY/MM/          ← WordPress date-organized subdirs
+    ├── themes/
+    │   ├── Divi/             ← Divi 4 theme files (if Divi 4)
+    │   └── divi-5/           ← Divi 5 theme files (if Divi 5)
+    └── plugins/              ← installed plugins (useful for form schema)
+        ├── gravityforms/
+        └── contact-form-7/
+```
+
+## Verify extraction
+
+After the script completes, confirm the key files exist:
+
+```bash
+# Database dump present?
+ls -lh .planning/wpress-extract/database.sql
+
+# Uploads present?
+find .planning/wpress-extract/wp-content/uploads -name "*.jpg" | wc -l
+find .planning/wpress-extract/wp-content/uploads -name "*.png" | wc -l
+
+# Archive metadata
+cat .planning/wpress-extract/package.json
+```
+
+`package.json` contains the site URL, WordPress version, Divi version, and
+plugin list — read it before proceeding to Phase 2.
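
The same checks can be collected into one report. This is a sketch under assumptions: `summarize_extract` is a hypothetical helper, and because the exact `package.json` schema is not documented here, it only lists the top-level keys rather than reading specific fields.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_extract(root: str) -> dict:
    """Report which of the expected files exist under an extraction directory."""
    base = Path(root)
    uploads = base / "wp-content" / "uploads"

    # Count media by lowercased extension, so .JPG and .jpg are counted together
    # (unlike the case-sensitive `find -name` checks).
    ext_counts = Counter()
    if uploads.is_dir():
        ext_counts.update(
            p.suffix.lower() for p in uploads.rglob("*") if p.is_file()
        )

    # Only the top-level keys are listed; no specific key names are assumed.
    package_keys = []
    package = base / "package.json"
    if package.is_file():
        loaded = json.loads(package.read_text(encoding="utf-8"))
        if isinstance(loaded, dict):
            package_keys = sorted(loaded)

    return {
        "database_sql": (base / "database.sql").is_file(),
        "sql_files": sorted(str(p.relative_to(base)) for p in base.rglob("*.sql")),
        "jpg": ext_counts[".jpg"],
        "png": ext_counts[".png"],
        "package_keys": package_keys,
    }


if __name__ == "__main__":
    print(json.dumps(summarize_extract(".planning/wpress-extract"), indent=2))
```

If `database_sql` is false but `sql_files` is not empty, the dump is present under a different name.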
+
+## Common issues
+
+**"Not a zip file" error** — Expected. The .wpress format is not zip.
+The `extract_wpress.py` script handles it correctly.
+
+**Missing database.sql** — The archive may name it differently. Check:
+```bash
+find .planning/wpress-extract -name "*.sql" 2>/dev/null
+```
+
+**Partial extraction** — If the script stops early, check disk space:
+```bash
+df -h .planning/wpress-extract/
+```
+A 378MB .wpress typically expands to 1-3GB uncompressed.
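
The disk space check can also run before extraction starts. A minimal sketch, assuming a hypothetical helper `free_space_ok`; the default `expansion_factor` of 8 is an assumption taken from the upper end of the figure quoted above (roughly 3GB from a 378MB archive), not a measured value.

```python
import shutil
from pathlib import Path


def free_space_ok(archive: str, dest: str, expansion_factor: float = 8.0) -> bool:
    """Return True if the destination filesystem has room for the extraction.

    expansion_factor is a hypothetical safety multiplier applied to the
    archive size; adjust it to match what your archives actually produce.
    """
    archive_size = Path(archive).stat().st_size

    # The destination may not exist yet, so probe its nearest existing parent.
    probe = Path(dest).resolve()
    while not probe.exists():
        probe = probe.parent

    free_bytes = shutil.disk_usage(probe).free
    return free_bytes >= archive_size * expansion_factor
```

Calling `free_space_ok(".planning/file.wpress", ".planning/wpress-extract/")` before running the script gives an early warning instead of a partial extraction.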
+
+**Path traversal in filenames** — The script strips leading `/` and `.` from
+paths. If files land in unexpected locations, print the raw name, size, and
+path fields of the first five entries:
+```bash
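+python3 -c "
+import sys
+# Field widths from the header layout; together they make up the 4377-byte header.
+HEADER_SIZE=4377; NAME_LEN=255; SIZE_LEN=14; MTIME_LEN=12; PATH_LEN=4096
+with open(sys.argv[1],'rb') as f:
+    for i in range(5):  # inspect the first five entries only
+        h = f.read(HEADER_SIZE)
+        # Stop at EOF (short read) or at the all-null end-of-archive header.
+        if len(h) < HEADER_SIZE or not h.strip(b'\x00'):
+            break
+        name = h[:NAME_LEN].split(b'\x00',1)[0].decode(errors='replace')
+        size = int(h[NAME_LEN:NAME_LEN+SIZE_LEN].split(b'\x00',1)[0] or 0)
+        path = h[NAME_LEN+SIZE_LEN+MTIME_LEN:].split(b'\x00',1)[0].decode(errors='replace')
+        print(f'  [{i}] path={repr(path)} name={repr(name)} size={size}')
+        f.seek(size, 1)  # skip the file data to reach the next header
+" .planning/file.wpress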
+```
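
For comparison, one way to keep every extracted file inside the destination directory is sketched below. It is not the logic of `extract_wpress.py`: `safe_target` is a hypothetical helper, and it drops `.` and `..` components wherever they appear instead of only stripping leading characters.

```python
from pathlib import Path


def safe_target(dest_root: str, raw_path: str, name: str) -> Path:
    """Map an archive (path, name) pair to a location inside dest_root."""
    combined = f"{raw_path}/{name}".replace("\\", "/")

    # Drop empty, current-directory, and parent-directory components, so
    # leading slashes and '../' sequences cannot move the target.
    parts = [p for p in combined.split("/") if p not in ("", ".", "..")]
    if not parts:
        raise ValueError("entry has no usable file name")

    root = Path(dest_root).resolve()
    target = root.joinpath(*parts).resolve()

    # Final guard: a symlink inside dest_root could still point elsewhere.
    if root not in target.parents:
        raise ValueError(f"refusing to write outside {root}: {target}")
    return target
```

With this mapping, a raw path such as `../../etc` resolves to `etc/` under the destination directory rather than two levels above it.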
+
+## Next step
+
+Proceed to `02-database-analysis.md` to inventory pages and detect Divi version.