# 01 — .wpress Extraction

Unpack the All-in-One WP Migration `.wpress` archive into the project's
`.planning/wpress-extract/` directory.

## .wpress binary format

The .wpress format is not a standard zip or tar archive. It is a custom sequential binary format:

```
[HEADER 4377 bytes] [FILE DATA n bytes] [HEADER] [FILE DATA] ...
```

Header breakdown:

```
Offset  Length  Field
0       255     Filename (null-padded)
255     14      File size in bytes (ASCII decimal, null-padded)
269     12      mtime Unix timestamp (ASCII decimal, null-padded)
281     4096    Directory path, relative to the archive root (null-padded)
4377    n       Raw file bytes (size from header)
```

The four header fields add up to 255 + 14 + 12 + 4096 = 4377 bytes, so each file's data starts exactly 4377 bytes after the start of its header.
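
A minimal Python sketch of the header layout, using `struct` with one fixed-width bytes field per row of the table. `make_header` and `parse_header` are hypothetical helpers for illustration, not functions from `extract_wpress.py`. The sketch assumes names and paths are UTF-8 encoded.

```python
import struct

# One fixed-width bytes field per row of the header table:
# filename (255), size (14), mtime (12), path (4096).
HEADER_FORMAT = "255s14s12s4096s"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4377


def make_header(name: str, size: int, mtime: int, path: str) -> bytes:
    """Build one header; struct null-pads each field to its fixed width."""
    return struct.pack(
        HEADER_FORMAT,
        name.encode("utf-8"),
        str(size).encode("ascii"),
        str(mtime).encode("ascii"),
        path.encode("utf-8"),
    )


def _text(field: bytes) -> str:
    """Drop the null padding and decode a header field."""
    return field.split(b"\x00", 1)[0].decode("utf-8", errors="replace")


def parse_header(header: bytes) -> tuple[str, int, int, str]:
    """Decode a 4377-byte header into (name, size, mtime, path)."""
    raw_name, raw_size, raw_mtime, raw_path = struct.unpack(HEADER_FORMAT, header)
    return (
        _text(raw_name),
        int(_text(raw_size) or 0),
        int(_text(raw_mtime) or 0),
        _text(raw_path),
    )


header = make_header("hello.txt", 1234, 1700000000, "example/dir")
print(len(header), parse_header(header))  # → 4377 ('hello.txt', 1234, 1700000000, 'example/dir')
```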

The archive ends when a header of all null bytes is encountered, or at EOF.

## Extraction script

Script: `.am-webdesign-sops/wp-divi-pipeline/scripts/extract_wpress.py`

```bash
python3 ~/.am-webdesign-sops-path/scripts/extract_wpress.py \
  .planning/vibrantyou-yoga-YYYYMMDD-*.wpress \
  .planning/wpress-extract/
```

Or call the script in the SOP directory directly, using absolute paths:

```bash
python3 /home/sirdrez/arisingmedia-websites/.am-webdesign-sops/wp-divi-pipeline/scripts/extract_wpress.py \
  /home/sirdrez/arisingmedia-websites/{domain}/.planning/{file}.wpress \
  /home/sirdrez/arisingmedia-websites/{domain}/.planning/wpress-extract/
```

The script prints a progress line every 200 files. A 300-400 MB archive typically extracts in
2-5 minutes and produces 1,000-5,000 files.
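
Here is a sketch of the write side with the progress reporting described above. It uses a hypothetical `write_entries` function, not the actual code of `extract_wpress.py`. The function accepts `(relative_path, bytes)` pairs from any parser and prints a line every 200 files. It strips leading `/` and `.` from the directory part only, mirroring the stripping described under Common issues, so a root-level `.htaccess` keeps its name. That stripping does not catch a `..` in the middle of a path.

```python
import os
import tempfile
from typing import Iterable, Tuple


def write_entries(entries: Iterable[Tuple[str, bytes]], dest: str, every: int = 200) -> int:
    """Write (relative_path, data) pairs under dest, printing progress every `every` files."""
    count = 0
    for rel_path, data in entries:
        # Split off the filename so a dotfile such as ".htaccess" keeps its leading dot.
        dir_part, _, name = rel_path.rpartition("/")
        # Strip leading '/' and '.' from the directory part only.
        target = os.path.join(dest, dir_part.lstrip("/."), name)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "wb") as out:
            out.write(data)
        count += 1
        if count % every == 0:
            print(f"  {count} files extracted...")
    print(f"done: {count} files")
    return count


with tempfile.TemporaryDirectory() as tmp:
    demo = [(f"uploads/2024/01/img{i}.jpg", b"x") for i in range(450)]
    demo.append(("./.htaccess", b"# demo"))
    write_entries(demo, tmp)  # progress at 200 and 400, then "done: 451 files"
```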

## Expected archive contents

After extraction, `wpress-extract/` contains:

```
wpress-extract/
├── package.json        ← archive metadata (domain, WP version, plugin list)
├── database.sql        ← full MySQL dump (the most important file)
└── wp-content/
    ├── uploads/        ← all media (images, PDFs, videos)
    │   └── YYYY/MM/    ← WordPress date-organized subdirs
    ├── themes/
    │   ├── Divi/       ← Divi 4 theme files (if Divi 4)
    │   └── divi-5/     ← Divi 5 theme files (if Divi 5)
    └── plugins/        ← installed plugins (useful for form schema)
        ├── gravityforms/
        └── contact-form-7/
```
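
To compare an extraction against the tree above, a short sketch can count the files under each top-level directory of `wp-content/`. It also lists the theme folders, for example to see whether `Divi/` or `divi-5/` is present. The function names are hypothetical. The `wp-content/themes` location comes from the tree above. Missing directories simply produce empty results.

```python
import os
from collections import Counter


def count_by_top_dir(base: str) -> Counter:
    """Count files below each top-level directory of base (files directly in base count as '.')."""
    counts: Counter = Counter()
    for root, _dirs, files in os.walk(base):
        top = os.path.relpath(root, base).split(os.sep, 1)[0]
        counts[top] += len(files)
    return counts


def theme_dirs(extract_dir: str) -> list:
    """List theme folder names under wp-content/themes, ignoring loose files."""
    themes = os.path.join(extract_dir, "wp-content", "themes")
    if not os.path.isdir(themes):
        return []
    return sorted(e for e in os.listdir(themes) if os.path.isdir(os.path.join(themes, e)))


extract = ".planning/wpress-extract"
print(count_by_top_dir(os.path.join(extract, "wp-content")))
print(theme_dirs(extract))
```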

## Verify extraction

After the script completes, confirm the key files exist:

```bash
# Database dump present?
ls -lh .planning/wpress-extract/database.sql

# Uploads present? (-iname also matches .JPG, .JPEG, .PNG)
find .planning/wpress-extract/wp-content/uploads -type f \( -iname "*.jpg" -o -iname "*.jpeg" \) | wc -l
find .planning/wpress-extract/wp-content/uploads -type f -iname "*.png" | wc -l

# Archive metadata
cat .planning/wpress-extract/package.json
```
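
The same checks can be scripted. This sketch uses a hypothetical `verify_extract` helper that reports the size of `database.sql`, whether `package.json` exists, and image counts by extension. Extensions are lower-cased so `.JPG` and `.jpg` are counted together. The extension list is an assumption; extend it as needed.

```python
import os
from collections import Counter

# Extensions counted as images; this list is an assumption, extend as needed.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg"}


def verify_extract(extract_dir: str) -> dict:
    """Report basic health checks for an extracted .wpress directory."""
    sql = os.path.join(extract_dir, "database.sql")
    images: Counter = Counter()
    for _root, _dirs, files in os.walk(os.path.join(extract_dir, "wp-content", "uploads")):
        for fname in files:
            ext = os.path.splitext(fname)[1].lower()  # ".JPG" counts as ".jpg"
            if ext in IMAGE_EXTS:
                images[ext] += 1
    return {
        "database_sql_bytes": os.path.getsize(sql) if os.path.isfile(sql) else None,
        "has_package_json": os.path.isfile(os.path.join(extract_dir, "package.json")),
        "images_by_ext": dict(images),
    }


print(verify_extract(".planning/wpress-extract"))
```

A `database_sql_bytes` of `None` or a very small number means the dump is missing or truncated.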

`package.json` contains the site URL, WordPress version, Divi version, and
plugin list. Read it before proceeding to Phase 2.
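
To pull those fields out of `package.json` programmatically, here is a sketch. The key names (`SiteURL`, `WordPress.Version`, `Template`, `Plugins`) are assumptions about the All-in-One WP Migration metadata layout. For that reason the hypothetical `describe_package` function also returns every top-level key so you can check them against the real file.

```python
import json
import os


def describe_package(path: str) -> dict:
    """Pull commonly needed fields out of a package.json metadata file."""
    with open(path, encoding="utf-8") as fh:
        meta = json.load(fh)
    # These key names are assumptions; .get() keeps missing keys harmless.
    wordpress = meta.get("WordPress")
    return {
        "site_url": meta.get("SiteURL"),
        "wp_version": wordpress.get("Version") if isinstance(wordpress, dict) else None,
        "template": meta.get("Template"),
        "plugins": meta.get("Plugins", []),
        "all_keys": sorted(meta),  # compare against the real file
    }


pkg = ".planning/wpress-extract/package.json"
if os.path.isfile(pkg):
    print(describe_package(pkg))
```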

## Common issues

**"Not a zip file" error** — Expected if you open the archive with `unzip` or another zip tool, since .wpress is not a zip format.
Use `extract_wpress.py` instead; it parses the format directly.

**Missing database.sql** — The archive may name it differently. Check:
```bash
find .planning/wpress-extract -name "*.sql" 2>/dev/null
```

**Partial extraction** — If the script stops early, check disk space:
```bash
df -h .planning/wpress-extract/
```
A 378MB .wpress typically expands to 1-3GB uncompressed.
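
Free space can also be checked before re-running the extraction. This sketch uses a hypothetical `enough_space` helper that compares free space on the destination filesystem against the archive size times a multiplier. The default `factor=8` roughly matches the upper end of the 378 MB to 1-3 GB estimate above; it is an assumption, not a measured rule.

```python
import os
import shutil


def enough_space(archive: str, dest: str, factor: float = 8.0) -> bool:
    """True if dest's filesystem has at least `factor` x the archive size free."""
    need = os.path.getsize(archive) * factor
    # disk_usage needs an existing path, so walk up to the nearest existing parent.
    probe = os.path.abspath(dest)
    while not os.path.exists(probe):
        probe = os.path.dirname(probe)
    free = shutil.disk_usage(probe).free
    print(f"need ~{need / 1e9:.1f} GB, free {free / 1e9:.1f} GB")
    return free >= need
```

For example, call `enough_space(".planning/file.wpress", ".planning/wpress-extract/")` before re-running the script.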

**Path traversal in filenames** — The script strips leading `/` and `.` from
paths. If files land in unexpected locations, print the raw path and name fields of the first few entries:
```bash
python3 -c "
import sys
HEADER_SIZE = 4377; NAME_LEN = 255; SIZE_LEN = 14; MTIME_LEN = 12
with open(sys.argv[1], 'rb') as f:
    for i in range(5):
        h = f.read(HEADER_SIZE)
        # Stop at EOF or at the all-null end-of-archive header
        if len(h) < HEADER_SIZE or not h.strip(b'\x00'):
            break
        name = h[:NAME_LEN].split(b'\x00', 1)[0].decode(errors='replace')
        size = int(h[NAME_LEN:NAME_LEN + SIZE_LEN].split(b'\x00', 1)[0] or 0)
        path = h[NAME_LEN + SIZE_LEN + MTIME_LEN:].split(b'\x00', 1)[0].decode(errors='replace')
        print(f'  [{i}] path={repr(path)} name={repr(name)} size={size}')
        # Skip over this entry's file data to reach the next header
        f.seek(size, 1)
" .planning/file.wpress
```
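
If the raw paths contain `..` segments, stripping leading characters is not enough. A stricter approach resolves each entry against the destination and rejects anything that lands outside it. The sketch below does this with a hypothetical `safe_target` helper. It is a general path-traversal guard, not a description of what `extract_wpress.py` currently does.

```python
import os


def safe_target(dest: str, rel_path: str) -> str:
    """Resolve an archive path under dest; raise ValueError if it escapes dest."""
    root = os.path.realpath(dest)
    # Drop leading '/' so absolute entries are treated as relative to dest.
    target = os.path.realpath(os.path.join(root, rel_path.lstrip("/")))
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"path escapes extract dir: {rel_path!r}")
    return target


for p in ("uploads/2024/01/a.jpg", "./package.json", "../../etc/passwd"):
    try:
        print(safe_target(".planning/wpress-extract", p))
    except ValueError as err:
        print("rejected:", err)
```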

## Next step

Proceed to `02-database-analysis.md` to inventory pages and detect Divi version.