// Z-15 / benchmark

Benchmark plan

The $30 proof plan. Local-first. Hard cap USD 30 total, USD 5 per day. No auto-spend. Source: docs/z15/Z15_30_DOLLAR_PROOF_PLAN.md.

Phase	Experiment	Local?	Cost	What it proves	What it does not prove
A	Literature map (arXiv + OpenAlex + S2)	yes	$0	What prior art covers; where the field is structurally thin	Nothing about Z-15 implementation
B	Corpus compression on a small public corpus	yes	$0–1	Pipeline runs end-to-end; token + entropy numerics on a tiny set	Frontier scale; toy only
C	Tiny model perplexity / val-loss delta	partial	$1–2 on RTX 4090	Treatment beats raw text on val loss at sub-1M params	Scaling-law extrapolation
D	Agent correctness vs raw-prompt baseline	yes	$1–5	Hallucination rate, unsupported-claim count, latency, token cost	Absolute correctness
E	Proof packet assembly + SHA signing	yes	$0	Five-artefact bundle renders; `BUNDLE_SHA256` verifies	External review

How to run Phase A locally

git clone <this repo>
python3 -m z15_engine.cli scout
# dry-run by default; for a live arXiv/OpenAlex/S2 fetch:
ops/scripts/z15-literature-scout.sh --live --max-results 20
ops/scripts/z15-literature-scout.sh --emit-map
ops/scripts/z15-literature-scout.sh --emit-gap-map

How to run Phase B locally

python3 -m z15_engine.cli build-corpus --source z15/data --output /tmp/corpus.txt
python3 -m z15_engine.cli compress     --raw /tmp/corpus.txt --substrate /tmp/substrate.txt
python3 -m z15_engine.cli baseline     --raw /tmp/corpus.txt

How to run Phase C on rented GPU

See ops/vast/Z15_VAST_RUNBOOK.md. The runner refuses without Z15_ALLOW_GPU=1. Budget gate ops/scripts/z15-gpu-budget-check.sh refuses any rental whose ETA would breach either cap.

How to verify a result

Every run writes to z15_engine/memory/traces.jsonl. Recompute bundle_sha256 against the artefact set and compare to the file's recorded SHA. If they differ, the bundle is invalid.