
Benchmark plan

The $30 proof plan. Local-first. Hard cap USD 30 total, USD 5 per day. No auto-spend. Source: docs/z15/Z15_30_DOLLAR_PROOF_PLAN.md.

| Phase | Experiment | Local? | Cost | What it proves | What it does not prove |
|---|---|---|---|---|---|
| A | Literature map (arXiv + OpenAlex + S2) | yes | $0 | What prior art covers; where the field is structurally thin | Nothing about Z-15 implementation |
| B | Corpus compression on a small public corpus | yes | $0–1 | Pipeline runs end-to-end; token + entropy numerics on a tiny set | Frontier scale; toy only |
| C | Tiny model perplexity / val-loss delta | partial | $1–2 on RTX 4090 | Treatment beats raw text on val loss at sub-1M params | Scaling-law extrapolation |
| D | Agent correctness vs raw-prompt baseline | yes | $1–5 | Hallucination rate, unsupported-claim count, latency, token cost | Absolute correctness |
| E | Proof packet assembly + SHA signing | yes | $0 | Five-artefact bundle renders; BUNDLE_SHA256 verifies | External review |
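
Phase C reports two linked quantities: perplexity is the exponential of the mean per-token validation loss when that loss is measured in nats. The sketch below shows the relationship and the sign convention for a val-loss delta. The function names are hypothetical and are not taken from z15_engine; the sign convention (negative means the treatment wins) is an assumption.

```python
import math


def perplexity(val_loss_nats):
    """Perplexity for a mean per-token cross-entropy loss given in nats."""
    return math.exp(val_loss_nats)


def val_loss_delta(baseline_loss, treatment_loss):
    """Treatment minus baseline. Assumed convention: negative means the
    treatment reached a lower validation loss than the raw-text baseline."""
    return treatment_loss - baseline_loss
```

A delta on its own says nothing about scale, which matches the table's caveat that Phase C does not support scaling-law extrapolation.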

How to run Phase A locally

git clone <this repo>
python3 -m z15_engine.cli scout
# dry-run by default; for a live arXiv/OpenAlex/S2 fetch:
ops/scripts/z15-literature-scout.sh --live --max-results 20
ops/scripts/z15-literature-scout.sh --emit-map
ops/scripts/z15-literature-scout.sh --emit-gap-map

How to run Phase B locally

python3 -m z15_engine.cli build-corpus --source z15/data --output /tmp/corpus.txt
python3 -m z15_engine.cli compress     --raw /tmp/corpus.txt --substrate /tmp/substrate.txt
python3 -m z15_engine.cli baseline     --raw /tmp/corpus.txt
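
The "token + entropy numerics" that Phase B reports can be approximated independently of the CLI. The following is a minimal sketch, assuming whitespace tokenization and empirical Shannon entropy in bits per token; the actual tokenizer and metrics used by z15_engine are not documented here, and all function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy_bits(tokens):
    """Empirical Shannon entropy of a token sequence, in bits per token."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def token_stats(text):
    """Token count and entropy for one text.
    Assumption: tokens are whitespace-separated words."""
    tokens = text.split()
    return {"tokens": len(tokens), "entropy_bits": shannon_entropy_bits(tokens)}


def compare(raw_text, substrate_text):
    """Side-by-side stats plus the substrate/raw token-count ratio."""
    raw = token_stats(raw_text)
    sub = token_stats(substrate_text)
    ratio = sub["tokens"] / raw["tokens"] if raw["tokens"] else float("nan")
    return {"raw": raw, "substrate": sub, "token_ratio": ratio}
```

To use it, read /tmp/corpus.txt and /tmp/substrate.txt as text and pass both strings to compare(). A token_ratio below 1.0 means the substrate has fewer tokens than the raw corpus under this tokenization.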

How to run Phase C on rented GPU

See ops/vast/Z15_VAST_RUNBOOK.md. The runner refuses to start unless Z15_ALLOW_GPU=1 is set. The budget gate, ops/scripts/z15-gpu-budget-check.sh, refuses any rental whose ETA would push spend past either cap (USD 30 total, USD 5 per day).
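
The logic of the two gates can be illustrated in a few lines. This is a sketch under stated assumptions, not the contents of z15-gpu-budget-check.sh: it assumes projected cost is hourly rate times ETA, that the whole rental is charged to the current day, and that the opt-in is an exact match on the string "1". The function names and parameters are hypothetical; only the cap values and the variable name Z15_ALLOW_GPU come from the plan.

```python
import os

TOTAL_CAP_USD = 30.0  # hard cap on total spend, from the plan
DAILY_CAP_USD = 5.0   # cap on spend per day, from the plan


def gpu_opted_in(env=None):
    """True only when Z15_ALLOW_GPU is exactly "1" (assumed matching rule)."""
    env = os.environ if env is None else env
    return env.get("Z15_ALLOW_GPU") == "1"


def rental_allowed(hourly_rate_usd, eta_hours, spent_total_usd, spent_today_usd):
    """Return (allowed, reason) for a proposed rental.
    Assumption: cost = rate * ETA, all charged to today."""
    projected = hourly_rate_usd * eta_hours
    if spent_total_usd + projected > TOTAL_CAP_USD:
        return False, "would breach total cap"
    if spent_today_usd + projected > DAILY_CAP_USD:
        return False, "would breach daily cap"
    return True, "within both caps"
```

Checking the projection before renting, instead of metering spend afterwards, is what makes the caps hard limits and keeps the plan free of auto-spend.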

How to verify a result

Every run writes to z15_engine/memory/traces.jsonl. Recompute bundle_sha256 against the artefact set and compare to the file's recorded SHA. If they differ, the bundle is invalid.
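
The recompute-and-compare step can be sketched as follows. How Z-15 actually derives bundle_sha256 from the artefact set is not specified here, so the scheme below is an assumption made for illustration: artefacts are hashed in file-name order, and each contributes its name, its byte length, and its bytes to a single SHA-256 digest. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def bundle_sha256(artefact_paths):
    """SHA-256 over a set of artefact files.
    Assumed scheme: sort by file name; feed name, 8-byte length, then bytes."""
    digest = hashlib.sha256()
    for path in sorted((Path(p) for p in artefact_paths), key=lambda p: p.name):
        data = path.read_bytes()
        digest.update(path.name.encode("utf-8"))
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def verify_bundle(artefact_paths, recorded_sha):
    """True when the recomputed digest equals the recorded one."""
    return bundle_sha256(artefact_paths) == recorded_sha.strip().lower()
```

Sorting by name makes the digest independent of the order in which paths are listed, and including each file's length prevents two different splits of the same bytes from hashing identically. If the real bundle uses a different scheme, only bundle_sha256() needs to change.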