Getting started
Generate your first runnable project.
By the end of this page you'll have a Flask Todo API generated from a 40-line spec, with a passing pytest suite. Cost and wall-clock time are reported in the run's `eval_report.json`; see the Benchmarks page for measured figures across problems.
What you need

- Squeaky Clean installed (see Install if you haven't).
- An Anthropic API key in `ANTHROPIC_API_KEY`.
- Python 3.10+ on `PATH`.
Quick setup
1. Pick a problem spec
The repo ships three runnable examples; we'll use the smallest:
```shell
git clone https://github.com/garciaalan186/squeaky-clean.git
cd squeaky-clean
cat examples/todo_api/todo_problem.json
```
You're looking at a 40-line JSON file declaring three bounded contexts, six acceptance criteria, and Flask + local-disk persistence.
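To make that shape concrete, here is a self-contained Python sketch of such a spec. The field names below are illustrative assumptions, not the actual schema; see Author your first ProblemSpec for the real field names.

```python
import json

# Illustrative sketch only: these field names are assumptions,
# not the real ProblemSpec schema.
problem_spec = {
    "name": "todo_api",
    "bounded_contexts": ["tasks", "lists", "users"],          # three contexts
    "acceptance_criteria": [f"AC-{i}" for i in range(1, 7)],  # six criteria
    "framework": "flask",
    "persistence": "local_disk",
}

# Quick structural sanity check before handing a spec to the CLI.
assert len(problem_spec["bounded_contexts"]) == 3
assert len(problem_spec["acceptance_criteria"]) == 6
print(json.dumps(problem_spec, indent=2))
```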
2. Generate
```shell
squeaky generate \
  --problem-file examples/todo_api/todo_problem.json \
  --infra=auto
```
The CLI streams per-tier progress: PrincipalArchitect → TestArchitect → atomic agents (parallel) → IntegrateModule → TestRunner.
The output is at `meta-evaluation-results/meta-evaluation_<NNN>_<timestamp>/problem-set-1-todo_api-code/`. We'll call this `<output>` below.
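If you script against runs, a small helper can locate the newest output directory. This is a sketch assuming only the layout just described (directories named `meta-evaluation_*` under `meta-evaluation-results/`); adjust the glob if your version writes runs elsewhere.

```python
from pathlib import Path

def latest_run(root: str = "meta-evaluation-results") -> Path:
    """Return the most recently modified meta-evaluation_* run directory.

    Assumes the layout described above; adjust the glob pattern if
    your version writes runs elsewhere.
    """
    runs = sorted(Path(root).glob("meta-evaluation_*"),
                  key=lambda p: p.stat().st_mtime)
    if not runs:
        raise FileNotFoundError(f"no runs found under {root}/")
    return runs[-1]
```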
3. Read the report
```shell
cat <output>/eval_report.json
```
Look for `tests_pass`. It reports the fraction of acceptance criteria that the framework's own pytest run found covered by passing tests; for this example it should be above 0.9.
What if tests_pass is below 0.9?
The Todo API is a stable canonical example; values below 0.9 usually indicate the run hit a transient API or rate-limit issue. Re-run with `--replicates 3` to report the mean ± stddev across runs.
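If you want to gate a script or CI job on the report, here is a minimal sketch. It assumes only what's stated above: that `eval_report.json` contains a top-level `tests_pass` float; no other fields are relied on.

```python
import json
from pathlib import Path

def check_report(report_path: str, threshold: float = 0.9) -> bool:
    """Return True if the run's tests_pass score meets the threshold.

    Assumes eval_report.json has a top-level "tests_pass" float,
    as described above; no other fields are read.
    """
    report = json.loads(Path(report_path).read_text())
    score = report["tests_pass"]
    print(f"tests_pass = {score:.2f}")
    return score >= threshold
```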
4. Install the generated project's deps
The generated project ships with a requirements.txt. Install into an isolated dir so it doesn't pollute your environment:
```shell
cd <output>
pip install -r requirements.txt --target .test-deps/
```
5. Run the generated tests
```shell
PYTHONPATH=.:.test-deps python -m pytest tests/ -q
```
You should see green dots: the same `tests_pass` ratio you saw in the report.
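Steps 4 and 5 can also be driven from Python. This sketch builds the same environment and command without executing anything, so you stay in control; pass the results to `subprocess.run(cmd, cwd="<output>", env=env)` yourself.

```python
import os
import sys

def build_test_command(deps_dir: str = ".test-deps"):
    """Build (env, argv) for running the generated suite, mirroring step 5.

    Returns values suitable for subprocess.run(cmd, cwd=output_dir, env=env)
    rather than executing anything itself.
    """
    env = {**os.environ, "PYTHONPATH": f".{os.pathsep}{deps_dir}"}
    cmd = [sys.executable, "-m", "pytest", "tests/", "-q"]
    return env, cmd
```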
Advanced: pin a specific tech stack
By default --infra=auto lets the framework's MCDA scoring pick technologies. To pin them explicitly, declare infrastructure_choices in your ProblemSpec:
"infrastructure_choices": [
{"category": "blob_storage", "technology": "s3", "version_pin": "boto3==1.34"},
{"category": "kv_cache", "technology": "redis", "version_pin": "redis-py==5.0"}
]
See Author your first ProblemSpec for the full schema.
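Before committing pins, it can help to sanity-check each entry. This sketch validates only the three fields shown in the example above; the full schema (see Author your first ProblemSpec) may accept more.

```python
def validate_infra_choices(choices: list[dict]) -> bool:
    """Check that each pinned choice carries the three fields shown above.

    Based only on the example fields (category, technology, version_pin);
    the real schema may accept additional keys.
    """
    required = {"category", "technology", "version_pin"}
    for entry in choices:
        missing = required - entry.keys()
        if missing:
            raise ValueError(f"entry {entry!r} is missing {sorted(missing)}")
    return True
```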
What to do next
- **Author your first ProblemSpec**: walk through the JSON shape with worked examples and anti-patterns.
- **Verify a run**: decode `eval_report.json`, `SUMMARY.md`, and `architecture.squib`.
- **Architecture deep-dive**: three model tiers, the agent hierarchy, and why determinism is non-negotiable.
- **Squib grammar**: the compact text format that flows between agent tiers.