Getting started

Generate your first runnable project.

By the end of this page you'll have a Flask Todo API with a passing pytest suite, generated from a 40-line spec into a directory under meta-evaluation-results/ (step 2 gives the exact path). Cost and wall-clock time are reported in the run's eval_report.json; see the Benchmarks page for measured figures across problems.

What you need

  • Squeaky Clean installed — see Install if you haven't.
  • An Anthropic API key in ANTHROPIC_API_KEY.
  • Python 3.10+ on PATH.

Quick setup

1. Pick a problem spec

The repo ships three runnable examples; we'll use the smallest:

git clone https://github.com/garciaalan186/squeaky-clean.git
cd squeaky-clean
cat examples/todo_api/todo_problem.json

You're looking at a 40-line JSON file declaring three bounded contexts, six acceptance criteria, and Flask + local-disk persistence.
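
As a rough sketch of that shape (illustrative only: every key here except infrastructure_choices, which appears verbatim in the Advanced section below, is an assumption rather than the canonical schema):

{
  "description": "Todo API with create/list/complete endpoints",
  "bounded_contexts": ["..."],
  "acceptance_criteria": ["..."],
  "infrastructure_choices": [
    {"category": "web_framework", "technology": "flask"},
    {"category": "persistence", "technology": "local_disk"}
  ]
}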

2. Generate

squeaky generate \
    --problem-file examples/todo_api/todo_problem.json \
    --infra=auto

The CLI streams per-tier progress: PrincipalArchitect → TestArchitect → atomic agents (parallel) → IntegrateModule → TestRunner.

The output is at meta-evaluation-results/meta-evaluation_<NNN>_<timestamp>/problem-set-1-todo_api-code/. We'll call this <output> below.
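
Since the timestamped directory name changes on every run, it helps to capture the newest one in a shell variable (a plain-shell sketch, assuming the naming pattern above):

output=$(ls -td meta-evaluation-results/meta-evaluation_* | head -1)/problem-set-1-todo_api-code
echo "$output"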

3. Read the report

cat <output>/eval_report.json

Look for tests_pass: the fraction of acceptance criteria that the framework's own pytest run found covered by passing tests. For this example you should see a value above 0.9.
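
To pull just that number out of the report (assuming tests_pass is a top-level key, as the steps here suggest):

python -c "import json; print(json.load(open('<output>/eval_report.json'))['tests_pass'])"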

What if tests_pass is below 0.9?

The Todo API is a stable canonical example; values below 0.9 usually mean the run hit a transient API or rate-limit issue. Re-run with --replicates 3 to surface mean ± stddev across runs.
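
Putting together the flags already shown on this page, the replicated run is:

squeaky generate \
    --problem-file examples/todo_api/todo_problem.json \
    --infra=auto \
    --replicates 3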

4. Install the generated project's deps

The generated project ships with a requirements.txt. Install into an isolated dir so it doesn't pollute your environment:

cd <output>
pip install -r requirements.txt --target .test-deps/
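
Before running the suite, you can confirm the vendored install resolves; Flask is assumed to be in the generated requirements.txt since it's the app's framework. The printed path should point into .test-deps/:

PYTHONPATH=.test-deps python -c "import flask; print(flask.__file__)"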

5. Run the generated tests

PYTHONPATH=.:.test-deps python -m pytest tests/ -q

You should see green dots, and the same tests_pass ratio you saw in the report.
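
If you see failures instead, standard pytest flags narrow things down; re-run verbosely and stop at the first failure:

PYTHONPATH=.:.test-deps python -m pytest tests/ -x -v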

Advanced: pin a specific tech stack

By default --infra=auto lets the framework's MCDA scoring pick technologies. To pin them explicitly, declare infrastructure_choices in your ProblemSpec:

"infrastructure_choices": [
  {"category": "blob_storage", "technology": "s3", "version_pin": "boto3==1.34"},
  {"category": "kv_cache", "technology": "redis", "version_pin": "redis-py==5.0"}
]

See Author your first ProblemSpec for the full schema.
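
If you hand-edit a spec (my_problem.json below stands in for your copy), it's worth confirming it still parses before regenerating:

python -m json.tool my_problem.json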

What to do next

  • Author your first ProblemSpec: walk through the JSON shape with worked examples and anti-patterns.
  • Verify a run: decode eval_report.json, SUMMARY.md, and architecture.squib.
  • Architecture deep-dive: three model tiers, agent hierarchy, why determinism is non-negotiable.
  • Squib grammar: the compact text format that flows between agent tiers.