Why Squeaky Clean?
A 5-minute read on what the framework does and why it exists.
The problem
Small-parameter models hallucinate frequently, which pushes engineers toward high-parameter alternatives that cost more per token, on top of attention costs that grow quadratically with context length. Squeaky Clean is an opinionated, semi-deterministic agentic software development framework designed to break this cycle.
LLM codegen also has a Clean Architecture problem. Send "Write me a Spring Boot service that publishes to Kafka" as a single prompt to a high-parameter model and you get a single 200-line file with the controller talking directly to KafkaTemplate, the entity serialized to JSON inline, and zero separation between domain logic and infrastructure. It compiles. It runs. It becomes unmaintainable the moment you swap Kafka for SQS.
What's missing is architectural discipline: the discipline that says domain entities don't import frameworks; that ports live in the application layer and adapters live in the infrastructure layer; that crossing a bounded-context boundary requires an explicit contract. That discipline is what separates a one-shot demo from code that survives a refactor.
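As a rough illustration of that discipline, here is a minimal Python sketch. Every name in it (Order, OrderEventPublisher, PlaceOrder, the two in-memory publishers, the topic name) is hypothetical and not taken from the framework, and the adapters are in-memory stand-ins rather than real Kafka or SQS clients.

```python
from dataclasses import dataclass
from typing import Protocol


# --- domain layer: plain data and rules, no framework imports ---
@dataclass(frozen=True)
class Order:
    order_id: str
    total_cents: int

    def is_billable(self) -> bool:
        return self.total_cents > 0


# --- application layer: the port, and the use case that depends on it ---
class OrderEventPublisher(Protocol):
    def publish(self, topic: str, payload: dict) -> None: ...


class PlaceOrder:
    def __init__(self, publisher: OrderEventPublisher) -> None:
        self._publisher = publisher

    def execute(self, order: Order) -> bool:
        if not order.is_billable():
            return False
        self._publisher.publish(
            "orders.placed",
            {"order_id": order.order_id, "total_cents": order.total_cents},
        )
        return True


# --- infrastructure layer: adapters (in-memory stand-ins for real clients) ---
class InMemoryKafkaPublisher:
    def __init__(self) -> None:
        self.sent: list[tuple[str, dict]] = []

    def publish(self, topic: str, payload: dict) -> None:
        self.sent.append((topic, payload))


class InMemorySqsPublisher:
    def __init__(self) -> None:
        self.queue: list[dict] = []

    def publish(self, topic: str, payload: dict) -> None:
        self.queue.append({"queue": topic, "body": payload})
```

Because PlaceOrder depends only on the port, swapping one publisher for the other changes a single constructor argument and leaves the domain and use-case code untouched.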
The approach
Squeaky Clean (or Squeaky) capitalizes on the modularity and granularity of Clean Architecture, SOLID principles, and GoF + DDD patterns. By doing so, it maximizes parallelization and wall-clock velocity while minimizing both the "hallucination blast radius" and operational costs.
The framework defines an Architectural DSL to orchestrate atomic, pattern-specialized agents that run efficiently on compact, low-parameter models. The pipeline splits into three layers, each with its own constraints:
- PrincipalArchitect (Architect tier). Reads a ProblemSpec and emits a structured ArchitectureSpec as a Squib. Decides bounded contexts, classes per context, layer assignment, dep edges. Deterministic by default.
- ImplementClass (ICP tier: Implements Clean Pattern). For each class in the architecture, runs a parallelizable atomic agent. Each ICP specializes in exactly one GoF/DDD pattern (or a Tier C infrastructure category). One file in, one file out.
- IntegrateModule + ValidateArchitecture. Assembles the per-class outputs into a runnable project, validates dependency rules, runs the generated test suite, computes metrics.
The Squib between tiers is a frozen, validated grammar (~200 chars per class, machine-checkable), ensuring the cheaper tier never has to guess what the more capable tier meant.
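The three stages can be pictured with a small Python sketch. It is not the framework's API: ClassSpec, the function names, the shape of the input dict, and the generated file paths are all assumptions made for illustration, and each model call is replaced by a placeholder.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassSpec:
    """Hypothetical per-class slice of the architecture plan."""
    name: str
    pattern: str  # e.g. "Entity", "Repository"
    layer: str    # "domain", "application" or "infrastructure"


def principal_architect(problem: dict) -> list[ClassSpec]:
    """Stage 1 stand-in: turn a problem description into per-class specs."""
    return [
        ClassSpec(name, pattern, layer)
        for name, pattern, layer in problem["classes"]
    ]


async def implement_class(spec: ClassSpec) -> tuple[str, str]:
    """Stage 2 stand-in: one class spec in, one source file out."""
    await asyncio.sleep(0)  # placeholder for the model call
    return f"src/{spec.layer}/{spec.name}.py", f"# {spec.pattern}: {spec.name}\n"


def integrate_module(files: dict[str, str]) -> dict:
    """Stage 3 stand-in: assemble outputs and report a simple summary."""
    return {"file_count": len(files), "paths": sorted(files)}


async def run_pipeline(problem: dict) -> dict:
    specs = principal_architect(problem)
    # Fan out: every class is generated concurrently by its own agent call.
    results = await asyncio.gather(*(implement_class(s) for s in specs))
    return integrate_module(dict(results))
```

Calling asyncio.run(run_pipeline(...)) with a dict whose "classes" key holds (name, pattern, layer) tuples runs all three stages. The point of the sketch is only the shape: one planning step, a concurrent per-class step, and one assembly step.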
What you write
A 40-line ProblemSpec JSON. See Author your first ProblemSpec for the full shape.
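For orientation only, here is a hedged Python sketch of what a much smaller spec might look like. The three field names target_language, required_bounded_contexts, and acceptance_criteria are the ones this page mentions; their value types, the "name" field, and the idea of checking them up front are assumptions. The linked guide remains the source for the real shape.

```python
import json

# Field names mentioned on this page. Treating them as mandatory, and the
# value types used below, are assumptions made for this sketch.
EXPECTED_FIELDS = (
    "target_language",
    "required_bounded_contexts",
    "acceptance_criteria",
)

problem_spec = {
    "name": "todo-api",  # hypothetical field, not confirmed by the docs
    "target_language": "python",
    "required_bounded_contexts": ["tasks", "users"],
    "acceptance_criteria": [
        "A user can create a task with a title",
        "Completed tasks are excluded from the default listing",
    ],
}


def missing_fields(spec: dict) -> list[str]:
    """Return the expected fields that are absent or empty."""
    return [name for name in EXPECTED_FIELDS if not spec.get(name)]


# Serialize to JSON, the format the ProblemSpec is written in.
spec_json = json.dumps(problem_spec, indent=2)
```

A check like missing_fields is a cheap way to catch an empty acceptance_criteria list before spending any model calls on it.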
What you get
A runnable project: src/, tests/, requirements.txt, main.py, plus an eval_report.json with the tests_pass ratio.
What's different
- Architectural rigor enforced. Domain imports nothing, application imports only domain, infrastructure implements domain ports. The framework's own dependency_rule.py validator catches violations in generated code and in its own source. SOLID + GoF + DDD are the shared vocabulary between agent tiers; the rigid agent contracts keep quality consistent at the cheaper execution tier.
- Pattern-specialized atomic agents. One pattern per agent. One agent per file. Sub-80-line system prompts. 60 infrastructure agents across 15 categories. Each agent's contract is tight enough that a compact-tier language model satisfies it without supervision.
- Parallel fan-out across distributed architectures. Architects emit a multi-MODULE plan; agents run concurrently across classes within a module and across modules whose dependencies have resolved.
- Compact-tier cost. Most token volume routes to compact, low-parameter models; the larger tier is reserved for architectural decisions. Cost and wall-clock figures by problem are on the Benchmarks page.
- Cross-service contract fidelity. When two services produce/consume the same Kafka topic, the Contract Registry enforces field-shape agreement across language boundaries with case-tolerant validation.
- Six languages from one spec. Switch target_language to java and you get the equivalent Spring Boot project. Same architectural shape, idiomatic SDK calls. Deterministic replay across runs.
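The import rule in the first bullet is mechanical enough to check with a few lines of Python. The sketch below is not the framework's dependency_rule.py. It assumes the layers are top-level packages named domain, application, and infrastructure, and the row that lets infrastructure import application is also an assumption.

```python
import ast

# Which layers each layer may import from. The infrastructure row is an
# assumption; the text only says infrastructure implements domain ports.
ALLOWED = {
    "domain": set(),
    "application": {"domain"},
    "infrastructure": {"domain", "application"},
}


def imported_modules(source: str) -> list[str]:
    """Collect dotted module names from absolute import statements."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.append(node.module)
    return modules


def violations(layer: str, source: str) -> list[str]:
    """Return imports in `source` that reach into a layer `layer` may not use."""
    bad = []
    for module in imported_modules(source):
        target = module.split(".")[0]
        if target in ALLOWED and target != layer and target not in ALLOWED[layer]:
            bad.append(module)
    return bad
```

For example, violations("domain", "from infrastructure.kafka import Publisher") flags the import, while an application file importing from domain passes. The sketch only checks imports between layers; a stricter version would also reject third-party framework imports inside domain.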
What it's not
- Not a one-shot LLM call. Squeaky Clean orchestrates dozens of LLM calls per run, parallelized, with prompt caching and a strict per-tier cost budget.
- Not a substitute for understanding your domain. The framework asks you to declare your bounded contexts in required_bounded_contexts and your acceptance criteria in acceptance_criteria. Garbage spec → garbage generation.
- Not a code-completion tool. Squeaky Clean produces complete projects from a spec; it doesn't run inside your editor.
What to do next
- Get started — generate the Todo API in 5 minutes.
- Architecture deep-dive — three model tiers + agent hierarchy.
- Author your first ProblemSpec — walkthrough + best practices.