# Roadmap
Public, milestone-level. Last updated 2026-04-29.
## Shipped
| Milestone | What it does |
|---|---|
| A Measurement Foundation | Per-agent unit evals, replicate runs, regression detection, deterministic mode, cache visibility. |
| B Spec Architecture Cleanup | Shared specs + per-language profiles, prompt caching with cache_control, structured outputs. |
| C Multi-Module Architecture | Multi-MODULE Squib with cross-module DAG validation, per-module worktrees, layered output paths. |
| D Convergent ICP Optimization | DSPy POC closed with INCONCLUSIVE verdict on Haiku 4.5; hand-written ICP specs remain authoritative. |
| E Reliability, Cost, Security | Graceful agent failure, retry policy, cost budget, rate limiting, secret scan, SAST, reproducibility manifest. |
| F Language & Domain Coverage | Go and Rust profiles, sample-domain library (P5 OAuth2), user-supplied ProblemSpec, custom-pattern hook, richer ProblemSpec schema. |
| G Productionization | CI workflow, Dockerfile, JSON logger, latency/cost percentiles, resumable runs, history dashboard, versioned spec library. |
| H Generalized Infrastructure Layer | 60 Tier C ICPs (15 categories × 4 languages) with TechSpec catalog (~130 bundled snapshots), MCDA-driven choice selection, MCP + web-fetch resolver chain with anti-poisoning. |
| K Cross-language end-to-end gaps | Polymorphic class-parser, dependency installer, HTTP-conventions validator, per-module criterion filtering, JS/TS Tier C parity, registry-driven dispatch. Open-source launch blockers closed. |
## In progress
- Architectural Complexity Score (ACS) — composite metric for normalizing cost/velocity across heterogeneous problems. Implemented; calibrating across the canonical problem set.
## Planned (post-launch)
- Milestone I — SystemSpec for distributed services. Today, multi-service distributed systems require running each service as a separate ProblemSpec; cross-service contract fidelity is enforced via the registry. SystemSpec will let one declaration cover topology + services + resources together, generating all service codebases in one run.
- Anthropic-only abstraction. The `LLMGateway` port is multi-provider-ready; concrete adapters for OpenAI / Bedrock / local Llama are post-launch work. PRs welcome.
- Hosted dashboard service. A multi-user `meta-evaluation-results/` analysis service. Currently the dashboard is per-user static HTML.
- Versioned spec library at v1.0. The spec library is currently `0.1.0`. Tag a stable v1.0 once the catalog stabilizes after community feedback.
- Reduce architect HTTP-type drift. The validator catches it with retry; long-term we want the architect to never need a retry on this class of constraint.
- Per-language Tier C maturity. Today Java/Go/Rust/JS/TS report `tests_pass=0.00` in our event-pipeline benchmark because per-language test runners report zero on toolchain-availability fallback. Closing this requires CI-environment toolchain pinning plus occasionally tightening the language-specific code-emit rules.
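To make the Milestone I idea above concrete, a SystemSpec might bundle topology, services, and shared resources into one declaration. This is a hypothetical sketch only: the field names (`topology`, `services`, `resources`) and types are illustrative assumptions, not a committed schema.

```python
# Hypothetical SystemSpec sketch -- field names are illustrative,
# not a committed schema for the framework.
from dataclasses import dataclass


@dataclass
class ServiceDecl:
    name: str
    problem_spec: str  # path to an existing per-service ProblemSpec
    language: str      # e.g. "go", "rust"


@dataclass
class SystemSpec:
    topology: dict       # service -> downstream services it calls
    services: list       # one ServiceDecl per generated codebase
    resources: list      # shared infrastructure, e.g. queues, databases


spec = SystemSpec(
    topology={"orders": ["inventory"]},
    services=[
        ServiceDecl("orders", "specs/orders.yaml", "go"),
        ServiceDecl("inventory", "specs/inventory.yaml", "rust"),
    ],
    resources=["events-queue"],
)

# The cross-service edges in `topology` carry the contract information the
# registry enforces today, but resolvable within a single generation run.
for caller, callees in spec.topology.items():
    print(caller, "->", callees)
```

The point of the sketch is the shape, not the names: one declaration, many service codebases, with cross-service contracts derivable from the topology instead of being checked after the fact.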
## Open RFCs
The 12 open design questions in `docs/infrastructure_layer_design.md` §10 are the primary RFC seed material. Highlights:
- Build-time vs eval-time TechSpec resolution
- MCDA weights — problem-specific or framework-default
- Same category supporting concurrent technologies in one project
- SDK breaking changes between bundled and live-fetched
- TechSpecs language-specific or shared
## What we won't do
- Generate Dockerfiles, Kubernetes manifests, Terraform. Out of scope per the design doc — the framework generates code; provisioning is operator responsibility.
- Generate frontend UIs. UI codegen has different constraints (visual fidelity, design systems) that don't map cleanly onto Clean Architecture's port/adapter discipline. We'd ship a separate tool.
- Do domain inference. "You're building a Twitter clone, so timelines should include self" is exactly what the framework refuses to assume. Domain conventions go in the ProblemSpec; the framework doesn't guess.
## Versioning
The framework follows semver. v0.x is pre-launch; v1.0 ships when:
- All six languages have `tests_pass > 0` on the canonical event-pipeline benchmark.
- Spec library is tagged + frozen.
- CI green from a fresh clone with zero env-specific assumptions.
- ≥3 external users have shipped real apps generated by the framework.