
Roadmap

Public, milestone-level. Last updated 2026-04-29.

Shipped

Milestone | What it does
A. Measurement Foundation | Per-agent unit evals, replicate runs, regression detection, deterministic mode, cache visibility.
B. Spec Architecture Cleanup | Shared specs + per-language profiles, prompt caching with cache_control, structured outputs.
C. Multi-Module Architecture | Multi-MODULE Squib with cross-module DAG validation, per-module worktrees, layered output paths.
D. Convergent ICP Optimization | DSPy POC closed with INCONCLUSIVE verdict on Haiku 4.5; hand-written ICP specs remain authoritative.
E. Reliability, Cost, Security | Graceful agent failure, retry policy, cost budget, rate limiting, secret scan, SAST, reproducibility manifest.
F. Language & Domain Coverage | Go and Rust profiles, sample-domain library (P5 OAuth2), user-supplied ProblemSpec, custom-pattern hook, richer ProblemSpec schema.
G. Productionization | CI workflow, Dockerfile, JSON logger, latency/cost percentiles, resumable runs, history dashboard, versioned spec library.
H. Generalized Infrastructure Layer | 60 Tier C ICPs (15 categories × 4 languages) with TechSpec catalog (~130 bundled snapshots), MCDA-driven choice selection, MCP + web-fetch resolver chain with anti-poisoning.
K. Cross-language end-to-end gaps | Polymorphic class-parser, dependency installer, HTTP-conventions validator, per-module criterion filtering, JS/TS Tier C parity, registry-driven dispatch. Open-source launch blockers closed.

In progress

  • Architectural Complexity Score (ACS) — composite metric for normalizing cost/velocity across heterogeneous problems. Implemented; calibrating across the canonical problem set.
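
The roadmap does not publish the ACS formula, so the sketch below is only a shape illustration, assuming a Python codebase: a weighted sum of structural counts used as a divisor to normalize raw cost. The factor names and weights are invented for this sketch, not the calibrated values the milestone refers to.

```python
# Hypothetical illustration of a composite complexity score used to normalize
# cost across heterogeneous problems. Factors and weights are invented here.
ACS_WEIGHTS = {
    "modules": 0.4,       # MODULEs in the generated Squib
    "entities": 0.3,      # domain entities in the ProblemSpec
    "endpoints": 0.2,     # HTTP endpoints / handlers
    "integrations": 0.1,  # external technologies selected via MCDA
}


def architectural_complexity_score(counts: dict[str, int]) -> float:
    """Weighted sum of structural counts; higher means a harder problem."""
    return sum(w * counts.get(factor, 0) for factor, w in ACS_WEIGHTS.items())


def normalized_cost(raw_cost_usd: float, counts: dict[str, int]) -> float:
    """Cost per unit of architectural complexity, comparable across problems."""
    acs = architectural_complexity_score(counts)
    return raw_cost_usd / acs if acs else raw_cost_usd
```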

Planned (post-launch)

  • Milestone I — SystemSpec for distributed services. Today, multi-service distributed systems require running each service as a separate ProblemSpec; cross-service contract fidelity is enforced via the registry. SystemSpec will let one declaration cover topology + services + resources together, generating all service codebases in one run. A hypothetical shape is sketched after this list.
  • Provider adapters beyond Anthropic. The LLMGateway port is multi-provider-ready; concrete adapters for OpenAI / Bedrock / local Llama are post-launch work (a port sketch follows this list). PRs welcome.
  • Hosted dashboard service. A multi-user service for analyzing meta-evaluation-results/. Currently the dashboard is per-user static HTML.
  • Versioned spec library at v1.0. The spec library is currently at 0.1.0; a stable v1.0 will be tagged once the catalog stabilizes after community feedback.
  • Reduce architect HTTP-type drift. The validator currently catches it and triggers a retry; long-term, the architect should never need a retry on this class of constraint.
  • Per-language Tier C maturity. Today Java/Go/Rust/JS/TS report tests_pass=0.00 on our event-pipeline benchmark because the per-language test runners report zero when they hit the toolchain-availability fallback. Closing this requires CI-environment toolchain pinning and, in places, tightening the language-specific code-emit rules.
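
To make the SystemSpec item concrete, here is a purely hypothetical sketch of what a single declaration covering topology, services, and resources could look like, assuming a Python codebase. Every field name below is an assumption made for illustration; the actual schema does not exist yet.

```python
# Hypothetical only: field names are invented to illustrate "one declaration
# covering topology + services + resources". Today each service is a separate
# ProblemSpec and cross-service contract fidelity is enforced via the registry.
system_spec = {
    "name": "order-platform",
    "services": [
        {"name": "orders",  "problem_spec": "specs/orders.yaml",  "language": "go"},
        {"name": "billing", "problem_spec": "specs/billing.yaml", "language": "java"},
    ],
    "topology": [
        # orders calls billing; this contract is what the registry enforces today
        {"caller": "orders", "callee": "billing", "contract": "billing-api"},
    ],
    "resources": [
        {"kind": "queue", "name": "order-events"},
    ],
}
```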
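
For the provider-adapter item, here is a minimal sketch of the port/adapter split, assuming a Python codebase. The type and method names below are illustrative assumptions, not the actual LLMGateway interface; the only real API used is the Anthropic Python SDK messages call inside the adapter, and the model id is a placeholder.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class CompletionRequest:
    system: str
    prompt: str
    max_tokens: int = 1024


@dataclass
class CompletionResult:
    text: str
    input_tokens: int
    output_tokens: int


class LLMGatewayPort(Protocol):
    """Port: agents depend on this, never on a concrete provider SDK."""

    def complete(self, request: CompletionRequest) -> CompletionResult: ...


class AnthropicGateway:
    """Adapter: the only concrete provider today. OpenAI / Bedrock / local
    Llama adapters would implement the same port without touching the agents."""

    def __init__(self, client, model: str = "claude-haiku-4-5"):
        self._client = client  # an anthropic.Anthropic() instance
        self._model = model    # model id is illustrative

    def complete(self, request: CompletionRequest) -> CompletionResult:
        message = self._client.messages.create(
            model=self._model,
            system=request.system,
            max_tokens=request.max_tokens,
            messages=[{"role": "user", "content": request.prompt}],
        )
        return CompletionResult(
            text=message.content[0].text,
            input_tokens=message.usage.input_tokens,
            output_tokens=message.usage.output_tokens,
        )
```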

Open RFCs

The 12 open design questions in docs/infrastructure_layer_design.md §10 are the primary RFC seed material. Highlights:

  • Build-time vs eval-time TechSpec resolution
  • MCDA weights — problem-specific or framework-default
  • Same category supporting concurrent technologies in one project
  • SDK breaking changes between bundled and live-fetched snapshots
  • TechSpecs language-specific or shared

What we won't do

  • Generate Dockerfiles, Kubernetes manifests, Terraform. Out of scope per the design doc — the framework generates code; provisioning is operator responsibility.
  • Generate frontend UIs. UI codegen has different constraints (visual fidelity, design systems) that don't map cleanly onto Clean Architecture's port/adapter discipline. We'd ship a separate tool.
  • Do domain inference. "You're building a Twitter clone, so timelines should include self" is exactly what the framework refuses to assume. Domain conventions go in the ProblemSpec; the framework doesn't guess.

Versioning

The framework follows semver. v0.x is pre-launch; v1.0 ships when:

  1. All six languages have tests_pass > 0 on the canonical event-pipeline benchmark.
  2. Spec library is tagged + frozen.
  3. CI green from a fresh clone with zero env-specific assumptions.
  4. ≥3 external users have shipped real apps generated by the framework.