SDD isn't about managing AI agents, it's about managing context

By Eric Caskey · April 19, 2026 · 4 min read

The last post ended on a claim: the reason I can ship production software from a phone is that the thinking happens in specs and the typing happens elsewhere. This post is the thesis underneath that.

Spec-Driven Development is not a methodology for managing AI agents. It is a methodology for managing context. The specs are not documentation; they are the persistent memory that makes a stateless AI useful across sessions.

Why stateless is the real problem

A fresh Claude Code session does not remember the last one. Every session starts from zero: no memory of the stack, the conventions, last week's decisions, or the reason a particular function is implemented the way it is. This is the single most underappreciated fact about working with coding agents at scale.

Without specs, every session rebuilds context from scratch. The agent reads the codebase, guesses at conventions, infers design intent from whatever it happens to open first, and produces code that is plausible but not necessarily correct. Small changes land fine. Larger changes expose every gap.

With specs, a session that opens with "continue the contact form work" already has the stack, the conventions, the prior decisions, and the acceptance criteria. The spec is the handoff. Point a fresh session at it and the session is at full speed in one read.

The stateless/persistent split is the actual architecture:

Claude Code: stateless execution. Given a spec, it writes the code. It does not need to remember anything between sessions because the spec carries everything.
Perplexity Computer: persistent decisions. It remembers across sessions. It is where the thinking happens, where decisions are made, where specs get written.

The workflow depends on the right tool handling the right layer. Trying to make Claude Code carry state across sessions fails. Trying to make Perplexity implement features inside its planning context fails. Separation is the point.
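A toy model can make the split concrete. This is a sketch, not the author's tooling: `Planner` stands in for the persistent layer and `fresh_session` for a stateless one, both names are hypothetical, and the decision strings are illustrative. The only thing that crosses from one layer to the other is the spec text.

```python
from dataclasses import dataclass, field


@dataclass
class Planner:
    """Persistent layer (the Perplexity Computer role in the post).

    Decisions accumulate across sessions and get rendered into a spec.
    """

    decisions: list[str] = field(default_factory=list)

    def decide(self, decision: str) -> None:
        self.decisions.append(decision)

    def write_spec(self, feature: str) -> str:
        lines = [f"# Spec: {feature}", "", "## Decisions"]
        lines += [f"- {d}" for d in self.decisions]
        return "\n".join(lines)


def fresh_session(spec: str) -> list[str]:
    """Stateless layer (the Claude Code role in the post).

    Everything the session knows comes from the spec it is handed;
    nothing is kept between calls.
    """
    return [line[2:] for line in spec.splitlines() if line.startswith("- ")]


planner = Planner()
planner.decide("Validate the contact form on the server")  # illustrative
planner.decide("Reuse the existing email service")  # illustrative
spec = planner.write_spec("contact form")

# Two independent sessions handed the same spec start from the same context.
print(fresh_session(spec) == fresh_session(spec))  # True
print(fresh_session(spec))
# ['Validate the contact form on the server', 'Reuse the existing email service']
```

The stateless side takes the spec as its only input, so replacing one session with another changes nothing as long as the spec is current.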

The spec structure

Three components do almost all the work:

A specs repo, separate from the code and readable by both tools. This is not a /docs folder. It is its own repository, version-controlled and review-gated. See the public demo repo for a guided walkthrough of both package shapes; it is the public teaching artifact for the methodology behind my private work.

Two package shapes, chosen per project. An integration package (steering/ + feature/ + decision/) for evolving platform work where each feature lands into a live system with existing conventions. A domain package (product/ + domain/ + contracts/ + delivery/) for bounded greenfield apps where you are defining the domain as you go. One of each lives in the demo repo.

CLAUDE.md at every code repo root. The first file any AI session reads. It is a router: identity, folder structure, and a routing table that maps task types to the right specs. Forty to fifty lines, not four hundred. Routing table columns: task, go to, read first. That is the whole shape. A sample CLAUDE.md is in the demo repo.
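Why forty lines and not four hundred? Here is the arithmetic under two loudly stated assumptions: roughly 12 tokens per line of Markdown and 50 turns in a working session, with CLAUDE.md in context on every turn. Both numbers are illustrative guesses, not measurements, and `session_cost` is a hypothetical helper.

```python
TOKENS_PER_LINE = 12  # assumption: rough average for short Markdown lines
TURNS_PER_SESSION = 50  # assumption: a moderately long working session


def session_cost(claude_md_lines: int) -> int:
    """Tokens a CLAUDE.md of this length occupies across one session,
    assuming the whole file is part of the context on every turn."""
    return claude_md_lines * TOKENS_PER_LINE * TURNS_PER_SESSION


for lines in (45, 400):
    print(f"{lines:>3} lines -> {session_cost(lines):,} tokens per session")
# prints:
#  45 lines -> 27,000 tokens per session
# 400 lines -> 240,000 tokens per session
```

The exact figures matter less than the shape: every line added to the router is multiplied by every turn of every session.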

The CLAUDE.md file is the most expensive real estate in the whole system. Every token in it is spent on every turn, in every session, forever. Treat it accordingly. Put routing there. Put everything else in workspace-level CONTEXT.md files that only load when the agent enters that workspace.
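The router-plus-workspace pattern can be sketched in a few lines. This is a minimal illustration under assumptions, not the demo repo's code: the CLAUDE.md text, the `features/` and `content/` workspaces, and the helpers `parse_routes` and `assemble_context` are all hypothetical. The routing table uses the three columns named above, and "read first" is taken to point at a file inside the "go to" workspace.

```python
import tempfile
from pathlib import Path

# Hypothetical CLAUDE.md: identity, then a routing table with the
# task / go to / read first columns. Rows are illustrative only.
CLAUDE_MD = """\
# Identity: example site repo

| Task | Go to | Read first |
|------|-------|------------|
| New feature | features | CONTEXT.md |
| Content edit | content | CONTEXT.md |
"""


def parse_routes(claude_md: str) -> dict[str, tuple[str, str]]:
    """Map each task (lowercased) to its (go to, read first) pair.
    The first two table rows are the header and the |---| separator."""
    rows = [line for line in claude_md.splitlines() if line.startswith("|")]
    routes = {}
    for row in rows[2:]:
        task, go_to, read_first = (cell.strip() for cell in row.strip("|").split("|"))
        routes[task.lower()] = (go_to, read_first)
    return routes


def assemble_context(repo: Path, task: str) -> str:
    """First-turn context: the router, plus only the routed workspace's file."""
    claude_md = (repo / "CLAUDE.md").read_text()
    go_to, read_first = parse_routes(claude_md)[task.lower()]
    return claude_md + "\n" + (repo / go_to / read_first).read_text()


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "CLAUDE.md").write_text(CLAUDE_MD)
    for workspace in ("features", "content"):
        (repo / workspace).mkdir()
        (repo / workspace / "CONTEXT.md").write_text(f"{workspace} conventions\n")
    context = assemble_context(repo, "New feature")

print("features conventions" in context)  # True
print("content conventions" in context)  # False
```

Sibling workspaces are never opened, so their conventions cannot leak into a session routed elsewhere.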

What breaks

Three failure modes are worth naming.

Spec rot against a stateless reader. A spec describing a design that shipped differently becomes a wrong answer the AI trusts. The fix is not more process; it is the discipline to either update the spec when the code changes or mark it deprecated. Drift is the single biggest risk.

Context window limits. Loading a full package into every session crowds out the code itself. The CLAUDE.md routing table exists precisely to avoid this. The agent reads the router, goes to the right workspace, loads only that workspace's context, and works. Planning context doesn't contaminate implementation context. Client A doesn't leak into Client B.

Treating the spec like documentation. Documentation describes something that exists. A spec describes something to build. When a spec turns into a write-up of the finished code, it has stopped doing its job. Specs are forward-looking artifacts that happen to survive as a record. The record is a side effect.
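The first failure mode, spec rot, can be partly caught by a script. A minimal sketch, assuming each spec records a status, the date it was last updated, and the code paths it covers (all hypothetical fields); in a real repo the code dates might come from git history. It flags active specs whose code changed after the spec did and leaves the human decision, update or deprecate, to the reviewer.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Spec:
    name: str
    status: str  # hypothetical field: "active" or "deprecated"
    updated: date  # when the spec itself was last edited
    covers: list[str]  # code paths the spec describes


def stale_specs(specs: list[Spec], code_changed: dict[str, date]) -> list[str]:
    """Return names of active specs whose covered code changed after the spec.

    Deprecated specs are skipped, since marking them is one of the two fixes.
    Paths with no recorded change default to date.min and never trigger a flag.
    """
    stale = []
    for spec in specs:
        if spec.status == "deprecated":
            continue
        if any(code_changed.get(path, date.min) > spec.updated for path in spec.covers):
            stale.append(spec.name)
    return stale


specs = [
    Spec("contact-form", "active", date(2026, 4, 10), ["src/contact.ts"]),
    Spec("old-newsletter", "deprecated", date(2026, 1, 5), ["src/newsletter.ts"]),
]
code_changed = {"src/contact.ts": date(2026, 4, 15), "src/newsletter.ts": date(2026, 3, 1)}
print(stale_specs(specs, code_changed))  # ['contact-form']
```

A check like this only detects possible drift; deciding whether the spec or the code is the one that is wrong stays a human call.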

The claim this post is making

The discipline is in the specs. The automation is in the pipeline. And the part that lets a stateless agent ship reliable production code is not the agent; it is the context the agent loads on turn one.

Spec-driven development looks like a way of keeping AI honest. What it is actually doing is giving a stateless tool a persistent memory that lives outside it, on purpose, where humans can review it, gate it, and change it.

The next post is the numbers. Same week, same workflow, pulled straight from GitHub.

Previous in the series: I shipped two production sites and a blog from my phone

Next in the series: One week of SDD in production: the numbers

Read the methodology:

caskeycoding-specs-demo: public teaching artifact
ADR-003: Spec-Driven Development
ADR-004: SDD File Structure

Written by Eric Caskey. I build AI tools you can actually use. Explore the Tools or see the case studies.