Caskey Engineering

← Back to Blog

SDD isn't about managing AI agents, it's about managing context

The last post ended on a claim: the reason I can ship production software from a phone is that the thinking happens in specs and the typing happens elsewhere. This post is the thesis underneath that.

Spec-Driven Development is not a methodology for managing AI agents. It is a methodology for managing context. The specs are not documentation, they are the persistent memory that makes a stateless AI useful across sessions.

Why stateless is the real problem

Claude Code does not remember the last session. Every session starts from zero, no memory of the stack, the conventions, the decisions from last week, the reason a particular function is implemented the way it is. This is the single most underappreciated fact about working with coding agents at scale.

Without specs, every session rebuilds context from scratch. The agent reads the codebase, guesses at conventions, infers design intent from whatever it happens to open first, and produces code that is plausible but not necessarily correct. Small changes land fine. Larger changes expose every gap.

With specs, a session that opens with "continue the contact form work" already has the stack, the conventions, the prior decisions, and the acceptance criteria. The spec is the handoff. Point a fresh session at it and the session is at full speed in one read.

The stateless/persistent split is the actual architecture:

  • Claude Code, stateless execution. Given a spec, it writes the code. It does not need to remember anything between sessions because the spec carries everything.
  • Perplexity Computer, persistent decisions. It remembers across sessions. It is where the thinking happens, where decisions are made, where specs get written.

The workflow depends on the right tool handling the right layer. Trying to make Claude Code carry state across sessions fails. Trying to make Perplexity implement features inside its planning context fails. Separation is the point.

The spec structure

Three components do almost all the work:

A specs repo, separate from code, readable by both tools. This is not a /docs folder. It is its own thing, version-controlled, review-gated. See the public demo repo for a guided walkthrough of both package shapes, it is the public teaching artifact for the methodology behind my private work.

Two package shapes, chosen per project. An integration package (steering/ + feature/ + decision/) for evolving platform work where each feature lands into a live system with existing conventions. A domain package (product/ + domain/ + contracts/ + delivery/) for bounded greenfield apps where you are defining the domain as you go. One of each lives in the demo repo.

CLAUDE.md at every code repo root. The first file any AI session reads. It is a router, identity, folder structure, and a routing table that maps task types to the right specs. Forty to fifty lines, not four hundred. Routing table columns: task, go to, read first. That is the whole shape. A sample CLAUDE.md is in the demo repo.

The CLAUDE.md file is the most expensive real estate in the whole system. Every token in it is spent on every turn, in every session, forever. Treat it like one. Put routing there. Put everything else in workspace-level CONTEXT.md files that only load when the agent enters that workspace.

What breaks

Three failure modes are worth naming.

Spec rot against a stateless reader. A spec describing a design that shipped differently becomes a wrong answer the AI trusts. The fix is not more process, it is a discipline to either update the spec when the code changes, or mark the spec deprecated. Drift is the single biggest risk.

Context window limits. Loading a full package into every session crowds out the code itself. The CLAUDE.md routing table exists precisely to avoid this. The agent reads the router, goes to the right workspace, loads only that workspace's context, and works. Planning context doesn't contaminate implementation context. Client A doesn't leak into Client B.

Treating the spec like documentation. Documentation describes something that exists. A spec describes something to build. When a spec turns into a write-up of the finished code, it has stopped doing its job. Specs are forward-looking artifacts that happen to survive as a record. The record is a side effect.

The claim this post is making

The discipline is in the specs. The automation is in the pipeline. And the part that lets a stateless agent ship reliable production code is not the agent, it is the context the agent loads on turn one.

Spec-driven development looks like a way of keeping AI honest. What it is actually doing is giving a stateless tool a persistent memory that lives outside it, on purpose, where humans can review it, gate it, and change it.

The next post is the numbers. Same week, same workflow, pulled straight from GitHub.


Previous in the series: I shipped two production sites and a blog from my phone

Next in the series: One week of SDD in production: the numbers

Read the methodology: