Context architecture beats documentation dumps

By Eric Caskey · June 8, 2026 · 11 min read

The instinct, when an AI agent gets something wrong, is to give it more. More docs, more examples, more of the codebase pasted into the prompt. It feels like the responsible move. It is almost always the wrong one.

More context does not make an agent more correct. Past a point, it makes it worse. The agent loses the thread of the task, weights an irrelevant file as if it mattered, and produces code that is plausibly wrong in a way that is expensive to catch. The failure does not look like a refusal. It looks like a confident answer built on the wrong half of what you handed it.

The thesis I keep coming back to: context architecture beats documentation dumps. Each task loads a curated spec slice, not the whole corpus. The work is not writing more documentation; it is deciding what a given task is allowed to see.

What makes that architecture and not housekeeping is the constraint underneath it: the context window is finite, and it is the scarce resource everything else is competing for. Which also means the problem is not new. Reaching a store far larger than your working memory is one of the oldest problems in systems engineering, and the operating system solved its shape decades ago: directories to partition, paths to find what you need. What follows is that idea ported to an agent's context: the method, the three places I have watched it hold, and the part that costs something, because it does.

Why more makes it worse

A coding agent does not read context the way a person skims a wiki and ignores the irrelevant parts. Everything in the window is a candidate signal. Give it the auth module, the billing service, three architecture decisions, and last quarter's migration notes, and ask it to fix a date-formatting bug, and you have not given it more help. You have given it more ways to be wrong. It can anchor on a convention from the billing service that does not apply, or reconcile two decisions that were never meant to be read together.

There is also a hard ceiling, and it is the whole reason this is an engineering problem and not a filing preference: the context window is finite. Every token spent on context the task does not need is a token not spent on the code it does, so loading a full spec package into every session crowds out the work itself. The corpus is not the asset. The right slice of it, in the window at the right moment, is.

So the question stops being "what does the agent need to know about the system" and becomes "what does this task need to see, and nothing else." That is an architecture question, not a documentation one.

The method: partition, then route

The method has two halves. Partition the corpus, then route each task to its slice.

Partitioning means the specs are organized so that a slice is a coherent, loadable unit: by service, by feature domain, by architecture layer. Not one monolithic document, and not a thousand undifferentiated files either. A partition is the amount of context one kind of task needs to be done well.

Routing means a task arrives and something decides which slice it loads. The cheapest version of this is a router file at the root of each repo, a CLAUDE.md that is forty to fifty lines, not four hundred. Identity, folder structure, and a table that maps task types to the workspace and the context file they read first. That file is the most expensive real estate in the system, because every token in it is spent on every turn of every session, forever. So it holds routing and nothing else. The heavy context lives in workspace files that load only when a task enters that workspace.
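The routing table described above can be sketched in a few lines. This is a minimal illustration, not the contents of any real CLAUDE.md: the task types, workspace paths, and the `Route` and `route` names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    workspace: str      # directory the task works in
    context_file: str   # file the session reads first


# Hypothetical routing table; task types and paths are illustrative only.
ROUTES = {
    "plan": Route("planning/", "planning/CONTEXT.md"),
    "implement": Route("src/", "src/CONTEXT.md"),
    "deploy": Route("infra/", "infra/CONTEXT.md"),
}


def route(task_type: str) -> Route:
    """Resolve a task type to the one slice it should load first."""
    try:
        return ROUTES[task_type]
    except KeyError:
        # Fail loudly rather than fall back to loading everything.
        raise ValueError(f"no route for task type {task_type!r}") from None
```

The design choice worth noticing is the failure path: an unknown task type raises instead of defaulting to "load it all", which would quietly reintroduce the dump.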

Concretely, this is the filesystem, and it solves the same half of the problem the OS always solved: addressing, not relevance. A disk holds far more than memory can, so a process never maps the whole disk into itself; the OS exposes a hierarchical namespace and the process opens the path it needs. Partitioning is directories. Routing is path resolution. The CLAUDE.md is a resolver, a workspace is a directory, and a session opens the slice it needs and leaves the rest on disk. The context window is the new scarce memory, and the discipline for reaching a large store through a small window is as old as the filesystem.
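To make "opens the slice it needs and leaves the rest on disk" concrete, here is a small sketch using only the standard library. The `open_slice` and `demo` names and the `auth`/`billing` layout are assumptions for illustration; a real setup would decide for itself which files count as part of a slice.

```python
from pathlib import Path
import tempfile


def open_slice(root: Path, workspace: str) -> dict[str, str]:
    """Read only the Markdown files under one workspace directory."""
    root_resolved = root.resolve()
    base = (root / workspace).resolve()
    # Refuse paths that escape the repo root, e.g. "../other-client".
    if not base.is_relative_to(root_resolved):
        raise ValueError(f"workspace {workspace!r} is outside {root}")
    return {
        p.relative_to(root_resolved).as_posix(): p.read_text(encoding="utf-8")
        for p in sorted(base.rglob("*.md"))
    }


def demo() -> list[str]:
    """Build a throwaway two-workspace repo and open only one of them."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("auth", "billing"):
            (root / name).mkdir()
            (root / name / "spec.md").write_text(f"{name} rules", encoding="utf-8")
        return sorted(open_slice(root, "auth"))
```

Calling `demo()` returns `["auth/spec.md"]`: the billing spec exists on disk but never enters the loaded context.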

The discipline this produces: planning context never contaminates implementation context, one client's specs never leak into another's, and a fresh session reaches full speed on one read instead of reconstructing the world from whatever file it happened to open first. I wrote up the mechanics of that in "SDD isn't about managing AI agents, it's about managing context." This post is the principle behind those mechanics.

Partitioning needs linking, or it's just silos

Partitioning has an obvious failure mode. You cut the corpus into clean slices, and then the slices stop knowing about each other. That is not architecture. That is a filing cabinet. The value was never the cutting: it was that the right slice loads with the right connections still attached.

So linking matters as much as partitioning, in a specific way: a link is how a task learns it needs a second slice. The routing table is itself the first and most important link: task to slice. Without it a bounded slice is an orphan, because nothing connects the work to the context. Partitioning makes the slices small. The router is what makes them findable.

The sharper case is the seam between two slices that have to move together. I have hit this on my own stack. A CloudFront rewrite function was pinned to one trailing-slash setting while the framework config flipped to the other, and for a few hours every sub-route on the site served the home page. Two partitions, two sides of one contract, no explicit link between them, so they drifted, and production broke. (I wrote that incident up here.) The lesson is not "stop partitioning." It is "name the link." The fix was to make the two sides reference each other as a pair that changes in lockstep, so the next person editing one is pointed straight at the other.
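One way to "name the link" is a check that treats the two settings as a pair and fails when they disagree. The sketch below is an assumption about how such a check could look, not the fix that was actually shipped; the file paths and the `LinkedSetting` and `check_pair` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedSetting:
    name: str             # which side of the contract this is
    source: str           # hypothetical file where the value lives
    counterpart: str      # the file that must change with it
    trailing_slash: bool  # the value both sides must agree on


def check_pair(a: LinkedSetting, b: LinkedSetting) -> list[str]:
    """Return readable problems; an empty list means the seam is consistent."""
    problems = []
    if a.counterpart != b.source or b.counterpart != a.source:
        problems.append(f"{a.name} and {b.name} do not name each other as counterparts")
    if a.trailing_slash != b.trailing_slash:
        problems.append(
            f"trailing-slash mismatch: {a.source}={a.trailing_slash}, "
            f"{b.source}={b.trailing_slash}"
        )
    return problems


# Hypothetical pair: both sides agree and point at each other.
cdn = LinkedSetting("cdn rewrite", "cdn/rewrite.js", "site/framework.config", True)
framework = LinkedSetting("framework config", "site/framework.config", "cdn/rewrite.js", True)
```

With both sides set to the same value, `check_pair(cdn, framework)` returns an empty list. Flip one side and the message names both files, which is the point: the person editing one is sent to the other.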

That is the whole discipline of linking: sparse and directional. The router points a task at its slice. A spec cites the decision that governs it. A config pair names its counterpart. What you do not do is wire everything to everything: that just reconstitutes the corpus you partitioned away, and you are back to dumping. The skill is choosing which seams get a link and leaving the rest unconnected, on purpose.
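"Sparse and directional" can be expressed as a link table that is followed exactly one hop. This is a sketch under stated assumptions: the slice names are invented for illustration, and a one-hop limit is one possible policy, not the only reasonable one.

```python
# Hypothetical link table: each slice lists only the slices it must be read with.
LINKS = {
    "cdn-rewrite": ["framework-config"],
    "framework-config": ["cdn-rewrite"],
    "billing-spec": ["adr-billing-rounding"],
    "adr-billing-rounding": [],
    "auth-spec": [],
}


def slices_for(entry: str) -> list[str]:
    """Return the entry slice plus its direct links, one hop only."""
    if entry not in LINKS:
        raise KeyError(f"unknown slice {entry!r}")
    # Deliberately not transitive: following links of links would drift
    # back toward loading the whole corpus.
    return [entry, *LINKS[entry]]
```

A task entering `billing-spec` loads that slice and the one decision it cites. A task entering `auth-spec` loads one slice and nothing else.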

In the diagram, the solid edges are routing: each task drops into one slice. The dotted edges are the links that keep the slices honest: shared conventions inherited from the hub, and the named seam between two partitions that must change together. Subtract everything else.

The same move, three different layers

The reason I trust this is repetition. The same move keeps paying off where the scale, the corpus, and even the kind of context are completely different.

At enterprise scale, the curated layer is specs. The largest instance is around 138 spec files partitioned across roughly a dozen services by feature domain and architecture layer. The agents are not one generalist that has read everything; they are a small set of specialist agents (on the order of seven), each scoped to a layer and inheriting shared conventions from a central hub. A specialist loads something like five to eight thousand tokens of relevant context per task, not the whole corpus. The same specs serve three audiences from one source: the agents that build, the engineers who review at design time, and a generated wiki for everyone downstream. A monolithic dump could not serve any one of those well, let alone all three.

That is the version with a spec corpus and a team of agents behind it, and the scale is the part that is easiest to dismiss as a big-company luxury. So strip the scale away. Two of my own projects run on the same move, and neither has a spec corpus at all. That is the actual claim: context architecture is not a discipline you earn at 138 specs. It reappears the moment an agent has to decide anything from data, which is to say almost immediately.

In my marathon coach, the curated layer is sensor data. The coach makes a go or no-go call on every workout, and the context that decides it is not my entire Garmin history. It is a deliberately narrow slice: the recovery signals that actually gate a hard session (resting heart rate, heart-rate variability, last night's sleep, training readiness) read against recent training load. That slice is the architecture. When the coach tells me to back off a tempo run, it is weighing this morning's recovery against the plan; a generic plan handed the full firehose of every metric a watch emits would just run the calendar and tell me to do the tempo because it is tempo day. The curation is defensive, too: a wellness metric that fails to load stays empty rather than becoming a fake zero, because a wrong number in that slice would poison the recommendation more quietly than a missing one ever could. I wrote that integration up here. It is the platform pattern one layer down: bound what the agent sees to what the decision needs.
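The "empty rather than a fake zero" rule is easy to show in code. The sketch below is not the coach's actual integration; the field names, the shape of the raw payload, and the helper names are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class RecoverySlice:
    resting_hr: Optional[float]
    hrv: Optional[float]
    sleep_hours: Optional[float]
    training_readiness: Optional[float]


def parse_metric(raw: dict, key: str) -> Optional[float]:
    """Return the metric as a float, or None if it is absent or unreadable."""
    value = raw.get(key)
    if value is None:
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        # An unreadable value stays missing; it never becomes 0.0.
        return None


def build_slice(raw: dict) -> RecoverySlice:
    """Keep only the recovery signals, preserving gaps as None."""
    return RecoverySlice(
        resting_hr=parse_metric(raw, "resting_hr"),
        hrv=parse_metric(raw, "hrv"),
        sleep_hours=parse_metric(raw, "sleep_hours"),
        training_readiness=parse_metric(raw, "training_readiness"),
    )


def missing_metrics(recovery: RecoverySlice) -> list[str]:
    """Name the fields a downstream decision cannot rely on."""
    return [name for name, value in asdict(recovery).items() if value is None]
```

Keeping the gap as `None` forces whatever makes the go or no-go call to handle missing data explicitly, instead of reading a zero as a real measurement.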

In my investment review tool, the curated layer is perspective. A five-persona committee evaluates a position, and each persona receives only the financial dimensions relevant to its lens: the growth view is not handed the same inputs as the balance-sheet view, and none of them gets the whole dump. The scoping is the design, not a formatting choice: a moat read is grounded in the current financial indicators that bear on it, not in free-form commentary over everything at once. Give every persona the whole dump and they converge into one mushy averaged take; give each one its slice and the disagreement between them becomes the signal you were actually after.
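Per-persona scoping amounts to projecting one set of indicators onto a different subset for each lens. This sketch shows only two personas, and every persona name and indicator key in it is hypothetical rather than taken from the tool.

```python
# Hypothetical mapping from persona to the indicators its lens is allowed to see.
PERSONA_FIELDS = {
    "growth": ("revenue_growth", "gross_margin"),
    "balance_sheet": ("debt_to_equity", "current_ratio"),
}


def slice_for(persona: str, indicators: dict[str, float]) -> dict[str, float]:
    """Project the full indicator set down to one persona's lens."""
    wanted = PERSONA_FIELDS[persona]
    # Indicators outside the lens are dropped, not merely ignored later.
    return {key: indicators[key] for key in wanted if key in indicators}


# Illustrative numbers only.
indicators = {
    "revenue_growth": 0.18,
    "gross_margin": 0.62,
    "debt_to_equity": 0.4,
    "current_ratio": 1.9,
}
```

Here `slice_for("growth", indicators)` contains no balance-sheet figures at all, so the growth view cannot average them into its read.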

Three layers (specs, sensor data, evaluation lenses) and the same result every time. The corpus changes. The move does not: bound what the agent sees to what the task needs, and the output gets sharper, not poorer.

Something has to read the slice

Partitioning, routing, and linking are all about organizing context. They say nothing about what consumes a slice once it loads, and that is the other half. A slice can be read by a script, a skill, an agent, or a guardrail that fires in CI, and the choice matters as much as the partition did. The test I use is determinism: push each slice to the leanest thing that can handle it. Deterministic, mechanical work is a script. Deterministic enforcement (the checks that must never be skipped) is a hook that runs without anyone remembering to invoke it. Fixed steps with a little judgment are a skill, which is really a curated slice made executable: it loads one procedure and the minimal context that procedure needs, not the whole repo. Only open-ended judgment gets an agent.
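The determinism test above reduces to a short decision function. The three boolean inputs are a simplification I am assuming for the sketch; real work rarely classifies this cleanly, and the names are hypothetical.

```python
from enum import Enum


class Consumer(Enum):
    SCRIPT = "script"
    HOOK = "hook"
    SKILL = "skill"
    AGENT = "agent"


def choose_consumer(
    deterministic: bool,
    must_never_skip: bool = False,
    fixed_steps: bool = False,
) -> Consumer:
    """Pick the leanest consumer that can handle a piece of work."""
    if deterministic and must_never_skip:
        return Consumer.HOOK      # enforcement that runs without being invoked
    if deterministic:
        return Consumer.SCRIPT    # mechanical work
    if fixed_steps:
        return Consumer.SKILL     # fixed procedure with a little judgment
    return Consumer.AGENT         # open-ended judgment only
```

The order of the checks carries the policy: an agent is the fallthrough, reached only when nothing leaner fits.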

That ordering is the same discipline as the rest of this post, one level up. Routing every task to a single all-powerful agent is the execution-layer version of dumping the whole corpus: too much latitude over too much context. So you bound it (determinism down, judgment up) and you scope each agent to its slice the way you scoped the spec to its task. A standard procedure (an SOP, a deploy runbook, the written constitution I handed a self-governing box) is itself a curated slice, loaded when its task fires and not before. The strictest version of the idea is a guardrail that can only pattern-match a command. That is a partition of authority rather than context (file permissions are the fifty-year-old version), but it is the same move. Which work becomes a skill, which an agent, which a hook is its own essay. The principle is the one you already have: curate what each piece sees, and curate what each piece is allowed to do.
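A guardrail that can only pattern-match a command might look like the following allowlist. The patterns are hypothetical examples, and this is a sketch of the idea rather than a hardened security control; it uses only the standard `re` module.

```python
import re

# Hypothetical allowlist: read-only git commands and test runs, nothing else.
ALLOWED = [
    re.compile(r"git (status|diff|log)( [\w./=-]+)*"),
    re.compile(r"pytest( [\w./:=-]+)*"),
]


def is_allowed(command: str) -> bool:
    """True only if the whole command matches one allowlisted pattern."""
    candidate = command.strip()
    # fullmatch means a permitted prefix cannot smuggle in a suffix
    # such as "; rm -rf /".
    return any(pattern.fullmatch(candidate) for pattern in ALLOWED)
```

The guardrail has no judgment and needs none: `is_allowed("git diff --stat")` passes, while `is_allowed("git push origin main")` and `is_allowed("git status; rm -rf /")` do not.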

What it costs

This is not free, and pretending it is would undercut the point. The cost is the links. A task that genuinely spans partitions has to load more than one slice, and the link between them has to exist and stay current: that is real maintenance, and it carries the same drift risk a spec does. A cross-cutting change, one that touches the orchestration layer and the monitoring layer at once, only stays safe as long as the seam between them is named and kept honest. At a dozen services that overhead is manageable. I would not claim it scales linearly forever, and I would be suspicious of anyone who did.
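Part of that maintenance can be mechanical. As a sketch, assuming links are kept in a simple table of slice names (the table shape and the names below are hypothetical), a check can at least catch links that point at slices that no longer exist. It cannot tell whether a link that resolves is still accurate; that part stays human.

```python
def dangling_links(links: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (source, target) pairs whose target is not a known slice."""
    return sorted(
        (source, target)
        for source, targets in links.items()
        for target in targets
        if target not in links
    )


# Hypothetical link table in which one target was renamed and never updated.
example_links = {
    "orchestration": ["monitoring"],
    "monitoring": ["orchestration", "alerting-v1"],
}
```

Run in CI, `dangling_links(example_links)` reports the `monitoring` to `alerting-v1` link, turning one kind of drift into a failing check.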

The honest framing is that you are trading the comfort of "it's all in there somewhere" for the discipline of deciding, per task, what "there" means. That trade is worth making because the comfort was an illusion. "It's all in there" is exactly the condition that produces confident, wrong output.

The shape of it

Restraint is the value. Context architecture is the method. The corpus can be as large as it needs to be (138 specs, a dozen services, years of training data) as long as no single task has to swallow it whole. The window is small on purpose. The disk was always bigger than the memory, and the answer was never a bigger memory; it was an address, and the discipline to open only what the task needs. The skill is curation, not accumulation: knowing what to withhold from a given task so that what remains is the part that matters.

If you take one thing from this: the next time an agent gets it wrong, resist the urge to add. Ask what it should not have been looking at.

The methodology underneath this: SDD isn't about managing AI agents, it's about managing context

See the shape of the system:

caskeycoding-specs-demo: two example spec packages and a sample CLAUDE.md
ADR-003, Spec-Driven Development
ADR-004, SDD File Structure

Written by Eric Caskey. I build AI tools you can actually use. Explore the Tools or see the case studies.