An orchestration mode is only as good as its backlog

By Eric Caskey · May 31, 2026 · 4 min read

AI claude-code claude-api orchestration multi-agent spec-driven-development

Anthropic published a guide on building an orchestration mode: a session-level mode that grants the model standing consent to fan out to parallel subagents, switched on and off with mid-conversation system messages. No special API parameter, just three documented pieces stacked together. I read it, built it, and then ran straight into the thing it does not talk about. More on that at the end.

What the mode actually is#

The clever part of the guide is that "orchestration mode" is not a feature. It is an emergent behavior you assemble from three primitives:

High effort. Every request runs at a high effort level so the model optimizes for the most thorough answer rather than the cheapest one.
Mode reminders. Mid-conversation system messages carry the state: a full instruction block when the mode turns on, a one-line refresher every few turns, an exit notice when it turns off. They are placed after the user turn so the cached prefix ahead of them stays intact.
Standing consent. The fan-out tool's own description says that while a system message confirms the mode is on, the model should author and run a workflow by default, without stopping to ask each time.

Turn those three on together and the agent stops being polite about parallelism. It decomposes the task, spins up a wave of subagents, and synthesizes the results, every time, until you tell it to stop.

Our setup: two backends, same pattern#

I wanted this both ways. The API version bills per token. The CLI version runs on a Claude Code subscription, which is the one I actually wanted to lean on. So I built both as one-shot tools that share the same shape.

API. A faithful port of the guide's reference implementation into a single orchestration_mode.py. A ModeAgent loop holds the message history and toggles the mode with set_mode(). A Workflow tool fans subtasks across a thread pool, capped at ten. Each subagent is its own nested loop with a shell tool and a structured report_findings tool, isolated so one failure returns an error string instead of crashing the run. One command, one orchestrated turn, one synthesized report.

CLI. A custom /orchestrate slash command for Claude Code. Same pattern, different plumbing:

API version	CLI equivalent
Custom `Workflow` tool that fans out	The built-in Agent / Task tool, parallel subagents
High effort level (`xhigh`)	Opus plus extended thinking ("ultrathink"); the exact analog is Claude Code's `ultracode` mode, which pairs `xhigh` with standing multi-agent consent
Standing consent in the tool description	The instructions inside the slash command

The command scouts the task inline first, decomposes it into independent subtasks, fans out a wave of parallel subagents, runs a second adversarial wave that checks the first wave's findings against the source, and then synthesizes. Invoking the command is itself the standing consent, which matters because the default posture is to not spawn agents unasked.

Both are honest one-shots. You point them at a task, they multiply effort across it, you get one consolidated answer back.

The part the guide leaves out#

Here is where I got stuck, and it is not a technical problem. It is a purpose problem.

An orchestrator is an engine for multiplying effort across a work-list. That is the whole value: take N independent units of work, run them at once, verify them against each other, synthesize. The mode is built to treat cost as a non-constraint precisely because the payoff is supposed to be a lot of real work done in parallel.

But an engine with no work-list is just an expensive way to answer one question with ten agents instead of one. If I fire /orchestrate at a vague prompt, the model spends its first move inventing subtasks, and invented subtasks are exactly the low-value, overlapping busywork the fan-out was supposed to avoid. The mode does not supply purpose. It assumes you already have it.

So the missing input is a backlog. Not a vague intention, an actual queue of independent, verifiable units of work the orchestrator can pull from and grind through. The quality of the run is bounded by the quality of that list. Garbage backlog, garbage fan-out, real bill.

This lands in a familiar place for how I already work. Spec-driven development is, among other things, a backlog generator. A specs repo full of feature specs, each with acceptance criteria, is a work-list that is already decomposed into independent and verifiable units. The followups files I leave at the end of a session are the same thing at smaller grain. The orchestrator does not need me to think up tasks on the spot. It needs me to keep a good backlog, and then it is the thing that works through it.

That is the next build. The mode is the engine, and it runs. The backlog is the fuel, and that is on me. An orchestration mode is only as good as the list of work you hand it.

Keep reading

Demo

Watch the agent write

A polish agent drafts an essay against a pre-approved topic.

Read

Case study

Multi-Region Workflow Orchestration Platform

Platform supporting millions of executions across multiple global regions, expanding adoption across Amazon.

Read

Post

Autonomy is mostly knowing when to stop

I handed a backlog to Claude Fable, told it once it could merge, and let it run. It shipped seventeen items across five repos. The line that mattered was not in the work it finished. It was in the work it refused to touch.

Read

Post

Ten days of June: the SDD velocity numbers, seven weeks in

In April I published one week of SDD production numbers. The same data trail rerun for June 1 through 10 shows the velocity curve: 309 PRs opened, 293 merged, about 185 production deploys, and one footnote about outrunning GitHub Actions' default limits.

Read

Post

Context architecture beats documentation dumps

Dumping the whole corpus into an AI agent makes it worse, not better. The fix is architectural: each task loads a curated slice, not everything you have. Here is the method, and the same move at three different layers: specs, sensor data, and evaluation lenses.

Read

Post

One week of SDD in production: the numbers

The previous two posts made claims. Here is what a week of the workflow looks like as a data trail, PRs, deploys, CI runs, specs merged, pulled from GitHub.

Read

Follow the work

New tools and writing as they ship — pick a channel.

RSS feed LinkedIn

Written by Eric Caskey. I build AI tools you can actually use. Explore the Tools or see the case studies.