Ten days of June: the SDD velocity numbers, seven weeks in

By Eric Caskey · June 10, 2026 · 6 min read

AI spec-driven-development claude-code metrics engineering-velocity ci-cd

In April I published one week of SDD production numbers: a launch week, five repositories, about a hundred cumulative code PRs across the whole platform. This post reruns the same queries against the GitHub API for June 1 through June 10, 2026. Ten days, nine actively-built repositories, four products.

Two things carry the story. The first is the velocity curve: 309 pull requests opened and 293 merged in ten days, with about 185 production deploys behind them. The mix is feature work, not churn: nearly half of the merged code PRs carry a feat prefix. The platform's first hundred cumulative code PRs took five weeks; June now clears that every four days. The second is a cost note: GitHub Actions' out-of-the-box limits were never sized for this pace, and the initial configuration taps out before the code does.

Same workflow as April in every repo: spec first, implemented by Claude Code, validated by GitHub Actions, merged, deployed.

Pull requests, June 1–10

Surface	PRs opened	PRs merged
caskeycoding.com frontend	71	67
Backend API (finance, coach, blog)	89	85
Infrastructure (CDK)	26	23
Specs repo	79	75
ericcaskey.com	8	8
Products in private development (2 products, 4 repos)	36	35
Total	309	293
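
Counts like these can be pulled from GitHub's issue search, which accepts `is:pr` along with `created:` and `merged:` date-range qualifiers. The post does not say which endpoint its queries use, so treat this as a minimal sketch under that assumption. The repository name is a placeholder.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://api.github.com/search/issues"

def pr_search_url(repo: str, field: str, start: str, end: str) -> str:
    """Build a search URL for PRs in `repo` whose `field` date is in start..end.

    `field` is "created" (opened) or "merged"; dates are YYYY-MM-DD.
    """
    if field not in ("created", "merged"):
        raise ValueError("field must be 'created' or 'merged'")
    query = f"repo:{repo} is:pr {field}:{start}..{end}"
    # per_page=1 because only the response's total_count is needed for a tally.
    return f"{SEARCH_URL}?{urlencode({'q': query, 'per_page': 1})}"

# "example-org/frontend" is a placeholder, not one of the real repositories.
opened_url = pr_search_url("example-org/frontend", "created", "2026-06-01", "2026-06-10")
merged_url = pr_search_url("example-org/frontend", "merged", "2026-06-01", "2026-06-10")
```

Each response carries a `total_count` field, which is the number that would land in a row of the table. Private repositories need an authenticated request.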

The velocity curve

That is roughly 31 PRs opened and 29 merged per day, weekends included. Put it against the April post directly: launch week was 12 PRs on the new site and just over a hundred cumulative code PRs across the whole platform since it began. Seven weeks later, the same workflow does a launch week before lunch and the platform's entire launch-to-date output every four days.
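
The per-day figures follow directly from the pull-request table. A short check of the arithmetic, using only the rows published above (the variable names are mine):

```python
# Rows from the pull-request table: surface -> (opened, merged).
PRS = {
    "caskeycoding.com frontend": (71, 67),
    "Backend API": (89, 85),
    "Infrastructure (CDK)": (26, 23),
    "Specs repo": (79, 75),
    "ericcaskey.com": (8, 8),
    "Products in private development": (36, 35),
}
DAYS = 10

opened = sum(o for o, _ in PRS.values())
merged = sum(m for _, m in PRS.values())

opened_per_day = opened / DAYS
merged_per_day = merged / DAYS
# Merged in the window divided by opened in the window. Some merged PRs
# may have been opened earlier, so this is a ratio, not a per-PR outcome.
merged_to_opened = merged / opened

print(f"{opened} opened, {merged} merged")
print(f"{opened_per_day:.1f} opened/day, {merged_per_day:.1f} merged/day")
print(f"merged-to-opened ratio {merged_to_opened:.1%}")
```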

Deploys moved the same way. Through April 21 the platform had 35 cumulative production deploys. The first ten days of June produced about 185, more than five times the launch-to-date total, in a third of the time, across twice as many products.

Nothing about the workflow changed to get here. The spec still precedes the code, and the same gates still run on every PR. The throughput came from running more of the loop in parallel and trusting the gates to hold, which is the argument the first post in this series made on faith and this one can make with a table.

The specs repo is still the number I watch: 75 spec PRs merged in ten days, each one preceding the code it governs. The ratio of spec PRs to code PRs, 75 to 218 in this window, has held roughly steady since launch week, which is the discipline surviving contact with volume.

What the PRs actually were

A fair objection to any raw PR count: 293 merges could mean changing a color 293 times. The merged set can answer that itself, because every repo titles PRs with Conventional Commit prefixes, so the work classifies by its own labels.

Type (218 code-repo PRs)	Count
`feat` (new capability)	102
`fix`	44
`chore` / `docs` / `refactor` / `ci`	~44
Backlog-tagged items without a prefix	~28

Nearly half the code PRs are explicit feature work, a fifth are fixes, and most of the unprefixed backlog items are capability work too. The remaining 75 are the specs repo: the contracts ahead of that code, plus about forty literature-review notes feeding a standing research loop on the scoring engine.
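
Classifying by prefix is mechanical once titles follow Conventional Commits. A minimal sketch of one way to bucket titles into the rows of the table above; the regex, the bucket names, and the sample titles are illustrative assumptions, not the script behind the published counts.

```python
import re
from collections import Counter

# Conventional Commits header: type, optional (scope), optional "!", then ": ".
PREFIX = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?!?:\s")

MAINTENANCE = {"chore", "docs", "refactor", "ci"}

def classify(title: str) -> str:
    """Bucket a PR title by its Conventional Commit type."""
    match = PREFIX.match(title.strip().lower())
    if match is None:
        return "unprefixed"
    kind = match.group("type")
    if kind in ("feat", "fix"):
        return kind
    if kind in MAINTENANCE:
        return "maintenance"
    return "other"

# Hypothetical titles for illustration only.
titles = [
    "feat(contact): add public contact form",
    "fix: handle empty SES response",
    "chore(deps): bump CDK",
    "BL-123 admin pending-review panel",
    "ci: add post-deploy health probe",
]
counts = Counter(classify(t) for t in titles)
```

Applied to the published counts, 102 of 218 is about 47 percent and 44 of 218 is about 20 percent, which is where "nearly half" and "a fifth" come from.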

Prefixes can lie, so a concrete sample of what shipped inside the window: a public contact form end to end (page, API route, Lambda, SES permission, hardened error states), a /health endpoint wired through infrastructure with a post-deploy probe in the pipeline, an admin rebuild with inline editing and a pending-review panel, per-user daily guardrails on the LLM routes with a cross-user isolation test behind them, CloudWatch alarms on the new Lambda and a DynamoDB throttle, structured-data and Open Graph coverage across both public sites, and build-time markdown rendering with syntax highlighting on the blog you are reading.

There are visual PRs in the set, a theme-token overhaul among them. The distribution is the point: the volume is mostly new capability and fixes, with a steady maintenance tax that is the cost of keeping nine repos honest, not paint applied 293 times.

Production deploys, June 1–10

The first table is successful runs of each repo's production deploy workflow, pulled from the Actions API.

Surface	Successful CI deploys
caskeycoding.com frontend	48
Backend API (Backend + Coach + Finance lambdas)	56
Infrastructure (cdk deploy)	14
ericcaskey.com	7
Products in private development	23
Total CI deploys	148
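
For reference, the GitHub REST API lists runs per workflow and can filter by `status` and a `created` date range. A minimal sketch of the request URL for one repo; the owner, repo, and workflow file name are placeholders, and the post does not show its actual query.

```python
from urllib.parse import urlencode

API = "https://api.github.com"

def successful_runs_url(owner: str, repo: str, workflow_file: str,
                        start: str, end: str) -> str:
    """URL listing successful runs of one workflow created in start..end."""
    params = {
        "status": "success",           # only runs that concluded successfully
        "created": f"{start}..{end}",  # date range, YYYY-MM-DD..YYYY-MM-DD
        "per_page": 1,                 # total_count in the response is the tally
    }
    path = f"/repos/{owner}/{repo}/actions/workflows/{workflow_file}/runs"
    return f"{API}{path}?{urlencode(params)}"

# Placeholder owner, repo, and workflow file name.
url = successful_runs_url("example-org", "frontend", "deploy-prod.yml",
                          "2026-06-01", "2026-06-10")
```

The response's `total_count` is the per-surface figure; summing it across repos would give a total like the 148 above.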

The Actions API undercounts, because not every deploy goes through CI. One of the private-development products deploys exclusively from a local script by design, and when the minute cap bites, deploys on the other sites move to a workstation and never touch Actions. Those still leave a trail: every frontend deploy here ends in a CloudFront full-site invalidation, so the invalidation logs catch what the Actions API misses.

Local deploys (CloudFront invalidation trail)	Count
caskeycoding.com	~19
Local-script product	16
Other sites	2
Total local deploys	~37

Call it about 185 production deploys in ten days. The same logs show the cadence running underneath the deploys: 127 market-data refreshes pushed to the public site in the same window, one every 30 minutes through market hours, none of which are counted above.
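
One way to tally a trail like this is to count invalidation records by creation time. The sketch below assumes records shaped like the summaries boto3's CloudFront `list_invalidations` call returns (`Id`, `CreateTime`) and a UTC window. How local deploys are separated from CI deploys and data refreshes in the logs is not described in this post, so the code shows only the window count and the final sum.

```python
from datetime import datetime, timezone

def count_in_window(invalidations, start, end):
    """Count invalidation records whose CreateTime falls in [start, end)."""
    return sum(1 for inv in invalidations if start <= inv["CreateTime"] < end)

WINDOW_START = datetime(2026, 6, 1, tzinfo=timezone.utc)
WINDOW_END = datetime(2026, 6, 11, tzinfo=timezone.utc)  # exclusive, so June 10 is covered

# Hypothetical records, not real log entries.
sample = [
    {"Id": "EXAMPLE1", "CreateTime": datetime(2026, 5, 31, 23, 50, tzinfo=timezone.utc)},
    {"Id": "EXAMPLE2", "CreateTime": datetime(2026, 6, 3, 14, 5, tzinfo=timezone.utc)},
    {"Id": "EXAMPLE3", "CreateTime": datetime(2026, 6, 10, 21, 30, tzinfo=timezone.utc)},
]
in_window = count_in_window(sample, WINDOW_START, WINDOW_END)

# Totals from the two deploy tables above.
ci_deploys, local_deploys = 148, 37
total_deploys = ci_deploys + local_deploys
```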

What CI actually ran

Those 293 merges rode on about 1,226 workflow runs in ten days. Every PR on the main platform triggers PR validation (tests, lint, layer build, contract drift), a secret scan, an automated Claude code review, and on the frontends a Lighthouse run.

PR validation failed 132 times across the fleet in those ten days. That is the gate doing its job: agent-written PRs that fail tests get fixed or closed before a human ever merges them. The 293 that merged are the ones that came out clean.

The footnote: default limits were sized for human pace

One cost of that curve is worth naming. Eight of the nine repositories are private, so nearly every CI minute is billable, and the org runs on a GitHub Team plan whose initial configuration includes 3,000 Actions minutes a month. A human team merging five PRs a day lives comfortably inside that. This workflow exhausted the included minutes in mid-May and CI hard-paused across the org, with jobs dying at setup rather than failing usefully.
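
A back-of-envelope check shows how tight that allowance is at June's pace. This assumes the ten-day run rate holds for a 30-day month and treats every run as billable, which overstates it slightly since one of the nine repositories is public.

```python
INCLUDED_MINUTES = 3_000   # monthly allowance stated above
RUNS_IN_WINDOW = 1_226     # workflow runs, June 1-10
WINDOW_DAYS = 10
DAYS_IN_MONTH = 30         # June

runs_per_day = RUNS_IN_WINDOW / WINDOW_DAYS
runs_per_month = runs_per_day * DAYS_IN_MONTH
# Average minutes each run could use before the allowance runs out.
minutes_per_run = INCLUDED_MINUTES / runs_per_month

print(f"{runs_per_month:.0f} runs/month at this pace")
print(f"{minutes_per_run:.2f} included minutes per run")
```

Under those assumptions the allowance works out to less than one minute per workflow run.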

It is managed now, with a local CI mirror as the real validation gate, local deploys when the cap bites, and an auto-pause guard so the budget dies gracefully. But the lesson generalizes: every default in the CI stack assumes a team that types at human speed. Raise the velocity by an order of magnitude and the economics break before the engineering does.
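
This post does not show how the auto-pause guard is built. As a rough sketch of the idea only, a guard can compare remaining minutes against a reserve and route work to the local mirror once the reserve is reached. The `Budget` type, the `ci_mode` function, and the 300-minute reserve are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Budget:
    included_minutes: int  # monthly allowance (3,000 on the plan described)
    used_minutes: float    # minutes consumed so far this month
    reserve_minutes: int   # hypothetical cushion held back before pausing

def ci_mode(budget: Budget) -> str:
    """Return "hosted" while remaining minutes exceed the reserve, else "local"."""
    remaining = budget.included_minutes - budget.used_minutes
    return "hosted" if remaining > budget.reserve_minutes else "local"

early = ci_mode(Budget(included_minutes=3000, used_minutes=1200, reserve_minutes=300))
late = ci_mode(Budget(included_minutes=3000, used_minutes=2750, reserve_minutes=300))
```

In practice `used_minutes` would have to come from the org's billing usage data, and the switch would need to happen before jobs start dying at setup.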

Where that leaves it

The workflow that shipped a hundred PRs in its first five weeks now merges 293 in ten days across four products, with about 185 production deploys behind them and a spec PR still ahead of every code PR. The only piece that buckled under that curve was not the code, the specs, or the review gates. It was GitHub Actions' initial configuration, built for teams that type at human speed. The workflow scaled. The defaults did not.
