When CI Costs More Than It Saves

In my June production numbers post, one paragraph near the end did a lot of quiet work: the only thing that buckled under ten days of agent-paced shipping was not the code or the review gates, it was GitHub Actions' default configuration. I called it a footnote and said the lesson generalized. This is the post that footnote pointed at.

Two takeaways, and I will say both again at the end. First, the failure mode: every default in a hosted CI stack is priced and tuned for a team that merges a handful of PRs a day, and when an AI-native workflow raises that rate by an order of magnitude, the billing model breaks before the engineering does. The constraint stops being "is the code correct" and becomes "can I afford to ask." Second, the fix: I moved validation and deploys onto a workstation, and the workaround turned out to be a better default than the thing it replaced. Faster feedback, zero marginal cost per run, and a budget that physically cannot run away.

What actually broke

The platform is nine repositories, eight of them private, so nearly every CI minute is billable. The org runs on a GitHub Team plan whose included allowance is 3,000 Actions minutes a month. A human team merging five PRs a day never sees the edge of that number. This workflow ran into it mid-May, and when the included minutes were gone, CI hard-paused across the entire org. Jobs did not fail usefully. They died at setup in a few seconds, which looks exactly like a broken pipeline until you realize the pipeline is fine and the meter is empty.
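To see how quickly an allowance like that goes at different merge rates, here is a back-of-the-envelope sketch. The 3,000 included minutes and the five-PRs-a-day figure come from the paragraph above; the four billable minutes per PR and the fifty-PRs-a-day rate are hypothetical stand-ins for "an order of magnitude more," not measured numbers.

```python
def days_until_exhausted(included_minutes: float, prs_per_day: float,
                         minutes_per_pr: float) -> float:
    """Days a monthly allowance lasts at a steady merge rate."""
    burn_per_day = prs_per_day * minutes_per_pr
    if burn_per_day <= 0:
        raise ValueError("burn rate must be positive")
    return included_minutes / burn_per_day


INCLUDED_MINUTES = 3000  # Team plan allowance cited above
MINUTES_PER_PR = 4       # hypothetical billable minutes per PR

human_pace = days_until_exhausted(INCLUDED_MINUTES, 5, MINUTES_PER_PR)
agent_pace = days_until_exhausted(INCLUDED_MINUTES, 50, MINUTES_PER_PR)  # hypothetical 10x rate

print(f"human pace: allowance lasts {human_pace:.0f} days")
print(f"agent pace: allowance lasts {agent_pace:.0f} days")
```

Under those assumptions the human-paced team has months of headroom inside a one-month window, while the 10x team runs dry about halfway through the month.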

That is the part worth sitting with. The gates were doing their job right up until the moment the account could no longer pay for them to run. Nothing was wrong with the tests. The economics simply assumed a slower team.

Why the reflex answer was wrong

The obvious move is to raise the spending limit and buy more minutes. I did the arithmetic on that and stopped. At this PR rate, paying per-minute for hosted runners on a fleet of private repos is an open-ended bill that scales with exactly the thing I am trying to increase. Every additional unit of velocity would cost more money, forever, to validate work that a machine sitting three feet away could validate for free.
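The shape of that arithmetic can be sketched in a few lines. The per-minute rate and the usage figures below are hypothetical placeholders, not GitHub's price list or my actual usage; only the 3,000 included minutes come from this post.

```python
def monthly_overage(minutes_used: float, included_minutes: float,
                    rate_per_minute: float) -> float:
    """Cost of the minutes that exceed the included allowance."""
    billable = max(0.0, minutes_used - included_minutes)
    return billable * rate_per_minute


INCLUDED_MINUTES = 3000  # allowance cited in this post
RATE = 0.01              # hypothetical price per overage minute

for used in (2000, 6000, 12000):  # hypothetical monthly usage levels
    cost = monthly_overage(used, INCLUDED_MINUTES, RATE)
    print(f"{used:>6} minutes used -> overage {cost:.2f}")
```

The point is the slope, not the numbers: once usage passes the allowance, doubling the run volume from 6,000 to 12,000 minutes triples the billable overage from 3,000 to 9,000 minutes, and nothing in the model ever flattens that line.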

So the goal changed from "buy enough CI" to "stop renting CI by the minute for work I can run locally." The hosted runner was never doing anything magic. It was running the same checks any laptop can run.

The local CI mirror

The replacement is a script that mirrors each repo's pr-validation.yml step for step on the workstation: the test suite, the linter, the Lambda layer build, the contract-drift check, and a secret scan. Same checks, same order, same pass or fail, run before anything is pushed. A red result shows up in seconds on the machine where the code was written, instead of minutes later in a tab, billed.
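A minimal sketch of what such a runner could look like, assuming each workflow step boils down to one command. The step labels follow the list above, but every command path here is a hypothetical placeholder; the real commands would be copied from the repo's pr-validation.yml.

```python
import subprocess
import sys
from typing import Sequence

Step = tuple[str, Sequence[str]]  # (label, argv)

# Hypothetical commands: placeholders for whatever the committed workflow runs.
STEPS: list[Step] = [
    ("test suite", [sys.executable, "-m", "pytest", "-q"]),
    ("linter", ["./scripts/lint.sh"]),
    ("lambda layer build", ["./scripts/build_layer.sh"]),
    ("contract-drift check", ["./scripts/check_contracts.sh"]),
    ("secret scan", ["./scripts/scan_secrets.sh"]),
]


def run_steps(steps: Sequence[Step]) -> list[tuple[str, bool]]:
    """Run steps in order and stop at the first failure, like a CI job does."""
    results: list[tuple[str, bool]] = []
    for label, argv in steps:
        try:
            passed = subprocess.run(list(argv)).returncode == 0
        except OSError:
            # A missing or non-executable command counts as a failed step.
            passed = False
        print(f"{'PASS' if passed else 'FAIL'}  {label}")
        results.append((label, passed))
        if not passed:
            break
    return results


def all_green(results: Sequence[tuple[str, bool]], expected_steps: int) -> bool:
    """Green only if every expected step ran and passed."""
    return len(results) == expected_steps and all(ok for _, ok in results)
```

Calling `run_steps(STEPS)` before a push and checking `all_green(results, len(STEPS))` gives one pass-or-fail answer. Stopping at the first red step mirrors the default behavior of a job's steps in the hosted workflow.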

The one real hazard with a local mirror is drift. The moment the workstation script and the committed workflow disagree, "passes locally" stops meaning "passes CI," and you have built a gate that lies. So the mirror carries a drift guard: if a repo's actual pr-validation.yml changes, the local runner warns that it is now out of sync and needs to be reconciled. The mirror is only trustworthy for as long as it provably matches the thing it stands in for, and the guard is what keeps that claim honest.
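One simple way to build such a guard, sketched here as an assumption rather than a description of my actual implementation, is to fingerprint the committed workflow file at the moment the mirror is reconciled and compare against that fingerprint on every run. The stamp file name and function names are hypothetical.

```python
import hashlib
from pathlib import Path


def workflow_digest(workflow: Path) -> str:
    """SHA-256 of the committed workflow file's bytes."""
    return hashlib.sha256(workflow.read_bytes()).hexdigest()


def record_sync(workflow: Path, stamp: Path) -> None:
    """Call after reconciling the local mirror with the workflow."""
    stamp.write_text(workflow_digest(workflow) + "\n")


def mirror_in_sync(workflow: Path, stamp: Path) -> bool:
    """False if the workflow changed since the last recorded sync,
    or if no sync was ever recorded."""
    if not stamp.exists():
        return False
    return stamp.read_text().strip() == workflow_digest(workflow)


def warn_if_drifted(workflow: Path, stamp: Path) -> None:
    if not mirror_in_sync(workflow, stamp):
        print(f"WARNING: {workflow} changed since the local mirror was "
              "last reconciled; local results may not match CI.")
```

A hash only proves that the file changed, not that the mirror is wrong, so the warning is a prompt to re-read the workflow and call `record_sync` again once the local steps match.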

I will not pretend this is free of cost. A local run skips whatever a hosted runner does that a workstation cannot reproduce, and those skips are coverage gaps you have to know about rather than discover later. The honest framing is not "local CI is strictly better," it is "local CI is better for this fleet, with eyes open about what it does not cover."

Local deploys, same logic

Validation was half of it. The other half was shipping. When the cap bites, deploys move to the workstation too: build from a mainline checkout, push to S3, invalidate CloudFront, done. One of the products in private development now deploys exclusively from a local script by design, not as a fallback.
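The build, push, invalidate sequence can be sketched as three commands driven from Python. The AWS CLI subcommands (`aws s3 sync`, `aws cloudfront create-invalidation`) are real; the build command, directory, bucket, and distribution ID are hypothetical parameters, and this is an illustration rather than my actual deploy script.

```python
import subprocess
from typing import Sequence


def deploy_commands(build_cmd: Sequence[str], build_dir: str,
                    bucket: str, distribution_id: str) -> list[list[str]]:
    """The three deploy steps, in order, as argv lists."""
    return [
        list(build_cmd),
        # No --delete here on purpose: removing remote files is opt-in.
        ["aws", "s3", "sync", build_dir, f"s3://{bucket}"],
        # "/*" invalidates every cached path in the distribution.
        ["aws", "cloudfront", "create-invalidation",
         "--distribution-id", distribution_id, "--paths", "/*"],
    ]


def deploy(commands: Sequence[Sequence[str]]) -> None:
    """Run each step; check=True aborts the deploy on the first failure."""
    for argv in commands:
        subprocess.run(list(argv), check=True)
```

Usage would look like `deploy(deploy_commands(["npm", "run", "build"], "out", "example-bucket", "EXAMPLEDISTID"))`, with every argument swapped for the real project's values. Keeping the command list separate from the runner makes it easy to print and review the plan before anything touches production.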

This path has its own set of traps, and they are sharp enough that I keep a pre-deploy review for them: deploying with an empty environment file bakes a broken build, a careless sync with --delete can clobber files a Lambda owns, and a handful of settings only work in lockstep pairs where changing one without the other breaks production. None of those are CI's job to catch, which is precisely why moving off CI made them my job to catch deliberately. The trail is still auditable: every frontend deploy ends in a full CloudFront invalidation, so the invalidation logs record what the Actions API no longer sees.
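Parts of that review can be automated. The sketch below turns the three traps into checks that each return a list of problems; the function names, the `KEY=value` env file format, and the "any `--exclude` counts as deliberate" heuristic are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Mapping, Sequence


def env_file_problems(env_file: Path) -> list[str]:
    """Flag a missing, empty, or partly blank KEY=value env file."""
    if not env_file.exists():
        return [f"{env_file.name}: file is missing"]
    lines = [ln.strip() for ln in env_file.read_text().splitlines()]
    assignments = [ln for ln in lines if ln and not ln.startswith("#")]
    if not assignments:
        return [f"{env_file.name}: no variables defined"]
    problems = []
    for ln in assignments:
        key, sep, value = ln.partition("=")
        if not sep or not value.strip():
            problems.append(f"{env_file.name}: {key.strip()} has no value")
    return problems


def sync_problems(sync_argv: Sequence[str]) -> list[str]:
    """Flag --delete with no --exclude protecting anything (a heuristic)."""
    if "--delete" in sync_argv and "--exclude" not in sync_argv:
        return ["sync uses --delete with no --exclude filters"]
    return []


def lockstep_problems(old: Mapping[str, object], new: Mapping[str, object],
                      pairs: Sequence[tuple[str, str]]) -> list[str]:
    """Flag a paired setting that changed without its partner."""
    problems = []
    for a, b in pairs:
        a_changed = old.get(a) != new.get(a)
        b_changed = old.get(b) != new.get(b)
        if a_changed != b_changed:
            problems.append(f"{a} and {b} must change together")
    return problems
```

A deploy script would concatenate the three lists and refuse to continue if the result is non-empty. The checks do not replace the review; they only make the known traps impossible to skip by accident.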

The budget that cannot run away

The last piece is a guard so the failure mode never recurs silently. An auto-pause workflow watches the Actions spend and disables CI when it approaches the cap, so the budget dies gracefully instead of mid-deploy. The effect is that hosted CI is now a small, bounded, best-effort convenience layered on top of a local process that is the real gate. The monthly Actions bill is capped at single digits and stays there, because the workstation absorbs the volume that used to be metered.
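The decision logic behind a guard like this is small. The sketch below assumes the current spend has already been read from somewhere (how it is fetched is not covered here) and that workflows are disabled with the GitHub CLI's `gh workflow disable` command; the 90% threshold, repo names, and workflow name are hypothetical.

```python
from typing import Sequence


def should_pause(spend: float, cap: float, threshold: float = 0.9) -> bool:
    """True once spend reaches the given fraction of the cap."""
    if cap <= 0:
        raise ValueError("cap must be positive")
    return spend >= cap * threshold


def pause_commands(repos: Sequence[str], workflow: str) -> list[list[str]]:
    """One `gh workflow disable` invocation per repository."""
    return [["gh", "workflow", "disable", workflow, "--repo", repo]
            for repo in repos]


# Hypothetical values for illustration only.
if should_pause(spend=9.5, cap=10.0):
    for argv in pause_commands(["example-org/site", "example-org/api"],
                               "PR Validation"):
        print(" ".join(argv))
```

Pausing below the cap rather than at it is the design choice that matters: it leaves enough headroom for a job already in flight to finish, which is the difference between a budget that stops cleanly and one that stops mid-deploy.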

The whole trade, in one view:

| | Hosted CI, rented by the minute | Local mirror plus local deploy |
| --- | --- | --- |
| Cost model | Per-minute, scales with the velocity you want to raise | Fixed, capped at single digits a month |
| Feedback | Minutes later, in another tab | Seconds, on the machine that wrote the code |
| Marginal cost per run | Billable | Zero |
| Audit trail | Actions logs | CloudFront invalidation logs |
| The catch | None you manage | Needs a drift guard so local never lies |

What I would tell someone else

Do this when most of your repos are private, your merge rate is high enough that per-minute billing is a real line item, and you have a trusted machine to run on. Do not do this if your CI does something a workstation genuinely cannot, if your team is large enough that "run it locally" means "run it inconsistently on twelve different machines," or if you cannot keep the local mirror provably in sync with the real workflow. The drift guard is not optional decoration. It is the only thing standing between "passes locally" and a comfortable lie.

Recap

Two takeaways, as promised. First, hosted CI defaults are sized for human pace, and at agent velocity the bill breaks before the engineering does, which quietly changes your constraint from correctness to affordability. Second, moving validation and deploys onto a workstation, with a drift guard to keep the local mirror honest and an auto-pause guard to keep the budget bounded, was not a grudging workaround. It was a faster, cheaper, more controllable default that I would now choose on purpose. The workflow scaled. The defaults did not, so I replaced them.


Written by Eric Caskey. I build AI tools you can actually use. Explore the Tools or see the case studies.