Tell Me Everything That's Wrong: Validation as a Batch Operation

By Eric Caskey · June 4, 2026 · 5 min read

There is a particular kind of slow torture that software inflicts on its users, and most of us have stopped noticing it. You submit a form, a config file, a deploy. It rejects the first thing it finds. You fix that one thing and submit again. It rejects the second thing. You fix that. It rejects the third. Each round trip costs you a context switch, and if the feedback loop runs over a network or a build, each one costs you minutes you will not get back.

The system knew about all three problems the first time. It just didn't tell you.

I keep coming back to CloudFormation as the counter-example. When you submit a template with ten mistakes, it does not stop at the first missing property. It validates the entire template and hands you back the full set. You fix ten mistakes in one pass and move on. That is a design decision about whose time matters, and it is the difference between a tool people trust and a tool people dread.

The Cost You're Actually Optimizing

The instinct to fail on the first error comes from a good place. In a hot request path, fail-fast is correct: stop early, shed load, don't waste cycles on work you're going to throw away. But validation is not a hot path, and the resource you're spending is not CPU. It's human attention.

The cost model is the whole argument. When you fail fast, you are optimizing for the machine's time on the unhappy path. When you collect every error, you are optimizing for the human's time across the entire fix cycle. A first-error-only validator turns a single ten-second review into ten separate submit-wait-read-fix loops, and the wait in the middle is where trust goes to die. People start to experience your system as flaky even when it is behaving deterministically. The real problem is partial reporting, but that is not how it feels from the outside.

This is the same principle that makes fail-fast the wrong default for safety guardrails: the goal is not to exit quickly, it is to surface the full set of blocking issues so a person can fix them in one pass. Validation is that idea pointed at the user instead of the operator.

Collect, Then Decide

The structural change is small and it is always the same shape. Instead of returning the moment a check fails, you run every independent check, accumulate the failures, and make the proceed-or-reject decision once, at the end, against the complete set.

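from dataclasses import dataclass
from datetime import date

# Error and ValidationError are minimal assumed definitions so the example runs.
@dataclass
class Error:
    field: str
    code: str
    message: str

class ValidationError(Exception):
    def __init__(self, errors):
        super().__init__(f"{len(errors)} validation error(s)")
        self.errors = errors

def validate_order(order):
    errors = []  # accumulate every failure instead of returning early

    if not order.get("customer_id"):
        errors.append(Error("customer_id", "required", "Customer ID is missing"))

    if order.get("quantity", 0) <= 0:
        errors.append(Error("quantity", "out_of_range",
                            "Quantity must be greater than 0"))

    # ship_date is assumed to be a datetime.date
    if order.get("ship_date") and order["ship_date"] < date.today():
        errors.append(Error("ship_date", "in_past",
                            "Ship date cannot be in the past"))

    if errors:
        raise ValidationError(errors)   # all of them, not the first
    return order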

The trap to avoid is the early return or raise buried inside each check. The instant one check can short-circuit the function, you are back to first-error-only and you have quietly recreated the torture. Independent checks must not be allowed to serialize the user's time.

Don't Drown Them in Cascades

There is a failure mode on the other side, and it's worth naming because it scares people away from batching in the first place. If you naively run every check against malformed input, one root cause can spawn fifty downstream errors. A single missing closing brace in a config file, reported as forty "unexpected token" errors, is worse than failing fast. You have traded a short loop for a wall of noise that buries the one thing that matters.

The answer is phased validation. You validate in layers, and you only advance to the next layer once the current one is clean. Parse first. If the syntax is broken, report the syntax errors and stop, because nothing downstream is trustworthy yet. Once it parses, run the structural and type checks as a batch. Once the shape is valid, run the semantic and cross-reference checks as a batch. Within each phase you report everything; between phases you gate, because errors in a later phase are only meaningful when the earlier phase held.
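The layering described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function name validate_config, the JSON input format, and the fields name, replicas, and max_replicas are all assumptions made up for the example.

```python
import json


def validate_config(text):
    """Return (phase, errors) for the first phase that fails, else (None, [])."""
    # Phase 1: syntax. Nothing downstream is trustworthy until this parses,
    # so report the parse error and stop.
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return "syntax", [f"line {exc.lineno}, column {exc.colno}: {exc.msg}"]

    if not isinstance(config, dict):
        return "structure", ["top level must be an object"]

    # Phase 2: structure and types, collected as a batch.
    errors = []
    if not isinstance(config.get("name"), str):
        errors.append("name: must be a string")
    if not isinstance(config.get("replicas"), int):
        errors.append("replicas: must be an integer")
    if not isinstance(config.get("max_replicas"), int):
        errors.append("max_replicas: must be an integer")
    if errors:
        return "structure", errors  # gate: do not run semantic checks yet

    # Phase 3: semantics and cross-references, collected as a batch.
    if not config["name"]:
        errors.append("name: must not be empty")
    if config["replicas"] < 1:
        errors.append("replicas: must be at least 1")
    if config["max_replicas"] < config["replicas"]:
        errors.append("max_replicas: must be >= replicas")
    if errors:
        return "semantics", errors

    return None, []
```

Each phase reports everything it finds, and the early returns sit only at phase boundaries. The semantic comparisons in phase 3 can assume the types are right because phase 2 already held.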

Compilers have done this for decades, and it's why a good one gives you a screen of real errors instead of one cryptic line or a thousand garbage ones. The skill is knowing which checks are independent, so they can be batched, and which are derived, so they should be suppressed until their precondition holds.
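One way to make the independent-versus-derived distinction explicit is to have each check declare its preconditions, then skip any check whose precondition did not hold. The sketch below assumes a hypothetical run_checks helper and invented check names; it illustrates the idea and is not taken from any particular library.

```python
def run_checks(data, checks):
    """Run checks given in dependency order.

    checks is a list of (name, requires, fn); fn(data) returns an error
    message, or None when the check passes.
    Returns (errors, skipped).
    """
    unmet = set()   # checks that failed or were suppressed
    errors = []
    skipped = []
    for name, requires, fn in checks:
        if any(dep in unmet for dep in requires):
            # Derived check: its precondition did not hold, so suppress it
            # (and anything that depends on it) instead of reporting noise.
            unmet.add(name)
            skipped.append(name)
            continue
        message = fn(data)
        if message:
            unmet.add(name)
            errors.append((name, message))
    return errors, skipped


# Hypothetical checks for an order line.
CHECKS = [
    ("qty_is_int", [],
     lambda d: None if isinstance(d.get("quantity"), int)
     else "quantity must be an integer"),
    ("qty_positive", ["qty_is_int"],
     lambda d: None if d["quantity"] > 0
     else "quantity must be greater than 0"),
    ("price_is_number", [],
     lambda d: None if isinstance(d.get("price"), (int, float))
     else "price must be a number"),
    ("total_matches", ["qty_is_int", "price_is_number"],
     lambda d: None if d.get("total") == d["quantity"] * d["price"]
     else "total must equal quantity * price"),
]

bad_type = {"quantity": "three", "price": 2.5, "total": 10}
errors, skipped = run_checks(bad_type, CHECKS)
```

With the bad_type input, only the root cause (quantity is not an integer) is reported, and the two checks that depend on it are suppressed. When the preconditions hold, independent failures are still reported together.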

Make Each Error Worth Reading

Batching only pays off if the batch is legible. Ten errors that each say "invalid input" are not ten times more useful than one. A useful error answers three questions: what is wrong, where it is, and what would make it pass. If a check can't answer those, it is only a boolean, and booleans are how you lose trust faster than you lose data.

That means structured errors, not concatenated strings. Give each one a stable machine-readable code, a path or pointer to the exact location, and a human-readable message. The structure is what lets a UI highlight all the bad fields at once, a CLI print a tidy table, and a CI job fail with a diagnostic someone can act on without re-running anything.

{
  "valid": false,
  "errors": [
    { "path": "items[2].quantity", "code": "out_of_range",
      "message": "Quantity must be greater than 0", "got": -1 },
    { "path": "ship_date", "code": "in_past",
      "message": "Ship date cannot be in the past", "got": "2026-06-01" }
  ]
}

A path like items[2].quantity is the difference between "your order is invalid" and a cursor the user can jump straight to. Multiply that across a batch and you've turned a guessing game into a checklist.
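As a rough sketch of what the structure buys you, the same error list can feed both a CLI table and a set of fields for a UI to highlight. The helper names format_table and paths_to_highlight are hypothetical, and the payload is the example response shown earlier.

```python
import json

payload = json.loads("""
{
  "valid": false,
  "errors": [
    { "path": "items[2].quantity", "code": "out_of_range",
      "message": "Quantity must be greater than 0", "got": -1 },
    { "path": "ship_date", "code": "in_past",
      "message": "Ship date cannot be in the past", "got": "2026-06-01" }
  ]
}
""")


def format_table(errors):
    """Render errors as aligned columns: path, code, message."""
    path_width = max((len(e["path"]) for e in errors), default=0)
    code_width = max((len(e["code"]) for e in errors), default=0)
    return "\n".join(
        f'{e["path"]:<{path_width}}  {e["code"]:<{code_width}}  {e["message"]}'
        for e in errors
    )


def paths_to_highlight(errors):
    """Collect every failing location so a UI can mark them all at once."""
    return {e["path"] for e in errors}


table = format_table(payload["errors"])
highlight = paths_to_highlight(payload["errors"])
print(table)
```

Neither consumer parses the message text. The path and code carry the machine-readable part, and the message is only there for the person reading it.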

Where the Line Is

Batching is the right default for validation, and it is genuinely wrong in a few places, so hold it honestly. Authentication should fail fast and say little; enumerating everything wrong with a login attempt is a gift to an attacker. Anything with side effects that compound, or any check that is genuinely expensive and gated behind a cheap one, belongs in a later phase or behind fail-fast on purpose. And when checks are truly dependent, suppress the derived ones rather than reporting noise.

But for the ordinary case (the form, the config, the API payload, the deploy), the bar is simple. Before you ship a validator, submit something with three mistakes in it. If it tells you about only one, you haven't finished building it. You've just moved the rest of the work onto whoever has to use it.
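That three-mistake check is easy to turn into a test. The sketch below uses a made-up validate_signup function as a stand-in for whatever validator you are shipping, and a hypothetical missing_reports helper that lists the mistakes the validator failed to mention.

```python
def validate_signup(form):
    """Hypothetical stand-in validator: returns a list of (field, code) pairs."""
    errors = []
    if "@" not in form.get("email", ""):
        errors.append(("email", "invalid_format"))
    if len(form.get("password", "")) < 12:
        errors.append(("password", "too_short"))
    if form.get("age", 0) < 18:
        errors.append(("age", "out_of_range"))
    return errors


def first_error_only(form):
    """The anti-pattern, for contrast: report one problem and stop."""
    return validate_signup(form)[:1]


def missing_reports(validator, payload, expected_fields):
    """Return the expected fields that the validator did not report."""
    reported = {field for field, _ in validator(payload)}
    return sorted(set(expected_fields) - reported)


three_mistakes = {"email": "not-an-address", "password": "short", "age": 12}
expected = ["email", "password", "age"]

# A finished validator leaves nothing unreported.
assert missing_reports(validate_signup, three_mistakes, expected) == []

# A first-error-only validator leaves two problems for the next round trip.
assert missing_reports(first_error_only, three_mistakes, expected) == ["age", "password"]
```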

Related

Designing Safety Guardrails for Distributed Workflow Orchestration: why fail-fast is the wrong default for safety systems, and parallel evaluation with aggregation


