
Goodbye Opus, Hello Fable

Anthropic shipped Claude Fable 5 and Mythos 5 today, and Opus is no longer the top of the lineup. Fable 5 and Mythos 5 are the same underlying model. The difference is safeguards: Fable 5 ships with them on and is generally available now, while Mythos 5 has the cybersecurity safeguards removed and is restricted to Project Glasswing partners and vetted researchers. Even the names admit it. Fable comes from the Latin fabula, "that which is told," the etymological sibling of the Greek mythos. Same root, same model, two names. Naming the safety boundary instead of pretending it doesn't exist is the right call.

For anyone running agents in production, this is the announcement that matters this year.

What changed

Opus 4.x was already good enough that I delegated implementation authority to it across two product lines. Fable 5 moves the ceiling on the thing that actually constrains agent platforms: long-horizon work.

The numbers worth caring about:

  • Pricing: $10/M input, $50/M output. Less than half of Mythos Preview.
  • Scale: Stripe ran a 50-million-line codebase migration in one day. Their manual estimate was two months.
  • Benchmarks: state-of-the-art on essentially everything tested, including the top score among frontier models on Cognition's FrontierCode.
  • Vision: it completed Pokémon FireRed from raw screenshots only. No scaffolding, no game-state API. That's the grounding test that matters for computer use.
  • Context: long-horizon reasoning across millions of tokens, with persistent memory improving outputs over time.
  • Research: first Claude model that generates compelling scientific hypotheses consistently. Mythos 5 reportedly sped up protein design roughly 10x for one partner.
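To make the pricing line concrete, here is a minimal cost sketch using the $10/M input and $50/M output figures above. The token counts in the example run are made up for illustration, and `estimateCostUsd` is a hypothetical helper, not part of any SDK.

```typescript
// Prices quoted in this post, in USD per million tokens.
const FABLE_5_PRICING = { inputPerMillion: 10, outputPerMillion: 50 };

interface RunUsage {
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical helper: multiply first, divide last, to keep the arithmetic exact
// for whole-number token counts.
function estimateCostUsd(usage: RunUsage, pricing = FABLE_5_PRICING): number {
  const inputCost = (usage.inputTokens * pricing.inputPerMillion) / 1e6;
  const outputCost = (usage.outputTokens * pricing.outputPerMillion) / 1e6;
  return inputCost + outputCost;
}

// Assumed numbers for one long agent run: 4M tokens read, 300k tokens written.
const exampleRunCost = estimateCostUsd({ inputTokens: 4000000, outputTokens: 300000 });
console.log(exampleRunCost.toFixed(2)); // 55.00
```

Input costs $40 and output costs $15 in that example. Output tokens cost five times as much each, but long-horizon runs that re-read a lot of context can still be dominated by input.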

One partner quote sums up the shift: long-horizon problems that were "out of reach for earlier models" are now in reach. That tracks with what I've seen. Earlier models didn't fail on hard tasks; they failed on long ones: drift, lost invariants, forgotten constraints around hour three. If Fable 5 fixes that, the bottleneck moves from the model back to the harness.

The safeguards, plainly. Fable 5 ships with three classifiers, and when one trips the response falls back to Opus 4.8:

  1. Cyber: blocks exploitation assistance and offensive tasks.
  2. Bio/chem: blocks dual-use research with pandemic or weapons potential.
  3. Distillation: blocks capability-extraction attempts.

They trigger in fewer than 5% of sessions, and there's a 30-day data retention requirement for safety monitoring (not training). For my workloads, agents writing TypeScript and Terraform, the practical impact rounds to zero. If you need the unguarded model for security research, Mythos 5 via Glasswing is the path. The rest of us don't get it, and that's fine.

So Opus doesn't fully retire. It's the fallback model behind the classifiers. A fitting epilogue: the workhorse becomes the safety net.
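As a mental model only, the fallback behavior described above can be sketched like this. It is not how the safeguards are actually implemented, and the classifiers run on the provider's side, not in your code. The type names, `selectModel`, and the fallback model ID string are all assumptions for illustration; only `claude-fable-5` appears elsewhere in this post.

```typescript
type SafeguardClassifier = "cyber" | "bio_chem" | "distillation";

interface ClassifierResult {
  classifier: SafeguardClassifier;
  tripped: boolean;
}

const PRIMARY_MODEL = "claude-fable-5";
const FALLBACK_MODEL = "claude-opus-4-8"; // hypothetical ID for "Opus 4.8"

// If any classifier trips, the response comes from the fallback model.
function selectModel(results: ClassifierResult[]): { model: string; trippedBy: SafeguardClassifier[] } {
  const trippedBy = results.filter((r) => r.tripped).map((r) => r.classifier);
  return {
    model: trippedBy.length > 0 ? FALLBACK_MODEL : PRIMARY_MODEL,
    trippedBy,
  };
}

// A session where only the cyber classifier fires.
const decision = selectModel([
  { classifier: "cyber", tripped: true },
  { classifier: "bio_chem", tripped: false },
  { classifier: "distillation", tripped: false },
]);
console.log(decision.model, decision.trippedBy);
```

The practical takeaway for a harness is to log which model actually answered, so a silent fallback doesn't get scored as a Fable 5 result.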

What I'm doing about it

Caskey's Builder runs on Claude Code, so the swap is a config change, not a migration:

  • Point Builder and Critic at claude-fable-5 once the subscription rollout reaches my tier (staged through June 23).
  • Re-run the dogfood specs and compare against the Opus baselines, particularly the multi-hour items that previously needed human re-anchoring.
  • Re-evaluate human_gate placement in the backlog. Gates exist where the model historically lost the plot. If Fable 5 holds invariants over long horizons, some of those gates are now just latency.
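One way the human_gate review could be driven by the dogfood re-runs is sketched below. The `SpecRun` shape, its field names, and the `minRuns` threshold are hypothetical and are not Builder's real schema. The idea is simply to flag gates where the new model passed every run without a human having to re-anchor it.

```typescript
// Hypothetical record of one dogfood spec run; field names are illustrative.
interface SpecRun {
  specId: string;
  gateId: string; // the human_gate this spec passes through
  model: string;
  passed: boolean;
  humanReanchors: number; // times a human had to restate constraints
}

// Returns gate IDs where `model` has at least `minRuns` runs, all clean.
function gatesWorthRevisiting(runs: SpecRun[], model: string, minRuns: number = 3): string[] {
  const byGate: Record<string, SpecRun[]> = {};
  for (const run of runs) {
    if (run.model !== model) continue;
    if (!byGate[run.gateId]) byGate[run.gateId] = [];
    byGate[run.gateId].push(run);
  }
  return Object.keys(byGate)
    .filter((gateId) => {
      const gateRuns = byGate[gateId];
      return (
        gateRuns.length >= minRuns &&
        gateRuns.every((r) => r.passed && r.humanReanchors === 0)
      );
    })
    .sort();
}
```

A flagged gate is a candidate for review, not automatic removal. The threshold only guards against dropping a gate on the strength of a single lucky run.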

Goodbye, Opus. You opened a lot of PRs. Fable, start telling.

Written by Eric Caskey. I build AI tools you can actually use. Explore the Tools or see the case studies.