ClaudeFable 5AnthropicLLMAI

What's Different About Claude Fable 5 — How It Compares to Existing Models

How Anthropic's most capable released model, Claude Fable 5, differs from Opus, Sonnet, Haiku, and prior models. We cover the specs and pricing, the API behavior changes (always-on thinking, protected thinking, a new tokenizer, the refusal stop reason), and the long-horizon agentic capabilities plus the shifts in prompting and operations.

Data DynamicsJune 12, 20267 min read

Anthropic's most capable released model is currently Claude Fable 5 (claude-fable-5) — a tier above Opus 4.8. But it's more than "a smarter model": much of the API behavior and operating model itself has changed. This post lays out, from a developer's perspective, exactly how Fable 5 differs from Opus, Sonnet, Haiku, and prior models.

Model IDs, pricing, and behaviors below follow Anthropic's official API reference.

1. Overview and positioning

Fable 5 is built for the most demanding reasoning and long-horizon agentic work. Its key specs:

Model ID: claude-fable-5
Context window: 1M tokens (the maximum is also the default)
Max output: 128K tokens
Pricing: $10 input / $50 output (per 1M tokens)

An important positioning point: Fable 5 is not the default path for "upgrade to the latest model." Its pricing exceeds Opus-tier and its tokenizer changes the cost baseline, so the recommended target for a typical Opus upgrade is still claude-opus-4-8. Fable 5 is the model you choose explicitly when you need the difficulty ceiling above that.

Note: Claude Mythos 5 (claude-mythos-5) offers the same capabilities, pricing, and API as Fable 5, available through Project Glasswing. Everything below applies to both models.

2. The lineup compared

Model	Model ID	Context	Max output	Input $/1M	Output $/1M
Claude Fable 5	`claude-fable-5`	1M	128K	$10	$50
Claude Opus 4.8	`claude-opus-4-8`	1M	128K	$5	$25
Claude Sonnet 4.6	`claude-sonnet-4-6`	1M	64K	$3	$15
Claude Haiku 4.5	`claude-haiku-4-5`	200K	64K	$1	$5

Even at the same 1M context, Fable 5 has the highest intelligence ceiling and the highest price. Sonnet remains sensible when speed/cost efficiency matters, Haiku for simple high-volume work.

3. Key API behavior differences (developer view)

The first thing you feel when porting code isn't model intelligence — it's the request/response contract.

Thinking is always on

On Fable 5, thinking is always on. Omit the thinking parameter entirely and adaptive thinking applies. Any other setting is rejected — thinking: {type: "disabled"} and thinking: {type: "enabled", budget_tokens: N} both return a 400. Control depth with output_config.effort (low through xhigh and max), not budget_tokens.

# Fable 5 — no thinking param; control depth with effort
client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    output_config={"effort": "high"},   # low | medium | high | xhigh | max
    messages=[...],
)

Protected thinking — the raw chain of thought is never exposed

Responses include regular thinking blocks, but the raw chain of thought is never returned under any setting. display: "summarized" returns a readable summary; "omitted" (the default) returns an empty thinking field. When continuing multi-turn, pass thinking blocks back exactly as received on the same model (modifying them is rejected). Hand them to a different model and those blocks are silently dropped from the prompt — and the dropped portion isn't billed.

A new tokenizer — re-baseline tokens and cost

Fable 5 uses a new tokenizer. The same content tokenizes to roughly 30% more tokens than on Opus-tier models. Billing is per token, so an unchanged workload can cost more even before the per-token price difference. Don't reuse token counts, max_tokens, or context budgets measured on Opus/Sonnet — re-measure.

# count_tokens returns counts under both tokenizers
resp = client.messages.count_tokens(model="claude-fable-5", messages=[...])
resp.input_tokens                  # new tokenizer (what you're billed)
resp.input_tokens_prior_tokenizer  # the same request under the prior-gen tokenizer

The `refusal` stop reason

Fable 5 runs safety classifiers on incoming requests (targeting research biology and most cybersecurity content). When a classifier declines, it responds with HTTP 200 and stop_reason: "refusal". A pre-output refusal has empty content and isn't billed; a mid-stream refusal bills the already-streamed output — discard it. Check stop_reason before reading content.

response = client.messages.create(model="claude-fable-5", max_tokens=1024, messages=[...])
if response.stop_reason == "refusal":
    handle_refusal()          # content empty or partial — discard
else:
    print(response.content[0].text)

To retry on another model, the server-side fallbacks parameter (beta) or the SDK's fallback middleware handles it in one shot.

Other contract differences

No assistant prefill — a last-assistant-turn prefill returns 400. Force output format with output_config.format (structured outputs) or a system prompt instead (same as the 4.6+ family).
30-day data retention required — Fable 5 is not available under zero data retention (ZDR). If retention doesn't meet the requirement, every request returns 400. If you get a 400 with a perfectly valid payload, check the org's data-retention configuration first.

4. Capability differences — what it does better

Fable 5's strengths show up most at difficulty levels prior models couldn't handle.

Long-horizon autonomous execution — state-of-the-art at agentic work that runs long without human correction, like complex refactors or overnight coding runs. A single request on a hard task running many minutes is normal.
Effort matters more — low effort still performs very well on Fable 5; even low often exceeds the xhigh/max performance of prior models. If a task finishes correctly but takes longer than necessary, lowering effort is the first adjustment.
High-resolution vision — trained to actively use bash and crop tools even on flipped, blurry, or noisy images.
Asynchronous sub-agent collaboration — reliably sustains ongoing communication with long-running sub-agents. Lean into delegation rather than suppressing it.
File-based memory — letting it write learnings to a file and consult them in later sessions noticeably improves performance.
Code review and debugging — finds real bugs better and explains them more clearly (security-focused analysis excepted, due to the safety classifiers above).

5. Shifts in prompting and operations

Fable 5 makes you rethink not just code contracts but prompts and operating patterns.

Design for long turns — a single request can exceed several minutes, so plan timeouts, streaming, and progress UX up front, and structure work to check in asynchronously rather than blocking inside a synchronous request.
Over-prescriptive prompts actually lower quality — prompts and skills that spelled out every step for prior models can reduce Fable 5's output quality. Rewrite them around goals and constraints rather than step lists, and A/B the difference.
Include low/medium in your effort sweep — routine work is handled well — sometimes better — at lower effort.
Suppress unrequested tidying and over-engineering — at higher effort it may add refactors or abstractions beyond the request. "Only what was asked, the simplest thing that works" in the system prompt is effective.

6. When Fable 5, and when Opus 4.8 is enough

Situation	Recommendation
A typical "upgrade to the latest Opus"	Opus 4.8 (`claude-opus-4-8`)
High-volume workloads where cost/speed balance matters	Sonnet 4.6 / Haiku 4.5
Long autonomous agents with no human in the loop; the hardest reasoning, planning, implementation	Fable 5
Orgs that require ZDR (zero retention)	Fable 5 unavailable → Opus-tier

In short, Fable 5 is the model you turn on when you're explicitly going after "what prior models couldn't do." Otherwise, Opus 4.8 is the simpler, cheaper choice.

Wrapping up

Fable 5's differences reduce to three layers. (1) Specs and pricing — 1M context, 128K output, at a per-token rate above Opus-tier; (2) API contract — always-on thinking, protected thinking, a new tokenizer, the refusal stop reason, no prefill, 30-day retention; (3) Capability and operations — minutes-long long-horizon autonomous execution and a "the less you prescribe, the better" prompting style. If you're evaluating it, start by picking your single hardest task and sweeping effort across it.