Updated June 12, 2026
Claude Fable 5 launched June 9, 2026 as a new tier above Opus 4.8, priced at $10/$50/MTok (2× Opus). This guide now covers all four models.
Anthropic’s Claude model lineup in 2026 now spans four tiers: Fable 5 at the top for maximum capability ($10/$50/MTok), Opus 4.8 for serious production work ($5/$25), Sonnet 4.6 for the best balance of performance and cost ($3/$15), and Haiku 4.5 for speed and high-volume work ($1/$5). Picking the wrong model costs money or performance — sometimes both. This guide covers every meaningful difference so you can make the right call.
Quick answer: Sonnet 4.6 handles 80–90% of tasks at a fraction of the cost of higher tiers. Use Fable 5 for the hardest engineering and long-horizon agentic work ($10/$50/MTok). Use Opus 4.8 for serious production work with zero data retention requirements ($5/$25). Use Sonnet 4.6 as your daily driver ($3/$15). Use Haiku 4.5 when speed and cost dominate ($1/$5).
The Current Claude Model Lineup (June 2026)
Claude Fable 5 vs Opus 4.8 vs Sonnet 4.6 vs Haiku 4.5: side-by-side
| Feature | Claude Fable 5 🆕 | Claude Opus 4.8 | Claude Sonnet 4.6 | Claude Haiku 4.5 |
| --- | --- | --- | --- | --- |
| Best for | Hardest engineering, long-horizon autonomy | Production work, zero-data-retention | Best speed/intelligence balance | Fastest responses, high-volume tasks |
| Input price | $10 / MTok | $5 / MTok | $3 / MTok | $1 / MTok |
| Output price | $50 / MTok | $25 / MTok | $15 / MTok | $5 / MTok |
| Context window | 1M tokens | 1M tokens | 1M tokens | 200K tokens |
| Max output | 128K tokens | 128K tokens | 64K tokens | 64K tokens |
| Extended thinking | No (adaptive always on) | No | Yes | Yes |
| Adaptive thinking | Always on | Yes | Yes | No |
| Zero data retention | No (30-day mandatory) | Yes | Yes | Yes |
| Latency | Slow–Moderate | Moderate | Fast | Fastest |
| API ID | claude-fable-5 | claude-opus-4-8 | claude-sonnet-4-6 | claude-haiku-4-5 |
As of June 2026, Anthropic’s three recommended models are Claude Opus 4.8, Claude Sonnet 4.6, and Claude Haiku 4.5. All three support text and image input, multilingual output, and vision processing. They differ significantly in pricing, context window, output limits, and capability.
| Feature | Fable 5 🆕 | Opus 4.8 | Sonnet 4.6 | Haiku 4.5 |
| --- | --- | --- | --- | --- |
| Input price | $10 / MTok | $5 / MTok | $3 / MTok | $1 / MTok |
| Output price | $50 / MTok | $25 / MTok | $15 / MTok | $5 / MTok |
| Context window | 1M tokens | 1M tokens | 1M tokens | 200K tokens |
| Max output | 128K tokens | 128K tokens | 64K tokens | 64K tokens |
| Extended thinking | No (adaptive always on) | No | Yes | Yes |
| Adaptive thinking | Always on | Yes | Yes | No |
| Latency | Slow–Moderate | Moderate | Fast | Fastest |
| Reliable knowledge cutoff | 2026 | Jan 2026 | Aug 2025 (reliable) | Feb 2025 (reliable) |
Pricing is per million tokens (MTok) via the Claude API. Source: Anthropic Models Overview, June 2026.
Claude Fable 5: The New Top Tier (June 9, 2026)
Fable 5 is Anthropic’s first Mythos-class model released for general availability. It landed June 9, 2026 and sits above Opus 4.8 in capability — scoring 95.0% on SWE-bench Verified (vs 88.6% for Opus 4.8) and 80.0% on SWE-bench Pro (vs 69.2%). On the Senior Engineer benchmark, Fable 5 scores 91/100 vs approximately 63/100 for Opus 4.8.
Key differentiators for Fable 5:
- Adaptive thinking always on — Fable 5 doesn’t have an extended thinking toggle. It always reasons adaptively, scaling depth to task complexity.
- 128K max output — same as Opus 4.8, twice Sonnet’s 64K cap.
- 1M token context window — same as Opus 4.8 and Sonnet 4.6.
Two constraints that matter:
- Mandatory 30-day data retention. Fable 5 is not available under zero data retention. If your use case requires ZDR (healthcare, legal, finance with strict data handling), use Opus 4.8.
- Safety classifier routing. Prompts touching cybersecurity, biology, chemistry, and distillation route to an Opus 4.8 fallback — at Fable 5 pricing. If your workload is in these domains, the upgrade is less impactful.
Use Fable 5 for: large migrations or refactors, multi-agent orchestration at frontier quality, long-horizon agentic work, complex scientific analysis, and any task where quality on hard problems justifies 2x cost over Opus.
Skip Fable 5 for: well-scoped routine work, high-volume pipelines (2x cost compounds), ZDR-required use cases, or domains where the safety classifier fallback applies.
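The two constraints above can be turned into a simple pre-flight check. The following is a minimal sketch, not an official API: the function name `fable5_check` and the domain labels are hypothetical, and the domain list is taken directly from this guide rather than from any published classifier specification.

```python
# Domains this guide says are routed to an Opus 4.8 fallback at Fable 5 pricing.
# The label strings are illustrative assumptions, not an official taxonomy.
CLASSIFIER_DOMAINS = {"cybersecurity", "biology", "chemistry", "distillation"}


def fable5_check(requires_zdr: bool, domains: set[str]) -> tuple[bool, str]:
    """Return (ok, reason) for sending a workload to Fable 5."""
    if requires_zdr:
        return False, "ZDR required; Fable 5 has mandatory 30-day retention"
    routed = sorted(domains & CLASSIFIER_DOMAINS)
    if routed:
        return False, "classifier fallback likely for: " + ", ".join(routed)
    return True, "no blocking constraint found"


print(fable5_check(requires_zdr=True, domains={"finance"}))
print(fable5_check(requires_zdr=False, domains={"chemistry", "cybersecurity"}))
print(fable5_check(requires_zdr=False, domains={"migration"}))
```

A check like this only screens out workloads where the upgrade is unlikely to pay off; whether the remaining tasks justify 2x cost still depends on how hard they are.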
Claude Opus 4.8: The Production Standard
Opus 4.8 is Anthropic’s most capable model supporting zero data retention (ZDR), which makes it the right default for most production API work. Fable 5 has since surpassed it in raw capability, but Opus 4.8 remains the better choice for ZDR workloads, cost-sensitive pipelines, and domains where Fable 5’s safety classifier routing applies. Anthropic describes it as a step-change improvement in agentic coding over the previous Opus release, with a new tokenizer that contributes to improved performance on a range of tasks. Note that this new tokenizer may use up to 35% more tokens for the same text compared to previous models, a cost consideration worth factoring in for high-volume workflows.
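To see what the tokenizer overhead could mean for a budget, here is a rough worst-case estimate. It assumes the full 35% increase applies uniformly to your text, which is the upper bound quoted above and not a measured figure; the function name is hypothetical.

```python
def worst_case_cost(old_token_count: int, price_per_mtok: float,
                    overhead: float = 0.35) -> float:
    """USD cost if the same text needs (1 + overhead) times as many tokens."""
    new_token_count = old_token_count * (1 + overhead)
    return new_token_count * price_per_mtok / 1_000_000


# Text that used to be 1M output tokens, billed at Opus 4.8's $25/MTok output rate.
baseline = 1_000_000 * 25.0 / 1_000_000
print(f"baseline: ${baseline:.2f}, worst case: ${worst_case_cost(1_000_000, 25.0):.2f}")
```

Measuring token counts on a sample of your own traffic gives a better number than the upper bound.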
Key differentiators for Opus 4.8 over the other two models:
- 128K max output tokens — double Sonnet and Haiku’s 64K cap. This matters for generating long-form code, detailed reports, or complete document drafts in a single call.
- 1M token context window — same as Sonnet 4.6, meaning Opus can process entire codebases or book-length documents in a single session.
- Adaptive thinking — Opus 4.8 and Sonnet 4.6 both support adaptive thinking, which lets the model adjust reasoning depth based on task complexity.
- Most recent knowledge cutoff — January 2026, versus August 2025 (reliable) for Sonnet and February 2025 (reliable) for Haiku.
Opus does not support extended thinking; that capability lives on Sonnet 4.6 and Haiku 4.5. Extended thinking lets the model reason step-by-step before generating output, which is particularly useful for complex math, science, and multi-step logic problems.
Use Opus 4.8 for: complex architecture decisions, large codebase analysis, multi-agent orchestration tasks, outputs that require more than 64K tokens, tasks demanding the latest possible knowledge, and any work where you need the strongest reasoning Anthropic offers under zero data retention.
Skip Opus 4.8 for: routine content generation, customer support pipelines, high-volume classification or extraction, real-time applications requiring low latency, or any task where Sonnet scores within your acceptable quality threshold.
Claude Sonnet 4.6: The Workhorse
Sonnet 4.6 is the model Anthropic recommends as the best combination of speed and intelligence. Released in February 2026, it delivers a 1M token context window at $3 input / $15 output per million tokens — the same context window as Opus at 40% lower cost.
Sonnet 4.6 also uniquely offers extended thinking, which Opus 4.8 does not. When extended thinking is enabled, Sonnet can perform additional internal reasoning before generating its response — useful for reasoning-heavy tasks like complex debugging, multi-step research, and technical problem-solving where chain-of-thought depth matters.
For developers and teams using Claude Code, Sonnet 4.6 is the standard daily driver. It handles tool calling, agentic workflows, and multi-file code reasoning reliably, at a price point that makes heavy daily use economically viable.
Use Sonnet 4.6 for: most production workloads, Claude Code sessions, long-document analysis, content generation, coding tasks, research synthesis, customer-facing applications, and any workflow requiring the 1M context window where Opus’s premium isn’t justified.
Skip Sonnet 4.6 for: high-volume pipelines where Haiku’s lower cost is acceptable, simple classification or extraction tasks, or real-time applications where Haiku’s faster latency is required.
Claude Haiku 4.5: Speed and Volume
Haiku 4.5 is the fastest model in the Claude family and the most cost-efficient at $1 input / $5 output per million tokens. It has a 200K token context window — smaller than Opus and Sonnet’s 1M, but still substantial for most single-task work. It supports extended thinking but not adaptive thinking.
The 200K context limit is the most important practical constraint. Most single-document, single-task workflows fit within 200K. Multi-file codebases, long books, or extended conversation histories that push past that threshold need Sonnet or Opus.
Haiku 4.5 has the oldest knowledge cutoff of the three: February 2025. For tasks requiring awareness of events or developments from mid-2025 onward, Haiku won’t have that context baked in.
Use Haiku 4.5 for: content moderation, classification pipelines, entity extraction, customer support triage, real-time chat interfaces, simple Q&A, high-volume API workflows where cost and speed dominate, and any task where quality requirements are modest.
Skip Haiku 4.5 for: complex reasoning, large codebase analysis, tasks requiring recent knowledge (post-February 2025), multi-step agent workflows, or any task requiring more than 200K tokens of input context.
Pricing: What the Numbers Actually Mean in Practice
All three models price output tokens at 5x the input rate — a ratio that holds across the entire Claude lineup. This means verbose, long-form outputs cost significantly more than short, targeted responses. Minimizing generated output length is the highest-leverage cost optimization available before you touch model routing or caching.
To put the pricing in concrete terms: generating one million output tokens (roughly 750,000 words of generated text) costs $25 on Opus, $15 on Sonnet, and $5 on Haiku. For input-heavy workloads like document analysis where you’re feeding in large amounts of text but getting shorter responses, the cost gap narrows.
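The arithmetic is easy to script. This sketch uses the list prices from the comparison table and ignores caching and batch discounts; the function name `request_cost` is made up for illustration.

```python
# USD per million tokens (input, output), copied from the comparison table.
PRICES = {
    "claude-fable-5": (10.0, 50.0),
    "claude-opus-4-8": (5.0, 25.0),
    "claude-sonnet-4-6": (3.0, 15.0),
    "claude-haiku-4-5": (1.0, 5.0),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# An input-heavy document-analysis call: 100,000 tokens in, 2,000 tokens out.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 100_000, 2_000):.3f}")
```

Swapping in your own typical input and output sizes shows how much of each bill comes from generated tokens versus prompt tokens.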
Three additional pricing levers apply across all models:
- Prompt caching: Cuts cache-read input costs by up to 90% for repeated system prompts or documents. If your application reuses a large system prompt across many requests, caching is the single highest-impact cost reduction available.
- Batch API: Provides a 50% discount for non-time-sensitive workloads processed asynchronously. Combine with prompt caching for up to 95% savings on qualifying workflows.
- Model routing: Running a mix of Haiku for simple tasks, Sonnet for production workloads, and Opus for complex reasoning — rather than using one model for everything — can reduce total API costs by 60–70% without meaningful quality loss on the tasks that don’t require a flagship model.
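The "up to 95%" figure follows from stacking the two discounts. The sketch below assumes the discounts multiply, that cache reads are billed at 10% of the base input price, and that cache-write charges can be ignored; all three are simplifying assumptions, so check current pricing before relying on the result.

```python
def effective_input_price(base_price: float, cached: bool = False,
                          batch: bool = False) -> float:
    """USD per MTok of input after the discounts described above."""
    price = base_price
    if cached:
        price *= 0.10  # cache read at 10% of base: the "up to 90%" saving
    if batch:
        price *= 0.50  # Batch API: 50% discount
    return price


base = 3.0  # Sonnet 4.6 input price in USD per MTok
for cached, batch in [(False, False), (True, False), (False, True), (True, True)]:
    price = effective_input_price(base, cached, batch)
    saving = 1 - price / base
    print(f"cached={cached!s:5} batch={batch!s:5} -> ${price:.2f}/MTok ({saving:.0%} saved)")
```

Under these assumptions the combined case comes to $0.15 per MTok, a 95% reduction on the cached portion of the input only. Output tokens and uncached input still bill at their usual rates.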
Context Windows: 1M Tokens vs. 200K
Opus 4.8 and Sonnet 4.6 both offer a 1M token context window at standard pricing — no premium surcharge for extended context. For reference, 1 million tokens is roughly 750,000 words, enough to hold a large codebase, a full academic textbook, or months of business communications in a single conversation.
Haiku 4.5 has a 200K token context window. That’s still roughly 150,000 words — sufficient for most single-document tasks, but it creates a hard ceiling for anything requiring multi-file code review, book-length document analysis, or lengthy conversation histories.
If your workflow consistently requires more than 200K tokens of input, Sonnet 4.6 is the cost-efficient choice. Opus 4.8 is the right call only when the input load requires the additional reasoning capability Opus provides, not just the context window size — because Sonnet gets you the same 1M window at 40% lower cost.
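As a rough illustration of that rule, the helper below picks the cheapest model whose context window can hold a request. It assumes the window has to cover input plus whatever output you reserve, and it considers context size only, not reasoning quality; the function name is hypothetical.

```python
# (model, context window in tokens, input price in USD per MTok), cheapest first.
MODELS = [
    ("claude-haiku-4-5", 200_000, 1.0),
    ("claude-sonnet-4-6", 1_000_000, 3.0),
    ("claude-opus-4-8", 1_000_000, 5.0),
]


def cheapest_model_that_fits(input_tokens: int, reserved_output: int = 0) -> str:
    """Return the first (cheapest) model whose window holds the request."""
    needed = input_tokens + reserved_output
    for model, window, _price in MODELS:
        if needed <= window:
            return model
    raise ValueError(f"{needed} tokens exceeds every context window; chunk the input")


print(cheapest_model_that_fits(150_000))          # fits in Haiku's 200K window
print(cheapest_model_that_fits(150_000, 64_000))  # 214K needed, so Sonnet
```

Because Sonnet and Opus share the same window, this helper never returns Opus on size alone, which matches the point above: escalate to Opus for capability, not for context.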
Extended Thinking vs. Adaptive Thinking
These are two distinct features that appear together in the comparison table but serve different purposes.
Extended thinking (available on Sonnet 4.6 and Haiku 4.5, not Opus 4.8) lets Claude perform additional internal reasoning before generating its response. When enabled, the model produces a “thinking” content block that exposes its reasoning process — step-by-step problem decomposition before the final answer. Extended thinking tokens are billed as standard output tokens at the model’s output rate. A minimum thinking budget of 1,024 tokens is required when enabling this feature.
Adaptive thinking (available on Opus 4.8 and Sonnet 4.6, not Haiku 4.5) adjusts reasoning depth dynamically based on task complexity — the model allocates more reasoning for harder problems and less for simpler ones, without requiring explicit configuration.
The practical implication: if you need transparent, controllable step-by-step reasoning that you can inspect and use in your application, Sonnet 4.6’s extended thinking is often the right tool — and at lower cost than Opus.
Which Claude Model Should You Choose?
The right framework for model selection in mid-2026 is a four-tier stack: Fable 5 for the hardest problems, Opus 4.8 as the production standard, Sonnet 4.6 as the daily driver, Haiku 4.5 for volume. Start with Sonnet 4.6 and escalate selectively. Most production workloads (coding, writing, analysis, customer-facing applications) are well-served by Sonnet. Opus 4.8 earns its premium when you need ZDR-compatible top-end capability, outputs over 64K tokens, or the January 2026 knowledge cutoff. Fable 5 earns its 2x premium when the task is genuinely hard enough that its roughly 6-point lead on SWE-bench Verified (95.0% vs 88.6%) or 11-point lead on SWE-bench Pro (80.0% vs 69.2%) matters for your outcome.
Haiku 4.5 belongs in any pipeline where you’ve identified tasks that don’t require Sonnet’s capability. High-volume routing, triage, classification, and real-time response scenarios are Haiku’s natural territory. The optimal production routing split is roughly 70% Haiku 4.5, 20% Sonnet 4.6, 8% Opus 4.8, 2% Fable 5 — rather than using a single model for everything. That ratio cuts costs by 60–70% without meaningful quality loss on the tasks that don’t need a flagship model.
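The savings claim depends on the baseline, which a quick calculation makes explicit. This sketch assumes every request uses the same number of tokens regardless of tier and that the split is by request count; real traffic will differ, so treat the output as a sanity check, not a forecast.

```python
# USD per million tokens (input, output), from the comparison table.
PRICES = {
    "claude-haiku-4-5": (1.0, 5.0),
    "claude-sonnet-4-6": (3.0, 15.0),
    "claude-opus-4-8": (5.0, 25.0),
    "claude-fable-5": (10.0, 50.0),
}
# Routing split quoted above.
SPLIT = {
    "claude-haiku-4-5": 0.70,
    "claude-sonnet-4-6": 0.20,
    "claude-opus-4-8": 0.08,
    "claude-fable-5": 0.02,
}


def blended_price(split: dict, column: int) -> float:
    """Traffic-weighted price; column 0 is input, column 1 is output."""
    return sum(share * PRICES[model][column] for model, share in split.items())


def savings_vs(baseline_model: str, split: dict = SPLIT) -> float:
    """Fractional saving of the routed mix against a single-model baseline."""
    return 1 - blended_price(split, 0) / PRICES[baseline_model][0]


print(f"blended input:  ${blended_price(SPLIT, 0):.2f}/MTok")
print(f"blended output: ${blended_price(SPLIT, 1):.2f}/MTok")
print(f"saving vs all-Opus:   {savings_vs('claude-opus-4-8'):.0%}")
print(f"saving vs all-Sonnet: {savings_vs('claude-sonnet-4-6'):.0%}")
```

Under these assumptions the mix blends to $1.90 input and $9.50 output per MTok, which is about 62% cheaper than running everything on Opus 4.8 and about 37% cheaper than running everything on Sonnet 4.6. The 60–70% figure therefore fits an all-Opus baseline.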
Frequently Asked Questions
What is the difference between Claude Opus 4.8, Sonnet, and Haiku?
Opus 4.8 is the most capable of the three, optimized for complex reasoning, large outputs, and agentic tasks. Sonnet offers a balance of capability and cost, handling most production workloads at lower price. Haiku is the fastest and cheapest option, suited for high-volume, lower-complexity tasks. All three share the same core Claude architecture and safety training.
Is Claude Opus 4.8 worth the extra cost over Sonnet?
For most tasks, no. Sonnet 4.6 handles the majority of coding, writing, and analysis work at 40% lower cost. Opus 4.8 is worth the premium when you need outputs longer than 64K tokens, maximum agentic coding capability, or the most recent knowledge cutoff (January 2026 vs. Sonnet’s August 2025).
Which Claude model is best for coding?
Sonnet 4.6 is the standard recommendation for most coding work, including Claude Code sessions. Opus 4.8 is preferred for large codebase analysis, complex architecture decisions, or multi-agent coding workflows where maximum reasoning depth is required. Haiku 4.5 can handle simple code edits and explanations at much lower cost.
What is the Claude context window?
Claude Opus 4.8 and Sonnet 4.6 both have a 1 million token context window — roughly 750,000 words of combined input and conversation history. Claude Haiku 4.5 has a 200,000 token context window. Context window size determines how much information Claude can hold and reference in a single conversation.
Does Claude Opus 4.8 support extended thinking?
No. Extended thinking is available on Claude Sonnet 4.6 and Claude Haiku 4.5, but not on Claude Opus 4.8. Opus 4.8 supports adaptive thinking instead, which dynamically adjusts reasoning depth based on task complexity.
What is the cheapest Claude model?
Claude Haiku 4.5 is the least expensive model at $1 per million input tokens and $5 per million output tokens. It is also the fastest Claude model, making it well-suited for high-volume, latency-sensitive applications.
Can I use Claude through Amazon Bedrock or Google Vertex AI?
Yes. All three current Claude models — Opus 4.8, Sonnet 4.6, and Haiku 4.5 — are available through Amazon Bedrock and Google Vertex AI in addition to the direct Anthropic API. Bedrock and Vertex AI offer regional and global endpoint options. Pricing on third-party platforms may vary from direct Anthropic API rates.
Claude vs GPT-4o: Which Model Wins for Everyday Work?
Claude Sonnet 4.6 and GPT-4o are the primary head-to-head competitors in 2026 for professional daily use. They are priced the same on input ($3 per MTok) but differ on output price and on performance by task type.
| Task Type | Claude Sonnet 4.6 | GPT-4o |
| --- | --- | --- |
| Long-document analysis (200K+ tokens) | ✓ 1M context window | 128K limit |
| Multi-step reasoning | Extended thinking available | o1 series for reasoning |
| Code generation | Strong; Claude Code natively | Strong; GitHub Copilot integration |
| Instruction following | Very consistent | Consistent |
| API cost (output) | $15/MTok | $10/MTok |
| Context window | 1M tokens | 128K tokens |
The clearest differentiator is context window size. If your workflow involves analyzing full codebases, long contracts, or book-length documents in a single call, Claude Sonnet 4.6’s 1M token window eliminates chunking overhead that GPT-4o requires at 128K. For shorter tasks, either model performs comparably.
Claude vs Gemini 2.5 Pro: How Do They Compare?
Google’s Gemini 2.5 Pro competes directly with Claude Sonnet 4.6 on price and capability. Key differences:
| Feature | Claude Sonnet 4.6 | Gemini 2.5 Pro |
| --- | --- | --- |
| Input price | $3.00/MTok | $3.00/MTok (under 200K tokens) |
| Output price | $15.00/MTok | $10.00/MTok |
| Context window | 1M tokens | 1M tokens |
| Extended thinking | Yes | Yes (2.5 Pro) |
| Agentic coding | Claude Code native | Via Gemini API / IDX |
Gemini 2.5 Pro is cheaper on paper, especially for prompts under 200K tokens. Claude Sonnet 4.6’s advantage is instruction-following consistency on complex multi-step tasks and the Claude Code ecosystem for engineering teams already in the Anthropic stack.
Which Claude Model Should You Use in Claude Code?
Claude Code supports all four models. The recommended routing for most teams:
- Fable 5: Use for the hardest agentic tasks: large migrations, complex multi-file refactors, long-horizon autonomous workflows. Enable with claude --model claude-fable-5.
- Opus 4.8 — Default for serious work: multi-agent orchestration, large codebase analysis, outputs over 64K tokens.
- Sonnet 4.6 — Daily driver. Best cost-to-performance ratio for most coding tasks. Extended thinking handles complex architecture decisions.
- Haiku 4.5 — High-frequency, low-complexity tasks: formatting, renaming, boilerplate, pipeline steps where speed matters more than depth.
The Max plan (available on claude.ai) unlocks 1M token context in Claude Code at no additional charge, which is the practical differentiator for large codebase work.
Frequently Asked Questions: Claude Model Comparison
What is the best Claude model in 2026?
Claude Sonnet 4.6 is the recommended default for most tasks — it delivers 80-90% of Opus 4.8’s capability at 40% lower cost. Use Opus 4.8 when you need maximum reasoning depth, outputs longer than 64K tokens, or the most recent knowledge cutoff (January 2026). Use Haiku 4.5 for high-volume, speed-sensitive work.
Is Claude Opus 4.8 better than Sonnet?
Claude Opus 4.8 has a higher capability ceiling than Sonnet 4.6: larger output window (128K vs 64K tokens), the most recent knowledge cutoff, and stronger performance on complex agentic coding tasks. However, Sonnet 4.6 uniquely offers extended thinking which Opus does not support, and it costs 40% less. For most users, Sonnet 4.6 is the better practical choice.
What is Claude Haiku 4.5 used for?
Claude Haiku 4.5 is optimized for speed and cost efficiency at $1 input / $5 output per million tokens. It is best suited for high-volume pipelines, classification, metadata generation, social media content, and any task where fast response time matters more than maximum reasoning depth. It has a 200K token context window.
Which Claude model supports extended thinking?
Claude Sonnet 4.6 and Claude Haiku 4.5 both support extended thinking. Claude Opus 4.8 does not. Extended thinking allows the model to reason step-by-step internally before generating output, which improves performance on complex math, science, and multi-step logic problems.
Frequently Asked Questions
What is the difference between Claude Opus, Sonnet, and Haiku?
Claude Opus 4.8 is the most capable model in the standard tier — best for complex reasoning, long-horizon agentic coding, and tasks requiring high autonomy. Claude Sonnet 4.6 balances intelligence and speed for production workloads — it supports extended thinking and adaptive thinking while costing less than Opus. Claude Haiku 4.5 is the fastest and cheapest option, suited for high-volume tasks where speed and cost matter more than maximum capability.
Which Claude model should I use in 2026?
Start with Claude Sonnet 4.6 for most production applications — it offers near-Opus intelligence at $3/$15 per million tokens and supports extended thinking. Use Claude Opus 4.8 for complex multi-step reasoning, long-horizon agentic work, or tasks where quality is worth the higher cost ($5/$25 per MTok). Use Claude Haiku 4.5 for high-volume, latency-sensitive tasks where cost is the primary concern. For maximum capability above Opus 4.8, Claude Fable 5 launched June 9, 2026.
How much does Claude Opus 4.8 cost?
Claude Opus 4.8 is priced at $5 per million input tokens and $25 per million output tokens on the Claude API (per platform.claude.com as of June 2026). Batch API offers 50% discounts. For comparison: Claude Sonnet 4.6 is $3/$15 per MTok and Claude Haiku 4.5 is $1/$5 per MTok.
Does Claude Sonnet support extended thinking?
Yes. Claude Sonnet 4.6 supports both extended thinking and adaptive thinking (per platform.claude.com/docs/en/about-claude/models/overview). Extended thinking lets the model reason through complex problems before answering. Claude Haiku 4.5 also supports extended thinking. Claude Opus 4.8 does not use extended thinking but does support adaptive thinking.
What is Claude Fable 5 and how does it compare to Opus?
Claude Fable 5 (API ID: claude-fable-5) is Anthropic’s most capable widely-released model as of June 9, 2026. It uses adaptive thinking (always on), has a 1M token context window, 128k max output, and is priced at $10 input / $50 output per million tokens. Fable 5 is positioned above Opus 4.8 in the model lineup for the most demanding reasoning and long-horizon agentic work.
What is the context window for each Claude model?
Claude Opus 4.8 and Claude Sonnet 4.6 both support 1 million token context windows. Claude Haiku 4.5 supports 200,000 tokens. The 1M window is five times the 200K window that was standard in previous generations, and it allows Opus and Sonnet to process entire codebases, long research documents, or extended conversations without truncation.