Tag: Anthropic Models

  • Claude Context Window Size 2026: What 1 Million Tokens Actually Means


    Looking for quick answers? The FAQ version covers every common question directly.

    → Context Window FAQ

    Claude’s context window is one of those specs that sounds simple until you actually need to use it. “1 million tokens” means almost nothing without a frame of reference. This is the guide we wish existed when we started building on Claude — written from our own experience running it in production, with numbers pulled directly from Anthropic’s official documentation.

    Quick Definition

    The context window is Claude’s working memory for a conversation. It holds everything Claude can see and reason about at once: your messages, Claude’s responses, any documents you’ve shared, and system prompts. When the window fills up, earlier content drops out.

    Current Context Window Sizes by Model (May 2026)

    These numbers come directly from Anthropic’s official models page, fetched May 9, 2026. Model strings are exact API identifiers:

    Model | API String | Context Window | Max Output
    Claude Opus 4.7 | claude-opus-4-7 | 1,000,000 tokens | 128,000 tokens
    Claude Sonnet 4.6 | claude-sonnet-4-6 | 1,000,000 tokens | 64,000 tokens
    Claude Haiku 4.5 | claude-haiku-4-5-20251001 | 200,000 tokens | 64,000 tokens

    Opus 4.7 and Sonnet 4.6 both have the full 1M token context window. Haiku 4.5 is 200K. The key difference between Opus 4.7 and Sonnet 4.6 in this table is the max output — Opus 4.7 can write up to 128K tokens in a single response, Sonnet 4.6 caps at 64K.
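    The table above is easy to treat as plain data. Here's a minimal sketch (model strings and limits taken from the table; the helper function is our own illustration, not an Anthropic API) for validating a request against a model's limits before sending it:

```python
# Context and output limits per model, from the table above (May 2026).
MODEL_LIMITS = {
    "claude-opus-4-7":           {"context": 1_000_000, "max_output": 128_000},
    "claude-sonnet-4-6":         {"context": 1_000_000, "max_output": 64_000},
    "claude-haiku-4-5-20251001": {"context": 200_000,   "max_output": 64_000},
}

def fits(model: str, input_tokens: int, output_tokens: int) -> bool:
    """Check whether a request fits the model's limits.

    The context window must hold the input plus the generated output,
    and the output must stay under the model's max output cap.
    """
    limits = MODEL_LIMITS[model]
    return (output_tokens <= limits["max_output"]
            and input_tokens + output_tokens <= limits["context"])

print(fits("claude-haiku-4-5-20251001", 250_000, 4_000))  # False: over 200K context
print(fits("claude-sonnet-4-6", 900_000, 64_000))         # True
```

    A check like this catches the common failure mode early: asking a 200K-window model to hold a 250K-token input.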

    What Does 1 Million Tokens Actually Hold?

    Token counts are an abstraction. Here’s what 1 million tokens translates to in practical terms:

    • About 750,000 words of English text — roughly 10 full-length novels, or 1,500 average blog posts
    • A full mid-size codebase — a 50,000-line Python project with comments fits comfortably
    • Hours of meeting transcripts — a full workday of recorded calls, transcribed, fits in one context window
    • Multiple large documents simultaneously — 10 research PDFs at 30 pages each, all in the same conversation
    • Long conversation histories — hundreds of back-and-forth exchanges before anything starts dropping off
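    The "750,000 words per 1M tokens" ratio above gives a usable back-of-envelope estimator. This is a heuristic sketch, not a tokenizer; real token counts vary with language, formatting, and code content:

```python
# Back-of-envelope token estimate: ~750K English words per 1M tokens
# works out to roughly 1.33 tokens per word.
TOKENS_PER_WORD = 1_000_000 / 750_000  # ≈ 1.33

def estimate_tokens(text: str) -> int:
    """Rough token estimate from word count (heuristic, not a tokenizer)."""
    return round(len(text.split()) * TOKENS_PER_WORD)

def fraction_of_window(text: str, window: int = 1_000_000) -> float:
    """How much of a context window a text would consume, roughly."""
    return estimate_tokens(text) / window

novel = "word " * 75_000  # stand-in for a 75,000-word novel
print(f"~{estimate_tokens(novel):,} tokens")  # ~100,000 tokens
```

    By this estimate, a 75,000-word novel consumes about a tenth of the 1M window, which matches the "roughly 10 full-length novels" figure above.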

    We’ve loaded entire Notion exports, full project histories, and multi-document research packs into a single Claude session. At 1M tokens, you’re unlikely to hit the ceiling in a normal working session. You hit it when you do things like load your entire codebase, its documentation, and the full conversation history, then ask Claude for a complete architectural review.

    Context Window vs. Memory: What’s the Difference?

    This is where a lot of people get confused. The context window and memory are not the same thing:

    • Context window: What Claude can see right now, in this session. Once a session ends, it’s gone.
    • Memory (in claude.ai): A separate system that extracts and stores key information from past sessions. It surfaces relevant facts into future conversations as a snippet in the context.
    • Managed Agents memory stores: A developer-layer construct where agents maintain and update knowledge bases across sessions — distinct from both the context window and the consumer memory feature.

    The 1M token context window is your working memory for one session. It doesn’t persist. Memory systems are what carry information across sessions — but they work by injecting a summary into the context window of the new session, not by giving Claude access to the full history.
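    The injection pattern described above can be sketched in a few lines. Everything here is illustrative (the function name, the <memory> tag convention, the snippet wording are our own), but it shows the key point: only a short extracted summary enters the new session's context, never the full prior transcript:

```python
# Hypothetical sketch of memory injection: a new session receives a short
# extracted summary in its system prompt, not the old conversation itself.

def start_new_session(memory_snippet: str, user_message: str) -> dict:
    """Build request parameters for a fresh session.

    The memory summary is injected into the system prompt; the full
    prior transcript stays outside the context window entirely.
    """
    system = "You are a helpful assistant."
    if memory_snippet:
        system += f"\n\n<memory>\n{memory_snippet}\n</memory>"
    return {
        "system": system,
        "messages": [{"role": "user", "content": user_message}],
    }

params = start_new_session(
    "User prefers concise answers; project is a Django app.",
    "Continue the refactor we discussed.",
)
print("<memory>" in params["system"])  # True
```

    The new session's token budget is spent only on the snippet, which is why memory feels lightweight compared to re-pasting a whole transcript.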

    Does a Bigger Context Window Mean Better Performance?

    Mostly yes, with one important nuance. More context means Claude has more information to reason about, which generally produces better outputs for tasks that benefit from full context — code reviews, document synthesis, long-form writing, multi-document comparison.

    The nuance: performance can degrade on tasks involving specific information buried deep in a very long context. This is sometimes called the “lost in the middle” problem — models tend to pay more attention to the beginning and end of a long context than the middle. Anthropic has worked on this with Claude’s architecture, and it performs well on long-context tasks, but it’s worth structuring important information at natural reference points rather than burying it in the middle of a 500-page document.

    How We Actually Use the 1M Token Window

    We run Claude in production for content operations, site management, and agentic coding workflows. Here’s where the 1M context window makes a concrete difference in our work:

    • Full site audits: Loading every post from a WordPress site (200+ posts worth of content) into one session for comprehensive SEO analysis — without having to chunk and re-prompt
    • Cross-session context: Pasting in long Notion briefings, prior session transcripts, and the current task in one go. The window is large enough that we don’t have to decide what to leave out.
    • Codebase-wide reasoning: In Claude Code, having the full project context means Claude can make changes that account for how files interact rather than reasoning only about the current file
    • Multi-document synthesis: Research projects where we load 10-15 source documents and ask Claude to synthesize across them — something that was impossible at 100K context windows

    The practical shift from 200K to 1M tokens wasn’t just “more room.” It changed what we could ask Claude to do in a single session.

    Context Window on the API: Batch Output Extension

    For API users: on the Message Batches API, Opus 4.7, Opus 4.6, and Sonnet 4.6 support up to 300K output tokens using the output-300k-2026-03-24 beta header. This is relevant for batch generation tasks where you need very long outputs — documentation generation, large codebases, book-length content.
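    Opting into a beta like this happens via a request header. A minimal sketch, assuming raw HTTP access: the anthropic-beta and anthropic-version header names are standard for the Anthropic API, and the beta string is the one named above:

```python
# Headers for opting into the 300K-output batch beta named above.
# "anthropic-beta" is the standard opt-in header for beta features.
headers = {
    "anthropic-beta": "output-300k-2026-03-24",
    "anthropic-version": "2023-06-01",
    "content-type": "application/json",
}

print(headers["anthropic-beta"])  # output-300k-2026-03-24
```

    Official SDKs typically expose the same opt-in through a betas/extra-headers parameter rather than raw header dicts; check the SDK docs for the exact spelling.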

    Frequently Asked Questions

    What is Claude’s context window in 2026?

    Claude Opus 4.7 and Claude Sonnet 4.6 both have 1,000,000 token (1M token) context windows as of May 2026. Claude Haiku 4.5 has a 200,000 token context window. These are the current generally available models.

    How many pages can Claude read at once?

    At 1M tokens, Claude can hold roughly 750,000 words of English text — equivalent to approximately 3,000 average pages. In practice, a typical 20-page PDF is roughly 10,000-15,000 tokens, so you could load 60-100 such documents in a single session before approaching the limit.

    Does the context window reset between messages?

    No — the context window accumulates across an entire conversation session. Every message you send and every response Claude gives adds to the total. The window doesn’t reset between individual messages; it resets when you start a new conversation.

    What happens when Claude hits the context window limit?

    When a conversation reaches the context window limit, earlier messages begin to drop out of the active context. Claude can no longer reference information from those earlier messages — it effectively forgets that part of the conversation. In the claude.ai interface, you’ll see a notification when you’re approaching the limit.
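    For API users managing their own message history, the usual client-side strategy mirrors this behavior: evict the oldest turns until the remainder fits. This is an illustrative sketch with a toy word-count tokenizer, not a description of claude.ai's exact eviction logic:

```python
# Client-side context trimming: drop the oldest messages until the
# conversation fits the token budget. The token counter here is a
# stand-in; use a real count in production.

def trim_to_window(messages: list[dict], budget: int,
                   count_tokens=lambda m: len(m["content"].split())) -> list[dict]:
    """Drop oldest messages until the total estimated tokens fit the budget."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > budget:
        kept.pop(0)  # evict the earliest message first
    return kept

history = [
    {"role": "user", "content": "old " * 150},
    {"role": "assistant", "content": "older reply " * 50},
    {"role": "user", "content": "recent question " + "detail " * 20},
]
trimmed = trim_to_window(history, budget=200)
print(len(trimmed))  # 2: the oldest message was evicted
```

    More sophisticated variants summarize the evicted turns into a short snippet instead of discarding them outright.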

    Is the 1M context window available on the free plan?

    Yes. The model served to free plan users supports the 1M context window. However, free plan usage limits mean long-context sessions hit rate limits faster than on paid plans. The window is technically available, but sustained heavy use of it is more practical on paid tiers.

    What’s the difference between Claude Opus 4.7 and Sonnet 4.6 context windows?

    Both have the same 1M token input context window. The difference is max output: Opus 4.7 can generate up to 128,000 tokens in a single response; Sonnet 4.6 caps at 64,000 tokens. For most tasks this distinction doesn’t matter, but for very long document generation or large code outputs, Opus 4.7 has the higher output ceiling.

  • Claude Opus vs Sonnet vs Haiku: Model Comparison Guide (2026)


    Model Accuracy Note — Updated May 2026

    Current flagship: Claude Opus 4.7 (claude-opus-4-7), released April 16, 2026. Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Where this article references Opus 4.6 or earlier models, those references are historical. See current model tracker →

    Anthropic’s Claude model lineup in 2026 breaks down into three distinct tiers: Opus 4.7 for maximum capability, Sonnet 4.6 for the best balance of performance and cost, and Haiku 4.5 for speed and high-volume work. Picking the wrong model costs money or performance — sometimes both. This guide covers every meaningful difference so you can make the right call for your use case.

    Quick answer: Sonnet 4.6 handles 80–90% of tasks at 40% less cost than Opus. Use Opus 4.7 when you need maximum reasoning depth, the largest output window, or agentic coding at frontier quality. Use Haiku 4.5 when speed and cost are the priority and the task is straightforward.

    The Current Claude Model Lineup (April 2026)

    As of April 2026, Anthropic’s three recommended models are Claude Opus 4.7, Claude Sonnet 4.6, and Claude Haiku 4.5. All three support text and image input, multilingual output, and vision processing. They differ significantly in pricing, context window, output limits, and capability.

    Feature | Opus 4.7 | Sonnet 4.6 | Haiku 4.5
    Input price | $5 / MTok | $3 / MTok | $1 / MTok
    Output price | $25 / MTok | $15 / MTok | $5 / MTok
    Context window | 1M tokens | 1M tokens | 200K tokens
    Max output | 128K tokens | 64K tokens | 64K tokens
    Extended thinking | No | Yes | Yes
    Adaptive thinking | Yes | Yes | No
    Latency | Moderate | Fast | Fastest
    Knowledge cutoff (reliable) | Jan 2026 | Aug 2025 | Feb 2025

    Pricing is per million tokens (MTok) via the Claude API. Source: Anthropic Models Overview, April 2026.
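    To make the price gaps concrete, here's a small calculator using the rates from the table above (the model keys are our own shorthand):

```python
# Per-request cost in USD from the pricing table above.
PRICES = {  # model: (input $/MTok, output $/MTok)
    "opus-4.7":   (5.0, 25.0),
    "sonnet-4.6": (3.0, 15.0),
    "haiku-4.5":  (1.0, 5.0),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: tokens priced per million at the model's rates."""
    inp_rate, out_rate = PRICES[model]
    return (input_tokens * inp_rate + output_tokens * out_rate) / 1_000_000

# Example: a document-analysis call with 100K tokens in, 5K tokens out.
for model in PRICES:
    print(model, f"${request_cost(model, 100_000, 5_000):.3f}")
```

    For this input-heavy example the spread is roughly $0.625 (Opus) vs $0.375 (Sonnet) vs $0.125 (Haiku) per call, which is where the routing-based savings discussed later come from.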

    Claude Opus 4.7: When to Use It

    Opus 4.7 is Anthropic’s most capable generally available model as of April 2026. Anthropic describes it as a step-change improvement in agentic coding over Opus 4.6, with a new tokenizer that contributes to improved performance on a range of tasks. Note that this new tokenizer may use up to 35% more tokens for the same text compared to previous models — a cost consideration worth factoring in for high-volume workflows.

    Key differentiators for Opus 4.7 over the other two models:

    • 128K max output tokens — double Sonnet and Haiku’s 64K cap. This matters for generating long-form code, detailed reports, or complete document drafts in a single call.
    • 1M token context window — same as Sonnet 4.6, meaning Opus can process entire codebases or book-length documents in a single session.
    • Adaptive thinking — Opus 4.7 and Sonnet 4.6 both support adaptive thinking, which lets the model adjust reasoning depth based on task complexity.
    • Most recent knowledge cutoff — January 2026, versus August 2025 (reliable) for Sonnet and February 2025 (reliable) for Haiku.

    Opus does not support extended thinking — that capability lives on Sonnet 4.6 and Haiku 4.5. Extended thinking lets the model reason step-by-step before generating output, which is particularly useful for complex math, science, and multi-step logic problems.

    Use Opus 4.7 for: complex architecture decisions, large codebase analysis, multi-agent orchestration tasks, outputs that require more than 64K tokens, tasks demanding the latest possible knowledge, and any work where you need the absolute frontier of Anthropic’s reasoning capability.

    Skip Opus 4.7 for: routine content generation, customer support pipelines, high-volume classification or extraction, real-time applications requiring low latency, or any task where Sonnet scores within your acceptable quality threshold.

    Claude Sonnet 4.6: The Workhorse

    Sonnet 4.6 is the model Anthropic recommends as the best combination of speed and intelligence. Released in February 2026, it delivers a 1M token context window at $3 input / $15 output per million tokens — the same context window as Opus at 40% lower cost.

    Sonnet 4.6 also offers extended thinking, which Opus 4.7 does not support (Haiku 4.5 supports it as well). When extended thinking is enabled, Sonnet can perform additional internal reasoning before generating its response — useful for reasoning-heavy tasks like complex debugging, multi-step research, and technical problem-solving where chain-of-thought depth matters.

    For developers and teams using Claude Code, Sonnet 4.6 is the standard daily driver. It handles tool calling, agentic workflows, and multi-file code reasoning reliably, at a price point that makes heavy daily use economically viable.

    Use Sonnet 4.6 for: most production workloads, Claude Code sessions, long-document analysis, content generation, coding tasks, research synthesis, customer-facing applications, and any workflow requiring the 1M context window where Opus’s premium isn’t justified.

    Skip Sonnet 4.6 for: high-volume pipelines where Haiku’s lower cost is acceptable, simple classification or extraction tasks, or real-time applications where Haiku’s faster latency is required.

    Claude Haiku 4.5: Speed and Volume

    Haiku 4.5 is the fastest model in the Claude family and the most cost-efficient at $1 input / $5 output per million tokens. It has a 200K token context window — smaller than Opus and Sonnet’s 1M, but still substantial for most single-task work. It supports extended thinking but not adaptive thinking.

    The 200K context limit is the most important practical constraint. Most single-document, single-task workflows fit within 200K. Multi-file codebases, long books, or extended conversation histories that push past that threshold need Sonnet or Opus.

    Haiku 4.5 has the oldest knowledge cutoff of the three: February 2025. For tasks requiring awareness of events or developments from mid-2025 onward, Haiku won’t have that context baked in.

    Use Haiku 4.5 for: content moderation, classification pipelines, entity extraction, customer support triage, real-time chat interfaces, simple Q&A, high-volume API workflows where cost and speed dominate, and any task where quality requirements are modest.

    Skip Haiku 4.5 for: complex reasoning, large codebase analysis, tasks requiring recent knowledge (post-February 2025), multi-step agent workflows, or any task whose input context exceeds 200K tokens.

    Pricing: What the Numbers Actually Mean in Practice

    All three models price output tokens at 5x the input rate — a ratio that holds across the entire Claude lineup. This means verbose, long-form outputs cost significantly more than short, targeted responses. Minimizing generated output length is the highest-leverage cost optimization available before you touch model routing or caching.

    To put the pricing in concrete terms: generating one million output tokens (roughly 750,000 words of generated text) costs $25 on Opus, $15 on Sonnet, and $5 on Haiku. For input-heavy workloads like document analysis where you’re feeding in large amounts of text but getting shorter responses, the cost gap narrows.

    Three additional pricing levers apply across all models:

    • Prompt caching: Cuts cache-read input costs by up to 90% for repeated system prompts or documents. If your application reuses a large system prompt across many requests, caching is the single highest-impact cost reduction available.
    • Batch API: Provides a 50% discount for non-time-sensitive workloads processed asynchronously. Combine with prompt caching for up to 95% savings on qualifying workflows.
    • Model routing: Running a mix of Haiku for simple tasks, Sonnet for production workloads, and Opus for complex reasoning — rather than using one model for everything — can reduce total API costs by 60–70% without meaningful quality loss on the tasks that don’t require a flagship model.
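    The "up to 95% savings" figure from stacking the first two levers is simple arithmetic, sketched here:

```python
# Stacking the levers above: prompt caching (cache reads at ~10% of the
# input rate) combined with the Batch API's 50% discount.
base = 1.00                     # relative cost of an uncached, real-time call
cached = base * (1 - 0.90)      # prompt caching: 90% off cache-read input
batched_cached = cached * 0.50  # Batch API: 50% discount on top

print(f"{1 - batched_cached:.0%} total savings")  # 95% total savings
```

    The two discounts multiply rather than add: 10% of the base rate, halved again, leaves 5% of the original cost.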

    Context Windows: 1M Tokens vs. 200K

    Opus 4.7 and Sonnet 4.6 both offer a 1M token context window at standard pricing — no premium surcharge for extended context. For reference, 1 million tokens is roughly 750,000 words, enough to hold a large codebase, a full academic textbook, or months of business communications in a single conversation.

    Haiku 4.5 has a 200K token context window. That’s still roughly 150,000 words — sufficient for most single-document tasks, but it creates a hard ceiling for anything requiring multi-file code review, book-length document analysis, or lengthy conversation histories.

    If your workflow consistently requires more than 200K tokens of input, Sonnet 4.6 is the cost-efficient choice. Opus 4.7 is the right call only when the input load requires the additional reasoning capability Opus provides, not just the context window size — because Sonnet gets you the same 1M window at 40% lower cost.

    Extended Thinking vs. Adaptive Thinking

    These are two distinct features that appear together in the comparison table but serve different purposes.

    Extended thinking (available on Sonnet 4.6 and Haiku 4.5, not Opus 4.7) lets Claude perform additional internal reasoning before generating its response. When enabled, the model produces a “thinking” content block that exposes its reasoning process — step-by-step problem decomposition before the final answer. Extended thinking tokens are billed as standard output tokens at the model’s output rate. A minimum thinking budget of 1,024 tokens is required when enabling this feature.
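    A sketch of a Messages API request body with extended thinking enabled. The thinking parameter shape matches Anthropic's documented API; the model string and prompt content are illustrative:

```python
# Messages API request body with extended thinking enabled.
# budget_tokens must be at least the documented 1,024-token minimum,
# and max_tokens must exceed the thinking budget.
request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 8_000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 1_024,  # documented minimum thinking budget
    },
    "messages": [
        {"role": "user", "content": "Debug this race condition step by step."}
    ],
}

print(request["thinking"]["type"])  # enabled
```

    The response will then include a thinking content block ahead of the final answer, billed as output tokens at the model's output rate.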

    Adaptive thinking (available on Opus 4.7 and Sonnet 4.6, not Haiku 4.5) adjusts reasoning depth dynamically based on task complexity — the model allocates more reasoning for harder problems and less for simpler ones, without requiring explicit configuration.

    The practical implication: if you need transparent, controllable step-by-step reasoning that you can inspect and use in your application, Sonnet 4.6’s extended thinking is often the right tool — and at lower cost than Opus.

    Which Claude Model Should You Choose?

    The right framework for model selection in 2026 is to start with Sonnet 4.6 as your default and escalate selectively. Most production workloads — coding, writing, analysis, research, customer-facing applications — are well-served by Sonnet. Opus 4.7 earns its premium in specific scenarios: tasks requiring more than 64K output tokens, agent workflows demanding maximum reasoning depth, or applications where Anthropic’s latest knowledge cutoff is a meaningful factor.

    Haiku 4.5 belongs in any pipeline where you’ve identified tasks that don’t require Sonnet’s capability. High-volume routing, triage, classification, and real-time response scenarios are Haiku’s natural territory. Building a 70/20/10 routing split across Haiku 4.5, Sonnet 4.6, and Opus 4.7 — rather than using a single model for everything — is the standard approach for cost-efficient production deployments.
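    The routing idea above can be sketched as a small function: pick the cheapest model that satisfies the task's requirements. The complexity labels and thresholds here are illustrative; production routers typically use a classifier or scoring model rather than hand-written rules:

```python
# Illustrative model router: cheapest model first, escalate on need.
def route(complexity: str, input_tokens: int, output_tokens: int) -> str:
    """Pick a model from task complexity and token requirements."""
    if output_tokens > 64_000 or complexity == "frontier":
        return "claude-opus-4-7"           # 128K output cap, deepest reasoning
    if input_tokens > 200_000 or complexity == "standard":
        return "claude-sonnet-4-6"         # 1M context at 40% lower cost
    return "claude-haiku-4-5-20251001"     # fast and cheap, 200K context

print(route("simple", 5_000, 500))         # haiku
print(route("standard", 400_000, 2_000))   # sonnet
print(route("frontier", 50_000, 100_000))  # opus
```

    The escalation rules mirror the guidance in this article: output size and reasoning depth push work to Opus, context size pushes it to Sonnet, and everything else defaults to Haiku.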

    Frequently Asked Questions

    What is the difference between Claude Opus 4.7, Sonnet, and Haiku?

    Opus is Anthropic’s most capable model, optimized for complex reasoning, large outputs, and agentic tasks. Sonnet offers a balance of capability and cost, handling most production workloads at lower price. Haiku is the fastest and cheapest option, suited for high-volume, lower-complexity tasks. All three share the same core Claude architecture and safety training.

    Is Claude Opus 4.7 worth the extra cost over Sonnet?

    For most tasks, no. Sonnet 4.6 handles the majority of coding, writing, and analysis work at 40% lower cost. Opus 4.7 is worth the premium when you need outputs longer than 64K tokens, maximum agentic coding capability, or the most recent knowledge cutoff (January 2026 vs. Sonnet’s August 2025).

    Which Claude model is best for coding?

    Sonnet 4.6 is the standard recommendation for most coding work, including Claude Code sessions. Opus 4.7 is preferred for large codebase analysis, complex architecture decisions, or multi-agent coding workflows where maximum reasoning depth is required. Haiku 4.5 can handle simple code edits and explanations at much lower cost.

    What is the Claude context window?

    Claude Opus 4.7 and Sonnet 4.6 both have a 1 million token context window — roughly 750,000 words of combined input and conversation history. Claude Haiku 4.5 has a 200,000 token context window. Context window size determines how much information Claude can hold and reference in a single conversation.

    Does Claude Opus 4.7 support extended thinking?

    No. Extended thinking is available on Claude Sonnet 4.6 and Claude Haiku 4.5, but not on Claude Opus 4.7. Opus 4.7 supports adaptive thinking instead, which dynamically adjusts reasoning depth based on task complexity.

    What is the cheapest Claude model?

    Claude Haiku 4.5 is the least expensive model at $1 per million input tokens and $5 per million output tokens. It is also the fastest Claude model, making it well-suited for high-volume, latency-sensitive applications.

    Can I use Claude through Amazon Bedrock or Google Vertex AI?

    Yes. All three current Claude models — Opus 4.7, Sonnet 4.6, and Haiku 4.5 — are available through Amazon Bedrock and Google Vertex AI in addition to the direct Anthropic API. Bedrock and Vertex AI offer regional and global endpoint options. Pricing on third-party platforms may vary from direct Anthropic API rates.

    Claude vs GPT-4o: Which Model Wins for Everyday Work?

    Claude Sonnet 4.6 and GPT-4o are the primary head-to-head competitors in 2026 for professional daily use. They are priced similarly on input (about $3 per MTok each) but perform differently depending on task type.

    Task Type | Claude Sonnet 4.6 | GPT-4o
    Long-document analysis (200K+ tokens) | ✓ 1M context window | 128K limit
    Multi-step reasoning | Extended thinking available | o1 series for reasoning
    Code generation | Strong; Claude Code natively | Strong; GitHub Copilot integration
    Instruction following | Very consistent | Consistent
    API cost (output) | $15/MTok | $10/MTok
    Context window | 1M tokens | 128K tokens

    The clearest differentiator is context window size. If your workflow involves analyzing full codebases, long contracts, or book-length documents in a single call, Claude Sonnet 4.6’s 1M token window eliminates chunking overhead that GPT-4o requires at 128K. For shorter tasks, either model performs comparably.
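    The chunking overhead is easy to quantify. A sketch, assuming a fixed headroom reserved for the prompt and response (the 8K headroom figure is our own illustrative choice):

```python
import math

# How many chunks a long document needs at a given context size,
# reserving headroom for the system prompt and the model's response.
def chunks_needed(doc_tokens: int, context: int, headroom: int = 8_000) -> int:
    """Number of separate calls required to cover the whole document."""
    usable = context - headroom
    return math.ceil(doc_tokens / usable)

doc = 900_000  # e.g. a large codebase or book-length manuscript
print(chunks_needed(doc, 1_000_000))  # 1: fits in one 1M-window call
print(chunks_needed(doc, 128_000))    # 8: chunked at a 128K window
```

    Beyond the extra calls themselves, chunking costs you cross-chunk reasoning: each 128K slice is analyzed without visibility into the others unless you add a merge step.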

    Claude vs Gemini 2.5 Pro: How Do They Compare?

    Google’s Gemini 2.5 Pro competes directly with Claude Sonnet 4.6 on price and capability. Key differences:

    Feature | Claude Sonnet 4.6 | Gemini 2.5 Pro
    Input price | $3.00/MTok | $3.00/MTok (under 200K tokens)
    Output price | $15.00/MTok | $10.00/MTok
    Context window | 1M tokens | 1M tokens
    Extended thinking | Yes | Yes (2.5 Pro)
    Agentic coding | Claude Code native | Via Gemini API / IDX

    Gemini 2.5 Pro is cheaper on paper, especially for prompts under 200K tokens. Claude Sonnet 4.6’s advantage is instruction-following consistency on complex multi-step tasks and the Claude Code ecosystem for engineering teams already in the Anthropic stack.

    Which Claude Model Should You Use in Claude Code?

    Claude Code supports all three models. The recommended routing for most teams:

    • Sonnet 4.6 — Default daily driver for all coding tasks. Best cost-to-performance ratio. Extended thinking handles complex architecture decisions.
    • Opus 4.7 — Use for multi-agent orchestration, large codebase analysis across many files, or when output length exceeds 64K tokens (Opus has a 128K output cap vs 64K for Sonnet).
    • Haiku 4.5 — Use for high-frequency, low-complexity tasks: formatting, renaming, boilerplate generation, and pipeline steps where speed matters more than reasoning depth.

    The Max plan (available on claude.ai) unlocks 1M token context in Claude Code at no additional charge, which is the practical differentiator for large codebase work.

    Frequently Asked Questions: Claude Model Comparison

    What is the best Claude model in 2026?

    Claude Sonnet 4.6 is the recommended default for most tasks — it delivers 80-90% of Opus 4.7’s capability at 40% lower cost. Use Opus 4.7 when you need maximum reasoning depth, outputs longer than 64K tokens, or the most recent knowledge cutoff (January 2026). Use Haiku 4.5 for high-volume, speed-sensitive work.

    Is Claude Opus 4.7 better than Sonnet?

    Claude Opus 4.7 has a higher capability ceiling than Sonnet 4.6: larger output window (128K vs 64K tokens), the most recent knowledge cutoff, and stronger performance on complex agentic coding tasks. However, Sonnet 4.6 uniquely offers extended thinking which Opus does not support, and it costs 40% less. For most users, Sonnet 4.6 is the better practical choice.

    What is Claude Haiku 4.5 used for?

    Claude Haiku 4.5 is optimized for speed and cost efficiency at $1 input / $5 output per million tokens. It is best suited for high-volume pipelines, classification, metadata generation, social media content, and any task where fast response time matters more than maximum reasoning depth. It has a 200K token context window.

    Which Claude model supports extended thinking?

    Claude Sonnet 4.6 and Claude Haiku 4.5 both support extended thinking. Claude Opus 4.7 does not. Extended thinking allows the model to reason step-by-step internally before generating output, which improves performance on complex math, science, and multi-step logic problems.