Claude Context Window Size 2026: What 1 Million Tokens Actually Means


Looking for quick answers? The FAQ version covers every common question directly.

→ Context Window FAQ

Claude’s context window is one of those specs that sounds simple until you actually need to use it. “1 million tokens” means almost nothing without a frame of reference. This is the guide we wish existed when we started building on Claude — written from our own experience running it in production, with numbers pulled directly from Anthropic’s official documentation.

Quick Definition

The context window is Claude’s working memory for a conversation. It holds everything Claude can see and reason about at once: your messages, Claude’s responses, any documents you’ve shared, and system prompts. When the window fills up, earlier content drops out.

Current Context Window Sizes by Model (May 2026)

These numbers come directly from Anthropic’s official models page, fetched May 9, 2026. Model strings are exact API identifiers:

Model               API String                   Context Window      Max Output
Claude Opus 4.7     claude-opus-4-7              1,000,000 tokens    128,000 tokens
Claude Sonnet 4.6   claude-sonnet-4-6            1,000,000 tokens    64,000 tokens
Claude Haiku 4.5    claude-haiku-4-5-20251001    200,000 tokens      64,000 tokens

Opus 4.7 and Sonnet 4.6 both have the full 1M token context window. Haiku 4.5 is 200K. The key difference between Opus 4.7 and Sonnet 4.6 in this table is the max output — Opus 4.7 can write up to 128K tokens in a single response, Sonnet 4.6 caps at 64K.
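To make the table concrete, here's a small sketch that picks a model string from those limits. The context and output figures are from the table above; the helper function itself is our illustration, not part of Anthropic's API:

```python
# Context and max-output limits from the table above (May 2026 figures).
MODELS = {
    "claude-opus-4-7": {"context": 1_000_000, "max_output": 128_000},
    "claude-sonnet-4-6": {"context": 1_000_000, "max_output": 64_000},
    "claude-haiku-4-5-20251001": {"context": 200_000, "max_output": 64_000},
}

def pick_model(input_tokens: int, output_tokens: int) -> str:
    """Return the lightest-tier model string whose limits fit the request."""
    # Check models roughly from lightest to heaviest tier.
    for name in ("claude-haiku-4-5-20251001", "claude-sonnet-4-6", "claude-opus-4-7"):
        spec = MODELS[name]
        if input_tokens <= spec["context"] and output_tokens <= spec["max_output"]:
            return name
    raise ValueError("request exceeds every model's limits")

print(pick_model(150_000, 8_000))    # fits within Haiku's 200K window
print(pick_model(500_000, 100_000))  # needs 1M context and >64K output: Opus 4.7
```

Note that only the output ceiling separates Opus 4.7 from Sonnet 4.6 here: a request needing 100K output tokens forces Opus even though both share the 1M input window.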

What Does 1 Million Tokens Actually Hold?

Token counts are an abstraction. Here’s what 1 million tokens translates to in practical terms:

  • About 750,000 words of English text — roughly 10 full-length novels, or 1,500 average blog posts
  • A full mid-size codebase — a 50,000-line Python project with comments fits comfortably
  • Hours of meeting transcripts — a full workday of recorded calls, transcribed, fits in one context window
  • Multiple large documents simultaneously — 10 research PDFs at 30 pages each, all in the same conversation
  • Long conversation histories — hundreds of back-and-forth exchanges before anything starts dropping off
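The "750,000 words" figure above rests on the common heuristic of roughly 4 characters (about 0.75 words) per English token. Actual counts vary by tokenizer and content type, but a back-of-envelope estimator is useful for planning:

```python
def estimate_tokens(text: str) -> int:
    """Rough English-text token estimate: ~4 characters per token.
    Real counts come from the tokenizer; this is a planning heuristic only."""
    return max(1, len(text) // 4)

def fits_in_window(text: str, window: int = 1_000_000) -> bool:
    """Check a text against a context window size, using the estimate above."""
    return estimate_tokens(text) <= window

# 750,000 words at ~5 characters each (including the trailing space)
# lands just under the 1M-token ceiling.
sample = "word " * 750_000
print(estimate_tokens(sample))  # 937500
```

Code and transcripts tokenize less efficiently than prose, so treat the heuristic as a floor check, not a guarantee.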

We’ve loaded entire Notion exports, full project histories, and multi-document research packs into a single Claude session. At 1M tokens, you’re unlikely to hit the ceiling in a normal working session. You hit it when you load your entire codebase plus its documentation plus the conversation history, then ask Claude for a full architectural review.

Context Window vs. Memory: What’s the Difference?

This is where a lot of people get confused. The context window and memory are not the same thing:

  • Context window: What Claude can see right now, in this session. Once a session ends, it’s gone.
  • Memory (in claude.ai): A separate system that extracts and stores key information from past sessions. It surfaces relevant facts into future conversations as a snippet in the context.
  • Managed Agents memory stores: A developer-layer construct where agents maintain and update knowledge bases across sessions — distinct from both the context window and the consumer memory feature.

The 1M token context window is your working memory for one session. It doesn’t persist. Memory systems are what carry information across sessions — but they work by injecting a summary into the context window of the new session, not by giving Claude access to the full history.
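As a sketch of that pattern — prior knowledge arriving only as an injected snippet, never as the full transcript — here's how a new session's context might be assembled. The function and prompt layout are ours, purely illustrative, not Anthropic's memory implementation:

```python
def build_session_messages(memory_snippets: list[str], user_message: str):
    """Start a fresh session: knowledge from earlier sessions arrives only
    as a compact injected summary, not as the previous full transcript."""
    system_prompt = ""
    if memory_snippets:
        memory_block = "\n".join(f"- {s}" for s in memory_snippets)
        system_prompt = "Relevant facts from earlier sessions:\n" + memory_block
    messages = [{"role": "user", "content": user_message}]
    return system_prompt, messages

system, messages = build_session_messages(
    ["User's site runs WordPress", "Preferred tone: direct, practical"],
    "Draft an audit plan for the blog.",
)
print(system)
```

The key property: the new session's context contains a few hundred tokens of distilled facts, not the hundreds of thousands of tokens the original sessions consumed.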

Does a Bigger Context Window Mean Better Performance?

Mostly yes, with one important nuance. More context means Claude has more information to reason about, which generally produces better outputs for tasks that benefit from full context — code reviews, document synthesis, long-form writing, multi-document comparison.

The nuance: performance can degrade on tasks involving specific information buried deep in a very long context. This is sometimes called the “lost in the middle” problem — models tend to pay more attention to the beginning and end of a long context than the middle. Anthropic has worked on this with Claude’s architecture, and it performs well on long-context tasks, but it’s worth structuring important information at natural reference points rather than burying it in the middle of a 500-page document.
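One way to act on that advice when assembling prompts programmatically: put the instructions at the start and repeat the task plus key facts at the end, so nothing critical sits in the middle of a 500-page span. The layout below is our convention, not an Anthropic requirement:

```python
def assemble_prompt(instructions: str, documents: list[str], key_facts: str) -> str:
    """Place instructions first and key facts last, keeping the most
    important material at the edges of a long context rather than the middle."""
    body = "\n\n---\n\n".join(documents)
    return (
        f"{instructions}\n\n"
        f"<documents>\n{body}\n</documents>\n\n"
        f"Key facts to keep in mind:\n{key_facts}\n\n"
        f"Reminder of the task: {instructions}"
    )

prompt = assemble_prompt(
    "Summarize pricing changes across these reports.",
    ["Report A full text ...", "Report B full text ..."],
    "Focus on enterprise tiers only.",
)
```

The repetition at the end costs a handful of tokens and puts the task statement at the position long-context models attend to most reliably.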

How We Actually Use the 1M Token Window

We run Claude in production for content operations, site management, and agentic coding workflows. Here’s where the 1M context window makes a concrete difference in our work:

  • Full site audits: Loading every post from a WordPress site (200+ posts worth of content) into one session for comprehensive SEO analysis — without having to chunk and re-prompt
  • Cross-session context: Pasting in long Notion briefings, prior session transcripts, and the current task in one go. The window is large enough that we don’t have to decide what to leave out.
  • Codebase-wide reasoning: In Claude Code, having the full project context means Claude can make changes that account for how files interact rather than reasoning only about the current file
  • Multi-document synthesis: Research projects where we load 10-15 source documents and ask Claude to synthesize across them — something that was impossible at 100K context windows

The practical shift from 200K to 1M tokens wasn’t just “more room.” It changed what we could ask Claude to do in a single session.
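For the multi-document synthesis case, the whole point is skipping the chunk-and-re-prompt step: every source goes into one message. A minimal sketch — the source-tagging format is our own convention, chosen so the model can cite which source a claim came from:

```python
def load_sources_as_one_message(sources: list[tuple[str, str]]) -> dict:
    """sources: (name, text) pairs. Concatenate all of them into a single
    user message, with each source labeled for later attribution."""
    parts = [
        f'<source id="{i}" name="{name}">\n{text}\n</source>'
        for i, (name, text) in enumerate(sources, start=1)
    ]
    content = "\n\n".join(parts) + "\n\nSynthesize the findings across all sources."
    return {"role": "user", "content": content}

msg = load_sources_as_one_message([
    ("report-a.pdf", "Extracted text of report A."),
    ("report-b.pdf", "Extracted text of report B."),
])
```

At 100K-token windows this pattern forced per-document summarization passes; at 1M tokens, 10-15 sources fit directly.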

Context Window on the API: Batch Output Extension

For API users: on the Message Batches API, Opus 4.7, Opus 4.6, and Sonnet 4.6 support up to 300K output tokens using the output-300k-2026-03-24 beta header. This is relevant for batch generation tasks where you need very long outputs — documentation generation, large codebases, book-length content.
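As an illustration only — verify the exact request shape against Anthropic's current Message Batches documentation before relying on it — a single batch entry opting into that beta might look like this:

```python
# Sketch of one Message Batches entry requesting a long output.
# The beta header value is from the section above; the custom_id + params
# request shape follows the batches API pattern, but treat this as an
# assumption to check against the current API reference.
batch_request = {
    "custom_id": "docs-gen-001",
    "params": {
        "model": "claude-opus-4-7",
        "max_tokens": 300_000,  # only valid with the beta header below
        "messages": [
            {
                "role": "user",
                "content": "Generate full reference documentation for the project.",
            }
        ],
    },
}
beta_headers = {"anthropic-beta": "output-300k-2026-03-24"}
```

Without the header, requests above the model's standard output ceiling are rejected, so the header and the raised max_tokens value have to travel together.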

Frequently Asked Questions

What is Claude’s context window in 2026?

Claude Opus 4.7 and Claude Sonnet 4.6 both have 1,000,000 token (1M token) context windows as of May 2026. Claude Haiku 4.5 has a 200,000 token context window. These are the current generally available models.

How many pages can Claude read at once?

At 1M tokens, Claude can hold roughly 750,000 words of English text — equivalent to approximately 3,000 average pages. In practice, a typical 20-page PDF is roughly 10,000-15,000 tokens, so you could load 60-100 such documents in a single session before approaching the limit.
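The arithmetic behind those figures, for anyone who wants to adjust the assumptions — 250 words per page and a ~625-token-per-page PDF midpoint are our round numbers:

```python
WORDS_PER_TOKEN = 0.75   # ~750K English words per 1M tokens, per the figures above
WORDS_PER_PAGE = 250     # a common manuscript-page convention

window_tokens = 1_000_000
words = int(window_tokens * WORDS_PER_TOKEN)
pages = words // WORDS_PER_PAGE
print(words, pages)  # 750000 3000

# A 20-page PDF at 500-750 tokens/page, taking the midpoint:
pdf_tokens = 20 * 625
print(window_tokens // pdf_tokens)  # 80 such documents fit
```

Denser PDFs (tables, code listings) run higher per page, which is why the article quotes a 60-100 document range rather than a single number.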

Does the context window reset between messages?

No — the context window accumulates across an entire conversation session. Every message you send and every response Claude gives adds to the total. The window doesn’t reset between individual messages; it resets when you start a new conversation.

What happens when Claude hits the context window limit?

When a conversation reaches the context window limit, earlier messages begin to drop out of the active context. Claude can no longer reference information from those earlier messages — it effectively forgets that part of the conversation. In the claude.ai interface, you’ll see a notification when you’re approaching the limit.
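API users managing conversation history themselves can reproduce that behavior deliberately: drop the oldest messages until the conversation fits. A toy sketch, where the one-token-per-character counter is a stand-in for a real tokenizer:

```python
def truncate_to_window(messages: list[dict], limit_tokens: int, count_tokens) -> list[dict]:
    """Drop the oldest messages until the conversation fits the window.
    Mirrors the behavior described above: earliest content falls out first."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > limit_tokens:
        kept.pop(0)  # forget the oldest message first
    return kept

# Toy token counter: one token per character of content.
toy_count = lambda m: len(m["content"])
history = [
    {"role": "user", "content": "a" * 600},
    {"role": "assistant", "content": "b" * 300},
    {"role": "user", "content": "c" * 200},
]
survivors = truncate_to_window(history, limit_tokens=450, count_tokens=toy_count)
print(len(survivors))  # 1
```

Production versions usually summarize the dropped messages rather than discarding them outright, but the fit-by-evicting-oldest loop is the core mechanic.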

Is the 1M context window available on the free plan?

Yes — the model available to free plan users has the 1M context window. However, free plan usage limits mean long-context sessions hit rate limits faster than on paid plans. The window is technically available, but sustained heavy use of it is more practical on paid tiers.

What’s the difference between Claude Opus 4.7 and Sonnet 4.6 context windows?

Both have the same 1M token input context window. The difference is max output: Opus 4.7 can generate up to 128,000 tokens in a single response; Sonnet 4.6 caps at 64,000 tokens. For most tasks this distinction doesn’t matter, but for very long document generation or large code outputs, Opus 4.7 has the higher output ceiling.
