Claude Context Window — Every Question Answered
Context window questions answered from someone who actually uses the 1M token window in production — not from a spec sheet alone.
Covers window sizes by model, what 1M tokens holds, the memory vs context distinction, performance at long context, and API-specific details. Full explainer: Claude Context Window Size 2026
Size Questions
What is Claude’s context window size in 2026?
| Model | API String | Context Window | Max Output |
|---|---|---|---|
| Claude Opus 4.7 | claude-opus-4-7 | 1,000,000 tokens | 128,000 tokens |
| Claude Sonnet 4.6 | claude-sonnet-4-6 | 1,000,000 tokens | 64,000 tokens |
| Claude Haiku 4.5 | claude-haiku-4-5-20251001 | 200,000 tokens | 64,000 tokens |
Source: Anthropic’s official models page, verified May 9, 2026.
What does 1 million tokens actually hold?
- ~750,000 words of English text — roughly 10 full-length novels, or 1,500 average blog posts
- A full mid-size codebase — a 50,000-line Python project with comments
- ~60–100 research PDFs at 20–30 pages each, all simultaneously
- Hours of meeting transcripts — a full workday of recorded calls, transcribed
- Our full WordPress site audit — 200+ posts worth of content loaded in one session for comprehensive SEO analysis
The shift from 200K to 1M wasn’t just “more room.” It changed what we could ask Claude to do in a single session — whole-codebase reasoning, multi-document synthesis, full-history context.
How many pages can Claude read at once?
A typical 20-page PDF is roughly 10,000–15,000 tokens, so at 1M tokens you could load 60–100 such documents simultaneously. A 300-page book runs roughly 150,000–200,000 tokens — Claude can hold 5–6 full books in context at once. In practice, the constraint is usually time to upload and your session structure, not the window ceiling.
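The arithmetic behind those estimates, using the same ballpark figures as above (rough estimates, not API guarantees):

```python
# Back-of-the-envelope capacity math using the estimates above.
CONTEXT_WINDOW = 1_000_000   # tokens, per the table at the top
TOKENS_PER_PAGE = 600        # rough midpoint for a typical PDF page
PAGES_PER_PDF = 20

tokens_per_pdf = TOKENS_PER_PAGE * PAGES_PER_PDF   # ~12,000 tokens
max_pdfs = CONTEXT_WINDOW // tokens_per_pdf        # ~83 documents

print(f"~{tokens_per_pdf:,} tokens per 20-page PDF; ~{max_pdfs} fit in one window")
```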
What’s the difference between context window and memory?
Three distinct things that get conflated:
- Context window: Everything Claude can see right now in this session. Temporary — disappears when the session ends.
- claude.ai memory: Facts extracted from past conversations and injected as a summary into new sessions. Persistent but compressed — a small snippet in the context, not the full history.
- Managed Agents memory stores / Dreaming: Developer-layer knowledge graphs that agents build and refine between sessions. More structured than consumer memory, requires API implementation.
The 1M context window is your working memory for one session. Memory systems are what carry information across sessions — they work by injecting a summary into the new session’s context, not by giving Claude access to the full prior history.
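A minimal sketch of that injection pattern with the Anthropic Python SDK. Here load_memory_summary is a hypothetical helper standing in for whatever store holds your cross-session facts:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def load_memory_summary(user_id: str) -> str:
    # Hypothetical helper: fetch a compressed summary of prior sessions
    # from your own store (database, file, knowledge graph).
    return "User prefers concise answers; mid-refactor on the billing service."

# The summary rides into the new session as part of the system prompt.
# Claude sees this snippet, not the full prior history.
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=f"Context carried over from previous sessions:\n{load_memory_summary('user-123')}",
    messages=[{"role": "user", "content": "Pick up where we left off."}],
)
print(response.content[0].text)
```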
Performance Questions
Does performance degrade at very long context lengths?
The honest answer: yes, somewhat, and it depends on the task. The “lost in the middle” pattern is real — models tend to weight the beginning and end of very long contexts more heavily than the middle. For tasks that require pinpointing specific information buried deep in a 500-page document, performance is lower than for shorter contexts. For tasks that benefit from broad synthesis across a large body of material — architectural review, theme identification, cross-document comparison — long context is a net positive. Structure important information at natural reference points rather than burying it in the middle of a large document.
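One way to act on that when assembling long prompts yourself. This is a sketch of the edge-placement idea, not an official recommendation:

```python
def build_long_prompt(question: str, key_doc: str, supporting_docs: list[str]) -> str:
    # Put the highest-priority material at the edges of the context and
    # restate the question at the end, so nothing critical sits in the
    # weakly-attended middle of a very long prompt.
    parts = [
        f"Question: {question}",
        f"Key document:\n{key_doc}",
        *(f"Supporting document {i + 1}:\n{doc}" for i, doc in enumerate(supporting_docs)),
        f"Reminder, answer this question: {question}",
    ]
    return "\n\n".join(parts)
```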
How does Opus 4.7’s context window differ from Sonnet 4.6?
Same 1M input context window. The difference is max output: Opus 4.7 can generate up to 128,000 tokens in a single response; Sonnet 4.6 caps at 64,000. For most tasks this doesn’t matter. It matters for generating very long documents, large codebases in a single pass, or batch outputs that need to be very long. If you’re not generating 64K+ token outputs, choose between models on capability and cost, not on output ceiling.
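The ceiling surfaces as the max_tokens parameter on a request. A sketch using the API strings from the table above:

```python
import anthropic

client = anthropic.Anthropic()

# max_tokens must stay at or below the model's output ceiling:
# 128,000 for claude-opus-4-7, 64,000 for claude-sonnet-4-6.
response = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=128_000,  # this value would be rejected on claude-sonnet-4-6
    messages=[{"role": "user", "content": "Generate the complete migration guide."}],
)
```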
What happens when I hit the context window limit?
Earlier messages begin dropping out of the active context. Claude can no longer reference information from those dropped messages — it effectively forgets that part of the conversation. In the claude.ai interface, you’ll see a notification as you approach the limit. In API usage, the context window limit is enforced hard — requests exceeding it return an error.
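In the Python SDK that hard limit surfaces as a 400-level error. A sketch of handling it, assuming anthropic's standard BadRequestError is what comes back for oversized requests:

```python
import anthropic

client = anthropic.Anthropic()
huge_document = "lorem ipsum " * 1_000_000  # far beyond 1M tokens

try:
    client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        messages=[{"role": "user", "content": huge_document}],
    )
except anthropic.BadRequestError as err:
    # Oversized requests are rejected outright, never silently truncated.
    print(f"Request exceeded the context window: {err}")
```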
API and Technical Questions
Is the 1M context window available on the free plan?
The model available to free plan users technically supports the 1M window, but free plan rate limits mean sustained, heavy long-context use runs into throttling quickly. The window is available; using it intensively for extended periods is more practical on paid tiers.
What’s the extended output option on the Batch API?
On the Message Batches API, Opus 4.7, Opus 4.6, and Sonnet 4.6 support up to 300,000 output tokens using the output-300k-2026-03-24 beta header. This applies only to batch processing — not to synchronous API calls. Useful for large documentation generation, book-length content, or large codebase outputs in batch.
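A sketch of opting in with the Python SDK. The beta header string is the one named above, so verify it against current documentation before relying on it:

```python
import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": "manual-draft-1",
            "params": {
                "model": "claude-opus-4-7",
                "max_tokens": 300_000,  # only accepted with the beta header
                "messages": [{"role": "user", "content": "Draft the full manual."}],
            },
        }
    ],
    extra_headers={"anthropic-beta": "output-300k-2026-03-24"},
)
print(batch.id, batch.processing_status)
```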
Can I query context window limits programmatically?
Yes. The Models API returns max_input_tokens, max_tokens, and a capabilities object for every available model. If you’re building systems that need to programmatically enforce context limits or route by capability, this is the right way to get current values rather than hardcoding from documentation.
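A sketch with the Python SDK. The field names follow this answer's description, so treat the exact attribute names as assumptions to verify against the live response:

```python
import anthropic

client = anthropic.Anthropic()

# Read limits at runtime instead of hardcoding them from documentation.
for model in client.models.list():
    info = model.model_dump()  # tolerant of fields the SDK types don't yet know
    print(
        info["id"],
        info.get("max_input_tokens"),
        info.get("max_tokens"),
        info.get("capabilities"),
    )
```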
Does context window size affect API cost?
Only in the sense that a bigger window lets you send more tokens. You pay for the tokens you actually send and receive, not for window capacity: a 1M token window doesn't cost more than a 200K window, and loading a 500K-token document into context costs the same per token either way. The window size determines whether the request is possible at all, not what it costs per token.
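The billing math in one place, with hypothetical per-million-token prices (substitute the current list prices for your model):

```python
# Hypothetical prices in USD per million tokens; replace with current list prices.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    # Cost depends only on tokens moved, never on the model's window capacity.
    return (
        input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
        + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    )

# A 500K-token document is priced per token; window size only decides
# whether the request is accepted at all.
print(f"${request_cost(500_000, 4_000):.2f}")  # $1.56 at these example prices
```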