Claude Token Limit: Context Windows, Output Limits Explained (2026)

Q: How long can Claude's responses be?

Max output is model-dependent: up to 128,000 tokens on Opus 4.8 and Fable 5, 64,000 on Sonnet 4.6 and Haiku 4.5. Outputs above ~16,000 tokens should be streamed.

Last verified: June 13, 2026

Model Accuracy Note — Updated June 9, 2026

Current models (June 2026): Fable 5 · Opus 4.8 · Sonnet 5 · Haiku 4.5.


Claude’s token limits depend on which model you’re using and whether you’re on the web interface or the API. Here are the exact numbers — context window, output limits, and what they mean in practice.

Key distinction: The context window is the total tokens Claude can process in one conversation (input + output combined). The output limit is the maximum tokens in a single response. These are different limits and both matter depending on your use case.

Claude Token Limits by Model (June 2026)

Model	Context Window	Max Output
Claude Fable 5	1,000,000 tokens	128,000 tokens
Claude Opus 4.8	1,000,000 tokens	128,000 tokens
Claude Sonnet 4.6	1,000,000 tokens	64,000 tokens
Claude Haiku 4.5	200,000 tokens	64,000 tokens

Context windows are billed at standard API pricing (no long-context premium). Claude Opus 4.7 and 4.6 are legacy models that match the 4.8 context limits (1M / 128K) but are no longer the recommended Opus tier; use Opus 4.8 for new projects. Outputs above ~16,000 tokens should be streamed to avoid request timeouts. The original Opus 4 and Sonnet 4 were retired on June 15, 2026 (see the deprecation guide). The asynchronous Message Batches API runs at 50% cost and supports larger per-request outputs for document-generation workloads.

What a Token Is

A token is roughly 3–4 characters of English text — about 0.75 words. One page of text is approximately 500–700 tokens. A 200-page book is roughly 100,000–140,000 tokens.

Content	Approx. tokens
1 word	~1.3 tokens
1 page of text (~500 words)	~650 tokens
Short novel (80,000 words)	~104,000 tokens
Full codebase (10,000 lines)	~100,000–200,000 tokens
1M token context (Sonnet/Opus)	~750,000 words / ~1,500 pages
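
The ratios in the table can be turned into a rough estimator. This is a minimal sketch that assumes the ~1.3 tokens-per-word average quoted above for English prose; the function names are illustrative, and real counts depend on the tokenizer, so treat the result as a planning figure rather than an exact count.

```python
TOKENS_PER_WORD = 1.3  # rough English-prose average from the table (an assumption)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate tokens for a known word count."""
    return round(word_count * TOKENS_PER_WORD)


def estimate_tokens(text: str) -> int:
    """Estimate tokens for a string by counting whitespace-separated words."""
    return estimate_tokens_from_words(len(text.split()))


if __name__ == "__main__":
    print(estimate_tokens_from_words(500))     # one page of ~500 words
    print(estimate_tokens_from_words(80_000))  # a short novel
```

Code, non-English text, and dense punctuation usually tokenize less efficiently than prose, so leave headroom when the estimate is close to a limit.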

Context Window vs. Output Limit

The context window is the total working memory for a session — everything Claude can “see” at once, including the system prompt, all previous messages in the conversation, uploaded files, and Claude’s own prior responses. At 1M tokens, the current models (Fable 5, Opus 4.8, Sonnet 4.6) can hold roughly 1,500 pages of text in context simultaneously.

The output limit is how long Claude’s individual response can be, and it is model-dependent: up to 128,000 tokens on Claude Opus 4.8 and Fable 5, and 64,000 tokens on Sonnet 4.6 and Haiku 4.5. Outputs above roughly 16,000 tokens should be streamed to avoid request timeouts.
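
To make the two limits concrete, here is a small sketch that checks a planned request against both. The numbers are copied from the table in this article, the dictionary keys are display names rather than API model identifiers, and `plan_request` is a hypothetical helper, so verify the limits against current documentation before relying on it.

```python
# Limits as listed in this article's table (June 2026); not fetched from the API.
MODEL_LIMITS = {
    "Claude Fable 5": {"context": 1_000_000, "max_output": 128_000},
    "Claude Opus 4.8": {"context": 1_000_000, "max_output": 128_000},
    "Claude Sonnet 4.6": {"context": 1_000_000, "max_output": 64_000},
    "Claude Haiku 4.5": {"context": 200_000, "max_output": 64_000},
}
STREAMING_THRESHOLD = 16_000  # the article's rule of thumb for when to stream


def plan_request(model: str, input_tokens: int, max_output_tokens: int) -> dict:
    """Check a planned request against the output limit and the context window."""
    limits = MODEL_LIMITS[model]
    return {
        # The requested response length must not exceed the per-response cap.
        "output_within_limit": max_output_tokens <= limits["max_output"],
        # Input and output share the same context window.
        "fits_context": input_tokens + max_output_tokens <= limits["context"],
        # Long responses should be streamed to avoid request timeouts.
        "should_stream": max_output_tokens > STREAMING_THRESHOLD,
    }
```

For example, 150,000 input tokens plus a 64,000-token response fits comfortably on a 1M-token model but overflows Haiku 4.5's 200,000-token window, even though 64,000 is a valid output size for that model.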

Rate Limits: Separate From Token Limits

Token limits are per-conversation. Rate limits are per-time-period: how many tokens (and requests) you can send across multiple conversations in a given minute or day. Rate limits scale with your API usage tier. If you’re hitting problems in production that look like limits, check whether you’re hitting the context window, the output limit, or a rate limit, because they surface differently: rate limits and oversized requests come back as distinct API errors, while a response that reaches the output limit is simply cut off. For the full rate limit breakdown, see Claude Rate Limits: What They Are and How to Work Around Them.
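
One way to tell the three cases apart is to look at the HTTP status, the error type, and the response's stop reason. The sketch below assumes the usual Messages API conventions: HTTP 429 with `rate_limit_error` for rate limits, HTTP 400 with `invalid_request_error` when a prompt is too long, and a successful response whose `stop_reason` is `max_tokens` when output is truncated. The function name and its return labels are hypothetical, and a 400 can have other causes, so always read the error message as well.

```python
from typing import Optional


def diagnose_limit(
    status_code: int,
    error_type: Optional[str] = None,
    stop_reason: Optional[str] = None,
) -> str:
    """Guess which kind of limit a response ran into (labels are illustrative)."""
    if status_code == 429 or error_type == "rate_limit_error":
        # Too many requests or tokens in the current time window.
        return "rate_limit"
    if status_code == 400 and error_type == "invalid_request_error":
        # Often an over-long prompt, but could be any malformed request.
        return "context_window_or_bad_request"
    if status_code == 200 and stop_reason == "max_tokens":
        # Not an error: the response was cut off at the requested output cap.
        return "output_limit"
    return "other"
```

Rate limits are typically handled by backing off and retrying, while the other two need a change to the request itself: less input, or a higher output cap within the model's maximum.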

What Happens When You Hit the Context Limit

In claude.ai conversations, you’ll see a warning when the conversation is approaching the context window. Claude may summarize earlier parts of the conversation to stay within limits. In the API, sending more tokens than the context window allows returns an error. For very long sessions, breaking work into multiple conversations or using prompt caching (which stores static context at a discount) are the standard approaches.
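
For API sessions that grow past the window, one common approach is to drop the oldest turns before sending. This is a minimal sketch, assuming messages are dictionaries with `role` and string `content` keys and using a crude word-based token estimate; `trim_history` and `rough_count` are hypothetical helpers, and a production version would use a real token count and might summarize old turns instead of discarding them.

```python
def rough_count(text: str) -> int:
    """Crude token estimate: ~1.3 tokens per whitespace-separated word."""
    return round(len(text.split()) * 1.3)


def trim_history(messages, budget_tokens, count_tokens=rough_count):
    """Drop the oldest messages until the estimated total fits the budget."""
    kept = list(messages)  # copy so the caller's list is not modified
    total = sum(count_tokens(m["content"]) for m in kept)
    while kept and total > budget_tokens:
        removed = kept.pop(0)  # oldest message first
        total -= count_tokens(removed["content"])
    # Keep the conversation starting on a user turn.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

Set the budget below the model's context window so there is room left for the system prompt and the response.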

Frequently Asked Questions

What is Claude’s token limit?

Claude Fable 5, Opus 4.8, and Sonnet 4.6 have a 1 million token context window. Claude Haiku 4.5 has a 200,000 token context window. Maximum output per response is model-dependent — 128,000 tokens on Opus 4.8 and Fable 5, 64,000 on Sonnet 4.6 and Haiku 4.5. These are different limits — context window is total working memory, output limit is maximum response length.

How long can Claude’s responses be?

The maximum output is model-dependent: up to 128,000 tokens on Claude Opus 4.8 and Fable 5 (roughly 96,000 words), and 64,000 tokens on Sonnet 4.6 and Haiku 4.5 (roughly 48,000 words). Outputs above ~16,000 tokens should be streamed. In practice, claude.ai conversations have shorter limits than the raw API.

How many tokens is a page of text?

Approximately 650 tokens per page (roughly 500 words). A 200-page document is around 130,000 tokens — well within Claude’s 1M context window for Sonnet and Opus, and within Haiku’s 200K window as well.

How many tokens can Claude Opus 4.8 handle?

Claude Opus 4.8 supports a 1 million token context window (input and output combined) and up to 128,000 tokens of output per response. This means you can feed in very large documents, entire codebases, or long conversation histories in a single request.

What happens when you hit Claude’s context window limit?

When your input exceeds Claude’s context window, the API returns an error and the request fails. In claude.ai, the interface warns you before sending. The fix is to summarize earlier conversation turns, split the input into chunks, or use a model with a larger context window.
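
Splitting input into chunks can be sketched as follows. The example assumes the ~1.3 tokens-per-word estimate used elsewhere in this article and splits on whitespace only; `chunk_text` is a hypothetical helper, and real documents usually deserve splits at paragraph or section boundaries.

```python
def chunk_text(text: str, max_tokens: int, tokens_per_word: float = 1.3) -> list:
    """Split text into pieces whose estimated token count stays under max_tokens."""
    words = text.split()
    # Convert the token budget into a word budget, with at least one word per chunk.
    words_per_chunk = max(1, int(max_tokens / tokens_per_word))
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]
```

Each chunk is then sent as its own request, so any context that every chunk needs (instructions, a glossary) has to be repeated and counted against the budget.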

Is Claude Haiku 4.5’s context window smaller than Opus’s?

Yes. Claude Haiku 4.5 has a 200,000 token context window, compared to 1 million tokens for Opus 4.8 and Sonnet 4.6. For most tasks Haiku’s 200K window is sufficient, but for very long documents or large codebases, Sonnet 4.6 or Opus 4.8 is required.
