Claude Token Limit: Context Windows, Output Limits, and What They Mean in Practice

Claude’s token limits depend on which model you’re using and whether you’re on the web interface or the API. Here are the exact numbers — context window, output limits, and what they mean in practice.

Key distinction: The context window is the total tokens Claude can process in one conversation (input + output combined). The output limit is the maximum tokens in a single response. These are different limits and both matter depending on your use case.

Claude Token Limits by Model (April 2026)

| Model | Context Window | Max Output (API) | Max Output (Batch) |
|---|---|---|---|
| Claude Opus 4.6 | 1,000,000 tokens | 32,000 tokens | 300,000 tokens* |
| Claude Sonnet 4.6 | 1,000,000 tokens | 32,000 tokens | 300,000 tokens* |
| Claude Haiku 4.5 | 200,000 tokens | 16,000 tokens | 16,000 tokens |

* 300K output requires the output-300k-2026-03-24 beta header on the Message Batches API.

What a Token Is

A token is roughly 3–4 characters of English text — about 0.75 words. One page of text is approximately 500–700 tokens. A 200-page book is roughly 100,000–140,000 tokens.

| Content | Approx. tokens |
|---|---|
| 1 word | ~1.3 tokens |
| 1 page of text (~500 words) | ~650 tokens |
| Short novel (80,000 words) | ~104,000 tokens |
| Full codebase (10,000 lines) | ~100,000–200,000 tokens |
| 1M token context (Sonnet/Opus) | ~750,000 words / ~1,500 pages |
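The rule-of-thumb conversions above can be sketched as a quick estimator. This is a heuristic only — the function names and the 4-characters-per-token ratio are illustrative, not Claude's actual tokenizer. For exact counts, use the API's token-counting endpoint.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters-per-token rule of thumb."""
    return max(1, len(text) // 4)

def estimate_tokens_from_words(word_count: int) -> int:
    """~1.3 tokens per word, matching the conversion table."""
    return round(word_count * 1.3)

# A ~500-word page lands near the ~650-token figure in the table.
print(estimate_tokens_from_words(500))     # 650
# An 80,000-word short novel estimates to ~104,000 tokens.
print(estimate_tokens_from_words(80_000))  # 104000
```

These estimates are good enough for budgeting prompts; they are not good enough for billing math, since real tokenization varies with language and content.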

Context Window vs. Output Limit

The context window is the total working memory for a session — everything Claude can “see” at once, including the system prompt, all previous messages in the conversation, uploaded files, and Claude’s own prior responses. At 1M tokens, Opus 4.6 and Sonnet 4.6 can hold roughly 1,500 pages of text in context simultaneously.

The output limit is how long Claude’s individual response can be. The standard API limit is 32,000 tokens per response — about 24,000 words, enough for a substantial document. The Batch API with the beta header extends this to 300,000 tokens for document-generation workloads.
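To see how the two limits interact, here is a minimal budget check using the 1M/32K figures above. The constant names and the `fits_in_context` helper are illustrative sketches, not part of any SDK:

```python
# Illustrative limits from the table above (hypothetical constants, not SDK values).
CONTEXT_WINDOW = 1_000_000   # Opus 4.6 / Sonnet 4.6 context window
MAX_OUTPUT = 32_000          # standard API per-response output cap

def fits_in_context(input_tokens: int, max_tokens: int = MAX_OUTPUT) -> bool:
    """Input and requested output must fit in the context window together."""
    if max_tokens > MAX_OUTPUT:
        raise ValueError("max_tokens exceeds the per-response output limit")
    return input_tokens + max_tokens <= CONTEXT_WINDOW

print(fits_in_context(900_000))  # True: 932,000 total fits within 1M
print(fits_in_context(980_000))  # False: 1,012,000 exceeds the window
```

The point of the sketch: a request can fail the context-window check even when the input alone fits, because the requested output tokens count against the same window.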

Rate Limits: Separate From Token Limits

Token limits apply within a single conversation or request. Rate limits apply per time period: how many tokens (and requests) you can send across all conversations in a given minute or day. Rate limits scale with your API usage tier. If you're hitting errors in production that look like limits, check whether you're hitting the context window, the output limit, or a rate limit; they produce different error codes. For the full rate limit breakdown, see Claude Rate Limits: What They Are and How to Work Around Them.
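The distinction matters for retry logic: rate-limit (HTTP 429) errors are transient and typically retried with exponential backoff, while context-window and output-limit errors are not retryable, since the same oversized request will fail again. A minimal sketch of a backoff schedule (the `backoff_delay` helper is illustrative, not an SDK function):

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter, for HTTP 429 (rate limit) errors.

    Do NOT apply this to context-window or output-limit errors (400-class);
    retrying won't make an oversized request fit.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Upper bounds of the retry schedule: 1s, 2s, 4s, 8s, 16s, ...
print([min(60.0, 1.0 * 2 ** n) for n in range(5)])  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

The jitter spreads retries out so many clients hitting the same rate limit don't all retry at the same instant.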

What Happens When You Hit the Context Limit

In claude.ai conversations, you’ll see a warning when the conversation is approaching the context window. Claude may summarize earlier parts of the conversation to stay within limits. In the API, sending more tokens than the context window allows returns an error. For very long sessions, breaking work into multiple conversations or using prompt caching (which stores static context at a discount) are the standard approaches.
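One common way to stay within the window in long API sessions is to trim older turns to a token budget before each request. A minimal sketch, where the `trim_history` helper and its signature are assumptions for illustration:

```python
def trim_history(messages, budget_tokens, count_tokens):
    """Keep the most recent messages that fit within a token budget.

    `messages` is oldest-first; `count_tokens` is any estimator
    (e.g. the ~4 chars/token heuristic). A summary of dropped turns could
    be prepended instead of discarding them, similar to how claude.ai
    summarizes earlier parts of a conversation.
    """
    kept, used = [], 0
    for msg in reversed(messages):      # walk newest-first
        cost = count_tokens(msg)
        if used + cost > budget_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))         # restore chronological order

history = ["a" * 400, "b" * 400, "c" * 400]  # ~100 tokens each
print(len(trim_history(history, 250, lambda m: len(m) // 4)))  # 2: keeps the last two turns
```

Prompt caching complements this: static context (system prompt, reference documents) stays cached at a discount, and only the trimmed conversational tail changes between requests.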

Frequently Asked Questions

What is Claude’s token limit?

Claude Opus 4.6 and Sonnet 4.6 have a 1 million token context window. Claude Haiku 4.5 has a 200,000 token context window. The maximum output per response is 32,000 tokens on the standard API. These are different limits — context window is total working memory, output limit is maximum response length.

How long can Claude’s responses be?

The standard API output limit is 32,000 tokens per response, approximately 24,000 words. In practice, claude.ai conversations have shorter limits than the raw API. The Message Batches API with the beta header supports outputs of up to 300,000 tokens for Opus 4.6 and Sonnet 4.6.

How many tokens is a page of text?

Approximately 650 tokens per page (roughly 500 words). A 200-page document is around 130,000 tokens — well within Claude’s 1M context window for Sonnet and Opus, and within Haiku’s 200K window as well.

