What are the rate limits for Claude Managed Agents?

60 requests per minute for create endpoints (session creation) and 600 requests per minute for read endpoints. Organization-level API limits also apply.

Will 60 create requests per minute limit most agent workloads?

No, not for most production workloads. 60 sessions per minute is 3,600 sessions per hour, which is very high volume. Most architectures run fewer, longer sessions rather than many short ones, and token throughput limits usually bind before the create limit does.

What should I do if I need more than 60 Managed Agents sessions per minute?

Two options: redesign to batch more work within each session rather than creating new sessions per task, or contact Anthropic enterprise sales for rate limit increases for high-volume applications.

Do Agent Teams count against the create rate limit?

Check the current documentation for how Agent Team member session creation is counted relative to standard session creation. Subagents operate within the parent session context and behave differently from independent sessions.

Claude AI Limits - Tygart Media

Last refreshed: June 20, 2026

Claude’s context window is one of those specs that sounds simple until you actually need to use it. “1 million tokens” means almost nothing without a frame of reference. This is the guide we wish existed when we started building on Claude — written from our own experience running it in production, with numbers pulled directly from Anthropic’s official documentation.

Quick Definition

The context window is Claude’s working memory for a conversation. It holds everything Claude can see and reason about at once: your messages, Claude’s responses, any documents you’ve shared, and system prompts. When the window fills up, earlier content drops out.

Current Context Window Sizes by Model (June 2026)

These numbers come directly from Anthropic’s official models page, fetched May 9, 2026. Model strings are exact API identifiers:

Model	API String	Context Window	Max Output
Claude Fable 5	claude-fable-5	1,000,000 tokens	128,000 tokens
Claude Opus 4.8	claude-opus-4-8	1,000,000 tokens	128,000 tokens
Claude Sonnet 5	claude-sonnet-4-6	1,000,000 tokens	64,000 tokens
Claude Haiku 4.5	claude-haiku-4-5-20251001	200,000 tokens	64,000 tokens

Fable 5, Opus 4.8, and Sonnet 5 all have the full 1M token context window. Haiku 4.5 is 200K. The key difference between Opus 4.8 and Sonnet 5 in this table is the max output — Opus 4.8 can write up to 128K tokens in a single response, Sonnet 5 caps at 64K.
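As a rough illustration of how these limits can drive model choice, here is a minimal sketch that filters models by the size of a request. The dictionary only restates the table above. The helper name `models_that_fit` is hypothetical (not part of any SDK), and the rule that input plus output must fit inside the context window is an assumption of this sketch.

```python
# Limits restated from the table above: (context window, max output), in tokens.
MODEL_LIMITS = {
    "Claude Fable 5": (1_000_000, 128_000),
    "Claude Opus 4.8": (1_000_000, 128_000),
    "Claude Sonnet 5": (1_000_000, 64_000),
    "Claude Haiku 4.5": (200_000, 64_000),
}


def models_that_fit(input_tokens: int, output_tokens: int) -> list[str]:
    """Return the models whose limits cover the request.

    Hypothetical helper. Assumes the context window must hold the input
    plus the requested output.
    """
    return [
        name
        for name, (window, max_out) in MODEL_LIMITS.items()
        if output_tokens <= max_out and input_tokens + output_tokens <= window
    ]


# A 300K-token input with a 100K-token response rules out Sonnet 5 (output cap)
# and Haiku 4.5 (window size).
print(models_that_fit(300_000, 100_000))  # ['Claude Fable 5', 'Claude Opus 4.8']
```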

What Does 1 Million Tokens Actually Hold?

Token counts are an abstraction. Here’s what 1 million tokens translates to in practical terms:

About 750,000 words of English text — roughly 10 full-length novels, or 1,500 average blog posts
A full mid-size codebase — a 50,000-line Python project with comments fits comfortably
Hours of meeting transcripts — a full workday of recorded calls, transcribed, fits in one context window
Multiple large documents simultaneously — 10 research PDFs at 30 pages each, all in the same conversation
Long conversation histories — hundreds of back-and-forth exchanges before anything starts dropping off
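To sanity-check whether your own material is likely to fit, a back-of-the-envelope estimate is often enough. The sketch below assumes the 0.75 words-per-token ratio implied by "1 million tokens is about 750,000 words" and counts words by splitting on whitespace. Both are rough assumptions: real token counts depend on the tokenizer and the kind of text, so treat the result as an estimate only. The function names are illustrative.

```python
WORDS_PER_TOKEN = 0.75  # assumption: ratio implied by 1M tokens ~ 750,000 English words


def estimate_tokens(text: str) -> int:
    """Crude token estimate from a whitespace word count."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def fits_in_window(texts: list[str], window: int = 1_000_000,
                   reserve_for_output: int = 0) -> bool:
    """True if the estimated total leaves room for the reserved output."""
    total = sum(estimate_tokens(t) for t in texts)
    return total + reserve_for_output <= window


docs = ["lorem ipsum " * 30_000, "dolor sit amet " * 10_000]  # 60K + 30K words
print(sum(estimate_tokens(d) for d in docs))             # 120000
print(fits_in_window(docs, reserve_for_output=64_000))   # True
```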

We’ve loaded entire Notion exports, full project histories, and multi-document research packs into a single Claude session. At 1M tokens, you’re unlikely to hit the ceiling in a normal working session. You hit it when you’re doing things like: loading your entire codebase plus documentation plus conversation history and then asking Claude to do a full architectural review.

Context Window vs. Memory: What’s the Difference?

This is where a lot of people get confused. The context window and memory are not the same thing:

Context window: What Claude can see right now, in this session. Once a session ends, it’s gone.
Memory (in claude.ai): A separate system that extracts and stores key information from past sessions. It surfaces relevant facts into future conversations as a snippet in the context.
Managed Agents memory stores: A developer-layer construct where agents maintain and update knowledge bases across sessions — distinct from both the context window and the consumer memory feature.

The 1M token context window is your working memory for one session. It doesn’t persist. Memory systems are what carry information across sessions — but they work by injecting a summary into the context window of the new session, not by giving Claude access to the full history.

Does a Bigger Context Window Mean Better Performance?

Mostly yes, with one important nuance. More context means Claude has more information to reason about, which generally produces better outputs for tasks that benefit from full context — code reviews, document synthesis, long-form writing, multi-document comparison.

The nuance: performance can degrade on tasks involving specific information buried deep in a very long context. This is sometimes called the “lost in the middle” problem — models tend to pay more attention to the beginning and end of a long context than the middle. Anthropic has worked on this with Claude’s architecture, and it performs well on long-context tasks, but it’s worth structuring important information at natural reference points rather than burying it in the middle of a 500-page document.
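One way to act on this is to assemble long prompts so that the instructions sit at the start and are restated next to the question at the end, with the bulk documents in between. The sketch below is only one possible layout under that assumption, not documented Anthropic guidance, and `build_long_prompt` is a hypothetical helper.

```python
def build_long_prompt(key_instructions: str, documents: list[str], question: str) -> str:
    """Put instructions first, bulk documents in the middle, and the question last."""
    parts = [key_instructions]
    for i, doc in enumerate(documents, start=1):
        # Label each document so the question can refer to it by number.
        parts.append(f"[Document {i}]\n{doc}")
    # Restate the instructions near the end so they are not buried mid-context.
    parts.append(f"Reminder: {key_instructions}")
    parts.append(question)
    return "\n\n".join(parts)


prompt = build_long_prompt(
    "Cite the document number for every claim.",
    ["First report text...", "Second report text..."],
    "Which report mentions the Q3 budget?",
)
print(prompt)
```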

How We Actually Use the 1M Token Window

We run Claude in production for content operations, site management, and agentic coding workflows. Here’s where the 1M context window makes a concrete difference in our work:

Full site audits: Loading every post from a WordPress site (200+ posts worth of content) into one session for comprehensive SEO analysis — without having to chunk and re-prompt
Cross-session context: Pasting in long Notion briefings, prior session transcripts, and the current task in one go. The window is large enough that we don’t have to decide what to leave out.
Codebase-wide reasoning: In Claude Code, having the full project context means Claude can make changes that account for how files interact rather than reasoning only about the current file
Multi-document synthesis: Research projects where we load 10-15 source documents and ask Claude to synthesize across them — something that was impossible at 100K context windows

The practical shift from 200K to 1M tokens wasn’t just “more room.” It changed what we could ask Claude to do in a single session.

Context Window on the API: Batch Output Extension

For API users: on the Message Batches API, Fable 5, Opus 4.8, and Sonnet 5 support up to 300K output tokens using the output-300k-2026-03-24 beta header. This is relevant for batch generation tasks where you need very long outputs — documentation generation, large codebases, book-length content.

Frequently Asked Questions

What is Claude’s context window in 2026?

Claude Fable 5, Claude Opus 4.8, and Claude Sonnet 5 all have 1,000,000 token (1M token) context windows as of June 2026. Claude Haiku 4.5 has a 200,000 token context window. These are the current generally available models.

How many pages can Claude read at once?

At 1M tokens, Claude can hold roughly 750,000 words of English text — equivalent to approximately 3,000 average pages. In practice, a typical 20-page PDF is roughly 10,000-15,000 tokens, so you could load 60-100 such documents in a single session before approaching the limit.

Does the context window reset between messages?

No — the context window accumulates across an entire conversation session. Every message you send and every response Claude gives adds to the total. The window doesn’t reset between individual messages; it resets when you start a new conversation.

What happens when Claude hits the context window limit?

When a conversation reaches the context window limit, earlier messages begin to drop out of the active context. Claude can no longer reference information from those earlier messages — it effectively forgets that part of the conversation. In the claude.ai interface, you’ll see a notification when you’re approaching the limit.
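If you manage conversation history yourself, the same oldest-first behaviour can be sketched client-side. This is an illustration of the idea described above, not Anthropic's actual mechanism: `trim_history` is a hypothetical helper, and the word-count token counter in the example is a deliberately crude stand-in for a real tokenizer.

```python
from collections import deque
from typing import Callable


def trim_history(messages: list[str], window_tokens: int,
                 count_tokens: Callable[[str], int]) -> list[str]:
    """Drop the oldest messages until the remainder fits in window_tokens."""
    kept = deque(messages)
    total = sum(count_tokens(m) for m in kept)
    while kept and total > window_tokens:
        # The earliest message falls out of the window first.
        total -= count_tokens(kept.popleft())
    return list(kept)


def word_count(message: str) -> int:
    """Crude stand-in for a tokenizer: one token per word."""
    return len(message.split())


history = ["first long message here", "second message", "third"]
print(trim_history(history, window_tokens=3, count_tokens=word_count))
# ['second message', 'third']
```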

Is the 1M context window available on the free plan?

The model available to free plan users has access to the 1M context window. However, free plan usage limits mean long-context sessions hit rate limits faster than paid plans. The window is technically available, but sustained heavy use of it is more practical on paid tiers.

What’s the difference between Claude Opus 4.8 and Sonnet 5 context windows?

Both have the same 1M token input context window. The difference is max output: Opus 4.8 can generate up to 128,000 tokens in a single response; Sonnet 5 caps at 64,000 tokens. For most tasks this distinction doesn’t matter, but for very long document generation or large code outputs, Opus 4.8 has the higher output ceiling.

You’re planning to run Claude Managed Agents at scale. You’ve modeled the token costs, the session-hour charge, the workload cadence. Then you hit the actual constraint: rate limits. Here’s what 60 requests per minute actually means in practice, and whether it’s going to be your ceiling.

The Two Limits You Need to Know

Managed Agents has two endpoint-specific rate limits, separate from your standard Claude API limits:

Create endpoints: 60 requests per minute
Read endpoints: 600 requests per minute

Your organization-level API limits apply on top of these. If your org is on a tier with a lower requests-per-minute ceiling, that’s the actual binding constraint.

What “60 Create Requests Per Minute” Actually Means

A create request, in Managed Agents context, is typically a session creation call — starting a new agent session. 60/minute means you can start 60 sessions per minute maximum. For almost all real workloads, this is not the binding constraint. Here’s why:

Think about what generates create requests. If you’re running a batch pipeline that starts one new agent session per content item, processing 60 items per minute would saturate the limit. But a 60-item-per-minute content pipeline is running 3,600 items per hour — a genuinely high-volume operation. Most production agent workloads don’t look like this. They look like one session that runs for minutes or hours, processes multiple tasks within that session, and terminates when done.

The create limit matters most for architectures where you’re spinning up a new session per task rather than running tasks within a persistent session. If that’s your pattern, 60/minute is a hard ceiling you’ll need to design around.
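If you do run a session-per-task pattern, throttling on the client side keeps you from hitting the ceiling in the first place. The sketch below is a simple sliding-window limiter. The class name is hypothetical, and whether the service itself enforces a sliding window, a fixed window, or a token bucket is not something this article can confirm, so treat the window model as an assumption.

```python
import time
from collections import deque


class CreateRateLimiter:
    """Client-side sliding-window limiter: at most max_calls per period seconds."""

    def __init__(self, max_calls=60, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent create calls

    def acquire(self):
        """Block until another create call is allowed, then record it."""
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            # Wait until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
            now = self.clock()
            self.calls.popleft()
        self.calls.append(now)
```

Call `limiter.acquire()` immediately before each session creation request. The `clock` and `sleep` parameters exist so the limiter can be tested without real waiting.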

What “600 Read Requests Per Minute” Actually Means

Read requests include polling session status, reading agent output, checking checkpoints, and retrieving session state. 600/minute is a relatively generous limit: it works out to 10 reads per second across all of your sessions. A monitoring dashboard polling 10 active sessions once per second would sit right at that ceiling. For most production monitoring patterns (checking status every 5-30 seconds per session), you’re well under it.

The read limit becomes relevant in high-concurrency architectures where many sessions run in parallel and are all polled aggressively. If you’re running 50 concurrent agents and checking each one every 2 seconds, that’s 25 reads per second, or 1,500 per minute, which is 2.5 times the 600/minute limit. At that concurrency you would need to poll each session no more than once every 5 seconds to stay under it.
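The polling arithmetic is easy to check with a small helper. The 600/minute figure is the read limit quoted above. The function names are illustrative, and the sketch assumes every session is polled at the same fixed interval.

```python
READ_LIMIT_PER_MIN = 600  # read-endpoint limit quoted above


def reads_per_minute(sessions: int, poll_interval_s: float) -> float:
    """Total read requests per minute when every session is polled at a fixed interval."""
    return sessions * 60 / poll_interval_s


def min_poll_interval_s(sessions: int, limit_per_min: int = READ_LIMIT_PER_MIN) -> float:
    """Shortest per-session polling interval that stays within the limit."""
    return sessions * 60 / limit_per_min


print(reads_per_minute(10, 1))   # 600.0  (exactly at the limit)
print(reads_per_minute(50, 2))   # 1500.0 (over the limit)
print(min_poll_interval_s(50))   # 5.0    (seconds between polls per session)
```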

The Limit That’s More Likely to Actually Stop You

For most agent workloads, token throughput limits hit before request rate limits do. The reasoning: a long-running agent session processing significant context generates a lot of tokens. If you’re running many such sessions in parallel, you’ll hit your organization’s token-per-minute limit before you hit 60 sessions created per minute.
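A quick model makes the comparison concrete. In steady state, the number of concurrent sessions is roughly the creation rate multiplied by the session length (Little's law), and each concurrent session consumes tokens. The sketch below works out which limit binds first. Every number in the example is hypothetical; substitute your own tier's token limit and your measured per-session token usage.

```python
def token_bound_create_rate(org_tokens_per_min: float,
                            tokens_per_session_per_min: float,
                            session_minutes: float) -> float:
    """Highest sustained sessions-per-minute the token budget can support.

    Steady-state concurrency = create rate x session length, and each
    concurrent session is assumed to use tokens_per_session_per_min.
    """
    max_concurrent = org_tokens_per_min / tokens_per_session_per_min
    return max_concurrent / session_minutes


def binding_limit(org_tokens_per_min: float, tokens_per_session_per_min: float,
                  session_minutes: float, create_limit: float = 60):
    """Return which limit binds first and the sessions-per-minute it allows."""
    token_rate = token_bound_create_rate(
        org_tokens_per_min, tokens_per_session_per_min, session_minutes)
    if token_rate < create_limit:
        return ("tokens", token_rate)
    return ("create", create_limit)


# Hypothetical workload: 2M tokens/minute org limit, sessions that use
# 20K tokens/minute each and run for 10 minutes.
print(binding_limit(2_000_000, 20_000, 10))  # ('tokens', 10.0)
```

With those example numbers, the token budget caps you at about 10 new sessions per minute, well before the 60/minute create limit comes into play.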

Token limits depend on your API tier. Higher tiers have higher token throughput limits. Rate limit increases and custom limits for high-volume enterprise customers are negotiated with Anthropic’s sales team.

Designing Around the 60 Create Limit

If your architecture genuinely needs more than 60 new sessions per minute, the primary design pattern is batching more work within each session rather than creating more sessions. A single Managed Agents session can handle sequential tasks — you don’t need a new session per task if your tasks can be queued and processed within one session’s lifecycle.

The tradeoff: longer-running sessions accumulate more runtime charge ($0.08/hr active). For most workloads, the efficiency gains from batching outweigh the marginal runtime cost.
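The batching tradeoff can be sketched in a few lines. The $0.08 per active hour rate is the figure quoted above. The helper names are hypothetical, and how you actually hand a batch of tasks to a session depends on your own agent design, which this sketch does not cover.

```python
import math

RUNTIME_RATE_PER_HOUR = 0.08  # USD per active session-hour, as quoted above


def tasks_per_session_needed(tasks_per_minute: float, create_limit: int = 60) -> int:
    """Smallest batch size that keeps session creation at or under the limit."""
    return max(1, math.ceil(tasks_per_minute / create_limit))


def chunk(tasks: list, size: int) -> list[list]:
    """Split a task list into consecutive batches, one batch per session."""
    return [tasks[i:i + size] for i in range(0, len(tasks), size)]


def runtime_cost(session_minutes: float) -> float:
    """Runtime charge in USD for one session that stays active for session_minutes."""
    return RUNTIME_RATE_PER_HOUR * session_minutes / 60


# Hypothetical pipeline producing 300 tasks per minute.
size = tasks_per_session_needed(300)
print(size)                            # 5
print(chunk(list(range(7)), 3))        # [[0, 1, 2], [3, 4, 5], [6]]
print(round(runtime_cost(30), 4))      # 0.04
```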

The Agent Teams Implication

Agent Teams — Managed Agents’ multi-agent coordination feature — coordinate multiple Claude instances with independent contexts. Each instance in an Agent Team is a separate entity from a context standpoint. How Agent Team member sessions count against the create rate limit is worth verifying against current documentation if you’re architecting a high-concurrency Agent Teams deployment.

For Enterprise Workloads

If you’re evaluating Managed Agents for enterprise-scale deployment and the published limits don’t fit your volume requirements, contact Anthropic’s enterprise sales team. Rate limit increases for high-volume applications are a documented option — they’re negotiated, not self-serve.

Contact: [email protected] or through the Claude Console.

Frequently Asked Questions

Does the 60 requests/minute limit apply to all API calls or just session creation?

The 60/minute limit applies to create endpoints — session creation being the primary one. Read operations have a separate 600/minute limit. Standard Messages API calls are governed by your organization’s standard tier limits, not these Managed Agents-specific limits.

Do subagents count against the create rate limit separately from the parent session?

Subagents operate within the parent session’s context and report results upward, so they are architecturally different from new sessions. Check the current documentation for exactly how subagent creation calls and Agent Team session creation are counted against the create limit.

What happens when I hit the rate limit?

Standard API rate limit behavior applies — requests over the limit receive a 429 response. Implement exponential backoff in your session creation logic for any high-volume pattern that approaches the 60/minute ceiling.
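A minimal sketch of exponential backoff with jitter around session creation follows. `RateLimitError` and the `create_session` callable are placeholders for whatever your HTTP client or SDK actually raises and calls. They are assumptions of this example, not names from the Managed Agents API.

```python
import random
import time


class RateLimitError(Exception):
    """Stand-in for whatever your client raises on a 429 response."""


def create_with_backoff(create_session, max_retries=5, base_delay=1.0,
                        max_delay=60.0, sleep=time.sleep):
    """Call create_session(), retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return create_session()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Double the delay ceiling on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads retries so parallel workers do not retry in lockstep.
            sleep(random.uniform(0, delay))
```

Pass your real session-creation call as `create_session`, for example wrapped in a `lambda`. The `sleep` parameter is injectable so the retry logic can be tested without waiting.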

How does this compare to OpenAI’s Agents API limits?

Rate limit structures differ by product and tier. Direct comparison requires checking both providers’ current documentation for your specific tier. The full comparison: Claude Managed Agents vs. OpenAI Agents API.

Full pricing context including rate limits: Claude Managed Agents Complete Pricing Reference. All questions: Claude Managed Agents FAQ.

Tag: Claude AI Limits

Claude Context Window Size 2026: What 1 Million Tokens Actually Means

Claude Managed Agents Rate Limits — What 60 Requests Per Minute Means in Practice
