Claude rate limits are the single most complained-about aspect of the product. A viral Reddit post on the topic received over 1,060 upvotes. This guide explains what the limits are at every plan tier, why they exist, and every community-tested strategy for getting more out of your plan before hitting the wall.
Why Rate Limits Exist
Claude’s rate limits are primarily about compute capacity, not money. Running Claude Opus 4.8 on complex tasks requires enormous GPU resources. Anthropic limits usage to ensure consistent performance for all users. The limits are enforced per rolling time window, not per calendar day.
Rate Limits by Plan
Free Plan
Access to Claude Sonnet 4.6 with limited daily usage. Heavy users hit limits after 5-10 substantive prompts. Anthropic adjusts dynamically based on system load.
Claude Pro ($20/month)
Roughly 5x the usage of the Free plan. Community consensus: approximately 12 heavy prompts per session before throttling. Light prompts stretch much further before hitting limits.
Claude Max 5x ($100/month)
Approximately 5x the Pro limit. Claude Code users get roughly 44,000-220,000 tokens per 5-hour window, depending on model and task.
Claude Max 20x ($200/month)
20x the Pro limit. Introduced for developers running Claude Code for extended sessions and professionals processing large document volumes daily.
API Rate Limits (Tier 1–4)
API limits are measured in requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM), enforced per model class at the organization level. Your usage tier advances automatically as your cumulative API credit purchases cross each threshold:
| Usage tier | Credit purchase to advance | Monthly spend limit |
| --- | --- | --- |
| Tier 1 | $5 | $500 |
| Tier 2 | $40 | $500 |
| Tier 3 | $200 | $1,000 |
| Tier 4 | $400 | $200,000 |
| Monthly Invoicing | n/a | No limit |
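The tier progression above can be sketched as a simple threshold lookup. This is only an illustration: the function name `usage_tier` is hypothetical, the thresholds are copied from the table, and it assumes a tier is reached as soon as cumulative purchases equal its threshold, which the text does not state explicitly.

```python
# Cumulative credit purchase (USD) needed to reach each tier,
# copied from the table in this guide.
TIER_THRESHOLDS = [(1, 5), (2, 40), (3, 200), (4, 400)]


def usage_tier(cumulative_purchases_usd: float) -> int:
    """Return the highest tier whose threshold has been met (0 = none yet).

    Assumption: meeting a threshold exactly is enough to advance.
    """
    tier = 0
    for number, threshold in TIER_THRESHOLDS:
        if cumulative_purchases_usd >= threshold:
            tier = number
    return tier


if __name__ == "__main__":
    for spent in (0, 5, 39.99, 250, 1000):
        print(f"${spent} purchased -> tier {usage_tier(spent)}")
```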
Rate limits apply separately per model, so you can run different models up to their respective limits simultaneously. The Opus limit is a single combined pool across all Opus 4.x versions; the Sonnet limit is combined across all Sonnet 4.x versions.
Tier 1
| Model | RPM | ITPM | OTPM |
| --- | --- | --- | --- |
| Claude Fable 5 | 50 | 100,000 | 20,000 |
| Claude Opus 4.x | 50 | 500,000 | 80,000 |
| Claude Sonnet 4.x | 50 | 30,000 | 8,000 |
| Claude Haiku 4.5 | 50 | 50,000 | 10,000 |
Tier 2
| Model | RPM | ITPM | OTPM |
| --- | --- | --- | --- |
| Claude Fable 5 | 1,000 | 500,000 | 100,000 |
| Claude Opus 4.x | 1,000 | 2,000,000 | 200,000 |
| Claude Sonnet 4.x | 1,000 | 450,000 | 90,000 |
| Claude Haiku 4.5 | 1,000 | 450,000 | 90,000 |
Tier 3
| Model | RPM | ITPM | OTPM |
| --- | --- | --- | --- |
| Claude Fable 5 | 2,000 | 1,500,000 | 300,000 |
| Claude Opus 4.x | 2,000 | 5,000,000 | 400,000 |
| Claude Sonnet 4.x | 2,000 | 800,000 | 160,000 |
| Claude Haiku 4.5 | 2,000 | 1,000,000 | 200,000 |
Tier 4
| Model | RPM | ITPM | OTPM |
| --- | --- | --- | --- |
| Claude Fable 5 | 4,000 | 4,000,000 | 800,000 |
| Claude Opus 4.x | 4,000 | 10,000,000 | 800,000 |
| Claude Sonnet 4.x | 4,000 | 2,000,000 | 400,000 |
| Claude Haiku 4.5 | 4,000 | 4,000,000 | 800,000 |
Cache-aware ITPM: for current models, only uncached input tokens count toward your ITPM limit; cache_read_input_tokens do not. With an 80% cache-hit rate against a 2,000,000 ITPM limit, only 20% of your input tokens count, so you can effectively process ~10,000,000 total input tokens per minute (2,000,000 / 0.20). That makes prompt caching the single best lever for raising effective throughput.
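The cache arithmetic can be checked with a few lines of code. This is a minimal sketch: the function name is hypothetical, and it assumes every cache-read token is fully excluded from the ITPM count, as described above.

```python
def effective_input_tokens_per_minute(itpm_limit: int, cache_hit_rate: float) -> float:
    """Total input tokens per minute that fit under an ITPM limit.

    Assumption: only the uncached share (1 - cache_hit_rate) of input
    tokens counts toward the limit.
    """
    if not 0.0 <= cache_hit_rate < 1.0:
        raise ValueError("cache_hit_rate must be in [0, 1)")
    return itpm_limit / (1.0 - cache_hit_rate)


if __name__ == "__main__":
    # The example from the text: 80% cache hits against a 2,000,000 ITPM limit.
    total = effective_input_tokens_per_minute(2_000_000, 0.80)
    print(f"about {total:,.0f} total input tokens per minute")
```

With no cache hits the function returns the limit unchanged, and the multiplier grows quickly as the hit rate approaches 100%.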
When you hit a limit, the API returns a 429 with a retry-after header (seconds to wait), plus anthropic-ratelimit-* headers showing remaining requests/tokens and reset times. Limits use a token-bucket algorithm — capacity replenishes continuously rather than resetting at a fixed clock time. The Message Batches API and Managed Agents endpoints have their own separate limits.
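One way to honor the retry-after header is sketched below. It is not tied to any particular client library: `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`, and `call_with_retry`, `retry_after_seconds`, `max_attempts`, and `default_wait` are illustrative names. The sketch assumes the header value is a number of seconds, as described above.

```python
import time
from typing import Callable, Mapping, Tuple

# (HTTP status, response headers, response body)
Response = Tuple[int, Mapping[str, str], str]


def retry_after_seconds(headers: Mapping[str, str], default: float) -> float:
    """Read retry-after (case-insensitive); fall back to `default`."""
    for name, value in headers.items():
        if name.lower() == "retry-after":
            try:
                return max(0.0, float(value))
            except ValueError:
                return default
    return default


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it returns something other than 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            break
        # Wait as long as the server asked before trying again.
        sleep(retry_after_seconds(headers, default_wait))
    return status, headers, body
```

Passing `sleep` as a parameter keeps the function easy to test without real delays. In production you might also add jitter or a cap on the total wait time.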
Community-Tested Workarounds
- Use Projects with persistent system prompts — reduces token overhead per conversation
- Use Sonnet for routine tasks, Opus 4.8 for complex ones, and Fable 5 for the most demanding work — don’t burn your limit budget on tasks Sonnet handles equally well
- Batch related work into single long sessions — starting five conversations uses more overhead than one long one
- Compress your inputs — extract only relevant sections from long documents before pasting
- Use the API for high-volume predictable workflows — more limit-efficient than the consumer interface for automated tasks
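The "compress your inputs" tip can be approximated with a small pre-filter. This is a rough sketch rather than a recommended tool: `extract_relevant` is a hypothetical helper, and it assumes paragraphs are separated by blank lines and that simple keyword matching is good enough to identify the relevant sections.

```python
from typing import Iterable


def extract_relevant(document: str, keywords: Iterable[str]) -> str:
    """Keep only paragraphs that mention at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    kept = [p for p in paragraphs if any(k in p.lower() for k in lowered)]
    return "\n\n".join(kept)


if __name__ == "__main__":
    doc = (
        "Company history and mission.\n\n"
        "Rate limits are enforced per rolling window.\n\n"
        "Office locations and contact details."
    )
    # Only the middle paragraph mentions the keyword, so only it is kept.
    print(extract_relevant(doc, ["rate limit"]))
```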
Frequently Asked Questions
How many messages can I send on Claude Pro?
There is no published exact number; it depends on message complexity. Community estimates suggest roughly 12 heavy messages per session before throttling begins on Pro.
Do Claude rate limits reset daily?
Rate limits use a rolling time window, not a fixed midnight reset.