Claude Message Batches API: 50% Pricing, Limits and How It Works (2026)

About Will

I run a multi-site content operation on Claude and Notion with autonomous agents — and I write about what we do, including what breaks.

Connect on LinkedIn →

Last verified: June 13, 2026

The Message Batches API lets you submit up to 100,000 Claude requests in a single call and receive results asynchronously — at exactly 50% of standard token prices. Most batches finish in under an hour. Results remain downloadable for 29 days. This page covers every verified limit, the per-tier rate limit tables, and how batch pricing stacks with prompt caching.

Pricing: 50% off standard rates

Every token processed through the Message Batches API is billed at half the standard input and output price. No quality difference from synchronous requests — only timing. The table below shows verified batch prices for active models.

Model Batch input (per MTok) Batch output (per MTok) Standard input (per MTok) Standard output (per MTok)
Claude Fable 5 $5.00 $25.00 $10.00 $50.00
Claude Opus 4.8 $2.50 $12.50 $5.00 $25.00
Claude Opus 4.7 $2.50 $12.50 $5.00 $25.00
Claude Opus 4.6 $2.50 $12.50 $5.00 $25.00
Claude Opus 4.5 $2.50 $12.50 $5.00 $25.00
Claude Sonnet 4.6 $1.50 $7.50 $3.00 $15.00
Claude Sonnet 4.5 $1.50 $7.50 $3.00 $15.00
Claude Haiku 4.5 $0.50 $2.50 $1.00 $5.00

Source: platform.claude.com/docs/en/build-with-claude/batch-processing

Key limits at a glance

Limit Value
Maximum requests per batch 100,000
Maximum batch payload size 256 MB
Typical completion time Under 1 hour
Hard expiration window 24 hours from creation
Result retention period 29 days after creation
Zero Data Retention eligible No
Results format JSONL, streamed via results_url
Supported models All active Claude models

A batch expires if processing has not completed within 24 hours. Any individual request within that batch that did not finish is marked expired — you are not billed for expired or errored requests. Batch results (the JSONL file) are accessible for download for 29 days after the batch was created; after that the batch object itself is still visible but results can no longer be downloaded.

Message Batches API rate limits by tier

The Message Batches API has its own rate-limit pool, shared across all models, separate from the standard Messages API limits. The “processing queue” count refers to individual batch requests (not batches) that have been submitted but not yet completed by the model.

Tier RPM (API calls) Max batch requests in processing queue Max batch requests per batch
Tier 1 50 100,000 100,000
Tier 2 1,000 200,000 100,000
Tier 3 2,000 300,000 100,000
Tier 4 4,000 500,000 100,000

Source: platform.claude.com/docs/en/api/rate-limits

RPM here limits how fast you can make HTTP requests to the Batches API endpoints (create, retrieve, list, cancel). It does not limit how many individual requests inside a batch are processed per minute — that is governed by the queue cap above. If high demand causes processing to slow, more individual requests within a batch may reach the 24-hour expiration limit.

Stacking batch pricing with prompt caching

The Batches API documentation explicitly states that the 50% batch discount and prompt caching discounts stack. Cache writes incur a one-time cost at 1.25x the base input rate (5-minute TTL) or 2x (1-hour TTL); subsequent cache reads cost 0.1x the base input rate. Because batches process asynchronously and may take longer than 5 minutes, Anthropic recommends using the 1-hour cache duration for batch requests that share large context.

The following example uses Claude Opus 4.8 (standard input: $5.00/MTok) to show what each token type costs in a batch with a 1-hour cached system prompt.

Token type Multiplier applied Effective price per MTok How calculated
Uncached input (standard) 1x $5.00 Baseline
Uncached input (batch) 0.5x $2.50 50% batch discount
Cache write — 1h TTL (batch) 2x × 0.5x = 1x $5.00 2x write cost, then 50% batch
Cache read (batch) 0.1x × 0.5x = 0.05x $0.25 10% read cost, then 50% batch
Output (batch) 0.5x of $25.00 $12.50 50% batch discount on output

In practice: if you cache a 50,000-token system prompt once and then read it across 1,000 batch requests, the cache write costs $0.25 (50K tokens at $5.00/MTok effective), while 1,000 cache reads cost $12.50 total (50M tokens at $0.25/MTok). The same 50 million tokens without caching would cost $125 in batch input (50 MTok at the $2.50/MTok batch rate). Cache hit rates on batches vary; Anthropic’s documentation notes typical rates of 30% to 98% depending on traffic patterns, since batch requests are processed concurrently rather than sequentially.

How results come back

When the batch finishes (or the 24-hour limit is reached), a results_url property is set on the batch object. Results are in JSONL format — one JSON object per line, in any order (not necessarily matching submission order). Each result carries the custom_id you assigned, plus a result object of type succeeded, errored, canceled, or expired. Streaming the results file rather than downloading it all at once is recommended for large batches. You are not billed for errored, canceled, or expired requests.

Does the Batches API count against my standard Messages API rate limits?

No. The Message Batches API has its own rate-limit pool that is tracked separately from the standard Messages API RPM, ITPM, and OTPM limits. You can use both simultaneously up to their respective limits.

What happens if my batch does not finish within 24 hours?

Any individual requests within the batch that did not complete are marked expired. You are not billed for those requests. The batch itself moves to ended status and whatever results did complete are available at the results_url.

Can I use extended thinking, tool use, or vision in a batch?

Yes. The Batches API supports vision, tool use (including server tools such as web search and code execution), system messages, multi-turn conversations, and extended thinking. The parameters not supported are stream: true, fast mode (speed), Threads parameters, and max_tokens: 0.

How long are batch results available for download?

Results are available for 29 days after the batch was created. After that window, the batch object remains visible in the Console and via the API, but the results file can no longer be downloaded.

Is the Batches API eligible for Zero Data Retention?

No. The Message Batches API is explicitly excluded from Zero Data Retention (ZDR). Data is retained under the feature’s standard retention policy regardless of your organization’s ZDR settings.

Track the AI tools you actually use
Live, vendor-neutral prices & limits for ChatGPT, Claude, Gemini, Perplexity and more — and we’ll email you the moment your tools change price or limits. Free, no hype.
See the live AI tracker →or set up your alerts

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *