Claude Message Batches API: 50% Pricing, Limits and How It Works (2026)

Q: Does the Batches API count against my standard Messages API rate limits?

No. The Message Batches API has its own rate-limit pool tracked separately from the standard Messages API RPM, ITPM, and OTPM limits. You can use both simultaneously up to their respective limits.

Q: Can I use extended thinking, tool use, or vision in a batch?

Yes. The Batches API supports vision, tool use including server tools, system messages, multi-turn conversations, and extended thinking. The parameters not supported are stream: true, fast mode, Threads parameters, and max_tokens: 0.

Last verified: June 13, 2026

The Message Batches API lets you submit up to 100,000 Claude requests in a single call and receive results asynchronously â€” at exactly 50% of standard token prices. Most batches finish in under an hour. Results remain downloadable for 29 days. This page covers every verified limit, the per-tier rate limit tables, and how batch pricing stacks with prompt caching.

Pricing: 50% off standard rates

Every token processed through the Message Batches API is billed at half the standard input and output price. No quality difference from synchronous requests â€” only timing. The table below shows verified batch prices for active models.

Model	Batch input (per MTok)	Batch output (per MTok)	Standard input (per MTok)	Standard output (per MTok)
Claude Fable 5	$5.00	$25.00	$10.00	$50.00
Claude Opus 4.8	$2.50	$12.50	$5.00	$25.00
Claude Opus 4.7	$2.50	$12.50	$5.00	$25.00
Claude Opus 4.6	$2.50	$12.50	$5.00	$25.00
Claude Opus 4.5	$2.50	$12.50	$5.00	$25.00
Claude Sonnet 4.6	$1.50	$7.50	$3.00	$15.00
Claude Sonnet 4.5	$1.50	$7.50	$3.00	$15.00
Claude Haiku 4.5	$0.50	$2.50	$1.00	$5.00

Source: platform.claude.com/docs/en/build-with-claude/batch-processing

Key limits at a glance

Limit	Value
Maximum requests per batch	100,000
Maximum batch payload size	256 MB
Typical completion time	Under 1 hour
Hard expiration window	24 hours from creation
Result retention period	29 days after creation
Zero Data Retention eligible	No
Results format	JSONL, streamed via `results_url`
Supported models	All active Claude models

A batch expires if processing has not completed within 24 hours. Any individual request within that batch that did not finish is marked expired â€” you are not billed for expired or errored requests. Batch results (the JSONL file) are accessible for download for 29 days after the batch was created; after that the batch object itself is still visible but results can no longer be downloaded.

Message Batches API rate limits by tier

The Message Batches API has its own rate-limit pool, shared across all models, separate from the standard Messages API limits. The “processing queue” count refers to individual batch requests (not batches) that have been submitted but not yet completed by the model.

Tier	RPM (API calls)	Max batch requests in processing queue	Max batch requests per batch
Tier 1	50	100,000	100,000
Tier 2	1,000	200,000	100,000
Tier 3	2,000	300,000	100,000
Tier 4	4,000	500,000	100,000

Source: platform.claude.com/docs/en/api/rate-limits

RPM here limits how fast you can make HTTP requests to the Batches API endpoints (create, retrieve, list, cancel). It does not limit how many individual requests inside a batch are processed per minute â€” that is governed by the queue cap above. If high demand causes processing to slow, more individual requests within a batch may reach the 24-hour expiration limit.

Stacking batch pricing with prompt caching

The Batches API documentation explicitly states that the 50% batch discount and prompt caching discounts stack. Cache writes incur a one-time cost at 1.25x the base input rate (5-minute TTL) or 2x (1-hour TTL); subsequent cache reads cost 0.1x the base input rate. Because batches process asynchronously and may take longer than 5 minutes, Anthropic recommends using the 1-hour cache duration for batch requests that share large context.

The following example uses Claude Opus 4.8 (standard input: $5.00/MTok) to show what each token type costs in a batch with a 1-hour cached system prompt.

Token type	Multiplier applied	Effective price per MTok	How calculated
Uncached input (standard)	1x	$5.00	Baseline
Uncached input (batch)	0.5x	$2.50	50% batch discount
Cache write â€” 1h TTL (batch)	2x Ã— 0.5x = 1x	$5.00	2x write cost, then 50% batch
Cache read (batch)	0.1x Ã— 0.5x = 0.05x	$0.25	10% read cost, then 50% batch
Output (batch)	0.5x of $25.00	$12.50	50% batch discount on output

In practice: if you cache a 50,000-token system prompt once and then read it across 1,000 batch requests, the cache write costs $0.25 (50K tokens at $5.00/MTok effective), while 1,000 cache reads cost $12.50 total (50M tokens at $0.25/MTok). The same 50 million tokens without caching would cost $125 in batch input (50 MTok at the $2.50/MTok batch rate). Cache hit rates on batches vary; Anthropic’s documentation notes typical rates of 30% to 98% depending on traffic patterns, since batch requests are processed concurrently rather than sequentially.

How results come back

When the batch finishes (or the 24-hour limit is reached), a results_url property is set on the batch object. Results are in JSONL format â€” one JSON object per line, in any order (not necessarily matching submission order). Each result carries the custom_id you assigned, plus a result object of type succeeded, errored, canceled, or expired. Streaming the results file rather than downloading it all at once is recommended for large batches. You are not billed for errored, canceled, or expired requests.

Does the Batches API count against my standard Messages API rate limits?

No. The Message Batches API has its own rate-limit pool that is tracked separately from the standard Messages API RPM, ITPM, and OTPM limits. You can use both simultaneously up to their respective limits.

What happens if my batch does not finish within 24 hours?

Any individual requests within the batch that did not complete are marked expired. You are not billed for those requests. The batch itself moves to ended status and whatever results did complete are available at the results_url.

Can I use extended thinking, tool use, or vision in a batch?

Yes. The Batches API supports vision, tool use (including server tools such as web search and code execution), system messages, multi-turn conversations, and extended thinking. The parameters not supported are stream: true, fast mode (speed), Threads parameters, and max_tokens: 0.

How long are batch results available for download?

Results are available for 29 days after the batch was created. After that window, the batch object remains visible in the Console and via the API, but the results file can no longer be downloaded.

Is the Batches API eligible for Zero Data Retention?

No. The Message Batches API is explicitly excluded from Zero Data Retention (ZDR). Data is retained under the feature’s standard retention policy regardless of your organization’s ZDR settings.

What to explore next

AI Strategy

Claude Models Explained: Haiku vs Sonnet vs Opus (June 2026)

Same room

AI in Restoration

How to Evaluate Restoration AI Tools Without Getting Fooled: The Buyer Framework for a Difficult Vendor Environment

Same room

Agency Playbook

Claude Cowork Shows Real Estate Agents Every Angle They Miss in Listing Preparation

You may also explore

Deep dive

AI in Restoration

The Senior Restoration Operator Compensation Question: Why the Old Math Is Producing the Wrong Numbers in 2026

Deep dive

Track the AI tools you actually use

Live, vendor-neutral prices & limits for ChatGPT, Claude, Gemini, Perplexity and more — and we’ll email you the moment your tools change price or limits. Free, no hype.

See the live AI tracker →or set up your alerts

Claude Message Batches API: 50% Pricing, Limits and How It Works (2026)

Pricing: 50% off standard rates

Key limits at a glance

Message Batches API rate limits by tier

Stacking batch pricing with prompt caching

How results come back

Does the Batches API count against my standard Messages API rate limits?

What happens if my batch does not finish within 24 hours?

Can I use extended thinking, tool use, or vision in a batch?

How long are batch results available for download?

Is the Batches API eligible for Zero Data Retention?

Comments

Leave a Reply Cancel reply

More posts

AI Agents Are Learning to Check Instead of Guess: The GitHub Context Problem

Logic Apps vs Cloud Workflows: No-Code Automation Across Two Clouds

Azure Static Web Apps vs Firebase Hosting: A Dashboard on Each

Cosmos DB vs Firestore: A Free-Tier Operations Ledger on Both Clouds