Claude Managed Agents — Complete Pricing Reference + Dreaming Update (May 2026)

Last refreshed: May 15, 2026

May 2026 Update — Dreaming Feature + Beta Status

Anthropic introduced Dreaming at Code w/ Claude (May 6, 2026) — a new Managed Agents capability where agents review their own session history overnight to improve future performance. Harvey (legal AI) reported a roughly 6× task completion rate increase after implementing it. Dreaming is developer-access preview only. Multiagent Orchestration and Outcomes are now in public beta. See the new Dreaming section below.

What Is Claude Managed Agents? (Current Status, May 2026)

Claude Managed Agents is Anthropic’s framework for long-running, stateful AI agents — agents that can maintain context across sessions, hand off between sub-agents, and now, improve themselves by reviewing their own work history. Here’s the current status of each component:

Component	Status	Who Has Access
Multiagent Orchestration	Public Beta	All API developers
Outcomes	Public Beta	All API developers
Dreaming	Developer Preview	Selected developers only

Dreaming: The Feature the Press Mostly Missed

Announced at Code w/ Claude on May 6, 2026, Dreaming is a Managed Agents capability that lets agents review and reorganize their own memory between sessions. The mechanism:

After a session ends, the agent reads its existing memory store alongside the session transcripts
It produces a new, reorganized memory store: duplicates merged, stale entries replaced, new patterns surfaced
The next session starts with a higher-quality knowledge base — capturing insights no single session could hold

This is meaningfully different from simply persisting conversation history. The agent isn’t just remembering what happened — it’s synthesizing what it learned. Think of it as the difference between taking notes and actually reviewing and reorganizing your notes the next morning.
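The mechanism described above (read the existing store and the transcripts, then emit a reorganized store) can be pictured with a toy consolidation pass. This is only a sketch of the merge, replace, and surface idea, not Anthropic's implementation: the `MemoryEntry` schema, the `consolidate` function, the sample data, and the rule that a topic seen in two or more sessions counts as a "pattern" are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    key: str       # what the note is about (hypothetical schema)
    value: str     # the note itself
    updated: int   # logical timestamp; higher means newer


def consolidate(store, transcripts):
    """Toy consolidation pass: returns a new, reorganized memory store."""
    merged = {}
    # Merge duplicates and let newer notes replace stale ones.
    for entry in store:
        current = merged.get(entry.key)
        if current is None or entry.updated > current.updated:
            merged[entry.key] = entry
    # Surface topics that recur across sessions but have no entry yet.
    topic_counts = Counter(topic for t in transcripts for topic in set(t))
    newest = max((e.updated for e in store), default=0)
    for topic, count in sorted(topic_counts.items()):
        if count >= 2 and topic not in merged:
            note = f"recurring topic ({count} sessions)"
            merged[topic] = MemoryEntry(topic, note, newest + 1)
    return sorted(merged.values(), key=lambda e: e.key)


# Made-up sample data for illustration only.
store = [
    MemoryEntry("report-format", "bullet summary", 1),
    MemoryEntry("report-format", "bullet summary with sources", 3),  # newer duplicate
    MemoryEntry("client", "prefers short memos", 2),
]
transcripts = [["deadlines", "client"], ["deadlines", "venue"]]  # topics per session
new_store = consolidate(store, transcripts)
```

The point of the sketch is the shape of the output: fewer, fresher entries plus one item ("deadlines") that no single session recorded on its own.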

The Harvey Result

Harvey, the legal AI company, reported approximately a 6× task completion rate increase after implementing Dreaming in their Managed Agents workflow. Harvey’s use case — complex legal research that spans multiple sessions with evolving context — is exactly the kind of work Dreaming was designed for. Sessions build on each other rather than starting fresh each time.

Dreaming is in developer-access preview as of May 2026. Docs: platform.claude.com/docs/en/managed-agents/dreams.

What Dreaming Is Not

A few clarifications worth making explicit:

Dreaming is not available to end users — it’s a developer-layer capability requiring implementation
It’s not persistent memory in the claude.ai chat interface
It’s not available to free or standard Pro subscribers through any interface
It’s a developer preview, not GA — expect it to evolve before full release

Our Take: Why This Architecture Matters

We run Managed Agents in our own Cowork workflows. The Dreaming announcement is the first time Anthropic has shipped something that resembles how expert human knowledge actually compounds over time — not by accumulating raw notes, but by periodically synthesizing and reorganizing what’s been learned into a cleaner structure.

The Harvey 6× result is a real-world data point from a production legal AI workflow. That’s not a benchmark number — it’s a deployed system showing measurable improvement from session-to-session memory refinement. Whether that 6× figure holds across different use cases is unknown, but the direction of the effect is the signal: agents that learn from their own history outperform agents that don’t.

For non-developer users watching this space: Dreaming is the preview of what agentic AI will look like when it becomes mainstream. The groundwork being laid now in developer preview will eventually surface in subscription-tier products.

Model Accuracy Note — Updated May 2026

Current flagship: Claude Opus 4.7 (claude-opus-4-7), as of April 16, 2026. Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Where this article references Opus 4.6 or earlier models, those references are historical. See current model tracker →

By Will Tygart

You opened this tab because you need a number you can actually use. Not a vibe, not “it depends.” A real pricing breakdown you can put in a spreadsheet, a budget request, or a Slack message to your CTO.

This is that page. Every pricing variable for Claude Managed Agents in one place, verified against Anthropic’s current documentation as of April 2026. Bookmark it. The beta will update; so will this.

Quick Reference: The Formula

Total Cost = Token Costs + Session Runtime ($0.08/hr) + Optional Tools
Session runtime only accrues while status = running. Idle time is free.
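If you want the formula in a form you can paste into a notebook, here is a minimal sketch. The function and parameter names are made up for illustration; the only figures taken from this article are the $0.08 session-hour rate and the $10 per 1,000 searches rate, and web search is the only optional tool modeled.

```python
SESSION_RATE_PER_HOUR = 0.08   # USD per running session-hour (from this article)
WEB_SEARCH_RATE = 10 / 1000    # USD per search (from this article)


def total_cost(token_cost_usd, running_hours, web_searches=0):
    """Total = token costs + active runtime + optional tools (web search only)."""
    runtime = running_hours * SESSION_RATE_PER_HOUR
    tools = web_searches * WEB_SEARCH_RATE
    return token_cost_usd + runtime + tools


# A session with $0.50 of tokens, 1.5 running hours, and 20 searches.
cost = total_cost(0.50, 1.5, web_searches=20)
print(f"${cost:.2f}")
```

Pass `running_hours` as time spent in the running state, not wall-clock time, since idle time is not billed.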

The Two Cost Dimensions

Claude Managed Agents bills on two core dimensions: tokens and session runtime. Almost every pricing question you have collapses into one of these two buckets; the only extra is optional tool usage such as web search, covered further down.

Dimension 1: Token Costs

These are identical to standard Claude API pricing. You pay the same rates you’d pay calling the Messages API directly. No Managed Agents markup on tokens. Current rates for the models most commonly used in agent work:

Claude Sonnet 4.6: ~$3/million input tokens, ~$15/million output tokens
Claude Opus 4.7: higher rates apply — check platform.claude.com/docs/en/about-claude/pricing for current figures
Prompt caching: same multipliers as standard API — cache hits dramatically reduce input token costs on long sessions with stable system prompts

The implication: a token-heavy agent with a large system prompt that runs the same context repeatedly benefits significantly from prompt caching, and that benefit carries over unchanged into Managed Agents.
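To put a rough number on that caching benefit, here is a sketch using the approximate Sonnet 4.6 rates quoted above. The 0.1× cache-read multiplier is an assumption based on Anthropic's standard prompt-caching pricing, so verify it on the pricing page before budgeting. The sketch also ignores the premium charged when the cache is first written, and the function name is hypothetical.

```python
INPUT_PER_MTOK = 3.00         # USD, approximate Sonnet 4.6 rate quoted above
OUTPUT_PER_MTOK = 15.00       # USD, approximate Sonnet 4.6 rate quoted above
CACHE_READ_MULTIPLIER = 0.1   # ASSUMPTION: check the current pricing page


def token_cost(input_tokens, output_tokens, cache_read_tokens=0):
    """Token cost in USD; cache_read_tokens is the share of input served from cache."""
    if cache_read_tokens > input_tokens:
        raise ValueError("cache_read_tokens cannot exceed input_tokens")
    fresh = input_tokens - cache_read_tokens
    billed_input = fresh + cache_read_tokens * CACHE_READ_MULTIPLIER
    input_cost = billed_input * INPUT_PER_MTOK / 1_000_000
    output_cost = output_tokens * OUTPUT_PER_MTOK / 1_000_000
    return input_cost + output_cost


uncached = token_cost(50_000, 5_000)
cached = token_cost(50_000, 5_000, cache_read_tokens=40_000)
print(f"uncached ${uncached:.3f}, cached ${cached:.3f}")
```

Under these assumptions, serving 40K of a 50K-token input from cache roughly halves the token bill for the run, because output tokens are unaffected by caching.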

Dimension 2: Session Runtime — $0.08/Session-Hour

This is the Managed Agents-specific charge. You pay $0.08 per hour of active session runtime, metered to the millisecond.

The critical word is active. Runtime only accrues while your session’s status is running. The following do not count toward your bill:

Time spent waiting for your next message
Time waiting for a tool confirmation
Idle time between tasks
Rescheduling delays
Terminated session time

This is not how you’d bill a virtual machine. It’s closer to how AWS Lambda bills — you pay for execution, not reservation. An agent that “runs” for 8 hours but spends 6 of those hours waiting on human input has a very different bill than one running continuous autonomous loops.
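Here is that 8-hour scenario as a sketch. It assumes you have a log of status changes with millisecond timestamps; the event format and the "idle" status name are hypothetical, and only the running status and the $0.08 rate come from this article.

```python
RATE_PER_HOUR = 0.08
MS_PER_HOUR = 3_600_000


def billable_ms(events, end_ms):
    """Sum milliseconds spent in 'running'.

    events: (timestamp_ms, status) pairs sorted by time; end_ms closes the last one.
    """
    total = 0
    boundaries = events[1:] + [(end_ms, None)]
    for (start, status), (next_start, _) in zip(events, boundaries):
        if status == "running":
            total += next_start - start
    return total


# 8 wall-clock hours: 2 hours running in total, 6 hours waiting on a human.
H = MS_PER_HOUR
events = [(0, "running"), (1 * H, "idle"), (7 * H, "running")]
ms = billable_ms(events, end_ms=8 * H)
runtime_cost = ms / MS_PER_HOUR * RATE_PER_HOUR
always_running_cost = 8 * RATE_PER_HOUR
print(f"billed ${runtime_cost:.2f} vs ${always_running_cost:.2f} if never idle")
```

Only the two running hours are billed, which is a quarter of what a VM-style reservation of the same 8 hours would cost at this rate.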

Optional Tool Costs

Web Search: $10 per 1,000 Searches

If your agent uses web search, each search costs $10/1,000 — that’s $0.01 per search. For most agents, this is negligible. For a research agent running hundreds of searches per session, it becomes a line item worth modeling separately.
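A quick way to see when search stops being negligible is to compare it with the runtime charge. The sketch below uses only the two rates from this article; the 300 searches per session and 22 sessions per month are made-up inputs for a hypothetical research agent.

```python
SESSION_RATE_PER_HOUR = 0.08  # USD per running hour (from this article)
SEARCH_RATE = 0.01            # USD per search (from this article)


def search_cost(searches):
    """Web search spend in USD for a given number of searches."""
    return searches * SEARCH_RATE


def breakeven_searches_per_hour():
    """Searches per running hour at which search spend equals runtime spend."""
    return SESSION_RATE_PER_HOUR / SEARCH_RATE


# Hypothetical research agent: 300 searches per session, 22 sessions a month.
monthly_search_spend = search_cost(300 * 22)
print(f"${monthly_search_spend:.2f}/month in searches")
print(f"break-even: {breakeven_searches_per_hour():.0f} searches per running hour")
```

Past roughly eight searches per running hour, search costs more than the session runtime itself, which is why heavy research agents deserve their own line in the model.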

Code Execution: Included in Session Runtime

Code execution containers are included in your $0.08/session-hour charge. You’re not separately billed for container hours on top of session runtime. This is explicitly stated in Anthropic’s docs and represents meaningful savings versus provisioning your own compute.

Worked Cost Examples

Example 1: Daily Research Agent

Runs once per day. 30 minutes of active execution. Processes 10 documents, outputs a summary report. Moderate token volume.

Session runtime: 0.5 hrs × $0.08 = $0.04/day (~$1.20/month)
Tokens (estimate): 50K input + 5K output with Sonnet 4.6 = ~$0.23/run (~$7/month)
Total: ~$8–10/month

Example 2: Weekly Batch Content Pipeline

Runs 3x/week. 2-hour active sessions. Processes multiple documents, generates structured outputs.

Session runtime: 2 hrs × $0.08 × 12 sessions/month (3 per week × 4 weeks) = $1.92/month
Tokens: depends on content volume — typically $10–40/month
Total: ~$12–42/month

Example 3: Customer Support Agent (Business Hours)

Active during business hours, handling tickets. 8 hours/day active, 5 days/week.

Session runtime: 8 hrs × $0.08 × 22 days = $14.08/month in runtime
Tokens: highly variable by ticket volume — the dominant cost driver at scale
Runtime cost alone: ~$14/month — tokens are likely 5–20x this depending on volume

Example 4: 24/7 Always-On Agent

The maximum theoretical runtime exposure. Continuous operation, no idle time.

Session runtime: 24 hrs × $0.08 × 30 days = $57.60/month
In practice, no agent has zero idle time — real cost will be lower
Token costs at this scale become the dominant factor by a wide margin
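The runtime lines in the four examples above all come from the same multiplication, so you can reproduce them and swap in your own workload. The hours and session counts are the ones used in the examples; the dictionary layout and labels are just an illustration.

```python
RATE = 0.08  # USD per running session-hour

# name: (running hours per session, sessions per month)
workloads = {
    "daily research": (0.5, 30),
    "batch pipeline": (2, 12),
    "support agent": (8, 22),
    "always-on": (24, 30),
}

monthly = {
    name: round(hours * sessions * RATE, 2)
    for name, (hours, sessions) in workloads.items()
}

for name, usd in monthly.items():
    print(f"{name:<16} ${usd:>6.2f}/month")
```

Even the always-on ceiling stays under $60 a month in runtime, which is why the token estimate is the part of the budget worth spending time on.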

Anthropic’s Official Example (from their docs)

A one-hour coding session using Claude Opus 4.7 consuming 50,000 input tokens and 15,000 output tokens: session runtime = $0.08. With prompt caching active and 40,000 of the input tokens served as cache reads, the token costs drop significantly. The runtime charge stays flat at $0.08 regardless of caching.

What’s Not Billed in Managed Agents

A few things that might seem like costs but aren’t:

Infrastructure provisioning: Anthropic handles hosting, scaling, and monitoring at no additional charge
Container hours: Explicitly not separately billed on top of session runtime
State management and checkpointing: Included in the session runtime charge
Error recovery and retry logic: Anthropic’s infrastructure problem, not yours

Rate Limits

Managed Agents has specific rate limits separate from standard API limits:

Create endpoints: 60 requests/minute
Read endpoints: 600 requests/minute
Organization-level limits still apply
For higher limits, contact Anthropic enterprise sales
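If you would rather not discover these limits through rejected requests, a small client-side guard helps. The sketch below is a generic sliding-window limiter, not part of any Anthropic SDK. Whether the server counts requests in a sliding or fixed window is not stated in the source, so treat this as a conservative approximation; the class and variable names are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds (client-side guard)."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = deque()

    def try_acquire(self):
        """Return True and record the call if under the limit, else False."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False


create_limiter = SlidingWindowLimiter(limit=60)   # create endpoints: 60/min
read_limiter = SlidingWindowLimiter(limit=600)    # read endpoints: 600/min
```

Call `try_acquire()` before each request and back off when it returns False. Organization-level limits still apply on top, so this guard is a floor on caution, not a guarantee.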

How to Access Managed Agents Pricing

Managed Agents is available to all Anthropic API accounts in public beta. No separate signup, no premium tier gate. You need the managed-agents-2026-04-01 beta header in your API requests — the Claude SDK adds this automatically.
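If you call the API over raw HTTP instead of through the SDK, you attach the beta flag yourself. The sketch below only builds the headers. The `x-api-key`, `anthropic-version`, and `anthropic-beta` header names follow Anthropic's general API conventions, the flag value is the one quoted above, and endpoint paths are deliberately left out because this article does not list them. The helper name and placeholder key are hypothetical.

```python
import os

BETA_FLAG = "managed-agents-2026-04-01"  # beta header value quoted in this article


def build_headers(api_key, extra_betas=()):
    """Headers for a raw HTTP call; multiple beta flags are comma-separated."""
    betas = [BETA_FLAG, *extra_betas]
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",
        "anthropic-beta": ",".join(betas),
        "content-type": "application/json",
    }


# Placeholder key so the sketch runs without credentials.
headers = build_headers(os.environ.get("ANTHROPIC_API_KEY", "sk-placeholder"))
```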

For high-volume agent applications, Anthropic’s enterprise sales team negotiates custom pricing arrangements. Contact them at [email protected] or through the Claude Console.

The Pricing Signals Worth Noting

Anthropic recently ended Claude subscription access (Pro/Max) for third-party agent frameworks, requiring those users to switch to pay-as-you-go API pricing. This signals a deliberate strategy: consumer subscriptions are for human-paced interactions; agent workloads route through the API. The $0.08/session-hour rate exists in that context — it’s infrastructure pricing for compute that runs beyond human attention spans.

The session-hour model also signals something about Anthropic’s infrastructure cost structure. They’re pricing on active execution time because that’s what actually taxes their systems. Idle sessions don’t cost them much; active agents do. The billing model follows the actual resource consumption pattern.

Frequently Asked Questions

Is the $0.08/session-hour charge in addition to token costs, or does it replace them?

In addition to. You pay both: standard token rates for all input and output tokens, plus $0.08 per hour of active session runtime. They’re separate line items.

Does prompt caching work in Managed Agents sessions?

Yes. Prompt caching multipliers apply identically to Managed Agents sessions as they do to standard API calls. If your agent has a large, stable system prompt, caching it can significantly reduce input token costs.

What happens if my session crashes? Am I billed for the crashed time?

Runtime accrues only while status is running. Terminated sessions stop accruing. Anthropic’s infrastructure handles checkpointing and crash recovery — the session state is preserved even if the session terminates unexpectedly.

Can I use Managed Agents on the free API tier?

Managed Agents is available to all Anthropic API accounts in public beta, but standard tier access and rate limits apply. Free API tier users receive a small credit for testing.

How does this compare to running agents on my own infrastructure?

See our full breakdown: Build vs. Buy: The Real Infrastructure Cost of Claude Managed Agents. Short version: the $0.08/hour is almost certainly cheaper than provisioning and maintaining equivalent compute, but you trade control and data locality for that simplicity.

Are there volume discounts?

Volume discounts are available for high-volume users but negotiated case-by-case. Contact Anthropic enterprise sales.

Is a web search billed at the $10/1,000 rate if it returns no results?

Anthropic’s current docs don’t explicitly address failed searches. Treat any triggered search as billable until confirmed otherwise.

For the full session-hour math worked out by workload type, see: Claude Managed Agents Pricing, Decoded: What a Session-Hour Actually Costs You. For the build-vs-buy infrastructure comparison: Build vs. Buy: The Real Infrastructure Cost. For enterprise deployment patterns: Rakuten Stood Up 5 Enterprise Agents in a Week.
