Category: Local AI & Automation

Building autonomous AI systems that run locally. Zero cloud cost, full data control, infinite scale.

  • The Solo Operator’s Content Stack: How One Person Runs a Multi-Site Network with AI

    Solo Content Operator: A single person running a multi-site content operation using AI as the execution layer — producing, optimizing, and publishing at scale by building systems rather than hiring teams.

    There is a version of content marketing that requires an editor, a team of writers, a project manager, a technical SEO lead, and a social media coordinator. That version exists. It also costs more than most small businesses can justify, and it produces content at a pace that rarely matches the actual opportunity in search.

    There is another version. One person. A deliberate system. AI as the execution layer. The output of a team, without the overhead of one.

    This is not a hypothetical. It is a description of how a growing number of solo operators are running content operations across multiple client sites — producing, optimizing, and publishing at scale without hiring a single writer. Here is how the stack works.

    The Mental Model: Operator, Not Author

    The first shift is in how you think about your role. A solo content operator is not a writer who also does some SEO and sometimes publishes things. That framing puts writing at the center and treats everything else as overhead.

    The correct frame is: you are a systems operator who uses writing as the output. The center of gravity is the system — the keyword map, the pipeline, the taxonomy architecture, the publishing cadence, the audit schedule. Writing is what the system produces.

    This distinction matters because it changes what you optimize. An author optimizes the quality of individual pieces. An operator optimizes the throughput and intelligence of the system. Both matter, but operators scale. Authors do not.

    Layer 1: The Intelligence Layer (Research and Strategy)

    Before anything gets written, the system needs to know what to write and why. This layer answers three questions for every article:

    What is the target keyword? Not a guess — a researched position. Keyword tools surface what terms are being searched, how competitive they are, and which queries sit in near-miss positions where ranking is achievable with the right content.

    What is the search intent? A keyword is a clue. The intent behind it is the brief. Someone searching “how to choose a cold storage provider” wants a comparison framework. Someone searching “cold storage temperature requirements” wants a technical reference. The same topic, two completely different articles.

    What does the competitive landscape look like? What is already ranking? What does it cover? What does it miss? The answer to the third question is the editorial angle.

    This layer produces a content brief: keyword, intent, angle, target word count, target taxonomy, and a note on what the competitive content is missing.
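
    In practice the brief is just a structured record. A minimal sketch (field names and values are illustrative, not a prescribed format):

    ```python
    from dataclasses import dataclass

    @dataclass
    class ContentBrief:
        """Output of the intelligence layer; input to the generation layer."""
        keyword: str            # researched target, not a guess
        intent: str             # e.g. "comparison framework" or "technical reference"
        angle: str              # the editorial opening the competition leaves
        target_word_count: int
        taxonomy: list[str]     # categories/tags the post will be filed under
        competitive_gap: str    # what the ranking content is missing

    brief = ContentBrief(
        keyword="cold storage temperature requirements",
        intent="technical reference",
        angle="specific thresholds, not generic advice",
        target_word_count=1800,
        taxonomy=["logistics", "cold-chain"],
        competitive_gap="no ranking page cites concrete temperature ranges",
    )
    ```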

    Layer 2: The Generation Layer (Writing at Scale)

    With a brief in hand, AI handles the first draft. Not a rough draft — a structurally complete draft with headings, a definition block, supporting sections, and a FAQ set.

    The operator’s role in this layer is not to write. It is to direct, review, and elevate. The questions at this stage:

    • Does the opening make a real argument, or does it hedge?
    • Are the H2s building toward something, or just organizing paragraphs?
    • Is there a sentence in here that is genuinely worth reading, or is it all competent filler?
    • Does the conclusion land, or does it trail into a generic call to action?

    World-class content has a point of view. It takes a position. It says something that a reasonable person might disagree with, and then makes the case. The operator’s job is to ensure the generation layer produces that kind of content — not just competent coverage of the topic.

    Layer 3: The Optimization Layer (SEO, AEO, GEO)

    A well-written article that no one finds is a waste. The optimization layer ensures every piece of content is structured to be found, read, and cited — by humans and machines. Three passes:

    SEO Pass

    Title optimized for the target keyword. Meta description written to earn the click. Slug cleaned. Headings structured correctly. Primary keyword in the first 100 words. Semantic variations woven throughout.

    AEO Pass

    Answer Engine Optimization. Definition box near the top. Key sections reformatted as direct answers to questions. FAQ section added. This is the layer that chases featured snippets and People Also Ask placements.

    GEO Pass

    Generative Engine Optimization. Named entities identified and enriched. Vague claims replaced with specific, attributable statements. Structure applied so AI systems can parse the content correctly. Speakable markup added to key passages.
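
    To make the speakable pass concrete, here is a sketch of schema.org Speakable markup built in Python; the CSS selectors are placeholders for whatever elements wrap your key passages:

    ```python
    import json

    # schema.org SpeakableSpecification flags passages suited to
    # text-to-speech and answer-engine extraction. Selectors are examples.
    speakable_jsonld = {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": "Cold Storage Temperature Requirements",
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": [".definition-box", ".faq-answer"],
        },
    }

    # Embed the result in the page head as <script type="application/ld+json">.
    print(json.dumps(speakable_jsonld, indent=2))
    ```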

    Layer 4: The Publishing Layer (Infrastructure and Taxonomy)

    Content that lives in a document is not content. It is a draft. Publishing is the act of inserting a structured record into the site database with every field populated correctly.

    The publishing layer handles taxonomy assignment, schema injection, internal linking, and direct publishing via REST API. Every post field is populated in a single operation — no manual CMS login, no copy-paste, no incomplete records.
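
    As a sketch, publishing one fully populated post through the WordPress REST API looks like this; the site URL, credentials, and term IDs are placeholders, and authentication assumes a WordPress application password:

    ```python
    import requests

    SITE = "https://example.com"
    AUTH = ("operator", "app-password")  # WordPress application password

    post = {
        "title": "Cold Storage Temperature Requirements",
        "slug": "cold-storage-temperature-requirements",
        "content": "<p>Optimized article HTML from the generation layer.</p>",
        "excerpt": "Meta-description-length summary written to earn the click.",
        "status": "publish",
        "categories": [12],   # taxonomy assignment by term ID (illustrative)
        "tags": [34, 56],
    }

    # One call populates every field; no CMS login, no copy-paste.
    resp = requests.post(f"{SITE}/wp-json/wp/v2/posts", json=post, auth=AUTH, timeout=30)
    resp.raise_for_status()
    print("Published:", resp.json()["link"])
    ```

    Depending on your SEO plugin, the meta description may live in a custom meta field rather than the excerpt; adjust the payload to your site's schema.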

    Orphan records do not get created. Every post that publishes has at least one internal link pointing to it and links out to relevant existing content.

    Layer 5: The Maintenance Layer (Audits and Freshness)

    The system does not stop at publish. A content database requires maintenance. On a quarterly cadence, the maintenance layer runs a site-wide audit to surface missing metadata, thin content, and orphan posts — then applies fixes systematically.

    This layer is what separates a content operation from a content dump. The dump publishes and forgets. The operation publishes and maintains.
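
    As a sketch, one audit pass against the WordPress REST API might look like the following; the word-count threshold and site URL are illustrative:

    ```python
    import requests

    SITE = "https://example.com"
    page = 1

    # Page through published posts, flagging missing metadata and thin content.
    while True:
        resp = requests.get(
            f"{SITE}/wp-json/wp/v2/posts",
            params={"per_page": 100, "page": page},
            timeout=30,
        )
        if resp.status_code == 400:   # WordPress returns 400 past the last page
            break
        resp.raise_for_status()
        posts = resp.json()
        if not posts:
            break
        for p in posts:
            words = len(p["content"]["rendered"].split())
            if not p["excerpt"]["rendered"].strip():
                print(f"missing excerpt: {p['link']}")
            if words < 300:           # arbitrary thin-content threshold
                print(f"thin content ({words} words): {p['link']}")
        page += 1
    ```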

    The Real Leverage: Systems Over Output

    The counterintuitive truth about this stack is that the leverage is not in how fast it produces articles. The leverage is in the system’s ability to treat every piece of content as part of a structured, maintained, interconnected database.

    A single operator running this system on ten sites is not doing ten times the work. They are running ten instances of the same system. Each instance shares the same mental model, the same pipeline stages, the same optimization passes, the same maintenance cadence. The marginal cost of adding a site is far lower than staffing it with a human team.

    What gets eliminated: the briefing meeting, the draft review cycle, the back-and-forth on edits, the manual CMS copy-paste, the post-publish social scheduling that happens three days late because everyone was busy.

    What remains: intelligence and judgment — the things that actually require a human.

    Frequently Asked Questions

    How does a solo operator manage content for multiple websites?

    A solo operator manages multiple content sites by building a replicable system across five layers: research and strategy, AI-assisted generation, SEO/AEO/GEO optimization, direct publishing via REST API, and ongoing maintenance audits. The same system runs across every site with site-specific briefs as inputs.

    What is the difference between a content operation and a content dump?

    A content dump publishes articles and forgets them. A content operation publishes articles as database records, maintains them over time, connects them via internal linking, and runs regular audits to keep the database fresh and complete. The operation compounds; the dump decays.

    What are AEO and GEO in content optimization?

    AEO stands for Answer Engine Optimization — structuring content to appear in featured snippets and direct answer placements. GEO stands for Generative Engine Optimization — structuring content to be cited by AI search tools like Google AI Overviews and Perplexity.

    How do you maintain content quality at scale without a writing team?

    Quality at scale comes from having a clear editorial standard, applying it at the review stage of the generation layer, and running every piece through optimization passes before publish. The standard is set by the operator; the system enforces it.

    What does publishing via REST API mean for content operations?

    Publishing via REST API means writing directly to the WordPress database without manual CMS interaction. Every post field is populated in a single automated call, eliminating the manual copy-paste bottleneck and ensuring every record is complete at publish.

    Related: The database model that makes this stack possible — Your WordPress Site Is a Database, Not a Brochure.

  • Why AI Agents Are Different From Chatbots, Automations, and APIs

    These terms get used interchangeably. They're not the same thing. Here's how each one actually differs, where the lines get genuinely blurry, and which category fits what you're actually trying to build.

    Chatbots

    A chatbot is a software interface designed to simulate conversation. The defining characteristic: it's reactive and, beyond the running conversation, stateless. You send a message; it responds; the exchange is complete. Each interaction is largely independent.

    Traditional chatbots (pre-LLM) operated on decision trees — “if the user says X, respond with Y.” Modern LLM-powered chatbots use language models to generate responses, which makes them dramatically more capable and flexible — but the fundamental architecture is the same: you ask, it answers, you ask again.

    What chatbots are good at: answering questions, providing information, routing conversations, handling defined service scenarios with natural language flexibility. What they’re not: action-takers. A chatbot can tell you how to cancel your subscription. An agent can cancel it.

    Automations

    Automations are rule-based workflows that execute when triggered. Zapier, Make, and similar tools are the canonical examples. When event A happens, do B, then C, then D.

    The key characteristic: the path is predefined. Every step is specified by the person who built the automation. If an unexpected situation arises that the automation wasn’t built for, it either fails or skips the step. There’s no reasoning about what to do — there’s only executing the specified path or not.

    Automations are highly reliable for well-defined, stable processes. They break when edge cases arise that weren’t anticipated. They scale perfectly for the exact task they were built for; they don’t generalize.

    APIs

    An API (Application Programming Interface) is a communication contract — a defined way for software systems to talk to each other. APIs are infrastructure, not agents or automations. They’re the mechanism through which agents and automations take action in external systems.

    When an AI agent “uses Slack,” it’s calling Slack’s API. When an automation “posts to Twitter,” it’s calling Twitter’s API. The API is the door; agents and automations are the things that open it.

    Conflating APIs with agents is a category error. An API is a tool, not a behavior pattern.

    AI Agents

    An AI agent takes a goal and figures out how to accomplish it, using tools available to it, handling unexpected situations along the way, without a human specifying each step.

    The distinguishing characteristics versus the above:

    • vs. Chatbots: Agents take action in the world; chatbots respond to messages. An agent can book the flight, not just tell you how to book it.
    • vs. Automations: Agents reason about what to do next; automations execute predefined paths. When an unexpected situation arises, an agent adapts; an automation fails or skips.
    • vs. APIs: APIs are tools an agent uses; they’re not the agent itself. The agent is the reasoning layer that decides which API to call and what to do with the result.
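
    The structural difference shows up directly in code. A compact sketch, with every helper hypothetical: an automation is a fixed call sequence, while an agent is a loop in which a model chooses the next call:

    ```python
    # Automation: the path is the program. Every step is predefined.
    def on_new_order(order):
        validate(order)            # hypothetical helpers throughout
        charge_card(order)
        send_confirmation(order)   # unexpected cases fail or get skipped

    # Agent: the goal is the program. A model picks each next step.
    def run_agent(goal, tools):
        history = [goal]
        while True:
            action = model_decide_next_step(history, tools)  # reasoning layer
            if action is None:                 # model judges the goal complete
                return history
            result = tools[action.name](**action.args)       # act via a tool
            history.append(result)   # the result informs the next decision
    ```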

    Where the Lines Actually Blur

    In practice, real systems often combine these categories:

    LLM-powered chatbots with tool access: A customer service chatbot that can look up your order status, initiate a return, and send a confirmation email is starting to look like an agent — it’s taking actions, not just responding. The boundary between “advanced chatbot” and “limited agent” is genuinely fuzzy.

    Automations with AI decision steps: A Zapier workflow with an OpenAI or Claude step in the middle isn’t purely rule-based anymore — the AI step can produce variable outputs that affect what the automation does next. This is a hybrid: mostly automation, partly agentic.

    Agents with constrained scopes: An agent restricted to a single tool and a narrow task class starts to look like a sophisticated automation. The more constrained the scope, the more the distinction collapses in practice.

    The useful question isn’t “what category is this?” but “is this system reasoning about what to do, or executing a predefined path?” That’s the actual distinction that matters for how you build, monitor, and trust it.

    Why the Distinction Matters Operationally

    Reliability profile: Automations fail predictably — when an edge case hits a path that wasn’t built. Agents fail unpredictably — when their reasoning goes wrong in a way you didn’t anticipate. Different failure modes require different monitoring approaches.

    Maintenance overhead: Automations require explicit updates when processes change. Agents adapt to process changes automatically — but may adapt in unexpected ways that need to be caught and corrected.

    Auditability: Automations are fully auditable — you can read the workflow and know exactly what it does. Agents are less auditable — you can inspect their actions, but not fully predict them in advance. For compliance-sensitive contexts, this matters significantly.

    Build cost: Automations are faster to build for well-defined, stable processes. Agents are faster to deploy when the process is complex, variable, or not fully specified — because you’re specifying a goal rather than a procedure.

    For what agents can actually do in production: What AI Agents Actually Do. For a business owner’s introduction: AI Agents Explained for Business Owners. For hosted agent infrastructure: Claude Managed Agents FAQ.


    Hosted agent infrastructure pricing: Claude Managed Agents Pricing Reference.

  • What AI Agents Actually Do (Not the Hype Version)

    Not the version where AI agents are going to replace all human jobs by 2030. The actual version, right now, based on what’s deployed in production.

    The Actual Definition

    What an AI agent is

    Software that takes a goal, breaks it into steps, uses tools to execute those steps, handles errors along the way, and keeps working without you directing every action. The distinguishing characteristic is autonomous multi-step execution — not just answering a question, but completing a task.

    The Key Distinction: One-Shot vs. Agentic

    Most people’s experience with AI is one-shot: you type something, the AI responds, the exchange is complete. That’s a language model doing inference. An AI agent is different in one specific way: it takes actions, checks results, and takes more actions based on what it found — often dozens of steps — without you approving each one.

    Example of one-shot AI: “Summarize this document.” You paste the document, the AI returns a summary. Done.

    Example of an AI agent doing the same task: “Research this topic and produce a summary with verified sources.” The agent searches the web, reads multiple pages, identifies conflicts between sources, runs additional searches to resolve them, synthesizes findings, and returns a summary with citations — without you specifying each search query or each page to read. You gave it a goal; it handled the steps.

    What Agents Can Actually Do

    The tools an agent can use define its capability surface. Common tool categories in production agents:

    • Web search: Query search engines and retrieve current information
    • Code execution: Write and run code in a sandboxed environment, use results to inform next steps
    • File operations: Read, write, and modify files — documents, spreadsheets, data files
    • API calls: Interact with external services — CRMs, databases, project management tools, communication platforms
    • Browser control: Navigate web pages, fill forms, extract information
    • Memory: Store and retrieve information across steps within a session, sometimes across sessions

    The combination of these tools is what makes agents capable of genuinely autonomous work. An agent that can search, write code, execute it, check the results, and write findings to a document can complete a research and analysis task that would otherwise require hours of human work — without you steering each step.

    What “Autonomous” Actually Means in Practice

    Autonomous doesn’t mean unsupervised indefinitely. Production agents are typically configured with:

    • Defined scope: The tools the agent can use, the systems it can access, the actions it’s allowed to take
    • Guardrails: Actions that require human confirmation before proceeding — making a payment, sending an email externally, modifying a production database
    • Reporting: Checkpoints where the agent surfaces what it’s done and asks whether to continue

    Autonomy is a dial, not a switch. You set how much the agent handles independently versus checks in. Most production deployments start more supervised and reduce oversight as trust in the agent’s behavior is established.

    Real Production Examples (Not Hypotheticals)

    Concrete examples from confirmed public deployments as of April 2026:

    • Rakuten: Deployed five enterprise Claude agents in one week on Anthropic’s Managed Agents platform — handling tasks across their e-commerce operations including data processing, content tasks, and operational workflows
    • Notion: Background agents that autonomously update workspace pages, synthesize database content, and process meeting notes into structured summaries without manual triggers
    • Sentry: Agents integrated into developer workflows — monitoring error streams, triaging issues, and surfacing relevant context to engineers
    • Asana: Project management agents that update task statuses, synthesize project health, and move work items based on defined triggers

    These are not pilots. These are production systems handling real operational load.

    How They’re Built

    An agent is built from three components:

    1. A language model: The reasoning layer — the part that decides what to do next, interprets tool results, and determines when the task is complete
    2. Tools: The action layer — APIs, code execution environments, file systems, or anything else the model can call to take action in the world
    3. Orchestration: The loop that connects them — manages the sequence of model calls and tool executions, maintains state between steps, handles errors

    Historically, builders had to construct the orchestration layer themselves — a significant engineering investment. Hosted platforms like Claude Managed Agents handle the orchestration layer, letting builders focus on defining the agent’s goals, tools, and guardrails rather than the mechanics of running the loop.
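
    A minimal version of that loop, using the Anthropic Python SDK's tool-use pattern. The tool definition and `my_search` implementation are illustrative, and the model ID should be whatever Sonnet version is current:

    ```python
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    tools = [{
        "name": "web_search",  # illustrative tool; you supply the implementation
        "description": "Search the web and return result snippets.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }]

    messages = [{"role": "user", "content": "Research topic X; summarize with sources."}]

    while True:
        resp = client.messages.create(
            model="claude-sonnet-4-6",  # substitute the current model ID
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        if resp.stop_reason != "tool_use":
            break  # the model produced a final answer; the task is complete
        messages.append({"role": "assistant", "content": resp.content})
        results = [
            {"type": "tool_result", "tool_use_id": block.id,
             "content": my_search(block.input["query"])}  # hypothetical search fn
            for block in resp.content if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    ```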

    What Agents Are Not Good At (Yet)

    Honest calibration on current limitations:

    • Long-horizon planning with many unknowns: Agents perform best on tasks with relatively defined scope. Open-ended exploratory work over many days with fundamentally uncertain requirements is still better handled by humans in the loop at each major decision point.
    • Tasks requiring physical world interaction: No production general-purpose physical agent exists. Software agents operating through APIs and interfaces are the current state.
    • Tasks where errors are catastrophic: Agents make mistakes. For any irreversible, high-stakes action — financial transactions, production data modifications, external communications to important relationships — human confirmation steps should remain in the loop.

    For how hosted agent infrastructure works: Claude Managed Agents FAQ. For the difference between agents and chatbots: AI Agents vs. Chatbots, Automations, and APIs. For an SMB-focused explanation: AI Agents Explained for Business Owners.


    For pricing specifics on hosted agent infrastructure: Claude Managed Agents Complete Pricing Reference.

  • The Real Monthly Cost of Running Claude Managed Agents 24/7

    If you’re considering running Claude Managed Agents around the clock, you want a number. Not “it depends.” An actual number you can put in a budget. Here’s the math, worked out by scenario, with the honest caveats about where the real costs are.

    The Formula

    Total monthly cost = (Active session hours × $0.08) + token costs + optional tool costs

    The $0.08/session-hour charge only applies during active execution. Idle time — waiting for input, tool confirmations, external API responses — doesn’t count. This matters significantly for 24/7 workloads, because very few agents are active 100% of the time even when “running around the clock.”
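
    The formula fits in a ten-line function. A sketch using the rates quoted on this page (verify against current published pricing before budgeting):

    ```python
    def monthly_cost(active_hours, in_tok_per_hr, out_tok_per_hr,
                     searches=0, in_rate=3.0, out_rate=15.0, session_rate=0.08):
        """Estimate monthly Managed Agents cost in dollars.

        active_hours: hours/month the session status is actually `running`.
        Token rates are $/million (Sonnet-class defaults); searches at $0.01 each.
        """
        runtime = active_hours * session_rate
        tokens = active_hours * (in_tok_per_hr * in_rate + out_tok_per_hr * out_rate) / 1e6
        return runtime + tokens + searches * 0.01

    # Monitoring agent from the scenario below: 36 active hours, light token use.
    print(round(monthly_cost(36, 10_000, 2_000), 2))  # 5.04, inside the $5-15 band
    ```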

    The Maximum Theoretical Cost

    Scenario: Agent running continuously, zero idle time, 24 hours a day, 30 days a month.

    • Session runtime: 24 hrs × $0.08 × 30 days = $57.60/month
    • Token costs: separate, highly variable (see below)

    $57.60/month is the ceiling on session runtime charges. You cannot pay more than this in session fees under any 24/7 scenario. But here’s the reality: that ceiling assumes zero idle time across the entire month, which doesn’t describe any real production agent.

    Realistic 24/7 Scenarios

    Monitoring Agent (High Idle Ratio)

    Runs continuously watching for triggers — error alerts, specific data patterns, incoming requests. Activates on trigger, processes, returns to monitoring state.

    • Assumption: 5% active execution time (watching 95% of the time, executing 5%)
    • Active hours: 24 × 30 × 0.05 = 36 hours/month
    • Session runtime: 36 × $0.08 = $2.88/month
    • Token costs: low — moderate bursts on trigger events
    • Realistic total: $5–15/month

    Customer Support Agent (Business Hours Active)

    “24/7” in the sense of always-available, but actual request volume concentrates in business hours. Waits for tickets, processes them, waits again.

    • Assumption: 8 hours/day active execution, 16 hours waiting
    • Active hours: 8 × 30 = 240 hours/month
    • Session runtime: 240 × $0.08 = $19.20/month
    • Token costs: depends heavily on ticket volume and average length
    • At 100 tickets/day with moderate length: likely $30–80/month in tokens
    • Realistic total: $50–100/month

    Continuous Autonomous Pipeline

    Batch processing agent that runs continuously through a queue with minimal waiting — the closest to true 24/7 active execution.

    • Assumption: 20 hours/day truly active (4 hours queue exhaustion/maintenance)
    • Active hours: 20 × 30 = 600 hours/month
    • Session runtime: 600 × $0.08 = $48/month
    • Token costs: high — continuous processing means continuous token consumption
    • This is where tokens become the dominant cost driver by a significant margin
    • Realistic total: $200–500+/month (tokens dominate)

    The Real Variable: Token Costs

    For any 24/7 workload that’s genuinely busy, token costs will substantially exceed session runtime costs. The math:

    A moderately active agent processing 10,000 input tokens and 2,000 output tokens per hour with Claude Sonnet 4.6:

    • Input: 10,000 tokens × $3/million = $0.03/hour
    • Output: 2,000 tokens × $15/million = $0.03/hour
    • Token cost: $0.06/hour vs. session runtime of $0.08/hour — roughly equal at this volume

    Scale to 100,000 input tokens and 20,000 output tokens per hour (a busy processing agent):

    • Input: $0.30/hour; Output: $0.30/hour
    • Token cost: $0.60/hour vs. session runtime of $0.08/hour — tokens are 7.5× the runtime charge

    The session runtime fee is flat and bounded. Token costs scale with workload volume. For high-volume 24/7 agents, optimize token efficiency (prompt caching, context management, output brevity) before worrying about the session runtime charge.

    Prompt Caching Changes the Token Math

    If your agent has a large, stable system prompt — common in agents with extensive tool definitions or knowledge bases — prompt caching dramatically reduces input token costs. Cache hits cost a fraction of base input rates. For a 24/7 agent with a 20,000-token system prompt hitting the same context repeatedly, caching that prompt can cut input costs by 80–90%. The session runtime charge is unchanged, but the total cost picture improves significantly.
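
    In standard API terms, caching the stable prompt means marking it with a cache_control block. A sketch (the system prompt text is a placeholder; the mechanics carry over unchanged into Managed Agents):

    ```python
    import anthropic

    client = anthropic.Anthropic()

    # Mark the large, stable system prompt as cacheable. Later calls that
    # reuse the identical prefix bill cache-hit rates on those input tokens.
    resp = client.messages.create(
        model="claude-sonnet-4-6",  # substitute the current model ID
        max_tokens=1024,
        system=[{
            "type": "text",
            "text": "...20,000 tokens of tool definitions and style rules...",
            "cache_control": {"type": "ephemeral"},
        }],
        messages=[{"role": "user", "content": "Next item from the queue."}],
    )
    ```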

    The Budget Summary

    | Agent Type | Runtime/mo | Typical Total |
    | --- | --- | --- |
    | Monitoring / low activity | ~$3 | $5–15 |
    | Support agent (business hours volume) | ~$19 | $50–100 |
    | Continuous processing pipeline | ~$48 | $200–500+ |
    | Theoretical maximum (zero idle) | $57.60 | Unbounded (tokens) |

    Complete pricing reference: Claude Managed Agents Pricing Guide. How idle time affects billing: Idle Time and Billing Explained. All questions: FAQ Hub.

  • Claude Managed Agents vs. OpenAI Agents API — A Direct Comparison

    You’re evaluating hosted agent infrastructure. Both Anthropic and OpenAI have one. Before you commit to either, here’s what’s actually different — not the marketing version, the architectural and pricing version.

    Bottom Line Up Front

    If your stack is Claude-native and you want to get to production fast without building orchestration infrastructure, Managed Agents is hard to beat. If you need multi-model flexibility or have OpenAI deeply embedded in your stack, the calculus changes. Lock-in is real on both sides.

    What Each Product Is

    Claude Managed Agents

    Anthropic’s hosted runtime for long-running Claude agent work. You define an agent (model, system prompt, tools, guardrails), configure a cloud environment, and launch sessions. Anthropic handles sandboxing, state management, checkpointing, tool orchestration, and error recovery. Launched April 8, 2026 in public beta.

    OpenAI Agents API

    OpenAI’s hosted agent infrastructure layer, launched earlier in 2026. Provides similar capabilities: hosted execution, tool integration, multi-agent coordination. Supports multiple OpenAI models (GPT-4o, o1, o3, etc.).

    Model Flexibility

    Managed Agents: Claude models only. Sonnet 4.6 and Opus 4.6 are the primary options for agent work. No multi-model mixing within the managed infrastructure.

    OpenAI Agents API: OpenAI models only, but a wider current model lineup (GPT-4o, o1, o3-mini depending on task). It is likewise locked to its own provider's models, not multi-model in the cross-provider sense.

    The practical implication: If your evaluation is “I want the best model for this specific task regardless of provider,” neither hosted solution gives you that. Both lock you to their provider’s models. The multi-model comparison matters for self-hosted frameworks (LangChain, etc.), not for managed hosted solutions.

    Pricing Structure

    Claude Managed Agents: Standard Claude token rates + $0.08/session-hour of active runtime. Idle time doesn’t bill. Code execution containers included in session runtime — not separately billed.

    OpenAI Agents API: Standard OpenAI token rates + usage-based tooling costs. Pricing structure varies by tool and model tier. Verify current rates at OpenAI’s pricing page — rates have changed multiple times as their agent products have evolved.

    Direct comparison difficulty: Without modeling the same specific workload against both providers’ current rates, headline comparisons mislead. Token rates differ by model, model capabilities differ, and “session runtime” isn’t a category OpenAI uses. Model the workload, not the headline number.

    Infrastructure and Lock-In

    Both solutions create meaningful lock-in. This isn’t a criticism — it’s an honest description of the trade-off you’re making:

    Claude Managed Agents lock-in: Your agents run on Anthropic’s infrastructure with their tools, session format, sandboxing model, and checkpointing. Migrating to OpenAI’s Agents API or self-hosted infrastructure requires rearchitecting session management, tool integrations, and guardrail logic. One developer’s reaction at launch: “Once your agents run on their infra, switching cost goes through the roof.”

    OpenAI Agents API lock-in: Symmetric. Same dynamic in reverse. OpenAI’s session format, tool integration patterns, and infrastructure assumptions create equivalent switching costs to move to Anthropic’s platform.

    The honest framing: You’re not choosing “open” vs. “locked.” You’re choosing which provider’s lock-in you’re more comfortable with, given your existing infrastructure, model preferences, and vendor relationship.

    Data Sovereignty

    Both solutions run your data on provider-managed infrastructure. Neither currently offers native on-premise or multi-cloud deployment for the managed hosted layer. For companies with strict data sovereignty requirements, this is a parallel constraint on both platforms — not a differentiator.

    Production Track Record

    Claude Managed Agents: Launched April 8, 2026. Production users at launch: Notion, Asana, Rakuten (5 agents in one week), Sentry, Vibecode, Allianz. Anthropic’s agent developer segment run-rate exceeds $2.5 billion.

    OpenAI Agents API: Earlier launch gives more time in production, but the product has been revised significantly since initial release. Longer production history, but also more legacy architectural assumptions baked in.

    When to Choose Claude Managed Agents

    • Your stack is already Claude-native (you’re using Sonnet or Opus for most model calls)
    • You want to reach production without building orchestration infrastructure
    • Your tasks are long-running and asynchronous — the session-hour model fits naturally
    • The Notion, Asana, or Sentry integrations are relevant to your workflow
    • You want Anthropic’s specific safety and reliability guarantees

    When to Consider OpenAI’s Agents API Instead

    • Your stack is already heavily OpenAI-integrated (GPT-4o for primary model work, existing tool integrations)
    • You need access to reasoning models (o1, o3) for specific task types — Anthropic’s equivalent is Claude’s extended thinking, which has different characteristics
    • The specific tool integrations in OpenAI’s ecosystem are better matched to your stack
    • You want more production time at scale before committing to a platform

    When to Use Neither (Self-Hosted Frameworks)

    LangChain, LlamaIndex, and similar self-hosted frameworks remain viable — and better — when you genuinely need multi-model flexibility, on-premise execution, or tighter loop control than either hosted solution provides. The trade-off is engineering effort: months of infrastructure work that Managed Agents or OpenAI’s API eliminates.

    Complete pricing breakdown: Claude Managed Agents Pricing Reference. All Managed Agents questions: FAQ Hub. Enterprise deployment example: Rakuten: 5 Agents in One Week.

  • How Claude Managed Agents Handles Idle Time (And Why It Matters for Your Bill)

    The most counterintuitive thing about Claude Managed Agents pricing is what you don’t pay for. Most people, when they hear “$0.08 per session-hour,” mentally model a virtual machine running continuously. That’s the wrong mental model. Here’s the right one, and why it matters for your bill.

    The Core Distinction: Active vs. Idle

    Managed Agents session runtime only accrues while your session’s status is running. The session can exist — open, initialized, capable of continuing — without accumulating runtime charges when it’s not actively executing.

    The specific states that do not count toward your $0.08/hr charge:

    • Time spent waiting for your next message
    • Time waiting for a tool confirmation
    • Time waiting on an external API response your tool is calling
    • Rescheduling delays
    • Terminated session time

    This is a meaningful architectural decision by Anthropic. They’re billing on what actually taxes their compute — active execution — not on session existence or wall-clock time.

    Why This Is Different From How You Might Expect Billing to Work

    Compare three billing models:

    Virtual machine billing (what this is not): You pay for every hour the instance exists, whether it’s idle or saturated. A VM running 24/7 with 10% actual utilization still costs 24 hours/day.

    Lambda/function billing (closer analogy): AWS Lambda bills on execution duration and invocation count — you pay when code actually runs, not when a function is “available.” Idle Lambda functions cost nothing.

    Managed Agents billing (what this actually is): Closer to Lambda than VM. You pay $0.08 per hour of active execution. A session that runs for 2 hours of wall-clock time but spends 90 minutes of that waiting costs $0.08 × 0.5 hours = $0.04, not $0.08 × 2 hours = $0.16.

    A Real Scenario: The Human-in-the-Loop Agent

    Consider an agent that processes your inbox for action items and waits for your approval before sending replies. Wall-clock time: 4 hours open during your workday. Actual active execution: 20 minutes of processing across that 4-hour window, with the rest spent waiting for your review decisions.

    • VM billing equivalent: 4 hours × rate = significant charge
    • Managed Agents billing: 20 minutes × $0.08/hr = $0.027

    The difference is real. For interaction-heavy agents where the agent frequently waits for human decisions, the idle-time exclusion significantly reduces costs versus a naive per-hour model.

    A Real Scenario: The Autonomous Batch Agent

    Now consider an agent running a fully autonomous content pipeline — no human checkpoints, just continuous execution through a queue. Wall-clock time and active execution time are nearly identical because the agent never waits.

    • A 2-hour autonomous batch: 2 hours × $0.08 = $0.16

    Here, the idle-time model provides no benefit — the agent has no idle time. The billing is effectively equivalent to per-hour pricing because execution is continuous.

    Code Execution Containers Are Included

    One more billing nuance worth knowing: when your agent runs code, the execution happens in sandboxed Linux containers. These containers are not separately billed on top of session runtime. The $0.08/hr covers both the session runtime and the container execution. This is explicitly documented by Anthropic and represents meaningful savings if your agent is doing significant code execution work — you’re not paying twice.

    What This Means for Workload Design

    If you’re designing agent workflows and have the choice between architectures, the billing model creates a useful signal:

    • Agents that wait on humans: Metered billing is favorable — you only pay for the actual reasoning and execution time, not the human decision time
    • Fully autonomous agents: Billing approaches equivalent to per-hour rates — optimize these on token efficiency, not idle reduction
    • Scheduled batch agents: Natural fit — run when needed, terminate when done, no idle accumulation

    The 24/7 Agent Math

    For anyone doing the 24/7 always-on calculation: the maximum theoretical runtime exposure is 24 hrs × $0.08 × 30 days = $57.60/month in session fees. But a 24/7 agent with zero idle time is rare in practice. Agents that sleep between triggers, wait on external data, or hold for human decisions have meaningful idle windows that reduce the actual charge below the theoretical ceiling.

    Full monthly cost analysis: The Real Monthly Cost of Running Claude Managed Agents 24/7. Pricing reference: Complete Pricing Guide. All questions: FAQ Hub.

  • Claude Managed Agents Rate Limits — What 60 Requests Per Minute Means in Practice

    You’re planning to run Claude Managed Agents at scale. You’ve modeled the token costs, the session-hour charge, the workload cadence. Then you hit the actual constraint: rate limits. Here’s what 60 requests per minute actually means in practice, and whether it’s going to be your ceiling.

    The Two Limits You Need to Know

    Managed Agents has two endpoint-specific rate limits, separate from your standard Claude API limits:

    • Create endpoints: 60 requests per minute
    • Read endpoints: 600 requests per minute

    Your organization-level API limits apply on top of these. If your org is on a tier with a lower requests-per-minute ceiling, that’s the actual binding constraint.

    What “60 Create Requests Per Minute” Actually Means

    A create request, in Managed Agents context, is typically a session creation call — starting a new agent session. 60/minute means you can start 60 sessions per minute maximum. For almost all real workloads, this is not the binding constraint. Here’s why:

    Think about what generates create requests. If you’re running a batch pipeline that starts one new agent session per content item, processing 60 items per minute would saturate the limit. But a 60-item-per-minute content pipeline is running 3,600 items per hour — a genuinely high-volume operation. Most production agent workloads don’t look like this. They look like one session that runs for minutes or hours, processes multiple tasks within that session, and terminates when done.

    The create limit matters most for architectures where you’re spinning up a new session per task rather than running tasks within a persistent session. If that’s your pattern, 60/minute is a hard ceiling you’ll need to design around.

    What “600 Read Requests Per Minute” Actually Means

    Read requests include polling session status, reading agent output, checking checkpoints, and retrieving session state. 600/minute is a relatively generous limit — that’s 10 reads per second. For a monitoring dashboard polling 10 active sessions every second, you’d hit this. For most production monitoring patterns (checking status every 5-30 seconds per session), you’re well under the ceiling.

    The read limit becomes relevant in high-concurrency architectures where many sessions are running in parallel and all being polled aggressively. If you're running 50 concurrent agents and checking each one every 2 seconds, that's 25 reads/second, or 1,500/minute, well over the 600/minute ceiling. Spacing the polls to every 6 seconds or more per session brings the same fleet back under the limit.

    The Limit That’s More Likely to Actually Stop You

    For most agent workloads, token throughput limits hit before request rate limits do. The reasoning: a long-running agent session processing significant context generates a lot of tokens. If you’re running many such sessions in parallel, you’ll hit your organization’s token-per-minute limit before you hit 60 sessions created per minute.

    Token limits depend on your API tier. Higher tiers have higher token throughput limits. Rate limit increases and custom limits for high-volume enterprise customers are negotiated with Anthropic’s sales team.

    Designing Around the 60 Create Limit

    If your architecture genuinely needs more than 60 new sessions per minute, the primary design pattern is batching more work within each session rather than creating more sessions. A single Managed Agents session can handle sequential tasks — you don’t need a new session per task if your tasks can be queued and processed within one session’s lifecycle.

    The tradeoff: longer-running sessions accumulate more runtime charge ($0.08/hr active). For most workloads, the efficiency gains from batching outweigh the marginal runtime cost.

    The Agent Teams Implication

    Agent Teams — Managed Agents’ multi-agent coordination feature — coordinate multiple Claude instances with independent contexts. Each instance in an Agent Team is a separate entity from a context standpoint. How Agent Team member sessions count against the create rate limit is worth verifying against current documentation if you’re architecting a high-concurrency Agent Teams deployment.

    For Enterprise Workloads

    If you’re evaluating Managed Agents for enterprise-scale deployment and the published limits don’t fit your volume requirements, contact Anthropic’s enterprise sales team. Rate limit increases for high-volume applications are a documented option — they’re negotiated, not self-serve.

    Contact: [email protected] or through the Claude Console.

    Frequently Asked Questions

    Does the 60 requests/minute limit apply to all API calls or just session creation?

    The 60/minute limit applies to create endpoints — session creation being the primary one. Read operations have a separate 600/minute limit. Standard Messages API calls are governed by your organization’s standard tier limits, not these Managed Agents-specific limits.

    Do subagents count against the create rate limit separately from the parent session?

    Subagents operate within the parent session’s context and report results upward — they’re architecturally different from new sessions. Verify current documentation for precise billing treatment of subagent creation calls vs. Agent Team session creation.

    What happens when I hit the rate limit?

    Standard API rate limit behavior applies — requests over the limit receive a 429 response. Implement exponential backoff in your session creation logic for any high-volume pattern that approaches the 60/minute ceiling.
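
    A sketch of that backoff around session creation; `create_session` stands in for whichever client call you are rate-limited on:

    ```python
    import random
    import time

    def create_with_backoff(create_session, max_retries=6):
        """Retry a rate-limited call with exponential backoff plus jitter."""
        for attempt in range(max_retries):
            resp = create_session()  # hypothetical callable returning a response
            if resp.status_code != 429:
                return resp
            # 1s, 2s, 4s, ... plus jitter so parallel workers don't retry in lockstep
            time.sleep(2 ** attempt + random.random())
        raise RuntimeError("still rate limited after retries")
    ```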

    How does this compare to OpenAI’s Agents API limits?

    Rate limit structures differ by product and tier. Direct comparison requires checking both providers’ current documentation for your specific tier. The full comparison: Claude Managed Agents vs. OpenAI Agents API.

    Full pricing context including rate limits: Claude Managed Agents Complete Pricing Reference. All questions: Claude Managed Agents FAQ.

  • What Notion’s Claude Managed Agents Integration Actually Does

    When Anthropic launched Claude Managed Agents, Notion was one of the launch partners. That detail got buried in the announcement. Here's what it actually means for people who use Notion for knowledge work, and why "Notion voice input desktop" keeps showing up as a query against a Managed Agents page.

    Short answer: Managed Agents in Notion is an ambient intelligence layer. It’s not a chatbot in a sidebar. It’s an agent that watches your workspace and acts — without you directing every step.

    What the Notion Integration Actually Does

    Notion’s Claude Managed Agents integration runs as a persistent background agent with access to your workspace. The practical capabilities, as documented at launch:

    • Autonomous page updates: The agent can read, summarize, and rewrite Notion pages without manual triggers. You set a task; it works through it.
    • Cross-database synthesis: Pull data from multiple Notion databases, synthesize it, and write outputs to a target page or database entry
    • Meeting note processing: Ingest raw meeting notes and produce structured summaries, action items, and task entries in your project database
    • Workflow automation: Trigger actions based on database property changes — a status update in one database can kick off agent work in another

    The key difference from Notion AI (which Notion has had for some time): Notion AI is request-response. You ask it something; it answers. Managed Agents in Notion can be configured to run autonomously on a schedule or on trigger, keep working through multi-step tasks, and report back when done. It’s closer to a background employee than an on-demand assistant.

    Why This Showed Up in Search as “Notion Voice Input Desktop”

    This is worth explaining, because that query cluster is real and mildly interesting. The Managed Agents announcement included voice input functionality — the ability to interact with agents via voice in some contexts. People searching “notion voice input desktop” and “notion ai voice input desktop” were looking for whether this voice capability existed in the desktop client for Notion specifically.

    The honest answer as of April 2026: voice input capabilities are in preview or context-dependent. Verify current availability in Notion’s desktop client against their current documentation — this is an area that may have evolved since launch.

    The “Decoupled Brain and Hands” Model Applied to Notion

    Anthropic describes their Managed Agents architecture as decoupling the brain (Claude, the reasoning layer) from the hands (the sandboxed containers where actions execute). In Notion’s context, this maps cleanly:

    • The brain reads your Notion workspace, understands context, makes decisions about what to do
    • The hands execute — writing to pages, updating database entries, moving content between sections

    The brain and hands operate independently. The agent can reason about what your project needs without being tightly coupled to the specific API calls that will implement it. This matters because it means the agent can handle ambiguity — “clean up the Q2 notes and create action items” is a goal, not a procedure, and the agent figures out the procedure.

    What You Actually Configure

    To run Claude Managed Agents in Notion, you’re defining:

    • Task definition: What the agent is supposed to accomplish (in natural language or structured format)
    • Tool access: Which Notion databases, pages, and capabilities the agent can read and write
    • Guardrails: What the agent cannot do — pages it can’t modify, actions it must confirm before taking
    • Trigger: When the agent runs — on schedule, on database trigger, or on demand

    You don’t write the orchestration logic. Anthropic’s infrastructure handles session management, state persistence, and error recovery. If the agent hits an error mid-task, it checkpoints and recovers — you don’t lose progress.

    The Practical Cost of Running Notion Agents

    Using Managed Agents in Notion triggers the same billing as any Managed Agents session: standard token rates plus $0.08/session-hour of active runtime. For typical knowledge work tasks:

    • A daily meeting summary agent running 15 minutes of active execution: ~$0.02/day in runtime (~$0.60/month), plus token costs for the volume of notes processed
    • A weekly database synthesis task running 45 minutes: ~$0.06/run

    For most knowledge workers, the session runtime cost is negligible — the token costs (driven by how much content the agent reads and writes) are the actual variable to model. See the complete pricing reference for worked examples.

    Asana and the Broader Pattern

    Asana was also a Managed Agents launch partner, and the integration pattern is similar: an agent that can read project data, update task statuses, move cards, and generate project summaries without constant human direction. The launch partner list (Notion, Asana, Rakuten, Sentry) suggests Anthropic targeted four categories: knowledge management (Notion), project management (Asana), enterprise operations (Rakuten), and developer tools (Sentry).

    That’s a deliberate wedge. If agents can handle the administrative layer of these four categories, the surface area for autonomous business work expands significantly.

    What This Means for How You Work

    The honest use case for most people reading this: you have a Notion workspace with databases that need regular synthesis, and you’re currently doing that manually. Managed Agents is the path to automating that synthesis without building and maintaining a custom integration.

    The constraint worth naming: you’re running your workspace data through Anthropic’s infrastructure. That’s the trade-off. For most knowledge work, the data sensitivity concern is low. For anything involving client data, legal documents, or proprietary strategy — read Anthropic’s data handling terms before configuring access.

    For the full Managed Agents setup and pricing context: Claude Managed Agents: Every Question Answered. For the enterprise deployment pattern: How Rakuten Deployed 5 Enterprise Agents in a Week.

  • Claude Managed Agents — Every Question Answered (Complete FAQ 2026)

    Everything people actually ask about Claude Managed Agents, answered straight. No preamble about “the exciting world of AI agents.” If you’re here, you already know why this matters — you just need answers.

    This page covers pricing, setup, capabilities, limits, comparisons, and the specific questions that don’t have obvious homes in Anthropic’s documentation. It updates as the beta evolves.

    Context

    Claude Managed Agents launched April 8, 2026 as a public beta. All answers reflect current documentation as of April 2026. Beta details change — verify specifics at platform.claude.com/docs.

    Pricing Questions

    What does Claude Managed Agents cost?

    Two charges: standard Claude API token rates (same as calling the Messages API directly) plus $0.08 per session-hour of active runtime. That’s the complete formula. See the complete pricing reference for worked examples by workload type.

    What exactly is a “session-hour” and when does it start billing?

    A session-hour is one hour of active session runtime — time when your session’s status is running. Billing is metered to the millisecond. It does not accrue during idle time, time waiting for your input, time waiting for tool confirmations, or after session termination.

    What’s included in the $0.08/session-hour charge?

    The session runtime charge covers Anthropic’s managed infrastructure: sandboxed code execution containers, state management, checkpointing, tool orchestration, error recovery, and scaling. You are not separately billed for container hours on top of session runtime.

    Does the $0.08/hr apply even if my agent is just waiting?

    No. Time spent waiting for your message, waiting for tool confirmations, or sitting idle does not accumulate runtime charges. Only active execution time counts.

    What does web search cost inside a Managed Agents session?

    $10 per 1,000 searches ($0.01 per search), billed separately from session runtime and token costs. This is the same rate as web search through the standard API.

    Are there volume discounts?

    Yes, negotiated case-by-case for high-volume users. Contact [email protected] or through the Claude Console.

    How does Managed Agents pricing compare to running my own agent infrastructure?

    The $0.08/session-hour is almost always cheaper than equivalent provisioned compute — but you trade infrastructure control and data locality for that simplicity. For a full comparison: Build vs. Buy: The Real Infrastructure Cost.

    What’s the real monthly cost if I run an agent 24/7?

    Maximum theoretical session runtime: 24 hrs × $0.08 × 30 days = $57.60/month. In practice, no production agent has zero idle time. Token costs become the dominant cost driver long before you hit the runtime ceiling. Detailed breakdown: The Real Monthly Cost of Running Claude Managed Agents 24/7.

    Setup and Access Questions

    How do I get access to Claude Managed Agents?

    Available to all Anthropic API accounts in public beta — no separate signup. You need the managed-agents-2026-04-01 beta header in your API requests. The Claude SDK adds this header automatically.

    Does it work with my existing API key?

    Yes. Same API key you’re already using for the Messages API. Same authentication. The beta header is the only new requirement.

    What three ways can I access Managed Agents?

    Via the Claude SDK (recommended — handles the beta header automatically), via direct API calls with the beta header, or via the Claude Console’s new Managed Agents section for no-code agent configuration and session tracing.
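
    In SDK terms, attaching the header looks something like the sketch below. The default_headers pattern is standard in the Anthropic Python SDK; the header value is the beta string from this page, so verify it against current docs:

    ```python
    import anthropic

    # The SDK can add the beta header automatically; with a manually
    # configured client you attach it as a default header yourself.
    client = anthropic.Anthropic(
        default_headers={"anthropic-beta": "managed-agents-2026-04-01"},
    )
    ```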

    Can I use Managed Agents through AWS Bedrock or Google Vertex AI?

    Managed Agents runs on Anthropic-managed infrastructure. This is distinct from Bedrock and Vertex AI deployments. Check Anthropic’s current documentation for multi-cloud availability status — this is an area of active development.

    Capability Questions

    What can Claude Managed Agents actually do?

    Run long autonomous sessions with persistent state, execute code in sandboxed Linux containers, use tools including web search and MCP servers, coordinate multiple Claude instances via Agent Teams, and maintain checkpoints for crash recovery. The session can last minutes or hours without you staying in the loop.

    What’s the difference between Agent Teams and subagents?

    Agent Teams coordinate multiple Claude instances with independent contexts, direct agent-to-agent communication, and a shared task list — suited for complex parallel tasks. Subagents operate within the same session as the main agent and only report results upward — more economical for sequential targeted tasks but less capable of true parallelism.

    Does it support MCP servers?

    Yes. MCP servers can be integrated as tool sources in Managed Agents sessions, extending what the agent can access and act on.

    How long can a session run?

    Anthropic’s documentation currently references session durations of minutes to hours. Claude Code’s longest autonomous sessions have reached 45 minutes. Managed Agents is architected for longer-running work. Check current documentation for specific session duration limits as the beta matures.

    What happened to Claude Code — is it the same as Managed Agents?

    No. Claude Code is a separate local coding workflow product. Anthropic’s docs explicitly note partners should not conflate the two. Managed Agents is a hosted API runtime service. Claude Code is a developer tool. Different products, different use cases, different billing.

    Rate Limit Questions

    What are the rate limits for Managed Agents?

    60 requests per minute for create endpoints; 600 requests per minute for read endpoints. Organization-level API limits still apply on top of these. For higher limits, contact Anthropic enterprise sales. Detailed breakdown: Claude Managed Agents Rate Limits Explained.

    Do standard Claude API rate limits still apply inside a session?

    Organization-level limits apply. The session runtime and create/read endpoint limits are Managed Agents-specific. If you’re running many parallel Agent Teams, model token throughput limits will become relevant.

    Comparison Questions

    How does Managed Agents compare to OpenAI’s Agents API?

    Both offer hosted agent infrastructure. Key differences: Managed Agents is Claude-native (no multi-model flexibility), sessions bill on runtime + tokens vs. OpenAI’s different pricing model, and lock-in dynamics differ. Full comparison: Claude Managed Agents vs. OpenAI Agents API.

    Should I use Managed Agents or the Claude Agent SDK?

    Use Managed Agents when you want Anthropic to host the runtime — less infrastructure work, faster to production. Use the SDK when you need tighter loop control, on-premise execution, or multi-cloud flexibility. Anthropic’s own migration docs draw this line clearly: SDK runs in your environment; Managed Agents runs in theirs.

    What companies are already using Managed Agents in production?

    Notion, Asana, Rakuten, Sentry, and Vibecode were launch partners. Rakuten deployed five enterprise agents within a week. Allianz is using Claude for insurance agent workflows. Anthropic’s run-rate from the agent developer segment exceeds $2.5 billion. How Rakuten did it in a week →

    Data and Security Questions

    Where does my data go when running in Managed Agents?

    Execution runs on Anthropic’s infrastructure. This is the explicit trade-off: you get managed infrastructure; they manage the compute. For companies with strict data sovereignty requirements, this is the key constraint to evaluate. On-premise or native multi-cloud deployment is not currently available.

    What are the sandboxing guarantees?

    Anthropic uses disposable Linux containers — “decoupled hands” in their terminology. Each container is a fresh sandboxed environment for code execution. State persistence is managed separately from the execution environment.

    Strategic Questions

    Is this a bet worth making?

    That depends on your switching cost tolerance. Lock-in is real: once your agents run on Anthropic’s infrastructure with their tools, session format, and sandboxing, switching providers isn’t trivial. The counter-argument: the infrastructure you’d otherwise build to match this is months of engineering. One developer’s reaction at launch was blunt: “there goes a whole YC batch.” That captures both the opportunity and the risk. Our take on why we’re staying our course →

    What does this mean for AI citation and visibility?

    Agents running on Anthropic’s infrastructure make decisions about what content to surface, cite, and synthesize. As agent workloads grow, being present in the knowledge sources agents draw from becomes a search strategy question in itself. What AI citation monitoring looks like →

  • Claude Managed Agents — Complete Pricing Reference 2026

    You opened this tab because you need a number you can actually use. Not a vibe, not “it depends.” A real pricing breakdown you can put in a spreadsheet, a budget request, or a Slack message to your CTO.

    This is that page. Every pricing variable for Claude Managed Agents in one place, verified against Anthropic’s current documentation as of April 2026. Bookmark it. The beta will update; so will this.

    Quick Reference: The Formula

    Total Cost = Token Costs + Session Runtime ($0.08/hr) + Optional Tools
    Session runtime only accrues while status = running. Idle time is free.
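
    If you’d rather keep that in code than in a spreadsheet, here is the same formula as a minimal Python sketch. The rates are the ones quoted on this page; the function name and structure are ours, not Anthropic’s.

    SONNET_INPUT_PER_MTOK = 3.00    # ~$3 / million input tokens
    SONNET_OUTPUT_PER_MTOK = 15.00  # ~$15 / million output tokens
    RUNTIME_PER_HOUR = 0.08         # $0.08 per active session-hour
    WEB_SEARCH_PER_CALL = 0.01      # $10 per 1,000 searches

    def session_cost(input_tokens, output_tokens, active_hours, searches=0):
        # Total Cost = Token Costs + Session Runtime + Optional Tools
        tokens = (input_tokens * SONNET_INPUT_PER_MTOK
                  + output_tokens * SONNET_OUTPUT_PER_MTOK) / 1e6
        runtime = active_hours * RUNTIME_PER_HOUR  # idle time is free
        return tokens + runtime + searches * WEB_SEARCH_PER_CALL

    session_cost(50_000, 5_000, 0.5)  # ~$0.27 -- the daily agent in Example 1 below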

    The Two Cost Dimensions

    Claude Managed Agents bills on exactly two dimensions: tokens and session runtime. Every pricing question you have collapses into one of these two buckets.

    Dimension 1: Token Costs

    These are identical to standard Claude API pricing. You pay the same rates you’d pay calling the Messages API directly. No Managed Agents markup on tokens. Current rates for the models most commonly used in agent work:

    • Claude Sonnet 4.6: ~$3/million input tokens, ~$15/million output tokens
    • Claude Opus 4.6: higher rates apply — check platform.claude.com/docs/en/about-claude/pricing for current figures
    • Prompt caching: same multipliers as standard API — cache hits dramatically reduce input token costs on long sessions with stable system prompts

    The implication: a token-heavy agent with a large system prompt that runs the same context repeatedly benefits significantly from prompt caching, and that benefit carries over unchanged into Managed Agents.
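
    Here is that caching benefit in rough numbers. The sketch assumes the standard cache-read multiplier (about 0.1× the base input rate) carries over unchanged; treat that multiplier as an assumption until you verify it against current pricing docs.

    BASE_INPUT = 3.00 / 1e6          # Sonnet 4.6 input, $ per token
    CACHE_READ = BASE_INPUT * 0.1    # assumed 0.1x multiplier for cache hits

    def input_cost(fresh_tokens, cached_tokens):
        return fresh_tokens * BASE_INPUT + cached_tokens * CACHE_READ

    # A 40K-token system prompt re-sent across 20 turns in one session
    # (ignores the small one-time cache-write premium):
    uncached = input_cost(40_000 * 20, 0)      # $2.40
    cached = input_cost(40_000, 40_000 * 19)   # ~$0.35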

    Dimension 2: Session Runtime — $0.08/Session-Hour

    This is the Managed Agents-specific charge. You pay $0.08 per hour of active session runtime, metered to the millisecond.

    The critical word is active. Runtime only accrues while your session’s status is running. The following do not count toward your bill:

    • Time spent waiting for your next message
    • Time waiting for a tool confirmation
    • Idle time between tasks
    • Rescheduling delays
    • Terminated session time

    This is not how you’d bill a virtual machine. It’s closer to how AWS Lambda bills — you pay for execution, not reservation. An agent that “runs” for 8 hours but spends 6 of those hours waiting on human input has a very different bill from one running continuous autonomous loops.
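
    To make that concrete: your bill is a filter-and-sum over session status intervals. The event format below is invented for illustration; the only rule taken from the docs is that running accrues and everything else does not.

    RUNTIME_PER_HOUR = 0.08

    def billable_runtime(intervals):
        # Sum only the intervals where status == "running".
        active = sum(hours for status, hours in intervals if status == "running")
        return active, active * RUNTIME_PER_HOUR

    # The 8-hour "workday" agent from above: 2 active hours, 6 waiting.
    day = [("running", 0.75), ("waiting", 3.0),
           ("running", 1.25), ("waiting", 3.0)]
    hours, cost = billable_runtime(day)   # 2.0 hours -> $0.16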

    Optional Tool Costs

    Web Search: $10 per 1,000 Searches

    If your agent uses web search, searches are billed at $10 per 1,000, which works out to $0.01 per search. For most agents, this is negligible. For a research agent running hundreds of searches per session, it becomes a line item worth modeling separately, as the one-liner below shows.
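
    One line of arithmetic settles whether it matters for your workload. The search volume here is an assumed figure for a heavy research agent.

    searches_per_month = 400 * 20                    # 400 searches/session, 20 sessions
    search_cost = searches_per_month * (10 / 1000)   # $10 per 1,000 -> $80.00/month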

    Code Execution: Included in Session Runtime

    Code execution containers are included in your $0.08/session-hour charge. You’re not separately billed for container hours on top of session runtime. This is explicitly stated in Anthropic’s docs and represents meaningful savings versus provisioning your own compute.

    Worked Cost Examples

    Example 1: Daily Research Agent

    Runs once per day. 30 minutes of active execution. Processes 10 documents, outputs a summary report. Moderate token volume.

    • Session runtime: 0.5 hrs × $0.08 = $0.04/day (~$1.20/month)
    • Tokens (estimate): 50K input + 5K output with Sonnet 4.6 = ~$0.23/run (~$7/month)
    • Total: ~$8–10/month

    Example 2: Weekly Batch Content Pipeline

    Runs 3x/week. 2-hour active sessions. Processes multiple documents, generates structured outputs.

    • Session runtime: 2 hrs × $0.08 × 12 sessions/month = $1.92/month
    • Tokens: depends on content volume — typically $10–40/month
    • Total: ~$12–42/month

    Example 3: Customer Support Agent (Business Hours)

    Active during business hours, handling tickets. 8 hours/day active, 5 days/week.

    • Session runtime: 8 hrs × $0.08 × 22 days = $14.08/month in runtime
    • Tokens: highly variable by ticket volume — the dominant cost driver at scale
    • Runtime cost alone: ~$14/month — tokens are likely 5–20x this depending on volume

    Example 4: 24/7 Always-On Agent

    The maximum theoretical runtime exposure. Continuous operation, no idle time.

    • Session runtime: 24 hrs × $0.08 × 30 days = $57.60/month
    • In practice, no agent has zero idle time — real cost will be lower
    • Token costs at this scale become the dominant factor by a wide margin
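
    All four examples reduce to the session_cost sketch from the quick-reference section above. The per-run token figures for Example 2 here are stand-in estimates, not measurements.

    research_monthly = session_cost(50_000, 5_000, 0.5) * 30    # ~$7.95/month
    pipeline_monthly = session_cost(200_000, 40_000, 2.0) * 12  # ~$16.32/month
    always_on_ceiling = 24 * 0.08 * 30                          # $57.60/month, runtime only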

    Anthropic’s Official Example (from their docs)

    A one-hour coding session using Claude Opus 4.6 consuming 50,000 input tokens and 15,000 output tokens: session runtime = $0.08. With prompt caching active and 40,000 of those tokens as cache reads, the token costs drop significantly. The runtime charge stays flat at $0.08 regardless of caching.

    What’s Not Billed in Managed Agents

    A few things that might seem like costs but aren’t:

    • Infrastructure provisioning: Anthropic handles hosting, scaling, and monitoring at no additional charge
    • Container hours: Explicitly not separately billed on top of session runtime
    • State management and checkpointing: Included in the session runtime charge
    • Error recovery and retry logic: Anthropic’s infrastructure problem, not yours

    Rate Limits

    Managed Agents has specific rate limits separate from standard API limits:

    • Create endpoints: 60 requests/minute
    • Read endpoints: 600 requests/minute
    • Organization-level limits still apply
    • For higher limits, contact Anthropic enterprise sales

    How to Access Managed Agents Pricing

    Managed Agents is available to all Anthropic API accounts in public beta. No separate signup, no premium tier gate. You need the managed-agents-2026-04-01 beta header in your API requests — the Claude SDK adds this automatically.
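
    In raw HTTP terms, that is one extra header. The sketch below shows where it goes. The endpoint path and payload are placeholders, not a documented API surface, so check the current docs before copying anything here.

    import os
    import requests

    resp = requests.post(
        "https://api.anthropic.com/v1/...",  # real path: see current Managed Agents docs
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-beta": "managed-agents-2026-04-01",  # the beta header named above
        },
        json={"model": "claude-sonnet-4-6"},  # placeholder payload
    )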

    For high-volume agent applications, Anthropic’s enterprise sales team negotiates custom pricing arrangements. Contact them at [email protected] or through the Claude Console.

    The Pricing Signals Worth Noting

    Anthropic recently ended Claude subscription access (Pro/Max) for third-party agent frameworks, requiring those users to switch to pay-as-you-go API pricing. This signals a deliberate strategy: consumer subscriptions are for human-paced interactions; agent workloads route through the API. The $0.08/session-hour rate exists in that context — it’s infrastructure pricing for compute that runs beyond human attention spans.

    The session-hour model also signals something about Anthropic’s infrastructure cost structure. They’re pricing on active execution time because that’s what actually taxes their systems. Idle sessions don’t cost them much; active agents do. The billing model follows the actual resource consumption pattern.

    Frequently Asked Questions

    Is the $0.08/session-hour charge in addition to token costs, or does it replace them?

    In addition to. You pay both: standard token rates for all input and output tokens, plus $0.08 per hour of active session runtime. They’re separate line items.

    Does prompt caching work in Managed Agents sessions?

    Yes. Prompt caching multipliers apply identically to Managed Agents sessions as they do to standard API calls. If your agent has a large, stable system prompt, caching it can significantly reduce input token costs.

    What happens if my session crashes? Am I billed for the crashed time?

    Runtime accrues only while status is running. Terminated sessions stop accruing. Anthropic’s infrastructure handles checkpointing and crash recovery — the session state is preserved even if the session terminates unexpectedly.

    Can I use Managed Agents on the free API tier?

    Managed Agents is available to all Anthropic API accounts in public beta, but standard tier access and rate limits apply. Free API tier users receive a small credit for testing.

    How does this compare to running agents on my own infrastructure?

    See our full breakdown: Build vs. Buy: The Real Infrastructure Cost of Claude Managed Agents. Short version: the $0.08/hour is almost certainly cheaper than provisioning and maintaining equivalent compute, but you trade control and data locality for that simplicity.

    Are there volume discounts?

    Volume discounts are available for high-volume users but negotiated case-by-case. Contact Anthropic enterprise sales.

    Does web search billing count against the $10/1,000 rate if the search returns no results?

    Anthropic’s current docs don’t explicitly address failed searches. Treat any triggered search as billable until confirmed otherwise.

    For the full session-hour math worked out by workload type, see: Claude Managed Agents Pricing, Decoded: What a Session-Hour Actually Costs You. For the build-vs-buy infrastructure comparison: Build vs. Buy: The Real Infrastructure Cost. For enterprise deployment patterns: Rakuten Stood Up 5 Enterprise Agents in a Week.