Tag: Claude API

  • Anthropic’s Real Play Isn’t a Chatbot — It’s the Invisible Agent Layer Inside Every Tool You Use



    Claude Managed Agents is the product. Slack, Notion, Jira, and Asana are just the interface. Anthropic is building the invisible execution layer that powers the next generation of enterprise software.

    There is a pattern emerging in enterprise AI that most people are reading wrong. They see Anthropic launch Claude Tag in Slack and think “chatbot upgrade.” They see Claude show up inside Notion and think “productivity feature.” They see AI agents appear in Jira and Asana and think “automation plugin.”

    They are missing the architecture underneath all of it.

    Anthropic is not building a better chatbot. It is building the invisible agent runtime that sits beneath every collaboration tool your team already uses. The company’s Claude Managed Agents (CMA) platform — launched in public beta on April 8, 2026 — is the infrastructure layer that makes this possible. And the speed at which partners are embedding it tells you everything about where enterprise software is heading.

    What Claude Managed Agents Actually Is

    Claude Managed Agents is a set of composable APIs for building and deploying production AI agents on Anthropic’s cloud infrastructure. The service handles sandboxed code execution, session persistence, credential management, scoped permissions, and end-to-end tracing — all the operational complexity that previously kept agents stuck in proof-of-concept limbo.

    The architecture rests on three primitives: the Agent (configuration and behavior), the Environment (sandboxed execution), and the Session (the event log that tracks everything the agent does). What makes this interesting architecturally is how Anthropic decoupled the “brain” from the “hands.” Claude’s reasoning runs on Anthropic’s own infrastructure while the code execution sandbox spins up independently — and in parallel. The brain starts reasoning immediately while the sandbox provisions, delivering roughly 60% faster time-to-first-token at the p50 level and over 90% faster at p95, according to Anthropic’s engineering team.
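    The overlap described above can be illustrated with a toy asyncio sketch. It is not CMA's implementation. `provision_sandbox` and `start_reasoning` are hypothetical stand-ins with made-up sleep times, and the only point is the ordering: the first token does not wait for the sandbox, and tool execution blocks only at the moment it actually needs one.

```python
import asyncio


async def provision_sandbox(log: list[str]) -> str:
    """Hypothetical stand-in for spinning up an execution sandbox."""
    log.append("sandbox:start")
    await asyncio.sleep(0.05)  # pretend provisioning is the slow step
    log.append("sandbox:ready")
    return "sandbox-1"


async def start_reasoning(log: list[str]) -> str:
    """Hypothetical stand-in for the model producing its first tokens."""
    log.append("brain:first_token")
    await asyncio.sleep(0.01)
    return "plan: run tests"


async def run_session() -> list[str]:
    log: list[str] = []
    # Schedule provisioning in the background instead of awaiting it first.
    sandbox_task = asyncio.create_task(provision_sandbox(log))
    plan = await start_reasoning(log)  # reasoning starts immediately
    sandbox = await sandbox_task       # wait only when a tool call needs the sandbox
    log.append(f"tools:execute {plan!r} in {sandbox}")
    return log


if __name__ == "__main__":
    for event in asyncio.run(run_session()):
        print(event)
```

    Running it logs the first token before the sandbox reports ready; in a sequential design, the sandbox wait would sit in front of everything else.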

    Pricing follows a transparent model: standard Claude API token rates plus $0.08 per session-hour of active runtime during the current beta period. Runtime is measured to the millisecond and only accrues while the agent is actively executing — idle time waiting for input or tool confirmations does not count.
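    A minimal sketch of how the runtime half of that bill adds up, assuming a session can be described as a list of active and idle segments (the `Segment` type and its state labels are hypothetical, not part of any CMA API). Only active milliseconds are billed, at the $0.08 beta rate; token charges are left out because they depend on the model.

```python
from dataclasses import dataclass

RATE_PER_SESSION_HOUR = 0.08  # beta rate quoted above, USD
MS_PER_HOUR = 3_600_000


@dataclass
class Segment:
    state: str        # hypothetical labels: "active" or "idle"
    duration_ms: int  # runtime is metered to the millisecond


def runtime_charge(segments: list[Segment]) -> float:
    """USD runtime charge for one session; idle time is not billed."""
    active_ms = sum(s.duration_ms for s in segments if s.state == "active")
    return active_ms / MS_PER_HOUR * RATE_PER_SESSION_HOUR


session = [
    Segment("active", 90_000),   # 90 s of reasoning and tool calls
    Segment("idle", 1_800_000),  # 30 min waiting for a human confirmation
    Segment("active", 360_000),  # 6 more minutes of work
]
print(f"${runtime_charge(session):.4f}")
```

    Here 7.5 active minutes are billed and the 30-minute wait for confirmation is not, so the runtime charge prints as $0.0100.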

    For teams that need to keep execution inside their own perimeter, CMA supports self-hosted sandboxes through partners including Cloudflare, Daytona, Modal, and Vercel, or custom VPC deployments. MCP tunnels, currently in research preview, are designed to let agents connect to private Model Context Protocol servers inside your network without exposing them to the public internet. A Vaults system keeps credentials out of the sandbox entirely using envelope encryption. And a feature called Dreaming runs scheduled reviews of past sessions to curate agent memory, essentially letting agents learn from their own operational history.

    The Embedded Layer: Where CMA Actually Lives

    The real story is not the infrastructure. It is where that infrastructure shows up. Within three months of CMA's launch, Anthropic has embedded its agent runtime inside the collaboration tools that enterprises already depend on. This is not a roadmap: these integrations are live or in active beta.

    Slack: Claude Tag as Persistent Team Member

    Claude Tag, launched June 23, 2026, replaces Anthropic’s original Claude in Slack integration with something fundamentally different. This is not a chatbot you summon with a slash command. It is a persistent AI team member that lives in your channels, builds memory across conversations, and can take initiative through what Anthropic calls “ambient mode” — proactively surfacing information, following up on forgotten threads, and keeping teams updated across the organization.

    Claude Tag is multiplayer by design: one Claude identity per channel, accessible to everyone, with the ability to hand off half-finished tasks between team members. It runs on Claude Opus 4.8, which Anthropic released on May 28, 2026. Internally, Anthropic reports that Claude Tag is already approving and incorporating 65% of the code changes its product team submits. The existing Claude in Slack app will be retired on August 3, 2026. Claude Tag is available on Enterprise and Team plans.

    Notion: Claude as External Agent

    On May 13, 2026, Notion launched its Developer Platform version 3.5, which introduced the External Agents API. This API lets AI agents — including Claude — operate inside your Notion workspace as first-class participants. They can read pages, write to databases, create tasks, trigger automations, and be @-mentioned directly in documents. Claude operating through this API can chain actions together: read a project brief, check the task database for related work, draft a new document, and create a linked task entry — all in a single session, running on CMA infrastructure with full sandboxing.

    Asana: AI Teammates

    Asana built AI Teammates on CMA — agents that pick up assigned tasks inside projects, draft deliverables, and hand back outputs for human review. Specialist agents handle specific workflows: the Campaign Brief Writer turns scattered notes into structured briefs, the Workflow Optimizer identifies process gaps and builds automations, and the Compliance Specialist checks work against regulatory standards. Asana’s CTO said CMA let them ship these features “dramatically faster” than any prior approach to agent development.

    Atlassian: Claude Agent for Jira

    Atlassian released Claude Agent for Jira, built on CMA infrastructure, which lets teams assign work items directly to Claude from the Jira UI. The agent clones the repository, analyzes the codebase, implements changes on an independent branch, pushes the code, and opens a draft pull request — streaming real-time status updates back to the Jira work item throughout the process.

    Sentry: From Bug Detection to Merge-Ready PR

    Sentry’s existing AI debugging agent, Seer, already used Claude for root cause analysis. With CMA, Sentry extended the workflow from diagnosis to automated fixing — the agent takes Seer’s root cause output, generates a fix, opens a branch with the changes, and creates a pull request for developer review. Sentry processes over one million root cause analyses per year and provides near-immediate reviews on over 600,000 pull requests per month. The CMA integration was built by a single engineer in weeks, eliminating months of custom agent runtime development.

    Rakuten: Specialist Agents Across the Enterprise

    Rakuten deployed specialist agents across product, sales, marketing, and finance using CMA, with each agent deployed in approximately one week. Agents plug into Slack and Teams, letting employees assign tasks and receive deliverables including spreadsheets, slides, and applications. In the pilot, Rakuten reported a 97% drop in critical first-pass errors, with cost down more than 30% and latency reduced by 34%, without any loss in output quality.

    KPMG: Global Professional Services Alliance

    On May 19, 2026, KPMG and Anthropic announced a global alliance and launched “Digital Gateway Powered by Claude.” The partnership embeds Claude, Cowork, and CMA directly into KPMG’s client delivery platform, with an initial focus on tax and private equity clients. Building an AI agent for tax regulation workflows previously took weeks and required switching between multiple tools. With CMA integrated into Digital Gateway, KPMG says the same capability takes minutes. The alliance extends to KPMG’s 276,000-person global workforce.

    The Strategic Pattern: Agent Runtime as a Service

    Step back from the individual integrations and the strategic pattern becomes clear. Anthropic is not trying to own the interface. It is deliberately positioning CMA as the execution layer underneath interfaces that other companies own. Slack owns the messaging UI. Notion owns the workspace UI. Jira owns the project tracking UI. Anthropic owns the agent brain that powers all of them.

    This is a fundamentally different strategy from its two largest competitors.

    OpenAI chose vertical integration. When OpenAI launched Workspace Agents on April 22, 2026, it positioned ChatGPT itself as the central hub — a no-code successor to custom GPTs that connects to Slack, Salesforce, Google Drive, and Notion through plugins. Agents are created inside ChatGPT, accessed from ChatGPT, and managed through ChatGPT. OpenAI wants to own the surface area.

    Google chose platform depth. At Google Cloud Next on April 22, 2026, Google unveiled the Gemini Enterprise Agent Platform — a reimagined evolution of Vertex AI — alongside Workspace Intelligence, a semantic unifying layer that connects data across Docs, Slides, Gmail, and the broader Google Cloud ecosystem. Google’s agent platform supports 200+ models including Claude, and the Agent2Agent (A2A) protocol enables distributed peer-to-peer agent communication. Google is leveraging its data moat and distribution at the platform level.

    Anthropic chose tool-centric orchestration. Rather than owning the UI (OpenAI) or the platform (Google), Anthropic is embedding its agent runtime into every tool through composable APIs and the Model Context Protocol. The platform you use becomes irrelevant — whether it is Slack, Notion, Jira, Asana, or Sentry — because the agent brain running underneath is Claude on CMA.

    This is the agent-as-a-service model. And it may be the most defensible position of the three, because it does not require users to change their behavior or migrate to a new platform. The agent shows up where they already work.

    What the Numbers Say About Enterprise Agent Adoption

    The macro context supports Anthropic’s timing. Gartner predicts that 40% of enterprise applications will include embedded task-specific agents by the end of 2026, up from less than 5% in 2025. McKinsey’s April 2026 analysis found that agentic AI can enable automation of 60 to 80 percent of routine infrastructure work over time, translating to a 20 to 40 percent run-rate cost reduction in initial deployments.

    The gap between experimentation and production remains the defining challenge. Industry research compiled from major firms shows that nearly four in five enterprises have experimented with or deployed agents in some form, but fewer than one in nine are running them in production at a scale that generates measurable business value. For the agents that do reach production, the average return on investment is 171% — though 19% of deployments never reach payback at all.

    That production gap is exactly what CMA is designed to close. The infrastructure burden — sandboxing, session persistence, credential isolation, error recovery, observability — is the bottleneck. Engineering teams routinely dedicated significant senior engineering resources for months before a single agent reached production. CMA eliminates that layer entirely, which is why partners like Asana, Sentry, and Rakuten report shipping production agents in days or weeks rather than quarters.

    What This Means for Businesses Already Using These Tools

    If your organization uses Slack, Notion, Jira, or Asana (and many organizations use more than one of them), you are about to encounter Claude whether you planned to adopt it or not. This is not a technology decision your IT team is making. It is a feature that your existing vendors are shipping.

    The practical implications are significant. Claude Tag in Slack means your team channels will have an AI participant that remembers past conversations, can be handed tasks asynchronously, and may proactively surface information. Claude in Notion means your project documentation, databases, and task boards can be read, analyzed, and acted upon by an agent that chains actions together. Claude Agent for Jira means development tickets can be assigned to an AI that clones your repo, writes code, and opens pull requests.

    For agencies and service providers managing client work across multiple tools, the embedded agent layer changes the economics fundamentally. Work that previously required a human to context-switch between Slack, Notion, and a project management tool — reading a brief here, updating a task there, drafting a document somewhere else — can be handled by an agent that operates across all of them simultaneously. The coordination tax that consumes a substantial share of knowledge work time is the exact problem embedded agents are built to solve.

    The companies that benefit most will be the ones that have clean operational systems — structured task boards, documented processes, well-organized project databases — because agents can only act on information they can read. Messy Notion workspaces and disorganized Jira boards will limit what agents can accomplish. Operational hygiene just became a competitive advantage.

    What This Means for Solo Operators Already Running Agent Infrastructure

    There is a specific audience that should be paying very close attention to CMA: the solo operators and small agency owners who have already built their own agent stacks from scratch. If you are running scheduled Claude tasks on a GCP Compute Engine VM, connecting to WordPress via REST API proxies, piping work orders through Notion, monitoring Gmail for client replies, and publishing content through MCP-connected pipelines — you have already built a version of what CMA is productizing.

    The economics question is worth doing the math on. A lightweight GCP VM running 24/7 to host recurring agent tasks (news desk monitors, outreach reply checks, newsletter extraction, scheduled content audits) costs a fixed monthly rate whether the agents are actively working or sitting idle. CMA, at $0.08 per session-hour of active runtime, charges only while agents are executing. For tasks that run for a few minutes every few hours, the per-session billing model could be substantially cheaper than keeping a VM warm around the clock. A task that runs for ten minutes six times a day accrues one active hour, or roughly $0.08 per day (about $2.40 over a 30-day month) in CMA runtime charges plus standard token costs, versus the fixed cost of a VM instance that never sleeps.
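    The comparison is easy to parameterize. In this sketch the VM bill is an input rather than a quoted price (the 20.0 below is a placeholder), the 30-day month is an assumption, and token costs are excluded because an agent calling the Claude API pays them whether it runs on a VM or on CMA.

```python
RATE_PER_SESSION_HOUR = 0.08  # CMA beta runtime rate, USD
DAYS_PER_MONTH = 30           # assumption for a simple monthly comparison


def cma_runtime_per_month(minutes_per_run: float, runs_per_day: int) -> float:
    """Runtime charge only; token costs are excluded (they apply on a VM too)."""
    active_hours_per_day = minutes_per_run * runs_per_day / 60
    return active_hours_per_day * RATE_PER_SESSION_HOUR * DAYS_PER_MONTH


def break_even_active_hours(vm_monthly_cost: float) -> float:
    """Active hours per month at which CMA runtime charges equal a fixed VM bill."""
    return vm_monthly_cost / RATE_PER_SESSION_HOUR


# The example from the text: ten minutes, six times a day.
print(f"CMA runtime: ${cma_runtime_per_month(10, 6):.2f}/month")
# Plug in your own VM bill; 20.0 is a placeholder, not a quoted price.
print(f"Break-even: {break_even_active_hours(20.0):.0f} active hours/month")
```

    With a placeholder $20 monthly VM bill, CMA's runtime fee only catches up at 250 active hours a month, roughly a third of the 720 hours in a 30-day month; the ten-minutes-six-times-a-day workload uses 30.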

    But the migration path is not ready yet, and solo operators should understand exactly where the gaps are before making any infrastructure decisions.

    The biggest gap is MCP tunnels. CMA’s ability to connect agents to private MCP servers inside your network is still in research preview — not production-ready. If your agent stack depends on a private WordPress REST API proxy, a Notion workspace connected via MCP, or any internal tool that is not exposed to the public internet, CMA cannot reach it today. The Vaults system for credential management is promising, but it does not solve the network connectivity problem for self-hosted infrastructure.

    The second gap is orchestration control. Solo operators who have built their own agent infrastructure typically have precise control over scheduling, retry logic, error handling, and the exact sequence of tool calls. CMA’s Dreaming feature — which reviews past sessions to curate agent memory — is an interesting approach to agent learning, but it is not the same as having direct control over a cron job that fires at 6:00 AM, checks three data sources in a specific order, and writes results to a specific Notion database with a specific schema.

    The thesis for solo operators is straightforward: CMA is almost certainly the future migration path for self-hosted agent infrastructure. The economics favor it for intermittent workloads, the managed security and sandboxing eliminate operational risk you are currently carrying yourself, and the session persistence model solves problems that custom agent runtimes handle poorly. But the plumbing — particularly MCP tunnels to private infrastructure — is not production-ready. Track it closely. Do not migrate yet. When MCP tunnels graduate from research preview to general availability, revisit the math and the connectivity story. That is the trigger point.

    The Risk Nobody Is Talking About

    There is a tension in this model that deserves attention. When Claude operates as an invisible layer inside tools you already trust, the boundary between the tool’s native capabilities and the AI agent’s actions blurs. A Jira ticket that was “completed” might have been implemented by Claude, reviewed by a human for thirty seconds, and merged. A Notion project plan that looks thorough might have been generated by an agent that filled in the sections with plausible-sounding content.

    The embedded model works precisely because it reduces friction — but reduced friction also means reduced scrutiny. Organizations adopting embedded agents need to build review processes that match the speed at which agents can produce output. The 171% average ROI from agent deployments accounts for the value created, but it does not account for the subtle quality risks of production work generated by systems that are confident, fluent, and occasionally wrong.

    Anthropic has built guardrails into CMA — sandboxed execution, credential isolation, session logging — but the governance layer for reviewing agent output at enterprise scale is still largely unsolved. This is a space where internal operational discipline matters more than the technology itself.

    Where This Goes Next

    Claude Tag launched on Slack first, and Anthropic has indicated plans to roll it out beyond Slack. If the pattern holds, expect Claude Tag’s persistent team member model to appear in Microsoft Teams, Discord, and any other collaboration surface where teams coordinate work.

    The CMA primitives are designed to be composable, which means the partner integration list will grow rapidly. Any SaaS company with an API and a workflow that involves reading context, making decisions, and taking actions is a candidate for CMA integration. Customer support platforms, CRM systems, design tools, analytics dashboards, HR systems — the addressable surface is essentially every tool that knowledge workers touch.

    Gartner’s long-term projection estimates that agentic AI could drive approximately 30% of enterprise application software revenue by 2035, surpassing $450 billion. If Anthropic’s embedded strategy succeeds, a meaningful slice of that revenue flows through CMA as the underlying runtime — regardless of whose logo is on the interface.

    The chatbot era is ending. The embedded agent era is starting. And Anthropic is betting that the company that owns the invisible execution layer wins the market, even if no end user ever sees its name.

    Frequently Asked Questions

    What are Claude Managed Agents (CMA)?

    Claude Managed Agents is a set of composable APIs launched by Anthropic on April 8, 2026 in public beta. CMA lets developers build and deploy production AI agents on Anthropic’s cloud infrastructure, handling sandboxed code execution, session persistence, credential management, and end-to-end tracing. The architecture separates the “brain” (Claude reasoning) from the “hands” (code execution sandbox), enabling parallel processing and faster agent responses.

    How much do Claude Managed Agents cost?

    During the current public beta, CMA pricing is standard Claude API token rates plus $0.08 per session-hour of active runtime. Runtime is measured to the millisecond and only accrues while the agent is actively executing — idle time does not count. GA pricing has not been finalized and may differ from the beta rate.

    What is Claude Tag in Slack?

    Claude Tag is Anthropic’s persistent AI team member for Slack, launched June 23, 2026. Unlike a traditional chatbot, Claude Tag lives in channels, builds memory across conversations, takes initiative through ambient mode, and works asynchronously. It is multiplayer — one Claude identity per channel that all team members interact with. Claude Tag runs on Claude Opus 4.8 and is available on Enterprise and Team plans. It replaces the original Claude in Slack app, which retires August 3, 2026.

    Which tools have Claude Managed Agents embedded?

    As of June 2026, CMA is embedded in Slack (via Claude Tag), Notion (via the External Agents API), Asana (AI Teammates), Atlassian Jira (Claude Agent for Jira), and Sentry (extending the Seer debugging agent). Enterprise deployments include Rakuten (specialist agents across product, sales, marketing, and finance) and KPMG (Digital Gateway Powered by Claude for tax and private equity clients).

    How does Anthropic’s agent strategy differ from OpenAI and Google?

    Anthropic uses a tool-centric orchestration approach, embedding its agent runtime inside existing tools via composable APIs and the Model Context Protocol (MCP). OpenAI chose vertical integration with Workspace Agents, positioning ChatGPT as the central hub. Google chose platform depth with the Gemini Enterprise Agent Platform and Workspace Intelligence semantic layer. Anthropic’s approach does not require users to change platforms — the agent shows up where they already work.

    What percentage of enterprise apps will have embedded AI agents by end of 2026?

    Gartner predicts that 40% of enterprise applications will include embedded task-specific agents by the end of 2026, up from less than 5% in 2025. However, fewer than one in nine enterprises currently run agents in production at scale, suggesting significant growth ahead.

    Can Claude Managed Agents run inside a private network?

    Yes. CMA supports self-hosted sandboxes through partners including Cloudflare, Daytona, Modal, and Vercel, or custom VPC deployments. MCP tunnels, currently in research preview, allow agents to connect to private Model Context Protocol servers inside your network without public exposure. A Vaults system keeps credentials out of the sandbox using envelope encryption.



  • Claude Fable 5 Pricing and Access (2026)


    Last verified: June 13, 2026

    Claude Fable 5 (claude-fable-5) is Anthropic’s most capable widely released model, built for the most demanding reasoning and long-horizon agentic work. On the Claude API it is priced at $10 per million input tokens and $50 per million output tokens — double the rate of Claude Opus 4.8 — with a 1M-token context window and up to 128K output tokens per request. It reached general availability on June 9, 2026. The verified pricing and access details are below.

    Pricing at a glance

    All figures below are from Anthropic’s official pricing and models pages. Prices are in USD per million tokens (MTok). Fable 5 includes the full 1M-token context window at standard pricing — there is no long-context premium.

    Item                     | Claude Fable 5
    Model ID (API)           | claude-fable-5
    Base input               | $10 / MTok
    Output                   | $50 / MTok
    5-minute cache write     | $12.50 / MTok
    1-hour cache write       | $20 / MTok
    Cache hit / read         | $1 / MTok
    Batch API input / output | $5 / MTok · $25 / MTok
    Context window           | 1M tokens
    Max output               | 128K tokens
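    The cache and batch rows follow from the base rates through fixed multipliers: 1.25x the input price for 5-minute cache writes, 2x for 1-hour writes, 0.1x for cache reads, and 0.5x for batch input and output (the same multipliers described in the Batches API entry on this page). A short sketch that re-derives them, useful for sanity-checking a pricing sheet; the function name is illustrative.

```python
# Multipliers relative to a model's base prices (USD per million tokens).
CACHE_WRITE_5M = 1.25
CACHE_WRITE_1H = 2.0
CACHE_READ = 0.1
BATCH = 0.5


def derived_rates(base_input: float, base_output: float) -> dict[str, float]:
    """Derive cache and batch prices per MTok from a model's base rates."""
    return {
        "5-minute cache write": base_input * CACHE_WRITE_5M,
        "1-hour cache write": base_input * CACHE_WRITE_1H,
        "cache hit / read": base_input * CACHE_READ,
        "batch input": base_input * BATCH,
        "batch output": base_output * BATCH,
    }


for item, price in derived_rates(10.0, 50.0).items():
    print(f"{item:<22} ${price:.2f} / MTok")
```

    For Fable 5's $10 input and $50 output base rates, it prints $12.50, $20.00, $1.00, $5.00, and $25.00, matching the table.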

    How Fable 5 compares to Opus, Sonnet, and Haiku

    Fable 5 sits at the top of Anthropic’s lineup, a tier above the Opus models. The per-token cost difference is the clearest way to see where it fits.

    Model             | Input ($/MTok) | Output ($/MTok) | Context | Max output
    Claude Fable 5    | $10            | $50             | 1M      | 128K
    Claude Opus 4.8   | $5             | $25             | 1M      | 128K
    Claude Sonnet 4.6 | $3             | $15             | 1M      | 64K
    Claude Haiku 4.5  | $1             | $5              | 200K    | 64K
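    To turn the per-token gap into a per-request figure, the sketch below prices one hypothetical request of 20,000 input and 2,000 output tokens at the standard rates in the table. The request size is an arbitrary example, and the sketch assumes the same token count on every model, which will not hold exactly when models use different tokenizers.

```python
# Standard (non-batch) rates from the table above, USD per million tokens.
RATES = {
    "Claude Fable 5": (10.0, 50.0),
    "Claude Opus 4.8": (5.0, 25.0),
    "Claude Sonnet 4.6": (3.0, 15.0),
    "Claude Haiku 4.5": (1.0, 5.0),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at standard rates, ignoring caching and batching."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Example size only; assumes identical token counts on every model.
for model in RATES:
    print(f"{model:<18} ${request_cost(model, 20_000, 2_000):.3f}")
```

    At that size the request costs $0.300 on Fable 5, $0.150 on Opus 4.8, $0.090 on Sonnet 4.6, and $0.030 on Haiku 4.5.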

    Where you can use Fable 5

    At general availability, Fable 5 is offered across Anthropic’s first-party API and all major cloud platforms, plus claude.ai subscription plans (subject to the access note below). The model IDs differ by platform.

    Surface                                | Availability                                   | Model ID
    Claude API (first-party)               | Generally available                            | claude-fable-5
    Claude Platform on AWS                 | Generally available                            | claude-fable-5
    Amazon Bedrock                         | Generally available                            | anthropic.claude-fable-5
    Google Vertex AI                       | Generally available                            | claude-fable-5
    Microsoft Foundry                      | Generally available                            |
    claude.ai (Pro, Max, Team, Enterprise) | Promotional access June 9–22, 2026 (see below) |
    claude.ai (Free plan)                  | Not included                                   |

    Consumer-plan access and the promotional window

    For claude.ai subscribers, Anthropic launched Fable 5 with a time-limited promotion rather than a permanent plan inclusion. From June 9 through June 22, 2026, Fable 5 was included on the Pro, Max, Team, and seat-based Enterprise plans at no extra charge. During that window, Anthropic’s documentation states that Fable 5 usage “counts toward your plan’s usage limits, and you won’t be charged anything extra,” but that it draws from those limits “at a higher rate than other models.” The Free plan was explicitly excluded.

    Anthropic’s announced plan was that after June 22, 2026, Fable 5 would no longer be included in plan usage limits, and continued use on claude.ai would require usage credits — a pay-as-you-go balance for usage beyond what a plan includes.

    Integration notes that affect cost and handling

    Fable 5 differs from the Opus, Sonnet, and Haiku models in a few ways that matter when you wire it into an application. It ships with safety classifiers that can decline a request: when that happens, the Messages API returns stop_reason: "refusal" as a successful HTTP 200 response, not an error. You are not billed for a request that is refused before any output is generated, and Anthropic provides server-side, client-side, and manual fallback paths to retry on another Claude model. Adaptive thinking is always on (thinking: {"type": "disabled"} is not supported), and the raw chain of thought is never returned — thinking.display controls whether thinking blocks contain a summary or are empty. Fable 5 also uses the tokenizer introduced with Opus 4.7, which can produce roughly 30–35% more tokens for the same text than older models, so re-baseline your token counts rather than assuming parity with earlier Claude models.
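    A minimal sketch of a hand-rolled retry for the refusal case, separate from the fallback mechanisms Anthropic provides. Because a refusal arrives as a successful response, it is detected by inspecting `stop_reason`, not by catching an exception. The `send` callable and `FALLBACK_MODEL` are hypothetical placeholders: `send` is assumed to return a plain dict, so if it wraps `client.messages.create` from the Python SDK you would read `response.stop_reason` as an attribute instead.

```python
from typing import Any, Callable

PRIMARY_MODEL = "claude-fable-5"
FALLBACK_MODEL = "YOUR-FALLBACK-MODEL-ID"  # hypothetical placeholder

# send(model, request) -> response dict with at least a "stop_reason" key.
Sender = Callable[[str, dict[str, Any]], dict[str, Any]]


def create_with_fallback(send: Sender, request: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Try the primary model; retry once on another model if it refuses.

    A refusal comes back as an ordinary HTTP 200 response whose stop_reason
    is "refusal", so it must be detected by inspecting the response body.
    """
    response = send(PRIMARY_MODEL, request)
    if response.get("stop_reason") != "refusal":
        return PRIMARY_MODEL, response
    return FALLBACK_MODEL, send(FALLBACK_MODEL, request)


# Fake sender for illustration: the primary model refuses, the fallback answers.
def fake_send(model: str, request: dict[str, Any]) -> dict[str, Any]:
    if model == PRIMARY_MODEL:
        return {"stop_reason": "refusal", "content": []}
    return {"stop_reason": "end_turn", "content": [{"type": "text", "text": "ok"}]}


model_used, reply = create_with_fallback(fake_send, {"max_tokens": 256, "messages": []})
print(model_used, reply["stop_reason"])
```

    In the example the primary model refuses, so the function returns the fallback model's answer together with the model ID that produced it, which is worth logging for cost tracking.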

    How much does Claude Fable 5 cost?

    On the Claude API, Fable 5 costs $10 per million input tokens and $50 per million output tokens. Prompt-cache writes are $12.50/MTok (5-minute) or $20/MTok (1-hour), cache reads are $1/MTok, and the Batch API halves the rate to $5/MTok input and $25/MTok output.

    Is Fable 5 more expensive than Claude Opus 4.8?

    Yes. Fable 5 is priced at exactly double Opus 4.8 on both input ($10 vs $5 per MTok) and output ($50 vs $25 per MTok). Both share a 1M-token context window and 128K max output.

    Which claude.ai plans include Fable 5?

    From June 9 to June 22, 2026, Fable 5 was included on the Pro, Max, Team, and seat-based Enterprise plans at no extra cost, drawing from plan usage limits at a higher rate. The Free plan was not included. Anthropic’s plan was to move continued claude.ai use to usage credits after June 22.

    What is the difference between Fable 5 and Mythos 5?

    They share the same specs ($10/$50 per MTok, 1M context, 128K output) and June 9, 2026 launch date. Fable 5 is the generally available model with built-in safety classifiers that can decline requests; Mythos 5 is offered only in limited availability.


  • Claude Message Batches API: 50% Pricing, Limits and How It Works (2026)


    Last verified: June 13, 2026

    The Message Batches API lets you submit up to 100,000 Claude requests in a single call and receive results asynchronously — at exactly 50% of standard token prices. Most batches finish in under an hour. Results remain downloadable for 29 days. This page covers every verified limit, the per-tier rate limit tables, and how batch pricing stacks with prompt caching.

    Pricing: 50% off standard rates

    Every token processed through the Message Batches API is billed at half the standard input and output price. Output quality is the same as with synchronous requests; only the timing differs. The table below shows verified batch prices for active models.

    Model             | Batch input (per MTok) | Batch output (per MTok) | Standard input (per MTok) | Standard output (per MTok)
    Claude Fable 5    | $5.00                  | $25.00                  | $10.00                    | $50.00
    Claude Opus 4.8   | $2.50                  | $12.50                  | $5.00                     | $25.00
    Claude Opus 4.7   | $2.50                  | $12.50                  | $5.00                     | $25.00
    Claude Opus 4.6   | $2.50                  | $12.50                  | $5.00                     | $25.00
    Claude Opus 4.5   | $2.50                  | $12.50                  | $5.00                     | $25.00
    Claude Sonnet 4.6 | $1.50                  | $7.50                   | $3.00                     | $15.00
    Claude Sonnet 4.5 | $1.50                  | $7.50                   | $3.00                     | $15.00
    Claude Haiku 4.5  | $0.50                  | $2.50                   | $1.00                     | $5.00

    Source: platform.claude.com/docs/en/build-with-claude/batch-processing

    Key limits at a glance

    Limit                        | Value
    Maximum requests per batch   | 100,000
    Maximum batch payload size   | 256 MB
    Typical completion time      | Under 1 hour
    Hard expiration window       | 24 hours from creation
    Result retention period      | 29 days after creation
    Zero Data Retention eligible | No
    Results format               | JSONL, streamed via results_url
    Supported models             | All active Claude models
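    When a job is larger than one batch allows, it has to be split. This sketch packs requests into batches that respect both the request-count and payload-size limits in the table. It estimates size from each JSON-encoded request, which only approximates the real request body, and reads 256 MB as 256,000,000 bytes to err on the conservative side. The `custom_id` and `params` fields follow the Batches API request shape; the helper itself is illustrative.

```python
import json
from typing import Any, Iterator

MAX_REQUESTS = 100_000
MAX_BYTES = 256_000_000  # conservative reading of "256 MB"


def chunk_requests(
    requests: list[dict[str, Any]],
    max_requests: int = MAX_REQUESTS,
    max_bytes: int = MAX_BYTES,
) -> Iterator[list[dict[str, Any]]]:
    """Yield lists of requests that each stay within both batch limits."""
    batch: list[dict[str, Any]] = []
    batch_bytes = 2  # the enclosing "[]" of the JSON array
    for req in requests:
        size = len(json.dumps(req).encode("utf-8")) + 1  # +1 for a separating comma
        if size + 2 > max_bytes:
            raise ValueError(f"request {req.get('custom_id')!r} is too large for any batch")
        if batch and (len(batch) >= max_requests or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 2
        batch.append(req)
        batch_bytes += size
    if batch:
        yield batch


reqs = [
    {
        "custom_id": f"doc-{i}",
        "params": {
            "model": "claude-fable-5",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": f"Summarize document {i}"}],
        },
    }
    for i in range(10)
]
# A small max_requests just to make the split visible; the real cap is 100,000.
print([len(b) for b in chunk_requests(reqs, max_requests=4)])
```

    The demo lowers `max_requests` to 4 so the split is visible; with the defaults, the same ten requests fit in one batch.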

    A batch expires if processing has not completed within 24 hours. Any individual request within that batch that did not finish is marked expired — you are not billed for expired or errored requests. Batch results (the JSONL file) are accessible for download for 29 days after the batch was created; after that the batch object itself is still visible but results can no longer be downloaded.
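    Both clocks start at batch creation, so the deadlines can be computed as soon as a batch is created. A standard-library sketch; `created_at` stands in for the creation timestamp on the batch object, and the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

PROCESSING_WINDOW = timedelta(hours=24)  # unfinished requests expire after this
RESULTS_RETENTION = timedelta(days=29)   # results downloadable until this


def batch_deadlines(created_at: datetime) -> dict[str, datetime]:
    """When a batch stops processing, and when its results stop being downloadable."""
    return {
        "expires_at": created_at + PROCESSING_WINDOW,
        "results_available_until": created_at + RESULTS_RETENTION,
    }


created = datetime(2026, 6, 13, 9, 30, tzinfo=timezone.utc)
for name, when in batch_deadlines(created).items():
    print(f"{name}: {when.isoformat()}")
```

    For a batch created at 09:30 UTC on June 13, 2026, processing stops at 09:30 UTC on June 14 and results can be downloaded until 09:30 UTC on July 12.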

    Message Batches API rate limits by tier

    The Message Batches API has its own rate-limit pool, shared across all models, separate from the standard Messages API limits. The “processing queue” count refers to individual batch requests (not batches) that have been submitted but not yet completed by the model.

    Tier   | RPM (API calls) | Max batch requests in processing queue | Max batch requests per batch
    Tier 1 | 50              | 100,000                                | 100,000
    Tier 2 | 1,000           | 200,000                                | 100,000
    Tier 3 | 2,000           | 300,000                                | 100,000
    Tier 4 | 4,000           | 500,000                                | 100,000

    Source: platform.claude.com/docs/en/api/rate-limits

    RPM here limits how fast you can make HTTP requests to the Batches API endpoints (create, retrieve, list, cancel). It does not limit how many individual requests inside a batch are processed per minute — that is governed by the queue cap above. If high demand causes processing to slow, more individual requests within a batch may reach the 24-hour expiration limit.
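    Because the queue cap counts individual requests rather than batches, a simple pre-flight check compares what is already processing against your tier's cap before submitting more. The caps come from the table above; how you obtain the in-flight count (for example, by adding up the processing counts of your open batches) is an assumption left to the caller.

```python
# Max individual batch requests in the processing queue, per tier (table above).
QUEUE_CAP = {1: 100_000, 2: 200_000, 3: 300_000, 4: 500_000}
MAX_PER_BATCH = 100_000


def submittable_now(tier: int, in_processing: int, pending: int) -> int:
    """How many of `pending` requests can be enqueued without exceeding the tier's queue cap."""
    headroom = max(0, QUEUE_CAP[tier] - in_processing)
    return min(pending, headroom)


# Tier 2 with 150,000 requests already processing and 120,000 more waiting.
n = submittable_now(tier=2, in_processing=150_000, pending=120_000)
batches = -(-n // MAX_PER_BATCH)  # ceiling division
print(n, "now, in", batches, "batch(es)")
```

    In the example, a Tier 2 account with 150,000 requests already processing has room for 50,000 more, which fits in a single batch; the remaining 70,000 wait for the queue to drain.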

    Stacking batch pricing with prompt caching

    The Batches API documentation explicitly states that the 50% batch discount and prompt caching discounts stack. Cache writes incur a one-time cost at 1.25x the base input rate (5-minute TTL) or 2x (1-hour TTL); subsequent cache reads cost 0.1x the base input rate. Because batches process asynchronously and may take longer than 5 minutes, Anthropic recommends using the 1-hour cache duration for batch requests that share large context.

    The following example uses Claude Opus 4.8 (standard input: $5.00/MTok) to show what each token type costs in a batch with a 1-hour cached system prompt.

    Token type                  | Multiplier applied  | Effective price per MTok | How calculated
    Uncached input (standard)   | 1x                  | $5.00                    | Baseline
    Uncached input (batch)      | 0.5x                | $2.50                    | 50% batch discount
    Cache write, 1h TTL (batch) | 2x × 0.5x = 1x      | $5.00                    | 2x write cost, then 50% batch
    Cache read (batch)          | 0.1x × 0.5x = 0.05x | $0.25                    | 10% read cost, then 50% batch
    Output (batch)              | 0.5x of $25.00      | $12.50                   | 50% batch discount on output

    In practice: if you cache a 50,000-token system prompt once and then read it across 1,000 batch requests, the cache write costs $0.25 (50K tokens at $5.00/MTok effective), while 1,000 cache reads cost $12.50 total (50M tokens at $0.25/MTok). The same 50 million tokens without caching would cost $125 in batch input (50 MTok at the $2.50/MTok batch rate). Cache hit rates on batches vary; Anthropic’s documentation notes typical rates of 30% to 98% depending on traffic patterns, since batch requests are processed concurrently rather than sequentially.
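    The figures in that example can be reproduced from the multipliers in the table. This sketch covers only the shared 50,000-token prompt (per-request input and output tokens are billed on top at batch rates) and, like the example, assumes every one of the 1,000 requests hits the cache.

```python
BASE_INPUT = 5.00  # Claude Opus 4.8 standard input rate, USD per MTok
BATCH = 0.5        # batch discount
WRITE_1H = 2.0     # 1-hour cache write multiplier
READ = 0.1         # cache read multiplier


def per_mtok(*multipliers: float) -> float:
    """Effective USD per MTok after applying the multipliers to the base input rate."""
    rate = BASE_INPUT
    for m in multipliers:
        rate *= m
    return rate


prompt_mtok = 50_000 / 1_000_000  # 50K-token shared system prompt
requests = 1_000

write = prompt_mtok * per_mtok(WRITE_1H, BATCH)         # one cache write
reads = prompt_mtok * requests * per_mtok(READ, BATCH)  # assumes every request hits
uncached = prompt_mtok * requests * per_mtok(BATCH)     # no caching at all

print(f"write ${write:.2f} + reads ${reads:.2f} = ${write + reads:.2f}; uncached ${uncached:.2f}")
```

    It prints a $0.25 write plus $12.50 in reads, $12.75 in total, against $125.00 uncached. A 100% hit rate sits above the 30% to 98% range the documentation cites as typical, so treat $12.75 as a best case.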

    How results come back

    When the batch finishes (or the 24-hour limit is reached), a results_url property is set on the batch object. Results are in JSONL format — one JSON object per line, in any order (not necessarily matching submission order). Each result carries the custom_id you assigned, plus a result object of type succeeded, errored, canceled, or expired. Streaming the results file rather than downloading it all at once is recommended for large batches. You are not billed for errored, canceled, or expired requests.
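    Because the file is JSONL, it can be processed one line at a time. A minimal sketch, assuming you already have an iterable of lines (a local file or a streamed HTTP body): it groups results by type and keys them by custom_id, since order is not guaranteed. The official Python SDK also offers client.messages.batches.results(batch_id), which yields parsed entries; this hand-rolled version avoids the SDK dependency and the function name is illustrative.

```python
import json
from collections import defaultdict


def summarize_results(lines):
    """Group batch results by result type, keyed by custom_id.

    `lines` can be any iterable of JSONL lines (a file object, a streamed
    response), so a large results file never has to be loaded whole.
    """
    by_type = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)
        result = entry["result"]
        by_type[result["type"]][entry["custom_id"]] = result
    return dict(by_type)


sample = [
    '{"custom_id": "q-1", "result": {"type": "succeeded", "message": {"content": []}}}',
    '{"custom_id": "q-0", "result": {"type": "expired"}}',
]
summary = summarize_results(sample)
print(sorted(summary))  # ['expired', 'succeeded']
```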

    Does the Batches API count against my standard Messages API rate limits?

    No. The Message Batches API has its own rate-limit pool that is tracked separately from the standard Messages API RPM, ITPM, and OTPM limits. You can use both simultaneously up to their respective limits.

    What happens if my batch does not finish within 24 hours?

    Any individual requests within the batch that did not complete are marked expired. You are not billed for those requests. The batch itself moves to ended status and whatever results did complete are available at the results_url.

    Can I use extended thinking, tool use, or vision in a batch?

    Yes. The Batches API supports vision, tool use (including server tools such as web search and code execution), system messages, multi-turn conversations, and extended thinking. The parameters not supported are stream: true, fast mode (speed), Threads parameters, and max_tokens: 0.
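    If you assemble batch requests from code that also serves streaming or fast-mode traffic, a pre-flight check can catch the unsupported parameters before submission. A hypothetical sketch covering only stream: true, speed, and max_tokens: 0; Threads parameters are left out because their names are not given here.

```python
def unsupported_batch_params(params: dict) -> list:
    """Hypothetical pre-flight check for parameters a batch request cannot use."""
    problems = []
    if params.get("stream") is True:
        problems.append("stream: true is not supported in batches")
    if "speed" in params:  # fast mode
        problems.append("fast mode (speed) is not supported in batches")
    if params.get("max_tokens") == 0:
        problems.append("max_tokens: 0 is not supported in batches")
    return problems


ok = {"model": "claude-opus-4-8", "max_tokens": 1024, "messages": []}
print(unsupported_batch_params(ok))  # []
```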

    How long are batch results available for download?

    Results are available for 29 days after the batch was created. After that window, the batch object remains visible in the Console and via the API, but the results file can no longer be downloaded.
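    Since the window is counted from creation rather than from completion, the download deadline is simple date arithmetic. A sketch, assuming you have the batch’s created_at as an RFC 3339 string from the raw API response; if you use the Python SDK, created_at should already be a datetime and you can add the timedelta directly.

```python
from datetime import datetime, timedelta

RESULTS_AVAILABLE_FOR = timedelta(days=29)  # counted from batch creation


def results_download_deadline(created_at: str) -> datetime:
    """Return the last moment the batch results file can be downloaded."""
    # Replace a trailing "Z" so fromisoformat also works on Python < 3.11.
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return created + RESULTS_AVAILABLE_FOR


print(results_download_deadline("2026-06-01T12:00:00Z").isoformat())
# 2026-06-30T12:00:00+00:00
```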

    Is the Batches API eligible for Zero Data Retention?

    No. The Message Batches API is explicitly excluded from Zero Data Retention (ZDR). Data is retained under the feature’s standard retention policy regardless of your organization’s ZDR settings.

  • How Many Words Is a Million Claude Tokens? (2026) — and How the New Tokenizer Changed the Math

    Last verified: June 13, 2026

    A million Claude tokens equals roughly 750,000 words on Claude Sonnet 4.6 — but only about 555,000 words on Claude Opus 4.7, Claude Opus 4.8, and Claude Fable 5. The gap comes from a new tokenizer that Anthropic introduced with Opus 4.7: it emits up to 35% more tokens from the same text. The only reliable way to measure your actual token count is the /v1/messages/count_tokens endpoint.

    Token-to-word conversion by model (1 million tokens)

    Anthropic publishes word equivalents directly in the context-window tooltips on the official models overview page. The figures below come from those tooltips.

    Model | Tokenizer | Context window | ~Words per 1M tokens | ~Pages per 1M tokens*
    Claude Fable 5 (claude-fable-5) | New (Opus 4.7) | 1M tokens | ~555,000 | ~2,200
    Claude Opus 4.8 (claude-opus-4-8) | New (Opus 4.7) | 1M tokens | ~555,000 | ~2,200
    Claude Opus 4.7 (claude-opus-4-7) | New (Opus 4.7) | 1M tokens | ~555,000 | ~2,200
    Claude Sonnet 4.6 (claude-sonnet-4-6) | Older | 1M tokens | ~750,000 | ~3,000
    Claude Opus 4.6 (claude-opus-4-6) | Older | 1M tokens | ~750,000 | ~3,000
    Claude Haiku 4.5 (claude-haiku-4-5) | Older | 200K tokens | ~150,000 (full 200K window) | ~600 (full 200K window)

    * Pages estimated at ~250 words per double-spaced page. These are approximations for typical English prose; actual counts vary by content type.

    What the new tokenizer changed — and why it matters

    Anthropic introduced a new tokenizer with Claude Opus 4.7. The official migration guide states that the new tokenizer “may use roughly 1x to 1.35x as many tokens when processing text compared to previous models (up to ~35% more, varying by content).” The most commonly cited figure across Anthropic’s documentation is roughly 30% more tokens for the same text.

    The practical effect: a document that costs 1,000,000 tokens on Opus 4.6 or Sonnet 4.6 costs approximately 1,300,000 tokens on Opus 4.7, Opus 4.8, or Fable 5. Budgets built for the old tokenizer need to be re-baselined against the new one.
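    For planning before you can measure, the documented range translates into a simple band. A sketch: the 1.0x lower bound and 1.35x upper bound come from the migration guide’s “roughly 1x to 1.35x”, and 1.3x is the commonly cited typical figure. Treat the output as a budgeting range, not a count.

```python
# Ratios from Anthropic's migration guide ("roughly 1x to 1.35x") and the
# commonly cited ~30% increase.
LOW, TYPICAL, HIGH = 1.0, 1.30, 1.35


def rebaseline(old_tokens: int) -> tuple:
    """(low, typical, high) new-tokenizer estimates from an older-tokenizer count."""
    return tuple(round(old_tokens * factor) for factor in (LOW, TYPICAL, HIGH))


print(rebaseline(1_000_000))  # (1000000, 1300000, 1350000)
```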

    Tokenizer | Models | Approximate token increase vs. older tokenizer
    New (introduced with Opus 4.7) | Opus 4.7, Opus 4.8, Fable 5, Mythos 5 | ~30% typical; up to ~35% depending on content
    Older | Opus 4.6, Sonnet 4.6, Haiku 4.5, Opus 4.5, Sonnet 4.5 | Baseline

    The token counting page also notes the comparison directly: “Claude Fable 5 and Claude Mythos 5 use the tokenizer introduced with Claude Opus 4.7, which produces roughly 30% more tokens than models before Claude Opus 4.7 for the same text.”

    Use count_tokens — not tiktoken or ratio math

    Anthropic’s migration guide explicitly flags the risk: “Any code path that estimates tokens client-side or assumes a fixed token-to-character ratio should be re-tested against Claude Opus 4.7.” OpenAI’s tiktoken library uses a different vocabulary and produces different counts. It will not give accurate results for any Claude model.

    The correct approach is the /v1/messages/count_tokens endpoint, passing the specific model you intend to use:

    curl https://api.anthropic.com/v1/messages/count_tokens \
      --header "x-api-key: $ANTHROPIC_API_KEY" \
      --header "content-type: application/json" \
      --header "anthropic-version: 2023-06-01" \
      --data '{
        "model": "claude-opus-4-8",
        "messages": [{"role": "user", "content": "Your text here"}]
      }'

    The endpoint returns a model-specific count. If you are migrating a workload from Sonnet 4.6 to Opus 4.8, count the same prompt with both model IDs and compare the two input_tokens values. The token counting endpoint is free to use (rate limits apply by usage tier). Anthropic notes that the returned count is an estimate; the actual count at inference time may differ by a small amount.
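    The same two-model comparison can be scripted. A minimal sketch, assuming the official anthropic Python SDK’s client.messages.count_tokens method, which mirrors this endpoint and returns an object with input_tokens; the counter is injected as a plain function (a design choice for this example) so the comparison logic runs without network access.

```python
from typing import Callable

Counter = Callable[[str, list], int]


def compare_token_counts(count: Counter, messages: list,
                         old_model: str = "claude-sonnet-4-6",
                         new_model: str = "claude-opus-4-8"):
    """Count identical messages under two models; return (old, new, ratio)."""
    old = count(old_model, messages)
    new = count(new_model, messages)
    return old, new, new / old


# Usage with the official SDK (makes two API calls; not run here):
#   import anthropic
#   client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
#   def sdk_count(model, messages):
#       return client.messages.count_tokens(
#           model=model, messages=messages
#       ).input_tokens
#   print(compare_token_counts(sdk_count, [{"role": "user", "content": "Your text here"}]))
```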

    Quick reference: common document sizes

    Document type | Approx. words | Tokens (older tokenizer) | Tokens (new tokenizer)
    Novel (~400 pages) | ~100,000 | ~133,000 | ~173,000
    Long research paper | ~20,000 | ~27,000 | ~35,000
    Full context, Sonnet 4.6 (1M tokens) | ~750,000 | 1,000,000 | N/A (different model)
    Full context, Opus 4.8 (1M tokens) | ~555,000 | N/A (different model) | 1,000,000
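    The document rows follow from two ratios: roughly 0.75 words per token on the older tokenizer, and about 1.3x more tokens on the new one. A sketch that reproduces them, rounding to the nearest thousand; both constants are the approximations quoted in this article, not exact tokenizer properties.

```python
WORDS_PER_TOKEN_OLD = 0.75   # typical English prose, older tokenizer
NEW_TOKENIZER_FACTOR = 1.30  # commonly cited increase for the new tokenizer


def estimate_tokens(words: int) -> tuple:
    """(older, newer) tokenizer estimates for prose, rounded to the nearest 1,000."""
    old = words / WORDS_PER_TOKEN_OLD
    new = old * NEW_TOKENIZER_FACTOR
    return int(round(old, -3)), int(round(new, -3))


print(estimate_tokens(100_000))  # (133000, 173000)
print(estimate_tokens(20_000))   # (27000, 35000)
```

    Applying 1.3x to the full 1M-token window gives roughly 577,000 words, slightly above the ~555,000 in Anthropic’s tooltips; the tooltip figure appears to line up with the upper ~1.35x factor instead (750,000 / 1.35 ≈ 555,600).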

    These word estimates assume typical English prose at roughly 0.75 words per token on the older tokenizer. Code, structured data such as JSON, highly repetitive text, and non-Latin scripts tokenize differently and can fall well outside that ratio.

    Does the new tokenizer change what fits in the context window?

    Yes. The context window is still 1M tokens, but on the new tokenizer that window holds fewer words (~555k) than on the old one (~750k). A document that previously fit comfortably may now require trimming or chunking when you move to Opus 4.7, Opus 4.8, or Fable 5.

    Does Sonnet 4.6 use the new tokenizer?

    No. Claude Sonnet 4.6 uses the older tokenizer. Anthropic’s model overview page lists Sonnet 4.6’s 1M-token context window as equivalent to ~750k words, the same ratio as Opus 4.6 — confirming it has not adopted the Opus 4.7 tokenizer. Only Opus 4.7, Opus 4.8, Fable 5, and Mythos 5 use the new tokenizer.

    Can I use tiktoken or another open-source tokenizer for Claude?

    No. tiktoken is built for OpenAI models and uses a different vocabulary. It will not produce accurate token counts for any Claude model, and any correction factor you calibrated against older Claude models will not carry over to the new Opus 4.7 tokenizer. Use /v1/messages/count_tokens with the specific Claude model ID you plan to deploy.

    Does the new tokenizer affect pricing?

    Yes. Billing reflects token counts under the model’s tokenizer. If you migrate a workload from Opus 4.6 to Opus 4.8 and the new tokenizer produces 30% more tokens, your input token costs increase by roughly 30% before accounting for any per-token price difference between the models. Re-baseline cost estimates using the count_tokens endpoint rather than scaling from old measurements.

    How many pages is the full 1M-token context window?

    On models with the older tokenizer (Sonnet 4.6, Opus 4.6), 1 million tokens is approximately 3,000 double-spaced pages of typical English prose. On models with the new tokenizer (Opus 4.7, Opus 4.8, Fable 5), the same 1 million tokens holds approximately 2,200 pages. These are prose estimates; a 1M-token window filled with source code or dense structured data will span a very different page count.

  • Claude Cowork vs Code vs Agent SDK vs Managed Agents (2026)

    Last verified: June 13, 2026

    Anthropic ships four distinct ways to put Claude to work as an agent, and they are easy to confuse. The short version: Claude Cowork and Claude Code are interactive products billed through your Claude subscription — Cowork for knowledge work in the desktop app, Code for software work in your terminal, IDE, desktop, or browser. The Claude Agent SDK and Managed Agents are programmatic surfaces for developers, billed through the API: the Agent SDK is a Python/TypeScript library that runs the agent loop inside your own process, while Managed Agents is a REST API where Anthropic runs the loop and hosts the sandbox. The tables below give the verified, side-by-side breakdown.

    The decision matrix

    Each row is one surface. Read across for who it serves, whether you drive it turn-by-turn or hand it a goal, where the work executes, and how it is paid for.

    Surface | Who it is for | Interactive vs autonomous | Where it runs | How it is billed
    Claude Cowork | Knowledge workers (non-developers): research, documents, file and spreadsheet work | Interactive, supervised; shows you the plan and waits for your approval before acting | The Claude desktop app on your own computer (macOS or Windows); not available on web or mobile | Claude subscription (Pro, Max, Team, Enterprise); draws from your plan’s usage allocation
    Claude Code | Developers doing interactive coding: build features, fix bugs, automate dev tasks | Interactive; you drive it in a session, though it can run agentically across files and tools | Your machine (terminal, VS Code, JetBrains, desktop app) or the browser at claude.ai/code | Claude subscription or an Anthropic Console (API) account
    Claude Agent SDK | Developers building custom agents programmatically (Python or TypeScript) | Autonomous; Claude reads files, runs commands, and edits code on its own via the agent loop | Your own process and infrastructure | API key (pay-as-you-go credits); see the subscription note below for the June 15, 2026 change
    Managed Agents | Developers running production or long-running agents without operating their own sandbox/session infrastructure | Autonomous; you send events, Claude executes tools and streams back results | Anthropic-managed cloud sandbox per session (or a self-hosted sandbox on your own infrastructure) | Claude API key + the managed-agents-2026-04-01 beta header (no subscription path)

    Where billing actually differs

    The cleanest way to split these four is by the wallet they draw from. The two interactive products are funded by a subscription; the two programmatic surfaces are funded by the API. This is the single distinction that trips people up most often, so it is worth stating plainly in its own table.

    Surface | Billing model | Notes
    Claude Cowork | Subscription | Included on Pro, Max, Team, and Enterprise. Multi-step tasks consume more of your usage allocation than chatting.
    Claude Code | Subscription or API | Most surfaces require a Claude subscription or a Console account; the terminal CLI and VS Code also support third-party providers.
    Claude Agent SDK | API (pay-as-you-go) | Authenticated with an ANTHROPIC_API_KEY; also supports Bedrock, Claude Platform on AWS, Vertex AI, and Azure. Anthropic does not permit claude.ai login for third-party agents built on the SDK.
    Managed Agents | API (credits) | Requires a Claude API key and the beta header; enabled by default for API accounts.

    One dated nuance is worth pinning down because it changes how subscription users pay for programmatic work. Starting June 15, 2026, Claude Agent SDK and claude -p usage on subscription plans no longer counts toward your Claude plan’s interactive usage limits; instead, eligible subscribers receive a separate monthly Agent SDK credit (per-user, not pooled), while subscription usage limits stay reserved for interactive use of Claude Code, Cowork, and Claude. If you use the Agent SDK with an API key from the Claude Platform, nothing changes — pay-as-you-go billing continues and you do not receive an Agent SDK monthly credit.

    SDK vs Managed Agents: the programmatic split

    Both programmatic surfaces let Claude run tools autonomously, but they differ in where the loop and the work live. Anthropic’s own comparison frames it this way: the Agent SDK “is a library that runs the agent loop inside your own process,” while Managed Agents “is a hosted REST API: Anthropic runs the agent and the sandbox, and your application sends events and streams back results.” Pick by who you want operating the infrastructure.

    Dimension | Agent SDK | Managed Agents
    Runs in | Your process, your infrastructure | Anthropic-managed infrastructure
    Interface | Python or TypeScript library | REST API
    Agent works on | Files on your infrastructure | A managed sandbox per session
    Session state | JSONL on your filesystem | Anthropic-hosted event log
    Best for | Local prototyping; agents that work directly on your filesystem and services | Production agents without operating sandbox/session infrastructure; long-running, asynchronous sessions

    A common path, per Anthropic’s docs, is to prototype with the Agent SDK locally, then move to Managed Agents for production.

    Quick chooser

    If you are not writing code and want Claude to finish a task on your computer, use Cowork. If you are a developer working interactively on a codebase, use Claude Code. If you are building your own agent and want it to run in your own process, use the Agent SDK. If you want Anthropic to run the agent and host the sandbox for long-running or production work, use Managed Agents.

    Is Claude Cowork the same as Claude Code?

    No. Both appear in the Claude desktop app, but Cowork is aimed at knowledge work (research, documents, spreadsheets, file management) for non-developers, while Claude Code is an agentic coding tool. Cowork runs only in the desktop app (macOS or Windows); Claude Code also runs in the terminal, VS Code, JetBrains, and the browser.

    Does a Claude subscription cover the Agent SDK or Managed Agents?

    Cowork and Claude Code are included with Claude subscriptions (Pro, Max, Team, Enterprise). The Agent SDK and Managed Agents are API surfaces authenticated with a Claude API key. As of June 15, 2026, subscription users do get a separate monthly Agent SDK credit for SDK and claude -p usage, but Managed Agents has no subscription path — it requires an API key and a beta header.

    Where does the work actually execute for each surface?

    Cowork runs on your own computer in the desktop app. Claude Code runs on your machine (or in the browser). The Agent SDK runs in your own process and infrastructure. Managed Agents executes in an Anthropic-managed cloud sandbox per session, or a self-hosted sandbox you control.

    Is the Agent SDK built on Claude Code?

    Yes. Per Anthropic, the Agent SDK “gives you the same tools, agent loop, and context management that power Claude Code, programmable in Python and TypeScript.” Anthropic also describes it as “Claude Code as a library.”

    Is Managed Agents generally available?

    No. As of June 13, 2026, Claude Managed Agents is in beta. Every Managed Agents endpoint requires the managed-agents-2026-04-01 beta header (the SDK sets it automatically), and access is enabled by default for API accounts.
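    For callers not using the SDK, the beta header has to be added by hand. A sketch, assuming the standard anthropic-beta header that Anthropic uses for beta features, alongside the x-api-key and anthropic-version headers used by other Claude API calls; endpoint paths are deliberately omitted, and the helper name is illustrative.

```python
import os

MANAGED_AGENTS_BETA = "managed-agents-2026-04-01"


def managed_agents_headers(api_key=None):
    """Request headers for a hand-rolled HTTP call to a Managed Agents endpoint."""
    return {
        "x-api-key": api_key or os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "anthropic-beta": MANAGED_AGENTS_BETA,  # assumed standard beta header name
        "content-type": "application/json",
    }
```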


  • Claude Enterprise Compliance: BAA, SOC 2, GDPR and Data Policy (2026)

    Last verified: June 13, 2026

    Anthropic publishes a defined compliance posture for Claude: it holds SOC 2 Type I and Type II, ISO 27001:2022, and ISO/IEC 42001:2023 credentials; it will sign a Business Associate Agreement (BAA) covering HIPAA-ready services such as the first-party API and Enterprise plans; by default it does not train models on data sent under its commercial terms; and it offers a zero-data-retention (ZDR) arrangement on the Messages and Token Counting APIs. The hard part for buyers is the per-surface boundary — what the BAA covers, which features are blocked under ZDR or HIPAA, how long data is kept, and where it can be processed. Every figure below is drawn from Anthropic’s own trust, privacy, and developer documentation, with sources at the bottom. Eligibility, feature lists, and durations change; treat your signed contract and the live Trust Center as the controlling sources.

    Certifications and attestations

    Anthropic’s help center lists the following compliance credentials for its commercial products (Claude for Work and the Anthropic API). It directs customers to the Trust Portal at trust.anthropic.com to request copies of the underlying reports and certificates.

    Credential | Status as described by Anthropic | Scope
    SOC 2 Type I & Type II | Listed as held | Commercial products (Claude for Work, Anthropic API)
    ISO 27001:2022 | Certified | Information Security Management
    ISO/IEC 42001:2023 | Certified (issued by Schellman Compliance, LLC, accredited by the ANSI National Accreditation Board) | AI Management Systems
    HIPAA | “HIPAA-ready configuration (BAA available)” | See BAA section

    Anthropic describes itself as “one of the first frontier AI labs” to achieve ISO/IEC 42001:2023 certification, in an announcement dated January 13, 2025. The help-center certifications list does not mention ISO 27017, ISO 27018, FedRAMP, or CSA STAR; those are left out here rather than asserted. GDPR and CCPA are handled through Anthropic’s privacy program and customer agreements rather than as line-item “certifications” (see GDPR section).

    HIPAA and the BAA: covered by product surface

    Anthropic states it “provides a Business Associate Agreement (BAA) covering our HIPAA-ready services, such as use of our first-party API or Enterprise plans.” HIPAA readiness is enforced at the organization level: Anthropic provisions a dedicated HIPAA-enabled organization that automatically blocks non-eligible features. To process protected health information (PHI) on the API, an administrator must sign the BAA and contact sales to enable it; for Enterprise, an admin activates HIPAA compliance in the Claude Enterprise admin settings under “Data & Privacy” and signs the BAA there.

    Surface | BAA / HIPAA-ready coverage
    First-party Claude API (Messages API) | Covered as an Eligible Service (admin signs BAA, then contact sales)
    Claude Enterprise | Covered once an admin activates HIPAA compliance and signs the BAA
    Workbench and Console | Not covered
    Claude Free, Pro, Max, Team | Not covered
    Cowork | Not covered
    Claude Code | Not covered under HIPAA readiness
    Amazon Bedrock / Vertex AI | Not covered (cloud provider is the data processor; see those platforms)
    Claude Platform on AWS / Microsoft Foundry | HIPAA readiness not available
    Beta features (e.g., Claude in Office, Claude Design) | Generally not covered unless explicitly listed as eligible

    Within the API, only a subset of features is HIPAA-eligible. Anthropic enforces this in code: a HIPAA-enabled organization that sends a non-eligible feature gets a 400 invalid_request_error naming the blocked feature. Anthropic states your signed BAA is the official source of truth for what is covered.

    API feature | HIPAA-eligible
    Messages API (/v1/messages) | Yes
    Token counting | Yes
    Web search | Yes (dynamic filtering not eligible)
    Prompt caching, structured outputs, extended/adaptive thinking, citations, 1M context, PDF (inline), data residency, effort, fast mode, bash & text-editor tools, memory tool | Yes
    Web fetch, computer use, advisor tool, context management (compaction / editing), tool search, cache diagnostics | No
    Code execution, programmatic tool calling | No
    Batch API, Files API, Agent Skills, MCP connector, Claude Managed Agents, MCP tunnels | No
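    Because a blocked feature only fails with a 400 at request time, a team may prefer to check a planned feature list up front. A hypothetical sketch: the snake_case labels are invented for this example to mirror the “No” rows of the table, they are not API field names, and the set should be maintained from your signed BAA rather than treated as authoritative.

```python
# Illustrative labels for the "No" rows of the table above (not API field names).
# Your signed BAA, not this set, is the source of truth.
HIPAA_BLOCKED = {
    "web_fetch", "computer_use", "advisor_tool", "context_management",
    "tool_search", "cache_diagnostics", "code_execution",
    "programmatic_tool_calling", "batch_api", "files_api", "agent_skills",
    "mcp_connector", "managed_agents", "mcp_tunnels",
}


def blocked_features(planned):
    """Return, sorted, the planned features a HIPAA-enabled org would reject."""
    return sorted(set(planned) & HIPAA_BLOCKED)


print(blocked_features(["messages", "web_search", "batch_api", "files_api"]))
# ['batch_api', 'files_api']
```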

    PHI must appear only in message content, attached files, or related file names/metadata — never in JSON schema definitions (property names, enum/const values, or pattern regexes), because compiled schemas are cached separately and do not receive the same PHI protections. Anthropic notes workspace names, user contact details, billing data, and support tickets are not expected to contain PHI under the BAA.
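    One way to audit a structured-outputs schema before deployment is to pull out exactly the strings that rule covers. A sketch, assuming a plain JSON Schema dict: it collects property names, string enum and const values, and pattern regexes, recursing through nested objects and arrays, so a reviewer can confirm none of them contain PHI. The function name is illustrative.

```python
def schema_strings(schema):
    """Collect property names, enum/const strings, and patterns from a JSON schema."""
    found = []

    def walk(node):
        if isinstance(node, dict):
            # Property names are keys of "properties"; recurse into each subschema.
            for name, sub in node.get("properties", {}).items():
                found.append(name)
                walk(sub)
            found.extend(v for v in node.get("enum", []) if isinstance(v, str))
            if isinstance(node.get("const"), str):
                found.append(node["const"])
            if isinstance(node.get("pattern"), str):
                found.append(node["pattern"])
            # Recurse into everything else (items, $defs, anyOf, ...).
            for key, sub in node.items():
                if key != "properties":
                    walk(sub)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(schema)
    return found


schema = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["active", "discharged"]},
        "mrn": {"type": "string", "pattern": "^[0-9]{8}$"},
    },
}
print(schema_strings(schema))
# ['status', 'active', 'discharged', 'mrn', '^[0-9]{8}$']
```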

    Data retention (commercial default)

    Under Anthropic’s commercial data retention policy, conversation content is not retained by default for the API, and API inputs and outputs are automatically deleted on the backend within 30 days of receipt or generation. For interface products such as Claude for Work, data persists until you delete it, after which it is removed from backend storage within 30 days. Two exceptions, Usage Policy flags and feedback you submit, extend retention regardless of arrangement.

    Data type / event | Retention
    API inputs and outputs (default) | Auto-deleted within 30 days
    Deleted conversation content (Claude for Work) | Removed from backend within 30 days
    Inputs/outputs for a chat flagged as a Usage Policy violation | Up to 2 years
    Trust & safety classification scores (flagged chat) | Up to 7 years
    Data tied to feedback you submit (thumbs up/down, bug report) | 5 years

    Zero data retention (ZDR)

    With a ZDR arrangement, customer data is not stored at rest after the API response is returned, except where needed to comply with law or combat misuse. ZDR is requested through Anthropic sales and enabled per organization — it does not carry over automatically to new organizations under the same account. Even under ZDR, Anthropic retains User Safety classifier results, and may retain inputs and outputs for up to 2 years if a chat or session is flagged for a Usage Policy violation. CORS is not supported for ZDR organizations, so browser apps must call through a backend proxy.

    Surface | ZDR coverage
    Claude Messages API & Token Counting API | Eligible
    Claude Code (Commercial org API keys, or via Claude Enterprise with ZDR enabled) | Eligible
    Console and Workbench | Not eligible
    Claude Teams & Claude Enterprise interfaces | Not eligible (except Claude Code via Enterprise with ZDR on)
    Claude Free, Pro, Max | Not eligible
    Claude Managed Agents | Not eligible (stateful; delete transcripts manually)
    Batch API, Files API, code execution, Agent Skills, MCP connector | Not eligible
    Third-party integrations | Not eligible

    In Anthropic’s ZDR documentation, a handful of features are marked “Yes (qualified)”, namely structured outputs and cache diagnostics. For these, Anthropic retains a narrow, documented set of technical data (for example, a cached JSON schema for up to 24 hours after last use) rather than your prompts or Claude’s outputs.

    Model-training policy and Covered Models

    Anthropic’s Privacy Policy states it does not apply to content processed on behalf of business customers; that data is governed by the customer agreement. For the API specifically, Anthropic states retained data is never used for model training without your express permission. Anthropic’s consumer-terms update confirms the data-use changes “do not apply to services under our Commercial Terms,” including Claude for Work, Claude for Government, Claude for Education, and API use (including via Amazon Bedrock and Google Cloud’s Vertex AI). Training on commercial data happens only if a customer explicitly opts in (for example, the Development Partner Program).

    One model-specific exception affects retention, not training: Claude Fable 5 and Claude Mythos 5 are designated Covered Models and require 30-day data retention. ZDR is not available for these two models; a request to either from an organization whose retention configuration doesn’t meet the requirement returns a 400 invalid_request_error. Organizations with ZDR can turn on 30-day retention for a single workspace (Console > Settings > Workspaces > Privacy controls) to use those models there while keeping ZDR elsewhere. On Bedrock, Vertex AI, and Microsoft Foundry, retention requirements for these models are set by each platform.

    GDPR, data residency, and international transfers

    For users in the EEA, UK, or Switzerland, the data controller is Anthropic Ireland, Limited; elsewhere it is Anthropic PBC. Where the EU or UK GDPR applies, Anthropic responds to verifiable data-subject requests within one calendar month. For transfers to countries without an adequacy decision, Anthropic relies on standard contractual clauses, and publishes its subprocessors at anthropic.com/subprocessors.

    On data residency, the Claude API exposes two independent controls. inference_geo sets where inference runs per request — values are "global" (default) or "us" — and is supported on Claude Opus 4.6, Sonnet 4.6, and later (older models return a 400). Workspace geo controls where data is stored at rest and where endpoint processing happens; it is set at workspace creation and cannot be changed afterward. Per Anthropic’s documentation, "us" is currently the only available workspace geo, and only "us" and "global" inference geos are available — so there is currently no EU-resident storage option at the workspace level. US-only inference is priced at 1.1x the standard rate on supported models. Data residency is available on the Claude API (first-party) and Claude Platform on AWS; on Bedrock and Vertex AI the region is set by the endpoint or inference profile.
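    A sketch of the per-request control and its price effect, assuming inference_geo is sent as a top-level field of the Messages API request body (as its per-request description suggests) and that the 1.1x multiplier applies to the standard price of a supported model. Workspace geo is not shown because it is fixed at workspace creation. The helper names and default model are illustrative.

```python
US_INFERENCE_MULTIPLIER = 1.1  # US-only inference price vs. standard, per the text above


def build_request(prompt, model="claude-opus-4-8", us_only=False):
    """Messages API request body; pins inference to the US when us_only is True."""
    body = {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    if us_only:
        body["inference_geo"] = "us"  # omitted -> "global" (the default)
    return body


def us_inference_cost(standard_cost_usd):
    """Estimated cost of the same usage with US-only inference."""
    return standard_cost_usd * US_INFERENCE_MULTIPLIER


print(build_request("Summarize this file.", us_only=True)["inference_geo"])  # us
```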

    Does Anthropic train its models on my API or commercial data?

    No, not by default. Anthropic’s Privacy Policy excludes business-customer content (governed by your customer agreement), and for the API it states retained data is never used for training without your express permission. The consumer data-use changes explicitly do not apply to Commercial Terms services. Training on commercial data requires an explicit opt-in.

    Will Anthropic sign a BAA, and for what?

    Yes. Anthropic signs a BAA covering HIPAA-ready services such as the first-party API and Enterprise plans. The Messages API is covered as an Eligible Service. It does not cover Workbench/Console, Free/Pro/Max/Team, Cowork, Claude Code, or beta features unless explicitly listed. An admin must sign the BAA and enable HIPAA readiness; the organization then auto-blocks non-eligible features.

    What’s the difference between ZDR and HIPAA readiness?

    Per Anthropic, ZDR prevents customer data from being stored at rest after the API response. HIPAA readiness is a broader set of safeguards (encryption, access controls, audit logging) that protects PHI throughout its lifecycle and allows data to be retained with those safeguards rather than deleted immediately. Anthropic states you do not also need ZDR if you have HIPAA readiness.

    How long does Anthropic keep my data?

    By default, API inputs and outputs are auto-deleted within 30 days. If a chat is flagged as a Usage Policy violation, inputs/outputs may be retained up to 2 years and trust & safety classification scores up to 7 years. Data tied to feedback you submit is kept 5 years. ZDR removes the default at-rest storage but does not remove the law/misuse exceptions.

    Can I keep Claude inference and data in the EU?

    Not currently. The API’s inference_geo can pin inference to "us" or leave it "global", and Anthropic’s documentation lists "us" as the only available workspace geo (storage region), so neither inference nor storage at rest can currently be confined to the EU. EU/UK data-subject rights and standard contractual clauses apply regardless, but no EU residency option is offered at the workspace level per the docs verified here.


  • Claude Skills vs MCP vs Connectors vs Plugins: What Each One Is (2026)

    Last verified: June 13, 2026

    The simplest way to keep these straight: Skills teach Claude how to do a task, MCP servers and Connectors give Claude access to external systems, Plugins bundle several of these together, and Hooks and slash commands control a Claude Code session. A Skill is a folder of instructions Claude reads when relevant. MCP (the Model Context Protocol) is an open standard that connects Claude to your tools and data. A Connector is Anthropic’s packaging of a remote MCP server inside the Claude apps. A Plugin packages any combination of commands, agents, MCP servers, hooks, and skills for Claude Code. Every definition below is taken verbatim from Anthropic’s official documentation, fetched on the verification date.

    The one-glance comparison

    This is the liftable summary. Each row is one mechanism; the third column is the distinction people most often get wrong — whether the thing teaches Claude how to do something or gives Claude access to something.

    Type | What it is | Teaches HOW or gives ACCESS | Where it runs | How you install / enable it
    Skill | A modular capability that packages instructions, metadata, and optional resources (scripts, templates) in a SKILL.md file that Claude uses automatically when relevant. | Teaches HOW. Provides domain-specific expertise: workflows, context, and best practices (procedural knowledge). | In Claude’s code execution environment / VM, where Claude has filesystem access, bash, and code execution. | claude.ai: upload a zip under Settings > Features. API: upload via the Skills API (/v1/skills) with the required beta headers (code-execution-2025-08-25, skills-2025-10-02, files-api-2025-04-14). Claude Code: a SKILL.md directory under ~/.claude/skills/ or .claude/skills/.
    MCP server | An implementation of the Model Context Protocol, “an open-source standard for connecting AI applications to external systems.” Described as “a USB-C port for AI applications.” | Gives ACCESS. Connects Claude to data sources, tools, and workflows so it can access information and perform tasks. | Local (stdio) servers run as processes on your machine; remote servers run over HTTP (recommended) or SSE (deprecated). | In Claude Code: claude mcp add, at local, project, or user scope. Also configurable in .mcp.json or imported from Claude Desktop / claude.ai.
    Connector | A feature that “let[s] Claude access your apps and services, retrieve your data, and take actions within connected services.” Custom connectors use remote MCP. | Gives ACCESS. Same access role as MCP: a Connector is the in-app packaging of a remote MCP server. | Custom connectors are reached from Anthropic’s cloud infrastructure, not from your local machine. | In the Claude apps under Customize > Connectors (or the in-chat “+” menu). Add a directory connector, or “Add custom connector” by URL.
    Plugin | “A lightweight way to package and share any combination of” Claude Code customizations. | Both; it’s a container. Bundles things that teach HOW (commands, skills) and things that give ACCESS (MCP servers), plus hooks and subagents. | In Claude Code: “they’ll work across your terminal and VS Code.” | The /plugin command (public beta). For a marketplace: /plugin marketplace add user-or-org/repo-name, then install from the /plugin menu.
    Hook | “User-defined shell commands, HTTP endpoints, or LLM prompts that execute automatically at specific points in Claude Code’s lifecycle.” | Controls behavior. Provides deterministic control rather than relying on the LLM to decide. | In Claude Code, firing at lifecycle events (e.g. PreToolUse, PostToolUse, UserPromptSubmit, SessionStart, Stop). | Configured in JSON settings files such as ~/.claude/settings.json or .claude/settings.json, or bundled in a plugin.
    Slash command | A command starting with / that controls a Claude Code session. Includes built-ins (e.g. /help, /compact) and custom commands. | Teaches HOW (custom) / controls session (built-in). Custom commands have been merged into Skills. | In the Claude Code session (terminal or VS Code). | Built-ins ship with Claude Code. Custom: a Markdown file under .claude/commands/ (project) or ~/.claude/commands/ (personal); a .claude/skills/<name>/SKILL.md does the same.

    Skills: teaching Claude a procedure

    An Agent Skill is “a directory containing a SKILL.md file” with YAML frontmatter plus instructions, and optionally additional markdown files, executable scripts, and reference resources. The point is procedural knowledge — it turns “general-purpose agents into specialists” by giving Claude the workflows and best practices for a task, the way you’d write an onboarding guide for a new teammate. Anthropic ships pre-built Skills for PowerPoint, Excel, Word, and PDF, and you can author your own.

    What makes Skills cheap to install in bulk is progressive disclosure: Claude loads information in stages instead of all at once. The numbers below come straight from Anthropic’s Skills overview.

    Loading level When loaded Token cost (per Anthropic docs) Content
    Level 1: Metadata Always, at startup ~100 tokens per Skill name and description from the YAML frontmatter
    Level 2: Instructions When the Skill is triggered Under 5k tokens The SKILL.md body — workflows and guidance
    Level 3+: Resources As needed Effectively unlimited Bundled files read or executed via bash without loading their contents into context
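
    As a rough illustration of why metadata-only loading keeps bulk installs cheap, here is a back-of-the-envelope model. It is a sketch that assumes the approximate per-level figures from the table (about 100 tokens of metadata per installed Skill, plus a body cost only for Skills that actually trigger); `estimated_context_tokens` is a hypothetical helper, not an Anthropic API.

```python
METADATA_TOKENS_PER_SKILL = 100  # Level 1: approximate, always loaded at startup


def estimated_context_tokens(installed_skills: int, triggered_body_tokens: list) -> int:
    """Approximate tokens that Skills add to one conversation's context.

    installed_skills: Skills whose name/description metadata is preloaded.
    triggered_body_tokens: size of each SKILL.md body that actually loaded
    (Anthropic's guidance keeps each body under ~5k tokens).
    Level 3 resources are not counted: they stay on the filesystem until
    Claude reads a file or runs a script, and only what it reads (or the
    script's output) enters context.
    """
    return installed_skills * METADATA_TOKENS_PER_SKILL + sum(triggered_body_tokens)


print(estimated_context_tokens(20, []))            # 2000: metadata only
print(estimated_context_tokens(20, [3000, 4000]))  # 9000: two Skills triggered
```

    The point of the model is the asymmetry: installing more Skills grows only the first term, while the expensive second term depends on how many actually fire.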

    The name field is capped at 64 characters (lowercase letters, numbers, hyphens; it cannot contain the reserved words “anthropic” or “claude”), and the description is capped at 1,024 characters. One important constraint: custom Skills do not sync across surfaces — a Skill uploaded to claude.ai is not automatically available via the API, and Claude Code Skills are filesystem-based and separate from both.
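
    A pre-upload check for these limits can be sketched as follows. It assumes a flat `key: value` frontmatter block between `---` lines (a production tool would use a real YAML parser) and treats a non-empty description as required; `validate_skill_md` and the sample Skill are hypothetical.

```python
import re

NAME_PATTERN = re.compile(r"[a-z0-9-]{1,64}")  # lowercase letters, digits, hyphens; max 64
RESERVED_WORDS = ("anthropic", "claude")
MAX_DESCRIPTION = 1024


def parse_frontmatter(text: str) -> dict:
    """Parse a flat `key: value` frontmatter block (simplified: no nesting, no quoting)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter has no closing '---' line")


def validate_skill_md(text: str) -> list:
    """Return a list of problems; an empty list means the documented limits are met."""
    fields = parse_frontmatter(text)
    name = fields.get("name", "")
    description = fields.get("description", "")
    problems = []
    if not NAME_PATTERN.fullmatch(name):
        problems.append("name must be 1-64 lowercase letters, digits, or hyphens")
    for word in RESERVED_WORDS:
        if word in name:
            problems.append(f"name must not contain the reserved word '{word}'")
    if not description:
        problems.append("description is required")
    elif len(description) > MAX_DESCRIPTION:
        problems.append(f"description exceeds {MAX_DESCRIPTION} characters")
    return problems


SAMPLE = """---
name: quarterly-report
description: Builds the quarterly sales report. Use when asked for quarter-end summaries.
---
# Quarterly report
1. Pull the figures from data/sales.csv.
"""

print(validate_skill_md(SAMPLE))  # []
print(validate_skill_md(SAMPLE.replace("quarterly-report", "claude-helper")))  # flags 'claude'
```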

    MCP: the open standard for access

    The Model Context Protocol is, in Anthropic’s words, “an open-source standard for connecting AI applications to external systems.” The canonical analogy: “Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect electronic devices, MCP provides a standardized way to connect AI applications to external systems.” Using MCP, “AI applications like Claude or ChatGPT can connect to data sources (e.g. local files, databases), tools (e.g. search engines, calculators) and workflows (e.g. specialized prompts).”

    An MCP server can expose three kinds of building block — tools, resources, and prompts. In Claude Code, resources are referenced with @server:protocol://resource/path and prompts surface as commands in the form /mcp__servername__promptname. You connect a server with claude mcp add, choosing a transport and a scope:

    Scope Loads in Shared with team Stored in
    Local (default) Current project only No ~/.claude.json
    Project Current project only Yes, via version control .mcp.json in project root
    User All your projects No ~/.claude.json

    For transports, HTTP is “the recommended option for connecting to remote MCP servers,” local stdio servers “run as local processes on your machine,” and SSE is explicitly marked deprecated in favor of HTTP.
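
    Because a project-scoped server is just an entry in `.mcp.json`, it can be generated and reviewed like any other config file. This sketch assumes the `mcpServers` layout Claude Code uses for `.mcp.json` (a `command`/`args`/`env` entry for a stdio server, a `type`/`url` entry for an HTTP server); the server names, npm package, and URL are hypothetical placeholders. The CLI route, along the lines of `claude mcp add --transport http --scope project tickets https://mcp.example.com/mcp`, should produce an equivalent entry.

```python
import json
from typing import Dict, List, Optional


def stdio_server(command: str, args: List[str], env: Optional[Dict[str, str]] = None) -> dict:
    """Local server: Claude Code launches it as a process on your machine."""
    return {"command": command, "args": args, "env": env or {}}


def http_server(url: str) -> dict:
    """Remote server over HTTP, the recommended remote transport (SSE is deprecated)."""
    return {"type": "http", "url": url}


def build_project_config(servers: Dict[str, dict]) -> str:
    """Render .mcp.json content; commit it at the project root to share it with the team."""
    return json.dumps({"mcpServers": servers}, indent=2) + "\n"


if __name__ == "__main__":
    print(build_project_config({
        # Hypothetical server names, package, and URL: substitute your own.
        "local-notes": stdio_server("npx", ["-y", "example-notes-mcp"]),
        "tickets": http_server("https://mcp.example.com/mcp"),
    }))
```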

    Connectors: MCP, packaged for the apps

    A Connector is how the Claude apps surface MCP. Per Anthropic’s help center, “Connectors let Claude access your apps and services, retrieve your data, and take actions within connected services,” and “Custom connectors using remote MCP are available on Claude, Cowork, and Claude Desktop.” So a Connector is not a different technology from MCP — a custom connector is a remote MCP server wired into the Claude UI.

    The most consequential detail is where the connection originates: “Custom connectors (remote MCP servers) are reached from Anthropic’s cloud infrastructure, not from your local machine.” That means a custom-connector MCP server must be reachable over the public internet — one hosted only on a private network, behind a VPN, or blocked by a firewall will not connect even if you can reach it yourself.
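
    Since the connection starts in Anthropic’s cloud, one quick pre-flight check is to resolve the server’s hostname and flag any address that is not publicly routable. This is a heuristic sketch, not Anthropic tooling: `non_public_addresses` is a hypothetical helper, an empty result does not prove the server is reachable (firewalls and IP allowlists can still block it), and DNS answers on your network may differ from public DNS.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def non_public_addresses(url: str) -> list:
    """Resolve the connector URL's host and return any addresses that are not global.

    A non-empty result means Anthropic's cloud almost certainly cannot connect.
    An empty result is necessary but not sufficient for reachability.
    """
    parsed = urlparse(url)
    if parsed.hostname is None:
        raise ValueError(f"no host in {url!r}")
    port = parsed.port or 443
    infos = socket.getaddrinfo(parsed.hostname, port, type=socket.SOCK_STREAM)
    addresses = {info[4][0].split("%")[0] for info in infos}  # drop IPv6 zone IDs
    return sorted(a for a in addresses if not ipaddress.ip_address(a).is_global)


if __name__ == "__main__":
    for candidate in ("https://10.0.0.12/mcp", "https://127.0.0.1:8443/mcp"):
        print(candidate, non_public_addresses(candidate))
```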

    Aspect Directory (pre-built) connector Custom connector
    Source Pre-built integrations in the Connectors Directory Added by you via a remote MCP server URL
    Plan availability Available across Claude plans Free, Pro, Max, Team, and Enterprise
    Free-plan limit Per directory “Free users are limited to one custom connector.”
    Where to add it Customize > Connectors, or the in-chat “+” > Connectors > Manage connectors

    Plugins: a bundle, not a single thing

    A Plugin is “a lightweight way to package and share any combination of” Claude Code customizations. The official announcement lists four bundle components, and the Claude Code documentation adds skills as a fifth thing a plugin can carry:

    Component What it adds
    Slash commands Custom shortcuts for frequently-used operations
    Subagents Purpose-built agents for specialized development tasks
    MCP servers Connections to tools and data sources through MCP
    Hooks Customizations of Claude Code’s behavior at key workflow points
    Skills Per the Claude Code docs, a plugin can include a skills/ directory; plugin skills use a plugin-name:skill-name namespace

    You install a plugin “directly within Claude Code using the /plugin command.” To pull from a marketplace — “curated collections where other developers can discover and install plugins” — you run /plugin marketplace add user-or-org/repo-name and then install from the /plugin menu. Plugins “work across your terminal and VS Code.” Plugins were announced on October 9, 2025, as a public beta for all Claude Code users.
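
    The bundle idea is easiest to see as a directory tree. The layout below (a `.claude-plugin/plugin.json` manifest plus `commands/`, `skills/`, and an `.mcp.json` at the plugin root) is an assumption based on the Claude Code plugin docs and should be checked against the current reference; `scaffold_plugin` and every file name in it are hypothetical. Subagents and hooks would sit alongside these.

```python
import json
import tempfile
from pathlib import Path


def scaffold_plugin(root: Path, name: str, description: str) -> list:
    """Create a minimal plugin skeleton under root/name; return the files created."""
    files = {
        # Manifest identifying the plugin (layout assumed from the plugin docs).
        ".claude-plugin/plugin.json": json.dumps(
            {"name": name, "description": description, "version": "0.1.0"}, indent=2
        ),
        # A custom slash command (teaches HOW).
        "commands/hello.md": "Greet the user and summarize the current git branch.\n",
        # A bundled Skill; plugin skills are namespaced as <plugin-name>:<skill-name>.
        "skills/changelog/SKILL.md": (
            "---\nname: changelog\n"
            "description: Drafts changelog entries from recent commits.\n---\n"
        ),
        # MCP server definitions (gives ACCESS); an empty placeholder here.
        ".mcp.json": json.dumps({"mcpServers": {}}, indent=2),
    }
    created = []
    for relative, content in files.items():
        path = root / name / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content)
        created.append(relative)
    return sorted(created)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for relative in scaffold_plugin(Path(tmp), "team-tools", "Shared team helpers"):
            print(relative)
```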

    Hooks and slash commands: controlling the session

    The last two mechanisms aren’t about adding capability — they’re about controlling a Claude Code session. Hooks are “user-defined shell commands, HTTP endpoints, or LLM prompts that execute automatically at specific points in Claude Code’s lifecycle.” Their defining property is determinism: they provide deterministic control rather than relying on the LLM to make decisions. A PreToolUse hook can, for example, block a destructive rm -rf command regardless of what Claude intended. Hooks are configured in JSON settings files (such as ~/.claude/settings.json or a project’s .claude/settings.json) and fire at events including SessionStart, UserPromptSubmit, PreToolUse, PostToolUse, and Stop.
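
    The `rm -rf` example can be sketched as a small Python command hook. It assumes the hook receives the event as JSON on stdin with `tool_name` and `tool_input.command` fields, and that exiting with code 2 blocks the call and returns stderr to Claude, which matches the Claude Code hooks reference at the time of writing; verify against your version. The `--hook` flag and the script path are hypothetical conventions that let the module be imported and tested without reading stdin, and the regex only catches combined flags, so treat it as a guardrail rather than a complete filter.

```python
#!/usr/bin/env python3
"""Sketch of a PreToolUse hook that blocks `rm -rf`-style Bash commands.

Registered in .claude/settings.json, for example:
  {"hooks": {"PreToolUse": [{"matcher": "Bash", "hooks": [
      {"type": "command", "command": "python3 .claude/hooks/block_rm.py --hook"}]}]}}
"""
import json
import re
import sys

# `rm` with -r/-R and -f combined in one flag group (-rf, -fr, -Rf, -rfv, ...).
# Separate flags (rm -r -f) and long options are NOT caught by this sketch.
DESTRUCTIVE_RM = re.compile(r"\brm\s+-(?:[a-zA-Z]*[rR][a-zA-Z]*f|[a-zA-Z]*f[a-zA-Z]*[rR])")


def decide(payload: dict) -> tuple:
    """Return (exit_code, message); exit code 2 tells Claude Code to block the call."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    if DESTRUCTIVE_RM.search(command):
        return 2, f"Blocked by hook: destructive command {command!r}"
    return 0, ""


def main() -> int:
    payload = json.load(sys.stdin)  # the hook event arrives as JSON on stdin
    code, message = decide(payload)
    if message:
        print(message, file=sys.stderr)  # with exit code 2, stderr is shown to Claude
    return code


if __name__ == "__main__" and "--hook" in sys.argv[1:]:
    sys.exit(main())
```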

    Slash commands start with / and control the session. Built-in commands like /help and /compact ship with Claude Code. Custom commands are Markdown files — a project command lives at .claude/commands/<name>.md and a personal one at ~/.claude/commands/<name>.md, with the file name becoming the command. As of the 2026 Claude Code docs, custom commands have been merged into Skills: a file at .claude/commands/deploy.md and a skill at .claude/skills/deploy/SKILL.md “both create /deploy and work the same way,” and existing .claude/commands/ files keep working.
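
    The command/skill equivalence comes down to a naming rule: a command file’s name, or a skill’s directory name, becomes the slash command. A hypothetical helper that makes the rule explicit; it deliberately ignores subdirectory namespacing and plugin prefixes.

```python
from pathlib import PurePosixPath


def slash_command_name(path: str) -> str:
    """Return the slash command a custom command file or skill file creates.

    .claude/commands/<name>.md       -> /<name>   (same for ~/.claude/commands/)
    .claude/skills/<name>/SKILL.md   -> /<name>   (same for ~/.claude/skills/)
    """
    p = PurePosixPath(path)
    if p.name == "SKILL.md" and p.parent.parent.name == "skills":
        return "/" + p.parent.name
    if p.suffix == ".md" and p.parent.name == "commands":
        return "/" + p.stem
    raise ValueError(f"not a command or skill file: {path}")


print(slash_command_name(".claude/commands/deploy.md"))      # /deploy
print(slash_command_name(".claude/skills/deploy/SKILL.md"))  # /deploy
```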

    Is a Connector the same as an MCP server?

    Effectively yes, for custom connectors. Anthropic states “Custom connectors using remote MCP are available on Claude, Cowork, and Claude Desktop,” and that they “are reached from Anthropic’s cloud infrastructure.” A Connector is the Claude-app packaging of a remote MCP server; MCP is the underlying open standard.

    What’s the difference between a Skill and an MCP server?

    A Skill teaches Claude how to do a task — it “provide[s] Claude with domain-specific expertise: workflows, context, and best practices.” An MCP server gives Claude access to external systems — it connects Claude “to data sources, tools and workflows.” One is procedural knowledge; the other is a connection.

    Do Skills cost a lot of context tokens?

    Not until used. Per Anthropic’s docs, Level 1 metadata costs about 100 tokens per Skill and is always loaded; the full SKILL.md body (under 5k tokens) only loads when the Skill is triggered; and bundled resources are read on demand with effectively no upfront cost. This is the “progressive disclosure” design.

    What can a Claude Code plugin contain?

    “Any combination of” slash commands, subagents, MCP servers, and hooks, per the announcement; the Claude Code documentation adds that a plugin can also bundle a skills/ directory. You install one with the /plugin command, optionally from a marketplace added via /plugin marketplace add.

    Are custom slash commands still a thing?

    They still work, but they’ve been folded into Skills. The Claude Code docs state custom commands “have been merged into skills,” that existing .claude/commands/ files keep working, and that a command file and an equivalent SKILL.md both produce the same / command. Skills add optional extras like supporting files and automatic invocation.


  • Migrating Off Retired Claude Models: The Breaking-Change Checklist (2026)

    Migrating Off Retired Claude Models: The Breaking-Change Checklist (2026)

    Last verified: June 13, 2026

    Claude Opus 4 (claude-opus-4-20250514) and Claude Sonnet 4 (claude-sonnet-4-20250514) are deprecated and retire on June 15, 2026, after which requests to them return a 404. The official replacements are claude-opus-4-8 and claude-sonnet-4-6. But swapping the model string alone will break a working integration: depending on which target you choose, several request parameters that were valid on the May 2025 models now return a 400 error, and two changes alter behavior silently. This page maps each removed or changed parameter to the exact failure and the fix.

    One distinction governs the whole migration. The Opus path (to claude-opus-4-8) is the strict one: it removes temperature/top_p/top_k and manual thinking budgets entirely. The Sonnet path (to claude-sonnet-4-6) is gentler: it keeps sampling parameters (with the older “one of temperature or top_p, not both” rule) and still accepts budget_tokens as deprecated-but-functional. The one rule both paths share: assistant-turn prefills now return 400.

    The breaking-change matrix

    Each row is a change that breaks on at least one migration target. “Error” means the API rejects the request server-side (HTTP 400) even though the SDK request type still type-checks. “Silent” means no error — the behavior simply differs.

    Change On Opus 4.8 On Sonnet 4.6 Symptom Fix
    thinking: {type:"enabled", budget_tokens:N} 400 error (removed) Deprecated, still works 400 on Opus; cost/latency drift on Sonnet thinking: {type:"adaptive"} + output_config.effort
    temperature / top_p / top_k 400 error (removed) Keep only one of temperature or top_p 400 on Opus if any set; 400 on Sonnet if both set Remove on Opus; steer via prompt. Keep one on Sonnet
    Assistant-turn prefill (last message role:"assistant") 400 error 400 error Request rejected on both output_config.format (structured outputs) or system-prompt instruction
    thinking.display default Defaults to "omitted" Returns summarized text Reasoning text empty on Opus (silent) Set display: "summarized" on Opus
    Tokenizer New tokenizer (more tokens) Unchanged tokenizer Same text counts higher on Opus; max_tokens too tight Re-baseline with count_tokens; add headroom
    output_format (top-level) Deprecated API-wide Deprecated API-wide Works, but slated for removal Move to output_config: {format: {...}}
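
    Several rows in the matrix are mechanical enough to apply in code, while others need a human decision. The sketch below assumes your requests are built as plain Python dicts of Messages API parameters; `migrate_request` is hypothetical, auto-fixes only the unambiguous rows, and returns notes for the rest (effort level, prefill replacement, thinking display, tokenizer headroom).

```python
import copy

SAMPLING_PARAMS = ("temperature", "top_p", "top_k")
TARGETS = {"opus": "claude-opus-4-8", "sonnet": "claude-sonnet-4-6"}


def migrate_request(request: dict, target: str) -> tuple:
    """Return (migrated_copy, notes); the input dict is left untouched."""
    if target not in TARGETS:
        raise ValueError(f"target must be one of {sorted(TARGETS)}")
    req = copy.deepcopy(request)
    req["model"] = TARGETS[target]  # exact dateless IDs, no date suffix
    notes = []

    if target == "opus":
        # Any of the three sampling parameters returns 400 on claude-opus-4-8.
        for param in SAMPLING_PARAMS:
            if req.pop(param, None) is not None:
                notes.append(f"removed {param}; steer via the prompt instead")
    elif "temperature" in req and "top_p" in req:
        # Sonnet 4.6 accepts one of temperature/top_p, not both.
        req.pop("top_p")
        notes.append("dropped top_p; Sonnet 4.6 accepts temperature or top_p, not both")

    thinking = req.get("thinking")
    if thinking and thinking.get("type") == "enabled":
        # 400 on Opus 4.8, deprecated on Sonnet 4.6; there is no token-count equivalent.
        req["thinking"] = {"type": "adaptive"}
        notes.append("budget_tokens replaced by adaptive thinking")
        if target == "opus":
            notes.append('set thinking.display = "summarized" if your UI shows reasoning')

    if "output_format" in req:
        # Deprecated top-level parameter moves under output_config.
        req.setdefault("output_config", {})["format"] = req.pop("output_format")
    if "effort" not in req.get("output_config", {}):
        notes.append("effort defaults to high; set output_config.effort explicitly")

    messages = req.get("messages", [])
    if messages and messages[-1].get("role") == "assistant":
        # Prefills return 400 on both targets; the replacement depends on intent.
        notes.append("trailing assistant prefill returns 400; replace it by hand")

    if target == "opus":
        notes.append("re-baseline max_tokens with count_tokens (new tokenizer)")
    return req, notes
```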

    Model ID swaps and retirement dates

    Retiring model Model ID Retires Replacement
    Claude Opus 4 claude-opus-4-20250514 (alias claude-opus-4-0) June 15, 2026 claude-opus-4-8
    Claude Sonnet 4 claude-sonnet-4-20250514 (alias claude-sonnet-4-0) June 15, 2026 claude-sonnet-4-6

    These are the original May 2025 models, not the later Opus 4.6 or Sonnet 4.5 releases. Use the exact replacement strings above — do not append a date suffix to claude-opus-4-8 or claude-sonnet-4-6 (they are dateless pinned snapshots).

    budget_tokens to adaptive thinking

    The Opus path removes the fixed thinking budget. thinking: {type:"enabled", budget_tokens:N} returns a 400 on claude-opus-4-8. The replacement is adaptive thinking — the model decides how much to think per request — with overall depth controlled by the effort parameter (low | medium | high | xhigh | max). There is no direct token-count equivalent; effort is an output-level control, not a thinking budget.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Before (Claude Opus 4 / Sonnet 4)
    client.messages.create(
        model="claude-opus-4-20250514",
        max_tokens=16000,
        thinking={"type": "enabled", "budget_tokens": 10000},
        messages=[{"role": "user", "content": "..."}],
    )
    
    # After (Claude Opus 4.8)
    client.messages.create(
        model="claude-opus-4-8",
        max_tokens=16000,
        thinking={"type": "adaptive"},
        output_config={"effort": "high"},  # or "max", "xhigh", "medium", "low"
        messages=[{"role": "user", "content": "..."}],
    )

    On the Sonnet path, budget_tokens is deprecated but still functional on claude-sonnet-4-6, so it will not 400 — but you should still migrate to adaptive thinking. Note also that Sonnet 4.6 defaults to effort: "high" where Sonnet 4 had no effort parameter at all; if you do not set it explicitly you may see higher latency and token use after the swap.

    Sampling parameters: removed vs. restricted

    This is where the two paths diverge most. On claude-opus-4-8, setting temperature, top_p, or top_k to any non-default value returns a 400. Remove them entirely and steer behavior through prompting instead. (If you used temperature=0 for determinism, note it never guaranteed identical outputs on prior models either.)

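    import anthropic

    client = anthropic.Anthropic()

    # Opus path: sampling params return 400 on claude-opus-4-8
    # Before
    client.messages.create(
        model="claude-opus-4-20250514",
        max_tokens=1024,  # required on every Messages API call
        temperature=0.7,
        top_p=0.9,
        messages=[{"role": "user", "content": "..."}],
    )
    
    # After: remove them and steer behavior through the prompt
    client.messages.create(
        model="claude-opus-4-8",
        max_tokens=1024,
        messages=[{"role": "user", "content": "..."}],
    )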

    On claude-sonnet-4-6 the older Claude 4.x rule still applies: you may pass one of temperature or top_p, but passing both returns a 400. So a Sonnet 4 to Sonnet 4.6 move only requires dropping one of the two if you were setting both.

    Assistant-turn prefills to structured outputs

    Prefilling the final assistant turn — ending your messages array with a role: "assistant" message to force a response shape — returns a 400 on both claude-opus-4-8 and claude-sonnet-4-6. This is the one breaking change you cannot dodge by choosing the gentler target. The replacement depends on what the prefill was doing.

    Prefill was used for Replacement
    Forcing JSON / YAML / schema output output_config.format with a json_schema
    Forcing a classification label A tool with an enum field, or structured outputs
    Skipping preambles (“Here is…”) System-prompt instruction: respond directly, no preamble
    Continuing an interrupted response Move continuation into the user turn
    Steering around bad refusals Usually unnecessary now — plain user-turn prompting suffices
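
    import json
    import anthropic

    client = anthropic.Anthropic()

    # Before (fails on both targets): prefill forcing JSON shape
    prefill_messages = [
        {"role": "user", "content": "Extract the name."},
        {"role": "assistant", "content": "{\"name\": \""},
    ]

    # After: structured outputs replace the prefill
    SCHEMA = {
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
        "additionalProperties": False,
    }
    response = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=1024,
        output_config={"format": {"type": "json_schema", "schema": SCHEMA}},
        messages=[{"role": "user", "content": "Extract the name."}],
    )
    print(json.loads(response.content[0].text)["name"])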

    Thinking display: the silent one

    On claude-opus-4-8, thinking blocks still stream, but their thinking text field is empty unless you opt in — the default is display: "omitted". There is no error; if your UI rendered the summarized reasoning, it now shows a long pause before output. Restore it by setting the display mode:

    thinking = {
        "type": "adaptive",
        "display": "summarized",  # default is "omitted" on Opus 4.8/4.7
    }

    The block-field name is unchanged — it is still block.thinking on a thinking-type block. The fix is the request parameter, not the response-handling code. (Sonnet 4.6 is not affected by this default change.)
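
    Because this failure is silent, it helps to assert on it in tests or log it in production. A minimal sketch, assuming response content blocks expose `.type` and `.thinking` as described above; `reasoning_text` is a hypothetical helper, and the `SimpleNamespace` objects stand in for real response blocks.

```python
from types import SimpleNamespace


def reasoning_text(content_blocks) -> str:
    """Join the text of thinking blocks from a Messages API response's content.

    With display left at its Opus 4.8 default ("omitted"), thinking blocks
    still arrive but block.thinking is empty, so this returns "" without error.
    """
    return "\n".join(
        block.thinking
        for block in content_blocks
        if block.type == "thinking" and block.thinking
    )


omitted = [SimpleNamespace(type="thinking", thinking=""), SimpleNamespace(type="text", text="Hi")]
if not reasoning_text(omitted):
    print('no visible reasoning: set thinking.display = "summarized"')
```

    In real code the call is `reasoning_text(response.content)`.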

    The new tokenizer: re-baseline max_tokens

    This change is Opus-only and easy to miss because it produces no error. claude-opus-4-8 uses the tokenizer introduced with Opus 4.7, under which the same text tokenizes to roughly 1x–1.35x as many tokens — up to about 35% more, around 30% on typical content, varying by workload. Three consequences:

    What to check Why
    max_tokens ceilings and compaction triggers The same output now consumes more tokens; tight limits truncate mid-thought
    Client-side token estimators (e.g. fixed char-to-token ratios) Calibrated against the old tokenizer; now undercount
    Cost and rate-limit dashboards count_tokens returns higher numbers; re-baseline before reacting

    Re-run client.messages.count_tokens(model="claude-opus-4-8", ...) on a representative sample of your prompts. Do not apply a blanket multiplier. Sonnet 4.6 keeps the older tokenizer, so a Sonnet 4 to Sonnet 4.6 move has no tokenizer re-baseline to do.
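
    A re-baseline pass can be sketched in two parts: collect (old, new) counts per sample prompt, then scale limits by the worst ratio you actually measured rather than a fixed multiplier. This assumes an `anthropic.Anthropic()` client is passed in and that `count_tokens` results expose `input_tokens`; `count_pair` and `rebaselined_max_tokens` are hypothetical helpers, and input-token ratios are only a proxy for how output text will tokenize.

```python
import math

OLD_MODEL = "claude-opus-4-20250514"  # returns 404 after June 15, 2026: measure before then
NEW_MODEL = "claude-opus-4-8"


def count_pair(client, prompt: str) -> tuple:
    """(old, new) input-token counts for one single-turn prompt.

    `client` is assumed to be an anthropic.Anthropic() instance.
    """
    def count(model: str) -> int:
        result = client.messages.count_tokens(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return result.input_tokens

    return count(OLD_MODEL), count(NEW_MODEL)


def rebaselined_max_tokens(current: int, pairs: list) -> int:
    """Scale a max_tokens ceiling by the worst new/old ratio in your own sample."""
    worst = max(new / old for old, new in pairs)
    return math.ceil(current * worst)
```

    Typical use: build `pairs = [count_pair(anthropic.Anthropic(), p) for p in sample_prompts]` from real traffic, then call `rebaselined_max_tokens(16000, pairs)`; apply the same scaling to compaction triggers.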

    The full checklist

    Step Opus 4 to 4.8 Sonnet 4 to 4.6
    Update model ID string Required Required
    Replace budget_tokens with adaptive thinking Required (400) Recommended (deprecated)
    Sampling params Remove all (400) Keep only one (both 400)
    Remove assistant-turn prefills Required (400) Required (400)
    Set display: "summarized" if showing reasoning Required for visible thinking Not applicable
    Re-baseline max_tokens for new tokenizer Required Not applicable
    Set effort explicitly Defaults to high Defaults to high
    Move output_format to output_config.format Recommended Recommended
    Verify tool inputs parsed with a JSON parser Recommended Recommended
    Spot-check one request, then roll out Required Required

    If you run Claude Code, /claude-api migrate applies the model swap, breaking-parameter changes, prefill replacement, and effort calibration across a codebase, then produces a verify-it-yourself checklist. It asks you to confirm scope before editing any files.

    Is migrating off Claude Opus 4 really not just a model-string change?

    No. Moving to claude-opus-4-8 also requires removing temperature/top_p/top_k and any budget_tokens (all now return 400), removing assistant-turn prefills (400), opting back into summarized thinking if your UI shows it, and re-baselining max_tokens for the new tokenizer. Only the Sonnet 4 to Sonnet 4.6 move is close to a drop-in — and even that requires removing prefills.

    When exactly do Claude Opus 4 and Sonnet 4 stop working?

    June 15, 2026. After that date, requests to claude-opus-4-20250514 and claude-sonnet-4-20250514 return a 404. These are the original May 2025 models, not Opus 4.6 or Sonnet 4.5.

    What replaces budget_tokens now that it errors on Opus?

    Adaptive thinking (thinking: {type:"adaptive"}) plus the effort parameter inside output_config. There is no exact token-count equivalent: the model decides how much to think per request, and effort (low through max) tunes overall depth and spend. On Sonnet 4.6, budget_tokens still works but is deprecated.

    Why does the same prompt cost more tokens on Opus 4.8?

    Opus 4.8 uses the tokenizer introduced with Opus 4.7, under which the same text produces roughly 1x–1.35x as many tokens (about 30% more on typical content, up to ~35%). Re-run the count_tokens endpoint against claude-opus-4-8 and give max_tokens and compaction triggers extra headroom. Sonnet 4.6 keeps the older tokenizer, so it is unaffected.

    My thinking summaries disappeared after migrating to Opus — is that a bug?

    No. On Opus 4.8 (and 4.7), thinking.display defaults to "omitted", so thinking blocks stream with an empty text field. Set display: "summarized" in your thinking config to restore visible reasoning. The field name is unchanged; only the default flipped.


  • Claude Code Billing in 2026: Subscription Usage vs the Agent Credit Pool

    Claude Code Billing in 2026: Subscription Usage vs the Agent Credit Pool

    Last verified: June 13, 2026

    Claude Code has two billing models, and which one applies depends on how you run it, not just which plan you hold. When you use Claude Code interactively in the terminal or IDE on a Pro or Max plan, it draws from the same subscription usage limits as your Claude.ai chats. But starting June 15, 2026, Anthropic separates out programmatic usage: the Claude Agent SDK, the claude -p headless command, the Claude Code GitHub Actions integration, and third-party apps that authenticate through the Agent SDK will no longer count against your interactive subscription pool. Instead they draw from a new, separate monthly Agent SDK credit, billed at standard API rates. This page documents both models, the exact credit amounts per plan, and the SDK package rename you may also need to handle.

    The two billing models at a glance

    The dividing line is interactive vs. programmatic. One rule to remember: setting an ANTHROPIC_API_KEY environment variable overrides your subscription entirely; Claude Code then authenticates with that key and bills as pay-as-you-go API usage, regardless of plan.

    Usage type How it runs Billed against
    Interactive Claude Code Terminal or IDE, human at the keyboard Pro/Max subscription usage limits
    Claude.ai chat Web, desktop, mobile Pro/Max subscription usage limits
    Agent SDK (Python/TypeScript) Your own programmatic projects Separate Agent SDK credit (from June 15, 2026)
    claude -p (non-interactive) Headless / scripted Claude Code Separate Agent SDK credit (from June 15, 2026)
    Claude Code GitHub Actions CI/CD automation Separate Agent SDK credit (from June 15, 2026)
    Any usage with ANTHROPIC_API_KEY set API-key auth instead of subscription Standard API rates (pay-as-you-go)
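
    The routing rule in this table fits in a few lines. This is a sketch of the documented behavior, not how Claude Code decides internally; the usage labels are hypothetical names for the rows above, and Claude.ai chat is left out because an environment variable cannot affect it.

```python
import os
from typing import Mapping, Optional

INTERACTIVE = {"claude_code_terminal", "claude_code_ide"}
PROGRAMMATIC = {"agent_sdk", "claude_p", "github_actions", "third_party_sdk_app"}


def billing_path(usage: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Which pool a Claude Code / Agent SDK usage type bills to (from June 15, 2026)."""
    env = os.environ if env is None else env
    if env.get("ANTHROPIC_API_KEY"):
        # An API key overrides subscription auth regardless of plan.
        return "standard API rates (pay-as-you-go)"
    if usage in INTERACTIVE:
        return "subscription usage limits"
    if usage in PROGRAMMATIC:
        return "Agent SDK credit"
    raise ValueError(f"unknown usage type: {usage!r}")


print(billing_path("claude_p", env={}))  # Agent SDK credit
```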

    What changes on June 15, 2026

    Per Anthropic’s support documentation: “Starting June 15, 2026, Claude Agent SDK and claude -p usage no longer counts toward your Claude plan’s usage limits.” Each subscription tier instead receives a fixed monthly Agent SDK credit. When that credit runs out, additional Agent SDK usage flows to usage credits at standard API rates — but only if you have enabled usage credits. If you have not, “Agent SDK requests stop until your credit refreshes.” Unused credits do not roll over to the next billing cycle, and there is no automatic fallback to the interactive pool.

    Plan Monthly Agent SDK credit
    Pro $20
    Max 5x $100
    Max 20x $200
    Team (Standard seats) $20
    Team (Premium seats) $100
    Enterprise (seat-based Premium) $200

    What stays on the interactive subscription pool, unchanged: Claude conversations on web, desktop, and mobile; and interactive Claude Code in the terminal or IDE. The change is scoped strictly to programmatic execution.
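
    The credit behavior described above (a fixed monthly amount, overflow only with usage credits enabled, no rollover, no fallback) can be modeled as a small state machine. This is a toy sketch: `AgentSdkCredit` and the plan keys are hypothetical, and it assumes each request’s cost is known up front, whereas real metering is per token.

```python
from dataclasses import dataclass, field

# Monthly Agent SDK credit in USD, from the table above.
PLAN_CREDIT = {
    "pro": 20, "max_5x": 100, "max_20x": 200,
    "team_standard": 20, "team_premium": 100, "enterprise_premium": 200,
}


@dataclass
class AgentSdkCredit:
    """One billing cycle of the Agent SDK credit (toy model).

    Simplification: a request that does not fit in the remaining credit is
    rejected whole when usage credits are off.
    """
    monthly_credit: float
    usage_credits_enabled: bool
    remaining: float = field(init=False)
    overflow_billed: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.remaining = self.monthly_credit

    def charge(self, cost: float) -> bool:
        """Apply one request's cost; return False if the request is stopped."""
        if cost <= self.remaining:
            self.remaining -= cost
            return True
        if not self.usage_credits_enabled:
            return False  # stops until the credit refreshes; no fallback to the interactive pool
        self.overflow_billed += cost - self.remaining  # billed at standard API rates
        self.remaining = 0.0
        return True

    def refresh(self) -> None:
        """New cycle: reset to the plan amount; unused credit does not roll over."""
        self.remaining = self.monthly_credit
        self.overflow_billed = 0.0


credit = AgentSdkCredit(PLAN_CREDIT["pro"], usage_credits_enabled=False)
print(credit.charge(15.0), credit.charge(10.0))  # True False
```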

    How each pool is metered and priced

    Claude Code “charges by API token consumption” — the underlying meter is input/output tokens, including thinking tokens billed as output. On a subscription, that token consumption is what counts against your plan limits (interactive) or your Agent SDK credit (programmatic). The Agent SDK credit and any overflow are billed at standard API list rates; the per-model API token prices below are the verified current rates.

    Pool Meter Price basis
    Interactive (Pro/Max) Tokens, against plan usage limits Included in subscription
    Agent SDK credit Tokens, against monthly credit Standard API rates
    Overflow past the credit Tokens, usage credits Standard API rates (only if usage credits enabled)
    API key (ANTHROPIC_API_KEY) Tokens, pay-as-you-go Standard API rates

    Verified current API token prices (per million tokens) for models commonly used in Claude Code:

    Model Model ID Input $/Mtok Output $/Mtok
    Claude Opus 4.8 claude-opus-4-8 $5.00 $25.00
    Claude Sonnet 4.6 claude-sonnet-4-6 $3.00 $15.00
    Claude Haiku 4.5 claude-haiku-4-5 $1.00 $5.00
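
    To compare those rates with a monthly credit, cost per request is tokens times price. A minimal sketch, assuming the list prices above and ignoring prompt caching and batch discounts; the 2M-input / 200K-output run is a hypothetical workload.

```python
# USD per million tokens (input, output), from the price table above.
PRICES = {
    "claude-opus-4-8": (5.00, 25.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5": (1.00, 5.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost at standard API rates; thinking tokens should be included in output_tokens."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


per_run = request_cost("claude-sonnet-4-6", 2_000_000, 200_000)
print(per_run)             # 9.0
print(int(20 // per_run))  # 2 such runs fit in a $20 Pro Agent SDK credit
```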

    Subscription plan prices

    These are the published Claude plan prices the Agent SDK credits attach to. The Max 5x plan starts at $100/month; the $200 figure for Max 20x is documented as the matching Agent SDK credit amount for that tier.

    Plan Price
    Free $0
    Pro $20/month, or $17/month billed annually ($200 up front)
    Max 5x From $100/month
    Team (Standard seat) $25/seat/month, or $20/seat/month billed annually

    The SDK rename: claude-code-sdk to claude-agent-sdk

    Separate from billing, the SDK itself was renamed. Anthropic’s migration guide states: “The Claude Code SDK has been renamed to the Claude Agent SDK.” If you have code on the old package, you must update the package name, imports, and one Python type. The headless CLI command name is unchanged — it is still claude -p.

    Aspect Old New
    npm package (TS/JS) @anthropic-ai/claude-code @anthropic-ai/claude-agent-sdk
    Python package claude-code-sdk claude-agent-sdk
    Python options type ClaudeCodeOptions ClaudeAgentOptions
    Default system prompt Claude Code’s preset Minimal (opt back in via preset: "claude_code")
    # TypeScript
    npm uninstall @anthropic-ai/claude-code
    npm install @anthropic-ai/claude-agent-sdk
    
    # Python
    pip uninstall claude-code-sdk
    pip install claude-agent-sdk
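
    For Python code, the identifier changes in the table are mechanical. A sketch of a rewrite pass, assuming only the two renames listed above; `migrate_source` is hypothetical, and it does not restore the old default system prompt, which needs an explicit opt-in via the `claude_code` preset.

```python
import re

# Old identifier -> new identifier, per the rename table above.
RENAMES = {
    "claude_code_sdk": "claude_agent_sdk",      # Python import name of the package
    "ClaudeCodeOptions": "ClaudeAgentOptions",  # Python options type
}


def migrate_source(source: str) -> str:
    """Rewrite old Claude Code SDK identifiers in Python source text (whole words only)."""
    for old, new in RENAMES.items():
        source = re.sub(rf"\b{re.escape(old)}\b", new, source)
    return source


print(migrate_source("from claude_code_sdk import query, ClaudeCodeOptions"))
```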

    Decision: which billing path applies to your work

    If you are… Billing path
    A developer coding interactively in the terminal Subscription usage limits (unchanged)
    Running claude -p in a script or cron job Agent SDK credit (from June 15, 2026)
    Running Claude Code in GitHub Actions Agent SDK credit (from June 15, 2026)
    Building an app on the Agent SDK with subscription auth Agent SDK credit (from June 15, 2026)
    A team or service account wanting budgets + usage reports Set ANTHROPIC_API_KEY → standard API billing

    Does interactive Claude Code billing change on June 15, 2026?

    No. Anthropic’s documentation confirms interactive Claude Code in the terminal or IDE, and Claude conversations on web, desktop, and mobile, continue using subscription usage limits as before. Only programmatic usage — the Agent SDK, claude -p, GitHub Actions, and third-party Agent SDK apps — moves to the separate Agent SDK credit.

    How much is the separate Agent SDK credit?

    $20/month on Pro, $100 on Max 5x, $200 on Max 20x, $20 on Team Standard seats, $100 on Team Premium seats, and $200 on Enterprise seat-based Premium. The credit is billed at standard API rates, does not roll over, and refreshes monthly.

    What happens when the Agent SDK credit runs out?

    Additional Agent SDK usage flows to usage credits at standard API rates — but only if you have enabled usage credits. If you have not enabled them, Agent SDK requests stop until your credit refreshes. There is no automatic fallback to your interactive subscription pool.

    How do I avoid the credit pool entirely?

    Set an ANTHROPIC_API_KEY environment variable. Claude Code and the Agent SDK then authenticate with that key and bill as standard pay-as-you-go API usage, separate from any subscription. This is Anthropic’s recommended path for apps, CI jobs, service accounts, and team-owned projects that need budgets and usage reporting.

    Was the Claude Code SDK renamed?

    Yes. It is now the Claude Agent SDK. The npm package @anthropic-ai/claude-code became @anthropic-ai/claude-agent-sdk, the Python package claude-code-sdk became claude-agent-sdk, and the Python type ClaudeCodeOptions became ClaudeAgentOptions. The claude -p CLI command name is unchanged.


  • Latest Claude Models — June 2026 (Current Lineup, Pricing, and Specs)

    Latest Claude Models — June 2026 (Current Lineup, Pricing, and Specs)

    Updated June 12, 2026. Fable 5 is the current top-tier model, released June 9, 2026. The full lineup: Fable 5 → Opus 4.8 → Sonnet 4.6 → Haiku 4.5. Pricing and availability verified against Anthropic’s official docs.

    Current Claude Models — Quick Reference

    Model API ID Input $/MTok Output $/MTok Context Best For
    Claude Fable 5 🆕 claude-fable-5 $10.00 $50.00 1M tokens Complex engineering, long-horizon agentic work
    Claude Opus 4.8 claude-opus-4-8 $5.00 $25.00 1M tokens Everyday advanced work, high-volume pipelines
    Claude Sonnet 4.6 claude-sonnet-4-6 $3.00 $15.00 1M tokens Balanced capability and speed
    Claude Haiku 4.5 claude-haiku-4-5-20251001 $1.00 $5.00 200K tokens High-speed, cost-sensitive tasks

    All four models support vision, tool use/function calling, and batch processing. Fable 5 and Opus 4.8 are available on AWS Bedrock, Google Cloud Vertex AI, and Microsoft Azure AI Foundry in addition to the direct Anthropic API.

    Claude Fable 5 (June 9, 2026)

    Fable 5 is Anthropic’s first publicly available Mythos-class model — a capability tier previously restricted to research and select enterprise partners. It’s the most capable model Anthropic has released to date.

    What makes Fable 5 different:

    • SWE-bench Verified: 95.0% (vs 88.6% for Opus 4.8)
    • SWE-bench Pro: 80.0% (vs 69.2%)
    • Senior Engineer benchmark: 91/100 (vs ~63/100)
    • Adaptive extended thinking (always on, not a mode switch)

    Important limitations:

    • 2x the cost of Opus 4.8 ($10/$50 vs $5/$25)
    • Mandatory 30-day data retention — not available under zero data retention (ZDR)
    • Safety classifiers route cybersecurity, biology, chemistry, and distillation prompts to an Opus 4.8 fallback — you pay Fable 5 rates for Opus 4.8 output in those domains
    • Higher latency on complex tasks (60 seconds to several minutes vs 3–15 seconds for Opus 4.8)

    Free through June 22, 2026: Claude Pro, Max 5x, Max 20x, Team, and Enterprise subscription plans include Fable 5 at no extra charge during the launch window.

    Claude Opus 4.8

    Opus 4.8 is Anthropic’s current workhorse for serious work — the right default for most API applications and Claude Code use. It supports zero data retention (ZDR), which Fable 5 does not.

    Key specs:

    • Context: 1M tokens
    • Max output: 32K tokens per request
    • Extended thinking: Adaptive, opt-in per request via thinking: {type: "adaptive"} (a fixed budget_tokens returns a 400)
    • ZDR: Yes
    • Batch API: Yes (50% discount on batch processing)

    Use Opus 4.8 as your default model unless you have a specific reason to go up to Fable 5 or down to Sonnet/Haiku. It hits the best balance of capability, speed, cost, and data policy flexibility.

    Claude Sonnet 4.6

    Sonnet 4.6 targets use cases where response speed matters and you don’t need Opus-level reasoning. It is the model Claude.ai runs on for Pro subscribers’ day-to-day chat.

    Key specs:

    • Context: 1M tokens
    • Max output: 64K tokens per request
    • Extended thinking: Available
    • ZDR: Yes
    • Typical latency: 1–5 seconds for most tasks

    Good for: content generation pipelines, customer-facing chat, document analysis at volume, anything where sub-5-second response time matters.

    Claude Haiku 4.5

    Haiku 4.5 is Anthropic’s fastest and cheapest model. At $1 input / $5 output per million tokens, it’s 10x cheaper than Fable 5.

    Key specs:

    • Context: 200K tokens (smaller than the Opus/Sonnet/Fable 1M window)
    • Max output: 16K tokens per request
    • Latency: Sub-second for most tasks
    • ZDR: Yes

    The 200K context window is the main limitation. For tasks that fit within that window — classification, short-form generation, routing, extraction — Haiku 4.5 is the cost-optimal choice. For longer documents or conversations, step up to Sonnet or Opus.

    How to Choose the Right Claude Model

    The decision framework I use:

    1. Does the task require multi-step reasoning, complex coding, or long-horizon autonomy? → Fable 5 (if cost and latency are acceptable) or Opus 4.8
    2. Is this a routine task at reasonable volume? → Opus 4.8 as the default
    3. Does latency matter more than maximum reasoning depth? → Sonnet 4.6
    4. Is this high-volume, short-context, cost-sensitive work? → Haiku 4.5
    5. Does your use case require zero data retention? → Any model except Fable 5

    A common production pattern is routing: Fable 5 or Opus 4.8 for the hard jobs, Haiku 4.5 for classification and pre-processing, and Sonnet 4.6 for user-facing response generation.
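The five questions and the routing pattern above can be expressed as a plain function. This is a sketch of one possible ordering of the checklist, not Anthropic tooling; the `Task` fields are hypothetical flags you would set from your own request metadata, and the function returns model names rather than API IDs.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical flags describing a request; set them from your own metadata.
    deep_reasoning: bool = False      # multi-step reasoning, complex coding, long-horizon autonomy
    fable_acceptable: bool = True     # Fable 5's cost and latency are OK for this job
    latency_sensitive: bool = False   # response speed matters more than reasoning depth
    high_volume_short: bool = False   # high-volume, short-context, cost-sensitive work
    requires_zdr: bool = False        # zero data retention is mandatory


def pick_model(task: Task) -> str:
    """Apply the checklist in order; Opus 4.8 is the fallback default."""
    if task.deep_reasoning:
        # Fable 5 does not support ZDR, so fall back to Opus 4.8 when ZDR is required.
        if task.fable_acceptable and not task.requires_zdr:
            return "Fable 5"
        return "Opus 4.8"
    if task.high_volume_short:
        return "Haiku 4.5"
    if task.latency_sensitive:
        return "Sonnet 4.6"
    return "Opus 4.8"  # routine work at reasonable volume


print(pick_model(Task(deep_reasoning=True)))                     # Fable 5
print(pick_model(Task(deep_reasoning=True, requires_zdr=True)))  # Opus 4.8
print(pick_model(Task(high_volume_short=True)))                  # Haiku 4.5
```

Checking high-volume work before latency-sensitive work is a judgment call; the checklist itself does not say which wins when both apply.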

    Claude Subscription Plans and Model Access (June 2026)

    Plan | Price | Models included
    --- | --- | ---
    Free | $0 | Limited Sonnet 4.6 access
    Pro | $20/mo ($17/mo billed annually) | Sonnet 4.6, Opus 4.8, Fable 5 (through June 22)
    Max 5x | $100/mo | All models, 5x usage vs Pro
    Max 20x | $200/mo | All models, 20x usage vs Pro
    Team Standard | $20/seat/mo (annual), $25/seat/mo month-to-month | All models + admin features
    Team Premium | $100/seat/mo (annual), $125/seat/mo month-to-month | All models + priority + advanced admin
    Enterprise | $20/seat + usage at API rates (contact sales) | All models + ZDR + custom retention + SSO

    Claude Code (the CLI tool) is included in all paid subscription plans. API access for building your own applications is separate — billed per token via the Anthropic Console regardless of subscription status.
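For Team plans, the gap between annual and month-to-month billing adds up per seat. A minimal sketch of the arithmetic, using only the per-seat prices in the plan table; it ignores taxes, any seat minimums, and usage-based Enterprise billing, none of which this page covers.

```python
# Per-seat monthly prices in USD from the plan table (June 2026).
TEAM_PRICES = {
    "Team Standard": {"annual": 20, "monthly": 25},
    "Team Premium": {"annual": 100, "monthly": 125},
}


def yearly_cost(plan: str, seats: int, billing: str) -> int:
    """Total USD for 12 months; billing is 'annual' or 'monthly' (month-to-month)."""
    return TEAM_PRICES[plan][billing] * seats * 12


for plan in TEAM_PRICES:
    annual = yearly_cost(plan, seats=10, billing="annual")
    monthly = yearly_cost(plan, seats=10, billing="monthly")
    print(f"{plan}: ${annual:,} annual vs ${monthly:,} month-to-month (saves ${monthly - annual:,})")
```

For a 10-seat team that works out to $2,400 vs $3,000 a year on Team Standard, and $12,000 vs $15,000 on Team Premium.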

    Legacy Models (Still Available, No Longer Latest)

    These models are still available via the API but are not Anthropic’s current recommended versions:

    • Claude Opus 4.7 (claude-opus-4-7): prior Opus tier, succeeded by 4.8
    • Claude Opus 4.6 (claude-opus-4-6): two generations back
    • Claude Sonnet 4.5 (claude-sonnet-4-5): prior Sonnet tier
    • Claude 3.x generation (Claude 3.5 Haiku, Claude 3.5 Sonnet, Claude 3 Opus): relevant only to legacy integrations; several 3.x models have already been retired, so check Anthropic’s model deprecation schedule before relying on one

    If you’re building a new application, start with the current lineup. Legacy model IDs are useful for maintaining compatibility in existing applications that haven’t been updated.
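When maintaining an older integration, a quick audit of which configured model IDs are on the legacy list helps plan upgrades. A sketch with hypothetical service names; the Sonnet 4.6 ID `claude-sonnet-4-6` is an assumption following the naming pattern of the IDs above (this page does not list it), and Claude 3.x IDs are left out because their upgrade target depends on the workload.

```python
# Legacy ID -> suggested current ID. "claude-sonnet-4-6" is an assumed ID.
LEGACY_UPGRADES = {
    "claude-opus-4-7": "claude-opus-4-8",
    "claude-opus-4-6": "claude-opus-4-8",
    "claude-sonnet-4-5": "claude-sonnet-4-6",
}


def audit(config: dict) -> list:
    """Return (service, legacy_id, suggested_id) for every service on a legacy ID."""
    return [
        (service, model_id, LEGACY_UPGRADES[model_id])
        for service, model_id in sorted(config.items())
        if model_id in LEGACY_UPGRADES
    ]


# Hypothetical service-to-model mapping from an application's config.
services = {
    "summarizer": "claude-opus-4-7",
    "classifier": "claude-opus-4-8",
    "chat": "claude-sonnet-4-5",
}
for service, old, new in audit(services):
    print(f"{service}: {old} -> {new}")
```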

    Platform Availability

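    The current lineup (Fable 5, Opus 4.8, Sonnet 4.6, Haiku 4.5) is offered through the following platforms; per-model availability can differ, so confirm the specific model in each platform’s catalog:

    • Anthropic API (direct)
    • AWS Bedrock
    • Google Cloud Vertex AI
    • Microsoft Azure AI Foundry
    • GitHub Copilot (via Microsoft Azure AI Foundry)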

    Frequently Asked Questions

    What is the newest Claude model?
    As of June 2026, Claude Fable 5 is the newest and most capable model Anthropic has released. It launched June 9, 2026. The API model ID is claude-fable-5.

    Is Claude Fable 5 the same as Claude 5?
    No. Anthropic changed the naming convention — there is no “Claude 5.” The Fable series is a new tier above the Opus/Sonnet/Haiku hierarchy. Fable 5 is Anthropic’s first Mythos-class model released for general availability.

    What is the most powerful Claude model?
    Claude Fable 5 is currently the most powerful. For tasks where Fable 5’s safety classifier routing applies (cybersecurity, biology, chemistry, distillation), or where zero data retention is required, Claude Opus 4.8 is the appropriate top-tier choice.

    What Claude model does Claude.ai use by default?
    Depends on your plan. The Free tier gets limited access to Sonnet 4.6. Pro and Max subscribers get Opus 4.8 as the default, with Fable 5 included through June 22, 2026; after that date, Fable 5 access depends on the plan. Claude.ai automatically routes you to the models your plan includes.

    How do I use the latest Claude model in the API?
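    Set the model parameter in your API request to the model ID. For Fable 5: "model": "claude-fable-5". For Opus 4.8: "model": "claude-opus-4-8". The full request format is in Anthropic’s API reference documentation; API keys and billing are managed in the Anthropic Console (console.anthropic.com).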
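The `model` field sits in the JSON body of a Messages API request. A minimal standard-library sketch that builds (but does not send) such a request; the endpoint and the `x-api-key` and `anthropic-version` headers follow Anthropic's public Messages API, while the prompt, the `max_tokens` value, and reading the key from an `ANTHROPIC_API_KEY` environment variable are illustrative choices. The official `anthropic` Python SDK wraps the same call as `client.messages.create(...)`.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(model: str, prompt: str, max_tokens: int = 1024,
                  api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a POST to the Messages API with the model ID in the JSON body."""
    body = {
        "model": model,  # e.g. "claude-fable-5" or "claude-opus-4-8"
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "x-api-key": api_key or os.environ.get("ANTHROPIC_API_KEY", ""),
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    return urllib.request.Request(
        API_URL, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


req = build_request("claude-fable-5", "Summarize the trade-offs of our caching layer.")
# To actually send it (requires a valid API key and network access):
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["content"][0]["text"])
```

Switching models is a one-string change: pass "claude-opus-4-8" instead of "claude-fable-5" and nothing else in the request moves.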

    What’s the difference between Claude Opus 4.8 and Fable 5?
    Fable 5 is significantly stronger on complex engineering tasks: SWE-bench Pro 80% vs 69.2%, and Senior Engineer benchmark 91 vs ~63 out of 100. The trade-off: Fable 5 costs twice as much ($10/$50 vs $5/$25 per MTok, input/output), has higher latency, and requires 30-day data retention. For most work, Opus 4.8 is the right choice.
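The 2x price gap is easier to weigh in dollar terms. A minimal sketch using the per-million-token prices quoted on this page; the workload size is hypothetical, and the 50% batch discount is listed here for Opus 4.8, so applying it to other models is an assumption to check against Anthropic's pricing page.

```python
# (input, output) USD per million tokens, as quoted on this page (June 2026).
PRICES_PER_MTOK = {
    "Fable 5": (10.0, 50.0),
    "Opus 4.8": (5.0, 25.0),
    "Haiku 4.5": (1.0, 5.0),
}
BATCH_DISCOUNT = 0.5  # listed for Opus 4.8's Batch API


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  batch: bool = False) -> float:
    """Estimated USD cost for a given token volume."""
    in_price, out_price = PRICES_PER_MTOK[model]
    cost = input_tokens / 1_000_000 * in_price + output_tokens / 1_000_000 * out_price
    if batch:
        cost *= 1 - BATCH_DISCOUNT
    return cost


# Hypothetical daily workload: 2M input tokens, 500K output tokens.
for model in PRICES_PER_MTOK:
    print(f"{model}: ${estimate_cost(model, 2_000_000, 500_000):.2f}")
print(f"Opus 4.8 (batch): ${estimate_cost('Opus 4.8', 2_000_000, 500_000, batch=True):.2f}")
```

For that workload, Fable 5 comes to $45.00 a day versus $22.50 for Opus 4.8 ($11.25 through the Batch API) and $4.50 for Haiku 4.5.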

    Changelog

    • June 12, 2026 — Added Claude Fable 5 (released June 9). Updated pricing table. Added platform availability table.
    • May 2026 — Claude Opus 4.8 and Sonnet 4.6 were the current top-tier models.

    This page is updated as Anthropic releases new models. Last verified: June 12, 2026. For API pricing, check console.anthropic.com — the canonical source.