Tag: Claude AI Pricing

  • Claude Team Plan Usage Limits: What Doubled in May 2026 (and What Didn’t)

    Claude Team Plan Usage Limits: What Doubled in May 2026 (and What Didn’t)

    Last refreshed: May 15, 2026

    The Claude Team plan’s usage limits changed significantly in May 2026. If you’re a Team subscriber and you haven’t noticed yet, you’re now getting substantially more capacity than you were in April — and the free tier got left behind entirely. Here’s exactly what changed, what you have now, and what it means in practice.

    Updated May 9, 2026

    Rate limits doubled for Team plan subscribers following Anthropic’s SpaceX Colossus 1 compute deal (announced May 6, 2026). Free plan excluded from all increases. This page reflects current limits.

    What Changed in May 2026: The SpaceX Rate Limit Increase

    On May 6, 2026, Anthropic announced a compute partnership with SpaceX, giving it access to SpaceX’s Colossus 1 data center. The practical result for paying subscribers came fast: rate limits doubled. Here’s the breakdown by tier:

    • Claude Code Pro and Max: 5-hour rate limits doubled
    • Team plan (all seats): 5-hour rate limits doubled
    • Seat-based Enterprise: 5-hour rate limits doubled
    • Tier 1 API customers: Max input tokens per minute increased 1,500%; max output tokens per minute increased 900%
    • Peak-hours throttling: Eliminated entirely for Pro and Max subscribers
    • Free plan: No change. Explicitly excluded from all increases.

    Source: Anthropic’s official announcement at anthropic.com/news/higher-limits-spacex.

    The 1,500% input token figure for Tier 1 API is the one that didn’t get much press coverage. That’s a 15× ceiling increase for API users who’ve been running agent pipelines and hitting hard walls. If you’ve been rate-limited during multi-step Claude Code runs, this is the change that matters most.

    Team Plan Seat Structure (Still Current)

    The seat types haven’t changed — just the capacity within them. The Team plan still offers two seat types that can be mixed within the same organization:

    Seat Type Annual Price Monthly Price Usage vs Pro Claude Code
    Standard $25/seat/month $30/seat/month 1.25× more per session No
    Premium $100/seat/month $125/seat/month 6.25× more per session Yes

    Both seat types benefit from the May 2026 doubling of the 5-hour rate limit window. A Premium seat’s 6.25× multiplier now applies to a higher baseline than it did before May 6.

    How the 5-Hour Rate Limit Window Works

    Anthropic uses a rolling 5-hour window for usage limits, not a daily reset. Here’s what that means practically:

    • Usage is measured across a rolling 5-hour window, not midnight-to-midnight
    • If you hit the limit, you wait for the oldest usage to roll off — not for a fixed reset time
    • Heavy burst usage depletes your window faster than spread-out usage
    • The May 2026 doubling means the ceiling within that window is now twice as high

    Peak-hours throttling — the extra restriction that kicked in during high-demand periods — is now eliminated for Pro and Max. Team plan benefits from the doubled limit floor; the throttling elimination is Pro and Max specific.

    Current Models Available on Team Plan

    As of May 2026, the Claude model lineup (verified from Anthropic’s official models page):

    Model API String Context Window
    Claude Opus 4.7 claude-opus-4-7 1M tokens
    Claude Sonnet 4.6 claude-sonnet-4-6 1M tokens
    Claude Haiku 4.5 claude-haiku-4-5-20251001 200K tokens

    Deprecation notice: Claude Sonnet 4 and Opus 4 (original 4.0-generation, 20250514 date-string model IDs) are being retired June 15, 2026. Update any API integrations before that date.

    What the Free Plan Doesn’t Get

    The May 2026 rate limit increase does not apply to free accounts. Anthropic explicitly excluded the free tier from all capacity increases tied to the SpaceX deal. Paid plans now have a substantially higher ceiling while the free ceiling stays the same. If you’re hitting limits regularly on the free tier, the May 2026 changes are pressure toward upgrading — not relief.

    Team Plan vs Pro: Which Limit Structure Fits You?

    • Individual power user: Pro ($20/month) with throttling eliminated is a strong option.
    • Team with Claude Code needs: Team Premium seats ($100/seat/month annually) give Claude Code access, 6.25× multiplier, and the doubled 5-hour window.
    • Team without Claude Code needs: Standard Team seats ($25/seat/month annually) for shared access at higher limits than individual Pro.

    Frequently Asked Questions

    Did the Team plan rate limits actually double in May 2026?

    Yes. Anthropic confirmed the 5-hour rate limit doubled for Team plan subscribers following the SpaceX Colossus 1 compute deal announced May 6, 2026. This applies to both Standard and Premium seats.

    Does peak-hours throttling elimination apply to Team plan?

    The peak-hours throttling elimination was announced specifically for Pro and Max subscribers. Team plan benefits from the doubled rate limit floor; throttling elimination was not announced for Team.

    What happens when I hit a Team plan usage limit?

    Claude notifies you that you’ve reached your usage limit. With the 5-hour rolling window, you can continue once older usage rolls off — you’re not waiting for a midnight reset. Burst usage depletes the window faster than spread usage over the same period.

    Are Claude Sonnet 4 and Opus 4 still available on Team?

    They remain available but retire June 15, 2026. After that date, the active lineup is Opus 4.7, Sonnet 4.6, and Haiku 4.5.

    Does the 1,500% Tier 1 API increase apply to Team plan API usage?

    The 1,500% input and 900% output token increases apply to Tier 1 API customers specifically. Team plan through claude.ai uses the doubled 5-hour window. Both benefits apply in their respective contexts if you’re a Tier 1 API customer and a Team subscriber.

    Is the free plan getting any rate limit improvements?

    No. The free plan was explicitly excluded from all rate limit increases in the May 2026 SpaceX announcement.

  • Claude AI Pricing: Every Plan Explained (Free, Pro, Max, Team, Enterprise)

    Claude AI Pricing: Every Plan Explained (Free, Pro, Max, Team, Enterprise)

    Looking for quick answers? The FAQ version covers every common question directly.

    → Claude Pricing FAQ

    Anthropic’s Claude pricing covers six tiers — Free, Pro, Max 5x, Max 20x, Team, and Enterprise — plus a separate pay-per-token API. Choosing the wrong path can cost you significantly more than necessary. Here’s what each option actually includes in 2026.

    What Are Claude’s Subscription Plans and Prices?

    Claude offers six tiers: Free ($0), Pro ($20/month), Max 5x ($100/month), Max 20x ($200/month), Team (from $25/seat/month), and Enterprise (custom pricing).

    Plan Price Best For
    Free $0 Casual exploration
    Pro $20/month Individual power users
    Max 5x $100/month Developers hitting Pro limits
    Max 20x $200/month Full-day heavy usage
    Team Standard $25/seat/month (annual) Collaborative teams
    Team Premium $100/seat/month (annual) Developer teams needing Claude Code
    Enterprise Custom Large orgs with compliance needs

    What Does the Claude Free Plan Include?

    The Free plan gives you access to Claude on web, iOS, Android, and desktop with no credit card required, subject to rolling usage limits.

    The Free plan gives you access to Claude on web, iOS, Android, and desktop with no credit card required. It includes text, image, and code generation plus web search. Usage limits are intentionally opaque — Anthropic doesn’t publish exact message caps — but limits reset on a rolling 5-hour window. The Free tier is designed for exploration, not sustained daily work.

    Is Claude Pro Worth $20 a Month?

    Pro delivers substantially more usage than Free, plus Claude Code, unlimited projects, the Research feature, and Google Workspace integration — sufficient for most individual developers and writers.

    Pro delivers substantially more usage than Free, Claude Code in the terminal, unlimited projects, the Research feature, file creation, code execution, and Google Workspace integration. Usage still has limits — Anthropic does not publish exact message counts, but heavy sessions will reach the ceiling — but it’s sufficient for most individual developers and writers. Annual billing brings the effective rate to $17/month.

    What Is the Difference Between Claude Max 5x and Max 20x?

    Max 5x ($100/month) gives you 5x Pro’s per-session usage; Max 20x ($200/month) gives you 20x — enough that rate limits stop being a practical concern for full-day development work.

    Max 5x provides 5x Pro’s per-session headroom at $100/month. Max 20x at $200/month delivers 20x Pro usage — enough that rate limits stop being a practical concern for most full-day development work. Both tiers include Claude Code, with access to Claude Opus 4.7 and Sonnet 4.6, and a 1M token context window.

    Extra usage is available on Pro, Max 5x, and Max 20x — when you hit your included limit, you can continue at standard API-rate billing with a spending cap you set.

    How Does Claude Team Plan Pricing Work?

    Team requires a minimum of 5 seats: Standard seats at $25/seat/month (annual) include collaboration features but not Claude Code; Premium seats at $100/seat/month add Claude Code for developers.

    Team requires a minimum of 5 seats and comes in two flavors. Standard seats at $25/seat/month (annual) include 1.25x more usage per session than Pro with a weekly reset, plus collaboration features, central billing, SSO, and Microsoft 365 and Slack integrations. Standard seats do not include Claude Code.

    Premium seats at $100/seat/month add Claude Code, making them the right choice for engineering team members. You can mix Standard and Premium seats within one Team plan — so non-technical staff get Standard while developers get Premium.

    Enterprise Plan — Custom Pricing

    Enterprise is for organizations with compliance, data residency, or governance requirements. It includes access to the full 1M token context window, HIPAA readiness, SAML SSO, domain capture, spend controls, and dedicated support. Based on user reports, pricing starts around $60/seat with a 70-seat minimum, putting the floor near $50,000 annually — contact Anthropic sales for exact figures. Training on customer data is disabled contractually at this tier.

    How Much Does the Claude API Cost Per Token?

    As of May 2026: Claude Sonnet 4.6 costs $3.00 input / $15.00 output per million tokens; Opus 4.6 costs $5.00 / $25.00; Haiku 4.5 costs $1.00 / $5.00.

    The API is entirely separate from subscription plans. You pay per million tokens (MTok) with no monthly minimum. Current rates as of May 2026 (verified from Anthropic’s official models page):

    • Claude Opus 4.7: $5.00 input / $25.00 output per MTok
    • Claude Sonnet 4.6: $3.00 input / $15.00 output per MTok
    • Claude Haiku 4.5: $1.00 input / $5.00 output per MTok

    Prompt caching cuts input costs by up to 90% for repeated context. The Batch API processes requests within 24 hours at a flat 50% discount on all tokens — ideal for content pipelines, data enrichment, and any workload where real-time responses aren’t required. As of March 2026, Anthropic eliminated long-context surcharges, so a 900K-token request costs the same per-token rate as a 9K one.

    May 2026 — Professional Services Pricing

    Managed Agents

    Token rates + $0.08/session-hour active runtime. No surcharge for Orchestration or Outcomes (public beta).

    Claude Security Beta

    Included in Enterprise during beta. Powered by Opus 4.7 ($5/$25 per MTok at API rates).

    Claude Mythos Preview

    $25/$125 per MTok. Invitation-only via Project Glasswing.

    → Full Pricing FAQ · Managed Agents pricing deep-dive

    Which Claude Plan Is Right for You?

    Start with Pro for individual use, move to Max 5x if you regularly hit limits, choose Max 20x for full-day heavy use, and use Team for groups of 5+ where Standard seats cover non-technical staff and Premium covers developers.

    Start with Pro if you’re an individual who hits Free limits regularly. Move to Max 5x if you’re a developer doing focused coding sessions. Max 20x makes sense if Claude is your primary tool throughout the workday. For teams, buy Standard seats for non-technical staff and Premium seats for developers who need Claude Code. If you’re building an application or automation that calls Claude programmatically, use the API — subscription plans don’t provide API credits and don’t reduce API costs.

    Claude API Pricing: Pay-Per-Token Rates for Every Model

    The Claude API is priced separately from claude.ai subscriptions. You pay per million tokens (MTok) consumed — input and output priced separately. There is no monthly minimum; you add credits and they deplete as you use the API.

    Model Input (per MTok) Output (per MTok) Context Window
    Claude Opus 4.7 $5.00 $25.00 1M tokens
    Claude Sonnet 4.6 $3.00 $15.00 1M tokens
    Claude Haiku 4.5 $1.00 $5.00 200K tokens

    Prompt caching reduces costs significantly for repeated context: cache write is 25% of base input price, cache read is 10%. The Batch API offers 50% off all models for non-time-sensitive work. For a full breakdown of how to minimize token spend, see Claude on a Budget: the Complete Guide.

    How Does Claude Pricing Compare to GPT-4o and Gemini 2.0?

    Model Input (per MTok) Output (per MTok)
    Claude Sonnet 4.6 $3.00 $15.00
    Claude Haiku 4.5 $1.00 $5.00
    GPT-4o (OpenAI) $2.50 $10.00
    Gemini 2.0 Flash $0.075 $0.30
    Gemini 2.5 Pro $1.25 $10.00

    Claude Sonnet 4.6 sits above GPT-4o on price but competes at or above it on reasoning tasks. Claude Haiku 4.5 is the cost-competitive option for high-volume pipelines. Gemini 2.0 Flash is significantly cheaper for commodity tasks; the trade-off is reasoning depth and context handling on complex documents.

    Frequently Asked Questions: Claude Pricing

    How much does Claude cost per month?

    Claude costs $0 (Free), $20/month (Pro), $100/month (Max 5x), or $200/month (Max 20x) for individual plans. Team plans start at $25/seat/month (annual, 5-seat minimum). API access is pay-per-token with no monthly minimum.

    Is there a free version of Claude?

    Yes. The Free plan gives access to Claude on web, iOS, Android, and desktop with no credit card required. Usage limits apply and reset on a rolling 5-hour window. The Free tier is suitable for light, exploratory use but not sustained daily work.

    What does Claude Pro include at $20/month?

    Pro includes approximately 5x the usage of Free, Claude Code in the terminal, unlimited projects, the Research feature, file creation, code execution, and Google Workspace integration. Annual billing brings the effective rate to $17/month.

    What is the cheapest way to use Claude?

    The Free plan is the cheapest at $0. For API access, Claude Haiku 4.5 at $1 input / $5 output per MTok is the most cost-efficient model. Combined with the Batch API (50% discount) and prompt caching, high-volume workflows can run at a fraction of standard API cost.

    May 2026: Managed Agents & Claude Security Pricing

    Added May 9, 2026

    Anthropic’s professional services now include Managed Agents and Claude Security. Pricing for both is API-based, not subscription-based.

    Claude Managed Agents Pricing

    Managed Agents pricing follows the standard API token rates for whichever Claude model you use inside the agent pipeline — there’s no separate Managed Agents surcharge on top of model costs. You pay for the tokens the models consume:

    Component Model Used Input / Output per MTok Status
    Multiagent Orchestration Your choice Model rate applies Public beta
    Outcomes Your choice Model rate applies Public beta
    Dreaming (memory refinement) Advisor model (short plan) + executor model Billed separately by role Developer preview

    The Dreaming advisor tool uses a short-plan generation (typically 400–700 tokens) at the advisor model’s rate, while the executor handles full output at its lower rate — keeping combined cost well below running the advisor model end-to-end. Use max_uses to cap advisor calls per request. Requires beta header: anthropic-beta: advisor-tool-2026-03-01. Docs: platform.claude.com/docs/en/managed-agents/dreams

    Claude Security Beta Pricing

    Claude Security is currently in public beta for Enterprise customers. Anthropic has not published a standalone per-scan or per-seat price for Claude Security Beta — access is included as part of Enterprise during the beta period. Underlying model is Claude Opus 4.7 ($5 input / $25 output per million tokens at API rates). For Enterprise pricing including Claude Security, contact Anthropic sales.

    Claude Mythos Preview Pricing (Project Glasswing)

    Claude Mythos Preview is not available via standard API or any subscription tier. Through Project Glasswing (invitation-only, defensive cybersecurity workflows): $25 per million input tokens, $125 per million output tokens. No self-serve access — contact Anthropic for Glasswing information at anthropic.com/glasswing.

    What to do next

    Now that you have the price — here’s how to actually run it

    Knowing the cost is step one. The harder questions are whether Managed Agents is the right architecture for your use case, how it compares to building on the raw API, and what a realistic monthly bill looks like at scale.


    Claude Pricing Calculator (Updated May 15, 2026)

    Use this tool to figure out which Claude plan actually fits your usage, what you’d pay on the API equivalent, and how the new June 15, 2026 Agent SDK billing change affects your costs. All rates verified against Anthropic’s official pricing documentation as of May 15, 2026.

    Tell us how you use Claude





    2 = roughly 30 hours of normal Claude use per month


    Output is typically ~25% of input for chat work


    $ value of unattended Claude work (cron jobs, scripts, GitHub Actions). 0 if you only chat.

    This calculator uses Anthropic’s published API rates as of May 15, 2026. Subscription pricing reflects current public plans. The Agent SDK monthly credit pool launches June 15, 2026 — Pro $20, Max 5x $100, Max 20x $200, Team Standard $20/seat, Team Premium $100/seat.

    Next Steps: What to Read After This

    You came here for pricing. Depending on what you actually need to do next, these are the right places to go:

    If you’re deciding whether to subscribe

    Is Claude Free? What You Actually Get Without Paying

    Walk through the free tier limits and decide if you need to pay at all.

    If you’re working at a team or company

    Claude Team Plan: When to Upgrade and What You Get

    Per-seat pricing, shared usage limits, admin controls, and when Team beats individual Pro.

    If you’re running automation or scripts

    Claude Agent SDK Dual-Bucket Billing: What Changes June 15, 2026

    The new Agent SDK credit pool, what it covers, and what to do before the cutover.

    If you want to actually start building

    Anthropic Console: The Complete Guide to Getting Started

    Set up an API key, navigate the console, and run your first request.

    If you’re a student looking to save

    Claude Student Discount: The Honest Guide to Getting Claude for Less

    No public student discount exists, but here are the legitimate paths to free or reduced access.

    If you’re choosing which model to use

    Claude Models Roadmap May 2026: Opus 4.7, Knowledge Cutoffs, the 1M Context Window

    The current lineup, what each tier costs, and what’s actually verified about Claude 5.

    For the broader operating philosophy of how Claude fits alongside the rest of a working AI stack, see The Three-Legged Stack: Why I Run Everything on Notion, Claude, and Google Cloud.

  • Claude Student Discount: The Honest Guide to Getting Claude for Less (May 2026)

    Claude Student Discount: The Honest Guide to Getting Claude for Less (May 2026)

    Last refreshed: May 15, 2026

    May 2026 Update — Free Plan Left Behind

    Anthropic’s May 2026 SpaceX rate limit increase (doubled 5-hour limits, eliminated peak-hour throttling) explicitly excluded the free plan. If you’re on free and hoping the latest compute expansion helped, it didn’t. This update explains what that means practically and what your actual options are.

    The May 2026 Update: Free Plan Was Explicitly Left Out

    When Anthropic announced doubled rate limits following the SpaceX Colossus 1 compute deal (May 6, 2026), they were specific: the increases apply to Pro, Max, Team, and seat-based Enterprise. The free plan was explicitly excluded.

    This matters for the student/budget conversation because:

    • Free plan rate limits stayed exactly where they were — no improvement
    • The gap between what free users can do and what paid subscribers can do just widened
    • Peak-hours throttling elimination applies to Pro and Max only — not free
    • Claude Code access remains unavailable on free

    If you were waiting to see if Anthropic would upgrade free tier limits alongside the major infrastructure expansion — the answer is no. The business decision is clear: compute improvements go to paying customers first, and the free tier stays constrained to drive conversion.

    What This Means If You’re a Student Trying to Use Claude Free

    You can still use Claude on the free tier. The model you access is capable — Anthropic hasn’t crippled it. What you’re constrained by is how much you can use it before hitting a limit, and how fast it responds during peak hours. Both of those constraints worsened relative to paid tiers in May 2026, because paid tiers got better while free stayed the same.

    For light usage — occasional questions, single documents, short projects — free is still viable. For sustained daily use, research workflows, or anything involving long documents and multiple sessions, free will slow you down in ways that affect your work.

    Quick Answer

    There is no official Claude student discount. Claude Pro costs $20/month for everyone. However, there are three legitimate paths to reduced or free access for students — and one of them covers most student use cases completely.

    The Three Ways Students Actually Get Claude for Less

    Best for most students
    Claude Free Tier
    Access to Claude Sonnet 4.6 with daily usage limits. Sufficient for essay drafting, coding help, summarization, and research. No credit card required. Limits reset daily.
    $0/month — no card needed
    University programs
    Claude for Education
    Anthropic has institutional agreements with select universities. If your school has a deal, access may be included in your student account. Check with your IT department or university library — coverage is expanding but not universal.
    Free if your school participates
    API credits
    GitHub Student Developer Pack
    GitHub’s student pack periodically includes credits for AI tools and APIs. Availability changes — check current offers at education.github.com. Requires a .edu email or institutional verification.
    Variable — check current offers
    Full access
    Claude Pro — $20/month
    5x more usage than free, priority access during peak hours, access to Claude Opus 4.7 for complex tasks. No student discount, but the free tier covers most student workloads without it.
    $20/month — no discount available

    What the Free Tier Actually Gets You

    Most students overestimate how much Claude Pro they need. The free tier handles:

    • Essay feedback and drafting assistance
    • Coding help — debugging, explaining concepts, generating boilerplate
    • Research summarization — paste an article or paper, get a structured summary
    • Math and problem-set walkthroughs
    • Study guide generation from lecture notes

    Where you’ll hit limits: long research sessions on a single topic, processing multiple long documents in the same conversation, or high-volume API access for a class project. For those cases, Claude Pro or API credits are the right call.

    Claude for Education — Current Status

    Anthropic’s education program is expanding but not yet universal. The fastest way to find out if your institution participates is to email your university’s IT department or check whether your library already has a Claude subscription that extends to students.

    Harvard, for example, replaced ChatGPT Edu with Claude in 2026 — so institutional deals are happening. If your school hasn’t moved yet, it may soon.

    What Claude Pro Is Actually Worth for Students

    If you’re doing intensive AI-assisted work — a thesis, a capstone project, a research paper that requires synthesizing many sources — $20/month is reasonable for a semester. Many students find they need it for two or three months out of the year and can drop to free for the rest.

    There’s no annual commitment required. You can subscribe month-to-month and cancel when the project is done.

    Bottom Line

    Start with the free tier. It covers the majority of student use cases. If you hit the limit consistently, check whether your university has an institutional deal before paying. If neither works for your project, Claude Pro at $20/month is month-to-month with no lock-in.

    For teams making a buying decision

    Evaluating Claude for a team — not just yourself?

    If you’re working through the plan decision for a business or agency, the calculus is different than individual use. We’ve run this math across 20+ client accounts and can tell you exactly where the API breaks even vs. subscription, and which plan structure makes sense for your workload.

    Get a plan recommendation →

  • Claude API Access from Singapore and China: What Actually Works in 2026

    Claude API Access from Singapore and China: What Actually Works in 2026

    Last refreshed: May 15, 2026

    If you are a developer in Singapore or China trying to use Claude, you have already noticed that the standard instructions don’t quite apply to you. The console.anthropic.com onboarding assumes a US billing address. The latency numbers assume you are pinging from a US data center. And for developers in mainland China, the direct API doesn’t work at all without a workaround.

    This is a practical guide to what actually works in 2026, written for the Asian developer market that is increasingly one of Claude’s most active audiences.

    Singapore: What Works Directly

    Singapore is a fully supported country for the Anthropic API. You can create an account at console.anthropic.com, add a payment method, and generate API keys with no restrictions. Most major international credit cards work without issues. If you are at a company with a Singapore entity, Anthropic accepts international wire transfers for enterprise contracts.

    Latency from Singapore to Anthropic’s US API endpoints typically runs 180–250ms round-trip depending on your ISP and the model you are calling. For most application use cases this is acceptable. For latency-sensitive real-time applications — voice interfaces, live coding assistants — you will want to route through a closer compute layer, which is where Vertex AI becomes relevant.

    Vertex AI: The Regional Solution for Both Markets

    Google Cloud’s Vertex AI hosts Claude models (Sonnet and Haiku tiers as of mid-2026) and has a data center in Singapore: asia-southeast1. This is the cleanest solution for developers in both Singapore and the broader Asia-Pacific region who want lower latency and enterprise-grade SLAs.

    The practical difference: instead of calling api.anthropic.com, you call a Vertex AI endpoint scoped to asia-southeast1. Your tokens are processed in Singapore, not Virginia. For regulated industries — fintech, healthcare, legal — this also means your data doesn’t leave the region, which is a compliance requirement in several Singapore regulatory frameworks (MAS TRM guidelines being the primary one).

    To get started with Claude on Vertex AI from Singapore:

    1. Create a GCP project and enable the Vertex AI API
    2. Request access to Claude models via the Vertex AI Model Garden (approval is typically same-day for Singapore accounts)
    3. Set your region to asia-southeast1 in all API calls
    4. Authenticate via a GCP service account rather than an Anthropic API key

    The pricing on Vertex AI is comparable to direct Anthropic API pricing, with GCP committed use discounts available at higher volumes.

    AWS Bedrock: The Other Regional Option

    Amazon Bedrock also hosts Claude models and has a Singapore region (ap-southeast-1). If your infrastructure is already on AWS, this is often the simpler path. The setup mirrors Vertex AI: enable Bedrock in your AWS console, request Claude model access, and specify the Singapore region in your SDK calls.

    The practical consideration: as of mid-2026, model availability on Bedrock sometimes lags behind the direct Anthropic API by a few weeks when new versions ship. If being on the latest Claude version immediately matters for your use case, the direct API or Vertex AI are more current.

    China: The Honest Situation

    The direct Anthropic API is not accessible from mainland China without a VPN. Console.anthropic.com is not blocked at the DNS level in the same way Google is, but connectivity is unreliable and payment processing from Chinese-issued cards through Stripe (Anthropic’s payment processor) fails for most users.

    The workarounds that Chinese developers are actually using in 2026:

    VPN plus international card. Developers with access to a VPN and an international payment card (Hong Kong or Singapore bank account) use the direct API without issues. This is the most common setup among individual developers and small teams.

    Hong Kong entity. Companies with a Hong Kong subsidiary or registered office use that entity for the Anthropic API account. Hong Kong is a fully supported region with no connectivity issues.

    Third-party API proxies. Several API aggregators operating out of Hong Kong and Singapore re-sell Anthropic API access to mainland China developers. Quality and terms vary significantly — vet carefully before using in production.

    Vertex AI via a non-China GCP account. Some development teams maintain a GCP account registered to a Singapore or Hong Kong entity, then call the Vertex AI Claude endpoint from within China via GCP’s global network. Google Cloud has limited but operational connectivity from within China through its global backbone. This is the most enterprise-appropriate solution for teams that need a compliant path.

    Latency Reality Check by Access Method

    Access Method From Singapore From China (with VPN)
    Direct Anthropic API (us-east) 180–250ms 300–500ms+
    Vertex AI (asia-southeast1) 30–60ms 150–300ms via GCP backbone
    AWS Bedrock (ap-southeast-1) 25–55ms Not directly accessible

    Latency figures are representative ranges based on typical ISP routing. Your numbers will vary.

    Payment and Billing Notes

    For Singapore developers on the direct Anthropic API: Visa, Mastercard, and American Express issued by Singapore banks work reliably. PayNow and local payment rails are not supported — you need an international card.

    For enterprise: Anthropic’s sales team handles invoiced billing for Singapore and other APAC markets. If you are spending meaningfully on the API, contact sales rather than running on a credit card — the invoiced route gives you better cost predictability and eliminates card limit friction.

    The Bottom Line

    If you are in Singapore, the direct API works and Vertex AI’s asia-southeast1 region gives you a lower-latency, compliance-friendly alternative worth evaluating for production workloads.

    If you are in mainland China, the direct API requires a workaround. A Hong Kong entity plus Vertex AI is the cleanest enterprise path. For individual developers, VPN plus an international card is the practical reality.

    The Asian developer market is using Claude at scale. The tooling is there — it just requires knowing which path to take from where you are sitting.

    Based in Singapore or Asia-Pacific?

    I can help you pick the right access path for your stack and region.

    Email me your setup — direct API, Vertex AI, or Bedrock — and I’ll give you a straight answer on what makes sense.

    Email Will → will@tygartmedia.com

  • Claude Context Window Size 2026: What 1 Million Tokens Actually Means

    Claude Context Window Size 2026: What 1 Million Tokens Actually Means

    Last refreshed: May 15, 2026

    Looking for quick answers? The FAQ version covers every common question directly.

    → Context Window FAQ

    Claude’s context window is one of those specs that sounds simple until you actually need to use it. “1 million tokens” means almost nothing without a frame of reference. This is the guide we wish existed when we started building on Claude — written from our own experience running it in production, with numbers pulled directly from Anthropic’s official documentation.

    Quick Definition

    The context window is Claude’s working memory for a conversation. It holds everything Claude can see and reason about at once: your messages, Claude’s responses, any documents you’ve shared, and system prompts. When the window fills up, earlier content drops out.

    Current Context Window Sizes by Model (May 2026)

    These numbers come directly from Anthropic’s official models page, fetched May 9, 2026. Model strings are exact API identifiers:

    Model API String Context Window Max Output
    Claude Opus 4.7 claude-opus-4-7 1,000,000 tokens 128,000 tokens
    Claude Sonnet 4.6 claude-sonnet-4-6 1,000,000 tokens 64,000 tokens
    Claude Haiku 4.5 claude-haiku-4-5-20251001 200,000 tokens 64,000 tokens

    Opus 4.7 and Sonnet 4.6 both have the full 1M token context window. Haiku 4.5 is 200K. The key difference between Opus 4.7 and Sonnet 4.6 in this table is the max output — Opus 4.7 can write up to 128K tokens in a single response, Sonnet 4.6 caps at 64K.

    What Does 1 Million Tokens Actually Hold?

    Token counts are an abstraction. Here’s what 1 million tokens translates to in practical terms:

    • About 750,000 words of English text — roughly 10 full-length novels, or 1,500 average blog posts
    • A full mid-size codebase — a 50,000-line Python project with comments fits comfortably
    • Hours of meeting transcripts — a full workday of recorded calls, transcribed, fits in one context window
    • Multiple large documents simultaneously — 10 research PDFs at 30 pages each, all in the same conversation
    • Long conversation histories — hundreds of back-and-forth exchanges before anything starts dropping off

    We’ve loaded entire Notion exports, full project histories, and multi-document research packs into a single Claude session. At 1M tokens, you’re unlikely to hit the ceiling in a normal working session. You hit it when you’re doing things like: loading your entire codebase plus documentation plus conversation history and then asking Claude to do a full architectural review.

    Context Window vs. Memory: What’s the Difference?

    This is where a lot of people get confused. The context window and memory are not the same thing:

    • Context window: What Claude can see right now, in this session. Once a session ends, it’s gone.
    • Memory (in claude.ai): A separate system that extracts and stores key information from past sessions. It surfaces relevant facts into future conversations as a snippet in the context.
    • Managed Agents memory stores: A developer-layer construct where agents maintain and update knowledge bases across sessions — distinct from both the context window and the consumer memory feature.

    The 1M token context window is your working memory for one session. It doesn’t persist. Memory systems are what carry information across sessions — but they work by injecting a summary into the context window of the new session, not by giving Claude access to the full history.

    Does a Bigger Context Window Mean Better Performance?

    Mostly yes, with one important nuance. More context means Claude has more information to reason about, which generally produces better outputs for tasks that benefit from full context — code reviews, document synthesis, long-form writing, multi-document comparison.

    The nuance: performance can degrade on tasks involving specific information buried deep in a very long context. This is sometimes called the “lost in the middle” problem — models tend to pay more attention to the beginning and end of a long context than the middle. Anthropic has worked on this with Claude’s architecture, and it performs well on long-context tasks, but it’s worth structuring important information at natural reference points rather than burying it in the middle of a 500-page document.

    How We Actually Use the 1M Token Window

    We run Claude in production for content operations, site management, and agentic coding workflows. Here’s where the 1M context window makes a concrete difference in our work:

    • Full site audits: Loading every post from a WordPress site (200+ posts worth of content) into one session for comprehensive SEO analysis — without having to chunk and re-prompt
    • Cross-session context: Pasting in long Notion briefings, prior session transcripts, and the current task in one go. The window is large enough that we don’t have to decide what to leave out.
    • Codebase-wide reasoning: In Claude Code, having the full project context means Claude can make changes that account for how files interact rather than reasoning only about the current file
    • Multi-document synthesis: Research projects where we load 10-15 source documents and ask Claude to synthesize across them — something that was impossible at 100K context windows

    The practical shift from 200K to 1M tokens wasn’t just “more room.” It changed what we could ask Claude to do in a single session.

    Context Window on the API: Batch Output Extension

    For API users: on the Message Batches API, Opus 4.7, Opus 4.6, and Sonnet 4.6 support up to 300K output tokens using the output-300k-2026-03-24 beta header. This is relevant for batch generation tasks where you need very long outputs — documentation generation, large codebases, book-length content.

    Frequently Asked Questions

    What is Claude’s context window in 2026?

    Claude Opus 4.7 and Claude Sonnet 4.6 both have 1,000,000 token (1M token) context windows as of May 2026. Claude Haiku 4.5 has a 200,000 token context window. These are the current generally available models.

    How many pages can Claude read at once?

    At 1M tokens, Claude can hold roughly 750,000 words of English text — equivalent to approximately 3,000 average pages. In practice, a typical 20-page PDF is roughly 10,000-15,000 tokens, so you could load 60-100 such documents in a single session before approaching the limit.

    Does the context window reset between messages?

    No — the context window accumulates across an entire conversation session. Every message you send and every response Claude gives adds to the total. The window doesn’t reset between individual messages; it resets when you start a new conversation.

    What happens when Claude hits the context window limit?

    When a conversation reaches the context window limit, earlier messages begin to drop out of the active context. Claude can no longer reference information from those earlier messages — it effectively forgets that part of the conversation. In the claude.ai interface, you’ll see a notification when you’re approaching the limit.

    Is the 1M context window available on the free plan?

    The model available to free plan users has access to the 1M context window. However, free plan usage limits mean long-context sessions hit rate limits faster than paid plans. The window is technically available, but sustained heavy use of it is more practical on paid tiers.

    What’s the difference between Claude Opus 4.7 and Sonnet 4.6 context windows?

    Both have the same 1M token input context window. The difference is max output: Opus 4.7 can generate up to 128,000 tokens in a single response; Sonnet 4.6 caps at 64,000 tokens. For most tasks this distinction doesn’t matter, but for very long document generation or large code outputs, Opus 4.7 has the higher output ceiling.

  • Claude Opus vs Sonnet vs Haiku: Model Comparison Guide (2026)

    Claude Opus vs Sonnet vs Haiku: Model Comparison Guide (2026)

    Last refreshed: May 15, 2026

    Model Accuracy Note — Updated May 2026

    Current flagship: Claude Opus 4.7 (claude-opus-4-7). Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Claude Opus 4.7 (claude-opus-4-7) is the current flagship as of April 16, 2026. Where this article references Opus 4.6 or earlier models, those references are historical. See current model tracker →. See current model tracker →

    Anthropic’s Claude model lineup in 2026 breaks down into three distinct tiers: Opus 4.7 for maximum capability, Sonnet 4.6 for the best balance of performance and cost, and Haiku 4.5 for speed and high-volume work. Picking the wrong model costs money or performance — sometimes both. This guide covers every meaningful difference so you can make the right call for your use case.

    Quick answer: Sonnet 4.6 handles 80–90% of tasks at 40% less cost than Opus. Use Opus 4.7 when you need maximum reasoning depth, the largest output window, or agentic coding at frontier quality. Use Haiku 4.5 when speed and cost are the priority and the task is straightforward.

    The Current Claude Model Lineup (April 2026)

    As of April 2026, Anthropic’s three recommended models are Claude Opus 4.7, Claude Sonnet 4.6, and Claude Haiku 4.5. All three support text and image input, multilingual output, and vision processing. They differ significantly in pricing, context window, output limits, and capability.

    Feature Opus 4.7 Sonnet 4.6 Haiku 4.5
    Input price $5 / MTok $3 / MTok $1 / MTok
    Output price $25 / MTok $15 / MTok $5 / MTok
    Context window 1M tokens 1M tokens 200K tokens
    Max output 128K tokens 64K tokens 64K tokens
    Extended thinking No Yes Yes
    Adaptive thinking Yes Yes No
    Latency Moderate Fast Fastest
    Reliable knowledge cutoff Jan 2026 Aug 2025 (reliable) Feb 2025 (reliable)

    Pricing is per million tokens (MTok) via the Claude API. Source: Anthropic Models Overview, April 2026.

    Claude Opus 4.7: When to Use It

    Opus 4.7 is Anthropic’s most capable generally available model as of April 2026. Anthropic describes it as a step-change improvement in agentic coding over Opus 4.6, with a new tokenizer that contributes to improved performance on a range of tasks. Note that this new tokenizer may use up to 35% more tokens for the same text compared to previous models — a cost consideration worth factoring in for high-volume workflows.

    Key differentiators for Opus 4.7 over the other two models:

    • 128K max output tokens — double Sonnet and Haiku’s 64K cap. This matters for generating long-form code, detailed reports, or complete document drafts in a single call.
    • 1M token context window — same as Sonnet 4.6, meaning Opus can process entire codebases or book-length documents in a single session.
    • Adaptive thinking — Opus 4.7 and Sonnet 4.6 both support adaptive thinking, which lets the model adjust reasoning depth based on task complexity.
    • Most recent knowledge cutoff — January 2026, versus August 2025 (reliable) for Sonnet and February 2025 (reliable) for Haiku.

    Opus does not support extended thinking — that capability lives on Sonnet 4.6 and Haiku 4.5. Extended thinking lets the model reason step-by-step before generating output, which is particularly useful for complex math, science, and multi-step logic problems.

    Use Opus 4.7 for: complex architecture decisions, large codebase analysis, multi-agent orchestration tasks, outputs that require more than 64K tokens, tasks demanding the latest possible knowledge, and any work where you need the absolute frontier of Anthropic’s reasoning capability.

    Skip Opus 4.7 for: routine content generation, customer support pipelines, high-volume classification or extraction, real-time applications requiring low latency, or any task where Sonnet scores within your acceptable quality threshold.

    Claude Sonnet 4.6: The Workhorse

    Sonnet 4.6 is the model Anthropic recommends as the best combination of speed and intelligence. Released in February 2026, it delivers a 1M token context window at $3 input / $15 output per million tokens — the same context window as Opus at 40% lower cost.

    Sonnet 4.6 also uniquely offers extended thinking, which Opus 4.7 does not. When extended thinking is enabled, Sonnet can perform additional internal reasoning before generating its response — useful for reasoning-heavy tasks like complex debugging, multi-step research, and technical problem-solving where chain-of-thought depth matters.

    For developers and teams using Claude Code, Sonnet 4.6 is the standard daily driver. It handles tool calling, agentic workflows, and multi-file code reasoning reliably, at a price point that makes heavy daily use economically viable.

    Use Sonnet 4.6 for: most production workloads, Claude Code sessions, long-document analysis, content generation, coding tasks, research synthesis, customer-facing applications, and any workflow requiring the 1M context window where Opus’s premium isn’t justified.

    Skip Sonnet 4.6 for: high-volume pipelines where Haiku’s lower cost is acceptable, simple classification or extraction tasks, or real-time applications where Haiku’s faster latency is required.

    Claude Haiku 4.5: Speed and Volume

    Haiku 4.5 is the fastest model in the Claude family and the most cost-efficient at $1 input / $5 output per million tokens. It has a 200K token context window — smaller than Opus and Sonnet’s 1M, but still substantial for most single-task work. It supports extended thinking but not adaptive thinking.

    The 200K context limit is the most important practical constraint. Most single-document, single-task workflows fit within 200K. Multi-file codebases, long books, or extended conversation histories that push past that threshold need Sonnet or Opus.

    Haiku 4.5 has the oldest knowledge cutoff of the three: February 2025. For tasks requiring awareness of events or developments from mid-2025 onward, Haiku won’t have that context baked in.

    Use Haiku 4.5 for: content moderation, classification pipelines, entity extraction, customer support triage, real-time chat interfaces, simple Q&A, high-volume API workflows where cost and speed dominate, and any task where quality requirements are modest.

    Skip Haiku 4.5 for: complex reasoning, large codebase analysis, tasks requiring recent knowledge (post-February 2025), multi-step agent workflows, or any output requiring more than 200K tokens of input context.

    Pricing: What the Numbers Actually Mean in Practice

    All three models price output tokens at 5x the input rate — a ratio that holds across the entire Claude lineup. This means verbose, long-form outputs cost significantly more than short, targeted responses. Minimizing generated output length is the highest-leverage cost optimization available before you touch model routing or caching.

    To put the pricing in concrete terms: generating one million output tokens (roughly 750,000 words of generated text) costs $25 on Opus, $15 on Sonnet, and $5 on Haiku. For input-heavy workloads like document analysis where you’re feeding in large amounts of text but getting shorter responses, the cost gap narrows.

    Three additional pricing levers apply across all models:

    • Prompt caching: Cuts cache-read input costs by up to 90% for repeated system prompts or documents. If your application reuses a large system prompt across many requests, caching is the single highest-impact cost reduction available.
    • Batch API: Provides a 50% discount for non-time-sensitive workloads processed asynchronously. Combine with prompt caching for up to 95% savings on qualifying workflows.
    • Model routing: Running a mix of Haiku for simple tasks, Sonnet for production workloads, and Opus for complex reasoning — rather than using one model for everything — can reduce total API costs by 60–70% without meaningful quality loss on the tasks that don’t require a flagship model.

    Context Windows: 1M Tokens vs. 200K

    Opus 4.7 and Sonnet 4.6 both offer a 1M token context window at standard pricing — no premium surcharge for extended context. For reference, 1 million tokens is roughly 750,000 words, enough to hold a large codebase, a full academic textbook, or months of business communications in a single conversation.

    Haiku 4.5 has a 200K token context window. That’s still roughly 150,000 words — sufficient for most single-document tasks, but it creates a hard ceiling for anything requiring multi-file code review, book-length document analysis, or lengthy conversation histories.

    If your workflow consistently requires more than 200K tokens of input, Sonnet 4.6 is the cost-efficient choice. Opus 4.7 is the right call only when the input load requires the additional reasoning capability Opus provides, not just the context window size — because Sonnet gets you the same 1M window at 40% lower cost.

    Extended Thinking vs. Adaptive Thinking

    These are two distinct features that appear together in the comparison table but serve different purposes.

    Extended thinking (available on Sonnet 4.6 and Haiku 4.5, not Opus 4.7) lets Claude perform additional internal reasoning before generating its response. When enabled, the model produces a “thinking” content block that exposes its reasoning process — step-by-step problem decomposition before the final answer. Extended thinking tokens are billed as standard output tokens at the model’s output rate. A minimum thinking budget of 1,024 tokens is required when enabling this feature.

    Adaptive thinking (available on Opus 4.7 and Sonnet 4.6, not Haiku 4.5) adjusts reasoning depth dynamically based on task complexity — the model allocates more reasoning for harder problems and less for simpler ones, without requiring explicit configuration.

    The practical implication: if you need transparent, controllable step-by-step reasoning that you can inspect and use in your application, Sonnet 4.6’s extended thinking is often the right tool — and at lower cost than Opus.

    Which Claude Model Should You Choose?

    The right framework for model selection in 2026 is to start with Sonnet 4.6 as your default and escalate selectively. Most production workloads — coding, writing, analysis, research, customer-facing applications — are well-served by Sonnet. Opus 4.7 earns its premium in specific scenarios: tasks requiring more than 64K output tokens, agent workflows demanding maximum reasoning depth, or applications where Anthropic’s latest knowledge cutoff is a meaningful factor.

    Haiku 4.5 belongs in any pipeline where you’ve identified tasks that don’t require Sonnet’s capability. High-volume routing, triage, classification, and real-time response scenarios are Haiku’s natural territory. Building a 70/20/10 routing split across Haiku 4.5, Sonnet 4.6, and Opus 4.7 — rather than using a single model for everything — is the standard approach for cost-efficient production deployments.

    Frequently Asked Questions

    What is the difference between Claude Opus 4.7, Sonnet, and Haiku?

    Opus is Anthropic’s most capable model, optimized for complex reasoning, large outputs, and agentic tasks. Sonnet offers a balance of capability and cost, handling most production workloads at lower price. Haiku is the fastest and cheapest option, suited for high-volume, lower-complexity tasks. All three share the same core Claude architecture and safety training.

    Is Claude Opus 4.7 worth the extra cost over Sonnet?

    For most tasks, no. Sonnet 4.6 handles the majority of coding, writing, and analysis work at 40% lower cost. Opus 4.7 is worth the premium when you need outputs longer than 64K tokens, maximum agentic coding capability, or the most recent knowledge cutoff (January 2026 vs. Sonnet’s August 2025).

    Which Claude model is best for coding?

    Sonnet 4.6 is the standard recommendation for most coding work, including Claude Code sessions. Opus 4.7 is preferred for large codebase analysis, complex architecture decisions, or multi-agent coding workflows where maximum reasoning depth is required. Haiku 4.5 can handle simple code edits and explanations at much lower cost.

    What is the Claude context window?

    Claude Opus 4.7 and Sonnet 4.6 both have a 1 million token context window — roughly 750,000 words of combined input and conversation history. Claude Haiku 4.5 has a 200,000 token context window. Context window size determines how much information Claude can hold and reference in a single conversation.

    Does Claude Opus 4.7 support extended thinking?

    No. Extended thinking is available on Claude Sonnet 4.6 and Claude Haiku 4.5, but not on Claude Opus 4.7. Opus 4.7 supports adaptive thinking instead, which dynamically adjusts reasoning depth based on task complexity.

    What is the cheapest Claude model?

    Claude Haiku 4.5 is the least expensive model at $1 per million input tokens and $5 per million output tokens. It is also the fastest Claude model, making it well-suited for high-volume, latency-sensitive applications.

    Can I use Claude through Amazon Bedrock or Google Vertex AI?

    Yes. All three current Claude models — Opus 4.7, Sonnet 4.6, and Haiku 4.5 — are available through Amazon Bedrock and Google Vertex AI in addition to the direct Anthropic API. Bedrock and Vertex AI offer regional and global endpoint options. Pricing on third-party platforms may vary from direct Anthropic API rates.

    Claude vs GPT-4o: Which Model Wins for Everyday Work?

    Claude Sonnet 4.6 and GPT-4o are the primary head-to-head competitors in 2026 for professional daily use. They price similarly ($3 vs $3.00 per MTok input) but perform differently depending on task type.

    Task Type Claude Sonnet 4.6 GPT-4o
    Long-document analysis (200K+ tokens) ✓ 1M context window 128K limit
    Multi-step reasoning Extended thinking available o1 series for reasoning
    Code generation Strong; Claude Code natively Strong; GitHub Copilot integration
    Instruction following Very consistent Consistent
    API cost (output) $15/MTok $10/MTok
    Context window 1M tokens 128K tokens

    The clearest differentiator is context window size. If your workflow involves analyzing full codebases, long contracts, or book-length documents in a single call, Claude Sonnet 4.6’s 1M token window eliminates chunking overhead that GPT-4o requires at 128K. For shorter tasks, either model performs comparably.

    Claude vs Gemini 2.5 Pro: How Do They Compare?

    Google’s Gemini 2.5 Pro competes directly with Claude Sonnet 4.6 on price and capability. Key differences:

    Feature Claude Sonnet 4.6 Gemini 2.5 Pro
    Input price $3.00/MTok $3.00/MTok (under 200K tokens)
    Output price $15.00/MTok $10.00/MTok
    Context window 1M tokens 1M tokens
    Extended thinking Yes Yes (2.5 Pro)
    Agentic coding Claude Code native Via Gemini API / IDX

    Gemini 2.5 Pro is cheaper on paper, especially for prompts under 200K tokens. Claude Sonnet 4.6’s advantage is instruction-following consistency on complex multi-step tasks and the Claude Code ecosystem for engineering teams already in the Anthropic stack.

    Which Claude Model Should You Use in Claude Code?

    Claude Code supports all three models. The recommended routing for most teams:

    • Sonnet 4.6 — Default daily driver for all coding tasks. Best cost-to-performance ratio. Extended thinking handles complex architecture decisions.
    • Opus 4.7 — Use for multi-agent orchestration, large codebase analysis across many files, or when output length exceeds 64K tokens (Opus has a 128K output cap vs 64K for Sonnet).
    • Haiku 4.5 — Use for high-frequency, low-complexity tasks: formatting, renaming, boilerplate generation, and pipeline steps where speed matters more than reasoning depth.

    The Max plan (available on claude.ai) unlocks 1M token context in Claude Code at no additional charge, which is the practical differentiator for large codebase work.

    Frequently Asked Questions: Claude Model Comparison

    What is the best Claude model in 2026?

    Claude Sonnet 4.6 is the recommended default for most tasks — it delivers 80-90% of Opus 4.7’s capability at 40% lower cost. Use Opus 4.7 when you need maximum reasoning depth, outputs longer than 64K tokens, or the most recent knowledge cutoff (January 2026). Use Haiku 4.5 for high-volume, speed-sensitive work.

    Is Claude Opus 4.7 better than Sonnet?

    Claude Opus 4.7 has a higher capability ceiling than Sonnet 4.6: larger output window (128K vs 64K tokens), the most recent knowledge cutoff, and stronger performance on complex agentic coding tasks. However, Sonnet 4.6 uniquely offers extended thinking which Opus does not support, and it costs 40% less. For most users, Sonnet 4.6 is the better practical choice.

    What is Claude Haiku 4.5 used for?

    Claude Haiku 4.5 is optimized for speed and cost efficiency at $1 input / $5 output per million tokens. It is best suited for high-volume pipelines, classification, metadata generation, social media content, and any task where fast response time matters more than maximum reasoning depth. It has a 200K token context window.

    Which Claude model supports extended thinking?

    Claude Sonnet 4.6 and Claude Haiku 4.5 both support extended thinking. Claude Opus 4.7 does not. Extended thinking allows the model to reason step-by-step internally before generating output, which improves performance on complex math, science, and multi-step logic problems.

  • How Claude Managed Agents Handles Idle Time (And Why It Matters for Your Bill)

    How Claude Managed Agents Handles Idle Time (And Why It Matters for Your Bill)

    Tygart Media Strategy
    Volume Ⅰ · Issue 04Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    The most counterintuitive thing about Claude Managed Agents pricing is what you don’t pay for. Most people, when they hear “$0.08 per session-hour,” mentally model a virtual machine running continuously. That’s the wrong mental model. Here’s the right one, and why it matters for your bill.

    The Core Distinction: Active vs. Idle

    Managed Agents session runtime only accrues while your session’s status is running. The session can exist — open, initialized, capable of continuing — without accumulating runtime charges when it’s not actively executing.

    The specific states that do not count toward your $0.08/hr charge:

    • Time spent waiting for your next message
    • Time waiting for a tool confirmation
    • Time waiting on an external API response your tool is calling
    • Rescheduling delays
    • Terminated session time

    This is a meaningful architectural decision by Anthropic. They’re billing on what actually taxes their compute — active execution — not on session existence or wall-clock time.

    Why This Is Different From How You Might Expect Billing to Work

    Compare three billing models:

    Virtual machine billing (what this is not): You pay for every hour the instance exists, whether it’s idle or saturated. A VM running 24/7 with 10% actual utilization still costs 24 hours/day.

    Lambda/function billing (closer analogy): AWS Lambda bills on execution duration and invocation count — you pay when code actually runs, not when a function is “available.” Idle Lambda functions cost nothing.

    Managed Agents billing (what this actually is): Closer to Lambda than VM. You pay $0.08 per hour of active execution. A session that runs for 2 hours of wall-clock time but has 90 minutes of waiting costs $0.08 × 1.5 hours = $0.12, not $0.08 × 2 hours = $0.16.

    A Real Scenario: The Human-in-the-Loop Agent

    Consider an agent that processes your inbox for action items and waits for your approval before sending replies. Wall-clock time: 4 hours open during your workday. Actual active execution: 20 minutes of processing across that 4-hour window, with the rest spent waiting for your review decisions.

    • VM billing equivalent: 4 hours × rate = significant charge
    • Managed Agents billing: 20 minutes × $0.08/hr = $0.027

    The difference is real. For interaction-heavy agents where the agent frequently waits for human decisions, the idle-time exclusion significantly reduces costs versus a naive per-hour model.

    A Real Scenario: The Autonomous Batch Agent

    Now consider an agent running a fully autonomous content pipeline — no human checkpoints, just continuous execution through a queue. Wall-clock time and active execution time are nearly identical because the agent never waits.

    • A 2-hour autonomous batch: 2 hours × $0.08 = $0.16

    Here, the idle-time model provides no benefit — the agent has no idle time. The billing is effectively equivalent to per-hour pricing because execution is continuous.

    Code Execution Containers Are Included

    One more billing nuance worth knowing: when your agent runs code, the execution happens in sandboxed Linux containers. These containers are not separately billed on top of session runtime. The $0.08/hr covers both the session runtime and the container execution. This is explicitly documented by Anthropic and represents meaningful savings if your agent is doing significant code execution work — you’re not paying twice.

    What This Means for Workload Design

    If you’re designing agent workflows and have the choice between architectures, the billing model creates a useful signal:

    • Agents that wait on humans: Metered billing is favorable — you only pay for the actual reasoning and execution time, not the human decision time
    • Fully autonomous agents: Billing approaches equivalent to per-hour rates — optimize these on token efficiency, not idle reduction
    • Scheduled batch agents: Natural fit — run when needed, terminate when done, no idle accumulation

    The 24/7 Agent Math

    For anyone doing the 24/7 always-on calculation: the maximum theoretical runtime exposure is 24 hrs × $0.08 × 30 days = $57.60/month in session fees. But a 24/7 agent with zero idle time is rare in practice. Agents that sleep between triggers, wait on external data, or hold for human decisions have meaningful idle windows that reduce the actual charge below the theoretical ceiling.

    Full monthly cost analysis: The Real Monthly Cost of Running Claude Managed Agents 24/7. Pricing reference: Complete Pricing Guide. All questions: FAQ Hub.

  • Claude Managed Agents — Every Question Answered (Complete FAQ 2026)

    Claude Managed Agents — Every Question Answered (Complete FAQ 2026)

    Tygart Media Strategy
    Volume Ⅰ · Issue 04Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    Everything people actually ask about Claude Managed Agents, answered straight. No preamble about “the exciting world of AI agents.” If you’re here, you already know why this matters — you just need answers.

    This page covers pricing, setup, capabilities, limits, comparisons, and the specific questions that don’t have obvious homes in Anthropic’s documentation. It updates as the beta evolves.

    Context

    Claude Managed Agents launched April 8, 2026 as a public beta. All answers reflect current documentation as of April 2026. Beta details change — verify specifics at platform.claude.com/docs.

    Pricing Questions

    What does Claude Managed Agents cost?

    Two charges: standard Claude API token rates (same as calling the Messages API directly) plus $0.08 per session-hour of active runtime. That’s the complete formula. See the complete pricing reference for worked examples by workload type.

    What exactly is a “session-hour” and when does it start billing?

    A session-hour is one hour of active session runtime — time when your session’s status is running. Billing is metered to the millisecond. It does not accrue during idle time, time waiting for your input, time waiting for tool confirmations, or after session termination.

    What’s included in the $0.08/session-hour charge?

    The session runtime charge covers Anthropic’s managed infrastructure: sandboxed code execution containers, state management, checkpointing, tool orchestration, error recovery, and scaling. You are not separately billed for container hours on top of session runtime.

    Does the $0.08/hr apply even if my agent is just waiting?

    No. Time spent waiting for your message, waiting for tool confirmations, or sitting idle does not accumulate runtime charges. Only active execution time counts.

    What does web search cost inside a Managed Agents session?

    $10 per 1,000 searches ($0.01 per search), billed separately from session runtime and token costs. This is the same rate as web search through the standard API.

    Are there volume discounts?

    Yes, negotiated case-by-case for high-volume users. Contact [email protected] or through the Claude Console.

    How does Managed Agents pricing compare to running my own agent infrastructure?

    The $0.08/session-hour is almost always cheaper than equivalent provisioned compute — but you trade infrastructure control and data locality for that simplicity. For a full comparison: Build vs. Buy: The Real Infrastructure Cost.

    What’s the real monthly cost if I run an agent 24/7?

    Maximum theoretical session runtime: 24 hrs × $0.08 × 30 days = $57.60/month. In practice, no production agent has zero idle time. Token costs become the dominant cost driver long before you hit the runtime ceiling. Detailed breakdown: The Real Monthly Cost of Running Claude Managed Agents 24/7.

    Setup and Access Questions

    How do I get access to Claude Managed Agents?

    Available to all Anthropic API accounts in public beta — no separate signup. You need the managed-agents-2026-04-01 beta header in your API requests. The Claude SDK adds this header automatically.

    Does it work with my existing API key?

    Yes. Same API key you’re already using for the Messages API. Same authentication. The beta header is the only new requirement.

    What three ways can I access Managed Agents?

    Via the Claude SDK (recommended — handles the beta header automatically), via direct API calls with the beta header, or via the Claude Console’s new Managed Agents section for no-code agent configuration and session tracing.

    Can I use Managed Agents through AWS Bedrock or Google Vertex AI?

    Managed Agents runs on Anthropic-managed infrastructure. This is distinct from Bedrock and Vertex AI deployments. Check Anthropic’s current documentation for multi-cloud availability status — this is an area of active development.

    Capability Questions

    What can Claude Managed Agents actually do?

    Run long autonomous sessions with persistent state, execute code in sandboxed Linux containers, use tools including web search and MCP servers, coordinate multiple Claude instances via Agent Teams, and maintain checkpoints for crash recovery. The session can last minutes or hours without you staying in the loop.

    What’s the difference between Agent Teams and subagents?

    Agent Teams coordinate multiple Claude instances with independent contexts, direct agent-to-agent communication, and a shared task list — suited for complex parallel tasks. Subagents operate within the same session as the main agent and only report results upward — more economical for sequential targeted tasks but less capable of true parallelism.

    Does it support MCP servers?

    Yes. MCP servers can be integrated as tool sources in Managed Agents sessions, extending what the agent can access and act on.

    How long can a session run?

    Anthropic’s documentation currently references session durations of minutes to hours. Claude Code’s longest autonomous sessions have reached 45 minutes. Managed Agents is architected for longer-running work. Check current documentation for specific session duration limits as the beta matures.

    What happened to Claude Code — is it the same as Managed Agents?

    No. Claude Code is a separate local coding workflow product. Anthropic’s docs explicitly note partners should not conflate the two. Managed Agents is a hosted API runtime service. Claude Code is a developer tool. Different products, different use cases, different billing.

    Rate Limit Questions

    What are the rate limits for Managed Agents?

    60 requests per minute for create endpoints; 600 requests per minute for read endpoints. Organization-level API limits still apply on top of these. For higher limits, contact Anthropic enterprise sales. Detailed breakdown: Claude Managed Agents Rate Limits Explained.

    Do standard Claude API rate limits still apply inside a session?

    Organization-level limits apply. The session runtime and create/read endpoint limits are Managed Agents-specific. If you’re running many parallel Agent Teams, model token throughput limits will become relevant.

    Comparison Questions

    How does Managed Agents compare to OpenAI’s Agents API?

    Both offer hosted agent infrastructure. Key differences: Managed Agents is Claude-native (no multi-model flexibility), sessions bill on runtime + tokens vs. OpenAI’s different pricing model, and lock-in dynamics differ. Full comparison: Claude Managed Agents vs. OpenAI Agents API.

    Should I use Managed Agents or the Claude Agent SDK?

    Use Managed Agents when you want Anthropic to host the runtime — less infrastructure work, faster to production. Use the SDK when you need tighter loop control, on-premise execution, or multi-cloud flexibility. Anthropic’s own migration docs draw this line clearly: SDK runs in your environment; Managed Agents runs in theirs.

    What companies are already using Managed Agents in production?

    Notion, Asana, Rakuten, Sentry, and Vibecode were launch partners. Rakuten deployed five enterprise agents within a week. Allianz is using Claude for insurance agent workflows. Anthropic’s run-rate from the agent developer segment exceeds $2.5 billion. How Rakuten did it in a week →

    Data and Security Questions

    Where does my data go when running in Managed Agents?

    Execution runs on Anthropic’s infrastructure. This is the explicit trade-off: you get managed infrastructure; they manage the compute. For companies with strict data sovereignty requirements, this is the key constraint to evaluate. On-premise or native multi-cloud deployment is not currently available.

    What are the sandboxing guarantees?

    Anthropic uses disposable Linux containers — “decoupled hands” in their terminology. Each container is a fresh sandboxed environment for code execution. State persistence is managed separately from the execution environment.

    Strategic Questions

    Is this a bet worth making?

    That depends on your switching cost tolerance. Lock-in is real: once your agents run on Anthropic’s infrastructure with their tools, session format, and sandboxing, switching providers isn’t trivial. The counter-argument: the infrastructure you’d otherwise build to match this is months of engineering. One developer’s reaction at launch was blunt: “there goes a whole YC batch.” That captures both the opportunity and the risk. Our take on why we’re staying our course →

    What does this mean for AI citation and visibility?

    Agents running on Anthropic’s infrastructure make decisions about what content to surface, cite, and synthesize. As agent workloads grow, being present in the knowledge sources agents draw from becomes a search strategy question in itself. What AI citation monitoring looks like →

  • Claude Managed Agents — Complete Pricing Reference + Dreaming Update (May 2026)

    Claude Managed Agents — Complete Pricing Reference + Dreaming Update (May 2026)

    Last refreshed: May 15, 2026

    May 2026 Update — Dreaming Feature + Beta Status

    Anthropic introduced Dreaming at Code w/ Claude (May 6, 2026) — a new Managed Agents capability where agents review their own session history overnight to improve future performance. Harvey (legal AI) reported a roughly 6× task completion rate increase after implementing it. Dreaming is developer-access preview only. Multiagent Orchestration and Outcomes are now in public beta. See the new Dreaming section below.

    What Is Claude Managed Agents? (Current Status, May 2026)

    Claude Managed Agents is Anthropic’s framework for long-running, stateful AI agents — agents that can maintain context across sessions, hand off between sub-agents, and now, improve themselves by reviewing their own work history. Here’s the current status of each component:

    Component Status Who Has Access
    Multiagent Orchestration Public Beta All API developers
    Outcomes Public Beta All API developers
    Dreaming Developer Preview Selected developers only

    Dreaming: The Feature the Press Mostly Missed

    Announced at Code w/ Claude on May 6, 2026, Dreaming is a Managed Agents capability that lets agents review and reorganize their own memory between sessions. The mechanism:

    1. After a session ends, the agent reads its existing memory store alongside the session transcripts
    2. It produces a new, reorganized memory store: duplicates merged, stale entries replaced, new patterns surfaced
    3. The next session starts with a higher-quality knowledge base — capturing insights no single session could hold

    This is meaningfully different from simply persisting conversation history. The agent isn’t just remembering what happened — it’s synthesizing what it learned. Think of it as the difference between taking notes and actually reviewing and reorganizing your notes the next morning.

    The Harvey Result

    Harvey, the legal AI company, reported approximately a 6× task completion rate increase after implementing Dreaming in their Managed Agents workflow. Harvey’s use case — complex legal research that spans multiple sessions with evolving context — is exactly the kind of work Dreaming was designed for. Sessions build on each other rather than starting fresh each time.

    Dreaming is developer-access preview as of May 2026. Docs: platform.claude.com/docs/en/managed-agents/dreams.

    What Dreaming Is Not

    A few clarifications worth making explicit:

    • Dreaming is not available to end users — it’s a developer-layer capability requiring implementation
    • It’s not persistent memory in the claude.ai chat interface
    • It’s not available to free or standard Pro subscribers through any interface
    • It’s a developer preview, not GA — expect it to evolve before full release

    Our Take: Why This Architecture Matters

    We run Managed Agents in our own Cowork workflows. The Dreaming announcement is the first time Anthropic has shipped something that resembles how expert human knowledge actually compounds over time — not by accumulating raw notes, but by periodically synthesizing and reorganizing what’s been learned into a cleaner structure.

    The Harvey 6× result is a real-world data point from a production legal AI workflow. That’s not a benchmark number — it’s a deployed system showing measurable improvement from session-to-session memory refinement. Whether that 6× figure holds across different use cases is unknown, but the direction of the effect is the signal: agents that learn from their own history outperform agents that don’t.

    For non-developer users watching this space: Dreaming is the preview of what agentic AI will look like when it becomes mainstream. The groundwork being laid now in developer preview will eventually surface in subscription-tier products.

    Model Accuracy Note — Updated May 2026

    Current flagship: Claude Opus 4.7 (claude-opus-4-7). Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Claude Opus 4.7 (claude-opus-4-7) is the current flagship as of April 16, 2026. Where this article references Opus 4.6 or earlier models, those references are historical. See current model tracker →. See current model tracker →

    Tygart Media Strategy
    Volume Ⅰ · Issue 04Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    You opened this tab because you need a number you can actually use. Not a vibe, not “it depends.” A real pricing breakdown you can put in a spreadsheet, a budget request, or a Slack message to your CTO.

    This is that page. Every pricing variable for Claude Managed Agents in one place, verified against Anthropic’s current documentation as of April 2026. Bookmark it. The beta will update; so will this.

    Quick Reference: The Formula

    Total Cost = Token Costs + Session Runtime ($0.08/hr) + Optional Tools
    Session runtime only accrues while status = running. Idle time is free.

    The Two Cost Dimensions

    Claude Managed Agents bills on exactly two dimensions: tokens and session runtime. Every pricing question you have collapses into one of these two buckets.

    Dimension 1: Token Costs

    These are identical to standard Claude API pricing. You pay the same rates you’d pay calling the Messages API directly. No Managed Agents markup on tokens. Current rates for the models most commonly used in agent work:

    • Claude Sonnet 4.6: ~$3/million input tokens, ~$15/million output tokens
    • Claude Opus 4.7: higher rates apply — check platform.claude.com/docs/en/about-claude/pricing for current figures
    • Prompt caching: same multipliers as standard API — cache hits dramatically reduce input token costs on long sessions with stable system prompts

    The implication: a token-heavy agent with a large system prompt that runs the same context repeatedly benefits significantly from prompt caching, and that benefit carries over unchanged into Managed Agents.

    Dimension 2: Session Runtime — $0.08/Session-Hour

    This is the Managed Agents-specific charge. You pay $0.08 per hour of active session runtime, metered to the millisecond.

    The critical word is active. Runtime only accrues while your session’s status is running. The following do not count toward your bill:

    • Time spent waiting for your next message
    • Time waiting for a tool confirmation
    • Idle time between tasks
    • Rescheduling delays
    • Terminated session time

    This is not how you’d bill a virtual machine. It’s closer to how AWS Lambda bills — you pay for execution, not reservation. An agent that “runs” for 8 hours but spends 6 of those hours waiting on human input has a very different bill than one running continuous autonomous loops.

    Optional Tool Costs

    Web Search: $10 per 1,000 Searches

    If your agent uses web search, each search costs $10/1,000 — that’s $0.01 per search. For most agents, this is negligible. For a research agent running hundreds of searches per session, it becomes a line item worth modeling separately.

    Code Execution: Included in Session Runtime

    Code execution containers are included in your $0.08/session-hour charge. You’re not separately billed for container hours on top of session runtime. This is explicitly stated in Anthropic’s docs and represents meaningful savings versus provisioning your own compute.

    Worked Cost Examples

    Example 1: Daily Research Agent

    Runs once per day. 30 minutes of active execution. Processes 10 documents, outputs a summary report. Moderate token volume.

    • Session runtime: 0.5 hrs × $0.08 = $0.04/day (~$1.20/month)
    • Tokens (estimate): 50K input + 5K output with Sonnet 4.6 = ~$0.23/run (~$7/month)
    • Total: ~$8–10/month

    Example 2: Weekly Batch Content Pipeline

    Runs 3x/week. 2-hour active sessions. Processes multiple documents, generates structured outputs.

    • Session runtime: 2 hrs × $0.08 × 12 sessions/month = $1.92/month
    • Tokens: depends on content volume — typically $10–40/month
    • Total: ~$12–42/month

    Example 3: Customer Support Agent (Business Hours)

    Active during business hours, handling tickets. 8 hours/day active, 5 days/week.

    • Session runtime: 8 hrs × $0.08 × 22 days = $14.08/month in runtime
    • Tokens: highly variable by ticket volume — the dominant cost driver at scale
    • Runtime cost alone: ~$14/month — tokens are likely 5–20x this depending on volume

    Example 4: 24/7 Always-On Agent

    The maximum theoretical runtime exposure. Continuous operation, no idle time.

    • Session runtime: 24 hrs × $0.08 × 30 days = $57.60/month
    • In practice, no agent has zero idle time — real cost will be lower
    • Token costs at this scale become the dominant factor by a wide margin

    Anthropic’s Official Example (from their docs)

    A one-hour coding session using Claude Opus 4.7 consuming 50,000 input tokens and 15,000 output tokens: session runtime = $0.08. With prompt caching active and 40,000 of those tokens as cache reads, the token costs drop significantly. The runtime charge stays flat at $0.08 regardless of caching.

    What’s Not Billed in Managed Agents

    A few things that might seem like costs but aren’t:

    • Infrastructure provisioning: Anthropic handles hosting, scaling, and monitoring at no additional charge
    • Container hours: Explicitly not separately billed on top of session runtime
    • State management and checkpointing: Included in the session runtime charge
    • Error recovery and retry logic: Anthropic’s infrastructure problem, not yours

    Rate Limits

    Managed Agents has specific rate limits separate from standard API limits:

    • Create endpoints: 60 requests/minute
    • Read endpoints: 600 requests/minute
    • Organization-level limits still apply
    • For higher limits, contact Anthropic enterprise sales

    How to Access Managed Agents Pricing

    Managed Agents is available to all Anthropic API accounts in public beta. No separate signup, no premium tier gate. You need the managed-agents-2026-04-01 beta header in your API requests — the Claude SDK adds this automatically.

    For high-volume agent applications, Anthropic’s enterprise sales team negotiates custom pricing arrangements. Contact them at [email protected] or through the Claude Console.

    The Pricing Signals Worth Noting

    Anthropic recently ended Claude subscription access (Pro/Max) for third-party agent frameworks, requiring those users to switch to pay-as-you-go API pricing. This signals a deliberate strategy: consumer subscriptions are for human-paced interactions; agent workloads route through the API. The $0.08/session-hour rate exists in that context — it’s infrastructure pricing for compute that runs beyond human attention spans.

    The session-hour model also signals something about Anthropic’s infrastructure cost structure. They’re pricing on active execution time because that’s what actually taxes their systems. Idle sessions don’t cost them much; active agents do. The billing model follows the actual resource consumption pattern.

    Frequently Asked Questions

    Is the $0.08/session-hour charge in addition to token costs, or does it replace them?

    In addition to. You pay both: standard token rates for all input and output tokens, plus $0.08 per hour of active session runtime. They’re separate line items.

    Does prompt caching work in Managed Agents sessions?

    Yes. Prompt caching multipliers apply identically to Managed Agents sessions as they do to standard API calls. If your agent has a large, stable system prompt, caching it can significantly reduce input token costs.

    What happens if my session crashes? Am I billed for the crashed time?

    Runtime accrues only while status is running. Terminated sessions stop accruing. Anthropic’s infrastructure handles checkpointing and crash recovery — the session state is preserved even if the session terminates unexpectedly.

    Can I use Managed Agents on the free API tier?

    Managed Agents is available to all Anthropic API accounts in public beta, but standard tier access and rate limits apply. Free API tier users receive a small credit for testing.

    How does this compare to running agents on my own infrastructure?

    See our full breakdown: Build vs. Buy: The Real Infrastructure Cost of Claude Managed Agents. Short version: the $0.08/hour is almost certainly cheaper than provisioning and maintaining equivalent compute, but you trade control and data locality for that simplicity.

    Are there volume discounts?

    Volume discounts are available for high-volume users but negotiated case-by-case. Contact Anthropic enterprise sales.

    Does web search billing count against the $10/1,000 rate if the search returns no results?

    Anthropic’s current docs don’t explicitly address failed searches. Treat any triggered search as billable until confirmed otherwise.

    For the full session-hour math worked out by workload type, see: Claude Managed Agents Pricing, Decoded: What a Session-Hour Actually Costs You. For the build-vs-buy infrastructure comparison: Build vs. Buy: The Real Infrastructure Cost. For enterprise deployment patterns: Rakuten Stood Up 5 Enterprise Agents in a Week.