Tag: Claude Code

  • Claude MCP Token Cost Reality: Why Your Model Context Protocol Setup Is Burning 18,000 Tokens Per Turn

    Last refreshed: May 15, 2026

    If you’ve ever connected a few Model Context Protocol (MCP) servers to Claude Code and watched your usage limit drain faster than the work you actually did would explain, you’re not imagining it. There’s a real, documented, and sometimes substantial token cost to wiring MCP servers into your Claude environment — and most setup guides don’t mention it.

    The short version: each MCP server you connect injects its complete tool schema into the context of every message you send. Multiple servers stack. The total overhead can range from a few thousand tokens for a single server up to roughly 18,000 tokens per turn when you’re running a typical multi-server developer setup. Anthropic’s own engineering team has acknowledged this in a public GitHub issue and shipped optimizations to reduce it.

    This article walks through where the overhead actually comes from, how to measure your own setup, what Anthropic has changed in 2026 to ease the cost, and the concrete steps you can take to keep MCP useful without burning through your token budget.

    What MCP actually is, briefly

    The Model Context Protocol is an open standard created by Anthropic that lets Claude (and other LLMs that adopt the standard) connect to external tools and data sources through a common interface. Instead of writing a custom integration for every API or database you want Claude to access, you point Claude at an MCP server, and the server exposes its capabilities — file access, Slack messages, GitHub repos, database queries — in a format Claude can use.

    It’s a real productivity unlock. It’s also why the token math gets complicated.

    Where the token cost comes from

    When you connect an MCP server to Claude Code (or any MCP-aware client), three things happen on every message:

    1. Tool schema injection. Every tool the server exposes — every name, every description, every parameter definition — is included in the context Claude sees. A Slack MCP server with 10–15 tools typically adds about 2,000 tokens. A GitHub server is heavier. A custom internal-tooling server with verbose descriptions can run 5,000–8,000 tokens on its own.

    2. Tool-use system prompt overhead. Anthropic’s documentation confirms that whenever tools are present in a request, a special system prompt is automatically prepended that teaches the model how to use tools. For Claude 4.x models with tool_choice: auto, that’s an additional 346 tokens per request. The bash tool adds 245. The text editor tool adds 700. The computer-use tool adds 735 plus a 466–499 token system prompt extension.

    3. Stateless re-sending. Each message in a conversation is a fresh API request that includes the full conversation history plus the full tool schema. Claude does not “remember” your tools from the last turn the way a human remembers a colleague’s job description. Every turn pays the schema cost again.

    That’s the math. Now multiply by the number of MCP servers you have connected. A developer running Slack + GitHub + a database connector + an internal custom server can easily land in the 15,000–20,000 tokens-per-turn range — and that’s before you’ve typed your actual question.
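A minimal sketch of that arithmetic in Python. The Slack figure and the fixed overheads are the ones quoted above; the GitHub, database, and internal-server sizes are placeholder assumptions, not measurements, so swap in your own numbers.

```python
# Fixed overheads quoted above (tokens per request).
TOOL_USE_SYSTEM_PROMPT = 346  # Claude 4.x with tool_choice: auto
BASH_TOOL = 245
TEXT_EDITOR_TOOL = 700

# Per-server schema sizes in tokens. Only the Slack figure comes from the
# text; the other three are hypothetical values chosen for illustration.
SERVER_SCHEMA_TOKENS = {
    "slack": 2_000,
    "github": 4_500,    # assumption: "heavier" than Slack
    "database": 3_000,  # assumption
    "internal": 6_500,  # assumption: inside the 5,000-8,000 range
}


def per_turn_overhead(servers, builtin_tools=(BASH_TOOL, TEXT_EDITOR_TOOL)):
    """Return the tokens spent on tool definitions for a single request."""
    schema_tokens = sum(SERVER_SCHEMA_TOKENS[name] for name in servers)
    return schema_tokens + TOOL_USE_SYSTEM_PROMPT + sum(builtin_tools)


total = per_turn_overhead(["slack", "github", "database", "internal"])
print(total)  # 17291 with the placeholder sizes above
```

With these placeholder sizes the four-server setup lands a little over 17,000 tokens per turn, which is the neighborhood the headline figure describes.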

    The 18,000-token figure, sourced

    The “up to 18,000 tokens per turn” number comes from a combination of public sources verified May 15, 2026:

    • Anthropic’s own GitHub repo for Claude Code, issue #3406, titled “Built-in tools + MCP descriptions load on first message causing 10–20k token overhead.” Anthropic engineers acknowledged the issue and have shipped progressive optimizations against it.
    • Independent analysis by MindStudio measuring real Claude Code sessions with multiple MCP servers attached.
    • Anthropic’s official Claude Code documentation on cost management explicitly recommends running /mcp to inspect connected servers and disabling unused ones to control token consumption.

    The exact number for your setup will be different. The shape of the problem is the same.

    Why this matters more than it looks

    Claude’s standard context window is 200,000 tokens. Losing 18,000 of those to tool definitions before you start typing represents about 9% of your effective working space. That’s a real ceiling cost — but it’s not the part that hurts most.

    The part that hurts is the cumulative bill. If you’re on a Claude subscription with a usage limit, every turn through Claude Code is paying the full schema cost again. A workflow that takes 30 turns of back-and-forth burns 540,000 tokens worth of tool definitions across that session — even if the tool descriptions never change. On the API at standard Sonnet 4.6 rates, that’s about $1.62 in pure schema overhead per session, before any of the actual work gets billed.
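The session arithmetic can be reproduced with a short Python sketch. It assumes a flat 18,000 tokens of schema per turn and an input rate of $3.00 per million tokens for Sonnet 4.6, and it ignores prompt caching, which would lower the dollar figure.

```python
def schema_overhead(tokens_per_turn, turns, usd_per_million_input):
    """Return (total schema tokens, cost in USD) for one session."""
    total_tokens = tokens_per_turn * turns
    cost_usd = total_tokens * usd_per_million_input / 1_000_000
    return total_tokens, cost_usd


session_tokens, session_usd = schema_overhead(18_000, 30, 3.00)
print(session_tokens, round(session_usd, 2))  # 540000 1.62
```

Changing the turn count or the per-turn overhead shows how quickly the total moves: the cost is linear in both.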

    Multiply by a team of engineers running Claude Code daily, and the overhead can become one of the largest line items in your token spend.

    What Anthropic has changed in 2026

    Anthropic has shipped two meaningful optimizations against MCP token bloat over the past few months:

    Deferred tool loading. In recent Claude Code releases, MCP tool definitions are no longer all loaded into context at the start of a session by default. Tool names enter context, but the full schemas only load when Claude actually invokes a particular tool. This is a substantial improvement for sessions where you have many tools available but only use a few.

    Tool Search. A new built-in search mechanism lets Claude discover relevant MCP tools on demand rather than carrying them all in context. One independent measurement reported Claude Code MCP context falling from roughly 51,000 tokens to 8,500 tokens by using Tool Search instead of full upfront loading. Those two figures work out to a reduction of about 83%, so the 46.9% that the same report headlines must be measuring something other than the ratio of those two numbers.

    These optimizations help, but they don’t make the overhead zero. The baseline cost of having any MCP server connected at all is real, and you still pay it on every turn even with deferral active.

    How to measure your own MCP token cost

    Two practical methods work for most setups:

    Method 1 — The /mcp command. In Claude Code, run /mcp to see every server currently connected. For each one, check how many tools it exposes. Anthropic’s documentation explicitly recommends this as the first step to controlling MCP costs.

    Method 2 — Token-count delta. Send a single message in Claude Code with no MCP servers connected and note the input token count from the API response. Reconnect your MCP servers one at a time. The delta in input tokens between configurations is the per-turn cost of each server. This is the most precise way to know your own number.

    Anything north of about 8,000 tokens per turn in pure MCP overhead is worth optimizing. North of 15,000 is a flag.
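Method 2 reduces to subtracting consecutive readings. A small Python sketch follows, assuming you have already noted the input token count for each configuration by hand; the readings below are made-up illustrative numbers, and the function names are hypothetical.

```python
def per_server_deltas(readings):
    """readings: ordered (label, input_tokens) pairs, baseline first,
    with exactly one more server connected at each later step."""
    deltas = {}
    for (_, previous), (label, current) in zip(readings, readings[1:]):
        deltas[label] = current - previous
    return deltas


def classify(total_overhead):
    """Apply the rule-of-thumb thresholds from the text."""
    if total_overhead > 15_000:
        return "flag"
    if total_overhead > 8_000:
        return "worth optimizing"
    return "ok"


# Hypothetical readings from four test messages.
readings = [
    ("baseline", 1_200),
    ("slack", 3_200),
    ("github", 7_900),
    ("internal", 14_400),
]

deltas = per_server_deltas(readings)
total = sum(deltas.values())
print(deltas, total, classify(total))
```

Adding servers one at a time matters here: if two servers are connected in the same step, their costs cannot be separated from a single delta.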

    Concrete steps to control MCP token cost

    • Disable MCP servers you aren’t actively using. The single highest-leverage move. If you connected a server two weeks ago for one experiment and never went back to it, every turn you’ve taken since has been paying for it.
    • Prefer CLI tools over MCP servers when both exist. Anthropic’s own cost-management guidance notes that tools like gh, aws, gcloud, and sentry-cli remain more context-efficient than equivalent MCP servers because they don’t add per-tool listing overhead. Claude can simply invoke them via the bash tool.
    • Use MCP gateways for large server counts. If you genuinely need many tools available, gateway products (Maxim, Milvus-backed setups, others) consolidate tools and surface only relevant ones per query, cutting net overhead substantially.
    • Audit a long CLAUDE.md. Long project-level CLAUDE.md files compound the per-turn baseline. Treat CLAUDE.md as an asset that’s expensive to keep verbose.
    • Watch for context compounding. In long Claude Code sessions, conversation history grows alongside the tool schema cost. If you’re running a workflow longer than 20 turns, periodically clear context (/clear) to reset the per-turn cost to baseline.

    Frequently Asked Questions

    Does every MCP server cost 18,000 tokens?

    No. The 18,000-token figure is for a typical multi-server setup with several connected servers and built-in tools active. A single small MCP server (5–10 tools, concise descriptions) might only add 1,500–3,000 tokens. The cost scales with the number of servers and the verbosity of their tool definitions.

    Why does Claude reload the tool definitions every turn?

    The Claude API is stateless. Every message is a fresh API request containing the full conversation history and the full tool schema. The model has no memory between requests, so the schema must be present every time tools could be used. Recent deferred-loading optimizations reduce this for unused tools, but anything Claude actually needs still loads each turn.

    How do I see what’s loaded in my Claude Code environment?

    Run /mcp in Claude Code to list every connected MCP server and its tool count. To check the actual token cost, send a test message and inspect the input token count returned by the API.

    Are CLI tools really cheaper than MCP servers?

    Yes, for tools that have both options. CLI tools accessed via the bash tool only add the bash tool’s 245-token overhead. An equivalent MCP server adds its full tool schema for every tool it exposes. For tools you use frequently, MCP can still be worth it for the structured interface; for tools you use rarely, CLI is more efficient.

    Does this affect Claude on the web (claude.ai) too?

    Web Claude does not use the same MCP server-connection model as Claude Code. The MCP token-overhead pattern primarily affects Claude Code, custom Agent SDK applications, and other developer-facing clients where you wire in MCP servers directly.

    Will this get better in future Claude releases?

    Likely. Anthropic has already shipped deferred tool loading and Tool Search in 2026, both of which materially reduce the per-turn overhead for unused tools. The architectural baseline (tools must be present in context to be invoked) is unlikely to change, but the practical cost should keep dropping as the deferred-loading optimizations mature.

    How we sourced this

    Sources reviewed May 15, 2026:

    • Anthropic GitHub: anthropics/claude-code issue #3406, “Built-in tools + MCP descriptions load on first message causing 10-20k token overhead” (primary source for the overhead figure and Anthropic acknowledgment)
    • Anthropic Claude Code documentation: Connect Claude Code to tools via MCP and Manage costs effectively (primary source for /mcp command and CLI vs. MCP guidance)
    • Anthropic Pricing Documentation: tool-use system prompt token counts, bash/text-editor/computer-use overheads (primary source for the per-tool fixed costs)
    • Independent analysis: MindStudio (multiple Claude Code MCP measurements), Joe Njenga’s Tool Search 51K→8.5K measurement, Maxim and Scott Spence on optimization patterns (Tier 2 confirming sources)

    Token-cost numbers in this article are accurate as of May 15, 2026. Anthropic is shipping MCP optimizations regularly, so the practical overhead may be lower in your environment than what’s described here.

  • Claude Agent SDK Dual-Bucket Billing: What Changes June 15, 2026 (And Why It Matters)

    Last refreshed: June 9, 2026

    If you’ve been running Claude Code’s claude -p command in production, kicking off background jobs through the Claude Agent SDK, or wiring the Agent SDK into a third-party app, the way you pay for that work is about to change.

    Starting June 15, 2026, Anthropic is splitting Claude subscription billing into two separate buckets: one for the things you do interactively (Claude.ai chat, Claude Code in your terminal, Claude Cowork), and a brand-new credit pool that only covers programmatic, autonomous, and SDK-driven work.

    This is a meaningful shift. It’s also one of the most under-explained changes Anthropic has made to subscription pricing this year. If you don’t know about it by June 15, 2026, you can find yourself with stopped automations, surprise overage charges, or both.

    This guide walks through exactly what’s changing, what the credits cover, what they don’t cover, what each plan gets, and how to plan for it.

    Agent SDK Monthly Credit by Plan (June 2026)

    Plan | Monthly Price | Agent SDK Credit/Month | Covers
    Pro | $20/month | $20 | claude -p, SDK jobs, GitHub Actions
    Max 5x | $100/month | $100 | claude -p, SDK jobs, GitHub Actions
    Max 20x | $200/month | $200 | claude -p, SDK jobs, GitHub Actions
    Team Standard | $25/seat/mo (annual) | $20/seat | claude -p, SDK jobs, GitHub Actions
    Team Premium | $100/seat/mo (annual) | $100/seat | claude -p, SDK jobs, GitHub Actions
    Enterprise (usage-based) | Custom | $20/month | SDK-driven work only
    Enterprise (seat-based Premium) | Custom | $200/seat | SDK-driven work only

    The short version

    Claude subscription plans (Pro, Max, Team, Enterprise) currently have one shared usage limit. Whether you’re chatting with Claude on the web, using Claude Code in your terminal, or running unattended jobs through the Agent SDK, all of that draws from the same plan-level allowance.

    On June 15, 2026, Anthropic is separating those two modes of use:

    • Bucket 1 — Interactive use: Claude.ai chat, Claude Code in the terminal/IDE, Claude Cowork. Uses your existing subscription usage limits, exactly as before.
    • Bucket 2 — Agent SDK monthly credit: A separate, dollar-denominated credit pool. Funds the Claude Agent SDK, the claude -p non-interactive command, the Claude Code GitHub Actions integration, and any third-party app that authenticates via the Agent SDK.

    The two buckets do not commingle. Agent SDK work cannot draw from your interactive subscription limit, and interactive use cannot draw from your Agent SDK credit. If you exhaust your Agent SDK credit and don’t have extra usage enabled, your background jobs simply stop until the credit refreshes the following month.

    What each plan gets

    Here is the official monthly Agent SDK credit by plan, as published in Anthropic’s Help Center (verified June 9, 2026):

    • Pro: $20/month
    • Max 5x: $100/month
    • Max 20x: $200/month
    • Team — Standard seats: $20/month per seat
    • Team — Premium seats: $100/month per seat
    • Enterprise — usage-based: $20/month
    • Enterprise — seat-based Premium seats: $200/month

    Important detail buried in the announcement: Enterprise seat-based plans on Standard seats are not eligible to claim the Agent SDK credit at all. If you administer one of those plans and have engineers running automation, that’s a gap to plan around.

    What the credit covers (and what it doesn’t)

    Anthropic’s documentation is specific about what counts as Agent SDK use, so this is worth reading carefully.

    Covered by the credit:

    • Claude Agent SDK usage in your own Python or TypeScript projects
    • The claude -p command in Claude Code (non-interactive mode)
    • The Claude Code GitHub Actions integration
    • Third-party apps that authenticate with your Claude subscription through the Agent SDK

    Not covered (these still draw from your normal subscription limits):

    • Interactive Claude Code in your terminal or IDE
    • Claude conversations on web, desktop, or mobile
    • Claude Cowork
    • Other features that draw from extra usage

    The plain-English version: if a human is sitting at the keyboard waiting for the response, that’s interactive use. If a script kicks off the work and the result lands somewhere else later, that’s Agent SDK use.

    How the credit actually works in practice

    Five mechanics matter for budgeting:

    1. Per-user, never pooled. Each eligible user on a Team or Enterprise plan claims their own credit. There is no organization-level pool. Credits cannot be transferred between users, shared, or stockpiled across accounts.

    2. Refreshes monthly with the billing cycle. Whatever you don’t spend in a given month evaporates. Unused credits do not roll over.

    3. One-time opt-in. You claim your credit through your Claude account once. After that initial claim, it refreshes automatically each cycle.

    4. Drains first, before any other source. When an Agent SDK request fires, it pulls from your monthly credit before any other paid usage source kicks in. This is good — it means you actually use what you’ve already paid for.

    5. After the credit, requests either flow to extra usage or stop entirely. When your monthly credit hits zero, additional Agent SDK requests draw from extra usage at standard API rates — but only if you have extra usage enabled. If you haven’t enabled extra usage, your Agent SDK requests stop until the next refresh.

    That last point is the one most likely to bite teams. If you’re running a daily cron job through the Agent SDK and you don’t enable extra usage, the day your credit runs out is the day your automation goes silent — without obvious warning if you’re not watching the credit balance.
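The drain-then-stop behavior can be modeled in a few lines of Python. This is a toy sketch, not Anthropic's billing logic: the class and function names are hypothetical, amounts are kept in cents to avoid float rounding, and it assumes a request that would exceed the remaining credit is refused outright when extra usage is off.

```python
class AgentSdkCredit:
    """Toy model of one user's monthly Agent SDK credit."""

    def __init__(self, monthly_credit_cents, extra_usage_enabled=False):
        self.monthly_credit_cents = monthly_credit_cents
        self.extra_usage_enabled = extra_usage_enabled
        self.credit_left_cents = monthly_credit_cents
        self.extra_usage_cents = 0

    def charge(self, cost_cents):
        """Return True if the request runs, False if it is stopped."""
        from_credit = min(cost_cents, self.credit_left_cents)  # credit drains first
        overflow = cost_cents - from_credit
        if overflow and not self.extra_usage_enabled:
            return False  # assumption: no partial runs without extra usage
        self.credit_left_cents -= from_credit
        self.extra_usage_cents += overflow
        return True

    def refresh(self):
        """New billing cycle: unused credit does not roll over."""
        self.credit_left_cents = self.monthly_credit_cents


def days_completed(account, daily_cost_cents, days=30):
    """Count how many daily jobs actually run in one cycle."""
    return sum(1 for _ in range(days) if account.charge(daily_cost_cents))


pro = AgentSdkCredit(2_000)  # Pro: $20 credit
print(days_completed(pro, 90))  # a $0.90 daily job runs 22 days, then goes silent

pro_extra = AgentSdkCredit(2_000, extra_usage_enabled=True)
print(days_completed(pro_extra, 90), pro_extra.extra_usage_cents)  # 30 700
```

In the first case the cron job stops on day 23 with 20 cents of credit stranded; in the second, the same job runs all month and spills $7.00 into extra usage.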

    Why Anthropic is doing this

    Anthropic frames this as separating individual experimentation from production automation. From the Help Center documentation: “The Agent SDK monthly credit is sized for individual experimentation and automation. Teams running shared production automation should use the Claude Developer Platform with an API key for predictable pay-as-you-go billing.”

    The translation: a single user’s $20 or $200 of Agent SDK credit was never going to cover a real production workload anyway. Anthropic is making explicit what was already true under the hood — that a subscription was a chat product, and serious unattended automation belongs on the API.

    What this also does, structurally, is protect interactive subscription users from getting their experience degraded by heavy autonomous workloads sharing the same pool. If you’ve ever hit a subscription rate limit during a normal chat session because something else on your account was burning tokens in the background, this change removes that failure mode.

    What you should do for the June 15, 2026 change

    If you run any unattended Claude work (the most important group):

    Audit every place your subscription is being used by something other than a human at a keyboard. The big four to check:

    • claude -p commands in cron jobs, CI pipelines, or shell scripts
    • Claude Code GitHub Actions workflows
    • Custom Python or TypeScript projects using the Agent SDK
    • Any third-party tool that asks for “Sign in with Claude” — those go through the Agent SDK

    For each one, estimate dollar consumption per day at standard API rates. If the total approaches or exceeds your plan’s Agent SDK monthly credit, you have three options: enable extra usage to allow overage, move that workload to a Claude Developer Platform API key (more predictable for sustained loads), or downsize the workload itself.
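One way to run that estimate, sketched in Python. It assumes Sonnet 4.6 rates of $3.00 input and $15.00 output per million tokens; the daily token volumes are invented for illustration and the function names are hypothetical.

```python
def daily_cost_usd(input_tokens, output_tokens, input_rate, output_rate):
    """Dollar cost of one day's work at per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def plan_action(monthly_usd, monthly_credit_usd):
    """Compare projected spend against a plan's Agent SDK credit."""
    if monthly_usd <= monthly_credit_usd:
        return "fits within the credit"
    return "exceeds the credit: enable extra usage, move to an API key, or downsize"


SONNET_INPUT_RATE, SONNET_OUTPUT_RATE = 3.00, 15.00  # assumed rates, USD per million

# Hypothetical nightly job: 400k input tokens and 40k output tokens per day.
per_day = daily_cost_usd(400_000, 40_000, SONNET_INPUT_RATE, SONNET_OUTPUT_RATE)
per_month = per_day * 30

print(round(per_day, 2), round(per_month, 2))
print("Pro:", plan_action(per_month, 20))
print("Max 5x:", plan_action(per_month, 100))
```

Summing this per workload, then comparing the total against the credit for each user who owns the workload, keeps the per-user, never-pooled rule in view.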

    If you administer a Team or Enterprise plan:

    Eligible users on your team receive an email with claim instructions as part of the June 15, 2026 rollout. You don’t need to take action yourself, but it’s worth communicating internally that the credits are per-user, can’t be pooled, and that any team-wide automation should be on an API key, not on a subscription seat.

    If you’re a solo Pro or Max user who only chats with Claude:

    You probably don’t need to do anything. The split affects you only if you’re running scripts or background jobs. If you’ve never used claude -p or the Agent SDK directly, your interactive usage limits don’t change.

    Frequently Asked Questions

    What happens to my Agent SDK usage on June 14 vs. June 15, 2026?

    Before June 15, Agent SDK and claude -p usage counts against your subscription’s general usage limits. Starting June 15, that same usage no longer touches your subscription limits and instead draws from the new Agent SDK monthly credit pool. Your interactive Claude Code, web chat, and Cowork usage continues to work exactly as before.

    Can I share the Agent SDK credit across my team?

    No. Per Anthropic’s official documentation, “Credits are per-user. Each eligible user on your team claims their own credit. Credits can’t be pooled, transferred, or shared across the organization.” If your team needs shared automation budget, the Claude Developer Platform with an API key is the recommended path.

    Do unused Agent SDK credits roll over?

    No. Unused credits expire at the end of each billing cycle and do not carry into the next month.

    What happens if I run out of Agent SDK credit mid-month?

    If you have extra usage enabled, additional requests flow to extra usage at standard API rates (the same per-token prices listed in Anthropic’s pricing documentation). If extra usage is not enabled, your Agent SDK requests stop until your credit refreshes at the start of the next billing cycle.

    Does this affect Claude API customers using their own API key?

    No. If you authenticate with the Agent SDK using a Claude Developer Platform API key, nothing changes. Pay-as-you-go billing continues, and you do not receive an Agent SDK monthly credit. The credit only applies to subscription-authenticated Agent SDK use.

    Is interactive Claude Code in my terminal still covered by my subscription?

    Yes. Interactive Claude Code (typing commands and getting responses in your terminal or IDE) continues to draw from your subscription usage limits exactly as before. Only the non-interactive claude -p mode and direct Agent SDK calls move to the new credit pool.

    What’s the dollar value of the credit on each plan?

    As of May 15, 2026: Pro $20, Max 5x $100, Max 20x $200, Team Standard $20/seat, Team Premium $100/seat, Enterprise usage-based $20, Enterprise seat-based Premium $200. Enterprise seat-based Standard seats do not receive a credit.

    How we sourced this

    Every factual claim in this article was triple-checked across the following sources, all reviewed on May 15, 2026:

    • Anthropic Help Center: Use the Claude Agent SDK with your Claude plan (primary source for credit amounts, eligibility, and mechanics)
    • Anthropic Pricing Documentation: docs.claude.com/en/docs/about-claude/pricing (primary source for standard API rates and tool-use pricing)
    • Independent press coverage from The New Stack, The Decoder, and InfoWorld confirming the announcement and its scope

    If you spot a number that’s drifted out of sync with Anthropic’s current published rates, treat the official documentation as authoritative. The pricing surface around Claude is moving quickly in 2026, and we date-stamp specifics so readers know which facts to re-verify.


    Frequently Asked Questions

    What is the Claude Agent SDK dual-bucket billing change?

    Starting June 15, 2026, Anthropic splits Claude subscription billing into two buckets. Bucket 1 covers interactive use (claude.ai chat, Claude Code in terminal, Cowork). Bucket 2 is a separate monthly credit pool that covers only programmatic/autonomous work via the Claude Agent SDK, the claude -p command, and GitHub Actions integration.

    What happens if I run out of Agent SDK credit?

    If you exhaust your Agent SDK monthly credit and don’t have extra usage enabled, your background jobs and SDK-driven automations simply stop until the credit refreshes the following month. Interactive Claude use (chat, Claude Code in terminal) is unaffected — it draws from a separate bucket.

    How much Agent SDK credit does each Claude plan include?

    Pro: $20/month. Max 5x: $100/month. Max 20x: $200/month. Team Standard: $20/seat/month. Team Premium: $100/seat/month. Enterprise usage-based: $20/month. Enterprise seat-based Premium: $200/seat/month. The credit is dollar-denominated and depletes at standard API token rates for whichever model your SDK jobs use.

    Does the Agent SDK credit apply to Claude Code in the terminal?

    No. Claude Code used interactively in your terminal or IDE draws from Bucket 1 (your subscription usage limit), not from the Agent SDK credit pool. Only non-interactive, programmatic use via the Agent SDK and claude -p command draws from the Agent SDK credit bucket.

    Can I add more Agent SDK credit if I run out?

    Yes. You can enable extra usage on Pro, Max 5x, Max 20x, and Team plans. Once enabled, SDK jobs that exceed your monthly credit continue at standard API rates with a spending cap you set, rather than stopping entirely.

    Which Claude plans don’t get Agent SDK credit?

    The Free plan receives no Agent SDK credit, and Enterprise seat-based plans on Standard seats are not eligible to claim it. Free tier users cannot run programmatic SDK workloads at all; that requires at minimum a Pro subscription at $20/month.

  • Claude Code Pricing in May 2026: What $20, $100, and $200 a Month Actually Buy You

    Last refreshed: May 15, 2026

    Claude Code pricing has stopped being a clean sticker number and started being a question of which ceiling you hit first. There is a $20 plan, a $100 plan, and a $200 plan — and underneath all three sits a 5-hour rolling window, a weekly active-hours cap added in August 2025, and a per-model multiplier that quietly makes Opus 4.7 the most expensive thing you can do inside the terminal. If you came looking for the right plan, the honest answer is: it depends on whether you are mostly a Sonnet operator or you live in Opus.

    The three subscription tiers, stripped down

    Pro — $20/month. Access to Claude Code in the terminal, web, and desktop, with both Sonnet 4.6 and Opus 4.7 available. The practical envelope is about 44,000 tokens per 5-hour window and roughly 40–80 weekly active hours on Sonnet, depending on session concurrency. This is the plan for someone running Claude Code a few hours a day on focused work — refactors, scoped feature builds, debugging passes — not someone leaving an agent running while they eat lunch.

    Max 5x — $100/month. Nominally five times the Pro envelope, plus priority during peak demand. The reported window allocation lands around 88,000 tokens per 5-hour block, which is roughly double the Pro figure above rather than five times it, so read the multiplier as a plan label and not a strict token ratio. This is the tier where you stop thinking about token budgets during a single working day and start thinking about them across a whole week. Picked correctly, it is the cheapest way to use Claude Code as your primary IDE companion without flipping over to API billing.

    Max 20x — $200/month. Nominally twenty times Pro. The reported allocation is about 220,000 tokens per window, roughly five times the 44,000 quoted for Pro rather than twenty, which translates to roughly 480 Sonnet-hours or about 40 Opus-hours per week before the weekly cap kicks in. Real-world reports from early 2026 had $200/month users watching single Opus prompts eat 10–20% of their daily allocation; Anthropic publicly acknowledged the problem, expanded capacity, and doubled the 5-hour rate limit for Pro and Max accounts. If you are running Claude Code across multiple repos all week and reaching for Opus on the hard problems, this is the tier that stops you from staring at a rate-limit wall.

    The API, as a sanity check

    If you want a sanity check on whether the subscription math works, price the same workload against the API:

    • Claude Haiku 4.5 (claude-haiku-4-5-20251001): $1.00 input / $5.00 output per million tokens
    • Claude Sonnet 4.6 (claude-sonnet-4-6): $3.00 input / $15.00 output per million tokens
    • Claude Opus 4.7 (claude-opus-4-7): $5.00 input / $25.00 output per million tokens

    Prompt caching is the lever almost nobody uses correctly. Cache writes cost 1.25x input price for the 5-minute TTL or 2.0x for the 1-hour TTL, but cache reads cost 0.10x, a 90% discount on every subsequent request that hits the same context. If your CLAUDE.md file, project map, and the file you are editing are all stable for an hour, the bill on a long pairing session can drop by close to an order of magnitude. The Batch API knocks another 50% off both directions for asynchronous workloads, which is worth knowing if you are running large refactor sweeps.
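To see how the multipliers play out over a session, here is a rough Python sketch. It assumes the stable context is written to the cache once and every later request hits it within the TTL; the context size, request count, and function name are illustrative assumptions, and only the stable prefix is priced.

```python
def stable_context_cost(context_tokens, requests, usd_per_million,
                        cached, write_multiplier=1.25, read_multiplier=0.10):
    """Input cost in USD of re-sending the same context on every request."""
    base = context_tokens * usd_per_million / 1_000_000  # one uncached send
    if not cached or requests == 0:
        return base * requests
    # One cache write, then (requests - 1) cache reads.
    return base * write_multiplier + base * read_multiplier * (requests - 1)


# Hypothetical session: 50k tokens of stable context, 40 requests, $3.00/M input.
uncached = stable_context_cost(50_000, 40, 3.00, cached=False)
cached = stable_context_cost(50_000, 40, 3.00, cached=True)

print(round(uncached, 4), round(cached, 4), round(uncached / cached, 1))
```

Under these assumptions the cached session costs roughly an eighth of the uncached one. Passing write_multiplier=2.0 models the 1-hour TTL, and the ratio keeps improving as the request count grows because the one-time write is amortized.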

    One trap on Opus 4.7 specifically: the model uses a new tokenizer that inflates token counts by up to 35% on identical text compared to Opus 4.6. The headline price did not change, but your effective spend per request did — sometimes by nothing, sometimes by a third, depending on the content. If you migrated from Opus 4.6 and your bill went up without your prompt patterns changing, that is the reason.

    How to actually choose

    The cleanest way to pick a plan is to first decide your model mix, then your weekly hours.

    If you are mostly a Sonnet operator — long agentic runs, multi-file edits, codebase Q&A, with Opus only reached for on the architectural questions — Pro at $20 is plausible up to about 5–8 hours of focused use per day, Max 5x covers most full-time individual developers, and Max 20x is overkill unless you are running multiple sessions in parallel.

    If you live in Opus — long-horizon agentic work, hard refactors across many files, anything where you would rather have one good attempt than three Sonnet retries — Pro will frustrate you within two weeks, Max 5x is the realistic floor, and Max 20x is the only tier that gives you a defensible Opus envelope without bouncing over to API billing.

    And if you are running Claude Code across multiple repos all week, leaving agents to grind on tasks while you do other things, Max 20x is the only subscription that holds up — and even then, the weekly cap is real. Use the API for the spillover and you will still come out cheaper than trying to brute-force a smaller plan.

    The number that matters

    One developer’s public report this year: roughly 10 billion tokens consumed across Claude Code over eight months. API metered cost would have exceeded $15,000. The same workload on Max at $100/month for the same window came in around $800, about 95% cheaper ($800 against $15,000). That is the gap that makes the subscription model worth taking seriously, even when the rate limits feel arbitrary. The $200 tier is not a vanity number; it is the price Anthropic charges to stop being a meaningful constraint on your workflow.

    The right way to read Claude Code pricing in May 2026 is not to ask which plan is cheapest. It is to ask which plan is the cheapest one that disappears — the one that stops appearing in your day. For most full-time developers reaching for Opus regularly, that plan is Max 20x. For everyone else, Max 5x is the first plan that actually gets out of your way.

  • Claude MCP in 2026: What Actually Changed and How to Configure It Without Wasting Tokens

    Last refreshed: May 15, 2026

    If you set up Claude MCP six months ago and have not touched the config since, three things have changed underneath you: the recommended transport, how tools are loaded into context, and how teams share server configs. None of these are cosmetic. If you ignore them, you are leaving tokens, money, and stability on the table.

    This is the working Claude MCP setup I use in May 2026 — what the claude mcp add command actually does, which scope to pick, what the deprecation of SSE means in practice, and where Claude Code still falls short.

    The three-scope mental model

    Every MCP server you wire into Claude Code lives at exactly one of three scopes. Get this wrong and you will either leak credentials into git or wonder why your teammate cannot use the same database the AI just queried.

    • Local (default): the server is available only to you, only inside the current project. Config is written into your project’s entry inside ~/.claude.json. Good for project-specific servers like a dev database or a Sentry project key you do not want other repos to inherit.
    • User: the server is available to you across every project on your machine. Also stored in ~/.claude.json. This is where GitHub, search providers, and personal productivity servers belong.
    • Project: the server is written to a .mcp.json file at the repo root and shared with the whole team via git. Claude Code prompts for approval the first time a teammate opens the project — by design, because anyone who can push to the repo can wire a new server into your environment.

    When the same server is defined in more than one scope, Claude Code resolves it in this order: local beats project beats user beats plugin-provided. This is the part that bites people the most. If you have a “github” entry at user scope and someone adds a different “github” entry at project scope in .mcp.json, the project definition wins for that repo. Run claude mcp list when something behaves strangely.
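
    The precedence rule can be sketched as a small lookup. This is an illustrative model only, not Claude Code's actual resolver; the resolve_server function and the sample config dictionaries (including the example.com URLs) are hypothetical.

```python
# Illustrative model of scope precedence; not Claude Code's real resolver.
# Highest-priority scope first, in the order described above.
PRECEDENCE = ["local", "project", "user", "plugin"]


def resolve_server(name, configs):
    """Return (scope, definition) for the highest-priority scope defining `name`.

    `configs` maps a scope name to a dict of {server_name: definition}.
    Returns None when no scope defines the server.
    """
    for scope in PRECEDENCE:
        definition = configs.get(scope, {}).get(name)
        if definition is not None:
            return scope, definition
    return None


# Hypothetical example: "github" is defined at both user and project scope.
configs = {
    "user": {"github": {"url": "https://example.com/user-github"}},
    "project": {"github": {"url": "https://example.com/project-github"}},
}

print(resolve_server("github", configs))
```

    In this model the project entry shadows the user entry, and adding a "local" entry with the same name would shadow both.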

    The commands you actually need

    The CLI is more useful than the docs make it look. Three commands cover ~90% of real setup work:

    # Add a remote HTTP MCP server at user scope (available everywhere)
    claude mcp add --transport http hubspot --scope user https://mcp.hubspot.com/anthropic
    
    # Add a local stdio server scoped only to this project
    claude mcp add my-db -s local -- node ./scripts/db-mcp.js
    
    # Share a server with your team via the repo's .mcp.json
    claude mcp add my-server -s project -- node server.js

    The short flag is -s, the long is --scope. The -- separator is required for stdio servers because everything after it is treated as the literal command to spawn. Forget it and Claude Code will try to interpret your Node arguments as its own flags.
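
    For orientation, a project-scope stdio entry in .mcp.json looks roughly like the structure built below. This is a sketch: the mcpServers / command / args layout follows the commonly documented file format, but the stdio_entry helper is a hypothetical name, and the exact fields Claude Code writes may differ by version, so check the file it actually generates.

```python
import json


def stdio_entry(command, *args):
    """Build the (assumed) .mcp.json entry for a stdio server."""
    return {"command": command, "args": list(args)}


# Roughly what `claude mcp add my-server -s project -- node server.js` records.
mcp_config = {"mcpServers": {"my-server": stdio_entry("node", "server.js")}}

print(json.dumps(mcp_config, indent=2))
```

    Everything after the -- separator ends up as the command and its arguments, which is why dropping the separator changes how the line is parsed.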

    SSE is dead. Use Streamable HTTP.

    If your MCP server documentation still tells you to use the sse transport, the documentation is stale. The MCP spec dated 2025-03-26 introduced Streamable HTTP and simultaneously deprecated HTTP+SSE. Through 2026, vendor after vendor has set hard cutoff dates — Atlassian’s Rovo MCP server keeps SSE around until June 30, 2026 and then drops it; Keboola pulled SSE on April 1; Cumulocity’s AI Agent Manager flipped to Streamable HTTP on May 8.

    Why this matters beyond a name change: SSE required Claude Code to hold a persistent connection to a single server replica, which broke horizontal scaling and made every transient network blip a reconnection drama. Streamable HTTP does not need that long-lived connection and lets a server run statelessly, so multiple replicas behind a load balancer just work. If you have flaky MCP connections in production, the first thing to check is whether the server is still on SSE.

    For new setups, use --transport http. The older --transport sse still functions but is on the deprecation path.

    Tool Search is the feature you should actually care about

    The single biggest change in how Claude Code uses MCP in 2026 is lazy tool loading via Tool Search. Older MCP clients dumped every tool schema from every connected server into the model’s context window at the start of every conversation. With ten servers wired up that could easily be 20,000+ tokens of overhead before you typed a single character.

    Tool Search inverts this. Claude Code keeps only the server names and short descriptions resident. When a tool is actually needed, it fetches that tool’s full schema on demand. Anthropic’s own documentation says this reduces tool-definition context usage by roughly 95% versus eager-loading clients. In practice that means you can run a serious MCP fleet — GitHub, Sentry, a database, a search provider, your internal API — without quietly burning through your context budget. The Sonnet 4.6 and Opus 4.7 1M-token context window does not save you here, because anything you let crowd the prompt is also being re-read on every turn.
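
    A back-of-the-envelope model shows why lazy loading matters. Every number below is a placeholder chosen to land near the figures quoted above (tool counts, tokens per schema, tokens per short description); none of them are measurements, so substitute values from your own setup.

```python
def eager_overhead(servers):
    """Tokens resident when every tool schema is loaded up front."""
    return sum(s["tools"] * s["tokens_per_schema"] for s in servers)


def lazy_overhead(servers, tokens_per_summary, schemas_fetched):
    """Tokens resident when only short server summaries stay loaded,
    plus the full schemas that were actually fetched on demand."""
    summaries = len(servers) * tokens_per_summary
    return summaries + sum(schemas_fetched)


# Hypothetical fleet: 10 servers, 10 tools each, about 200 tokens per schema.
servers = [{"tools": 10, "tokens_per_schema": 200} for _ in range(10)]

eager = eager_overhead(servers)                # 10 * 10 * 200
lazy = lazy_overhead(servers, 60, [200, 200])  # 10 * 60 + 2 fetched schemas
print(eager, lazy, f"{1 - lazy / eager:.0%} smaller")
```

    The saving shrinks as more schemas are fetched in a session, so the benefit is largest when each task touches only a few tools.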

    Companion feature: list_changed notifications. An MCP server can now tell Claude Code “my tool list changed” and Claude Code refreshes capabilities without a disconnect-reconnect dance. If you build your own server, emit this when you swap tool definitions and you save users a restart.
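
    On the wire, this is a JSON-RPC notification. The sketch below builds the message a server would send; the method name follows the MCP specification's tools list-changed notification, while the tools_list_changed helper is a hypothetical name and transport handling is omitted.

```python
import json


def tools_list_changed():
    """Build the JSON-RPC notification for a changed tool list.

    Notifications carry no "id" because no response is expected.
    """
    return {"jsonrpc": "2.0", "method": "notifications/tools/list_changed"}


message = json.dumps(tools_list_changed())
print(message)
```

    A server is expected to advertise the listChanged capability for tools during initialization before sending this, so clients know to listen for it.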

    What it still gets wrong

    Honest take: claude mcp list still does not surface scope information for every entry in a useful way — there is an open issue on the anthropics/claude-code repo asking for it (#8288 if you want to track). Project-scoped servers from .mcp.json have a separate history of not appearing in the list output (#5963) depending on how you opened the project. If you cannot find a server, check both ~/.claude.json and ./.mcp.json directly.

    The other rough edge is the project-approval prompt. The first time you open a repo with a new .mcp.json, Claude Code asks you to approve each project-scoped server. That is the right security default. It is also infuriating in CI or any non-interactive shell, where the prompt blocks the session. The current workaround is to bake the servers in at user scope on build agents so the project-scope approval never fires in CI. A cleaner non-interactive approval flow is the single most-requested fix I see in real teams.

    The setup I would run on a new machine today

    User-scope: GitHub, a code search server, and a single notes/Notion server. Project-scope in each repo’s .mcp.json: whatever database the project owns and whatever observability backend it reports to. Local-scope: anything experimental I am evaluating but do not want my team or my other repos to inherit.

    Pin --transport http on everything remote. Skip Desktop Extensions (.dxt) for anything you want versioned with the codebase — they are a Claude Desktop convenience, not a Claude Code primitive, and they hide the config from your team. Run claude mcp list when something is off and read .mcp.json directly when list is unhelpful.

    That is the whole working model. The pieces that matter — three scopes, Streamable HTTP, Tool Search — fit on a single screen. The pieces that have not caught up yet — list output, non-interactive approvals — are visible in the issue tracker and will move.

  • Claude Code Hooks: The Workflow Control Layer That Actually Enforces Your Rules

    Last refreshed: May 15, 2026

    You’ve been there. You add a rule to CLAUDE.md — “always run prettier after editing files” — and Claude follows it, most of the time. Then it doesn’t. The formatter doesn’t run, the lint check gets skipped, and you’re back to reviewing diffs manually.

    Hooks fix this. Claude Code hooks are shell commands, HTTP endpoints, or LLM prompts that fire deterministically at specific points in Claude’s agentic loop. Unlike CLAUDE.md instructions, which are advisory, hooks are enforced at the execution layer — Claude cannot skip them.

    As of early 2026, Claude Code ships with 21 lifecycle events across four hook types. This article covers the two that matter most for daily workflow: PreToolUse and PostToolUse.

    How Hooks Work Architecturally

    Claude Code’s agent loop is a continuous cycle: receive input → plan → execute tools → observe results → repeat. Hooks intercept this loop at named checkpoints.

    Every hook is defined in .claude/settings.json under a hooks key. A hook entry has three parts: the lifecycle event name, an optional matcher (a regex against tool names), and the handler definition — either a shell command, an HTTP endpoint, or an LLM prompt.

    {
      "hooks": {
        "PostToolUse": [
          {
            "matcher": "Write|Edit",
            "hooks": [
              {
                "type": "command",
                "command": "npx prettier --write \"$CLAUDE_TOOL_INPUT_FILE_PATH\""
              }
            ]
          }
        ]
      }
    }

    That’s it. Every file Claude writes or edits now auto-formats. No CLAUDE.md reminders, no hoping Claude remembers — the formatter runs on every single Write or Edit tool call, period.

    PreToolUse: Enforce Before Claude Acts

    PreToolUse fires before Claude executes any tool. Your hook receives the full tool call — name, inputs, arguments — and can return one of three signals:

    • Exit 0 → allow the tool call to proceed
    • Exit 2 → block the tool call; Claude receives your error message and adjusts
    • Exit 1 → hook error; Claude proceeds but logs the failure

    This makes PreToolUse the right place for guardrails. Here’s a real example: blocking npm in a bun project.

    #!/bin/bash
    # .claude/hooks/check-package-manager.sh
    # Blocks npm commands in projects that use bun
    
    if echo "$CLAUDE_TOOL_INPUT_COMMAND" | grep -qE "^npm "; then
      echo "Error: This project uses bun, not npm. Use: bun install / bun run / bun add" >&2
      exit 2
    fi
    exit 0

    Wire it in settings.json:

    {
      "hooks": {
        "PreToolUse": [
          {
            "matcher": "Bash",
            "hooks": [
              {
                "type": "command",
                "command": ".claude/hooks/check-package-manager.sh"
              }
            ]
          }
        ]
      }
    }

    Now when Claude tries npm install, the hook exits 2, Claude sees the error message, and it switches to bun install without you intervening. The correction happens in the same turn.
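
    The same guard can also be written against the JSON payload that Claude Code passes to hooks on stdin, which avoids depending on environment variables. A minimal sketch, assuming the payload carries tool_name and tool_input.command fields (verify the exact schema in the hooks reference); the decide and run functions and the regex are illustrative choices, not part of Claude Code.

```python
import io
import json
import re
import sys

# Match npm at the start of the command or right after a shell separator,
# so "cd app && npm install" is caught but "pnpm install" is not.
NPM_CALL = re.compile(r"(^|[;&|]\s*)npm(\s|$)")


def decide(payload):
    """Return (exit_code, message) for a hook payload dict."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    if NPM_CALL.search(command.strip()):
        return 2, "This project uses bun, not npm. Use: bun install / bun run / bun add"
    return 0, ""


def run(stdin, stderr):
    """Read one hook payload from `stdin`; return the exit code to use."""
    code, message = decide(json.load(stdin))
    if message:
        print(message, file=stderr)  # with exit 2, stderr is fed back to Claude
    return code


# Demo with a fake payload instead of real hook input.
fake = {"tool_name": "Bash", "tool_input": {"command": "npm install"}}
print(run(io.StringIO(json.dumps(fake)), sys.stderr))
```

    To use it as a real hook, drop the demo lines and end the script with sys.exit(run(sys.stdin, sys.stderr)), then point the "command" entry in settings.json at the file.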

    Another production pattern: blocking writes to protected paths.

    #!/bin/bash
    # Prevent Claude from modifying migration files already run in production
    if echo "$CLAUDE_TOOL_INPUT_FILE_PATH" | grep -qE "db/migrations/"; then
      echo "Error: Migration files are immutable after deployment. Create a new migration instead." >&2
      exit 2
    fi
    exit 0

    PostToolUse: React After Claude Acts

    PostToolUse fires after a tool completes successfully. It can’t block execution, but it can provide feedback — and it can run any side-effect you need automatically.

    Auto-format every edit:

    {
      "hooks": {
        "PostToolUse": [
          {
            "matcher": "Write|Edit",
            "hooks": [
              {
                "type": "command",
                "command": "npx prettier --write \"$CLAUDE_TOOL_INPUT_FILE_PATH\" 2>/dev/null || true"
              }
            ]
          }
        ]
      }
    }

    Run tests after code changes:

    #!/bin/bash
    # Run affected tests after any source file edit
    FILE="$CLAUDE_TOOL_INPUT_FILE_PATH"
    if echo "$FILE" | grep -qE "\.(ts|js|py)$"; then
      if [ -f "package.json" ]; then
        npx jest --testPathPattern="$(basename "${FILE%.*}")" --passWithNoTests 2>&1 | tail -5
      fi
    fi

    Desktop notification on task completion:

    {
      "hooks": {
        "Stop": [
          {
            "hooks": [
              {
                "type": "command",
                "command": "osascript -e 'display notification \"Claude finished\" with title \"Claude Code\"'"
              }
            ]
          }
        ]
      }
    }

    Environment Variables Available to Hooks

    Claude Code exposes context about the triggering tool call through environment variables. The ones you’ll use most:

    Variable                        Value
    $CLAUDE_TOOL_NAME               Name of the tool being called (e.g., Edit, Bash, Write)
    $CLAUDE_TOOL_INPUT_FILE_PATH    File path for Edit, Write, Read calls
    $CLAUDE_TOOL_INPUT_COMMAND      Shell command for Bash calls
    $CLAUDE_SESSION_ID              Current session ID (useful for audit logging)
    $CLAUDE_TOOL_RESULT_OUTPUT      Output of the tool (PostToolUse only)

    These are injected by Claude Code before your hook runs. You don’t configure them — they’re always there.

    The Model Question: Which Claude Runs Agentic Tasks?

    One practical consideration for hook-heavy workflows: the default model affects how well Claude responds to hook feedback. As of May 2026:

    • claude-opus-4-7 ($5/MTok input, $25/MTok output) — highest agentic coding capability; best at interpreting hook rejection messages and self-correcting without re-asking
    • claude-sonnet-4-6 ($3/MTok input, $15/MTok output) — strong balance of speed and reasoning; handles most hook-corrected flows well
    • claude-haiku-4-5-20251001 ($1/MTok input, $5/MTok output) — fastest; may require more explicit hook messages to course-correct reliably

    For workflows with complex PreToolUse guardrails — especially ones that provide long error messages with corrective instructions — Opus 4.7 handles the feedback loop most reliably. For simpler PostToolUse automation (formatters, notifications), model choice doesn’t matter; the hook runs regardless.

    To configure the model: export ANTHROPIC_MODEL=claude-opus-4-7 before launching Claude Code, or set it in your team’s .env.

    Hooks vs. CLAUDE.md: When to Use Each

    CLAUDE.md is the right place for context, preferences, and guidance — things you want Claude to know about your project. Hooks are the right place for behavior that must happen every time without exception.

    The practical test: if failing to follow the instruction costs you five minutes of manual cleanup, put it in a hook. If it’s a style preference or a reminder about architecture decisions, put it in CLAUDE.md. The two are complementary — you’ll likely end up with both in any mature project setup.

    A team that gets this right builds CLAUDE.md as documentation for Claude and hooks as the CI/CD equivalent for the agentic loop.

    Getting Started

    The fastest path to a working hook setup:

    1. Create .claude/settings.json in your project root if it doesn’t exist
    2. Add a PostToolUse hook wired to your formatter — this is low-risk and immediately valuable
    3. Test it by asking Claude to edit a file; the formatter should run automatically
    4. Add PreToolUse guardrails for any tool calls that have caused problems in the past

    The official hooks reference is at code.claude.com/docs/en/hooks — it covers all 21 lifecycle events, HTTP handler format, and the full JSON output schema for hook responses.

    Hooks are the difference between Claude Code as a powerful suggestion engine and Claude Code as a reliable automation layer. Once you have a PostToolUse formatter running on every edit, going back feels like working without version control.

  • Claude Code for Teams: What to Commit, What to .gitignore, and What Actually Survives a Pull Request

    Last refreshed: May 15, 2026

    Most teams I see roll out Claude Code by handing every engineer the install command and walking away. Three weeks later, half the repo has personal preferences committed to .claude/settings.json, the other half has a CLAUDE.md that contradicts the actual review process, and someone’s customized subagent is silently making code changes nobody else on the team understands.

    There is a better way, and it lives in the split between three files: CLAUDE.md, .claude/settings.json, and .claude/settings.local.json. Get this split right, and Claude Code becomes a force multiplier for the team. Get it wrong, and you are shipping AI-generated code that nobody owns.

    The Three-File Split

    Here is the rule, no exceptions:

    CLAUDE.md — committed. Project root. Every engineer’s session reads this at startup. Put your architectural decisions, preferred libraries, naming conventions, and a review checklist here. If you would not write it on a whiteboard for a new hire, it does not belong here.

    .claude/settings.json — committed. Team-wide tool permissions, default models, and hooks. This is the file that keeps personal flagship-model enthusiasts from blowing through your team’s budget when claude-sonnet-4-6 would have done the job. If you let everyone default to claude-opus-4-7 for routine refactors, your monthly invoice will tell you about it.

    .claude/settings.local.json — gitignored. Personal preferences, individual MCP server configs, anything that varies by engineer. Add this line to your .gitignore on day one:

    .claude/settings.local.json

    If you do not, someone will commit credentials by Friday. Audit your existing repo right now: git log --all --full-history -- .claude/settings.local.json will surface any history that needs scrubbing.
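
    A quick way to keep this from regressing is a check that the path is actually listed. A minimal sketch: the is_listed helper is hypothetical and does an exact-line comparison only, so it will not recognize broader patterns that happen to cover the file.

```python
TARGET = ".claude/settings.local.json"


def is_listed(gitignore_text, path=TARGET):
    """True if `path` appears as its own line in the .gitignore text.

    Exact-line match only: comments, globs, and negations are not evaluated.
    """
    for line in gitignore_text.splitlines():
        entry = line.strip()
        if entry in (path, "/" + path):
            return True
    return False


sample = "node_modules/\n.env\n.claude/settings.local.json\n"
print(is_listed(sample))
print(is_listed("node_modules/\n"))
```

    For an answer that honors every ignore pattern, git check-ignore -v .claude/settings.local.json asks Git itself.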

    The mistake I see most often is teams committing settings.local.json because someone copied a tutorial that did not make the distinction clear. That copy-paste error is the single most common Claude Code rollout failure I have seen this year.

    Shared Subagents Are the Real Win

    Project subagents live in .claude/agents/ and they ship with the repo. This is where teams compound value. A subagent for security review, one for accessibility audits, one for SQL migration safety — defined once, used by every engineer, every PR.

    A subagent definition is a markdown file with YAML frontmatter and a system prompt. When you commit it, every teammate’s claude invocation can call it. The subagent inherits your CLAUDE.md context automatically, so you do not have to redefine the project’s coding standards inside each agent.
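
    To make the file shape concrete, the sketch below renders one such definition as text. The name and description keys are the commonly documented frontmatter fields; the render_subagent helper, the agent name, and the prompt wording are invented for illustration.

```python
def render_subagent(name, description, system_prompt):
    """Render a subagent file: YAML frontmatter, then the system prompt.

    Values are written unquoted, so keep ':' and '#' out of them.
    """
    return (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        f"{system_prompt.strip()}\n"
    )


text = render_subagent(
    "migration-safety",
    "Reviews SQL migrations for destructive or locking operations.",
    "You review database migrations. Flag dropped columns, table rewrites, "
    "and missing rollbacks, and explain the risk of each finding.",
)
print(text)
```

    Saving that text as a .md file under .claude/agents/ and committing it is what makes the agent available to the rest of the team.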

    Here is the trap: do not put twelve subagents in there on day one. Start with one. The team’s most painful repeated review task is the right candidate. Whatever takes a long time and pulls in multiple engineers per PR — that is your first subagent. After two weeks of using it, you will know whether the second one is worth defining.

    CLAUDE.md Is a Living Document, Not a Manifesto

    The longest CLAUDE.md files I see are the worst-performing. Engineers do not read 4,000-word context files, and neither does Claude in any useful way — at some point you are paying for tokens that just dilute the signal.

    The CLAUDE.md files that actually shape behavior are usually compact, structured around three things:

    1. What this codebase is and what it is not.
    2. The handful of rules that get a PR rejected — test coverage, naming, error handling, dependency policy.
    3. A pointer to where deeper documentation lives.

    If your CLAUDE.md has a “philosophy” section, delete it. If it has a “history of the project” section, delete it. The file is read every session — make every line earn its tokens.

    CI/CD: Run Claude Code on PRs, Not in Place of Reviewers

    The pattern that works in CI is automated triage, not automated approval. A GitHub Actions workflow that runs Claude Code on every PR to check for things humans miss — missing tests, secrets in logs, public APIs without docstrings — adds value. A workflow that approves and merges PRs adds liability.

    Anthropic’s official GitHub Actions integration handles the auth and runs Claude Code headlessly. The realistic use cases:

    • Comment on PRs with a structured review (not a merge gate).
    • Auto-label PRs based on the diff.
    • Flag suspected regressions before a human reviewer opens the PR.

    Avoid: anything that auto-merges, anything that posts directly to production-facing systems, anything that calls a paid API on every commit to a feature branch. The bill compounds quickly when CI fires Claude on every push to every developer branch. Gate the workflow on PR-target branches only, or on labels.

    Where Claude Code for Teams Loses Today

    The honest list:

    • No native role-based permissions inside a single repo. If you want a junior engineer’s Claude Code to be more restricted than a senior’s, you have to enforce it through settings.json and trust everyone to not edit it. The Enterprise plan adds SSO, SCIM, and audit logs at the workspace level, but inside the repo, Claude Code itself does not differentiate by role.
    • No first-class secret scanning before commits. Hooks can plug this gap, but you have to wire pre-commit yourself.
    • Shared MCP servers are still per-developer auth. A team-shared Linear or Jira MCP, for example, still requires each engineer to authenticate individually.

    The Team plan addresses workspace-level governance through Premium seats, which is the tier that actually unlocks Claude Code for teammates. The Enterprise plan layers on SSO, SCIM, and audit logs. Neither makes the in-repo configuration questions go away — those are still your team’s problem to solve.

    Model Selection Is a Team Decision

    This one matters more than people realize. Default everyone in .claude/settings.json to claude-sonnet-4-6 for day-to-day work, with claude-opus-4-7 available for explicitly hard tasks. The current Anthropic lineup as of this writing — flagship claude-opus-4-7, workhorse claude-sonnet-4-6, fast claude-haiku-4-5-20251001 — is documented at docs.anthropic.com/en/docs/about-claude/models, and the model strings change frequently enough that hard-coding them in scripts has bitten me twice this year. Read that page, do not memorize it.

    A team that defaults to flagship for everything and a team that defaults to workhorse with selective escalation will see meaningfully different invoices for substantially the same productivity. Make the choice consciously.

    The 20-Minute Setup

    If you are rolling Claude Code out to a team next week:

    1. Add .claude/settings.local.json to .gitignore. First commit, today.
    2. Write a focused CLAUDE.md covering review-blocking rules. Ship it short.
    3. Create one subagent in .claude/agents/ for the team’s most painful review task.
    4. Add a single GitHub Actions workflow that runs Claude Code on PRs in comment-only mode.
    5. Schedule a 30-minute team review of the CLAUDE.md every two weeks. Delete more than you add.

    That is it. Everything else is iteration. The teams that succeed with Claude Code treat the configuration as code — versioned, reviewed, and pruned. The teams that fail treat it as a personal productivity tool that happens to be in a shared repo.

    Decide which kind of team you want to be before the third engineer commits.

  • Claude Code Ultraplan and Ultrareview: Anthropic’s New Agentic Planning Layer Explained

    Last refreshed: May 15, 2026

    Two new Claude Code capabilities shipped in the April sprint that have received almost no coverage despite being significant workflow expansions: Ultraplan, a cloud-hosted agentic planning workflow, and Ultrareview, a deep multi-pass code review command. Together they represent Claude Code’s first serious steps toward being an agentic planning tool, not just an interactive coding assistant.

    Ultraplan: Cloud-Hosted Agentic Planning

    Ultraplan is currently in early preview. The workflow is three steps:

    1. Draft in the CLI — from your terminal, describe the task or project you want Claude Code to plan. Ultraplan generates a structured execution plan: steps, dependencies, tool calls, expected outputs, error-handling branches.
    2. Review in the browser — the plan is pushed to a cloud-hosted web editor where you can read it in a structured interface, add comments, modify steps, flag concerns, and approve or reject sections. This is the human-in-the-loop gate that makes agentic execution trustworthy.
    3. Run remotely or pull back local — once approved, the plan can execute in Anthropic’s cloud infrastructure (no local machine required, runs while your laptop is off) or be pulled back to execute locally with full observability in your terminal.

    The remote execution capability is the most significant aspect. This is Claude Code’s first “runs while your laptop is closed” feature — distinct from Cowork Routines (which are consumer-facing) and designed specifically for developer workflows. A migration plan, a batch refactoring job, a test suite generation task, or a dependency upgrade across a large codebase can be approved, handed to cloud execution, and completed overnight without a machine staying on.

    When to Use Ultraplan

    Ultraplan is designed for tasks where you want to review the approach before committing to execution — not for quick, single-step tasks. The review step adds 5–15 minutes to the workflow. That is worth it when:

    • The task spans multiple files, services, or systems where a wrong step has cascading effects
    • You are working in a production codebase where mistakes have real consequences
    • The task will take more than 30 minutes to execute and you want human review before investing that time
    • You are using remote execution and cannot monitor progress in real time
    • You are delegating the task to a junior developer or teammate who will execute the plan

    For quick tasks — generate a function, fix a specific bug, explain this code — use standard Claude Code. Ultraplan’s value scales with task complexity and execution risk.

    Ultrareview: Deep Multi-Pass Code Review

    The claude ultrareview subcommand applies multiple sequential review passes to code, each with a different evaluation focus:

    • Security review — injection vulnerabilities, authentication gaps, trust boundary violations, insecure dependencies, secrets exposure
    • Performance review — algorithmic complexity, unnecessary allocations, database query patterns, caching opportunities, concurrency issues
    • Maintainability review — naming clarity, function size, documentation gaps, test coverage, coupling and cohesion

    Each pass generates findings, and Ultrareview synthesizes them into a prioritized report with severity ratings and specific remediation recommendations. The output is designed to go directly into a pull request review comment or a team review document.

    Ultrareview vs. Standard Review

    Standard claude review applies a single review pass optimized for breadth — it catches obvious issues quickly across all dimensions. Ultrareview applies specialized depth in each dimension sequentially. The trade-off is token cost and time: Ultrareview consumes 3–5× more tokens than standard review and takes proportionally longer.

    The recommended workflow: use standard review on every pull request as part of your CI pipeline. Reserve Ultrareview for high-stakes merges — releases, security-sensitive features, architecture changes, any code that will touch production payment or authentication flows.

    Both features are available now to Claude Code users on Pro and above. Ultraplan is in early preview — activate it via claude ultraplan --enable-preview. Ultrareview is generally available — run claude ultrareview [file or directory] from any Claude Code session.

  • Claude Opus 4.7 Is Secretly ~40% More Expensive Than Opus 4.6 — Here’s Why

    Last refreshed: May 15, 2026

    Model Accuracy Note — Updated May 2026

    Current flagship: Claude Opus 4.7 (claude-opus-4-7). Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. This article compares Claude Opus 4.7 pricing to Opus 4.6 as a historical baseline. Both models share the $5/$25 per MTok list price.

    Anthropic announced Claude Opus 4.7 with the same list pricing as Opus 4.6: $5 per million input tokens, $25 per million output tokens. What Anthropic did not announce, and what Simon Willison surfaced through direct tokenizer analysis, is that Opus 4.7 generates approximately 1.46× as many tokens for the same text output as Opus 4.6. That is a real-world cost increase of 40% or more at unchanged list prices.

    This is not a criticism of the model. Opus 4.7 is genuinely better — 3× higher vision resolution, a new xhigh effort level, improved instruction following, higher-quality interface and document generation. The performance gains are real. The cost increase is also real, and it is not being communicated transparently in Anthropic’s pricing documentation. If you are budgeting for Claude API usage, you need to account for this.

    What Token Inflation Means

    Token inflation occurs when a model generates more tokens to express the same semantic content. It happens for several reasons: more detailed reasoning traces, more verbose explanations, additional caveats and structure, or architectural changes in how the model constructs its output. Opus 4.7 appears to produce more elaborated, structured responses than 4.6 by default — which accounts for the 1.46× multiplier.

    The practical effect: if you were spending $10,000/month on Opus 4.6 for a production application, the same application workload on Opus 4.7 costs approximately $14,600/month — before any intentional use of the new xhigh effort level, which adds further token consumption on top of the baseline inflation.

    How to Measure Your Actual Exposure

    Do not estimate — measure. Here is the four-step process:

    1. Pull your last 30 days of Anthropic API usage data from your platform dashboard. Note your average output token count per call for your primary workloads.
    2. Run a representative sample of those same workloads on Opus 4.7 using the API directly, with identical prompts and system messages. Log output token counts for each call.
    3. Calculate your actual multiplier — it may be higher or lower than 1.46× depending on your specific prompt patterns and use cases. Tasks with highly constrained output formats (structured JSON, fixed-length summaries) will see lower inflation than open-ended generation.
    4. Apply the multiplier to your budget model and adjust your spend projections before migrating production workloads to Opus 4.7.
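
    Steps 3 and 4 reduce to a ratio and a multiplication. A minimal sketch, assuming you have logged output token counts for the same prompts on both models; the sample counts are invented (and chosen to reproduce the 1.46× figure), and scaling the whole bill by the multiplier is a simplification that assumes spend tracks token volume.

```python
def token_multiplier(old_counts, new_counts):
    """Ratio of total tokens on the new model to the old one,
    measured over the same prompts in the same order."""
    if len(old_counts) != len(new_counts) or not old_counts:
        raise ValueError("need paired, non-empty samples")
    return sum(new_counts) / sum(old_counts)


def projected_spend(current_monthly_usd, multiplier):
    """Scale current spend by the measured multiplier."""
    return current_monthly_usd * multiplier


# Invented sample: output tokens per call for five identical prompts.
opus_46 = [400, 520, 610, 300, 670]
opus_47 = [590, 760, 880, 450, 970]

m = token_multiplier(opus_46, opus_47)
print(f"multiplier: {m:.2f}x")
print(f"projected: ${projected_spend(10_000, m):,.0f}/month")
```

    Running this per workload, rather than once across all traffic, shows which prompts inflate most and are worth constraining first.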

    Mitigation Strategies

    Several approaches can reduce the cost impact while preserving Opus 4.7’s quality gains:

    • Explicit length constraints in system prompts. Adding “Respond in 200 words or fewer” or “Use bullet points, not paragraphs” constraints does not reduce quality on most tasks but meaningfully constrains token generation. Test which of your prompts accept length constraints without quality loss.
    • Model routing by task type. Use the new gateway model picker in Claude Code, or implement explicit routing in your API calls: Opus 4.7 for the tasks where quality genuinely requires it, Sonnet 4.6 or Haiku 4.5 for high-volume tasks where speed and cost matter more than peak quality. At list prices of $1/$5 versus $5/$25 per MTok, Haiku 4.5 is roughly 5× cheaper per token than Opus 4.7.
    • Avoid xhigh effort unless necessary. The new xhigh effort level in Opus 4.7 consumes significantly more tokens than the default effort setting. Reserve it for tasks where maximum quality is genuinely required — complex reasoning, high-stakes code generation, detailed document analysis. Do not set it as a default.
    • Evaluate Sonnet 4.6 for your use case. For many production workloads, Claude Sonnet 4.6 at $3/$15 per million tokens delivers quality that is indistinguishable from Opus 4.7 at the task level. The Opus tier is most clearly differentiated on the most difficult tasks — extended chain-of-thought reasoning, complex multi-step coding, nuanced creative judgment. Benchmark your specific workloads before assuming Opus is required.
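
    Explicit routing can be as simple as a lookup table plus a cost estimate. The sketch below uses the list prices quoted on this page; the task categories, the ROUTES table, and both function names are hypothetical, and real routing should be driven by your own benchmarks.

```python
# USD per million tokens (input, output), as quoted on this page.
PRICES = {
    "claude-opus-4-7": (5.00, 25.00),
    "claude-sonnet-4-6": (3.00, 15.00),
    "claude-haiku-4-5-20251001": (1.00, 5.00),
}

# Hypothetical task categories mapped to a model.
ROUTES = {
    "summarize": "claude-haiku-4-5-20251001",
    "refactor": "claude-sonnet-4-6",
    "architecture": "claude-opus-4-7",
}


def pick_model(task_type):
    """Route a task type to a model, defaulting to the mid tier."""
    return ROUTES.get(task_type, "claude-sonnet-4-6")


def estimate_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of one call at list price."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


for task in ("summarize", "refactor", "architecture"):
    model = pick_model(task)
    print(task, model, f"${estimate_cost(model, 20_000, 2_000):.3f}")
```

    Defaulting unknown task types to the mid tier is a deliberate choice here: it keeps unclassified work off the most expensive model without sending it to the cheapest one untested.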

    The Transparency Gap

    Anthropic’s pricing page lists token costs accurately. What it does not document is how output token counts change across model versions for equivalent tasks. This is an industry-wide gap, not an Anthropic-specific failing — no major AI provider documents per-task token consumption differences between model versions in their pricing documentation.

    The practical implication for any team managing AI infrastructure: treat “same price per token” announcements as partial information. Always benchmark your actual workloads on new model versions before migrating production traffic. The 1.46× multiplier Willison measured is for general text — your specific workload multiplier will be different, and you need to know it before your invoice arrives.

    Claude Opus 4.7 is available now through the Anthropic API at platform.claude.com. API pricing: $5/M input tokens, $25/M output tokens. Measure before you migrate.

  • Claude Code v2.1.126: Gateway Model Picker, PowerShell Default on Windows, and the Week’s Full Release Stack

    Last refreshed: May 15, 2026

    Claude Code shipped v2.1.126 on May 1, 2026. It is the 9th release of the sprint that began in April and continues the 2–3 releases per week cadence of the past month. Here is the complete picture of what shipped this week across v2.1.120 through v2.1.126, with operational context for each feature that actually matters.

    v2.1.126 — Today’s Release

    Gateway Model Picker

    The gateway model picker allows you to route different tasks within a single Claude Code session to different models. This is the first step toward Claude Code as a multi-model orchestration layer rather than a single-model coding assistant. Practical use: run Haiku 4.5 on file reading, search, and summarization tasks where speed matters, and route complex reasoning, architecture decisions, and code generation to Opus 4.7 where quality is the priority. The cost reduction on high-volume workflows can be material: at list prices Haiku 4.5 is roughly 5× cheaper per token than Opus 4.7 ($1/$5 versus $5/$25 per million tokens).
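
    To see how much a mixed session might save, you can estimate a blended per-token price from the share of tokens sent to the cheaper model. This is a back-of-the-envelope sketch: the function names are hypothetical, the Opus price comes from the article, and the Haiku price is an assumed list price.

```python
def blended_price(cheap_share, cheap_price, premium_price):
    """Average price per million tokens when cheap_share of tokens use the cheap model."""
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_price + (1.0 - cheap_share) * premium_price


def savings_fraction(cheap_share, cheap_price, premium_price):
    """Fraction of spend saved versus sending every token to the premium model."""
    return 1.0 - blended_price(cheap_share, cheap_price, premium_price) / premium_price


if __name__ == "__main__":
    OPUS_OUTPUT = 25.0   # $ per million output tokens, from the article
    HAIKU_OUTPUT = 5.0   # ASSUMED list price, verify before use
    for share in (0.3, 0.5, 0.7):
        saved = savings_fraction(share, HAIKU_OUTPUT, OPUS_OUTPUT)
        print(f"{share:.0%} of tokens on Haiku: about {saved:.0%} saved")
```

    The estimate ignores quality differences and any retries a weaker model might need, so treat it as an upper bound on savings.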

    PowerShell as Primary Shell on Windows — Git Bash No Longer Required

    This is the most significant quality-of-life change in this release for enterprise Windows shops. Claude Code previously required Git Bash as its terminal environment on Windows, which meant every Windows developer needed a non-standard shell installation, created friction in corporate IT environments with software approval processes, and produced a different developer experience than Mac/Linux teammates.

    Starting with v2.1.126, PowerShell is the primary shell on Windows. Git Bash is no longer required. For enterprise teams where half the developer fleet runs Windows and software installation requires IT approval, this removes a significant deployment barrier. Claude Code is now a standard Windows application from an IT management perspective.

    OAuth Code Terminal Input for WSL2, SSH, and Containers

    Authentication in headless environments (WSL2 sessions, SSH remote development, Docker containers) previously required workarounds. v2.1.126 adds OAuth code terminal input: you complete sign-in in a browser on any machine, then paste the authorization code it shows into the terminal, and authentication completes without the browser having to redirect back to the headless environment. This eliminates the most common authentication friction point for remote and containerized development workflows.

    claude project purge

    New command that cleans up stale project data accumulated across sessions. For teams running Claude Code in CI/CD pipelines or long-running agent workflows, project data can accumulate and affect performance. claude project purge gives you explicit control over that cleanup rather than relying on automatic garbage collection.

    v2.1.120–122 — April 28 Stack

    alwaysLoad MCP Option

    MCP servers can now be configured to always load regardless of context window state. Previously, Claude Code would make decisions about which MCP servers to initialize based on available context. alwaysLoad: true in your MCP server config guarantees that server is always available — critical for production deployments where MCP tools need to be reliably present, not conditionally loaded.
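
    As a rough illustration, the snippet below adds the flag to one server entry and prints the resulting JSON. It assumes the common mcpServers layout used by project-level MCP config files and assumes alwaysLoad sits directly on the server entry. The server name, package name, and helper function are hypothetical.

```python
import json


def with_always_load(config: dict, server_name: str) -> dict:
    """Return a copy of an MCP config with alwaysLoad enabled for one server."""
    servers = {name: dict(entry) for name, entry in config.get("mcpServers", {}).items()}
    if server_name not in servers:
        raise KeyError(f"no MCP server named {server_name!r}")
    # Placement of the key on the server entry is an assumption.
    servers[server_name]["alwaysLoad"] = True
    return {**config, "mcpServers": servers}


# Hypothetical server entry, used only for illustration.
example = {
    "mcpServers": {
        "internal-docs": {"command": "npx", "args": ["-y", "example-docs-mcp"]},
    }
}

if __name__ == "__main__":
    print(json.dumps(with_always_load(example, "internal-docs"), indent=2))
```

    Returning a copy instead of mutating the input keeps the original config available for diffing before you write anything to disk.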

    claude ultrareview Subcommand

    claude ultrareview triggers a deep, multi-pass code review that goes beyond standard review. It applies multiple review personas in sequence — security researcher, performance engineer, maintainability analyst — and synthesizes findings into a prioritized report. For code that needs to meet high standards before production merge, ultrareview is the command. It consumes more tokens than standard review, so use it on pull requests that matter, not every commit.

    claude plugin prune

    Removes unused plugins from your Claude Code installation. As the plugin ecosystem has grown and plugin auto-update behavior has been refined in recent releases, teams accumulate plugins that are no longer active in their workflow. claude plugin prune audits your installed plugins against recent usage and removes those that have not been invoked within a configurable time window.
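
    The usage-window idea behind pruning can be sketched in a few lines. This is not how claude plugin prune is implemented. It is a hypothetical model that assumes you have a last-used timestamp per plugin, and the 30-day default is an arbitrary placeholder.

```python
from datetime import datetime, timedelta


def stale_plugins(last_used, now, max_idle_days=30):
    """Return plugin names not invoked within the window, sorted by name.

    last_used maps plugin name to a datetime, or None if never invoked.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(
        name for name, used_at in last_used.items()
        if used_at is None or used_at < cutoff
    )


if __name__ == "__main__":
    now = datetime(2026, 5, 1)
    usage = {
        "formatter": datetime(2026, 4, 29),   # recent, kept
        "old-linter": datetime(2026, 2, 10),  # idle too long
        "never-run": None,                    # never invoked
    }
    print(stale_plugins(usage, now))
```

    Listing candidates before removing anything gives a team the chance to review what would be dropped.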

    Type-to-Filter Skills Search

    The skills picker now supports live type-to-filter — start typing a skill name and the list filters in real time. For teams with large skill libraries or plugin collections, this eliminates the scroll-and-hunt workflow that slowed skill invocation. Small UX change, large daily time savings at scale.

    ANTHROPIC_BEDROCK_SERVICE_TIER Environment Variable

    New environment variable that allows Claude Code running on Amazon Bedrock to specify service tier at the environment level rather than per-request. For teams using Claude Code through Bedrock as their primary deployment path — common in regulated industries that require AWS-native infrastructure — this simplifies configuration management across multiple environments and removes per-request overhead.
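
    A minimal sketch of environment-level configuration: build the environment once per deployment target and hand it to whatever launches Claude Code. The helper name and the per-environment mapping are hypothetical, and the tier values shown are assumptions, so check the Bedrock documentation for the values your account accepts.

```python
import os


def bedrock_env(service_tier: str, base_env=None) -> dict:
    """Return a process environment with the Bedrock service tier set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CLAUDE_CODE_USE_BEDROCK"] = "1"
    env["ANTHROPIC_BEDROCK_SERVICE_TIER"] = service_tier
    return env


# Hypothetical mapping from deployment environment to tier (values are assumptions).
TIER_BY_ENVIRONMENT = {"dev": "flex", "prod": "priority"}

if __name__ == "__main__":
    env = bedrock_env(TIER_BY_ENVIRONMENT["dev"], base_env={})
    for key in sorted(env):
        print(f"{key}={env[key]}")
```

    The returned dictionary can be passed as the env argument of subprocess.run when a pipeline step starts Claude Code.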

    OpenTelemetry Improvements

    Extended OpenTelemetry trace data now includes more granular span information for Claude Code operations. For enterprise teams with existing observability infrastructure (Datadog, Grafana, Honeycomb), Claude Code activity is now more fully integrated into your trace timeline — you can see exactly where Claude Code operations land within the context of your broader application traces.

    v2.1.123 — April 29

    Fixed OAuth 401 retry loop triggered when CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS was set. If you were seeing repeated authentication failures in environments with that flag set, update to v2.1.123 or later immediately.
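
    A pipeline can guard against running a build that predates this fix with a simple version comparison. The sketch assumes you already have the installed version as a bare string such as 2.1.126. Extracting that string from the CLI output is left out, and the function names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Turn '2.1.126' or 'v2.1.126' into a tuple of integers: (2, 1, 126)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def has_fix(installed: str, fixed_in: str = "2.1.123") -> bool:
    """True when the installed version is at or past the release with the fix."""
    return parse_version(installed) >= parse_version(fixed_in)


if __name__ == "__main__":
    for version in ("2.1.89", "2.1.122", "2.1.123", "v2.1.126"):
        status = "ok" if has_fix(version) else "update required"
        print(version, status)
```

    Comparing integer tuples instead of raw strings matters here: as text, "2.1.89" sorts after "2.1.123", which would give the wrong answer.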

    Update Now

    Update via npm install -g @anthropic-ai/claude-code@latest or through your package manager. v2.1.126 is the current stable release. For teams running Claude Code in CI/CD, update your Docker base images or pipeline steps to pin to 2.1.126.

  • Claude Code Is Shipping 2–3 Releases Per Week — What the v2.1 Cadence Means for Engineering Teams

    Last refreshed: May 15, 2026

    Between April 15 and April 29, 2026, the Claude Code team shipped releases from v2.1.89 to v2.1.123: 34 version increments in 14 days. Counted by version number that is more than two increments per day, so the figure of roughly 2–3 production releases per week holds only if most increments were not separate production releases. For an agentic coding tool that engineering teams run in their daily development workflow, this release cadence is worth understanding, both for what it signals about the product’s development velocity and for the practical implications of staying current.

    What’s Driving the Cadence

    The v2.1 series is where Claude Code’s parallel agents architecture is being built out. The desktop redesign for parallel agents shipped on April 14, and the v2.1 releases since then represent the iterative work of making parallel agent workflows — running multiple agents simultaneously from a single workspace — stable and usable at production quality. Rapid iteration on a new architectural feature explains the compressed release schedule better than any other factor.

    The new onboarding guide for Claude Code teams, published April 28 on code.claude.com, is a related signal. Documentation for team-scale adoption typically follows (not precedes) the stability work that makes team-scale adoption advisable. Publishing the onboarding guide now suggests the team considers the core parallel agents architecture stable enough for broader engineering team adoption.

    Parallel Agents: The Architecture Change That Matters

    The April 14 desktop redesign for parallel agents is the most significant Claude Code architectural change of the quarter. Previously, Claude Code operated as a single-agent tool — one active task at a time per workspace. The parallel agents redesign allows developers to run multiple agents simultaneously, each working on independent tasks within the same workspace, with Claude coordinating between them.

    The practical applications are significant: running tests while implementing a feature, refactoring one module while debugging another, generating documentation in parallel with code review. Tasks that previously required sequential attention can now run concurrently, compressing the time from specification to working code.

    Implications for Engineering Teams Evaluating Adoption

    The combination of the new onboarding guide and the parallel agents architecture makes this the right moment for engineering teams that have been evaluating Claude Code to make a decision. The tool has moved from “impressive demo” to “documented team workflow” with the April 28 guide, and the parallel agents capability meaningfully changes the productivity math for teams doing complex, multi-threaded development work.

    For teams already using Claude Code, staying current with the v2.1 series matters more than it did in earlier versions. The 2–3 weekly releases aren’t cosmetic — they’re iterating on the parallel agents infrastructure that the most powerful new workflows depend on. Check the changelog at code.claude.com/docs/en/changelog before major projects to ensure you’re running a recent build.

    Source: Claude Code Changelog | GitHub Releases