Tag: Anthropic

  • Claude Agent SDK Dual-Bucket Billing: What Changes June 15, 2026 (And Why It Matters)

    Last refreshed: May 15, 2026

    If you’ve been running Claude Code’s claude -p command in production, kicking off background jobs through the Claude Agent SDK, or wiring the Agent SDK into a third-party app, the way you pay for that work is about to change.

    Starting June 15, 2026, Anthropic is splitting Claude subscription billing into two separate buckets: one for the things you do interactively (Claude.ai chat, Claude Code in your terminal, Claude Cowork), and a brand-new credit pool that only covers programmatic, autonomous, and SDK-driven work.

    This is a meaningful shift. It’s also one of the most under-explained changes Anthropic has made to subscription pricing this year. If you don’t know about it before June 15, you can find yourself with stopped automations, surprise overage charges, or both.

    This guide walks through exactly what’s changing, what the credits cover, what they don’t cover, what each plan gets, and how to plan for it before the cutover.

    The short version

    Claude subscription plans (Pro, Max, Team, Enterprise) currently have one shared usage limit. Whether you’re chatting with Claude on the web, using Claude Code in your terminal, or running unattended jobs through the Agent SDK, all of that draws from the same plan-level allowance.

    On June 15, 2026, Anthropic is separating those two modes of use:

    • Bucket 1 — Interactive use: Claude.ai chat, Claude Code in the terminal/IDE, Claude Cowork. Uses your existing subscription usage limits, exactly as before.
    • Bucket 2 — Agent SDK monthly credit: A separate, dollar-denominated credit pool. Funds the Claude Agent SDK, the claude -p non-interactive command, the Claude Code GitHub Actions integration, and any third-party app that authenticates via the Agent SDK.

    The two buckets do not commingle. Agent SDK work cannot draw from your interactive subscription limit, and interactive use cannot draw from your Agent SDK credit. If you exhaust your Agent SDK credit and don’t have extra usage enabled, your background jobs simply stop until the credit refreshes the following month.

    What each plan gets

    Here is the official monthly Agent SDK credit by plan, as published in Anthropic’s Help Center (verified May 15, 2026):

    • Pro: $20/month
    • Max 5x: $100/month
    • Max 20x: $200/month
    • Team — Standard seats: $20/month per seat
    • Team — Premium seats: $100/month per seat
    • Enterprise — usage-based: $20/month
    • Enterprise — seat-based Premium seats: $200/month

    Important detail buried in the announcement: Enterprise seat-based plans on Standard seats are not eligible to claim the Agent SDK credit at all. If you administer one of those plans and have engineers running automation, that’s a gap to plan around.

    What the credit covers (and what it doesn’t)

    Anthropic’s documentation is specific about what counts as Agent SDK use, so this is worth reading carefully.

    Covered by the credit:

    • Claude Agent SDK usage in your own Python or TypeScript projects
    • The claude -p command in Claude Code (non-interactive mode)
    • The Claude Code GitHub Actions integration
    • Third-party apps that authenticate with your Claude subscription through the Agent SDK

    Not covered (these still draw from your normal subscription limits):

    • Interactive Claude Code in your terminal or IDE
    • Claude conversations on web, desktop, or mobile
    • Claude Cowork
    • Other features that draw from extra usage

    The plain-English version: if a human is sitting at the keyboard waiting for the response, that’s interactive use. If a script kicks off the work and the result lands somewhere else later, that’s Agent SDK use.

    How the credit actually works in practice

    Five mechanics matter for budgeting:

    1. Per-user, never pooled. Each eligible user on a Team or Enterprise plan claims their own credit. There is no organization-level pool. Credits cannot be transferred between users, shared, or stockpiled across accounts.

    2. Refreshes monthly with the billing cycle. Whatever you don’t spend in a given month evaporates. Unused credits do not roll over.

    3. One-time opt-in. You claim your credit through your Claude account once. After that initial claim, it refreshes automatically each cycle.

    4. Drains first, before any other source. When an Agent SDK request fires, it pulls from your monthly credit before any other paid usage source kicks in. This is good — it means you actually use what you’ve already paid for.

    5. After the credit, requests either flow to extra usage or stop entirely. When your monthly credit hits zero, additional Agent SDK requests draw from extra usage at standard API rates — but only if you have extra usage enabled. If you haven’t enabled extra usage, your Agent SDK requests stop until the next refresh.

    That last point is the one most likely to bite teams. If you’re running a daily cron job through the Agent SDK and you don’t enable extra usage, the day your credit runs out is the day your automation goes silent — without obvious warning if you’re not watching the credit balance.
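
    The drain order described above can be sketched as a small simulation. This is an illustrative model only, not Anthropic's billing logic: the class name, the integer-cents bookkeeping, and the rule that a request is refused outright when the remaining credit cannot cover it (and extra usage is off) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AgentSdkBudget:
    """Hypothetical model of one user's Agent SDK billing state for a cycle."""
    credit_cents: int              # monthly credit remaining, in cents
    extra_usage_enabled: bool      # whether overage is allowed
    extra_usage_cents: int = 0     # overage accumulated this cycle

    def charge(self, cost_cents: int) -> str:
        """Apply one request's cost and report which source paid for it."""
        if self.credit_cents >= cost_cents:
            self.credit_cents -= cost_cents        # credit drains first
            return "credit"
        if not self.extra_usage_enabled:
            return "stopped"                       # assumption: request is refused
        # Use up whatever credit is left, then bill the rest as extra usage.
        self.extra_usage_cents += cost_cents - self.credit_cents
        self.credit_cents = 0
        return "extra_usage"


def simulate(budget: AgentSdkBudget, daily_cost_cents: int, days: int = 30) -> list:
    """Run one fixed-cost job per day and return the outcome for each day."""
    return [budget.charge(daily_cost_cents) for _ in range(days)]


if __name__ == "__main__":
    # A $1.50/day cron job against a $20 credit with extra usage disabled.
    outcomes = simulate(AgentSdkBudget(2000, extra_usage_enabled=False), 150)
    print("first silent day:", outcomes.index("stopped") + 1)
```

    In this model, a $1.50 daily job on a $20 credit goes silent on day 14 when extra usage is off, which is the kind of mid-month failure worth alerting on.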

    Why Anthropic is doing this

    Anthropic frames this as separating individual experimentation from production automation. From the Help Center documentation: “The Agent SDK monthly credit is sized for individual experimentation and automation. Teams running shared production automation should use the Claude Developer Platform with an API key for predictable pay-as-you-go billing.”

    The translation: a single user’s $20 or $200 of Agent SDK credit was never going to cover a real production workload anyway. Anthropic is making explicit what was already true under the hood — that a subscription was a chat product, and serious unattended automation belongs on the API.

    What this also does, structurally, is protect interactive subscription users from getting their experience degraded by heavy autonomous workloads sharing the same pool. If you’ve ever hit a subscription rate limit during a normal chat session because something else on your account was burning tokens in the background, this change removes that failure mode.

    What you should do before June 15, 2026

    If you run any unattended Claude work (the most important group):

    Audit every place your subscription is being used by something other than a human at a keyboard. The big four to check:

    • claude -p commands in cron jobs, CI pipelines, or shell scripts
    • Claude Code GitHub Actions workflows
    • Custom Python or TypeScript projects using the Agent SDK
    • Any third-party tool that asks for “Sign in with Claude” — those go through the Agent SDK
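
    As a starting point for the first two items in that audit, a short script can search a repository for likely call sites. This is a rough sketch under stated assumptions: the file suffixes, the function name, and the idea that `--print` is the long form of `-p` and that workflows reference the action as `anthropics/claude-code-action` should all be checked against your own setup. It will not find SDK imports or third-party sign-ins.

```python
import re
from pathlib import Path

# `claude ... -p` or `claude ... --print` on one shell command (assumed flag spellings).
HEADLESS_CALL = re.compile(r"\bclaude\b[^\n|;&]*?\s(?:-p|--print)\b")
# A workflow step that references the Claude Code action (assumed repository name).
ACTION_USE = re.compile(r"uses:\s*anthropics/claude-code-action")


def find_agent_sdk_usage(root, suffixes=(".sh", ".yml", ".yaml")):
    """Return (relative_path, line_number, kind) for each suspected call site."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if HEADLESS_CALL.search(line):
                hits.append((str(path.relative_to(root)), lineno, "claude -p"))
            elif ACTION_USE.search(line):
                hits.append((str(path.relative_to(root)), lineno, "github-action"))
    return hits


if __name__ == "__main__":
    for rel_path, lineno, kind in find_agent_sdk_usage("."):
        print(f"{rel_path}:{lineno}: {kind}")
```

    Crontabs and CI systems that live outside the repository need a separate pass, so treat the output as a lower bound on what is running unattended.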

    For each one, estimate dollar consumption per day at standard API rates. If the total approaches or exceeds your plan’s Agent SDK monthly credit, you have three options: enable extra usage to allow overage, move that workload to a Claude Developer Platform API key (more predictable for sustained loads), or downsize the workload itself.
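
    The estimate itself is simple arithmetic. The sketch below uses the plan credits listed earlier in this article and the per-million-token API rates quoted in the pricing article on this page; the token volumes, dictionary keys, and function names are made up for illustration, and the rates should be re-checked against Anthropic's pricing page before you rely on them.

```python
# Monthly Agent SDK credit by plan, in dollars (figures from this article).
PLAN_CREDIT_USD = {
    "pro": 20,
    "max_5x": 100,
    "max_20x": 200,
    "team_standard": 20,
    "team_premium": 100,
}

# (input, output) dollars per million tokens, as quoted on this page.
API_RATES_USD = {
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.7": (5.00, 25.00),
}


def daily_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one day's workload at standard API rates."""
    in_rate, out_rate = API_RATES_USD[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def days_covered(plan: str, cost_per_day: float) -> float:
    """How many days of that workload the monthly credit pays for."""
    return PLAN_CREDIT_USD[plan] / cost_per_day


if __name__ == "__main__":
    # Hypothetical job: 200k input and 20k output tokens per day on Sonnet.
    per_day = daily_cost_usd("sonnet-4.6", 200_000, 20_000)
    print(f"per day: ${per_day:.2f}, per 30 days: ${per_day * 30:.2f}")
    print(f"Pro credit lasts about {days_covered('pro', per_day):.1f} days")
```

    With those hypothetical volumes the job costs about $0.90 a day, or $27 over 30 days, so a Pro credit would run out after roughly 22 days.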

    If you administer a Team or Enterprise plan:

    Eligible users on your team will receive an email with claim instructions before June 15, 2026. You don’t need to take action yourself, but it’s worth communicating internally that the credits are per-user, can’t be pooled, and that any team-wide automation should be on an API key, not on a subscription seat.

    If you’re a solo Pro or Max user who only chats with Claude:

    You probably don’t need to do anything. The split affects you only if you’re running scripts or background jobs. If you’ve never used claude -p or the Agent SDK directly, your interactive usage limits don’t change.

    Frequently Asked Questions

    What happens to my Agent SDK usage on June 14 vs. June 15, 2026?

    Before June 15, Agent SDK and claude -p usage counts against your subscription’s general usage limits. Starting June 15, that same usage no longer touches your subscription limits and instead draws from the new Agent SDK monthly credit pool. Your interactive Claude Code, web chat, and Cowork usage continues to work exactly as before.

    Can I share the Agent SDK credit across my team?

    No. Per Anthropic’s official documentation, “Credits are per-user. Each eligible user on your team claims their own credit. Credits can’t be pooled, transferred, or shared across the organization.” If your team needs shared automation budget, the Claude Developer Platform with an API key is the recommended path.

    Do unused Agent SDK credits roll over?

    No. Unused credits expire at the end of each billing cycle and do not carry into the next month.

    What happens if I run out of Agent SDK credit mid-month?

    If you have extra usage enabled, additional requests flow to extra usage at standard API rates (the same per-token prices listed in Anthropic’s pricing documentation). If extra usage is not enabled, your Agent SDK requests stop until your credit refreshes at the start of the next billing cycle.

    Does this affect Claude API customers using their own API key?

    No. If you authenticate with the Agent SDK using a Claude Developer Platform API key, nothing changes. Pay-as-you-go billing continues, and you do not receive an Agent SDK monthly credit. The credit only applies to subscription-authenticated Agent SDK use.

    Is interactive Claude Code in my terminal still covered by my subscription?

    Yes. Interactive Claude Code (typing commands and getting responses in your terminal or IDE) continues to draw from your subscription usage limits exactly as before. Only the non-interactive claude -p mode and direct Agent SDK calls move to the new credit pool.

    What’s the dollar value of the credit on each plan?

    As of May 15, 2026: Pro $20, Max 5x $100, Max 20x $200, Team Standard $20/seat, Team Premium $100/seat, Enterprise usage-based $20, Enterprise seat-based Premium $200. Enterprise seat-based Standard seats do not receive a credit.

    How we sourced this

    Every factual claim in this article was triple-checked across the following sources, all reviewed on May 15, 2026:

    • Anthropic Help Center: Use the Claude Agent SDK with your Claude plan (primary source for credit amounts, eligibility, and mechanics)
    • Anthropic Pricing Documentation: docs.claude.com/en/docs/about-claude/pricing (primary source for standard API rates and tool-use pricing)
    • Independent press coverage from The New Stack, The Decoder, and InfoWorld confirming the announcement and its scope

    If you spot a number that’s drifted out of sync with Anthropic’s current published rates, treat the official documentation as authoritative. The pricing surface around Claude is moving quickly in 2026, and we date-stamp specifics so readers know which facts to re-verify.

  • Elon Musk Isn’t Building the Everything App—He’s Building the Everything App’s Power Grid

    The Pivot in One Sentence
    xAI has merged into SpaceX and leased its Colossus 1 supercluster (220,000 NVIDIA GPUs drawing 300 megawatts of power) entirely to Anthropic, while simultaneously targeting 2 gigawatts of total capacity at Memphis. Elon Musk is no longer primarily trying to win the AI model race. He’s becoming the AI industry’s infrastructure landlord.

    Earlier in this series, we asked whether Grok and xAI were building the everything app through X—the social-financial superapp thesis. The answer we arrived at was: maybe, but with real limitations on the model quality and consumer trust needed to pull it off.

    Then something happened that reframed the entire question. In early May 2026, xAI merged into SpaceX. Days later, Anthropic—one of xAI’s most direct AI competitors—announced it was renting the entire compute capacity of Colossus 1. All 220,000 GPUs. All 300 megawatts. For Claude. For a reported $3 to $6 billion per year.

    Musk’s comment when asked about leasing infrastructure to a competitor: “No one set off my evil detector.”

    That’s the tell. When you’re building the everything app, you don’t rent your most powerful asset to your rivals. You use it. The fact that Musk is doing exactly that reveals a strategic logic that the Grok-as-everything-app frame completely misses.

    The pivot isn’t from everything app to compute landlord. It’s the recognition that owning the power grid is more valuable than owning any single app that runs on it.

    What Colossus Actually Is

    Colossus is not a single data center. It’s a multi-building supercomputing complex in Memphis, Tennessee—and it is currently the largest single-site AI training installation in the world.

    Colossus 1, the original facility, holds H100, H200, and GB200 accelerators across more than 220,000 GPU units. That is the cluster Anthropic is now renting entirely.

    Colossus 2, the expansion xAI is keeping for its own Grok development, has already expanded to 555,000 NVIDIA GPUs with approximately $18 billion in hardware investment and 2 gigawatts of target power capacity—reached in January 2026 with the purchase of a third Memphis building. Musk’s stated goal: one million GPUs at the Memphis complex, with more AI compute than every other company combined within five years.

    As a point of reference: most frontier AI labs operate training clusters in the tens of thousands of GPUs. Microsoft’s Azure AI infrastructure, the largest hyperscaler allocation for AI, operates in the hundreds of thousands across distributed global regions. Colossus at 555,000+ GPUs in a single complex is a different category of infrastructure entirely.

    And Musk has publicly noted that xAI is only using about 11% of its available compute for Grok. The rest is—in his framing—available. Available to sell. Available to rent. Available to become the compute backbone of the AI industry whether xAI wins the model race or not.

    The xAI-SpaceX Merger: What It Actually Means

    The May 2026 merger of xAI into SpaceX as an independent entity is more than an org chart change. It’s a signals-to-strategy reveal.

    SpaceX has three things xAI needs at scale: capital (SpaceX generates billions in launch revenue annually), real estate and construction expertise (SpaceX builds rockets and factories at speed), and most critically—rockets. Starship can put mass into orbit economically in a way no other launch vehicle can. SpaceX is already moving toward a Starlink constellation of thousands of satellites. The infrastructure to extend that into orbital data centers is not theoretical.

    Anthropic’s announcement covered more than the Colossus 1 ground lease: it also expressed interest in working with SpaceX to develop multiple gigawatts of compute capacity in space. Orbital data centers. Satellite-delivered AI compute. The kind of infrastructure that could serve any application needing compute without a physical data center address, at low (though never zero) latency.

    Musk has discussed launching a million data-center satellites as a longer-term infrastructure play. That number sounds unreasonable until you consider that SpaceX already operates over 7,000 Starlink satellites and is building Starship specifically for high-volume orbital delivery. The orbital compute thesis isn’t science fiction for SpaceX. It’s a product roadmap.

    What the xAI-SpaceX merger does is remove the pretense that these are separate businesses. They’re one integrated infrastructure play: ground-based GPU superclusters plus orbital compute capacity, connected by the world’s only commercially viable heavy-lift reusable rocket.

    The Anthropic Deal: A Strategic Reading

    Let’s be specific about what this deal represents for both sides.

    For Anthropic, the deal addresses an acute bottleneck. Anthropic’s annualized revenue grew from roughly $9 billion at end of 2025 to approximately $30 billion by early April 2026, more than tripling in a little over three months, alongside a reported 80-fold increase in usage in Q1 alone. Claude Pro and Claude Max subscriber growth is outpacing Anthropic’s ability to provision compute. Renting Colossus 1 immediately unlocks 300 megawatts of capacity that would take 18-24 months to build from scratch. For Anthropic, this is a compute emergency solution with strategic upside.

    For xAI, the deal is more nuanced. Colossus 1 was already built and operational. xAI is keeping Colossus 2 for Grok development. Renting Colossus 1 generates—depending on which analyst estimate you use—between $3 billion and $6 billion annually in revenue while the asset runs at capacity rather than sitting idle. That revenue funds Colossus 2 expansion, Colossus 3, and whatever comes next. The compute landlord model is self-funding.

    The strategic implication: xAI doesn’t need Grok to win the model race for this business model to work. If Claude dominates, Anthropic needs more compute and pays xAI for it. If GPT dominates, OpenAI and its partners need more compute. If Gemini dominates, Google builds its own, but every smaller lab comes to whoever has available capacity. xAI wins in every scenario except the one where everyone else simultaneously builds their own supercomputing megacomplexes—which requires the capital and construction expertise that most AI labs don’t have.

    The Grok Situation: Honest Assessment

    The Anthropic deal does raise real questions about Grok’s trajectory. Grok app downloads have reportedly declined significantly in 2026 as ChatGPT and Claude have gained consumer mindshare. In April 2026, Elon Musk testified in the ongoing OpenAI litigation that xAI trained Grok on OpenAI model outputs—a revelation that raised questions about Grok’s training methodology and original capability claims.

    If xAI is using only 11% of its compute for Grok and is renting the rest to a competitor, the implicit message is that xAI is not currently running a max-effort campaign to win the frontier model race. It’s building infrastructure and waiting—or pivoting to a business model where the model race outcome matters less.

    This is not necessarily a failure. It may be a more durable strategy. The history of technology infrastructure is full of examples where the company that built the picks and shovels during a gold rush outlasted the miners. AWS didn’t win by building the best e-commerce site. It built the infrastructure that every e-commerce site ran on. The question is whether xAI’s compute infrastructure can fill that role for AI—and the Anthropic deal is the first real evidence that the answer might be yes.

    The “Everything App Ability” Thesis

    Here’s the reframe that this pivot suggests: maybe the right question isn’t which company will build the everything app. Maybe the right question is which company will own the infrastructure that makes the everything app possible for everyone else.

    Every company in this series—Microsoft, Google, Notion, OpenAI, Perplexity, Mistral, Zapier—needs compute. Massive, reliable, cost-effective GPU compute. The frontier model companies are burning through capital building their own clusters because the alternative is depending on hyperscalers (AWS, Azure, GCP) that charge premium rates and may eventually compete directly.

    xAI with Colossus is offering a third option: AI-native compute infrastructure, built by a company that doesn’t directly compete on most application layers, at a scale that’s difficult to replicate, at a location (Memphis) with power grid access that many coastal data center markets can’t match.

    If you’re building the everything app and you need the compute to run it—Colossus may become the place you go when AWS is too slow, Google is a competitor, and building from scratch takes two years you don’t have.

    That’s not the everything app. That’s the everything app’s power grid. And historically, the entity that owns the power grid captures durable, compounding value regardless of which specific applications win the consumer layer.

    Space: The Long Game

    The orbital compute angle deserves more than a footnote because it’s where this thesis could either collapse into fantasy or become genuinely transformative.

    The practical case for orbital data centers is latency equalization: compute in low Earth orbit can serve any point on the Earth’s surface within milliseconds, without the geographic concentration that makes terrestrial data centers vulnerable to regional power outages, natural disasters, or regulatory shutdown. For AI applications that need global deployment at consistent latency—real-time translation, autonomous vehicle coordination, financial systems—orbital compute offers something no ground-based data center geography can.
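
    The "within milliseconds" claim can be sanity-checked with a propagation-delay estimate. The sketch below assumes a 550 km orbital altitude (a typical low-Earth-orbit figure, not one given in this article) and counts only speed-of-light travel time; processing, queuing, and any inter-satellite hops would add to it.

```python
import math

SPEED_OF_LIGHT_KM_S = 299_792.458
EARTH_RADIUS_KM = 6_371.0


def one_way_delay_ms(distance_km: float) -> float:
    """Speed-of-light travel time over a straight-line distance."""
    return distance_km / SPEED_OF_LIGHT_KM_S * 1000


def slant_range_at_horizon_km(altitude_km: float) -> float:
    """Distance to a satellite that sits right on the observer's horizon."""
    return math.sqrt(2 * EARTH_RADIUS_KM * altitude_km + altitude_km ** 2)


if __name__ == "__main__":
    altitude = 550.0  # assumed altitude in km
    overhead = one_way_delay_ms(altitude)
    horizon = one_way_delay_ms(slant_range_at_horizon_km(altitude))
    print(f"directly overhead: {overhead:.2f} ms one way, {2 * overhead:.2f} ms round trip")
    print(f"at the horizon:    {horizon:.2f} ms one way, {2 * horizon:.2f} ms round trip")
```

    Under these assumptions the floor is a few milliseconds of round-trip delay to a satellite directly overhead and closer to 18 ms to one near the horizon: low, but not zero.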

    SpaceX’s Starship dramatically changes the economics of getting mass to orbit. Current launch costs for payloads are measured in thousands of dollars per kilogram. Starship’s target is hundreds of dollars per kilogram—an order-of-magnitude reduction that makes orbital infrastructure financially viable in a way it never was before. The satellite internet analogy is instructive: Starlink was also considered impractical until SpaceX dramatically reduced launch costs, then deployed at a scale that changed the calculus entirely.

    Anthropic’s stated interest in orbital compute capacity with SpaceX isn’t a polite corporate gesture. It’s Anthropic hedging its long-term compute dependency on a technology only SpaceX can currently deliver. If even a fraction of that orbital compute vision materializes, xAI/SpaceX’s infrastructure moat becomes essentially unreplicable by any company that doesn’t own a heavy-lift reusable rocket program.

    What This Means for the Everything App Race

    The xAI infrastructure pivot doesn’t remove Grok and X from the everything app conversation entirely. X still has the distribution, the data firehose, the financial services ambitions, and the brand. Those don’t disappear because Colossus 1 is now running Claude.

    But it does add a second thesis that may ultimately matter more: xAI as the infrastructure layer beneath the entire AI economy. Not the everything app—the everything app’s foundation.

    In the history of platform technology, the company that owns the infrastructure layer almost always captures more durable value than the company that owns any individual application. TCP/IP outlasted every early internet application. AWS became more valuable than most of the businesses it hosts. The cloud didn’t belong to any one software company—it belonged to the infrastructure providers who made software deployment cheap and fast.

    If the AI era follows the same pattern, the question isn’t who builds the best everything app. It’s who builds the infrastructure that makes every everything app possible. And as of May 2026, the most credible answer to that question involves 555,000 GPUs in Memphis, a rocket program that can reach orbit, and a business model that profits whether Grok wins or loses.

    Key Takeaway

    Elon Musk pivoted xAI from model competitor to infrastructure landlord. By merging into SpaceX, leasing Colossus 1 to Anthropic, and targeting 2 gigawatts of Memphis compute capacity plus orbital data centers, xAI is positioning to capture value from the AI economy regardless of which application layer wins—the power grid, not the appliance.

    Related Reading

    This article grew out of our everything app series. If you’re tracking where AI consolidation is heading, the full series maps the competitive landscape from nine angles.

    Frequently Asked Questions About xAI, Colossus, and the Compute Landlord Pivot

    Why did xAI merge into SpaceX?

    xAI merged into SpaceX in May 2026 as an independent entity within the broader Musk enterprise. The merger combines xAI’s AI development capabilities with SpaceX’s capital generation, construction expertise, and—critically—rocket launch capabilities. This integration enables the orbital compute strategy: deploying data center satellites via Starship at dramatically lower cost than any competitor could achieve.

    What is the Anthropic-Colossus deal?

    In May 2026, Anthropic agreed to rent the entire compute capacity of Colossus 1—xAI’s first Memphis supercluster, comprising 220,000+ NVIDIA GPUs and 300 megawatts of power. The deal directly addresses Anthropic’s acute compute shortage during a period of explosive Claude usage growth. Anthropic’s annualized revenue grew from roughly $9 billion at end of 2025 to approximately $30 billion by April 2026. Analysts estimate the deal generates between $3 billion and $6 billion annually for xAI/SpaceX.

    How large is the Colossus supercomputer complex?

    As of early 2026, the Colossus complex in Memphis spans three buildings and targets 2 gigawatts of total compute capacity. Colossus 2 (kept by xAI for Grok development) has reached 555,000 NVIDIA GPUs with approximately $18 billion in hardware investment. Long-term targets include one million GPUs at the Memphis site. It is currently the largest single-site AI training installation in the world.

    What are orbital data centers and why does xAI/SpaceX care about them?

    Orbital data centers are computing facilities deployed in low Earth orbit, delivered by rocket. They offer latency equalization (serving any point on Earth within milliseconds), elimination of geographic concentration risk, and compute capacity outside any single regulatory jurisdiction. SpaceX’s Starship reduces launch costs by an order of magnitude compared to existing vehicles, making orbital compute economically viable for the first time. Anthropic’s participation in the deal included expressed interest in developing multiple gigawatts of orbital compute capacity with SpaceX.

    Does the compute landlord strategy mean xAI is giving up on Grok?

    Not necessarily, but the signals are mixed. xAI is reportedly using approximately 11% of its available compute for Grok development—the rest is available to lease. Grok app downloads have declined in 2026, and April 2026 litigation revealed Grok was trained on OpenAI model outputs. The Colossus 1 lease to Anthropic is the clearest evidence that xAI is not running a maximum-effort campaign on frontier model development and is instead diversifying into infrastructure revenue.

    How does the xAI infrastructure play relate to the everything app thesis?

    The xAI pivot suggests a reframe of the everything app question. Rather than competing to be the app users interact with daily, xAI/SpaceX is positioning to own the compute infrastructure that powers any everything app—what we’re calling the “everything app’s power grid.” Historically, infrastructure layer companies (AWS, TCP/IP, electricity grids) capture more durable value than any individual application running on top of them. The Anthropic deal is the first concrete evidence that this model may work at AI scale.

  • Claude Code Pricing in May 2026: What $20, $100, and $200 a Month Actually Buy You

    Last refreshed: May 15, 2026

    Claude Code pricing has stopped being a clean sticker number and started being a question of which ceiling you hit first. There is a $20 plan, a $100 plan, and a $200 plan — and underneath all three sits a 5-hour rolling window, a weekly active-hours cap added in August 2025, and a per-model multiplier that quietly makes Opus 4.7 the most expensive thing you can do inside the terminal. If you came looking for the right plan, the honest answer is: it depends on whether you are mostly a Sonnet operator or you live in Opus.

    The three subscription tiers, stripped down

    Pro: $20/month. Access to Claude Code in the terminal, web, and desktop, with both Sonnet 4.6 and Opus 4.7 available. The practical envelope is about 44,000 tokens per 5-hour window and roughly 40–80 weekly active hours on Sonnet, depending on session concurrency. This is the plan for someone running Claude Code a few hours a day on focused work (refactors, scoped feature builds, debugging passes), not someone leaving an agent running while they eat lunch.

    Max 5x: $100/month. Nominally five times the Pro envelope, plus priority during peak demand. The window allocation is reported at around 88,000 tokens per 5-hour block, which is closer to twice the Pro figure above than five times it, so treat the per-window token counts as rough estimates. This is the tier where you stop thinking about token budgets during a single working day and start thinking about them across a whole week. Picked correctly, it is the cheapest way to use Claude Code as your primary IDE companion without flipping over to API billing.

    Max 20x: $200/month. Nominally twenty times Pro. The reported window figure is about 220,000 tokens (again closer to five times the Pro estimate than twenty), which translates to roughly 480 Sonnet-hours or about 40 Opus-hours per week before the weekly cap kicks in. Real-world reports from early 2026 had $200/month users watching single Opus prompts eat 10–20% of their daily allocation; Anthropic publicly acknowledged the problem, expanded capacity, and doubled the 5-hour rate limit for Pro and Max accounts. If you are running Claude Code across multiple repos all week and reaching for Opus on the hard problems, this is the tier that stops you from staring at a rate-limit wall.

    The API, as a sanity check

    If you want a sanity check on whether the subscription math works, price the same workload against the API:

    • Claude Haiku 4.5 (claude-haiku-4-5-20251001): $1.00 input / $5.00 output per million tokens
    • Claude Sonnet 4.6 (claude-sonnet-4-6): $3.00 input / $15.00 output per million tokens
    • Claude Opus 4.7 (claude-opus-4-7): $5.00 input / $25.00 output per million tokens

    Prompt caching is the lever almost nobody uses correctly. Cache writes cost 1.25x input price for the 5-minute TTL or 2.0x for the 1-hour TTL, but cache reads cost 0.10x — a 90% discount on every subsequent request that hits the same context. If your .clauderules file, project map, and the file you are editing are all stable for an hour, the bill on a long pairing session can drop by an order of magnitude. The Batch API knocks another 50% off both directions for asynchronous workloads, which is worth knowing if you are running large refactor sweeps.
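
    To see how the multipliers above play out, here is a minimal worked example of the input-side cost of a session that resends the same stable context on every request. It assumes every request after the first hits the cache within the 5-minute TTL and ignores output tokens and the uncached part of each prompt; the context size and request count are invented for illustration.

```python
CACHE_WRITE_5M = 1.25   # multiplier on the input price for a 5-minute cache write
CACHE_READ = 0.10       # multiplier on the input price for a cache read


def uncached_cost_usd(context_tokens: int, requests: int, input_rate: float) -> float:
    """Every request pays full input price for the shared context."""
    return requests * context_tokens * input_rate / 1_000_000


def cached_cost_usd(context_tokens: int, requests: int, input_rate: float) -> float:
    """First request writes the cache; the rest read it (assumes no expiry)."""
    per_request = context_tokens * input_rate / 1_000_000
    return per_request * (CACHE_WRITE_5M + CACHE_READ * (requests - 1))


if __name__ == "__main__":
    # Hypothetical session: 50k-token stable context, 40 requests, Sonnet input at $3/M.
    plain = uncached_cost_usd(50_000, 40, 3.00)
    cached = cached_cost_usd(50_000, 40, 3.00)
    print(f"uncached: ${plain:.2f}  cached: ${cached:.4f}  ratio: {plain / cached:.1f}x")
```

    For this hypothetical session the shared-context cost falls from $6.00 to about $0.77, a bit under 8x. The saving approaches 10x as the number of cache hits grows and shrinks whenever the cache expires and has to be rewritten.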

    One trap on Opus 4.7 specifically: the model uses a new tokenizer that inflates token counts by up to 35% on identical text compared to Opus 4.6. The headline price did not change, but your effective spend per request did — sometimes by nothing, sometimes by a third, depending on the content. If you migrated from Opus 4.6 and your bill went up without your prompt patterns changing, that is the reason.

    How to actually choose

    The cleanest way to pick a plan is to first decide your model mix, then your weekly hours.

    If you are mostly a Sonnet operator — long agentic runs, multi-file edits, codebase Q&A, with Opus only reached for on the architectural questions — Pro at $20 is plausible up to about 5–8 hours of focused use per day, Max 5x covers most full-time individual developers, and Max 20x is overkill unless you are running multiple sessions in parallel.

    If you live in Opus — long-horizon agentic work, hard refactors across many files, anything where you would rather have one good attempt than three Sonnet retries — Pro will frustrate you within two weeks, Max 5x is the realistic floor, and Max 20x is the only tier that gives you a defensible Opus envelope without bouncing over to API billing.

    And if you are running Claude Code across multiple repos all week, leaving agents to grind on tasks while you do other things, Max 20x is the only subscription that holds up — and even then, the weekly cap is real. Use the API for the spillover and you will still come out cheaper than trying to brute-force a smaller plan.

    The number that matters

    One developer’s public report this year: roughly 10 billion tokens consumed across Claude Code over eight months. API metered cost would have exceeded $15,000. The same workload on Max at $100/month for the same window came in around $800, which is about 95% cheaper (1 - 800/15,000 ≈ 0.947). That is the gap that makes the subscription model worth taking seriously, even when the rate limits feel arbitrary. The $200 tier is not a vanity number; it is the price Anthropic charges to stop being a meaningful constraint on your workflow.

    The right way to read Claude Code pricing in May 2026 is not to ask which plan is cheapest. It is to ask which plan is the cheapest one that disappears — the one that stops appearing in your day. For most full-time developers reaching for Opus regularly, that plan is Max 20x. For everyone else, Max 5x is the first plan that actually gets out of your way.

  • LLMs.txt in 2026: The 4-Element Spec, The Robots.txt Pairing, and How to Verify Crawlers Are Reading It

    LLMs.txt in 2026: The 4-Element Spec, The Robots.txt Pairing, and How to Verify Crawlers Are Reading It

    If you publish an llms.txt file this week, no major model is going to fetch it tonight. That is the honest 2026 read on the spec — and yet the file is still worth shipping for narrow, specific reasons. This guide covers the 4-element specification published at llmstxt.org, the robots.txt pairing that actually controls AI crawler behavior right now, and a server-log filter you can run to verify whether anyone is reading the file you just shipped.

    What llms.txt actually is (and what it isn’t)

    llms.txt is a Markdown file served at the site root — /llms.txt — proposed by Jeremy Howard of Answer.AI on September 3, 2024. The spec at llmstxt.org defines four elements: a required H1 with the project or site name; a blockquote summary; zero or more Markdown content sections (no headings); and zero or more H2-delimited file-list sections containing annotated Markdown links to deeper content. That is the entire specification. There is no header convention, no schema requirement, no robots-style allow/deny syntax.

    What llms.txt is not: it is not a substitute for robots.txt, it is not an access-control mechanism, and as of May 2026 it is not consumed at inference time by ChatGPT, Claude, Gemini, Perplexity, or Copilot in any documented production system. Server-log audits from multiple independent practitioners show that GPTBot, ClaudeBot, and Google-Extended do not request /llms.txt in meaningful volume during routine crawls.

    The realistic 2026 use case is developer tooling. AI coding assistants and IDE agents — Cursor, GitHub Copilot, Claude Code, and similar tools — retrieve docs in real time, and a curated llms.txt cuts token waste by pointing them at canonical Markdown sources instead of HTML-rendered pages bloated with nav and tracking. Companies like Anthropic, Stripe, Cursor, Cloudflare, Vercel, Mintlify, Supabase, and LangGraph ship llms.txt for that reason.

    The 4-element template — a working example

    Here is a valid llms.txt for a hypothetical SaaS docs site. Copy this structure, change the project name, and you have a shippable file in under 30 minutes:

    # Acme Analytics
    
    > Acme Analytics is a self-hosted product analytics platform for SaaS teams. This file points AI assistants and IDE agents at canonical Markdown documentation, not the rendered HTML.
    
    Authoritative Markdown sources for product, API, and SDK documentation. Use the `.md` variant of any docs page (append `.md` to the URL) for a clean, agent-friendly version.
    
    ## Getting Started
    
    - [Quickstart](https://acme.example/docs/quickstart.md): 10-minute setup, install through first event.
    - [Concepts](https://acme.example/docs/concepts.md): events, properties, identities, sessions — definitions and examples.
    
    ## API Reference
    
    - [REST API Reference](https://acme.example/docs/api/rest.md): every endpoint, request/response schema, rate limits.
    - [Webhook Reference](https://acme.example/docs/api/webhooks.md): payload contracts and retry behavior.
    
    ## SDKs
    
    - [JavaScript SDK](https://acme.example/docs/sdk/js.md): browser and Node, including server-side rendering notes.
    - [Python SDK](https://acme.example/docs/sdk/python.md): server-side ingestion patterns.
    
    ## Optional
    
    - [Changelog](https://acme.example/docs/changelog.md): version history, breaking changes flagged inline.
    

    Two practitioner notes. First, the spec gives an H2 titled “Optional” a special meaning: links under that heading can be skipped when a shorter context is needed. Second, the file is most useful when every linked URL has a parallel .md Markdown version. If your site is pure HTML, llms.txt without paired Markdown does little.
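
    If you want a quick sanity check before shipping, the four elements can be tested mechanically. The function below is a rough sketch, not an official validator: check_llms_txt is a hypothetical helper name, it assumes the deployed file is unindented Markdown, and it only enforces the required H1 while reporting the other elements.

    ```shell
    # check_llms_txt FILE
    # Hypothetical helper: rough structural check against the four elements
    # described above. Only the H1 is treated as required.
    check_llms_txt() {
      local file="$1"
      local first

      # Element 1 (required): the first non-blank line must be an H1.
      first="$(grep -m1 -v '^[[:space:]]*$' "$file")"
      case "$first" in
        "# "*) ;;
        *) echo "missing H1 title on first non-blank line" >&2; return 1 ;;
      esac

      # Element 2: blockquote summary (reported, not enforced).
      if grep -q '^> ' "$file"; then
        echo "summary: present"
      else
        echo "summary: absent"
      fi

      # Element 4: H2-delimited file-list sections and their Markdown links.
      echo "h2_sections: $(grep -c '^## ' "$file")"
      echo "links: $(grep -c '^- \[[^]]*]([^)]*)' "$file")"
    }
    ```

    Run it as check_llms_txt ./llms.txt before deploying. A non-zero exit means the required H1 is missing; the printed counts are there for eyeballing, not as pass/fail criteria.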

    The robots.txt pairing — this is what actually controls AI bots today

    The lever that meaningfully controls AI crawler behavior in 2026 is robots.txt with user-agent–specific rules. Anthropic publishes official documentation for three bots — ClaudeBot for training, Claude-User for user-initiated fetches, and Claude-SearchBot for search indexing — and confirms all three honor robots.txt. OpenAI runs GPTBot (training) and OAI-SearchBot (live ChatGPT search). Google’s AI training opt-out is the Google-Extended user-agent. Perplexity uses PerplexityBot.

    The two-bucket pattern most practitioner sites should ship: block training-only crawlers, allow search and user-initiated retrieval so your content can still be cited in answers.

    # Allow AI search and user-fetch traffic (citations, attribution)
    User-agent: Claude-SearchBot
    Allow: /
    
    User-agent: Claude-User
    Allow: /
    
    User-agent: OAI-SearchBot
    Allow: /
    
    User-agent: PerplexityBot
    Allow: /
    
    # Block training-only crawlers
    User-agent: ClaudeBot
    Disallow: /
    
    User-agent: GPTBot
    Disallow: /
    
    User-agent: Google-Extended
    Disallow: /
    
    # Standard search crawler — leave open
    User-agent: Googlebot
    Allow: /
    
    Sitemap: https://example.com/sitemap.xml
    

    One operational caveat: robots.txt is policy, not enforcement. Anthropic, OpenAI, and Google have all publicly committed their named bots to compliance, but unnamed scrapers and residential-IP harvesters routinely ignore it. For sites with sensitive content, pair robots.txt with WAF or Cloudflare bot-management rules at the edge.
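
    To confirm the file says what you think it says for a given bot, a small lookup helps. This is a minimal sketch under stated assumptions: rules_for is a hypothetical helper, it assumes one User-agent line per group (as in the file above), and it ignores wildcard groups and path matching, both of which real crawlers implement.

    ```shell
    # rules_for FILE AGENT
    # Hypothetical helper: print the Allow/Disallow lines in AGENT's group.
    # Assumes one User-agent line per group; no "*" or path matching.
    rules_for() {
      awk -v agent="$2" '
        BEGIN { agent = tolower(agent) }
        /^[[:space:]]*#/ { next }    # skip comment lines
        tolower($1) == "user-agent:" { active = (tolower($2) == agent); next }
        active && tolower($1) ~ /^(allow|disallow):$/ { print $1, $2 }
      ' "$1"
    }
    ```

    For the file above, rules_for robots.txt ClaudeBot should print the Disallow line and rules_for robots.txt Claude-SearchBot the Allow line. Empty output means the bot has no group of its own.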

    Structured data still does more heavy lifting than llms.txt

    If your goal is AI citation rather than IDE-agent retrieval, structured data on the page itself moves the needle more than llms.txt. The minimum stack for any article you want cited: Article schema with named author and publisher, FAQPage schema on any post that answers a discrete question, and speakable markup on the answer paragraphs. These get parsed during normal HTML fetches by every major AI crawler — no separate file required.

    How to verify your llms.txt is actually being read

    Ship the file, then run this server-log filter weekly for 30 days. On any standard access-log format (nginx, Apache, or a Cloudflare log push), grep for requests to /llms.txt and break them down by user-agent:

    grep "GET /llms.txt" /var/log/nginx/access.log \
      | awk -F\" '{print $6}' \
      | sort | uniq -c | sort -rn
    

    What you will almost certainly see in May 2026: a steady trickle of human curl requests, the occasional IDE agent fetch tagged with a Cursor or VS Code user-agent, and effectively zero hits from GPTBot, ClaudeBot, or Google-Extended. That null result is itself the measurement — it tells you llms.txt is a developer-experience asset right now, not an AI-citation asset, and your investment should match that reality.
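
    If you would rather track the named crawlers explicitly than read the raw user-agent list, the same filter can be turned into a per-bot count. A sketch, assuming a combined-format access log: llms_hits_by_bot is a hypothetical helper and the bot list is simply the crawler names mentioned earlier. Google-Extended is left out because, to my knowledge, it is a robots.txt token rather than a user-agent string that shows up in request logs.

    ```shell
    # llms_hits_by_bot LOGFILE
    # Hypothetical helper: count /llms.txt requests per named AI crawler.
    llms_hits_by_bot() {
      local log="$1" bot
      for bot in GPTBot OAI-SearchBot ClaudeBot Claude-User Claude-SearchBot PerplexityBot; do
        # First grep keeps only llms.txt requests, second counts this bot's lines.
        printf '%s %s\n' "$bot" "$(grep -F 'GET /llms.txt' "$log" | grep -c -F "$bot")"
      done
    }
    ```

    Call it as llms_hits_by_bot /var/log/nginx/access.log each week and keep the output; a column of zeros over 30 days is the null result described above.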

    The recommended 2026 rollout

    For most sites, the right sequence is: ship the robots.txt user-agent rules above first, because the named crawlers honor them today and they shape every compliant AI crawler interaction. Add structured data to every article that competes for AI citation. Then publish llms.txt (under 30 minutes of work) for the IDE-agent and dev-tooling upside, with no expectation of immediate search lift. When OpenAI, Anthropic, or Google publicly confirm production llms.txt consumption, you are already in position.

  • Claude MCP in 2026: What Actually Changed and How to Configure It Without Wasting Tokens

    Claude MCP in 2026: What Actually Changed and How to Configure It Without Wasting Tokens

    Last refreshed: May 15, 2026

    If you set up Claude MCP six months ago and have not touched the config since, three things have changed underneath you: the recommended transport, how tools are loaded into context, and how teams share server configs. None of these are cosmetic. If you ignore them, you are leaving tokens, money, and stability on the table.

    This is the working Claude MCP setup I use in May 2026 — what the claude mcp add command actually does, which scope to pick, what the deprecation of SSE means in practice, and where Claude Code still falls short.

    The three-scope mental model

    Every MCP server you wire into Claude Code lives at exactly one of three scopes. Get this wrong and you will either leak credentials into git or wonder why your teammate cannot use the same database the AI just queried.

    • Local (default): the server is available only to you, only inside the current project. Config is written into your project’s entry inside ~/.claude.json. Good for project-specific servers like a dev database or a Sentry project key you do not want other repos to inherit.
    • User: the server is available to you across every project on your machine. Also stored in ~/.claude.json. This is where GitHub, search providers, and personal productivity servers belong.
    • Project: the server is written to a .mcp.json file at the repo root and shared with the whole team via git. Claude Code prompts for approval the first time a teammate opens the project — by design, because anyone who can push to the repo can wire a new server into your environment.

    When the same server is defined in more than one scope, Claude Code resolves it in this order: local beats project beats user beats plugin-provided. This is the part that bites people the most. If you have a “github” entry at user scope and someone adds a different “github” entry at project scope in .mcp.json, the project definition wins for that repo. Run claude mcp list when something behaves strangely.
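
    The precedence rule is easier to keep straight as a lookup. A minimal sketch, assuming only the order stated above: winning_scope is a hypothetical helper for reasoning about conflicts and does not read any Claude Code config.

    ```shell
    # winning_scope SCOPE...
    # Hypothetical helper: given the scopes where a server name is defined,
    # print the one that takes effect (local > project > user > plugin).
    winning_scope() {
      local want have
      for want in local project user plugin; do
        for have in "$@"; do
          if [ "$have" = "$want" ]; then
            echo "$want"
            return 0
          fi
        done
      done
      return 1   # not defined in any known scope
    }

    winning_scope user project   # prints: project
    winning_scope plugin user    # prints: user
    ```

    The first call is the “github” example from the paragraph above: a user-scope entry and a project-scope entry with the same name resolve to the project definition.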

    The commands you actually need

    The CLI is more useful than the docs make it look. Three commands cover ~90% of real setup work:

    # Add a remote HTTP MCP server at user scope (available everywhere)
    claude mcp add --transport http hubspot --scope user https://mcp.hubspot.com/anthropic
    
    # Add a local stdio server scoped only to this project
    claude mcp add my-db -s local -- node ./scripts/db-mcp.js
    
    # Share a server with your team via the repo's .mcp.json
    claude mcp add my-server -s project -- node server.js

    The short flag is -s, the long is --scope. The -- separator is required for stdio servers because everything after it is treated as the literal command to spawn. Forget it and Claude Code will try to interpret your Node arguments as its own flags.

    SSE is dead. Use Streamable HTTP.

    If your MCP server documentation still tells you to use the sse transport, the documentation is stale. The MCP spec dated 2025-03-26 introduced Streamable HTTP and simultaneously deprecated HTTP+SSE. Through 2026, vendor after vendor has set hard cutoff dates — Atlassian’s Rovo MCP server keeps SSE around until June 30, 2026 and then drops it; Keboola pulled SSE on April 1; Cumulocity’s AI Agent Manager flipped to Streamable HTTP on May 8.

    Why this matters beyond a name change: SSE required Claude Code to hold a persistent connection to a single server replica, which broke horizontal scaling and made every transient network blip a reconnection drama. Streamable HTTP is stateless. Multiple replicas behind a load balancer just work. If you have flaky MCP connections in production, the first thing to check is whether the server is still on SSE.

    For new setups, use --transport http. The older --transport sse still functions but is on the deprecation path.

    Tool Search is the feature you should actually care about

    The single biggest change in how Claude Code uses MCP in 2026 is lazy tool loading via Tool Search. Older MCP clients dumped every tool schema from every connected server into the model’s context window at the start of every conversation. With ten servers wired up that could easily be 20,000+ tokens of overhead before you typed a single character.

    Tool Search inverts this. Claude Code keeps only the server names and short descriptions resident. When a tool is actually needed, it fetches that tool’s full schema on demand. Anthropic’s own documentation says this reduces tool-definition context usage by roughly 95% versus eager-loading clients. In practice that means you can run a serious MCP fleet — GitHub, Sentry, a database, a search provider, your internal API — without quietly burning through your context budget. The Sonnet 4.6 and Opus 4.7 1M-token context window does not save you here, because anything you let crowd the prompt is also being re-read on every turn.

    Companion feature: list_changed notifications. An MCP server can now tell Claude Code “my tool list changed” and Claude Code refreshes capabilities without a disconnect-reconnect dance. If you build your own server, emit this when you swap tool definitions and you save users a restart.

    What it still gets wrong

    Honest take: claude mcp list still does not surface scope information for every entry in a useful way. There is an open issue on the anthropics/claude-code repo asking for it (#8288, if you want to track it). Project-scoped servers from .mcp.json have a separate history of not appearing in the list output (#5963) depending on how you opened the project. If you cannot find a server, check both ~/.claude.json and ./.mcp.json directly.

    The other rough edge is the project-approval prompt. The first time you open a repo with a new .mcp.json, Claude Code asks you to approve each project-scoped server. That is the right security default. It is also infuriating in CI or any non-interactive shell, where the prompt blocks the session. The current workaround is to bake the servers in at user scope on build agents so the project-scope approval never fires in CI. A cleaner non-interactive approval flow is the single most-requested fix I see in real teams.

    The setup I would run on a new machine today

    User-scope: GitHub, a code search server, and a single notes/Notion server. Project-scope in each repo’s .mcp.json: whatever database the project owns and whatever observability backend it reports to. Local-scope: anything experimental I am evaluating but do not want my team or my other repos to inherit.

    Pin --transport http on everything remote. Skip Desktop Extensions (.dxt) for anything you want versioned with the codebase — they are a Claude Desktop convenience, not a Claude Code primitive, and they hide the config from your team. Run claude mcp list when something is off and read .mcp.json directly when list is unhelpful.

    That is the whole working model. The pieces that matter — three scopes, Streamable HTTP, Tool Search — fit on a single screen. The pieces that have not caught up yet — list output, non-interactive approvals — are visible in the issue tracker and will move.

  • Claude Code Hooks: The Workflow Control Layer That Actually Enforces Your Rules

    Claude Code Hooks: The Workflow Control Layer That Actually Enforces Your Rules

    Last refreshed: May 15, 2026

    You’ve been there. You add a rule to CLAUDE.md — “always run prettier after editing files” — and Claude follows it, most of the time. Then it doesn’t. The formatter doesn’t run, the lint check gets skipped, and you’re back to reviewing diffs manually.

    Hooks fix this. Claude Code hooks are shell commands, HTTP endpoints, or LLM prompts that fire deterministically at specific points in Claude’s agentic loop. Unlike CLAUDE.md instructions, which are advisory, hooks are enforced at the execution layer — Claude cannot skip them.

    As of early 2026, Claude Code ships with 21 lifecycle events across four hook types. This article covers the two that matter most for daily workflow: PreToolUse and PostToolUse.

    How Hooks Work Architecturally

    Claude Code’s agent loop is a continuous cycle: receive input → plan → execute tools → observe results → repeat. Hooks intercept this loop at named checkpoints.

    Every hook is defined in .claude/settings.json under a hooks key. A hook entry has three parts: the lifecycle event name, an optional matcher (a regex against tool names), and the handler definition — either a shell command, an HTTP endpoint, or an LLM prompt.

    {
      "hooks": {
        "PostToolUse": [
          {
            "matcher": "Write|Edit",
            "hooks": [
              {
                "type": "command",
                "command": "npx prettier --write \"$CLAUDE_TOOL_INPUT_FILE_PATH\""
              }
            ]
          }
        ]
      }
    }

    That’s it. Every file Claude writes or edits now auto-formats. No CLAUDE.md reminders, no hoping Claude remembers — the formatter runs on every single Write or Edit tool call, period.

    PreToolUse: Enforce Before Claude Acts

    PreToolUse fires before Claude executes any tool. Your hook receives the full tool call — name, inputs, arguments — and can return one of three signals:

    • Exit 0 → allow the tool call to proceed
    • Exit 2 → block the tool call; Claude receives your error message and adjusts
    • Exit 1 → hook error; Claude proceeds but logs the failure

    This makes PreToolUse the right place for guardrails. Here’s a real example: blocking npm in a bun project.

    #!/bin/bash
    # .claude/hooks/check-package-manager.sh
    # Blocks npm commands in projects that use bun
    
    if echo "$CLAUDE_TOOL_INPUT_COMMAND" | grep -qE "^npm "; then
      echo "Error: This project uses bun, not npm. Use: bun install / bun run / bun add" >&2
      exit 2
    fi
    exit 0

    Wire it in settings.json:

    {
      "hooks": {
        "PreToolUse": [
          {
            "matcher": "Bash",
            "hooks": [
              {
                "type": "command",
                "command": ".claude/hooks/check-package-manager.sh"
              }
            ]
          }
        ]
      }
    }

    Now when Claude tries npm install, the hook exits 2, Claude sees the error message, and it switches to bun install without you intervening. The correction happens in the same turn.
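
    Before trusting a guardrail, it is worth exercising it from a plain terminal. The sketch below wraps the same check in a function so it can be called directly. It relies on $CLAUDE_TOOL_INPUT_COMMAND exactly as the script above uses it; if your Claude Code version delivers tool input differently, adapt the first line of the check. The function names and the left-pad package are placeholders.

    ```shell
    # Same logic as check-package-manager.sh, wrapped in a function for testing.
    # Assumes the command arrives in $CLAUDE_TOOL_INPUT_COMMAND, as in the article.
    check_package_manager() {
      if echo "$CLAUDE_TOOL_INPUT_COMMAND" | grep -qE "^npm "; then
        echo "Error: This project uses bun, not npm." >&2
        return 2
      fi
      return 0
    }

    # try_command CMD: run the check against CMD and report its exit status.
    try_command() {
      local rc=0
      CLAUDE_TOOL_INPUT_COMMAND="$1" check_package_manager 2>/dev/null || rc=$?
      echo "exit $rc for: $1"
    }

    try_command "npm install left-pad"          # exit 2 for: npm install left-pad
    try_command "bun add left-pad"              # exit 0 for: bun add left-pad
    try_command "cd web && npm install"         # exit 0 for: cd web && npm install
    ```

    The third call shows a gap worth knowing about: the pattern is anchored to the start of the command, so an npm call that follows cd or && slips through. Loosening the regex catches more cases at the cost of occasional false positives.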

    Another production pattern: blocking writes to protected paths.

    #!/bin/bash
    # Prevent Claude from modifying migration files already run in production
    if echo "$CLAUDE_TOOL_INPUT_FILE_PATH" | grep -qE "db/migrations/"; then
      echo "Error: Migration files are immutable after deployment. Create a new migration instead." >&2
      exit 2
    fi
    exit 0

    PostToolUse: React After Claude Acts

    PostToolUse fires after a tool completes successfully. It can’t block execution, but it can provide feedback — and it can run any side-effect you need automatically.

    Auto-format every edit:

    {
      "hooks": {
        "PostToolUse": [
          {
            "matcher": "Write|Edit",
            "hooks": [
              {
                "type": "command",
                "command": "npx prettier --write \"$CLAUDE_TOOL_INPUT_FILE_PATH\" 2>/dev/null || true"
              }
            ]
          }
        ]
      }
    }

    Run tests after code changes:

    #!/bin/bash
    # Run affected tests after any source file edit
    FILE="$CLAUDE_TOOL_INPUT_FILE_PATH"
    if echo "$FILE" | grep -qE "\.(ts|js|py)$"; then
      if [ -f "package.json" ]; then
        npx jest --testPathPattern="$(basename "${FILE%.*}")" --passWithNoTests 2>&1 | tail -5
      fi
    fi

    Desktop notification on task completion:

    {
      "hooks": {
        "Stop": [
          {
            "hooks": [
              {
                "type": "command",
                "command": "osascript -e 'display notification \"Claude finished\" with title \"Claude Code\"'"
              }
            ]
          }
        ]
      }
    }

    Environment Variables Available to Hooks

    Claude Code exposes context about the triggering tool call through environment variables. The ones you’ll use most:

    Variable | Value
    $CLAUDE_TOOL_NAME | Name of the tool being called (e.g., Edit, Bash, Write)
    $CLAUDE_TOOL_INPUT_FILE_PATH | File path for Edit, Write, Read calls
    $CLAUDE_TOOL_INPUT_COMMAND | Shell command for Bash calls
    $CLAUDE_SESSION_ID | Current session ID, useful for audit logging
    $CLAUDE_TOOL_RESULT_OUTPUT | Output of the tool (PostToolUse only)

    These are injected by Claude Code before your hook runs. You don’t configure them — they’re always there.
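
    The session ID and tool name are enough for a basic audit trail. A minimal sketch, assuming the variables behave as listed in the table above: audit_line is a hypothetical helper, the log format is made up for illustration, and unset variables fall back to placeholder values.

    ```shell
    # audit_line: format one log entry from the hook variables listed above.
    # Hypothetical helper; the field layout is an illustration, not a standard.
    audit_line() {
      printf '%s session=%s tool=%s target=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "${CLAUDE_SESSION_ID:-unknown}" \
        "${CLAUDE_TOOL_NAME:-unknown}" \
        "${CLAUDE_TOOL_INPUT_FILE_PATH:-${CLAUDE_TOOL_INPUT_COMMAND:-n/a}}"
    }

    # Demo with hand-set values; inside a real hook the variables are already set.
    CLAUDE_SESSION_ID=demo-session CLAUDE_TOOL_NAME=Edit \
      CLAUDE_TOOL_INPUT_FILE_PATH=src/app.ts audit_line
    ```

    In a hook script you would append instead of printing, for example audit_line >> .claude/audit.log, with the path being whatever your team agrees on.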

    The Model Question: Which Claude Runs Agentic Tasks?

    One practical consideration for hook-heavy workflows: the default model affects how well Claude responds to hook feedback. As of May 2026:

    • claude-opus-4-7 ($5/MTok input, $25/MTok output) — highest agentic coding capability; best at interpreting hook rejection messages and self-correcting without re-asking
    • claude-sonnet-4-6 ($3/MTok input, $15/MTok output) — strong balance of speed and reasoning; handles most hook-corrected flows well
    • claude-haiku-4-5-20251001 ($1/MTok input, $5/MTok output) — fastest; may require more explicit hook messages to course-correct reliably

    For workflows with complex PreToolUse guardrails — especially ones that provide long error messages with corrective instructions — Opus 4.7 handles the feedback loop most reliably. For simpler PostToolUse automation (formatters, notifications), model choice doesn’t matter; the hook runs regardless.

    To configure the model: export ANTHROPIC_MODEL=claude-opus-4-7 before launching Claude Code, or set it in your team’s .env.

    Hooks vs. CLAUDE.md: When to Use Each

    CLAUDE.md is the right place for context, preferences, and guidance — things you want Claude to know about your project. Hooks are the right place for behavior that must happen every time without exception.

    The practical test: if failing to follow the instruction costs you five minutes of manual cleanup, put it in a hook. If it’s a style preference or a reminder about architecture decisions, put it in CLAUDE.md. The two are complementary — you’ll likely end up with both in any mature project setup.

    A team that gets this right builds CLAUDE.md as documentation for Claude and hooks as the CI/CD equivalent for the agentic loop.

    Getting Started

    The fastest path to a working hook setup:

    1. Create .claude/settings.json in your project root if it doesn’t exist
    2. Add a PostToolUse hook wired to your formatter — this is low-risk and immediately valuable
    3. Test it by asking Claude to edit a file; the formatter should run automatically
    4. Add PreToolUse guardrails for any tool calls that have caused problems in the past

    The official hooks reference is at code.claude.com/docs/en/hooks — it covers all 21 lifecycle events, HTTP handler format, and the full JSON output schema for hook responses.

    Hooks are the difference between Claude Code as a powerful suggestion engine and Claude Code as a reliable automation layer. Once you have a PostToolUse formatter running on every edit, going back feels like working without version control.

  • Claude Context Window — Every Question Answered (Complete FAQ 2026)

    Last refreshed: May 15, 2026

    Tygart Media · Claude Context Window Reference

    Claude Context Window — Every Question Answered

    Updated May 9, 2026 · Sizes verified from Anthropic’s official models page · Based on production use

    Context window questions answered from someone who actually uses the 1M token window in production — not from a spec sheet alone.

    Covers window sizes by model, what 1M tokens holds, the memory vs context distinction, performance at long context, and API-specific details. Full explainer: Claude Context Window Size 2026

    Size Questions

    What is Claude’s context window size in 2026?

    Model | API String | Context Window | Max Output
    Claude Opus 4.7 | claude-opus-4-7 | 1,000,000 tokens | 128,000 tokens
    Claude Sonnet 4.6 | claude-sonnet-4-6 | 1,000,000 tokens | 64,000 tokens
    Claude Haiku 4.5 | claude-haiku-4-5-20251001 | 200,000 tokens | 64,000 tokens

    Source: Anthropic’s official models page, verified May 9, 2026.

    What does 1 million tokens actually hold?

    • ~750,000 words of English text — roughly 10 full-length novels, or 1,500 average blog posts
    • A full mid-size codebase — a 50,000-line Python project with comments
    • ~60–100 research PDFs at 20–30 pages each, all simultaneously
    • Hours of meeting transcripts — a full workday of recorded calls, transcribed
    • Our full WordPress site audit — 200+ posts worth of content loaded in one session for comprehensive SEO analysis

    The shift from 200K to 1M wasn’t just “more room.” It changed what we could ask Claude to do in a single session — whole-codebase reasoning, multi-document synthesis, full-history context.

    How many pages can Claude read at once?

    A typical 20-page PDF is roughly 10,000–15,000 tokens, so at 1M tokens you could load 60–100 such documents simultaneously. A 300-page book runs roughly 150,000–200,000 tokens — Claude can hold 5–6 full books in context at once. In practice, the constraint is usually time to upload and your session structure, not the window ceiling.
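
    The page math above is plain integer division, which makes it easy to rerun with your own numbers. A sketch, assuming the per-document token estimates from the answer above: docs_that_fit is a hypothetical helper, and the optional third argument is a rough allowance for your prompt and the response.

    ```shell
    # docs_that_fit WINDOW_TOKENS TOKENS_PER_DOC [RESERVED_TOKENS]
    # Hypothetical helper: how many equally sized documents fit in the window.
    docs_that_fit() {
      local window="$1" per_doc="$2" reserved="${3:-0}"
      echo $(( (window - reserved) / per_doc ))
    }

    docs_that_fit 1000000 10000          # 100  (light 20-page PDFs)
    docs_that_fit 1000000 15000          # 66   (heavier 20-page PDFs)
    docs_that_fit 1000000 200000         # 5    (300-page books, upper estimate)
    docs_that_fit 1000000 15000 64000    # 62   (leaving 64K tokens of headroom)
    ```

    Token counts per page vary a lot with formatting and language, so treat the per-document figure as the input you should measure, not the window size.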

    What’s the difference between context window and memory?

    Three distinct things that get conflated:

    • Context window: Everything Claude can see right now in this session. Temporary — disappears when the session ends.
    • claude.ai memory: Facts extracted from past conversations and injected as a summary into new sessions. Persistent but compressed — a small snippet in the context, not the full history.
    • Managed Agents memory stores / Dreaming: Developer-layer knowledge graphs that agents build and refine between sessions. More structured than consumer memory, requires API implementation.

    The 1M context window is your working memory for one session. Memory systems are what carry information across sessions — they work by injecting a summary into the new session’s context, not by giving Claude access to the full prior history.


    Performance Questions

    Does performance degrade at very long context lengths?

    The honest answer: yes, somewhat, and it depends on the task. The “lost in the middle” pattern is real — models tend to weight the beginning and end of very long contexts more heavily than the middle. For tasks that require pinpointing specific information buried deep in a 500-page document, performance is lower than for shorter contexts. For tasks that benefit from broad synthesis across a large body of material — architectural review, theme identification, cross-document comparison — long context is a net positive. Structure important information at natural reference points rather than burying it in the middle of a large document.

    How does Opus 4.7’s context window differ from Sonnet 4.6?

    Same 1M input context window. The difference is max output: Opus 4.7 can generate up to 128,000 tokens in a single response; Sonnet 4.6 caps at 64,000. For most tasks this doesn’t matter. It matters for generating very long documents, large codebases in a single pass, or batch outputs that need to be very long. If you’re not generating 64K+ token outputs, choose between models on capability and cost, not on output ceiling.

    What happens when I hit the context window limit?

    Earlier messages begin dropping out of the active context. Claude can no longer reference information from those dropped messages — it effectively forgets that part of the conversation. In the claude.ai interface, you’ll see a notification as you approach the limit. In API usage, the context window limit is enforced hard — requests exceeding it return an error.


    API and Technical Questions

    Is the 1M context window available on the free plan?

    The model available to free plan users supports the 1M window technically, but free plan rate limits mean sustained heavy long-context use hits limits quickly. The window is available; using it intensively for extended periods is more practical on paid tiers.

    What’s the extended output option on the Batch API?

    On the Message Batches API, Opus 4.7, Opus 4.6, and Sonnet 4.6 support up to 300,000 output tokens using the output-300k-2026-03-24 beta header. This applies only to batch processing — not to synchronous API calls. Useful for large documentation generation, book-length content, or large codebase outputs in batch.

    Can I query context window limits programmatically?

    Yes. The Models API returns max_input_tokens, max_tokens, and a capabilities object for every available model. If you’re building systems that need to programmatically enforce context limits or route by capability, this is the right way to get current values rather than hardcoding from documentation.

    Does context window size affect API cost?

    Only indirectly — you pay for tokens consumed, not for context window capacity. A 1M token window doesn’t cost more than a 200K window. You pay for the tokens you actually send and receive. Loading a 500K-token document into context costs the same per token regardless of whether the model has a 200K or 1M window. The window size determines whether the request is possible at all — not what it costs per token.

  • Claude Pricing — Every Question Answered (Complete FAQ 2026)

    Last refreshed: May 15, 2026

    Tygart Media · Claude Pricing Reference

    Claude Pricing — Every Question Answered

    Updated May 9, 2026 · All prices verified from Anthropic’s official pricing page · Model strings current

    Subscription vs. API. Free vs. Pro vs. Max. Managed Agents on top. What actually changed in May 2026. The answers without the marketing layer.

    Covers subscription plans, API token rates, Managed Agents pricing, Claude Security, and the May 2026 rate limit changes. Full pricing page: Claude AI Pricing — All Plans

    Plan Pricing

    What does each Claude plan cost?

    Plan Price Claude Code Best For
    Free $0 Casual / evaluation use
    Pro $20/mo Individual daily power use
    Max 5× $100/mo Heavy individual use, no peak throttle
    Max 20× $200/mo Highest individual ceiling available
    Team Standard $25/seat/mo (annual) · $30 monthly Shared team access, no coding
    Team Premium $100/seat/mo (annual) · $125 monthly Shared team access + coding
    Enterprise Custom Large orgs, custom limits, SSO

    All subscription prices are per-user per-month. Annual billing locks in the lower rate.

    What’s the difference between Pro and Max?

    Same models, same Claude Code access. Max gives you more usage within the 5-hour rolling window — 5× or 20× Pro’s limit depending on tier — and eliminates peak-hours throttling. If you regularly hit Pro’s limits mid-session, Max is the upgrade. If you haven’t hit limits on Pro, you don’t need Max.

    Did the May 2026 SpaceX deal change subscription pricing?

    May 6, 2026: Prices unchanged. Limits doubled. Peak-hours throttling eliminated for Pro and Max. Free plan unchanged.

    The SpaceX Colossus 1 compute expansion doubled the 5-hour rate limit ceiling for Pro, Max, Team, and Enterprise — at no price increase. If you’ve been hitting limits and considering upgrading to Max, check first whether the doubled Pro ceiling now fits your workflow.


    API Pricing

    How does API pricing work?

    API pricing is pay-per-token — you pay for what you use, no subscription required. Rates as of May 2026 (verified from Anthropic’s official models page):

    Model | API String | Input / MTok | Output / MTok
    Claude Opus 4.7 | claude-opus-4-7 | $5 | $25
    Claude Sonnet 4.6 | claude-sonnet-4-6 | $3 | $15
    Claude Haiku 4.5 | claude-haiku-4-5-20251001 | $1 | $5

    Batch API discounts, prompt caching rates, and extended thinking costs apply on top — see Anthropic’s full pricing page for those specifics.

    Is subscription or API cheaper for my use case?

    Subscription wins for consistent daily use (claude.ai interface, Claude Code). API wins for variable-volume programmatic use and batch workloads. The breakeven point: if you’re using Claude heavily enough to hit Pro’s limits even weekly, you’re likely consuming more than $20/month in equivalent API tokens. For batch processing at scale, the Batch API with its discount rate is almost always the most cost-efficient path.
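
    As a rough way to locate that breakeven, the sketch below estimates how many tokens $20 buys at the Sonnet 4.6 list rates quoted above. The 80/20 input-to-output split is an assumption picked for illustration, and the calculation ignores Batch API discounts and prompt caching, so treat the result as a ballpark rather than a bill.

```python
# USD per million tokens (MTok), from the API rate table above.
SONNET_INPUT_RATE = 3.00
SONNET_OUTPUT_RATE = 15.00
PRO_MONTHLY_PRICE = 20.00


def blended_rate(input_share: float) -> float:
    """Average USD per MTok when input_share of all tokens are input tokens."""
    return input_share * SONNET_INPUT_RATE + (1 - input_share) * SONNET_OUTPUT_RATE


def breakeven_mtok(input_share: float) -> float:
    """Monthly token volume (MTok) at which API spend equals the Pro price."""
    return PRO_MONTHLY_PRICE / blended_rate(input_share)


input_share = 0.8  # assumed workload mix: 80% input tokens, 20% output tokens
print(f"Blended rate: ${blended_rate(input_share):.2f} per MTok")
print(f"Breakeven volume: {breakeven_mtok(input_share):.2f} MTok per month")
```

    A more output-heavy mix lowers the breakeven volume, because output tokens cost five times as much as input tokens at these rates.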

    What’s the real cost of Opus 4.7 vs Sonnet 4.6?

    List price: Opus 4.7 is $5/$25 per MTok input/output vs Sonnet 4.6’s $3/$15, or roughly 1.67× the price at list. However, Opus 4.7’s tokenizer produces approximately 1.46× more tokens per task than Sonnet 4.6 on typical workloads, so real-world Opus 4.7 costs run higher than the list ratio implies: about 2.4× Sonnet’s cost per task (1.67 × 1.46) if the multiplier applies to both input and output. For most production API workloads, Sonnet 4.6 is the right default. Use Opus 4.7 when the task genuinely requires maximum reasoning and cost is secondary.
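
    The gap between list price and per-task cost can be worked through with a short sketch. The rates and the 1.46× token figure come from the paragraph above. The helper name, the example task size, and the assumption that the multiplier applies equally to input and output tokens are illustrative.

```python
# USD per MTok as (input, output), from the rates quoted above.
OPUS_RATES = (5.00, 25.00)
SONNET_RATES = (3.00, 15.00)
# Assumption: the 1.46x token inflation applies to input and output alike.
OPUS_TOKEN_MULTIPLIER = 1.46


def task_cost(input_mtok, output_mtok, rates, token_multiplier=1.0):
    """Cost in USD of one task, scaling token counts by token_multiplier."""
    in_rate, out_rate = rates
    return token_multiplier * (input_mtok * in_rate + output_mtok * out_rate)


# Example task measured in Sonnet tokens: 1 MTok in, 0.2 MTok out (illustrative).
sonnet = task_cost(1.0, 0.2, SONNET_RATES)
opus = task_cost(1.0, 0.2, OPUS_RATES, OPUS_TOKEN_MULTIPLIER)
effective_ratio = opus / sonnet
print(f"Sonnet: ${sonnet:.2f}  Opus: ${opus:.2f}  ratio: {effective_ratio:.2f}x")
```

    Because both Opus rates are exactly 5/3 of the Sonnet rates, the ratio does not depend on the input/output mix under this assumption. It is simply the list ratio times the token multiplier.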


    Managed Agents Pricing

    What does Claude Managed Agents cost?

    Two charges: standard API token rates for whatever model you use, plus $0.08 per session-hour of active runtime. That’s the complete formula — no other managed infrastructure fee on top.

    A session-hour is one hour of active session status. Billing is metered to the millisecond. Idle time, time waiting for your input, and time waiting for tool confirmations do not accrue charges.

    Maximum theoretical monthly runtime cost (24/7 agent): 24 hrs × $0.08 × 30 days = $57.60/month. In practice, token costs become the dominant cost driver well before you approach this ceiling.
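
    The two-part formula above can be sketched as follows. The $0.08 session-hour rate and the Sonnet 4.6 token rates are from this page. The function names and the example session are hypothetical and only illustrate how runtime and token charges combine.

```python
SESSION_HOUR_RATE = 0.08  # USD per active session-hour, from the text above
MS_PER_HOUR = 3_600_000


def runtime_cost(active_ms: int) -> float:
    """Runtime charge for active session time, metered in milliseconds."""
    return (active_ms / MS_PER_HOUR) * SESSION_HOUR_RATE


def session_cost(active_ms, input_mtok, output_mtok, in_rate, out_rate):
    """Runtime charge plus token charges at the given USD-per-MTok rates."""
    return runtime_cost(active_ms) + input_mtok * in_rate + output_mtok * out_rate


# Ceiling for an agent that is active 24/7 over a 30-day month.
always_on = runtime_cost(24 * 30 * MS_PER_HOUR)
print(f"24/7 runtime ceiling: ${always_on:.2f}")

# Illustrative session: 90 active minutes on Sonnet 4.6, 2 MTok in, 0.3 MTok out.
example = session_cost(90 * 60 * 1000, 2.0, 0.3, 3.00, 15.00)
print(f"Example session: ${example:.2f} (runtime ${runtime_cost(90 * 60 * 1000):.2f})")
```

    In the example session the runtime charge is a small fraction of the total, which is the pattern the paragraph above describes: tokens, not session-hours, drive the bill.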

    Full breakdown: Claude Managed Agents Complete Pricing Reference

    What does web search cost inside a Managed Agents session?

    $10 per 1,000 searches ($0.01 per search), billed separately from session runtime and token costs. Same rate as web search via the standard API.

    What does Dreaming cost?

    Dreaming uses an advisor/executor billing model. The advisor generates a short plan (typically 400–700 tokens) at the advisor model’s rate; the executor handles the full memory reorganization at its rate. Combined cost stays well below running the advisor model end-to-end. Use max_uses to cap advisor calls per request. Dreaming is developer preview — invitation-only access as of May 2026. Docs: platform.claude.com/docs/en/managed-agents/dreams


    Specialty Model Pricing

    What does Claude Mythos Preview cost?

    $25 per million input tokens, $125 per million output tokens. Invitation-only through Project Glasswing — no self-serve access. Contact Anthropic at anthropic.com/glasswing. Claude Mythos is not available through any subscription tier or standard API access.

    Is Claude Security Beta included in my plan?

    Claude Security Beta is available to all Enterprise customers during the beta period — included as part of Enterprise, no separate per-scan fee. Underlying model is Opus 4.7 ($5/$25 per MTok at API rates). For Enterprise pricing including Claude Security, contact Anthropic sales. Standard API users do not have access during beta.

  • Claude Code — Every Question Answered (Complete FAQ 2026)

    Last refreshed: May 15, 2026

    Tygart Media · Claude Code Reference

    Claude Code — Every Question Answered

    Updated May 9, 2026 · Verified against Anthropic docs · Claude Code v2.1.133

    No preamble. If you’re here, you’re trying to install Claude Code, figure out pricing, or understand what changed. Here are the actual answers.

    This page covers installation, pricing by plan, what’s new in 2026, and the questions that don’t have clean homes in Anthropic’s documentation. It is updated as Claude Code ships new versions, which currently means tracking weekly releases.

    Pricing Questions

    How much does Claude Code cost?

    Claude Code has no separate subscription fee. Access is included in these Claude plans:

    Plan | Monthly Cost | Claude Code | Rate Limits
    Free | $0 | ❌ Not included | -
    Pro | $20 | ✅ Included | 5-hr window, doubled May 2026
    Max (5×) | $100 | ✅ Included | 5× Pro limits, no peak throttle
    Max (20×) | $200 | ✅ Included | 20× Pro limits, no peak throttle
    Team Standard | $25/seat | ❌ Not included | -
    Team Premium | $100/seat | ✅ Included | 6.25× Pro limits, doubled May 2026
    Enterprise | Custom | ✅ Included | Custom

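    When you sign in to Claude Code with a subscription, usage draws from your plan’s limits. If you authenticate with an API key instead, the tokens Claude Code consumes are billed at standard API rates, separately from any subscription. For most users, subscription is the dominant cost.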

    Is there a Claude Code student discount or Amazon Prime bundle?

    No. As of May 2026, there is no Claude Code-specific student discount and no Amazon Prime Student bundle that includes Claude Code. Pro at $20/month is the cheapest plan that includes Claude Code access. See the full student discount guide for what legitimate options exist for reducing cost.

    What did the May 2026 SpaceX deal change for Claude Code users?

    May 6, 2026 Update: Peak-hours throttling eliminated for Pro and Max. 5-hour rate limits doubled for Pro, Max, Team Premium, and Enterprise. Free plan unchanged.

    If you’ve been hitting limits during long agentic runs or multi-file refactors, the ceiling is now twice as high. Source: anthropic.com/news/higher-limits-spacex


    Installation Questions

    What are the system requirements for Claude Code?

    • Node.js 18+ required (Node.js 20+ recommended)
    • macOS, Linux, or Windows (Windows support GA as of April 2026 — PowerShell is now the default shell, Git Bash no longer required)
    • Active Anthropic account on a plan that includes Claude Code (Pro, Max, Team Premium, or Enterprise)

    How do I install Claude Code?

    One command:

    npm install -g @anthropic-ai/claude-code

    Then authenticate:

    claude

    Full installation walkthrough with troubleshooting: How to Install Claude Code

    How do I update Claude Code to the latest version?

    npm update -g @anthropic-ai/claude-code

    Current version as of May 9, 2026: v2.1.133 (released May 7, 23:49 UTC). Check your version with claude --version.

    What’s in the latest Claude Code release?

    v2.1.133 (May 7, 2026) key changes:

    • Subagent skill discovery fix — subagents now correctly find project, user, and plugin skills via the Skill tool. Previously a silent failure that broke multi-agent pipelines without obvious error.
    • worktree.baseRef setting (fresh | head) — controls whether EnterWorktree branches from origin/<default> or local HEAD. Default is fresh — this changes prior behavior if you relied on EnterWorktree inheriting unpushed commits.
    • Hooks now receive active effort level via effort.level JSON field and $CLAUDE_EFFORT env var
    • Memory improvement: warm-spare background workers release under memory pressure
    • Fixed parallel sessions hitting 401 from a refresh-token race
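
    For the effort-level change listed above, a hook script might read the value roughly like this. This is a sketch under assumptions: it treats effort.level as a nested object in the JSON payload the hook receives, falls back to the CLAUDE_EFFORT environment variable, and uses "high" and "low" purely as placeholder values. The function name is hypothetical. In a real hook the payload text would come from standard input rather than a sample string.

```python
import json
import os
from typing import Mapping, Optional


def read_effort_level(payload_text: str,
                      env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the effort level from the hook payload, else from CLAUDE_EFFORT."""
    env = os.environ if env is None else env
    try:
        payload = json.loads(payload_text) if payload_text.strip() else {}
    except json.JSONDecodeError:
        payload = {}  # malformed payload: fall through to the env var
    # Assumed shape: {"effort": {"level": "..."}}
    effort = payload.get("effort") if isinstance(payload, dict) else None
    if isinstance(effort, dict) and effort.get("level"):
        return str(effort["level"])
    return env.get("CLAUDE_EFFORT")


sample = '{"effort": {"level": "high"}}'  # placeholder payload for illustration
print(read_effort_level(sample, env={}))
print(read_effort_level("{}", env={"CLAUDE_EFFORT": "low"}))
```

    Preferring the JSON field and falling back to the environment variable keeps the script usable whichever of the two channels a given hook invocation provides.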

    Full release notes: github.com/anthropics/claude-code/releases


    Model Questions

    Which Claude model does Claude Code use?

    By default, Claude Code uses the model Anthropic recommends for coding tasks — currently claude-sonnet-4-6 for most operations, with claude-opus-4-7 available for complex reasoning tasks. The v2.1.126 gateway model picker lets you configure multi-model routing. Current model strings (verified from Anthropic docs):

    • claude-opus-4-7 — most capable, 1M context, 128K max output
    • claude-sonnet-4-6 — balanced speed/intelligence, 1M context, 64K max output
    • claude-haiku-4-5-20251001 — fastest, 200K context

    What happens when Claude Sonnet 4 and Opus 4 retire June 15, 2026?

    If you have any Claude Code configuration or scripts pinning the 20250514 date-string model IDs, those will break. Claude Code’s default model routing will update automatically — but custom configurations pointing to specific deprecated strings won’t. Search your config files for 20250514 now and update to claude-sonnet-4-6 or claude-opus-4-7.
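
    One way to run that search is a small script that walks a project and reports every line containing the date string. This is a minimal sketch: the function name and the list of file extensions are assumptions, so adjust them to wherever your own configuration lives.

```python
from pathlib import Path

DEPRECATED_MARKER = "20250514"  # date suffix named in the text above
# Assumed set of file types worth scanning; extend as needed.
CONFIG_SUFFIXES = (".json", ".yaml", ".yml", ".toml", ".sh", ".py")


def find_pinned_models(root, marker=DEPRECATED_MARKER, suffixes=CONFIG_SUFFIXES):
    """Return (path, line number, line) for every line containing marker."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the scan
        for lineno, line in enumerate(text.splitlines(), start=1):
            if marker in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    for file_path, lineno, line in find_pinned_models("."):
        print(f"{file_path}:{lineno}: {line}")
```

    The script only reports matches. Replacing each hit with claude-sonnet-4-6 or claude-opus-4-7 is left as a manual step so that nothing is rewritten without review.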


    Capability Questions

    What is Claude Code actually good at vs. not good at?

    Strong: Multi-file refactors, understanding existing codebases, writing tests against real code, debugging with full context, long-horizon tasks that require holding many files in mind simultaneously, architectural reasoning across a full project.

    Less strong: Tasks requiring real-time external data without a tool, highly specialized domain knowledge that isn’t well-represented in training, generating correct code for very niche frameworks with limited documentation.

    Can Claude Code run terminal commands on my machine?

    Yes — with your permission. Claude Code operates in a permission model where it asks before running commands, editing files, or taking actions outside the current working directory. You configure which operations auto-approve and which require confirmation. The claude CLI runs with your local user permissions, not elevated ones.

    What is computer use in Claude Code?

    Computer use (research preview as of April 2026) lets Claude Code open native apps, navigate desktop UI, click through interfaces, and verify results from the terminal — without needing an API or automation script. Available on macOS and Windows within the Cowork desktop app. Useful for tools with no accessible API; slower than direct API integrations when those exist.

    What’s the difference between Claude Code CLI and Claude Code in the IDE?

    The CLI (claude command) is the core product — works in any terminal, any OS, any project. IDE extensions (VS Code, JetBrains) provide UI integration on top of the same underlying capability. Both use the same authentication and the same model. The CLI is the authoritative version for anything involving automation, scripts, or multi-step agentic workflows.

  • Snowflake’s $200M Claude Partnership and India’s Glasswing Gap: Two Enterprise Stories That Matter

    Last refreshed: May 15, 2026

    Two partnership and policy stories from the Anthropic desk that haven’t been covered here yet, both with meaningful implications for how Claude reaches enterprise users and how governments are thinking about AI security risk.

    Part 1: Snowflake’s $200M Partnership — 12,600 Enterprise Customers as Distribution

    In December 2025, Anthropic and Snowflake announced a multi-year, $200M partnership making Claude models available to Snowflake’s 12,600+ enterprise customers across all three major clouds. The partnership makes Claude the AI layer inside Snowflake’s data platform for a client base concentrated in financial services, healthcare, and life sciences — the three regulated verticals where Anthropic has been most deliberately building.

    The specific products:

    • Snowflake Intelligence — powered by Claude Sonnet 4.6, providing conversational data analysis directly within the Snowflake environment
    • Snowflake Cortex AI Functions — supporting Claude Opus 4.5 and newer models for structured AI functions across the Snowflake data warehouse

    Source: anthropic.com/news/snowflake-anthropic-expanded-partnership

    The number that matters most here isn’t $200M — it’s 12,600. That’s the customer count Snowflake brings as a distribution channel. These are enterprise organizations that have already made a procurement decision to standardize on Snowflake for data infrastructure. Embedding Claude inside that infrastructure means Claude becomes the AI system those organizations reach for when they need to query, analyze, or reason about their own data — without requiring a separate AI platform procurement decision.

    This is the distribution model that makes enterprise AI market share move: not direct sales to 12,600 enterprises, but a single partnership that makes Claude the default AI layer inside infrastructure those enterprises already use. Snowflake customers in financial services can run Claude-powered compliance analysis on their own Snowflake data. Healthcare organizations can run Claude-powered analysis on patient data that stays within their existing Snowflake security perimeter.

    The regulated-industry focus is deliberate. Financial services, healthcare, and life sciences are the verticals where data governance requirements are strictest — and where the ability to run AI on your own data, within your own security perimeter, without moving that data to an external AI service, is the deciding factor in procurement. Snowflake’s existing data residency and compliance infrastructure makes that possible in a way that a direct Anthropic API call often doesn’t.

    Part 2: India’s RBI Warning + The Glasswing Gap

    In late April 2026, India’s Finance Ministry and Reserve Bank of India convened meetings on cybersecurity preparedness specifically referencing Claude Mythos risk. Finance Minister Nirmala Sitharaman met with bank executives at North Block to advise pre-emptive hardening. The RBI began consulting with global regulators. CERT-In, major telcos, and fintechs ran parallel risk assessments.

    Source: Business Standard, April 27, 2026 — business-standard.com

    The structural issue underneath the news: Project Glasswing — Anthropic’s defensive cybersecurity consortium that provides early access to Mythos for defensive purposes — named the following founding partners: AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, Microsoft, and Nvidia. Zero Indian firms. India is Anthropic’s second-largest market globally. Its government is actively warning its financial sector about Mythos risk. And no Indian organization is in the defender consortium that gets early access to the model and the defensive research that goes with it.

    This is not a small gap. The Mozilla Firefox result (271 vulnerabilities in a month, including 20-year-old bugs) demonstrated what Mythos can do in a real production codebase. If that capability is available to offensive actors — or if non-partner organizations don’t have the same early visibility into what Mythos can find — organizations outside the Glasswing partner network are in a different risk position than those inside it.

    The Tension This Creates

    Anthropic’s distribution into India is accelerating. Cognizant deployed Claude across 350,000 employees. Razorpay built its Agent Studio on the Claude Agent SDK and wired UPI rails through Claude as an authorized payment agent with NPCI. Air India, CRED, and Swiggy are named enterprise customers. India is Anthropic’s second-largest market.

    Meanwhile, India’s government is warning its financial sector about the offensive potential of Claude Mythos, no Indian firm is in the Glasswing defender consortium, and INR-denominated pricing (with 18% GST) puts the effective Pro subscription at approximately ₹2,240/month for Indian users, a meaningful friction point in what Anthropic describes as its #2 global market.

    The distribution is running faster than the partnership infrastructure is opening. Either Project Glasswing expands to include Indian financial institutions and cybersecurity organizations, or India builds its own parallel defensive capacity, or the gap becomes a structural political fact in Anthropic’s India relationship.

    India’s government isn’t opposed to Claude. It’s actively adopting it across both the public and private sectors. The RBI/Finance Ministry meetings were framed as hardening preparation, not restriction. But the asymmetry (India as a top-2 market, zero Indian firms in the defender consortium) is conspicuous enough that it will eventually require a response.

    Frequently Asked Questions

    What does the Snowflake-Anthropic partnership include?

    A multi-year, $200M agreement announced December 2025, making Claude models available to Snowflake’s 12,600+ enterprise customers. Snowflake Intelligence launched powered by Claude Sonnet 4.6 for conversational data analysis (model at time of partnership announcement; verify current model with Snowflake). Snowflake Cortex AI Functions supports Opus 4.5 and newer models. The focus is regulated industries: financial services, healthcare, and life sciences.

    What is Project Glasswing?

    Project Glasswing is Anthropic’s defensive cybersecurity program that provides early access to Claude Mythos Preview for organizations working to defend critical infrastructure. Named founding partners include AWS, Apple, Cisco, CrowdStrike, Google, JPMorgan Chase, Microsoft, and Nvidia. Access is invitation-only with no self-serve sign-up. No Indian organizations are currently named as Glasswing partners.

    Why is India’s government warning about Claude Mythos if India is Anthropic’s second-largest market?

    The Indian government’s meetings (RBI, Finance Ministry, CERT-In) were framed as defensive preparation, not restriction. The concern is that Mythos-tier capability could be used offensively against Indian financial infrastructure — a legitimate risk that applies regardless of Anthropic’s commercial relationship with India. The tension is that organizations inside Project Glasswing get early access to defensive research while India’s financial sector, with no Glasswing presence, does not.