Tag: Claude API

  • Claude at Scale: Every Usage Limit, Context Window, and File Size Cap (May 2026)

    Claude at Scale: Every Usage Limit, Context Window, and File Size Cap (May 2026)

    Last refreshed: May 15, 2026

    Claude usage limits at scale - context window, file size, team seats, extra usage
    Once you stop asking what Claude is and start asking how to use it at scale, the limits become the conversation.

    Once you stop asking “what is Claude” and start asking “how do I use Claude at scale,” you run into a different category of question. How big is the context window, actually, in this specific situation? What’s the file upload limit? What happens when one teammate burns through the Team plan? Where does the 1M context window apply and where doesn’t it? When does extra usage kick in and what does it cost?

    The answers exist — they’re just spread across a dozen Anthropic Help Center articles, and the wrong combination of guesses can make you think you’ve hit a hard limit when you’ve actually just hit the wrong setting. This article is the consolidated map. Triple-sourced against Anthropic’s official documentation, verified May 15, 2026.

    The four limits that matter most

    If you’re running Claude in any sustained capacity, four limits will define your experience. Get these right and you have headroom. Get them wrong and you’ll think Claude is broken when it’s actually working as designed.

    1. Context window — how much Claude can read in a single conversation. Varies by model and surface. The 1M window is real but only available in specific places.

    2. File upload size — how big a single file can be. 30 MB cap per file across the board, with workarounds for larger files.

    3. Usage limits — how much Claude work you can do per session/week. Per-user, not pooled. Different limits for chat vs Claude Code vs Agent SDK.

    4. Extra usage / overage — what happens when you hit the cap. Either you’ve enabled it and you keep going at API rates, or you’re stopped until the limit resets.

    Context window: where 1M tokens actually applies

    Per Anthropic’s Help Center documentation (verified May 15, 2026), context window size depends on the model AND on the surface you’re using Claude through. This is the single most-misunderstood limit because the same model can have a different context window in chat than it does in Claude Code or the API.

    Web and desktop chat (claude.ai):

    • Opus 4.7, Opus 4.6, Sonnet 4.6 — 500K tokens on all paid plans
    • All other models — 200K tokens on paid plans

    Claude Code:

    • Opus 4.7 — 1M tokens on Pro, Max, Team, and Enterprise
    • Sonnet 4.6 — 1M tokens on all paid plans, but extra usage must be enabled to access it (except on usage-based Enterprise plans)

    Claude API:

    • Opus 4.7, Opus 4.6, Sonnet 4.6 — 1M tokens at standard pricing (no long-context premium)
    • All other models — 200K tokens

    The practical translation: if you need the full 1M token window, use Claude Code or the API with one of the supported models. The web chat tops out at 500K even on the most capable models. That difference matters when you’re trying to feed Claude an entire codebase, a long video transcript, or a multi-document research bundle.

    File upload size: 30 MB per file, with workarounds

    Per Anthropic’s Help Center, the maximum file size for both uploads and downloads is 30 MB per file. This applies whether you’re uploading a PDF, a CSV, an image, or any other supported file type.

    For PDFs larger than 30 MB, Anthropic’s documentation notes that Claude can process them through its computing environment without loading them into the context window. That’s a real workaround for big PDFs but it doesn’t help you for other large file types.

    If you regularly hit the 30 MB cap, the practical patterns are:

    • Split before upload — break the file into chunks under 30 MB, upload each, work with them as separate sources
    • Convert format — a 35 MB Word doc with embedded images may compress to under 30 MB as a PDF; CSVs can often be reduced by removing unused columns
    • Upload to GCS or S3 and let Claude read via tools — for the Agent SDK / API path, you can put the file in cloud storage and have Claude read it via web fetch or a custom tool, bypassing the upload cap entirely

    Usage limits: per-user, not pooled

    This is the limit that confuses teams the most. Per Anthropic’s Help Center documentation on the Team plan (verified May 15, 2026): each team member has their own set of usage limits. They are not shared across the team.

    If one teammate burns through their session limit, the rest of the team is unaffected. There is no pooled team allowance that one user can drain on behalf of others. The math is per-seat, always.

    The usage limits themselves vary by seat type:

    • Standard Team seats — 1.25x more usage per session than Pro plan. One weekly usage limit applies across all models. Resets seven days after the session starts.
    • Premium Team seats — 6.25x more usage per session than Pro plan. Two weekly limits: one across all models, plus a separate one for Sonnet models specifically. Both reset seven days after session start.

    For the actual numeric token-per-session limits, Anthropic does not publish exact numbers — they describe relative multipliers vs Pro. This is intentional; the underlying math is calibrated against typical workloads rather than a hard token ceiling.

    Extra usage: what happens when you hit the cap

    When a user hits their weekly limit, two things can happen depending on whether the organization has enabled extra usage:

    If extra usage is enabled: additional Claude requests continue to flow at standard API rates (the same per-token pricing published on Anthropic’s pricing docs — $5/$25 MTok for Opus 4.7, $3/$15 for Sonnet 4.6, $1/$5 for Haiku 4.5). Extra usage is billed separately from the subscription. Team and Enterprise admins can enable, cap, and monitor extra usage at the organization level.

    If extra usage is not enabled: the user’s Claude requests stop until their limit resets at the start of the next session window (seven days from when the current session started, not a fixed weekly day).

    The right setting depends on your team’s tolerance for surprise bills versus interrupted workflows. Most production teams enable extra usage with a hard organizational cap so individual users have continuity but the org has predictable spend ceiling.

    Claude Code limits: a separate model

    Claude Code has its own usage limit accounting that exists alongside chat usage limits. Per Anthropic’s Help Center on Claude Code models, usage, and limits (verified May 15, 2026):

    • Interactive Claude Code (typing in terminal/IDE) draws from your subscription’s usage limits, the same pool as web chat
    • Non-interactive claude -p mode currently also draws from subscription usage limits — until June 15, 2026
    • Starting June 15, 2026, non-interactive mode and Agent SDK usage move to a separate per-user monthly Agent SDK credit pool

    The June 15 change is important enough that it gets its own breakdown in our Agent SDK Dual-Bucket Billing article. The short version: if you’re running unattended Claude Code work in cron jobs or CI, your billing model is changing. Plan capacity against the new credit pool.

    The limits that aren’t really limits

    Three things that get reported as limits but are actually configuration choices:

    “My context window keeps filling up.” This is usually caused by long-running conversations accumulating history rather than the model’s actual context window being too small. Starting a new conversation (or running /clear in Claude Code) resets the working context. Long sessions are not a hard limit; they are a working-memory pressure that compounds over turns.

    “Claude won’t read my whole repository.” Repository size is rarely the actual limit; the limit is how much you can load into the context window at once. Tools like Claude Code’s file reading and search work around this by loading files on demand rather than upfront. The 1M context window helps but is not a substitute for selective loading.

    “My team keeps hitting limits even though we’re on Team.” Almost always one of two things: (a) people are mistakenly assuming the seat allowance is shared, when it’s strictly per-user; (b) someone is running heavy automation through a subscription seat instead of a Claude Developer Platform API key (which is the recommended path for sustained team-wide automation, especially after June 15).

    Decision matrix: which limits affect which use case

    Map your use case to the limits that actually apply:

    • Solo chat user on Pro — 500K context on Opus 4.7/4.6/Sonnet 4.6 in chat, weekly session limit, 30 MB upload cap. Hit your limit and you wait or pay extra usage.
    • Solo developer using Claude Code — 1M context on Opus 4.7 (1M on Sonnet 4.6 with extra usage on). Same weekly session limit. June 15 billing change applies if you use claude -p.
    • Small team on Team Standard — Per-seat limits at 1.25x Pro session capacity, not pooled. 30 MB upload cap. June 15 billing change applies per-seat.
    • Team running Claude Code in CI — All of the above plus separate Agent SDK credit pool starting June 15. Strongly consider a Developer Platform API key for the CI workload to get true pay-as-you-go billing.
    • Enterprise running large-scale automation — Subscription limits are the wrong tool. Move to a Developer Platform API key, monitor usage at the org level, set spend caps in the Console.

    What to actually do this week

    1. Identify which surface you’re using Claude through (web, Claude Code, API). Different surfaces have different context windows even for the same model.
    2. If you’re hitting “limit” errors, check whether extra usage is enabled at the organization level before assuming it’s a hard cap.
    3. If you’re a Team admin and your team is reporting hitting limits, audit per-seat usage rather than assuming you need to upgrade the plan — the issue is often one heavy user, not the plan tier.
    4. If anyone on your team is running unattended Claude work, read the Agent SDK billing change before June 15.
    5. If you need the full 1M context window, switch to Claude Code or the API. Web chat tops out at 500K.
    6. For uploads larger than 30 MB, split, compress, or move the file to cloud storage and have Claude read it via tools.

    Frequently Asked Questions

    Is the Claude Team plan usage limit shared across team members?

    No. Per Anthropic’s Help Center documentation, each team member has their own set of usage limits. If one team member reaches their seat’s included limit, other team members are unaffected and can keep working.

    What is Claude’s file upload size limit?

    30 MB per file for both uploads and downloads, per Anthropic’s official documentation. For PDFs larger than 30 MB, Claude can process them through its computing environment without loading them into the context window.

    Where does the 1M token context window actually apply?

    1M context is available on Claude Code with Opus 4.7 (Pro/Max/Team/Enterprise) and on the API with Opus 4.7, Opus 4.6, and Sonnet 4.6. Web chat tops out at 500K tokens even on the most capable models. Sonnet 4.6 in Claude Code requires extra usage to be enabled to access the 1M window (except on usage-based Enterprise plans).

    What’s the difference between Standard and Premium Team seats?

    Standard seats offer 1.25x Pro plan usage per session with one weekly limit across all models. Premium seats offer 6.25x Pro session usage with two weekly limits (one across all models, one Sonnet-specific). Both reset seven days after the session starts.

    What happens when I hit my Claude usage limit?

    If extra usage is enabled at your organization, you continue at standard API rates billed separately. If extra usage is not enabled, your requests stop until your limit resets at the next session window (seven days from session start, not a fixed weekly day).

    Should I use a Team plan or the API for production automation?

    For sustained shared automation (CI pipelines, cron jobs, background services), Anthropic recommends the Claude Developer Platform with an API key over subscription seats. Subscription seats are sized for individual interactive use; API keys give you predictable pay-as-you-go billing, no per-seat caps, and don’t compete with team members’ interactive usage.

    Related Reading

    How we sourced this

    Sources reviewed May 15, 2026:

    • Anthropic Help Center: Understanding usage and length limits, What is the Team plan?, How is my Team plan bill calculated?, Manage extra usage for Team and seat-based Enterprise plans, Models, usage, and limits in Claude Code, How large is the context window on paid Claude plans?, How large is the Claude API’s context window?, Upload files to Claude (primary sources for all limit specifics)
    • Anthropic platform documentation: Context windows at docs.claude.com (primary source for API context window behavior)
    • Anthropic Help Center: Use the Claude Agent SDK with your Claude plan (primary source for the June 15, 2026 billing change)

    All limit numbers and policies are accurate as of May 15, 2026. Anthropic adjusts subscription mechanics regularly; if you’re making procurement decisions on this article more than 60 days from the date stamp, re-verify the per-seat multipliers and context window availability against the current Help Center.

  • Notion Developer Platform Launch (May 13, 2026): What Changes for Claude Users and the Three-Legged Stack

    Notion Developer Platform Launch (May 13, 2026): What Changes for Claude Users and the Three-Legged Stack

    Last refreshed: May 15, 2026

    The Three-Legged Stack: Claude, Notion, GCP - walnut stool with copper, porcelain, and steel legs representing the Tygart Media AI operating stack
    Notion’s May 13 Developer Platform launch reshapes how the Notion + Claude + GCP stack fits together.

    On May 13, 2026, Notion shipped what is, structurally, the biggest change to how Notion fits into an AI-driven operating stack since the original Notion AI launch. Version 3.5 — the Notion Developer Platform — turns Notion from a workspace that you operate into a platform that other agents can operate inside. Claude is one of the launch partners.

    This article is written from inside the practice of running a business on a three-legged stool of Notion, Claude, and Google Cloud. The Developer Platform launch matters to that stool in specific ways, and most of the day-one coverage is missing them. The goal here is to pin down what shipped, what it actually changes for an operator who already runs Claude against Notion, and where the seams are between this platform and the way most of us were already wiring things together.

    What Notion actually shipped on May 13, 2026

    From Notion’s own release notes for version 3.5 (verified May 15, 2026), the Developer Platform comprises four meaningfully distinct pieces:

    Workers. A cloud-based runtime that runs custom code inside Notion’s infrastructure. Workers is how you take a Notion-resident workflow and bind real compute to it — running on a schedule, reacting to a database trigger, fanning work out to other systems — without standing up your own infrastructure for the runtime.

    Database sync. Notion databases can now pull live data from any API-enabled external source. The thing that used to require a Zapier or Make.com bridge becomes a property of the database itself.

    External Agents API. The piece that matters most for Claude users: an outside AI agent can appear and operate inside the Notion workspace as a first-class collaborator. Claude is one of the launch partners, alongside Cursor, Codex, and Decagon.

    Notion CLI. A command-line tool through which both developers and agents interact with the platform. Available across all plan tiers.

    The packaging detail worth noting: Workers and the Developer Platform deployment surface are limited to Business and Enterprise plans, but the External Agents API and the CLI are available on all tiers. The whole platform is free to use through August 11, 2026.

    The shift in framing, in operator terms

    Before May 13, the standard pattern for getting Claude to work with Notion looked like this: install a Notion MCP server, point Claude Code at it, and use Claude as the active driver that reads from and writes to Notion through tool calls. Notion was the database, Claude was the agent, MCP was the wire.

    After May 13, the relationship can flip. The External Agents API lets Claude appear inside Notion — not as an external tool you switch to, but as a collaborator your team can assign work to from the same task board where you assign work to humans. The wire is no longer “Claude reaches into Notion when called.” It’s “Notion can hand work to Claude the same way it can hand work to a person.”

    For an operator running a second-brain architecture, that’s a meaningful change. It moves Claude from a tool you invoke into a participant your system operates against. Both modes are still available — MCP wiring still works fine — but the External Agents API opens a different set of patterns where the system of record stays in Notion and Claude becomes one of several agents that the system orchestrates.

    Where this fits in the three-legged stool

    For anyone running Notion + Claude + Google Cloud as the operating stack of a small business or solo operator setup, the Developer Platform launch reinforces something the architecture was already pointing at: Notion is the system of record, Claude is the reasoning layer, GCP is the compute and data substrate. The May 13 launch makes that division of labor more legible.

    • Notion as system of record — Workers and database sync make Notion an active control plane, not just a passive document store. State lives here. Workflows initiate here.
    • Claude as reasoning layer — The External Agents API gives Claude a formal role inside Notion’s task management, planning, and review loops. Claude does the thinking; Notion holds the result.
    • GCP as compute substrate — Anything Workers can’t do (long-running automation, heavy compute, custom data pipelines, things that need to live behind a firewall), Cloud Run and Compute Engine still handle. Workers doesn’t replace GCP for the operations that need real horsepower; it extends Notion into the lightweight automation gap that previously required a Zapier-class bridge.

    The leg that grows the most from this launch is Notion. It picks up native automation and native AI-agent orchestration in one shipment. The leg that doesn’t change is Google Cloud — GCP is still where the heavyweight workloads live, the per-site WordPress fortresses run, and the custom Python and Node services that hold the operational glue together.

    The Claude-specific implications

    Anthropic’s customer page on Notion (verified May 15, 2026) confirms that Notion has integrated Claude Managed Agents — the version of Claude designed for long-running sessions with persistent memory and high-quality multi-turn outputs. Notion has also made Claude Opus available inside Notion Agent for the first time as part of the broader integration. The framing from Anthropic’s side: Notion is a design partner that helped shape Claude Code’s early development, and the External Agents API is the formal extension of that partnership into the Notion product surface.

    Practically, three things change for someone who already runs Claude against Notion:

    1. The MCP wiring is no longer the only path. If you’ve been using a Notion MCP server to give Claude Code read-write access to your workspace, that pattern still works and still has its place — particularly for developer workflows where Claude is doing the driving. But for operational workflows where Notion should drive and Claude should respond, the External Agents API is now the more natural fit.

    2. Multi-agent orchestration becomes a first-class concept. When Notion can address Claude, Cursor, Codex, and Decagon as discrete agents, the question stops being “which AI tool do I use” and becomes “which agent gets which task.” That’s a richer surface for actually distributing work across capabilities — Cursor for IDE-bound coding, Claude for long-form reasoning and writing, Decagon for customer-facing workflows. The orchestration sits in Notion.

    3. The persistent-memory pattern gets cleaner. The “Notion as Claude’s memory” architecture that we and others have been building with MCP wiring is now a supported, native pattern rather than a clever workaround. The structured pages, databases, and templates that hold what Claude needs to remember between sessions can now be addressed through a sanctioned API rather than reverse-engineered through tool calls.

    What we’d actually rebuild now

    If we were starting our second-brain architecture from scratch on May 15, 2026, knowing what shipped on May 13, the build order would be different than what we have today:

    • Database structures stay in Notion — same as before. The systems of record (clients, projects, content pipelines, scheduled tasks, the Promotion Ledger) all live in Notion databases.
    • Sync replaces a meaningful chunk of Zapier/Make — anywhere we currently bridge Notion to an external API for read or for write, native database sync becomes the first thing to try before reaching for a third-party automation tool.
    • Workers handles light recurring automation — the kind of thing we currently run as a Cloud Run cron job, where the trigger and state both live in Notion. Workers is closer to the data and easier to reason about for operators who don’t want to context-switch out of Notion.
    • External Agents API for Claude orchestration — Claude assignments come from inside Notion’s task surfaces. The Promotion Ledger, the editorial calendar, the client deliverable boards all become places where Claude can be assigned the same way a teammate is assigned.
    • GCP holds everything that’s heavyweight or sensitive — WordPress fortresses, custom data pipelines, anything HIPAA/regulated, the AI Media Architect run on Cloud Run, the knowledge-cluster-vm. None of this moves. Notion’s platform doesn’t compete here.

    The honest part: most of our existing infrastructure is staying. The Developer Platform launch isn’t a “rebuild everything” moment. It’s a “reach for Notion-native first when the workflow naturally lives in Notion anyway” moment. Where we used to glue together MCP servers, Zapier flows, and custom Cloud Run jobs to bridge gaps, the gaps are smaller now.

    The seams worth noticing

    Three things to be honest about:

    Workers is plan-gated. If you’re on a Notion plan below Business, you can use the External Agents API and the CLI but not Workers. The full programmable-platform vision requires the upgrade. For solo operators on Plus or below, this is a real friction point.

    The free-through-August window is a usage signal, not a permanent state. The Developer Platform is free through August 11, 2026. Notion has not yet published post-window pricing. Anyone building production workloads against the platform should plan for the possibility of a usage-based or tier-gated pricing model after that date.

    External Agents is a launch-partner-first model. Claude Code, Cursor, Codex, and Decagon are first-class. Other agents — and there will be other agents — show up later through the API. If your stack depends on an agent that isn’t on the launch partner list, the surface for integrating it is smaller right now than it will be in a few months.

    What to actually do this week

    If you’re running Claude against Notion in any operational capacity:

    1. Read Notion’s official release notes for 3.5 (notion.com/releases/2026-05-13). It’s short and concrete.
    2. Try the Notion CLI on a non-production workspace. The CLI is the lowest-friction way to feel what’s actually changed.
    3. If you have any workflow currently glued together with Zapier/Make where the trigger and state are both in Notion, evaluate whether database sync or Workers replaces it more cleanly.
    4. If you currently invoke Claude through MCP for tasks that would more naturally be assigned to Claude from inside Notion’s task boards, prototype the same workflow through the External Agents API and compare.
    5. Don’t migrate anything you don’t have to. The May 13 launch creates new options, not new mandates.

    Frequently Asked Questions

    What is the Notion Developer Platform?

    It’s the May 13, 2026 release (Notion 3.5) that adds Workers (cloud-based runtime), database sync (live data from external APIs), an External Agents API (outside AI agents operating natively inside Notion), and a Notion CLI. It turns Notion from an application you use into a platform you and your agents can build on.

    Is Claude one of the launch partners?

    Yes. Per Notion’s release notes and Anthropic’s customer page on Notion (both verified May 15, 2026), Claude is a launch partner alongside Cursor, Codex, and Decagon. Notion has also integrated Claude Managed Agents and made Claude Opus available inside Notion Agent.

    How is the External Agents API different from connecting Claude through MCP?

    MCP wiring lets Claude reach into Notion through tool calls — Claude is the driver, Notion is the data source. The External Agents API lets Claude appear inside Notion as a collaborator that can be assigned work — Notion is the driver, Claude is one of several agents responding. Both patterns coexist. Pick the one that matches who should be in charge of the workflow.

    What does the Developer Platform cost?

    Free through August 11, 2026, per Notion’s release notes. Workers and the deploy surface are limited to Business and Enterprise plans; the External Agents API and CLI are available across all tiers. Post-window pricing has not been published as of May 15, 2026.

    Does this replace MCP servers?

    No. MCP servers remain useful — particularly for developer workflows where Claude is doing the driving from inside Claude Code, and for cases where you need Claude to talk to multiple systems (not just Notion). The External Agents API adds an alternative pattern for the cases where Notion should hold the workflow and Claude should respond to it.

    Should I move workloads off Google Cloud onto Notion Workers?

    For most things, no. Workers is suited to lightweight, Notion-native automation. Heavy compute, regulated workloads, custom data pipelines, and anything that needs to live behind a firewall still belong on GCP (or your equivalent cloud). The Developer Platform extends what Notion can do natively; it doesn’t replace what cloud infrastructure does.

    Related Reading

    How we sourced this

    Sources reviewed May 15, 2026:

    • Notion official release notes: May 13, 2026 – 3.5: Notion Developer Platform at notion.com/releases/2026-05-13 (primary source for what shipped, pricing window, plan-tier gating)
    • Anthropic customer page on Notion at claude.com/customers/notion (primary source for Claude Managed Agents integration, Opus availability in Notion Agent, design-partner relationship)
    • TechCrunch coverage of the May 13 launch (Tier 2 confirming source for partner agent list and “control room for AI agents” framing)
    • InfoWorld coverage of the Notion Developer Platform launch (Tier 2 confirming source)
    • BetaNews and Dataconomy coverage (additional Tier 2 confirming sources)

    This article will need a refresh after August 11, 2026, when the free-pricing window ends and Notion publishes post-window pricing details. The verified-vs-reported standard from our other May 2026 pieces applies — anything beyond what Notion’s own release notes and Anthropic’s customer page confirm has been clearly distinguished.

  • Claude Models Roadmap May 2026: Opus 4.7, Knowledge Cutoffs, the 1M Context Window, and What’s Real About Claude 5

    Claude Models Roadmap May 2026: Opus 4.7, Knowledge Cutoffs, the 1M Context Window, and What’s Real About Claude 5

    Last refreshed: May 15, 2026

    The pace of new Claude releases in 2026 has been fast enough that the canonical question — “what’s the latest Claude model and what’s it actually good for?” — has a different answer almost every quarter. This article is the current map, dated and sourced, of what Anthropic has shipped in 2026, what’s confirmed about each model’s specs and knowledge cutoffs, and what’s been claimed (but not officially confirmed by Anthropic) about what’s coming next.

    Two ground rules first, because the model-roadmap space is full of speculation:

    • Specs and release dates marked as verified come from Anthropic’s own documentation, news posts, or help center pages. We list the specific source.
    • Anything marked as reported or claimed comes from third-party reporting (TechCrunch, secondary news sites, analyst commentary) that we could not independently confirm against an Anthropic-published source as of May 15, 2026.

    If you’re making product decisions on this information, treat verified facts as actionable and reported facts as directional.

    The current generally-available Claude models (May 15, 2026)

    From Anthropic’s official models overview and pricing pages, the current production Claude lineup is:

    Claude Opus 4.7claude-opus-4-7

    • Status: Generally available, currently the most capable Claude model
    • Context window: 1 million tokens at standard pricing (no long-context premium)
    • Max output: 128,000 tokens
    • Knowledge cutoff: January 2026 (per Anthropic Help Center, verified May 15, 2026)
    • Pricing: $5/MTok input, $25/MTok output (base rates)
    • Notable changes from 4.6: New tokenizer (uses up to ~35% more tokens for the same text), high-resolution image support up to 2576px / 3.75MP, new xhigh effort level, task budgets beta. Extended thinking budgets and sampling parameters (temperature, top_p, top_k) are removed.

    Claude Opus 4.6 — Still generally available, $5/MTok input, $25/MTok output. Released February 2026.

    Claude Sonnet 4.6 — $3/MTok input, $15/MTok output. Includes the 1M token context window at standard pricing.

    Claude Haiku 4.5 — Cheapest model in the active lineup at $1/MTok input, $5/MTok output.

    Earlier models still active or in deprecation: Opus 4.5, Opus 4.1, Sonnet 4.5, and Haiku 3.5 (retired except on Bedrock and Vertex AI). Opus 4 and Sonnet 4 are listed as deprecated.

    Knowledge cutoff dates that actually matter

    Per Anthropic’s Help Center article on training-data recency (verified May 15, 2026), the most recent generally-available models have January 2026 knowledge cutoffs. That means:

    • Anything that happened after January 2026 is outside the model’s training data
    • For current events, recent product launches, recent legal or regulatory changes, or very recent technical documentation, the model needs to be given the information directly (in the prompt, via web search, or through tool use) — it can’t be relied on to know it
    • The model still has tools available (web search, code execution, file access) that can access post-cutoff information when explicitly invoked

    The practical version: don’t ask Claude what happened last week and expect it to know. Hand it the source material and ask it to analyze, summarize, or work with what you’ve given it.

    The 1M token context window — what it actually unlocks

    Per Anthropic’s official pricing documentation (verified May 15, 2026), Opus 4.7, Opus 4.6, and Sonnet 4.6 all include the full 1 million token context window at standard pricing. There’s no long-context premium — a 900,000-token request is billed at the same per-token rate as a 9,000-token request.

    That’s an enormous practical change from earlier Claude generations. A 1M context window is roughly:

    • ~750,000 words of English text
    • Most full books or technical specifications in a single context
    • ~8 hours of meeting transcripts at typical density
    • An entire mid-sized codebase, including most or all source files

    Prompt caching and batch processing discounts both apply at standard rates across the full 1M window. For workloads that involve sending the same large document repeatedly with different questions, prompt caching against a 1M context is one of the highest-leverage cost optimizations available in the current Claude lineup.

    What’s reported about Claude 5 (and what we cannot independently verify)

    Multiple third-party sources reported in early 2026 that Anthropic CEO Dario Amodei confirmed a Q2 2026 launch window for Claude 5 in a TechCrunch interview published February 1, 2026. The same sources cited an internal-roadmap leak suggesting an April 28 target date.

    What we can verify as of May 15, 2026:

    • Anthropic’s official model lineup, news page, and platform documentation list the latest production models as Opus 4.7 and earlier 4.x variants. Anthropic has not, to our review, published an official “Claude 5” launch announcement on its anthropic.com news page or its docs.claude.com release notes as of this date.
    • The third-party reporting on Claude 5 specifications (500K context window, 20-25% benchmark improvements, ~90%+ on SWE-bench Verified) is widely repeated but, as far as we could verify, is not sourced to an Anthropic-published document.

    The honest read: Q2 2026 ends June 30, so if the reported timeline is accurate, an official Claude 5 announcement could plausibly land in the next several weeks. If you’re planning a project that depends on a specific Claude 5 capability, build against current Opus 4.7 first and treat any Claude 5-specific work as speculative until Anthropic publishes official model details.

    Claude Sonnet 5 — separate question

    Some 2026 third-party reporting refers to “Claude Sonnet 5” launching in early February 2026 under an internal codename. We could not, in our May 15, 2026 review, find this model listed in Anthropic’s official models overview, pricing page, or release notes — only Sonnet 4.6 and earlier Sonnet variants are listed as currently available models. If “Sonnet 5” was a real intermediate release, it does not appear in Anthropic’s current public model documentation under that name.

    Two possibilities to consider, neither of which we can confirm: the reported Sonnet 5 may have been folded into the broader 4.x lineup under a different name, or the reporting may have been speculative or premature. If you’re tracking model identifiers for production use, only model IDs published in Anthropic’s documentation (such as claude-opus-4-7, claude-sonnet-4-6, claude-haiku-4-5) are guaranteed to be valid against the API.

    How to actually keep up with Claude releases

    The signal-to-noise ratio in the model-release coverage space is not great. Two practical sources are reliable enough to bookmark:

    • Anthropic’s news page at anthropic.com/news — first-party launch announcements with full model details
    • Claude API release notes at the Help Center release-notes page — concise, dated, version-specific

    For breaking changes that affect production code, the Anthropic platform documentation publishes per-version “What’s new” pages (Opus 4.7’s, for example, lists every API breaking change at launch). Those are the canonical reference for migration work.

    For everything else — analyst commentary, predictions, leak coverage — treat it as commentary, not as fact.

    What this means for your work today

    Based on what is verifiable on May 15, 2026:

    • If you need the most capable Claude model available, use Opus 4.7. It has the largest context window, the highest knowledge cutoff (January 2026), and the strongest reported coding/agentic performance.
    • If you need cost-efficient production work, use Sonnet 4.6. Same 1M context, much lower per-token rates than Opus.
    • If you need cheap, fast, simple-task workloads, use Haiku 4.5.
    • If you’re planning around Claude 5, treat the timing as unconfirmed and build resilience into your code (don’t hard-code model IDs that don’t exist yet).
    • For knowledge cutoff-sensitive use cases (current events, recent regulatory data, post-January 2026 news), always provide the information directly or use tool calls — don’t rely on training data alone.

    Frequently Asked Questions

    What is the knowledge cutoff for Claude Opus 4.7?

    January 2026, per Anthropic’s Help Center documentation verified May 15, 2026. Information about events, products, or developments after that date is not in the model’s training data and must be provided directly.

    What is the largest Claude context window currently available?

    1 million tokens, available on Opus 4.7, Opus 4.6, and Sonnet 4.6 at standard pricing with no long-context premium.

    Has Anthropic officially announced Claude 5?

    As of May 15, 2026, we could not locate an Anthropic-published announcement of a Claude 5 model on anthropic.com or docs.claude.com. Multiple third-party sources have reported a Q2 2026 launch window based on a TechCrunch interview with Dario Amodei, but we could not independently confirm those specifications against a primary source.

    Is Claude Sonnet 5 a real model I can use?

    As of May 15, 2026, “Claude Sonnet 5” does not appear in Anthropic’s official models overview or pricing documentation. The currently available Sonnet model is Claude Sonnet 4.6 (model ID claude-sonnet-4-6). Earlier reports of a Sonnet 5 release were not confirmed against an Anthropic-published source in our review.

    Why does Opus 4.7 use more tokens than Opus 4.6 for the same text?

    Opus 4.7 ships with a new tokenizer that contributes to its improved performance but uses approximately 1x to 1.35x as many tokens for the same input text compared to previous models. Anthropic recommends increasing max_tokens headroom and adjusting compaction triggers accordingly.

    Are sampling parameters (temperature, top_p, top_k) still supported on Opus 4.7?

    No. Setting temperature, top_p, or top_k to any non-default value on Opus 4.7 returns a 400 error. Migration guidance: omit these parameters and use prompting to guide the model’s behavior.

    Related Reading

    How we sourced this

    Sources reviewed May 15, 2026:

    • Anthropic Pricing Documentation: docs.claude.com/en/docs/about-claude/pricing (primary source for model lineup, per-token rates, context window pricing)
    • Anthropic Platform Documentation: What’s new in Claude Opus 4.7 (primary source for Opus 4.7 features, breaking changes, tokenizer, image support, task budgets)
    • Anthropic Help Center: How up-to-date is Claude’s training data? (primary source for knowledge cutoff dates)
    • Anthropic news page (primary source check for Claude 5 announcement — none located as of May 15, 2026)
    • Third-party reporting on Claude 5 / Sonnet 5 (TechCrunch interview reports, Claude5.com, Fello AI, WaveSpeed Blog) — cited as reported but not independently confirmed against primary sources

    This article applies the verified vs. reported distinction throughout. If any of the unverified third-party claims are confirmed by Anthropic in the weeks after this article’s date stamp, the relevant sections should be updated to reflect the new primary-source documentation.

  • Claude MCP Token Cost Reality: Why Your Model Context Protocol Setup Is Burning 18,000 Tokens Per Turn

    Claude MCP Token Cost Reality: Why Your Model Context Protocol Setup Is Burning 18,000 Tokens Per Turn

    Last refreshed: May 15, 2026

    If you’ve ever connected a few Model Context Protocol (MCP) servers to Claude Code and watched your usage limit drain faster than the work you actually did would explain, you’re not imagining it. There’s a real, documented, and sometimes substantial token cost to wiring MCP servers into your Claude environment — and most setup guides don’t mention it.

    The short version: each MCP server you connect injects its complete tool schema into the context of every message you send. Multiple servers stack. The total overhead can range from a few thousand tokens for a single server up to roughly 18,000 tokens per turn when you’re running a typical multi-server developer setup. Anthropic’s own engineering team has acknowledged this in a public GitHub issue and shipped optimizations to reduce it.

    This article walks through where the overhead actually comes from, how to measure your own setup, what Anthropic has changed in 2026 to ease the cost, and the concrete steps you can take to keep MCP useful without burning through your token budget.

    What MCP actually is, briefly

    The Model Context Protocol is an open standard created by Anthropic that lets Claude (and other LLMs that adopt the standard) connect to external tools and data sources through a common interface. Instead of writing a custom integration for every API or database you want Claude to access, you point Claude at an MCP server, and the server exposes its capabilities — file access, Slack messages, GitHub repos, database queries — in a format Claude can use.

    It’s a real productivity unlock. It’s also why the token math gets complicated.

    Where the token cost comes from

    When you connect an MCP server to Claude Code (or any MCP-aware client), three things happen on every message:

    1. Tool schema injection. Every tool the server exposes — every name, every description, every parameter definition — is included in the context Claude sees. A Slack MCP server with 10–15 tools typically adds about 2,000 tokens. A GitHub server is heavier. A custom internal-tooling server with verbose descriptions can run 5,000–8,000 tokens on its own.

    2. Tool-use system prompt overhead. Anthropic’s documentation confirms that whenever tools are present in a request, a special system prompt is automatically prepended that teaches the model how to use tools. For Claude 4.x models with tool_choice: auto, that’s an additional 346 tokens per request. The bash tool adds 245. The text editor tool adds 700. The computer-use tool adds 735 plus a 466–499 token system prompt extension.

    3. Stateless re-sending. Each message in a conversation is a fresh API request that includes the full conversation history plus the full tool schema. Claude does not “remember” your tools from the last turn the way a human remembers a colleague’s job description. Every turn pays the schema cost again.

    That’s the math. Now multiply by the number of MCP servers you have connected. A developer running Slack + GitHub + a database connector + an internal custom server can easily land in the 15,000–20,000 tokens-per-turn range — and that’s before you’ve typed your actual question.

    The 18,000-token figure, sourced

    The “up to 18,000 tokens per turn” number comes from a combination of public sources verified May 15, 2026:

    • Anthropic’s own GitHub repo for Claude Code, issue #3406, titled “Built-in tools + MCP descriptions load on first message causing 10–20k token overhead.” Anthropic engineers acknowledged the issue and have shipped progressive optimizations against it.
    • Independent analysis by MindStudio measuring real Claude Code sessions with multiple MCP servers attached.
    • Anthropic’s official Claude Code documentation on cost management explicitly recommends running /mcp to inspect connected servers and disabling unused ones to control token consumption.

    The exact number for your setup will be different. The shape of the problem is the same.

    Why this matters more than it looks

    Claude’s standard context window is 200,000 tokens. Losing 18,000 of those to tool definitions before you start typing represents about 9% of your effective working space. That’s a real ceiling cost — but it’s not the part that hurts most.

    The part that hurts is the cumulative bill. If you’re on a Claude subscription with a usage limit, every turn through Claude Code is paying the full schema cost again. A workflow that takes 30 turns of back-and-forth burns 540,000 tokens worth of tool definitions across that session — even if the tool descriptions never change. On the API at standard Sonnet 4.6 rates, that’s about $1.62 in pure schema overhead per session, before any of the actual work gets billed.

    Multiply by a team of engineers running Claude Code daily, and the overhead becomes the largest single line item in your token spend.

    What Anthropic has changed in 2026

    Anthropic has shipped two meaningful optimizations against MCP token bloat over the past few months:

    Deferred tool loading. In recent Claude Code releases, MCP tool definitions are no longer all loaded into context at the start of a session by default. Tool names enter context, but the full schemas only load when Claude actually invokes a particular tool. This is a substantial improvement for sessions where you have many tools available but only use a few.

    Tool Search. A new built-in search mechanism lets Claude discover relevant MCP tools on demand rather than carrying them all in context. One independent measurement reported a Claude Code MCP context cut of 46.9% — from roughly 51,000 tokens down to 8,500 tokens — by using Tool Search instead of full upfront loading.

    These optimizations help, but they don’t make the overhead zero. The baseline cost of having any MCP server connected at all is real, and you still pay it on every turn even with deferral active.

    How to measure your own MCP token cost

    Two practical methods work for most setups:

    Method 1 — The /mcp command. In Claude Code, run /mcp to see every server currently connected. For each one, check how many tools it exposes. Anthropic’s documentation explicitly recommends this as the first step to controlling MCP costs.

    Method 2 — Token-count delta. Send a single message in Claude Code with no MCP servers connected and note the input token count from the API response. Reconnect your MCP servers one at a time. The delta in input tokens between configurations is the per-turn cost of each server. This is the most precise way to know your own number.

    Anything north of about 8,000 tokens per turn in pure MCP overhead is worth optimizing. North of 15,000 is a flag.

    Concrete steps to control MCP token cost

    • Disable MCP servers you aren’t actively using. The single highest-leverage move. If you connected a server two weeks ago for one experiment and never went back to it, every turn you’ve taken since has been paying for it.
    • Prefer CLI tools over MCP servers when both exist. Anthropic’s own cost-management guidance notes that tools like gh, aws, gcloud, and sentry-cli remain more context-efficient than equivalent MCP servers because they don’t add per-tool listing overhead. Claude can simply invoke them via the bash tool.
    • Use MCP gateways for large server counts. If you genuinely need many tools available, gateway products (Maxim, Milvus-backed setups, others) consolidate tools and surface only relevant ones per query, cutting net overhead substantially.
    • Run a complex CLAUDE.md audit. Long project-level CLAUDE.md files compound the per-turn baseline. Treat CLAUDE.md as an asset that’s expensive to keep verbose.
    • Watch for context compounding. In long Claude Code sessions, conversation history grows alongside the tool schema cost. If you’re running a workflow longer than 20 turns, periodically clear context (/clear) to reset the per-turn cost to baseline.

    Frequently Asked Questions

    Does every MCP server cost 18,000 tokens?

    No. The 18,000-token figure is for a typical multi-server setup with several connected servers and built-in tools active. A single small MCP server (5–10 tools, concise descriptions) might only add 1,500–3,000 tokens. The cost scales with the number of servers and the verbosity of their tool definitions.

    Why does Claude reload the tool definitions every turn?

    The Claude API is stateless. Every message is a fresh API request containing the full conversation history and the full tool schema. The model has no memory between requests, so the schema must be present every time tools could be used. Recent deferred-loading optimizations reduce this for unused tools, but anything Claude actually needs still loads each turn.

    How do I see what’s loaded in my Claude Code environment?

    Run /mcp in Claude Code to list every connected MCP server and its tool count. To check the actual token cost, send a test message and inspect the input token count returned by the API.

    Are CLI tools really cheaper than MCP servers?

    Yes, for tools that have both options. CLI tools accessed via the bash tool only add the bash tool’s 245-token overhead. An equivalent MCP server adds its full tool schema for every tool it exposes. For tools you use frequently, MCP can still be worth it for the structured interface; for tools you use rarely, CLI is more efficient.

    Does this affect Claude on the web (claude.ai) too?

    Web Claude does not use the same MCP server-connection model as Claude Code. The MCP token-overhead pattern primarily affects Claude Code, custom Agent SDK applications, and other developer-facing clients where you wire in MCP servers directly.

    Will this get better in future Claude releases?

    Likely. Anthropic has already shipped deferred tool loading and Tool Search in 2026, both of which materially reduce the per-turn overhead for unused tools. The architectural baseline (tools must be present in context to be invoked) is unlikely to change, but the practical cost should keep dropping as the deferred-loading optimizations mature.

    Related Reading

    How we sourced this

    Sources reviewed May 15, 2026:

    • Anthropic GitHub: anthropics/claude-code issue #3406, “Built-in tools + MCP descriptions load on first message causing 10-20k token overhead” (primary source for the overhead figure and Anthropic acknowledgment)
    • Anthropic Claude Code documentation: Connect Claude Code to tools via MCP and Manage costs effectively (primary source for /mcp command and CLI vs. MCP guidance)
    • Anthropic Pricing Documentation: tool-use system prompt token counts, bash/text-editor/computer-use overheads (primary source for the per-tool fixed costs)
    • Independent analysis: MindStudio (multiple Claude Code MCP measurements), Joe Njenga’s Tool Search 51K→8.5K measurement, Maxim and Scott Spence on optimization patterns (Tier 2 confirming sources)

    Token-cost numbers in this article are accurate as of May 15, 2026. Anthropic is shipping MCP optimizations regularly, so the practical overhead may be lower in your environment than what’s described here.

  • Claude Agent SDK Dual-Bucket Billing: What Changes June 15, 2026 (And Why It Matters)

    Claude Agent SDK Dual-Bucket Billing: What Changes June 15, 2026 (And Why It Matters)

    Last refreshed: May 15, 2026

    If you’ve been running Claude Code’s claude -p command in production, kicking off background jobs through the Claude Agent SDK, or wiring the Agent SDK into a third-party app, the way you pay for that work is about to change.

    Starting June 15, 2026, Anthropic is splitting Claude subscription billing into two separate buckets: one for the things you do interactively (Claude.ai chat, Claude Code in your terminal, Claude Cowork), and a brand-new credit pool that only covers programmatic, autonomous, and SDK-driven work.

    This is a meaningful shift. It’s also one of the most under-explained changes Anthropic has made to subscription pricing this year. If you don’t know about it before June 15, you can find yourself with stopped automations, surprise overage charges, or both.

    This guide walks through exactly what’s changing, what the credits cover, what they don’t cover, what each plan gets, and how to plan for it before the cutover.

    The short version

    Claude subscription plans (Pro, Max, Team, Enterprise) currently have one shared usage limit. Whether you’re chatting with Claude on the web, using Claude Code in your terminal, or running unattended jobs through the Agent SDK, all of that draws from the same plan-level allowance.

    On June 15, 2026, Anthropic is separating those two modes of use:

    • Bucket 1 — Interactive use: Claude.ai chat, Claude Code in the terminal/IDE, Claude Cowork. Uses your existing subscription usage limits, exactly as before.
    • Bucket 2 — Agent SDK monthly credit: A separate, dollar-denominated credit pool. Funds the Claude Agent SDK, the claude -p non-interactive command, the Claude Code GitHub Actions integration, and any third-party app that authenticates via the Agent SDK.

    The two buckets do not commingle. Agent SDK work cannot draw from your interactive subscription limit, and interactive use cannot draw from your Agent SDK credit. If you exhaust your Agent SDK credit and don’t have extra usage enabled, your background jobs simply stop until the credit refreshes the following month.

    What each plan gets

    Here is the official monthly Agent SDK credit by plan, as published in Anthropic’s Help Center (verified May 15, 2026):

    • Pro: $20/month
    • Max 5x: $100/month
    • Max 20x: $200/month
    • Team — Standard seats: $20/month per seat
    • Team — Premium seats: $100/month per seat
    • Enterprise — usage-based: $20/month
    • Enterprise — seat-based Premium seats: $200/month

    Important detail buried in the announcement: Enterprise seat-based plans on Standard seats are not eligible to claim the Agent SDK credit at all. If you administer one of those plans and have engineers running automation, that’s a gap to plan around.

    What the credit covers (and what it doesn’t)

    Anthropic’s documentation is specific about what counts as Agent SDK use, so this is worth reading carefully.

    Covered by the credit:

    • Claude Agent SDK usage in your own Python or TypeScript projects
    • The claude -p command in Claude Code (non-interactive mode)
    • The Claude Code GitHub Actions integration
    • Third-party apps that authenticate with your Claude subscription through the Agent SDK

    Not covered (these still draw from your normal subscription limits):

    • Interactive Claude Code in your terminal or IDE
    • Claude conversations on web, desktop, or mobile
    • Claude Cowork
    • Other features that draw from extra usage

    The plain-English version: if a human is sitting at the keyboard waiting for the response, that’s interactive use. If a script kicks off the work and the result lands somewhere else later, that’s Agent SDK use.

    How the credit actually works in practice

    Five mechanics matter for budgeting:

    1. Per-user, never pooled. Each eligible user on a Team or Enterprise plan claims their own credit. There is no organization-level pool. Credits cannot be transferred between users, shared, or stockpiled across accounts.

    2. Refreshes monthly with the billing cycle. Whatever you don’t spend in a given month evaporates. Unused credits do not roll over.

    3. One-time opt-in. You claim your credit through your Claude account once. After that initial claim, it refreshes automatically each cycle.

    4. Drains first, before any other source. When an Agent SDK request fires, it pulls from your monthly credit before any other paid usage source kicks in. This is good — it means you actually use what you’ve already paid for.

    5. After the credit, requests either flow to extra usage or stop entirely. When your monthly credit hits zero, additional Agent SDK requests draw from extra usage at standard API rates — but only if you have extra usage enabled. If you haven’t enabled extra usage, your Agent SDK requests stop until the next refresh.

    That last point is the one most likely to bite teams. If you’re running a daily cron job through the Agent SDK and you don’t enable extra usage, the day your credit runs out is the day your automation goes silent — without obvious warning if you’re not watching the credit balance.

    Why Anthropic is doing this

    Anthropic frames this as separating individual experimentation from production automation. From the Help Center documentation: “The Agent SDK monthly credit is sized for individual experimentation and automation. Teams running shared production automation should use the Claude Developer Platform with an API key for predictable pay-as-you-go billing.”

    The translation: a single user’s $20 or $200 of Agent SDK credit was never going to cover a real production workload anyway. Anthropic is making explicit what was already true under the hood — that a subscription was a chat product, and serious unattended automation belongs on the API.

    What this also does, structurally, is protect interactive subscription users from getting their experience degraded by heavy autonomous workloads sharing the same pool. If you’ve ever hit a subscription rate limit during a normal chat session because something else on your account was burning tokens in the background, this change removes that failure mode.

    What you should do before June 15, 2026

    If you run any unattended Claude work (the most important group):

    Audit every place your subscription is being used by something other than a human at a keyboard. The big four to check:

    • claude -p commands in cron jobs, CI pipelines, or shell scripts
    • Claude Code GitHub Actions workflows
    • Custom Python or TypeScript projects using the Agent SDK
    • Any third-party tool that asks for “Sign in with Claude” — those go through the Agent SDK

    For each one, estimate dollar consumption per day at standard API rates. If the total approaches or exceeds your plan’s Agent SDK monthly credit, you have three options: enable extra usage to allow overage, move that workload to a Claude Developer Platform API key (more predictable for sustained loads), or downsize the workload itself.

    If you administer a Team or Enterprise plan:

    Eligible users on your team will receive an email with claim instructions before June 15, 2026. You don’t need to take action yourself, but it’s worth communicating internally that the credits are per-user, can’t be pooled, and that any team-wide automation should be on an API key, not on a subscription seat.

    If you’re a solo Pro or Max user who only chats with Claude:

    You probably don’t need to do anything. The split affects you only if you’re running scripts or background jobs. If you’ve never used claude -p or the Agent SDK directly, your interactive usage limits don’t change.

    Frequently Asked Questions

    What happens to my Agent SDK usage on June 14 vs. June 15, 2026?

    Before June 15, Agent SDK and claude -p usage counts against your subscription’s general usage limits. Starting June 15, that same usage no longer touches your subscription limits and instead draws from the new Agent SDK monthly credit pool. Your interactive Claude Code, web chat, and Cowork usage continues to work exactly as before.

    Can I share the Agent SDK credit across my team?

    No. Per Anthropic’s official documentation, “Credits are per-user. Each eligible user on your team claims their own credit. Credits can’t be pooled, transferred, or shared across the organization.” If your team needs shared automation budget, the Claude Developer Platform with an API key is the recommended path.

    Do unused Agent SDK credits roll over?

    No. Unused credits expire at the end of each billing cycle and do not carry into the next month.

    What happens if I run out of Agent SDK credit mid-month?

    If you have extra usage enabled, additional requests flow to extra usage at standard API rates (the same per-token prices listed in Anthropic’s pricing documentation). If extra usage is not enabled, your Agent SDK requests stop until your credit refreshes at the start of the next billing cycle.

    Does this affect Claude API customers using their own API key?

    No. If you authenticate with the Agent SDK using a Claude Developer Platform API key, nothing changes. Pay-as-you-go billing continues, and you do not receive an Agent SDK monthly credit. The credit only applies to subscription-authenticated Agent SDK use.

    Is interactive Claude Code in my terminal still covered by my subscription?

    Yes. Interactive Claude Code (typing commands and getting responses in your terminal or IDE) continues to draw from your subscription usage limits exactly as before. Only the non-interactive claude -p mode and direct Agent SDK calls move to the new credit pool.

    What’s the dollar value of the credit on each plan?

    As of May 15, 2026: Pro $20, Max 5x $100, Max 20x $200, Team Standard $20/seat, Team Premium $100/seat, Enterprise usage-based $20, Enterprise seat-based Premium $200. Enterprise seat-based Standard seats do not receive a credit.

    Related Reading

    How we sourced this

    Every factual claim in this article was triple-checked across the following sources, all reviewed on May 15, 2026:

    • Anthropic Help Center: Use the Claude Agent SDK with your Claude plan (primary source for credit amounts, eligibility, and mechanics)
    • Anthropic Pricing Documentation: docs.claude.com/en/docs/about-claude/pricing (primary source for standard API rates and tool-use pricing)
    • Independent press coverage from The New Stack, The Decoder, and InfoWorld confirming the announcement and its scope

    If you spot a number that’s drifted out of sync with Anthropic’s current published rates, treat the official documentation as authoritative. The pricing surface around Claude is moving quickly in 2026, and we date-stamp specifics so readers know which facts to re-verify.

  • Claude Code Pricing in May 2026: What $20, $100, and $200 a Month Actually Buy You

    Claude Code Pricing in May 2026: What $20, $100, and $200 a Month Actually Buy You

    Last refreshed: May 15, 2026

    Claude Code pricing has stopped being a clean sticker number and started being a question of which ceiling you hit first. There is a $20 plan, a $100 plan, and a $200 plan — and underneath all three sits a 5-hour rolling window, a weekly active-hours cap added in August 2025, and a per-model multiplier that quietly makes Opus 4.7 the most expensive thing you can do inside the terminal. If you came looking for the right plan, the honest answer is: it depends on whether you are mostly a Sonnet operator or you live in Opus.

    The three subscription tiers, stripped down

    Pro — $20/month. Access to Claude Code in the terminal, web, and desktop, with both Sonnet 4.6 and Opus 4.7 available. The practical envelope is about 44,000 tokens per 5-hour window and roughly 40–80 weekly active hours on Sonnet, depending on session concurrency. This is the plan for someone running Claude Code a few hours a day on focused work — refactors, scoped feature builds, debugging passes — not someone leaving an agent running while they eat lunch.

    Max 5x — $100/month. Five times the Pro envelope, plus priority during peak demand. The window allocation lands around 88,000 tokens per 5-hour block. This is the tier where you stop thinking about token budgets during a single working day and start thinking about them across a whole week. Picked correctly, it is the cheapest way to use Claude Code as your primary IDE companion without flipping over to API billing.

    Max 20x — $200/month. Twenty times Pro — about 220,000 tokens per window — which translates to roughly 480 Sonnet-hours or about 40 Opus-hours per week before the weekly cap kicks in. Real-world reports from early 2026 had $200/month users watching single Opus prompts eat 10–20% of their daily allocation; Anthropic publicly acknowledged the problem, expanded capacity, and doubled the 5-hour rate limit for Pro and Max accounts. If you are running Claude Code across multiple repos all week and reaching for Opus on the hard problems, this is the tier that stops you from staring at a rate-limit wall.

    The API, as a sanity check

    If you want a sanity check on whether the subscription math works, price the same workload against the API:

    • Claude Haiku 4.5 (claude-haiku-4-5-20251001): $1.00 input / $5.00 output per million tokens
    • Claude Sonnet 4.6 (claude-sonnet-4-6): $3.00 input / $15.00 output per million tokens
    • Claude Opus 4.7 (claude-opus-4-7): $5.00 input / $25.00 output per million tokens

    Prompt caching is the lever almost nobody uses correctly. Cache writes cost 1.25x input price for the 5-minute TTL or 2.0x for the 1-hour TTL, but cache reads cost 0.10x — a 90% discount on every subsequent request that hits the same context. If your .clauderules file, project map, and the file you are editing are all stable for an hour, the bill on a long pairing session can drop by an order of magnitude. The Batch API knocks another 50% off both directions for asynchronous workloads, which is worth knowing if you are running large refactor sweeps.

    One trap on Opus 4.7 specifically: the model uses a new tokenizer that inflates token counts by up to 35% on identical text compared to Opus 4.6. The headline price did not change, but your effective spend per request did — sometimes by nothing, sometimes by a third, depending on the content. If you migrated from Opus 4.6 and your bill went up without your prompt patterns changing, that is the reason.

    How to actually choose

    The cleanest way to pick a plan is to first decide your model mix, then your weekly hours.

    If you are mostly a Sonnet operator — long agentic runs, multi-file edits, codebase Q&A, with Opus only reached for on the architectural questions — Pro at $20 is plausible up to about 5–8 hours of focused use per day, Max 5x covers most full-time individual developers, and Max 20x is overkill unless you are running multiple sessions in parallel.

    If you live in Opus — long-horizon agentic work, hard refactors across many files, anything where you would rather have one good attempt than three Sonnet retries — Pro will frustrate you within two weeks, Max 5x is the realistic floor, and Max 20x is the only tier that gives you a defensible Opus envelope without bouncing over to API billing.

    And if you are running Claude Code across multiple repos all week, leaving agents to grind on tasks while you do other things, Max 20x is the only subscription that holds up — and even then, the weekly cap is real. Use the API for the spillover and you will still come out cheaper than trying to brute-force a smaller plan.

    The number that matters

    One developer’s public report this year: roughly 10 billion tokens consumed across Claude Code over eight months. API metered cost would have exceeded $15,000. The same workload on Max at $100/month for the same window came in around $800 — about 93% cheaper. That is the gap that makes the subscription model worth taking seriously, even when the rate limits feel arbitrary. The $200 tier is not a vanity number; it is the price Anthropic charges to stop being a meaningful constraint on your workflow.

    The right way to read Claude Code pricing in May 2026 is not to ask which plan is cheapest. It is to ask which plan is the cheapest one that disappears — the one that stops appearing in your day. For most full-time developers reaching for Opus regularly, that plan is Max 20x. For everyone else, Max 5x is the first plan that actually gets out of your way.

  • Claude Context Window Size 2026: What 1 Million Tokens Actually Means

    Claude Context Window Size 2026: What 1 Million Tokens Actually Means

    Last refreshed: May 15, 2026

    Looking for quick answers? The FAQ version covers every common question directly.

    → Context Window FAQ

    Claude’s context window is one of those specs that sounds simple until you actually need to use it. “1 million tokens” means almost nothing without a frame of reference. This is the guide we wish existed when we started building on Claude — written from our own experience running it in production, with numbers pulled directly from Anthropic’s official documentation.

    Quick Definition

    The context window is Claude’s working memory for a conversation. It holds everything Claude can see and reason about at once: your messages, Claude’s responses, any documents you’ve shared, and system prompts. When the window fills up, earlier content drops out.

    Current Context Window Sizes by Model (May 2026)

    These numbers come directly from Anthropic’s official models page, fetched May 9, 2026. Model strings are exact API identifiers:

    Model API String Context Window Max Output
    Claude Opus 4.7 claude-opus-4-7 1,000,000 tokens 128,000 tokens
    Claude Sonnet 4.6 claude-sonnet-4-6 1,000,000 tokens 64,000 tokens
    Claude Haiku 4.5 claude-haiku-4-5-20251001 200,000 tokens 64,000 tokens

    Opus 4.7 and Sonnet 4.6 both have the full 1M token context window. Haiku 4.5 is 200K. The key difference between Opus 4.7 and Sonnet 4.6 in this table is the max output — Opus 4.7 can write up to 128K tokens in a single response, Sonnet 4.6 caps at 64K.

    What Does 1 Million Tokens Actually Hold?

    Token counts are an abstraction. Here’s what 1 million tokens translates to in practical terms:

    • About 750,000 words of English text — roughly 10 full-length novels, or 1,500 average blog posts
    • A full mid-size codebase — a 50,000-line Python project with comments fits comfortably
    • Hours of meeting transcripts — a full workday of recorded calls, transcribed, fits in one context window
    • Multiple large documents simultaneously — 10 research PDFs at 30 pages each, all in the same conversation
    • Long conversation histories — hundreds of back-and-forth exchanges before anything starts dropping off

    We’ve loaded entire Notion exports, full project histories, and multi-document research packs into a single Claude session. At 1M tokens, you’re unlikely to hit the ceiling in a normal working session. You hit it when you’re doing things like: loading your entire codebase plus documentation plus conversation history and then asking Claude to do a full architectural review.

    Context Window vs. Memory: What’s the Difference?

    This is where a lot of people get confused. The context window and memory are not the same thing:

    • Context window: What Claude can see right now, in this session. Once a session ends, it’s gone.
    • Memory (in claude.ai): A separate system that extracts and stores key information from past sessions. It surfaces relevant facts into future conversations as a snippet in the context.
    • Managed Agents memory stores: A developer-layer construct where agents maintain and update knowledge bases across sessions — distinct from both the context window and the consumer memory feature.

    The 1M token context window is your working memory for one session. It doesn’t persist. Memory systems are what carry information across sessions — but they work by injecting a summary into the context window of the new session, not by giving Claude access to the full history.

    Does a Bigger Context Window Mean Better Performance?

    Mostly yes, with one important nuance. More context means Claude has more information to reason about, which generally produces better outputs for tasks that benefit from full context — code reviews, document synthesis, long-form writing, multi-document comparison.

    The nuance: performance can degrade on tasks involving specific information buried deep in a very long context. This is sometimes called the “lost in the middle” problem — models tend to pay more attention to the beginning and end of a long context than the middle. Anthropic has worked on this with Claude’s architecture, and it performs well on long-context tasks, but it’s worth structuring important information at natural reference points rather than burying it in the middle of a 500-page document.

    How We Actually Use the 1M Token Window

    We run Claude in production for content operations, site management, and agentic coding workflows. Here’s where the 1M context window makes a concrete difference in our work:

    • Full site audits: Loading every post from a WordPress site (200+ posts worth of content) into one session for comprehensive SEO analysis — without having to chunk and re-prompt
    • Cross-session context: Pasting in long Notion briefings, prior session transcripts, and the current task in one go. The window is large enough that we don’t have to decide what to leave out.
    • Codebase-wide reasoning: In Claude Code, having the full project context means Claude can make changes that account for how files interact rather than reasoning only about the current file
    • Multi-document synthesis: Research projects where we load 10-15 source documents and ask Claude to synthesize across them — something that was impossible at 100K context windows

    The practical shift from 200K to 1M tokens wasn’t just “more room.” It changed what we could ask Claude to do in a single session.

    Context Window on the API: Batch Output Extension

    For API users: on the Message Batches API, Opus 4.7, Opus 4.6, and Sonnet 4.6 support up to 300K output tokens using the output-300k-2026-03-24 beta header. This is relevant for batch generation tasks where you need very long outputs — documentation generation, large codebases, book-length content.

    Frequently Asked Questions

    What is Claude’s context window in 2026?

    Claude Opus 4.7 and Claude Sonnet 4.6 both have 1,000,000 token (1M token) context windows as of May 2026. Claude Haiku 4.5 has a 200,000 token context window. These are the current generally available models.

    How many pages can Claude read at once?

    At 1M tokens, Claude can hold roughly 750,000 words of English text — equivalent to approximately 3,000 average pages. In practice, a typical 20-page PDF is roughly 10,000-15,000 tokens, so you could load 60-100 such documents in a single session before approaching the limit.

    Does the context window reset between messages?

    No — the context window accumulates across an entire conversation session. Every message you send and every response Claude gives adds to the total. The window doesn’t reset between individual messages; it resets when you start a new conversation.

    What happens when Claude hits the context window limit?

    When a conversation reaches the context window limit, earlier messages begin to drop out of the active context. Claude can no longer reference information from those earlier messages — it effectively forgets that part of the conversation. In the claude.ai interface, you’ll see a notification when you’re approaching the limit.

    Is the 1M context window available on the free plan?

    The model available to free plan users has access to the 1M context window. However, free plan usage limits mean long-context sessions hit rate limits faster than paid plans. The window is technically available, but sustained heavy use of it is more practical on paid tiers.

    What’s the difference between Claude Opus 4.7 and Sonnet 4.6 context windows?

    Both have the same 1M token input context window. The difference is max output: Opus 4.7 can generate up to 128,000 tokens in a single response; Sonnet 4.6 caps at 64,000 tokens. For most tasks this distinction doesn’t matter, but for very long document generation or large code outputs, Opus 4.7 has the higher output ceiling.

  • The CLAUDE.md Playbook: How to Actually Guide Claude Code Across a Real Project (2026)

    The CLAUDE.md Playbook: How to Actually Guide Claude Code Across a Real Project (2026)

    Last refreshed: May 15, 2026

    Most writing about CLAUDE.md gets one thing wrong in the first paragraph, and once you notice it, you can’t unsee it. People describe it as configuration. A “project constitution.” Rules Claude has to follow.

    It isn’t any of those things, and Anthropic is explicit about it.

    CLAUDE.md content is delivered as a user message after the system prompt, not as part of the system prompt itself. Claude reads it and tries to follow it, but there’s no guarantee of strict compliance, especially for vague or conflicting instructions. — Anthropic, Claude Code memory docs

    That one sentence is the whole game. If you write a CLAUDE.md as if you’re programming a machine, you’ll get frustrated when the machine doesn’t comply. If you write it as context — the thing a thoughtful new teammate would want to read on day one — you’ll get something that works.

    This is the playbook I wish someone had handed me the first time I set one up across a real codebase. It’s grounded in Anthropic’s current documentation (linked throughout), layered with patterns I’ve used across a network of production repos, and honest about where community practice has outrun official guidance.

    If any of this ages out, the docs are the source of truth. Start there, come back here for the operator layer.


    The memory stack in 2026 (what CLAUDE.md actually is, and isn’t)

    Claude Code’s memory system has three parts. Most people know one of them, and the other two change how you use the first.

    CLAUDE.md files are markdown files you write by hand. Claude reads them at the start of every session. They contain instructions you want Claude to carry across conversations — build commands, coding standards, architectural decisions, “always do X” rules. This is the part people know.

    Auto memory is something Claude writes for itself. Introduced in Claude Code v2.1.59, it lets Claude save notes across sessions based on your corrections — build commands it discovered, debugging insights, preferences you kept restating. It lives at ~/.claude/projects/<project>/memory/ with a MEMORY.md entrypoint. You can audit it with /memory, edit it, or delete it. It’s on by default. (Anthropic docs.)

    .claude/rules/ is a directory of smaller, topic-scoped markdown files — code-style.md, testing.md, security.md — that can optionally be scoped to specific file paths via YAML frontmatter. A rule with paths: ["src/api/**/*.ts"] only loads when Claude is working with files matching that pattern. (Anthropic docs.)

    The reason this matters for how you write CLAUDE.md: once you understand what the other two are for, you stop stuffing CLAUDE.md with things that belong somewhere else. A 600-line CLAUDE.md isn’t a sign of thoroughness. It’s usually a sign the rules directory doesn’t exist yet and auto memory is disabled.

    Anthropic’s own guidance is explicit: target under 200 lines per CLAUDE.md file. Longer files consume more context and reduce adherence.

    Hold that number. We’ll come back to it.


    Where CLAUDE.md lives (and why scope matters)

    CLAUDE.md files can live in four different scopes, each with a different purpose. More specific scopes take precedence over broader ones. (Full precedence table in Anthropic docs.)

    Managed policy CLAUDE.md lives at the OS level — /Library/Application Support/ClaudeCode/CLAUDE.md on macOS, /etc/claude-code/CLAUDE.md on Linux and WSL, C:\Program Files\ClaudeCode\CLAUDE.md on Windows. Organizations deploy it via MDM, Group Policy, or Ansible. It applies to every user on every machine it’s pushed to, and individual settings cannot exclude it. Use it for company-wide coding standards, security posture, and compliance reminders.

    Project CLAUDE.md lives at ./CLAUDE.md or ./.claude/CLAUDE.md. It’s checked into source control and shared with the team. This is the one you’re writing when someone says “set up CLAUDE.md for this repo.”

    User CLAUDE.md lives at ~/.claude/CLAUDE.md. It’s your personal preferences across every project on your machine — favorite tooling shortcuts, how you like code styled, patterns you want applied everywhere.

    Local CLAUDE.md lives at ./CLAUDE.local.md in the project root. It’s personal-to-this-project and gitignored. Your sandbox URLs, preferred test data, notes Claude should know that your teammates shouldn’t see.

    Claude walks up the directory tree from wherever you launched it, concatenating every CLAUDE.md and CLAUDE.local.md it finds. Subdirectories load on demand — they don’t hit context at launch, but get pulled in when Claude reads files in those subdirectories. (Anthropic docs.)

    A practical consequence most teams miss: in a monorepo, your parent CLAUDE.md gets loaded when a teammate runs Claude Code from inside a nested package. If that parent file contains instructions that don’t apply to their work, Claude will still try to follow them. That’s what the claudeMdExcludes setting is for — it lets individuals skip CLAUDE.md files by glob pattern at the local settings layer.

    If you’re running Claude Code across more than one repo, decide now whether your standards belong in project CLAUDE.md (team-shared) or user CLAUDE.md (just you). Writing the same thing in both is how you get drift.


    The 200-line discipline

    This is the rule I see broken most often, and it’s the rule Anthropic is most explicit about. From the docs: “target under 200 lines per CLAUDE.md file. Longer files consume more context and reduce adherence.”

    Two things are happening in that sentence. One, CLAUDE.md eats tokens — every session, every time, whether Claude needed those tokens or not. Two, longer files don’t actually produce better compliance. The opposite. When instructions are dense and undifferentiated, Claude can’t tell which ones matter.

    The 200-line ceiling isn’t a hard cap. You can write a 400-line CLAUDE.md and Claude will load the whole thing. It just won’t follow it as well as a 180-line file would.

    Three moves to stay under:

    1. Use @imports to pull in specific files when they’re relevant. CLAUDE.md supports @path/to/file syntax (relative or absolute). Imported files expand inline at session launch, up to five hops deep. This is how you reference your README, your package.json, or a standalone workflow guide without pasting them into CLAUDE.md.

    See @README.md for architecture and @package.json for available scripts.
    
    # Git Workflow
    - @docs/git-workflow.md

    2. Move path-scoped rules into .claude/rules/. Anything that only matters when working with a specific part of the codebase — API patterns, testing conventions, frontend style — belongs in .claude/rules/api.md or .claude/rules/testing.md with a paths: frontmatter. They only load into context when Claude touches matching files.

    ---
    paths:
      - "src/api/**/*.ts"
    ---
    # API Development Rules
    
    - All API endpoints must include input validation
    - Use the standard error response format
    - Include OpenAPI documentation comments

    3. Move task-specific procedures into skills. If an instruction is really a multi-step workflow — “when you’re asked to ship a release, do these eight things” — it belongs in a skill, which only loads when invoked. CLAUDE.md is for the facts Claude should always hold in context; skills are for procedures Claude should run when the moment calls for them.

    If you follow these three moves, a CLAUDE.md rarely needs to exceed 150 lines. At that size, Claude actually reads it.


    What belongs in CLAUDE.md (the signal test)

    Anthropic’s own framing for when to add something is excellent, and it’s worth quoting directly because it captures the whole philosophy in four lines:

    Add to it when:

    • Claude makes the same mistake a second time
    • A code review catches something Claude should have known about this codebase
    • You type the same correction or clarification into chat that you typed last session
    • A new teammate would need the same context to be productive — Anthropic docs

    The operator version of the same principle: CLAUDE.md is the place you write down what you’d otherwise re-explain. It’s not the place you write down everything you know. If you find yourself writing “the frontend is built in React and uses Tailwind,” ask whether Claude would figure that out by reading package.json (it would). If you find yourself writing “when a user asks for a new endpoint, always add input validation and write a test,” that’s the kind of thing Claude won’t figure out on its own — it’s a team convention, not an inference from the code.

    The categories I’ve found actually earn their place in a project CLAUDE.md:

    Build and test commands. The exact string to run the dev server, the test suite, the linter, the type checker. Every one of these saves Claude a round of “let me look for a package.json script.”

    Architectural non-obvious. The thing a new teammate would need someone to explain. “This repo uses event sourcing — don’t write direct database mutations, emit events instead.” “We have two API surfaces, /public/* and /internal/*, and they have different auth requirements.”

    Naming conventions and file layout. “API handlers live in src/api/handlers/.” “Test files go next to the code they test, named *.test.ts.” Specific enough to verify.

    Coding standards that matter. Not “write good code” — “use 2-space indentation,” “prefer const over let,” “always export types separately from values.”

    Recurring corrections. The single most valuable category. Every time you find yourself re-correcting Claude about the same thing, that correction belongs in CLAUDE.md.

    What usually doesn’t belong:

    • Long lists of library choices (Claude can read package.json)
    • Full architecture diagrams (link to them instead)
    • Step-by-step procedures (skills)
    • Path-specific rules that only matter in one part of the repo (.claude/rules/ with a paths: field)
    • Anything that would be true of any project (that goes in user CLAUDE.md)

    Writing instructions Claude will actually follow

    Anthropic’s own guidance on effective instructions comes down to three principles, and every one of them is worth taking seriously:

    Specificity. “Use 2-space indentation” works better than “format code nicely.” “Run npm test before committing” works better than “test your changes.” “API handlers live in src/api/handlers/” works better than “keep files organized.” If the instruction can’t be verified, it can’t be followed reliably.

    Consistency. If two rules contradict each other, Claude may pick one arbitrarily. This is especially common in projects that have accumulated CLAUDE.md files across multiple contributors over time — one file says to prefer async/await, another says to use .then() for performance reasons, and nobody remembers which was right. Do a periodic sweep.

    Structure. Use markdown headers and bullets. Group related instructions. Dense paragraphs are harder to scan, and Claude scans the same way you do. A CLAUDE.md with clear section headers — ## Build Commands, ## Coding Style, ## Testing — outperforms the same content run together as prose.

    One pattern I’ve found useful that isn’t in the docs: write CLAUDE.md in the voice of a teammate briefing another teammate. Not “use 2-space indentation” but “we use 2-space indentation.” Not “always include input validation” but “every endpoint needs input validation — we had a security incident last year and this is how we prevent the next one.” The “why” is optional but it improves adherence because Claude treats the rule as something with a reason behind it, not an arbitrary preference.


    Community patterns worth knowing (flagged as community, not official)

    The following are patterns I’ve seen in operator circles and at industry events like AI Engineer Europe 2026, where practitioners share how they’re running Claude Code in production. None of these are in Anthropic’s documentation as official guidance. I’ve included them because they’re useful; I’m flagging them because they’re community-origin, not doctrine. Your mileage may vary, and Anthropic’s official behavior could change in ways that affect these patterns.

    The “project constitution” framing. Community shorthand for treating CLAUDE.md as the living document of architectural decisions — the thing new contributors read to understand how the project thinks. The framing is useful even though Anthropic doesn’t use the word. It captures the right posture: CLAUDE.md is the place for the decisions you want to outlast any individual conversation.

    Prompt-injecting your own codebase via custom linter errors. Reported at AI Engineer Europe 2026: some teams embed agent-facing prompts directly into their linter error messages, so when an automated tool catches a mistake, the error text itself tells the agent how to fix it. Example: instead of a test failing with “type mismatch,” the error reads “You shouldn’t have an unknown type here because we parse at the edge — use the parsed type from src/schemas/.” This is not documented Anthropic practice; it’s a community pattern that works because Claude Code reads tool output and tool output flows into context. Use with judgment.

    File-size lint rules as context-efficiency guards. Some teams enforce file-size limits (commonly cited: 350 lines max) via their linters, with the explicit goal of keeping files small enough that Claude can hold meaningful ones in context without waste. Again, community practice. The number isn’t magic; the discipline is.

    Token Leverage as a team metric. The idea that teams should track token spend ÷ human labor spend as a ratio and try to scale it. This is business-strategy content, not engineering guidance, and it’s emerging community discourse rather than settled practice. Take it as a thought experiment, not a KPI to implement by Monday.

    I’d rather flag these honestly than pretend they’re settled. If something here graduates from community practice to official recommendation, I’ll update.


    Enterprise: managed-policy CLAUDE.md (and when to use settings instead)

    For organizations deploying Claude Code across teams, there’s a managed-policy CLAUDE.md that applies to every user on a machine and cannot be excluded by individual settings. It lives at /Library/Application Support/ClaudeCode/CLAUDE.md (macOS), /etc/claude-code/CLAUDE.md (Linux and WSL), or C:\Program Files\ClaudeCode\CLAUDE.md (Windows), and is deployed via MDM, Group Policy, Ansible, or similar.

    The distinction that matters most for enterprise: managed CLAUDE.md is guidance, managed settings are enforcement. Anthropic is clear about this. From the docs:

    Settings rules are enforced by the client regardless of what Claude decides to do. CLAUDE.md instructions shape Claude’s behavior but are not a hard enforcement layer. — Anthropic docs

    If you need to guarantee that Claude Code can’t read .env files or write to /etc, that’s a managed settings concern (permissions.deny). If you want Claude to be reminded of your company’s code review standards, that’s managed CLAUDE.md. If you confuse the two and put your security policy in CLAUDE.md, you have a strongly-worded suggestion where you needed a hard wall.

    Building With Claude?

    I’ll send you the CLAUDE.md cheat sheet personally.

    If you’re in the middle of a real project and this playbook is helping — or raising more questions — just email me. I read every message.

    Email Will → will@tygartmedia.com

    The right mental model:

    Concern Configure in
    Block specific tools, commands, or file paths Managed settings (permissions.deny)
    Enforce sandbox isolation Managed settings (sandbox.enabled)
    Authentication method, organization lock Managed settings
    Environment variables, API provider routing Managed settings
    Code style and quality guidelines Managed CLAUDE.md
    Data handling and compliance reminders Managed CLAUDE.md
    Behavioral instructions for Claude Managed CLAUDE.md

    (Full table in Anthropic docs.)

    One practical note: managed CLAUDE.md ships to developer machines once, so it has to be right. Review it, version it, and treat changes to it the way you’d treat changes to a managed IDE configuration — because that’s what it is.


    The living document problem: auto memory, CLAUDE.md, and drift

    The thing that changed most in 2026 is that Claude now writes memory for itself when auto memory is enabled (on by default since Claude Code v2.1.59). It saves build commands it discovered, debugging insights, preferences you expressed repeatedly — and loads the first 200 lines (or 25KB) of its MEMORY.md at every session start. (Anthropic docs.)

    This changes how you think about CLAUDE.md in two ways.

    First, you don’t need to write CLAUDE.md entries for everything Claude could figure out on its own. If you tell Claude once that the build command is pnpm run build --filter=web, auto memory might save that, and you won’t need to codify it in CLAUDE.md. The role of CLAUDE.md becomes more specifically about what the team has decided, rather than what the tool needs to know to function.

    Second, there’s a new audit surface. Run /memory in a session and you can see every CLAUDE.md, CLAUDE.local.md, and rules file being loaded, plus a link to open the auto memory folder. The auto memory files are plain markdown. You can read, edit, or delete them.

    A practical auto-memory hygiene pattern I’ve landed on:

    • Once a month, open /memory and skim the auto memory folder. Anything stale or wrong gets deleted.
    • Quarterly, review the CLAUDE.md itself. Has anything changed in how the team works? Are there rules that used to matter but don’t anymore? Conflicting instructions accumulate faster than you think.
    • Whenever a rule keeps getting restated in conversation, move it from conversation to CLAUDE.md. That’s the signal Anthropic’s own docs describe, and it’s the right one.

    CLAUDE.md files are living documents or they’re lies. A CLAUDE.md from six months ago that references libraries you’ve since replaced will actively hurt you — Claude will try to follow instructions that no longer apply.


    A representative CLAUDE.md template

    What follows is a synthetic example, clearly not any specific project. It demonstrates the shape, scope, and discipline of a good project CLAUDE.md. Adapt it to your codebase. Keep it under 200 lines.

    # Project: [Name]
    
    ## Overview
    Brief one-paragraph description of what this project is and who uses it.
    Link to deeper architecture docs rather than duplicating them here.
    
    See @README.md for full architecture.
    
    ## Build and Test Commands
    - Install: `pnpm install`
    - Dev server: `pnpm run dev`
    - Build: `pnpm run build`
    - Test: `pnpm test`
    - Type check: `pnpm run typecheck`
    - Lint: `pnpm run lint`
    
    Run `pnpm run typecheck` and `pnpm test` before committing. Both must pass.
    
    ## Tech Stack
    (Only list the non-obvious choices. Claude can read package.json.)
    - We use tRPC, not REST, for internal APIs.
    - Styling is Tailwind with a custom token file at `src/styles/tokens.ts`.
    - Database migrations via Drizzle, not Prisma (migrated in Q1 2026).
    
    ## Directory Layout
    - `src/api/` — tRPC routers, grouped by domain
    - `src/components/` — React components, one directory per component
    - `src/lib/` — shared utilities, no React imports allowed here
    - `src/server/` — server-only code, never imported from client
    - `tests/` — integration tests (unit tests live next to source)
    
    ## Coding Conventions
    - TypeScript strict mode. No `any` without a comment explaining why.
    - Functional components only. No class components.
    - Imports ordered: external, internal absolute, relative.
    - 2-space indentation. Prettier config in `.prettierrc`.
    
    ## Conventions That Aren't Obvious
    - Every API endpoint validates input with Zod. No exceptions.
    - Database queries go through the repository layer in `src/server/repos/`. 
      Never import Drizzle directly from route handlers.
    - Errors surfaced to the UI use the `AppError` class from `src/lib/errors.ts`.
      This preserves error codes for the frontend to branch on.
    
    ## Common Corrections
    - Don't add new top-level dependencies without discussing first.
    - Don't create new files in `src/lib/` without checking if a similar 
      utility already exists.
    - Don't write tests that hit the real database. Use the test fixtures 
      in `tests/fixtures/`.
    
    ## Further Reading
    - API design rules: @.claude/rules/api.md
    - Testing conventions: @.claude/rules/testing.md
    - Security: @.claude/rules/security.md

    That’s roughly 70 lines. Notice what it doesn’t include: no multi-step procedures, no duplicated information from package.json, no universal-best-practice lectures. Every line is either a command you’d otherwise re-type, a convention a new teammate would need briefed, or a pointer to a more specific document.


    When CLAUDE.md still isn’t being followed

    This happens to everyone eventually. Three debugging steps, in order:

    1. Run /memory and confirm your file is actually loaded. If CLAUDE.md isn’t in the list, Claude isn’t reading it. Check the path — project CLAUDE.md can live at ./CLAUDE.md or ./.claude/CLAUDE.md, not both, not a subdirectory (unless Claude happens to be reading files in that subdirectory).

    2. Make the instruction more specific. “Write clean code” is not an instruction Claude can verify. “Use 2-space indentation” is. “Handle errors properly” is not an instruction. “All errors surfaced to the UI must use the AppError class from src/lib/errors.ts” is.

    3. Look for conflicting instructions. A project CLAUDE.md saying “prefer async/await” and a .claude/rules/performance.md saying “use raw promises for hot paths” will cause Claude to pick one arbitrarily. In monorepos this is especially common — an ancestor CLAUDE.md from a different team can contradict yours. Use claudeMdExcludes to skip irrelevant ancestors.

    If you need guarantees rather than guidance — “Claude cannot, under any circumstances, delete this directory” — that’s a settings-level permissions concern, not a CLAUDE.md concern. Write the rule in settings.json under permissions.deny and the client enforces it regardless of what Claude decides.


    FAQ

    What is CLAUDE.md? A markdown file Claude Code reads at the start of every session to get persistent instructions for a project. It lives in a project’s source tree (usually at ./CLAUDE.md or ./.claude/CLAUDE.md), gets loaded into the context window as a user message after the system prompt, and contains coding standards, build commands, architectural decisions, and other team-level context. Anthropic is explicit that it’s guidance, not enforcement. (Source.)

    How long should a CLAUDE.md be? Under 200 lines. Anthropic’s own guidance is that longer files consume more context and reduce adherence. If you’re over that, split with @imports or move topic-specific rules into .claude/rules/.

    Where should CLAUDE.md live? Project-level: ./CLAUDE.md or ./.claude/CLAUDE.md, checked into source control. Personal-global: ~/.claude/CLAUDE.md. Personal-project (gitignored): ./CLAUDE.local.md. Organization-wide (enterprise): /Library/Application Support/ClaudeCode/CLAUDE.md (macOS), /etc/claude-code/CLAUDE.md (Linux/WSL), or C:\Program Files\ClaudeCode\CLAUDE.md (Windows).

    What’s the difference between CLAUDE.md and auto memory? CLAUDE.md is instructions you write for Claude. Auto memory is notes Claude writes for itself across sessions, stored at ~/.claude/projects/<project>/memory/. Both load at session start. CLAUDE.md is for team standards; auto memory is for build commands and preferences Claude picks up from your corrections. Auto memory requires Claude Code v2.1.59 or later.

    Can Claude ignore my CLAUDE.md? Yes. CLAUDE.md is loaded as a user message and Claude “reads it and tries to follow it, but there’s no guarantee of strict compliance.” For hard enforcement (blocking file access, sandbox isolation, etc.) use settings, not CLAUDE.md.

    Does AGENTS.md work for Claude Code? Claude Code reads CLAUDE.md, not AGENTS.md. If your repo already uses AGENTS.md for other coding agents, create a CLAUDE.md that imports it with @AGENTS.md at the top, then append Claude-specific instructions below.

    What’s .claude/rules/ and when should I use it? A directory of smaller, topic-scoped markdown files that can optionally be scoped to specific file paths via YAML frontmatter. Use it when your CLAUDE.md is getting long or when instructions only matter in part of the codebase. Rules without a paths: field load at session start with the same priority as .claude/CLAUDE.md; rules with a paths: field only load when Claude works with matching files.

    How do I generate a starter CLAUDE.md? Run /init inside Claude Code. It analyzes your codebase and produces a starting file with build commands, test instructions, and conventions it discovers. Refine from there with instructions Claude wouldn’t discover on its own.


    A closing note

    The biggest mistake I see people make with CLAUDE.md isn’t writing it wrong — it’s writing it once and forgetting it exists. Six months later it references libraries they’ve since replaced, conventions that have since shifted, and a team structure that has since reorganized. Claude dutifully tries to follow instructions that no longer apply, and the team wonders why the tool seems to have gotten worse.

    CLAUDE.md is a living document or it’s a liability. Treat it the way you’d treat a critical piece of onboarding documentation, because functionally that’s exactly what it is — onboarding for the teammate who shows up every session and starts from zero.

    Write it for that teammate. Keep it short. Update it when reality shifts. And remember the part nobody likes to admit: it’s guidance, not enforcement. For anything that has to be guaranteed, reach for settings instead.


    Sources and further reading

    Community patterns referenced in this piece were reported at AI Engineer Europe 2026 and captured in a session recap. They represent emerging practice, not Anthropic doctrine.

  • GCP Content Pipeline Setup for AI-Native WordPress Publishers

    GCP Content Pipeline Setup for AI-Native WordPress Publishers

    What Is a GCP Content Pipeline?
    A GCP Content Pipeline is a Google Cloud-hosted infrastructure stack that connects Claude AI to your WordPress sites — bypassing rate limits, WAF blocks, and IP restrictions — and automates content publishing, image generation, and knowledge storage at scale. It’s the back-end that lets a one-person operation run like a 10-person content team.

    Most content agencies are running Claude in a browser tab and copy-pasting into WordPress. That works until you’re managing 5 sites, 20 posts a week, and a client who needs 200 articles in 30 days.

    We run 122+ Cloud Run services across a single GCP project. WordPress REST API calls route through a proxy that handles authentication, IP allowlisting, and retry logic automatically. Imagen 4 generates featured images with IPTC metadata injected before upload. A BigQuery knowledge ledger stores 925 embedded content chunks for persistent AI memory across sessions.

    We’ve now productized this infrastructure so you can skip the 18 months it took us to build it.

    Who This Is For

    Content agencies, SEO publishers, and AI-native operators running multiple WordPress sites who need content velocity that exceeds what a human-in-the-loop browser session can deliver. If you’re publishing fewer than 20 posts a week across fewer than 3 sites, you probably don’t need this yet. If you’re above that threshold and still doing it manually — you’re leaving serious capacity on the table.

    What We Build

    • WP Proxy (Cloud Run) — Single authenticated gateway to all your WordPress sites. Handles Basic auth, app passwords, WAF bypass, and retry logic. One endpoint to rule all sites.
    • Claude AI Publisher — Cloud Run service that accepts article briefs, calls Claude API, optimizes for SEO/AEO/GEO, and publishes directly to WordPress REST API. Fully automated brief-to-publish.
    • Imagen 4 Proxy — GCP Vertex AI image generation endpoint. Accepts prompts, returns WebP images with IPTC/XMP metadata injected, uploads to WordPress media library. Four-tier quality routing: Fast → Standard → Ultra → Flagship.
    • BigQuery Knowledge Ledger — Persistent AI memory layer. Content chunks embedded via Vertex AI text-embedding-005, stored in BigQuery, queryable across sessions. Ends the “start from scratch” problem every time a new Claude session opens.
    • Batch API Router — Routes non-time-sensitive jobs (taxonomy, schema, meta cleanup) to Anthropic Batch API at 50% cost. Routes real-time jobs to standard API. Automatic tier selection.

    What You Get vs. DIY vs. n8n/Zapier

    Tygart Media GCP Build DIY from scratch No-code automation (n8n/Zapier)
    WordPress WAF bypass built in You figure it out
    Imagen 4 image generation
    BigQuery persistent AI memory
    Anthropic Batch API cost routing
    Claude model tier routing
    Proven at 20+ posts/day Unknown

    What We Deliver

    Item Included
    WP Proxy Cloud Run service deployed to your GCP project
    Claude AI Publisher Cloud Run service
    Imagen 4 proxy with IPTC injection
    BigQuery knowledge ledger (schema + initial seed)
    Batch API routing logic
    Model tier routing configuration (Haiku/Sonnet/Opus)
    Site credential registry for all your WordPress sites
    Technical walkthrough + handoff documentation
    30-day async support

    Prerequisites

    You need: a Google Cloud account (we can help set one up), at least one WordPress site with REST API enabled, and an Anthropic API key. Vertex AI access (for Imagen 4) requires a brief GCP onboarding — we walk you through it.

    Ready to Stop Copy-Pasting Into WordPress?

    Tell us how many sites you’re managing, your current publishing volume, and where the friction is. We’ll tell you exactly which services to build first.

    will@tygartmedia.com

    Email only. No sales call required. No commitment to reply.

    Frequently Asked Questions

    Do I need to know how to use Google Cloud?

    No. We build and deploy everything. You’ll need a GCP account and billing enabled — we handle the rest and document every service so you can maintain it independently.

    How is this different from using Claude directly in a browser?

    Browser sessions have no memory, no automation, no direct WordPress integration, and no cost optimization. This infrastructure runs asynchronously, publishes directly to WordPress via REST API, stores content history in BigQuery, and routes jobs to the cheapest model tier that can handle the task.

    Which WordPress hosting providers does the proxy support?

    We’ve tested and configured routing for WP Engine, Flywheel, SiteGround, Cloudflare-protected sites, Apache/ModSecurity servers, and GCP Compute Engine. Most hosting environments work out of the box — a handful need custom WAF bypass headers, which we configure per-site.

    What does the BigQuery knowledge ledger actually do?

    It stores content chunks (articles, SOPs, client notes, research) as vector embeddings. When you start a new AI session, you query the ledger instead of re-pasting context. Your AI assistant starts with history, not a blank slate.

    What’s the ongoing GCP cost?

    Highly variable by volume. For a 10-site agency publishing 50 posts/week with image generation, expect $50–$200/month in GCP costs. Cloud Run scales to zero when idle, so you’re not paying for downtime.

    Can this be expanded after initial setup?

    Yes — the architecture is modular. Each Cloud Run service is independent. We can add newsroom services, variant engines, social publishing pipelines, or site-specific publishers on top of the core stack.

    Last updated: April 2026

  • GCP Content Pipeline Setup for AI-Native WordPress Publishers

    GCP Content Pipeline Setup for AI-Native WordPress Publishers

    What Is a GCP Content Pipeline?
    A GCP Content Pipeline is a Google Cloud-hosted infrastructure stack that connects Claude AI to your WordPress sites — bypassing rate limits, WAF blocks, and IP restrictions — and automates content publishing, image generation, and knowledge storage at scale. It’s the back-end that lets a one-person operation run like a 10-person content team.

    Most content agencies are running Claude in a browser tab and copy-pasting into WordPress. That works until you’re managing 5 sites, 20 posts a week, and a client who needs 200 articles in 30 days.

    We run 122+ Cloud Run services across a single GCP project. WordPress REST API calls route through a proxy that handles authentication, IP allowlisting, and retry logic automatically. Imagen 4 generates featured images with IPTC metadata injected before upload. A BigQuery knowledge ledger stores 925 embedded content chunks for persistent AI memory across sessions.

    We’ve now productized this infrastructure so you can skip the 18 months it took us to build it.

    Who This Is For

    Content agencies, SEO publishers, and AI-native operators running multiple WordPress sites who need content velocity that exceeds what a human-in-the-loop browser session can deliver. If you’re publishing fewer than 20 posts a week across fewer than 3 sites, you probably don’t need this yet. If you’re above that threshold and still doing it manually — you’re leaving serious capacity on the table.

    What We Build

    • WP Proxy (Cloud Run) — Single authenticated gateway to all your WordPress sites. Handles Basic auth, app passwords, WAF bypass, and retry logic. One endpoint to rule all sites.
    • Claude AI Publisher — Cloud Run service that accepts article briefs, calls Claude API, optimizes for SEO/AEO/GEO, and publishes directly to WordPress REST API. Fully automated brief-to-publish.
    • Imagen 4 Proxy — GCP Vertex AI image generation endpoint. Accepts prompts, returns WebP images with IPTC/XMP metadata injected, uploads to WordPress media library. Four-tier quality routing: Fast → Standard → Ultra → Flagship.
    • BigQuery Knowledge Ledger — Persistent AI memory layer. Content chunks embedded via Vertex AI text-embedding-005, stored in BigQuery, queryable across sessions. Ends the “start from scratch” problem every time a new Claude session opens.
    • Batch API Router — Routes non-time-sensitive jobs (taxonomy, schema, meta cleanup) to Anthropic Batch API at 50% cost. Routes real-time jobs to standard API. Automatic tier selection.

    What You Get vs. DIY vs. n8n/Zapier

    Tygart Media GCP Build DIY from scratch No-code automation (n8n/Zapier)
    WordPress WAF bypass built in You figure it out
    Imagen 4 image generation
    BigQuery persistent AI memory
    Anthropic Batch API cost routing
    Claude model tier routing
    Proven at 20+ posts/day Unknown

    What We Deliver

    Item Included
    WP Proxy Cloud Run service deployed to your GCP project
    Claude AI Publisher Cloud Run service
    Imagen 4 proxy with IPTC injection
    BigQuery knowledge ledger (schema + initial seed)
    Batch API routing logic
    Model tier routing configuration (Haiku/Sonnet/Opus)
    Site credential registry for all your WordPress sites
    Technical walkthrough + handoff documentation
    30-day async support

    Prerequisites

    You need: a Google Cloud account (we can help set one up), at least one WordPress site with REST API enabled, and an Anthropic API key. Vertex AI access (for Imagen 4) requires a brief GCP onboarding — we walk you through it.

    Ready to Stop Copy-Pasting Into WordPress?

    Tell us how many sites you’re managing, your current publishing volume, and where the friction is. We’ll tell you exactly which services to build first.

    will@tygartmedia.com

    Email only. No sales call required. No commitment to reply.

    Frequently Asked Questions

    Do I need to know how to use Google Cloud?

    No. We build and deploy everything. You’ll need a GCP account and billing enabled — we handle the rest and document every service so you can maintain it independently.

    How is this different from using Claude directly in a browser?

    Browser sessions have no memory, no automation, no direct WordPress integration, and no cost optimization. This infrastructure runs asynchronously, publishes directly to WordPress via REST API, stores content history in BigQuery, and routes jobs to the cheapest model tier that can handle the task.

    Which WordPress hosting providers does the proxy support?

    We’ve tested and configured routing for WP Engine, Flywheel, SiteGround, Cloudflare-protected sites, Apache/ModSecurity servers, and GCP Compute Engine. Most hosting environments work out of the box — a handful need custom WAF bypass headers, which we configure per-site.

    What does the BigQuery knowledge ledger actually do?

    It stores content chunks (articles, SOPs, client notes, research) as vector embeddings. When you start a new AI session, you query the ledger instead of re-pasting context. Your AI assistant starts with history, not a blank slate.

    What’s the ongoing GCP cost?

    Highly variable by volume. For a 10-site agency publishing 50 posts/week with image generation, expect $50–$200/month in GCP costs. Cloud Run scales to zero when idle, so you’re not paying for downtime.

    Can this be expanded after initial setup?

    Yes — the architecture is modular. Each Cloud Run service is independent. We can add newsroom services, variant engines, social publishing pipelines, or site-specific publishers on top of the core stack.

    Last updated: April 2026