Tag: AI Tools

  • Foreman and Crew: Why My Best Claude Work Actually Runs on Gemini

    The Economics of Cognitive Budget

    Every automated system has a cognitive budget. When you are building an AI agency or managing a large-scale content pipeline, that budget is measured in two ways: the literal dollar cost of API credits and the “judgment tokens” spent on complex reasoning. Claude, specifically the 3.x and 4.x Sonnet and Opus series, currently holds the crown for high-judgment work. It understands nuance, follows complex instructions, and writes with a cadence that feels human. But it is also a resource you have to husband carefully.

    The most expensive mistake an operator can make is burning Claude’s judgment tokens on labor that requires zero creativity. If a task involves a fixed vocabulary, a strict JSON schema, and a predictable input-output loop, you don’t need a poet; you need a foreman to watch a crew of laborers. In my current architecture, Claude is the Foreman—the one who decides the strategy and handles the edge cases—while Gemini serves as the Crew. This isn’t just about saving a few dollars on a Tuesday; it’s about architectural resilience and maximizing the throughput of your most capable models.

    Yesterday, I detailed the orchestration pattern that allows these two models to talk to each other. Today, I want to look at the raw numbers and the operational rationale behind why my best Claude work actually runs on Gemini. When you stop treating LLMs as a single-vendor solution and start treating them as tiered compute, the math of your business changes overnight.

    The Tygart Media Benchmark: 1,000 Posts and 931 Tags

    To understand the “Foreman and Crew” model, we have to look at a concrete production environment. We recently moved over 1,000 legacy posts for Tygart Media through a full metadata audit. This wasn’t a “write a summary” task. This was a “categorize these posts using only these 931 specific tags” task. This is what we call a bounded subtask. The model cannot invent new tags. It cannot be “creative.” It must map unstructured text to a strictly defined vocabulary.

    Running this through Claude Opus or even Sonnet 3.5 is technically superior in terms of accuracy, but the cost-to-benefit ratio is skewed. Gemini, particularly when accessed through a Google One AI Premium subscription, allows for a “marginal zero” cost structure for high-volume, bounded tasks. We processed 50 batches, involving approximately 300,000 input tokens and 25,000 output tokens. Here is how that breaks down against the current market rates for Claude models:

    Model Tier (input/output price per 1M tokens)   Input (300K)   Output (25K)   Total Cost   Estimated Annual (20 Clients, Monthly)
    Claude Sonnet 3.5 ($3/$15)                      $0.90          $0.38          $1.28        $307.20
    Claude Opus ($15/$75)                           $4.50          $1.88          $6.38        $1,531.20
    Gemini (Google One AI Premium subscription)     $0.00*         $0.00*         $0.00        $0.00

    *Cost is covered by the existing $19.99/mo subscription already used for storage and workspace tools.

    A $6 saving in a single day is a rounding error. But scale that across 20 client sites on a monthly cadence, and you are looking at $1,500 a year in reclaimed margin. More importantly, you are preserving Claude’s rate limits for the tasks Gemini cannot do—like the actual synthesis of the articles or the high-level strategy decisions that Claude 3.5 handles with far more grace.
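
    The arithmetic behind the table can be reproduced with a short sketch. The function names are illustrative, the prices are the per-1M-token rates quoted in the table, and the 12 runs per year reflect the monthly cadence. Working in whole cents and rounding each line item mirrors how the table's figures appear to have been built.

```javascript
// Prices are USD per 1M tokens, as quoted in the table.
const PRICES = {
  sonnet: { input: 3, output: 15 },
  opus: { input: 15, output: 75 },
};

// Returns a cost in whole cents, rounding each line item to the nearest cent.
function toCents(tokens, pricePerMillion) {
  // tokens * price is in millionths of a dollar; 10,000 of those make one cent.
  return Math.round((tokens * pricePerMillion) / 10000);
}

// Cost of one run (input plus output) in cents.
function runCostCents(model, inputTokens, outputTokens) {
  const price = PRICES[model];
  return toCents(inputTokens, price.input) + toCents(outputTokens, price.output);
}

// Cost of repeating that run for every client, several times a year.
function annualCostCents(model, inputTokens, outputTokens, clients, runsPerYear) {
  return runCostCents(model, inputTokens, outputTokens) * clients * runsPerYear;
}

console.log(runCostCents("sonnet", 300000, 25000)); // 128 cents, i.e. $1.28
console.log(annualCostCents("opus", 300000, 25000, 20, 12)); // 153120 cents, i.e. $1,531.20
```

    Swapping in your own token counts and client count is usually enough to tell whether the routing effort is worth it for your pipeline.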

    Defining the Bounded Subtask

    The success of this model hinges on knowing where the Foreman ends and the Crew begins. You cannot simply ask Gemini to “write like Claude.” It won’t. Gemini’s prose style often leans toward the repetitive or the overly structured. However, Gemini excels at what I call Bounded Subtasks. These are tasks where the “walls” of the output are clearly defined.

    A bounded subtask has three characteristics:

    • Fixed Vocabulary: The model must choose from a provided list (like our 931-tag library) rather than generating new ideas.
    • Structural Rigidity: The output must be valid JSON or a specific markdown format. Gemini is exceptionally good at following “System Instructions” that demand valid code blocks.
    • Low Context Sensitivity: The task doesn’t require “remembering” what happened three articles ago. It only needs the text in front of it and the rules provided.

    Routing these specific “labor” tasks to Gemini keeps the hallucination risk low, because any output outside the vocabulary is trivial to detect and reject. When you give Gemini 931 tags and tell it “only use these,” its adherence to those boundaries is remarkably stable. In our Tygart Media run of 1,000 posts, we saw zero instances of the model inventing a tag that wasn’t in the provided schema. That is the “Crew” doing exactly what they were told, while the “Foreman” (Claude) is free to handle the complex orchestration logic in the background.
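
    A bounded subtask is only as trustworthy as the check on its output. The following is a minimal sketch of that check, not the production code: the response shape ({ "tags": [...] }) and the function name are assumptions, and the short list in the usage example stands in for the 931-tag library.

```javascript
// Checks a model response against a fixed tag vocabulary.
// Assumes the model was asked to reply with JSON shaped like { "tags": [...] }.
function validateTagResponse(rawResponse, allowedTags) {
  const allowed = new Set(allowedTags);
  let parsed;
  try {
    parsed = JSON.parse(rawResponse);
  } catch (err) {
    return { ok: false, reason: "invalid JSON", accepted: [], rejected: [] };
  }
  if (!parsed || !Array.isArray(parsed.tags)) {
    return { ok: false, reason: "missing tags array", accepted: [], rejected: [] };
  }
  // Split the proposed tags into in-vocabulary and out-of-vocabulary sets.
  const accepted = parsed.tags.filter((tag) => allowed.has(tag));
  const rejected = parsed.tags.filter((tag) => !allowed.has(tag));
  return {
    ok: rejected.length === 0,
    reason: rejected.length === 0 ? null : "out-of-vocabulary tags",
    accepted,
    rejected,
  };
}

// Usage with a placeholder vocabulary.
const vocabulary = ["ai-tools", "seo", "wordpress"];
const result = validateTagResponse('{"tags":["seo","made-up-tag"]}', vocabulary);
console.log(result.ok, result.rejected); // false [ 'made-up-tag' ]
```

    Any batch that comes back with ok set to false can be retried or escalated to the Foreman instead of being written to the site.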

    The Marginal Zero: Subscription Arbitrage

    There is a psychological shift that happens when you move from “consumption-based billing” (API) to “subscription-based billing” (Google One). When you are paying by the token, every experiment feels like a withdrawal from a bank account. You hesitate to run a second pass. You skip the extra validation step to save $0.15.

    When you use Gemini through the Google One AI Premium subscription (routed through a local bridge or automated CLI), the marginal cost of the next 100,000 tokens is zero. This changes the way you build. You can afford to be “wasteful” with tokens to ensure quality. You can run three different prompts on the same text and have the Foreman (Claude) pick the best one. This “Subscription Arbitrage” is the secret weapon of the independent operator. You are already paying for the Google storage and the workspace; why not use the compute that comes bundled with it to handle your data processing?

    This doesn’t mean Gemini is “better” than Claude. It means Gemini is “cheaper labor” for the specific tasks where its performance is “good enough.” In engineering, “good enough” at zero marginal cost is almost always superior to “perfect” at a premium.

    Architectural Resilience and Multi-Vendor Strategy

    Beyond the cost, there is the matter of resilience. If your entire agency or software stack is built on a single LLM provider, you are not a business; you are a feature of that provider. Rate limits, outages, or sudden changes in model weights can break your pipeline in an afternoon.

    By splitting the workload between Claude (Foreman) and Gemini (Crew), you build a multi-vendor layer into your architecture by default. If Anthropic has a service disruption, the Crew can still process the tagging and the data—perhaps with a slightly more manual oversight—while you wait for the Foreman to come back online. If Google throttles your subscription, you can temporarily route the Crew’s work to Claude Sonnet.

    This decoupling is essential for systems thinkers. It allows you to swap out components without re-writing the entire logic of your application. Your “Foreman” logic stays the same; you just change which “Crew” you are sending the batches to. This is the difference between building a fragile script and building a durable system.
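
    One way to picture the decoupling is a small routing function that sits between the Foreman and whichever Crew is healthy. This is a sketch under stated assumptions: the crew names, the status map, and the batch shape are all hypothetical, and a real version would get status from rate-limit errors or health checks.

```javascript
// Picks the first available crew in preference order.
// Crew names and the `status` map are illustrative, not a real API.
function selectCrew(preferenceOrder, status) {
  for (const name of preferenceOrder) {
    if (status[name] === "available") {
      return name;
    }
  }
  return null; // nothing available: hold the batch for manual handling
}

// Foreman-side dispatch: the batch format never changes, only the target.
function dispatchBatch(batch, preferenceOrder, status) {
  const crew = selectCrew(preferenceOrder, status);
  if (crew === null) {
    return { queued: true, crew: null, batchId: batch.id };
  }
  return { queued: false, crew, batchId: batch.id };
}

// Usage: Gemini is throttled, so the batch falls through to Claude Sonnet.
const order = ["gemini", "claude-sonnet"];
const status = { gemini: "throttled", "claude-sonnet": "available" };
console.log(dispatchBatch({ id: "batch-07" }, order, status).crew); // claude-sonnet
```

    Because the Foreman only ever sees the dispatch result, swapping or reordering crews is a one-line change to the preference list.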

    What You Should Do Tomorrow

    If you are currently running a pipeline that relies solely on Claude, I am not suggesting you switch. I am suggesting you audit. Look at your logs and identify the tasks that don’t require Claude’s soul. Look for the tagging, the JSON formatting, the data extraction, and the basic categorization.

    Tomorrow, try this protocol:

    • Isolate one bounded task: Pick something with a fixed input and a predictable output.
    • Set up a Gemini bridge: Use the API or a subscription-linked CLI to route that specific task.
    • Keep Claude as the orchestrator: Let Claude handle the “why” and the “how,” but let Gemini handle the “what.”
    • Measure the token savings: Don’t just look at the dollars. Look at how many Claude rate-limit tokens you’ve reclaimed for higher-value work.

    The goal isn’t to use less AI; it’s to use the right AI for the right job. My best work runs on Gemini because it allows Claude to be the best version of itself. Stop hiring master carpenters to move boxes. Hire the crew, keep the foreman, and scale the system.

  • Tracking the Chaos: Why We Built an Interactive AI Release Timeline

    The Failure of the Spreadsheet

    For the first two years of the “model wars,” a shared Google Sheet was enough. We tracked parameters, context window sizes, and pricing updates for GPT-4, Claude 2, and the early Gemini iterations. It was a manual process, but it worked. One of our engineers would spend thirty minutes on a Friday morning updating rows, and the team would have a stable reference for the week’s client strategy sessions.

    Then came April 2026. In the span of four weeks, the spreadsheet didn’t just become outdated; it became a liability. When Anthropic dropped Claude Opus 4.7 on April 16, followed immediately by OpenAI’s GPT-5.5 release, and then the surprise “Claude Mythos Preview” teaser, the logic of our rows and columns collapsed. By the time Google announced Gemini 3.5 Flash on May 19 at I/O, we realized we were spending more time formatting cells than analyzing the actual implications of the models.

    The pace of AI releases has moved beyond manual curation. We didn’t need a prettier document; we needed a functional piece of infrastructure. This is why we stopped updating the sheet and started building a custom, interactive AI release timeline directly into the Tygart Media site using Antigravity and React.

    The April/May 2026 Compression

    To understand why a static tracker fails, you have to look at the density of releases in the second quarter of 2026. We are no longer in a “once every six months” cycle. We are in a “twice a week” cycle. The technical debt of staying current is mounting for every digital agency and AI operator.

    • April 16, 2026: Anthropic releases Claude Opus 4.7. This wasn’t just a performance bump; it introduced a native “Artifacts 2.0” layer that changed how we architected frontend deployments.
    • April 2026 (Late): OpenAI responds with GPT-5.5. The reasoning capabilities jumped, but the latency made it unusable for real-time agentic workflows.
    • May 5, 2026: OpenAI follows up with GPT-5.5 Instant. This corrected the latency issues of the previous month, effectively deprecating the “standard” 5.5 for most of our production use cases within 15 days.
    • May 19, 2026: Google releases Gemini 3.5 Flash. This model optimized the “long context” utility that we rely on for codebase analysis, offering a 2M token window at a fraction of the previous cost.

    When tracking AI models is a core part of your operations, you can’t rely on a tool that requires a human to “decide” where a release fits. You need a system that visualizes the overlap, the deprecation cycles, and the specific utility of each branch.

    Why a Custom Tool?

    We looked at off-the-shelf timeline plugins and SaaS “roadmap” tools. Most of them are built for marketing: they prioritize “clean” visuals over data density. For an AI strategy firm, “clean” is often the enemy of “useful.” We needed to see the Tygart Media AI timeline as a heat map of capability jumps, not just a list of dates.

    We chose to build a custom tool for three reasons:

    1. Component Integration: We wanted the timeline to pull directly from our internal Antigravity component library, ensuring that the UI matched our existing dashboard architecture.
    2. Programmatic Ingestion: We needed a way to feed the timeline via CLI tools rather than a CMS backend.
    3. State Management: In the heat of May 2026, we needed to filter by “multimodal,” “latency-optimized,” and “reasoning-heavy” models. Most third-party tools don’t support that level of granular state.
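
    The filter state described in the third point can be kept as a plain list of active capability tags. The sketch below assumes each release record carries a capabilities array; that field name and both function names are illustrative rather than taken from the actual component.

```javascript
// Returns the releases that carry every active capability filter.
// An empty filter list means "show everything".
function filterReleases(releases, activeFilters) {
  if (activeFilters.length === 0) return releases;
  return releases.filter((release) =>
    activeFilters.every((filter) => release.capabilities.includes(filter))
  );
}

// Toggles one filter on or off without mutating the previous state,
// which keeps it safe to use inside a React state setter.
function toggleFilter(activeFilters, filter) {
  return activeFilters.includes(filter)
    ? activeFilters.filter((f) => f !== filter)
    : [...activeFilters, filter];
}

// Usage with placeholder records.
const sample = [
  { name: "model-a", capabilities: ["multimodal", "latency-optimized"] },
  { name: "model-b", capabilities: ["reasoning-heavy"] },
];
const active = toggleFilter([], "multimodal");
console.log(filterReleases(sample, active).map((r) => r.name)); // [ 'model-a' ]
```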

    The Stack: React, Framer Motion, and Antigravity

    The technical core of the timeline is a React application wrapped in Framer Motion for the layout transitions. We chose Framer Motion not for flashy animations, but for its layout projection capabilities. When a user filters the timeline from “All Models” to just “Claude 4.7 release” and its related iterations, the remaining nodes need to reorganize themselves without losing the user’s temporal context.

    The design system is powered by Antigravity, our internal framework for building high-density utility tools. Antigravity allows us to define “tokens” for different model families (Anthropic, OpenAI, Google, Meta). This ensures that as the AI release timeline grows, the visual language remains consistent. A “Preview” release like Claude Mythos has a specific dashed-border treatment defined in the system, while a “Stable” release like Gemini 3.5 Flash uses a solid high-contrast fill.

    
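    // A simplified look at the release node structure
    import { motion } from "framer-motion";
    // Tag and getBrandColor are assumed to come from the internal Antigravity
    // component library; the import path below is a placeholder.
    import { Tag, getBrandColor } from "./antigravity";

    // Renders one release on the timeline. `type` (for example "preview" or
    // "stable") selects the border and fill treatment via the node-* class.
    // The `layout` prop lets Framer Motion animate the node to its new
    // position when a filter reorders the timeline.
    const ReleaseNode = ({ model, date, type }) => {
      return (
        <motion.div
          layout
          className={`node-${type}`}
          initial={{ opacity: 0 }}
          animate={{ opacity: 1 }}
        >
          <Tag color={getBrandColor(model.brand)}>{model.name}</Tag>
          <h4>{model.version}</h4>
          <time dateTime={date}>{date}</time>
          <p>{model.summary}</p>
        </motion.div>
      );
    };

    export default ReleaseNode;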
    

    Data Ingestion: From Scraping to Structured JSON

    One of the biggest failures of our initial spreadsheet was the “copy-paste” error rate. Reading a 4,000-word release note from Google I/O and trying to summarize it into a cell is a recipe for hallucination or omission. To solve this, we moved to an automated ingestion pipeline using Claude Code and the Gemini CLI.

    When a new model drops, we pipe the official announcement text through a Gemini CLI script. The script is prompted to identify specific keys: Release Date, Model Name, Context Window, Pricing per 1M tokens, and “Primary Capability Change.” The output is a structured JSON object that we commit directly to the repository. The React frontend then consumes this JSON to render the timeline.
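
    Before a generated record is committed, it helps to check that every expected key is present and sane. The following is a minimal sketch: the camelCase field names are assumptions mapped from the keys listed above, and the values in the usage example are placeholders, not real specs.

```javascript
// One validator per required field. Field names are assumed, not the real schema.
const REQUIRED_FIELDS = {
  releaseDate: (v) => typeof v === "string" && !Number.isNaN(Date.parse(v)),
  modelName: (v) => typeof v === "string" && v.length > 0,
  contextWindow: (v) => Number.isInteger(v) && v > 0,
  pricePerMillionTokens: (v) =>
    v !== null &&
    typeof v === "object" &&
    typeof v.input === "number" &&
    typeof v.output === "number",
  primaryCapabilityChange: (v) => typeof v === "string" && v.length > 0,
};

// Returns a list of problems; an empty list means the record can go to review.
function validateRelease(record) {
  const problems = [];
  for (const [field, isValid] of Object.entries(REQUIRED_FIELDS)) {
    if (!(field in record)) {
      problems.push(`missing: ${field}`);
    } else if (!isValid(record[field])) {
      problems.push(`invalid: ${field}`);
    }
  }
  return problems;
}

// Usage with placeholder values: the context window was extracted as text.
const draft = {
  releaseDate: "2026-01-01",
  modelName: "example-model",
  contextWindow: "large",
  pricePerMillionTokens: { input: 1, output: 2 },
};
console.log(validateRelease(draft));
// [ 'invalid: contextWindow', 'missing: primaryCapabilityChange' ]
```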

    This “Operator Mindset” approach means that the person “updating” the timeline isn’t writing marketing copy. They are validating data that has been extracted directly from the source. It removes the “hype” and leaves us with the specs.

    Technical Challenges: Performance and Overlap

    Building an interactive timeline sounds straightforward until you hit a “Hot Week.” The week of May 4, 2026, was a nightmare for our layout engine. We had GPT-5.5 Instant, a mid-cycle update from Mistral, and the first leaks of the Mythos preview all hitting within 72 hours.

    In a standard vertical timeline, these nodes stack on top of each other, creating a “scroll-hole.” We had to implement a collision detection algorithm in the React component. If two releases occur within the same 48-hour window, the timeline branches horizontally. This allows the user to see the “clash” of models visually. It reflects the reality of the market: these models are competing for the same headspace at the same time.
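
    The branching rule can be approximated with a greedy lane assignment. This is a sketch of the general idea rather than the algorithm inside the actual component: it assumes each release has an ISO date string, and it places a release in the first lane whose latest entry is at least 48 hours older.

```javascript
const WINDOW_MS = 48 * 60 * 60 * 1000; // the 48-hour collision window

// Assigns each release a horizontal lane so that releases landing within
// 48 hours of each other never share a lane.
function assignLanes(releases) {
  const sorted = [...releases].sort(
    (a, b) => Date.parse(a.date) - Date.parse(b.date)
  );
  const laneLastSeen = []; // timestamp of the latest release in each lane
  return sorted.map((release) => {
    const ts = Date.parse(release.date);
    let lane = laneLastSeen.findIndex((last) => ts - last >= WINDOW_MS);
    if (lane === -1) {
      lane = laneLastSeen.length; // every lane is busy: branch sideways
    }
    laneLastSeen[lane] = ts;
    return { ...release, lane };
  });
}

// Usage: three releases inside one "Hot Week", then a quiet one.
const placed = assignLanes([
  { name: "a", date: "2026-05-04T09:00:00Z" },
  { name: "b", date: "2026-05-05T09:00:00Z" },
  { name: "c", date: "2026-05-06T08:00:00Z" },
  { name: "d", date: "2026-05-12T09:00:00Z" },
]);
console.log(placed.map((r) => r.lane)); // [ 0, 1, 2, 0 ]
```

    The renderer can then map lane to a horizontal offset, so a crowded week fans out sideways instead of stacking into a long scroll.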

    We also struggled with SVG performance. We initially tried to draw connecting lines between “parent” and “child” models (e.g., GPT-5.5 to GPT-5.5 Instant). As the timeline grew to over 50 nodes, the browser’s paint time started to lag. We eventually moved to a canvas-based background for the connecting lines, keeping the nodes as interactive DOM elements. It’s a bit more complex to maintain, but it keeps the interaction at 60fps.

    Design Decisions: Usefulness Over Aesthetics

    In the Pacific Northwest, we tend to favor restraint. We applied this to the UI. We stripped out the brand logos and replaced them with high-contrast color codes. We removed the “hero images” that usually accompany these releases. If you are an architect looking at our timeline, you don’t need to see a picture of a glowing brain; you need to see the context window and the date.

    One of the most debated features was the “Impact Score.” We originally wanted to rank models on a scale of 1-10. We killed that idea in the second week of development. “Impact” is subjective. Instead, we added a “Primary Use Case” filter. If you’re building a coding agent, the “Impact” of Gemini 3.5 Flash’s 2M context window is much higher than a reasoning-heavy model with a 128k window. Our design allows the user to define what matters to them.

    Failures in Automation

    We aren’t afraid to show where we tripped. Our first attempt at the timeline was 100% automated. We had a cron job that searched for “new model release” and tried to update the JSON automatically. It was a disaster.

    On May 5, the bot picked up a parody post on X (formerly Twitter) about a “GPT-6 Super-Intelligence” and added it to the timeline. It took us six hours to notice and remove it. We learned that while extraction should be automated, verification must remain human. We now use a “Human-in-the-loop” (HITL) system. The Gemini CLI generates the draft JSON, but it requires a git commit by an engineer to actually go live. This balance is what keeps the tool reliable.

    The Result: An Operator’s View

    The interactive timeline has changed how we talk to clients. Instead of saying, “Things are moving fast,” we can show them the exact density of the Claude 4.7 release cycle compared to the previous version. We can show them why we shifted their infrastructure from GPT-5.5 to GPT-5.5 Instant in a matter of days. It provides a visual justification for the agility we build into our systems.

    It’s no longer a “project.” It’s a living part of the Tygart Media stack. It serves as a reminder that in the AI era, your documentation tools must be as scalable and automated as the models themselves.

    What You Should Do Tomorrow

    If you are still tracking AI updates in a spreadsheet or a Notion gallery, you are already behind. You don’t necessarily need to build a custom React app, but you do need to change your process.

    • Step 1: Stop writing manual summaries. Use a CLI tool (Gemini or Claude) to extract the technical specifications from release notes. Create a structured format (JSON or CSV) that remains consistent.
    • Step 2: Define your “Production Stack.” Don’t track every model; track the ones that actually affect your operations. If you aren’t using Llama 3 on-prem, don’t let it clutter your primary view.
    • Step 3: Visualize the overlap. Whether you use a simple Mermaid.js chart in your internal wiki or a custom tool, you need to see when models are released in parallel. It helps you understand which “generation” of technology you are currently building on.

    The chaos isn’t going away. The only variable is how much of it you choose to automate.

  • Agentic AI Orchestration: The Three-Layer Stack (Antigravity vs. Claude Code)

    The Shift from Solitary Agents to Orchestrated Systems

    By May 2026, the novelty of “chatting” with an AI has vanished. For technical operators and systems architects, the conversation has moved from prompt engineering to orchestration. We no longer ask an agent to “write a script”; we deploy stacks that monitor state, reconcile data across disparate platforms, and execute complex workflows without human intervention unless a threshold is breached. In this landscape, two primary paradigms for AI orchestration tools have emerged: the sequential, deterministic approach of Claude Code and the parallel, swarm-based architecture of Antigravity 2.0.

    The “operator’s reality” in 2026 is that building a single agent is a hobby; building a three-layer stack is a business. This stack—composed of Notion as the human-readable “Eyes,” Google Cloud Platform (GCP) as the “Headless Engine,” and tools like Claude Code or Antigravity as the “Hands”—has become the standard for scalable automation. The challenge isn’t getting the AI to do the work; it’s the reconciliation. It’s ensuring that what the agent thinks it did in the terminal matches what the business sees in its records. This is the breakdown of how these tools operate in the field.

    Claude Code: The Sequential Conductor

    Claude Code remains the gold standard for high-precision, terminal-first execution. It operates as a “Senior Engineer” archetype. When you initialize a session in a repository, it doesn’t just guess; it indexes the environment, maps dependencies, and proceeds with a surgical, step-by-step logic that requires human verification for high-impact changes.

    In our tests, Claude Code’s primary strength is its determinism. If you are refactoring a legacy microservice on GCP, you want the “Conductor” approach. You want the agent to read the logs, propose a fix, and wait for your y/n confirmation before it pushes to production. It is a tool of restraint. Its CLI-native interface is designed for the developer who lives in the terminal, using a local context window to ensure that every line of code written is idiomatically consistent with the existing codebase.

    However, the limitation of Claude Code relative to Antigravity becomes apparent in high-volume operations. Claude Code is sequential. It is one agent, one terminal, one task. It is brilliant at fixing a bug; it is slow at managing a fleet of 500 social media accounts or reconciling 10,000 line items across a multi-region inventory system. For that, you need a different architecture.

    Antigravity 2.0: The Parallel Swarm

    Antigravity 2.0, released earlier this year, takes the opposite approach. It is built on “Swarm Intelligence.” Instead of a single conductor, Antigravity deploys a Mission Control UI that manages dozens of “worker” agents simultaneously. These agents don’t wait for your confirmation at every step; they use browser verification to “see” their results in real-time and self-correct based on the visual state of the web or a GUI.

    If Claude Code is the surgeon, Antigravity is the construction crew. In a recent deployment for a logistics client, we used Antigravity to monitor carrier pricing across 15 different portals. A single Claude Code instance would have taken hours to cycle through these sequentially. Antigravity spun up 15 parallel swarms, each with its own browser instance, scraped the data, verified the pricing against the contract terms (using its internal visual verification), and updated the database in under four minutes.

    The Mission Control UI is the differentiator. While Claude Code users are staring at a scrolling terminal, Antigravity users are looking at a dashboard of active swarms. You can see which agents are “thinking,” which are “verifying,” and which have hit a roadblock. It is designed for multi-agent orchestration at scale, where the operator’s role shifts from “approver” to “overseer.”

    The Three-Layer Stack: Eyes, Brain, and Hands

    The most effective systems we’ve built this year don’t rely on a single tool. They use what we call the “Rare Three-Layer Stack.” Most people pick one layer and wonder why their automation is brittle. The real power is in the reconciliation of these three components:

    Layer 1: The Eyes (Notion AI Agents)

    Notion is no longer just a document store; it is the synthesis layer. We use Notion AI agents to serve as the “Eyes” of the operation. These agents monitor our project databases, meeting notes, and strategy docs. They synthesize the human intent. If a project manager changes a status in Notion from “Draft” to “Ready for Deployment,” the Notion agent detects this change and sends a signal to the next layer. It provides the human-readable visibility that a terminal lacks.

    Layer 2: The Headless Engine (GCP)

    The “Brain” or “Engine” lives in GCP. We use Cloud Functions and Firestore to maintain the “Source of Truth.” This is where the business logic resides. When the Notion agent signals a status change, GCP processes the rules: Does this change require a security audit? Does it fit the budget? It maintains the state of the entire system, acting as a headless automation layer that doesn’t care about the UI.

    Layer 3: The Hands (Claude Code / Antigravity)

    Finally, the “Hands” execute the work. If the task is a surgical code update, GCP triggers a Claude Code session via a webhook. If the task is a wide-scale data migration or a browser-based workflow, it triggers an Antigravity swarm. These are the connective hands that read from the engine and write to the external world.

    The Reconciliation Ledger: Solving Agent Drift

    The biggest failure we see in agentic AI implementations is “drift.” Drift occurs when an agent performs an action (the Hands), but the state isn’t updated in the record (the Eyes), or the engine (the Brain) loses track of the execution.

    To solve this, we implemented a “Reconciliation Ledger.” Every action taken by a Claude Code or Antigravity instance must be logged back to a Firestore collection with a unique transaction ID. The Notion agent then periodically “audits” the ledger. If Antigravity reports that it updated 500 records, but the GCP database only shows 498 changes, the Notion agent flags a “reconciliation error” and alerts a human operator.
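
    The audit step boils down to comparing what each run claimed against what the database shows. Here is a minimal sketch of that comparison; the entry shape ({ txId, agent, reported }) and the observedByTx map are assumptions, and in practice both would be read from Firestore rather than passed in as literals.

```javascript
// Compares what each agent run reported against what the database shows.
// `observedByTx` maps a transaction ID to the number of changes actually found.
function auditLedger(ledgerEntries, observedByTx) {
  const errors = [];
  for (const entry of ledgerEntries) {
    const observed = observedByTx[entry.txId] ?? 0; // no record counts as zero
    if (observed !== entry.reported) {
      errors.push({
        txId: entry.txId,
        agent: entry.agent,
        reported: entry.reported,
        observed,
        missing: entry.reported - observed,
      });
    }
  }
  return errors;
}

// Usage: the swarm reports 500 updates, the database shows 498.
const ledger = [
  { txId: "tx-001", agent: "antigravity", reported: 500 },
  { txId: "tx-002", agent: "claude-code", reported: 12 },
];
const flagged = auditLedger(ledger, { "tx-001": 498, "tx-002": 12 });
console.log(flagged.length, flagged[0].missing); // 1 2
```

    Anything returned by the audit is what would be surfaced to a human operator as a reconciliation error.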

    Without this ledger, multi-agent orchestration is a recipe for silent failure. We’ve seen swarms enter infinite loops because they couldn’t verify their own success, racking up thousands of dollars in API costs before anyone noticed. The ledger is the guardrail.

    Operator’s Log: The Failure of the “Blind Swarm”

    Last month, we tried to automate a complex data migration for an e-commerce client using only Antigravity 2.0 swarms, bypassing the GCP engine layer. We thought the agents were smart enough to handle the state locally. We were wrong.

    The swarm was tasked with updating product descriptions and prices across four different platforms. Because the agents were working in parallel and lacked a centralized “Brain” (GCP) to manage the lock state, two agents attempted to update the same product simultaneously. Agent A updated the price to $49.99 based on the original data, while Agent B updated the description. Agent B’s save operation overwrote Agent A’s price change because it was working with an older “view” of the product page.

    The result was a $12,000 discrepancy in sales over a weekend. We learned the hard way: the AI orchestration tools of 2026 are powerful, but they are not a substitute for traditional database integrity. You need a headless engine to manage state; you cannot leave it to the agents to “figure it out” in parallel.
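
    The overwrite described above is a classic lost update, and the usual remedy is a version check at write time. The sketch below uses an in-memory store as a stand-in for the engine layer; the store API is hypothetical, and a database with transactions would provide a comparable guarantee.

```javascript
// In-memory stand-in for the engine layer. Each product carries a version
// number, and a write only succeeds if the writer saw the latest version.
function createStore(products) {
  const rows = new Map(products.map((p) => [p.id, { ...p, version: 1 }]));
  return {
    read(id) {
      return { ...rows.get(id) };
    },
    // `patch` holds only the fields the agent intends to change.
    write(id, expectedVersion, patch) {
      const current = rows.get(id);
      if (current.version !== expectedVersion) {
        return { ok: false, reason: "stale read", current: { ...current } };
      }
      rows.set(id, { ...current, ...patch, version: current.version + 1 });
      return { ok: true };
    },
  };
}

// Usage: two agents read the same product, then write in sequence.
const store = createStore([{ id: "sku-1", price: 59.99, description: "old copy" }]);
const viewA = store.read("sku-1");
const viewB = store.read("sku-1");
store.write("sku-1", viewA.version, { price: 49.99 });
const attempt = store.write("sku-1", viewB.version, { description: "new copy" });
console.log(attempt.ok); // false: Agent B must re-read before saving
const fresh = store.read("sku-1");
store.write("sku-1", fresh.version, { description: "new copy" });
console.log(store.read("sku-1").price); // 49.99: the price change survives
```

    Sending field-level patches instead of whole-page saves matters as much as the version check: Agent B never needs to resubmit a price it did not intend to change.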

    Choosing Your Paradigm: Claude vs. Antigravity

    When choosing between Claude Code and Antigravity, the decision tree is straightforward:

    • Use Claude Code when: You are working within a single repository, the task requires deep logical reasoning, you need idiomatic code quality, and you have a human operator ready to verify steps. It is for “Building.”
    • Use Antigravity 2.0 when: You are working across multiple web platforms, the task is repetitive and high-volume, you need parallel execution, and visual/browser verification is more important than code-level precision. It is for “Operating.”

    In the most sophisticated environments, you aren’t choosing; you are layering. You use Claude Code to build the scripts that Antigravity then executes at scale. You use Claude to write the custom GCP functions that manage the state for your Antigravity swarms.

    What You’d Do Tomorrow: The Practical Path

    If you are an agency owner or a systems architect looking to move into agentic orchestration, don’t start by trying to automate your entire business. Start with the ledger.

    1. Map your “Eyes”: Identify where your human intent lives. Is it Notion? Jira? Slack? Set up a basic webhook to watch for state changes.
    2. Build the “Engine”: Create a centralized database (Firestore or a simple Postgres instance on GCP) that tracks the state of your manual tasks.
    3. Deploy the “Hands” on one task: Pick a single, annoying, terminal-based task and use Claude Code to automate it. Or pick a browser-based task and use Antigravity.
    4. Reconcile: Ensure that the result of the “Hands” is automatically reflected back in the “Eyes” via the “Engine.”

    The future of work in 2026 isn’t about agents replacing people. It’s about operators managing stacks. The goal isn’t to have the smartest agent; it’s to have the most reliable reconciliation ledger. When the “Eyes,” “Brain,” and “Hands” are in sync, the system scales. When they aren’t, you just have a very expensive way to generate errors.

  • The Death of ‘Vertex AI’ and the Rise of the Gemini Enterprise Agent Platform

    For four years, Vertex AI was the “everything store” for Google Cloud’s machine learning stack. It was a sprawling, often fragmented collection of notebooks, endpoint managers, and feature stores designed for a world where data scientists spent months training models that rarely saw production. But at Google Cloud Next 2026, that era ended quietly. Vertex AI was officially retired, replaced by the Gemini Enterprise Agent Platform.

    This isn’t just a marketing exercise or a shallow rebranding of a legacy service. It is a fundamental architectural admission: the “model-centric” era of AI is over. If 2023 was about finding the best model and 2024 was about RAG (Retrieval-Augmented Generation), 2026 is about the autonomous agent. Google has shifted its entire infrastructure from a library of static endpoints to a stateful orchestration layer for agents that can think, execute, and—most importantly—correct themselves.

    The Architecture Shift: Model-Centric vs. Agent-First

    In the old Vertex AI framework, you deployed a model. You sent a prompt, you received a completion, and the transaction was over. Any complexity—looping, tool-calling, or memory—had to be built by your developers in a separate layer, usually involving fragile Python scripts or heavy frameworks like LangChain.

    The Gemini Enterprise Agent Platform flips this. With the rollout of ADK 2.0 (Agent Development Kit), the “model” is now just a component of an “agent.” In this new architecture, the platform handles the state. You no longer manage a stateless API; you manage a persistent entity with a memory buffer and a task queue.

    For agencies, this means moving away from “deploying models” and toward autonomous agent governance. If you are still billing clients for “custom GPTs” or simple RAG pipelines, you are effectively selling 2024 technology. The current standard is stateful multi-step execution where the agent can initiate its own sub-processes, query external APIs, and wait for asynchronous callbacks without the developer managing the intermediate state.

    ADK 2.0 and the Developer Workflow

    The core of this transition is ADK 2.0. Unlike its predecessor, which felt like a wrapper for REST calls, ADK 2.0 is built for local-first development. Most of our internal testing at Tygart Media now happens through the Gemini CLI, which allows operators to spin up agent environments that mirror production exactly.

    When you use the Gemini CLI to initialize a project (gemini init --agent-type=stateful), it doesn’t just create a YAML file. It provisions a “Reasoning Engine” that can handle long-running tasks. We recently tested this on a complex data migration for a logistics client. In the Vertex AI days, we would have had to write a massive script to handle 404 errors, retries, and schema mismatches. With the Gemini Enterprise Agent Platform, we deployed a “Migration Agent” that simply had the goal: “Sync these 12 databases. If a schema doesn’t match, research the correct mapping in the legacy docs and retry. Log all failures to Antigravity for human review.”

    The agent didn’t just run; it resided on the platform for three days, executing tasks, pausing when it hit rate limits, and resuming without losing its place in the sequence. This is the difference between a tool and a worker.

    Agent Studio: Low-Code Orchestration That Actually Works

    Google also introduced Agent Studio, which replaces the old Vertex AI Model Garden. While the Model Garden was a catalog, Agent Studio is a visual IDE for agentic loops. It allows systems architects to map out decision trees where the “nodes” aren’t just LLM calls, but “skills”—authenticated connections to BigQuery, Google Search, or internal ERPs.

    The key feature here is stateful multi-step logic. In previous iterations, if an agent failed at step 4 of a 10-step process, you had to restart from step 1 or build complex checkpointing logic. Agent Studio handles the checkpointing natively. For an operator, this reduces the “failure surface area.” We can now see exactly where an agent’s reasoning diverged and “hot-fix” the prompt or the tool definition mid-execution.
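
    For readers who have only ever restarted from step 1, here is a minimal sketch of what resuming at the failed step looks like when you build it by hand. This is not Agent Studio's API. The `run_with_checkpoints` helper, the JSON checkpoint file, and the step functions are hypothetical names used only for illustration.

```python
import json
import os
import tempfile

def run_with_checkpoints(steps, checkpoint_path):
    """Run steps in order, persisting the index of the next step to run.

    On a rerun, already-completed steps are skipped, so a failure at
    step 4 resumes at step 4 instead of step 1.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_step"]

    executed = []
    for index in range(start, len(steps)):
        steps[index]()                      # may raise; checkpoint stays at `index`
        executed.append(index)
        with open(checkpoint_path, "w") as f:
            json.dump({"next_step": index + 1}, f)
    return executed


# Hypothetical 5-step job whose fourth step fails once, then succeeds.
attempts = {"count": 0}

def flaky_step():
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("transient failure")

steps = [lambda: None, lambda: None, lambda: None, flaky_step, lambda: None]
checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

try:
    run_with_checkpoints(steps, checkpoint)
except RuntimeError:
    pass                                    # first run dies at step index 3

resumed = run_with_checkpoints(steps, checkpoint)
print(resumed)
```

    The second call only runs the steps that had not completed, which is the behavior the platform is described as providing natively.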

    The Hard Truth About Autonomous Agent Governance

    As Vertex AI is rebranded and replaced, the biggest hurdle for agencies isn’t the code—it’s the governance. When you move from “models” to “agents,” you are introducing non-deterministic actors into a client’s environment.

    We’ve seen what happens when governance is ignored. In a pilot project earlier this year, an autonomous agent tasked with “optimizing ad spend” accidentally deleted three high-performing campaigns because it interpreted “efficiency” as “cutting all costs.” This wasn’t a model failure; the model did exactly what it was told. It was a governance failure. There were no guardrails or supervisor agents to check its work.

    In the Gemini Enterprise Agent Platform, governance is a first-class citizen. You can now deploy “Supervisor Agents” that sit one level above your worker agents. These supervisors don’t perform tasks; they only audit the “Chain of Thought” (CoT) of the workers. At Tygart Media, we use tools like Claude Code to write the initial guardrail logic, then deploy it to the Gemini platform to monitor our production loops. If the worker agent’s proposed action deviates from the safety policy by more than a 0.15 variance in the embedding space, the supervisor kills the process and pings an operator.
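
    The supervisor's kill rule can be sketched in a few lines. The text above does not pin down what "variance in the embedding space" means, so this sketch assumes cosine distance between a policy embedding and the embedding of the proposed action. The vectors are hand-written stand-ins rather than output from a real embedding model, and `should_halt` is a hypothetical name.

```python
import math

DEVIATION_THRESHOLD = 0.15  # figure from the text; the metric itself is assumed

def cosine_distance(a, b):
    """1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def should_halt(policy_vec, action_vec, threshold=DEVIATION_THRESHOLD):
    """Supervisor check: halt the worker when the action drifts past the threshold."""
    return cosine_distance(policy_vec, action_vec) > threshold

# Toy vectors standing in for real embeddings.
policy = [1.0, 0.0, 0.0]
on_policy_action = [0.9, 0.1, 0.0]
off_policy_action = [0.5, 0.8, 0.0]

print(should_halt(policy, on_policy_action))
print(should_halt(policy, off_policy_action))
```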

    Pricing Shift: From Tokens to Outcomes

    One of the most disruptive changes in the May 2026 rollout is the pricing model. Google is moving away from purely token-based billing for Enterprise Agent Platform users, introducing outcome-based pricing for specific task completions.

    The old model penalized efficiency. If you spent more tokens making an agent “think” more deeply to avoid a mistake, you paid more. The new model allows you to pay per “Successful Task Completion.” This aligns Google’s incentives with the agency’s. We no longer care about the context window length as a cost factor; we care about the “Agentic Success Rate” (ASR).

    For a mid-sized agency, this simplifies the math significantly. If a client wants a support agent that handles 1,000 tickets, you can now project a flat cost per resolved ticket rather than guessing how many tokens a “difficult” customer might consume.
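
    To see why the flat rate is easier to forecast, compare the two billing shapes side by side. Every number below (token counts per ticket, the per-million-token rate, the per-resolution price, the 940 resolved tickets) is hypothetical and chosen only to make the arithmetic visible; none of it is Google's actual pricing.

```python
def token_based_cost(token_counts, usd_per_million_tokens):
    """Cost when every ticket is billed by the tokens it consumed."""
    return sum(token_counts) * usd_per_million_tokens / 1_000_000

def outcome_based_cost(resolved_tickets, usd_per_resolution):
    """Cost when only successfully resolved tickets are billed."""
    return resolved_tickets * usd_per_resolution

# Hypothetical workload: 1,000 tickets, 900 easy and 100 difficult.
token_counts = [4_000] * 900 + [60_000] * 100
print(token_based_cost(token_counts, usd_per_million_tokens=10.0))   # depends on the difficulty mix
print(outcome_based_cost(resolved_tickets=940, usd_per_resolution=0.10))
```

    The first figure moves whenever the share of difficult tickets moves. The second only needs a ticket count and a resolution rate, which is the projection an agency can actually put in a proposal.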

    A Practical Failure: Why ‘Models’ Weren’t Enough

    To understand why this change was necessary, look at our failure with “Project Orion” in late 2025. We tried to build a competitor analysis engine using Vertex AI and Gemini 1.5 Pro with a standard RAG setup. It worked 70% of the time. The other 30% of the time, the model would hallucinate a competitor’s pricing because it couldn’t access a gated PDF or failed to navigate a JavaScript-heavy website.

    The model was “smart,” but it was “blind” and “unreliable” in a loop. It had no way to say, “I failed to read this page, let me try a different browser-header strategy.”

    Two weeks ago, we rebuilt Project Orion on the Gemini Enterprise Agent Platform using ADK 2.0. The new agent has a “retry skill.” When it hits a JavaScript wall, it triggers a headless browser sub-agent. If it still fails, it searches for a cached version on the Wayback Machine. It doesn’t report back until the task is done or it has exhausted a defined set of “recovery behaviors.” Our ASR jumped from 70% to 94%. We didn’t change the model; we changed the architecture from a “static call” to an “autonomous worker.”
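
    The "recovery behaviors" idea is an ordered fallback chain, and it is simple enough to sketch without the platform. The code below is not ADK 2.0 code. `fetch_with_recovery` and the three strategy functions are hypothetical stand-ins that simulate a plain request, a headless browser, and an archive lookup without touching the network.

```python
def fetch_with_recovery(url, strategies):
    """Try each recovery behavior in order; return the first successful result.

    `strategies` is an ordered list of (name, callable) pairs. Each callable
    takes the URL and either returns page text or raises an exception.
    """
    failures = []
    for name, strategy in strategies:
        try:
            return {"source": name, "text": strategy(url), "failures": failures}
        except Exception as exc:            # record the failure and fall through
            failures.append((name, str(exc)))
    # Every recovery behavior is exhausted: report failure instead of guessing.
    return {"source": None, "text": None, "failures": failures}


# Stand-in strategies; real ones would call an HTTP client, a headless
# browser, and an archive lookup respectively.
def plain_request(url):
    raise RuntimeError("page requires JavaScript")

def headless_browser(url):
    raise TimeoutError("render timed out")

def archived_copy(url):
    return "cached pricing table"

result = fetch_with_recovery(
    "https://example.com/pricing",
    [("plain", plain_request), ("headless", headless_browser), ("archive", archived_copy)],
)
print(result["source"], len(result["failures"]))
```

    The important design choice is the last return: when the list is exhausted the function says so explicitly, which is what keeps the model from filling the gap with a hallucinated price.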

    What You Should Do Tomorrow

    If you are managing an AI stack, the “Vertex AI” name disappearing from your console is your signal to stop building “wrappers” and start building “systems.” Here is the tactical path forward:

    1. Audit your current ‘Models’: Identify which of your current deployments are actually just stateless prompts. These are your biggest liabilities. Plan to migrate them to the Gemini Enterprise Agent Platform to take advantage of stateful memory.
    2. Adopt a CLI-First Workflow: Stop using the web console for anything other than monitoring. Use the Gemini CLI and integrate it with Claude Code or your local IDE. The speed of iteration in ADK 2.0 is only visible when you are working in a terminal environment.
    3. Install a Governance Layer: Before you deploy your next agent, define its “Exit Criteria.” Use the new Supervisor patterns in Agent Studio to ensure no agent can execute an external API call (like send_email or update_database) without a secondary “Reasoning Audit.”
    4. Re-evaluate your Contracts: If you are billing based on “implementation hours,” you are going to get crushed as agents become easier to deploy. Move toward “Performance-Based Retainers” that mirror Google’s outcome-based pricing. If the agent solves the problem, you get paid.

    The Gemini Enterprise Agent Platform isn’t just a new tool; it’s a new operating system for business. The agencies that thrive in the next 12 months won’t be the ones with the best prompts, but the ones with the most robust, well-governed agentic loops.

  • SEO is Dead, Long Live ‘Source-Worthy’ Content (SGE Reality Check)

    SEO is Dead, Long Live ‘Source-Worthy’ Content (SGE Reality Check)

    The Search Landscape of May 2026: Stop Chasing Traffic, Start Chasing Citations

    The transition is complete. As of this month, Google’s AI Overviews (formerly SGE) appear for over 52% of all search queries. If you are looking at your Search Console and seeing a 30% drop in informational traffic compared to last year, you aren’t alone. You’re simply seeing the result of the “Zero-Click” era reaching its final form. For digital agency owners and systems architects, the old SEO playbook is a liability. If you are still optimizing for clicks on “What is…” or “How to…” keywords, you are effectively donating your intellectual property to train a model that will replace your visit.

    The currency of search has shifted. We have moved from the era of link equity to the era of Source-Worthy Content. In this new reality, the goal isn’t to get the user to click through to read a basic definition; it is to ensure that your data, your unique perspective, or your proprietary methodology is the primary source cited by the Retrieval-Augmented Generation (RAG) systems powering Google, Perplexity, and OpenAI.

    The Numbers Don’t Lie: The Death of the Click

    By mid-2026, the data across our portfolio is clear. Informational query traffic—the top-of-funnel “educational” content that used to drive massive awareness—has cratered by 20-40% across most B2B and technical sectors. Users are getting their answers directly in the search interface. They don’t need to visit your site to learn “how to configure a headless CMS” if Gemini can pull the five essential steps from your documentation and present them in a neat bulleted list.

    However, while traffic is down, the value of a single citation within an AI Overview has skyrocketed. We’ve found that being the primary citation in a RAG-driven answer drives higher-intent leads than the old-school organic #1 spot ever did. The users who do click through from an AI Overview have already been pre-qualified by the AI. They aren’t looking for a definition; they are looking for the operator who provided the insight. Optimizing for AI overviews is no longer a side project; it is the core of technical SEO.

    Understanding RAG: How Google Picks Its Sources

    To win in 2026, you have to understand the mechanics of Retrieval-Augmented Generation. Google’s AI isn’t just “hallucinating” answers based on its training data; it is actively searching the live web, retrieving specific “chunks” of information, and then synthesizing those chunks into a response. RAG optimization means shaping your content so it survives each stage of that pipeline.

    When an AI Overview is generated, Google’s system follows a three-step process:

    1. Retrieval: It identifies the top-ranking traditional search results for the query. (This is why maintaining traditional page-one rankings is still a prerequisite for being a source).
    2. Selection: It selects specific paragraphs, data tables, or unique insights from those top results that best satisfy the user’s intent.
    3. Generation: It rewrites those insights into a cohesive answer, adding citations to the sources it used.
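
    The three steps above can be sketched as a toy pipeline. This is a deliberately crude model, not Google's system: retrieval is a sort on a supplied rank, selection is plain word overlap, and generation is string concatenation with numbered citations. All function names and sample pages are made up for the example.

```python
def retrieve(pages, top_k=3):
    """Step 1: keep the top-ranked pages (lower rank number = better)."""
    return sorted(pages, key=lambda page: page["rank"])[:top_k]

def select(pages, query, per_page=1):
    """Step 2: pick the passages sharing the most words with the query."""
    query_words = set(query.lower().split())
    chosen = []
    for page in pages:
        scored = sorted(
            page["passages"],
            key=lambda passage: len(query_words & set(passage.lower().split())),
            reverse=True,
        )
        chosen.extend((page["url"], passage) for passage in scored[:per_page])
    return chosen

def generate(selected):
    """Step 3: stitch passages into one answer with numbered citations."""
    body = " ".join(f"{passage} [{i}]" for i, (_, passage) in enumerate(selected, 1))
    sources = [url for url, _ in selected]
    return body, sources

pages = [
    {"rank": 2, "url": "https://example.com/b", "passages": ["Unrelated intro.", "A headless CMS needs an API layer."]},
    {"rank": 1, "url": "https://example.com/a", "passages": ["Configure a headless CMS in five steps."]},
    {"rank": 9, "url": "https://example.com/z", "passages": ["Buried on page two."]},
]
answer, sources = generate(select(retrieve(pages, top_k=2), "configure a headless cms"))
print(answer)
print(sources)
```

    Even in this toy version, the page ranked ninth never reaches the selection step, which is why page-one rankings remain a prerequisite for being cited.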

    If your content is generic—if it says exactly what every other site says—the AI will synthesize the answer without citing you specifically, or it will cite a larger authority (like Wikipedia or a massive news outlet) that says the same thing. To be cited, your content must be source-worthy. It must provide something the AI cannot find elsewhere or synthesize from common knowledge.

    Why Generic Content is Erased by AI

    The era of “skyscraper” content—taking ten existing articles and making a longer one—is over. AI is better at that than you are. In fact, most of that generic content is now being flagged by LLMs as “low information gain.”

    When we audit a site using the Gemini CLI, we look for “Information Gain” scores. If a paragraph doesn’t offer a new data point, a specific case study result, or a unique operator’s perspective, it’s invisible to the RAG process. Generic advice like “SEO requires good keywords” is discarded. Specific advice like “We saw a 12% lift in RAG citations by moving from 1,000-word articles to 400-word modular content blocks” is source-worthy.

    The LLM wants to cite the originator. If you are just a curator, you are a middleman that the AI has successfully bypassed.

    The ‘Source-Worthy’ SEO Framework

    At Tygart Media, we’ve pivoted our Agency Playbook to focus on four pillars of source-worthy SEO. This is how we ensure our clients remain the “source of truth” in an AI-dominated search engine.

    1. Proprietary Data and “Proof of Work”

    The AI cannot hallucinate your internal data (yet). Original surveys, technical benchmarks, and project post-mortems are the most cited pieces of content in 2026. If you run a test on a new deployment pipeline and publish the raw numbers, Google’s AI Overview will cite your specific numbers. We’ve moved away from “opinion pieces” and toward “experiment logs.” Every article should contain at least one table or chart of data that didn’t exist on the internet before you published it.

    2. The Operator’s Perspective (E-E-A-T)

    Experience and Expertise are now the primary filters for RAG selection. Google is prioritizing content that shows “Proof of Effort.” Use first-person accounts. Instead of writing “How to use Claude Code,” write “What we learned after 500 hours using Claude Code to refactor a legacy Python monolith.” The specific failures and technical hurdles you describe are unique identifiers that the AI recognizes as authoritative.

    3. Modular Content Architecture

    Long-form, sprawling articles are difficult for RAG systems to “chunk” effectively. We are now building content in modular blocks. Each section of an article is designed to stand alone as a complete answer to a sub-query. We use <section> tags and specific ID attributes to make it easy for the crawler to identify and retrieve the exact block it needs. This is optimizing for AI overviews by making your content “consumable” for machines, not just humans.
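
    As a rough illustration of what a modular block looks like in markup, the sketch below wraps one question and its answer in a `<section>` with a stable `id`. The `slugify` and `render_block` helpers are hypothetical, and the exact markup any given crawler prefers is an assumption on our part.

```python
from html import escape
import re

def slugify(heading):
    """Turn a heading into a stable, URL-safe id."""
    return re.sub(r"[^a-z0-9]+", "-", heading.lower()).strip("-")

def render_block(heading, answer):
    """Wrap one question-and-answer pair in its own addressable <section>."""
    return (
        f'<section id="{slugify(heading)}">\n'
        f"  <h3>{escape(heading)}</h3>\n"
        f"  <p>{escape(answer)}</p>\n"
        f"</section>"
    )

print(render_block(
    "How long should a content block be?",
    "We saw a 12% lift in RAG citations with 400-word modular blocks.",
))
```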

    4. Structured Data for RAG

    Schema.org hasn’t gone away; it has become the metadata for AI. We use Dataset, HowTo, and Review schema more aggressively than ever. But more importantly, we are using Gemini CLI to auto-generate JSON-LD that specifically maps out the “Claims” made in our articles. By explicitly stating “Our claim: Informational traffic is down 30%,” we make it easier for the AI to attribute that fact to us.
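
    One possible shape for that claim markup, sketched with the standard library only. It uses the schema.org `Claim` type with `text`, `author`, and `appearance`; the `claim_jsonld` helper and the URL are placeholders, and whether any search engine actually consumes `Claim` markup for attribution is an assumption, not something we can confirm.

```python
import json

def claim_jsonld(claim_text, publisher, url):
    """Build a schema.org Claim object as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Claim",
        "text": claim_text,
        "author": {"@type": "Organization", "name": publisher},
        "appearance": {"@type": "Article", "url": url},
    }
    return json.dumps(data, indent=2)

block = claim_jsonld(
    "Informational traffic is down 30%",
    "Tygart Media",
    "https://example.com/source-worthy-content",   # placeholder URL
)
print(f'<script type="application/ld+json">\n{block}\n</script>')
```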

    Technical Execution: Modular E-E-A-T and Gemini CLI

    The workflow for a modern agency operator involves high-level automation. We don’t manually audit 500 pages for “source-worthiness.” We use tools like Claude Code and Gemini CLI to process our content libraries.

    Our current stack for RAG optimization looks like this:

    • Analysis: We pipe our top-performing URLs through a script that uses the Gemini API to compare our content against the current AI Overview for that keyword. The script identifies “content gaps”—information the AI is providing that isn’t on our page, or information we have that the AI is ignoring.
    • Refactoring: If a page is losing traffic but has high “Source Worthiness,” we use Claude Code to refactor the HTML into a more modular structure, adding Dataset schema to any tables.
    • Validation: We use Antigravity to simulate how a RAG system would “chunk” the page. If the chunks are incoherent, we rewrite the headers to be more explicit.
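
    The validation step can be approximated with a very crude heuristic. The sketch below is not how Antigravity works. It assumes headings are marked with a leading `## `, splits the page on them, and flags any chunk that is empty or opens with a word that points outside the chunk ("This", "It", and so on). Both helper names and the list of openers are our own inventions.

```python
import re

DANGLING_OPENERS = {"this", "it", "these", "those", "that"}

def chunk_by_heading(text):
    """Split plain text into (heading, body) chunks on lines starting with '## '."""
    chunks, heading, body = [], None, []
    for line in text.splitlines():
        if line.startswith("## "):
            if heading is not None:
                chunks.append((heading, " ".join(body).strip()))
            heading, body = line[3:].strip(), []
        elif heading is not None and line.strip():
            body.append(line.strip())
    if heading is not None:
        chunks.append((heading, " ".join(body).strip()))
    return chunks

def incoherent_chunks(chunks):
    """Flag chunks that are empty or open with a word pointing outside the chunk."""
    flagged = []
    for heading, body in chunks:
        words = re.findall(r"[A-Za-z']+", body)
        if not words or words[0].lower() in DANGLING_OPENERS:
            flagged.append(heading)
    return flagged

page = """## What is a headless CMS?
A headless CMS stores content and serves it over an API.

## Why it matters
This makes the approach above easier to scale.
"""
print(incoherent_chunks(chunk_by_heading(page)))
```

    A flagged heading is a candidate for rewriting so the section can stand alone when it is retrieved without its neighbors.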

    One failure we saw early in 2026 was attempting to “game” the AI by over-optimizing for specific keywords. The AI sees through keyword density. It is looking for semantic weight. When we tried to force-feed keywords, our RAG citation rate dropped. When we focused on “operator-restrained” technical clarity, the citations returned.

    Case Study: The 40% Traffic Drop and the 15% Lead Increase

    We recently worked with a systems architecture firm that saw their organic traffic for “cloud migration tips” fall by 40% after the May 2026 AI Overviews rollout. Initially, there was panic. On closer inspection, however, their “Request a Consultation” conversions were actually up by 15%.

    What happened? Their generic “tips” were being swallowed by the AI Overview. But the AI Overview was citing their specific “Cloud Migration Cost Calculator” and their “2025 Migration Failure Report.” The traffic they lost was the “looky-loos” who just wanted a quick tip. The traffic they gained (via the AI citations) was from CTOs who saw their specific data cited as the authority and clicked through to hire them. This is the shift from “volume” to “value.”

    Action Plan: What You’d Do Tomorrow

    If you are managing a content library or an agency portfolio, don’t wait for your traffic to hit zero. Start the pivot to source-worthy SEO immediately. Here is the operator’s checklist for tomorrow morning:

    1. Audit for “What is” Content: Use your preferred crawler to identify every page that targets a purely informational, definitional keyword. These are your “donor” pages. Decide whether to delete them, consolidate them, or upgrade them with proprietary data.
    2. Inject Original Data: Find three pieces of internal data—even if they are small—and add them to your top 10 most important pages. Use tables. Add a “Methodology” section.
    3. Modularize Your Headers: Ensure every H3 in your articles can stand alone as a question and every following paragraph as a direct, concise answer. Remove the “fluff” and the “introductory transitions.” The AI doesn’t need an “In this section, we will explore…” lead-in. It needs the facts.
    4. Verify Citations: Perform a manual search for your primary keywords. Look at the AI Overview. If you are ranking #1-3 in organic but aren’t cited in the AI response, your content isn’t “Source-Worthy.” It’s too generic. Rewrite the top-ranking paragraph to offer a unique, data-backed perspective that the AI is currently missing.
    5. Update Your Schema: Move beyond basic Article schema. Implement Speakable, Dataset, and ClaimReview schema where applicable. Use a tool like Gemini CLI to automate the generation of these blocks based on your existing text.

    SEO isn’t dead; the middleman is dead. The search engine of 2026 doesn’t want to send users to a website; it wants to provide an answer. Your job is to be the only source that the answer cannot exist without. Build for the machine, provide for the human, and protect your intellectual property by making it too specific to be ignored.

  • Claude Code’s Rate Limit Doubling: What May 2026 Changed and How to Pick a Plan Now

    Claude Code’s Rate Limit Doubling: What May 2026 Changed and How to Pick a Plan Now

    If you bought a Claude Code subscription in March or April and felt like you were hitting the 5-hour wall every single afternoon, you weren’t imagining it. Anthropic spent six months tightening Claude Code’s quotas — and then, over two weeks in May 2026, gave most of them back. The rate-limit math that drove plan-selection advice on the internet through April is now obsolete. Here’s what actually changed, what the numbers look like today, and how to think about Pro versus Max if you’re picking a plan this week.

    What Anthropic actually did

    On May 6, 2026, Anthropic doubled the 5-hour rate limits on Claude Code across every paid plan — Pro, Max 5x, Max 20x, Team Premium, and seat-based Enterprise. In the same announcement, they removed the peak-hour throttle that had been quietly halving available quota for Pro and Max users during weekday business hours. They also lifted API-side rate limits on the Opus tier.

    One week later, on May 13, 2026, they followed up with a 50% increase to the weekly cap across the same plans. Unlike the 5-hour change, that weekly bump carries an expiration date: July 13, 2026, unless extended. Treat it as a temporary boost, not a permanent feature.

    The trigger Anthropic pointed to is a deal that brings the full capacity of the Colossus 1 data center in Memphis online — over 300 megawatts and roughly 220,000 NVIDIA GPUs. That detail matters less than the practical one: capacity-driven throttling that had been the dominant constraint since late 2025 has loosened.

    The new numbers, by plan

    The shape of the plan ladder hasn’t changed — Pro at $20, Max 5x at $100, Max 20x at $200, Team Premium at $100/seat with a 5-seat minimum. What changed is what each tier actually delivers per window.

    • Pro ($20/mo): Roughly 90 prompts per 5-hour window now (up from a number that, in practice, was hovering around 45 once the peak-hour throttle kicked in). No peak penalty. Weekly cap is 50% higher through July 13.
    • Max 5x ($100/mo): Same doubled 5-hour window. Weekly Opus 4.7 budget moved from approximately 50 hours to approximately 75.
    • Max 20x ($200/mo): Doubled 5-hour window. Weekly Opus 4.7 budget moved from approximately 200 hours to approximately 300.
    • Team Premium ($100/seat/mo, annual; $125 monthly): Mirrors Max 5x quotas at the seat level. 5-seat minimum still applies.

    Two things haven’t changed: the API pay-as-you-go pricing for the underlying models (claude-sonnet-4-6 at roughly $3 per million input tokens and $15 per million output; claude-opus-4-7 at roughly $5 in and $25 out), and the existence of the weekly cap itself. The weekly cap is still the thing that kills Max users mid-Friday.

    What this changes about plan selection

    Most of the “which plan should I buy” guides written before May 6 over-recommend Max 5x because they were sizing it against artificially compressed Pro limits. With a doubled 5-hour cap and no peak throttle, Pro at $20 is now genuinely enough for a developer doing focused coding sessions a few hours a day — something that wasn’t reliably true a month ago.

    The Max 5x case still holds, but it’s narrower now. The honest test: if you regularly burn through your Pro 5-hour window before lunch, or if you run two or three concurrent Claude Code sessions on different repos, $100 still pays for itself. If you don’t, Pro will hold.

    Max 20x is increasingly a workflow choice rather than a quota choice. The doubled limits made Max 5x sufficient for almost every solo workflow I can describe. Where 20x still earns its price is multi-agent workflows, where a coordinator-and-workers pattern can burn three to seven times the tokens of a single-agent session because every teammate maintains its own context window.

    The hidden costs that didn’t change

    The rate-limit relief is real, but several gotchas that drove “Claude Code costs me more than I expected” complaints in Q1 are still live:

    • Set ANTHROPIC_API_KEY in your shell and Claude Code bills at API rates — your subscription is silently ignored. Unset it before launching the CLI if you’re on a plan.
    • Weekly caps count active processing time only. Idle browsing is free. Long-running tool calls and extended-thinking budgets aren’t.
    • Extended thinking is billed as output tokens. On Opus 4.7 that’s roughly $25 per million. Default thinking budgets of tens of thousands of tokens per request stack up fast on API.
    • MCP server output sits in context for the rest of the session. A “list the last 20 PRs” call can dump 8,000 tokens of metadata that you’ll re-pay for on every subsequent turn until the conversation rolls over.
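
    Three of the gotchas above reduce to simple arithmetic or a one-line check, sketched here. The per-million rates are the approximate Opus figures quoted earlier in this post. The session sizes (20,000 thinking tokens, 50 requests, 40 later turns) are hypothetical, the function names are ours, and the context calculation ignores prompt caching, which would lower the real figure.

```python
import os

OPUS_INPUT_USD_PER_MTOK = 5.0     # approximate rate quoted in the text
OPUS_OUTPUT_USD_PER_MTOK = 25.0   # approximate rate quoted in the text

def thinking_cost(thinking_tokens, requests):
    """Extended thinking is billed as output tokens."""
    return thinking_tokens * requests * OPUS_OUTPUT_USD_PER_MTOK / 1_000_000

def resident_context_cost(resident_tokens, later_turns):
    """Tokens left in context are re-sent as input on every later turn."""
    return resident_tokens * later_turns * OPUS_INPUT_USD_PER_MTOK / 1_000_000

def api_key_overrides_subscription(env=os.environ):
    """True when ANTHROPIC_API_KEY is set, i.e. the case the text warns about."""
    return bool(env.get("ANTHROPIC_API_KEY"))

# Hypothetical session: 20,000 thinking tokens on each of 50 requests,
# plus one 8,000-token MCP dump that stays in context for 40 more turns.
print(thinking_cost(20_000, 50))
print(resident_context_cost(8_000, 40))
print(api_key_overrides_subscription({"ANTHROPIC_API_KEY": "sk-placeholder"}))
```

    Running the last check against your real environment before launching the CLI is a cheap way to confirm you are billing against the subscription and not the API.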

    If you were running into the 5-hour wall and assumed it was a usage problem, check whether one of those four is actually the cause before you upgrade.

    What to do this week

    If you’re on Pro and were considering Max 5x, wait two weeks. The new Pro ceiling is high enough that the upgrade decision now needs different evidence than it did in April.

    If you’re already on Max 5x and felt squeezed, the May 13 weekly bump should give you breathing room — but mark July 13 on your calendar. If the temporary 50% increase isn’t extended, the squeeze comes back.

    If you’re picking a plan from scratch today: start on Pro. The doubled limits are real, the peak-hour penalty is gone, and the upgrade path to Max stays open with no friction. Buy quota when you’ve measured that you need it, not before.

    The model versions to use

    For anyone writing the API string into a script this week: flagship is claude-opus-4-7, workhorse is claude-sonnet-4-6, fast tier is claude-haiku-4-5-20251001. Pull from docs.anthropic.com/en/docs/about-claude/models before shipping anything — the version strings have moved twice already this year and they’ll move again.

  • Claude Code Plan Mode: How to Use It, When to Skip It (2026 Guide)

    Claude Code Plan Mode: How to Use It, When to Skip It (2026 Guide)

    Published: May 25, 2026 | Last fact-check: May 25, 2026 against Anthropic docs and Claude Code v2.1+ behavior

    Quick Answer

    Plan Mode is a Claude Code setting that forces the agent to draft a plan and wait for your approval before it takes any destructive action. Trigger it by pressing Shift+Tab twice in the terminal (the first press cycles to Auto-Accept Mode; the second lands on Plan Mode). Use it for risky multi-step work; skip it for simple read-only or contained edits.

    Below: how to enable it, when it pays off, and when it gets in your way.

    Plan Mode (sometimes called “planning mode”) is one of the more underused features in Claude Code in 2026. It changes how the agent works in a specific, measurable way: before Claude Code edits files, runs commands, or modifies state, it produces a plan and waits for your approval. You see what it intends to do, you say yes or no, and only then does it act.

    For the right kind of task, Plan Mode is the difference between a clean execution and a regrettable one. For the wrong kind of task, it is friction that slows you down. This guide separates the two.

    Claude Code Plan Mode vs Auto Mode: When to Use Each

    Scenario | Use Plan Mode | Use Auto Mode
    Unfamiliar codebase | Yes: review the plan first | Only if you know it well
    Large multi-file refactor | Yes: catch scope creep early | Not recommended
    Simple bug fix (< 5 lines) | Overkill | Yes
    Adding a new feature | Yes: plan clarifies approach | Acceptable for small features
    Writing tests | Optional | Yes, usually safe
    Touching database migrations | Yes: irreversible changes | No
    CI/CD pipeline changes | Yes | No

    What Plan Mode Actually Does

    In default mode, Claude Code is allowed to take actions as it reasons. It can read files, write files, run bash, edit code, all in one conversational flow. This is the strength of Claude Code as an agent — it gets work done without asking permission for every step.

    In Plan Mode, Claude Code’s behavior changes:

    1. You describe the task.
    2. Claude Code investigates the codebase (read-only operations are still allowed).
    3. Claude Code drafts a plan listing every file it intends to change, every command it intends to run, and every decision point.
    4. You read the plan. You approve it, modify it, or reject it.
    5. Only after approval does Claude Code start writing files or running commands.

    The plan is presented in the terminal as a structured outline. You can ask Claude Code to revise the plan, add steps, remove steps, or change the order. Iterating on the plan is fast because no actions have been taken yet.

    How to Enable Plan Mode

    There are four ways to activate Plan Mode in Claude Code:

    1. Shift+Tab pressed twice. Each press of Shift+Tab cycles through the three permission modes: Default → Auto-Accept → Plan → Default. Two presses lands on Plan Mode. The status bar shows ⏸ plan mode on when active.
    2. The /plan slash command. Type /plan at the start of any prompt to enter Plan Mode for that turn only. Useful for one-off plans without flipping the whole session.
    3. The --permission-mode plan flag at startup. Start the session in Plan Mode from the command line.
    4. Headless mode for scripts and CI. claude --print --permission-mode plan "your task" for automation that should never edit files.

    # Start session in Plan Mode
    claude --permission-mode plan
    
    # Or mid-session — press Shift+Tab TWICE
    # (first press = Auto-Accept Mode, second press = Plan Mode)
    
    # Or one-shot Plan Mode for next prompt only
    /plan
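
    If the Shift+Tab cycle is hard to keep in your head, it helps to see it as a three-state loop. The snippet below only models the cycle as this guide describes it (Default, then Auto-Accept, then Plan, then back to Default). It is not Claude Code's implementation, and `press_shift_tab` is a made-up name.

```python
MODES = ["default", "auto-accept", "plan"]

def press_shift_tab(mode, presses=1):
    """Return the permission mode reached after pressing Shift+Tab `presses` times."""
    return MODES[(MODES.index(mode) + presses) % len(MODES)]

print(press_shift_tab("default", 2))
print(press_shift_tab("plan"))
```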

    Plan Mode is persistent within a session — it stays on until you cycle out with another Shift+Tab. Close and reopen Claude Code and it defaults back to off. Toggle it on for risky work, leave it on for the whole session if you are doing higher-risk work end-to-end.

    Important: Plan Mode is a hard read-only sandbox enforced at the tool level. Claude Code physically cannot edit files, run commands, or modify state while Plan Mode is active. This is not a suggestion or a soft check — the write tools are unavailable.
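
    For the headless option, a script might assemble the command like this. The flags are the ones listed above; the `plan_only_command` helper and the sample task are hypothetical, and the block only builds and prints the command so it can be read without the CLI installed.

```python
import shlex

def plan_only_command(task):
    """Build the argv for a headless, plan-only Claude Code run."""
    return ["claude", "--print", "--permission-mode", "plan", task]

argv = plan_only_command("List the files a rename of UserService would touch")
print(shlex.join(argv))
# To execute it: subprocess.run(argv, capture_output=True, text=True, check=True)
```

    Passing the task as a single list element, instead of formatting it into a shell string, avoids quoting problems when the task text contains spaces or quotes.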

    When Plan Mode Pays Off

    Plan Mode is worth the friction in these situations:

    • Multi-file refactors. When the agent will touch 5+ files, you want to see the list before it starts editing. A small confusion about which files to change becomes a big mess fast.
    • Database migrations or schema changes. Anything that touches durable state and is hard to undo benefits from a confirmed plan.
    • Production code paths. If a session affects code that ships to users, the plan checkpoint is cheap insurance.
    • Ambiguous instructions. When you are not sure how the agent will interpret your request, Plan Mode surfaces the interpretation before any work happens.
    • New repository onboarding. When you do not yet know the codebase well, Plan Mode lets the agent show you what it learned during investigation before it acts.
    • Long-running batch jobs. Approving a plan for 200 file edits and then walking away is safer than launching 200 edits blind.

    When Plan Mode Gets In the Way

    Plan Mode is not free. The friction it adds is a real cost for certain workflows:

    • Single-file tweaks. Asking Claude Code to fix a typo or rename a variable does not need a plan. The plan takes longer than the fix.
    • Tight feedback loops. When you are iterating quickly — try a change, see the result, adjust — Plan Mode slows the loop. Default mode wins here.
    • Read-only investigation. If you are asking questions about the codebase (“how does this auth flow work”), there is nothing to plan. Plan Mode is irrelevant.
    • Work in a sandbox. If you are working in a throwaway directory or branch where mistakes are cheap, the safety net of Plan Mode is overkill.

    The decision is not “is Plan Mode good.” It is “is the cost of approval less than the cost of an unintended action.” For risky multi-step work, yes. For cheap iteration, no.

    Working Inside the Plan

    Once Claude Code presents a plan, you have several options:

    1. Approve as-is. Tell Claude Code to proceed. It executes the plan in order.
    2. Approve with modifications. Tell Claude Code to remove specific steps, reorder them, or add additional steps. It revises the plan and re-presents.
    3. Ask questions. Drill into specific steps. “Why are you editing file X?” Claude Code explains the reasoning.
    4. Reject and restart. If the plan is wrong-shape, tell Claude Code so. It will rebuild the plan from a corrected understanding.
    5. Cancel. Exit Plan Mode entirely if you’ve decided this is not the right task or session for it.

    The plan is conversational. You are not stuck with the first draft. Iterating on the plan is much cheaper than iterating after the work is done.

    What Plan Mode Does Not Protect Against

    Plan Mode’s read-only guarantee ends the moment you approve. The plan, once approved, executes for real. Plan Mode does not:

    • Prevent you from approving a bad plan
    • Catch logic errors inside individual file edits
    • Prevent destructive bash commands if you approved them in the plan
    • Replace tests or code review

    It is a thinking checkpoint, not a safety net. The human still owns the decision.

    Plan Mode vs Other Safety Patterns

    Plan Mode is one of several safety patterns Claude Code supports:

    • Read-only sessions: Restrict the agent to read operations only.
    • Per-tool permissions: Approve each tool use individually as it happens.
    • Plan Mode: Approve a batch of intended actions before execution begins.
    • Auto-accept mode: The opposite — accept all tool uses without asking. Fast and risky.

    Per-tool permission is more granular but slower. Plan Mode is bulkier but faster once approved. Use the right tool for the situation; do not assume one is always correct.

    A Working Habit

    The habit that has worked across hundreds of Claude Code sessions: default mode on, Shift+Tab twice into Plan Mode before any session that will (a) touch production state, (b) edit more than 5 files, or (c) run commands that are hard to undo. Shift+Tab again to cycle back to default for everything else.
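
    Written as a rule, the habit is a three-way OR. The function name and argument names below are ours; the conditions are the (a), (b), and (c) from the paragraph above.

```python
def needs_plan_mode(touches_production, files_to_edit, hard_to_undo):
    """The three-part test from the text: any one condition is enough."""
    return touches_production or files_to_edit > 5 or hard_to_undo

print(needs_plan_mode(False, 1, False))    # typo fix in one file
print(needs_plan_mode(False, 12, False))   # wide refactor
print(needs_plan_mode(False, 1, True))     # single migration file, irreversible
```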

    The shortcut becomes muscle memory in a week. Once it is muscle memory, the cost of Plan Mode drops to nearly zero, and you can use it liberally on anything that even smells risky.

    Frequently Asked Questions

    What is Plan Mode in Claude Code?

    Plan Mode is a Claude Code setting that forces the agent to produce a written plan and wait for your approval before making changes. It surfaces what the agent intends to do so you can adjust it before any work happens.

    How do I enable Plan Mode in Claude Code?

    Press Shift+Tab twice in the terminal (the first press cycles to Auto-Accept; the second lands on Plan Mode), type /plan as a slash command, or start the session with --permission-mode plan. The status bar shows ⏸ plan mode on when active.

    When should I use Plan Mode?

    For multi-file refactors, database migrations, production code paths, ambiguous instructions, new repositories you don’t know yet, and long-running batch jobs. Skip Plan Mode for single-file tweaks, tight iteration loops, and read-only investigation.

    Does Plan Mode make Claude Code slower?

    Yes, for short tasks — the plan adds latency that is not worth it on quick edits. For long or risky tasks, the plan is faster than fixing mistakes afterward.

    Can I edit the plan before approving it?

    Yes. Tell Claude Code to revise the plan — add steps, remove steps, reorder. Iterating on the plan is much cheaper than iterating after execution.

    Is Plan Mode the same as a sandbox?

    Partly. While it is active, Plan Mode is a hard read-only sandbox at the tool level: Claude Code cannot write files or run commands. Once you approve the plan and exit Plan Mode, though, the work executes for real. Plan Mode prevents accidental writes during planning; it does not prevent you from approving a bad plan.

    What’s the difference between Plan Mode and per-tool permissions?

    Per-tool permissions ask you to approve each tool use individually as it happens (more granular, slower). Plan Mode batches all intended actions into one plan you approve up front (bulkier, faster once approved).

    The Bottom Line

    Plan Mode is leverage for risky work and friction for everything else. Make Shift+Tab+Shift+Tab muscle memory. Use Plan Mode whenever the cost of an unintended action exceeds the cost of approval — multi-file refactors, production changes, ambiguous specs. Skip it on cheap iteration. That single rule will save you more headaches than any other Claude Code habit.


  • Claude Code Router: Model Routing, OpenRouter & Custom Rules in 2026

    Claude Code Router: Model Routing, OpenRouter & Custom Rules in 2026

    Published: May 25, 2026 | Last fact-check: May 25, 2026 — current model lineup: Opus 4.7, Sonnet 4.6, Haiku 4.5

    Quick Answer

    A Claude Code router is any layer that decides which Claude model handles which request — Opus for hard reasoning, Sonnet for daily work, Haiku for fast cheap tasks. Anthropic ships some built-in routing, but the most leveraged users build their own routing rules on top to optimize cost and latency.

    Built-in routing, manual model selection, and the third-party router landscape below.

    “Claude Code router” is a phrase that means different things to different people in 2026, and the differences matter for what you should actually build or buy.

    It can mean (1) Anthropic’s built-in logic that picks a model when you do not specify one, (2) third-party tools that route between Anthropic models and other LLMs through one Claude Code interface, or (3) custom routing rules you build yourself to match models to tasks. This guide walks through each, when each makes sense, and the trade-offs.

    Why Routing Matters in the First Place

    Claude is not one model. It is a family. As of 2026 the production tiers are roughly:

    • Claude Opus 4.7 — $5/$25 per million tokens. Current flagship. Best for hard, ambiguous, multi-step reasoning and agentic coding.
    • Claude Sonnet 4.6 — $3/$15 per million tokens. The workhorse. Within ~1 point of Opus on coding benchmarks at 40% less cost. Right answer for 80% of daily work.
    • Claude Haiku 4.5 — $1/$5 per million tokens. Fast and cheap. Right answer for high-volume formulaic tasks: classification, extraction, formatting, routing, simple Q&A.

    Output costs 5x input across all three tiers. Prompt caching cuts cached input costs by ~90%. Batch API cuts everything by 50% if you can wait up to 24 hours.
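
    A minimal sketch of how those three levers combine, using the list prices quoted above. The names `PRICES` and `request_cost` and the token counts in the example are illustrative. The sketch assumes cached reads are billed at 10% of the input rate, ignores any cache-write surcharge, and assumes the batch discount stacks with caching, which is worth confirming against the current pricing docs.

```python
# Illustrative per-request cost estimate using the list prices quoted above.
# PRICES, request_cost, and the example token counts are assumptions for this sketch.
PRICES = {  # USD per million tokens: (input, output)
    "opus": (5.00, 25.00),
    "sonnet": (3.00, 15.00),
    "haiku": (1.00, 5.00),
}

CACHED_READ_FACTOR = 0.10  # assumed: cached input billed at ~10% of the input rate
BATCH_FACTOR = 0.50        # Batch API halves the bill


def request_cost(tier, input_tokens, output_tokens, cached_tokens=0, batch=False):
    """Return the estimated USD cost of one request.

    cached_tokens is the portion of input_tokens served from the prompt cache.
    Cache-write surcharges are ignored in this sketch.
    """
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    in_rate, out_rate = PRICES[tier]
    fresh = input_tokens - cached_tokens
    cost = (
        fresh * in_rate
        + cached_tokens * in_rate * CACHED_READ_FACTOR
        + output_tokens * out_rate
    ) / 1_000_000
    return cost * BATCH_FACTOR if batch else cost


if __name__ == "__main__":
    plain = request_cost("sonnet", 50_000, 2_000)
    cached = request_cost("sonnet", 50_000, 2_000, cached_tokens=45_000)
    print(f"uncached: ${plain:.4f}  cached: ${cached:.4f}")
```

    On a 50,000-token Sonnet prompt with 45,000 tokens cached, most of the remaining cost is output, which is why caching matters most for long, repeated context and short replies.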

    Using Opus for everything is wasteful. Using Haiku for everything is sloppy. Routing — matching the model to the task — is how you get the best output for the lowest cost. For someone running Claude Code several hours a day, intelligent routing is the difference between a $100/month Max bill and a $1,000/month API bill for the same work.

    Anthropic’s Built-In Claude Code Routing

    When you launch Claude Code without specifying a model, it picks a default. As of 2026 the default for most users is Sonnet, with Opus accessible via flags or settings, and Haiku used internally for some sub-tasks like tool selection and simple file operations.

    You can override the default at session start:

    # Start Claude Code with Opus for a tough refactor
    claude --model claude-opus-4-7   # current flagship
    
    # Or set it in your settings.json (JSON does not allow inline comments)
    {
      "model": "claude-sonnet-4-6"
    }

    Anthropic also routes internally: when Claude Code uses sub-agents for parallel work, it can route those sub-agents to lighter models automatically. This routing is opaque to you and generally well-tuned. You usually do not need to think about it.

    Manual Model Selection: The 80/20 Approach

    For most users, manual routing beats automatic routing. The rule:

    • Sonnet by default. Daily work, content drafts, code edits, file operations, debugging.
    • Opus when you hit a wall. Architectural decisions, hard refactors, ambiguous specs, anything that requires real reasoning.
    • Haiku for batch. Classification, taxonomy assignment, metadata generation, SEO meta descriptions, anything formulaic at volume.

    This 80/20 split is achievable with two or three commands and zero infrastructure. It is the right starting point.
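
    If you launch sessions from a script, the rule can be written down as a lookup table. This is only a sketch: the task labels and the helper names `pick_model` and `claude_command` are hypothetical, and the model IDs are the ones used elsewhere in this article.

```python
# Hypothetical task labels mapped to the model IDs used in this article.
DEFAULT_MODEL = "claude-sonnet-4-6"  # Sonnet by default

MODEL_BY_TASK = {
    # Opus when you hit a wall
    "architecture": "claude-opus-4-7",
    "hard_refactor": "claude-opus-4-7",
    "ambiguous_spec": "claude-opus-4-7",
    # Haiku for batch
    "classification": "claude-haiku-4-5",
    "taxonomy": "claude-haiku-4-5",
    "meta_description": "claude-haiku-4-5",
}


def pick_model(task):
    """Return the model ID for a task label, falling back to the default."""
    return MODEL_BY_TASK.get(task, DEFAULT_MODEL)


def claude_command(task):
    """Build the argv for launching Claude Code with the chosen model."""
    return ["claude", "--model", pick_model(task)]


if __name__ == "__main__":
    for task in ("debugging", "hard_refactor", "taxonomy"):
        print(task, "->", " ".join(claude_command(task)))
```

    Anything not listed falls through to Sonnet, which keeps the table short and matches the "Sonnet by default" rule.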

    Third-Party Claude Code Routers

    A small ecosystem has emerged around third-party routers that sit between Claude Code and the model layer. The two most common patterns:

    OpenRouter and Multi-Provider Routers

    OpenRouter is the most widely used third-party router. You point Claude Code at OpenRouter as the API endpoint, and OpenRouter routes your requests to Claude (or to GPT, Gemini, DeepSeek, Llama, etc.). Why use it:

    • You want fallback when Anthropic has an outage.
    • You want to mix Claude with other models on a per-task basis.
    • You want a single billing surface across providers.
    • You want BYOK (bring your own key) routing where you mix your own provider keys.

    The trade-off: latency adds a few hundred milliseconds per call, and some Anthropic-specific features (prompt caching, certain beta tools) work less smoothly through the proxy.

    Custom In-House Routers

    Larger teams build their own routing layer. A typical pattern: a small Python or TypeScript service that inspects the incoming request, applies routing rules (length thresholds, task type detection, cost ceilings), picks a model, and forwards the call to Anthropic.

    This is overkill for most individuals. It pays off when you have:

    • Strict cost controls that need enforcement, not suggestion
    • Multi-tenant usage where different customers get different models
    • Compliance requirements that need request inspection and logging
    • A real engineering team that can maintain the service
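
    A rough sketch of the decision step in such a service, assuming the request has already been labeled with a task type and a token count. The task labels, the `RoutingRequest` type, and the `route` function are hypothetical, and the forwarding call to Anthropic is left out on purpose.

```python
from dataclasses import dataclass

OPUS = "claude-opus-4-7"
SONNET = "claude-sonnet-4-6"
HAIKU = "claude-haiku-4-5"

# USD per million input tokens, from the tier list earlier in this article.
INPUT_RATE = {OPUS: 5.00, SONNET: 3.00, HAIKU: 1.00}

# Hypothetical task labels; a real service would detect these from the request.
FORMULAIC_TASKS = {"classification", "extraction", "formatting"}
REASONING_TASKS = {"code_review", "architecture"}


@dataclass(frozen=True)
class RoutingRequest:
    task_type: str
    input_tokens: int
    max_input_cost_usd: float  # per-request ceiling set by policy


def input_cost(model, tokens):
    """Input-side cost in USD for the given model and token count."""
    return tokens * INPUT_RATE[model] / 1_000_000


def route(req):
    """Pick a model by task type, then downgrade until the cost ceiling is met."""
    if req.task_type in FORMULAIC_TASKS:
        choice = HAIKU
    elif req.task_type in REASONING_TASKS:
        choice = OPUS
    else:
        choice = SONNET

    # Enforce the ceiling: step down one tier at a time from the first choice.
    ladder = [OPUS, SONNET, HAIKU]
    for model in ladder[ladder.index(choice):]:
        if input_cost(model, req.input_tokens) <= req.max_input_cost_usd:
            return model
    raise ValueError("request exceeds the cost ceiling even on the cheapest tier")
```

    Raising an error when nothing fits is what makes the ceiling enforcement and not suggestion: the caller has to shrink the request or get the limit changed.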

    Routing Rules That Actually Work

    If you are going to invest in any routing logic, these are the rules that pay back:

    1. By task type. Code review → Opus. New code generation → Sonnet. Format conversion → Haiku.
    2. By input length. Long context (40K+ tokens) where you need careful reasoning → Opus. Long context where you need extraction → Sonnet with prompt caching.
    3. By cost ceiling. Anything over a threshold token count gets a hard cap or downgrade.
    4. By time of day. Overnight batch jobs route to cheaper models. Interactive daytime work routes to your preferred quality tier.
    5. By failure recovery. If a Sonnet call returns a low-confidence or refused response, retry once with Opus before giving up.

    Most of these rules are five lines of code each. The discipline is more about deciding the rules than implementing them.
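
    As an illustration of how small these rules are, here is a sketch of rules 4 and 5. The overnight window (22:00 to 06:00), the function names, and the caller-supplied `call_model` and `is_acceptable` callables are all assumptions; how you detect a low-confidence or refused response is up to you.

```python
from datetime import time

SONNET = "claude-sonnet-4-6"
OPUS = "claude-opus-4-7"
HAIKU = "claude-haiku-4-5"


def model_for_time(now, day_model=SONNET, night_model=HAIKU):
    """Rule 4: overnight work goes to the cheaper tier. The hours are an assumption."""
    overnight = now >= time(22, 0) or now < time(6, 0)
    return night_model if overnight else day_model


def call_with_escalation(call_model, prompt, is_acceptable, first=SONNET, fallback=OPUS):
    """Rule 5: try the cheaper model, then retry once on the stronger one.

    call_model(model, prompt) -> str and is_acceptable(text) -> bool
    are supplied by the caller; this function only sequences them.
    Returns (model_used, reply).
    """
    reply = call_model(first, prompt)
    if is_acceptable(reply):
        return first, reply
    return fallback, call_model(fallback, prompt)
```

    Passing the model call in as a function keeps the rule testable without network access and lets the same logic sit in front of any client.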

    What Anthropic Does Not Yet Ship

    As of writing, Anthropic does not ship a built-in “route this query to the right model” intelligence layer in Claude Code. The model you set is the model you get for the session, with the exception of internal sub-agent routing.

    This is likely to change. The shape of where Claude Code is going — more autonomy, longer sessions, more parallel agents — implies more sophisticated internal routing. For now, the routing decisions worth making are the ones you make yourself.

    Costs: What Routing Actually Saves

    Concrete example. An operator running a Claude Code content pipeline that:

    • Drafts articles (Sonnet): 8,000 input + 4,000 output tokens per article
    • Generates SEO meta and FAQ (Haiku): 2,000 + 500 tokens
    • Reviews and edits (Opus): 10,000 + 2,000 tokens for trickier articles

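    At the list prices above, running everything on Opus costs about 1.4 times as much as the routed pipeline when every article gets an Opus review ($0.26 versus $0.19 per article), and about 1.8 times as much for articles that skip review. Running everything on Sonnet is cheaper than Opus, but it pays Sonnet rates for metadata that Haiku produces at similar quality. Routing by task type saves real money, roughly 30-45% versus all-Opus in this example, without sacrificing output quality.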
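
    The per-article arithmetic for this example can be checked with a few lines. This sketch assumes every article runs all three steps and uses the list prices quoted earlier in this article; the names `STEPS`, `step_cost`, and `pipeline_cost` are illustrative.

```python
# Per-article cost of the example pipeline at the list prices quoted in this article.
# Assumes every article runs all three steps; names here are illustrative.
PRICES = {  # USD per million tokens: (input, output)
    "opus": (5.00, 25.00),
    "sonnet": (3.00, 15.00),
    "haiku": (1.00, 5.00),
}

# (step, tier used when routing, input tokens, output tokens)
STEPS = [
    ("draft", "sonnet", 8_000, 4_000),
    ("meta_and_faq", "haiku", 2_000, 500),
    ("review", "opus", 10_000, 2_000),
]


def step_cost(tier, input_tokens, output_tokens):
    """USD cost of one step on the given tier."""
    in_rate, out_rate = PRICES[tier]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def pipeline_cost(force_tier=None):
    """Cost of one article; force_tier runs every step on a single model."""
    return sum(
        step_cost(force_tier or tier, tokens_in, tokens_out)
        for _, tier, tokens_in, tokens_out in STEPS
    )


if __name__ == "__main__":
    routed = pipeline_cost()
    all_opus = pipeline_cost("opus")
    print(f"routed:   ${routed:.4f}")
    print(f"all-Opus: ${all_opus:.4f}")
    print(f"saving:   {1 - routed / all_opus:.0%}")
```

    With these token counts the routed pipeline comes to about $0.19 per article and all-Opus to about $0.26. More than half of the routed cost is the Opus review step, so the saving grows when review only runs on some articles.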

    When Not to Build a Router

    Routing is leverage when you operate at volume. If you run Claude Code casually — a couple of hours a day, one task at a time — you do not need a router. You need to learn the three models well enough to pick the right one by feel. Build a router only when (a) cost is a real line item in your budget, (b) you are running multiple workflows that have genuinely different model needs, or (c) you want fallback infrastructure for resilience.

    Frequently Asked Questions

    What is a Claude Code router?

    A Claude Code router is any layer — Anthropic’s built-in defaults, a third-party tool like OpenRouter, or custom code — that decides which Claude model handles a given request.

    Does Claude Code have built-in routing?

    Partially. Claude Code picks a default model (Sonnet) and routes internal sub-agent tasks to lighter models. It does not automatically promote your main session to Opus when a task gets hard.

    What’s the difference between OpenRouter and a custom router?

    OpenRouter is a hosted multi-provider gateway with billing and fallback built in. A custom router is something you build to enforce your own rules. OpenRouter is right for most teams. Custom routers are right for teams with strict requirements.

    Should I use OpenRouter with Claude Code?

    Useful if you want fallback, multi-provider mixing, or unified billing. Less useful if you only use Claude and want Anthropic-specific features like prompt caching to work optimally.

    How do I pick the right Claude model for a task?

    Default Sonnet. Opus for hard reasoning, architectural decisions, ambiguous specs. Haiku for high-volume formulaic tasks (classification, formatting, metadata).

    How much can routing save me?

    For volume users, 40-60% versus running everything on Opus, with no measurable drop in output quality if the routing rules are sensible.

    Is there a cost to routing through OpenRouter?

    OpenRouter adds a small markup on token pricing in exchange for the routing and aggregation features. For most users this is acceptable; for very high volume, going direct to Anthropic is cheaper.

    The Bottom Line

    Claude Code routing is leverage when you operate at volume and a distraction when you do not. Start by learning the three Claude models by feel and picking manually. Add OpenRouter if you want fallback. Build a custom router only when cost or compliance actually justifies the engineering. The router is not the goal; the right model on the right task is the goal.

  • Anthropic API Key: How to Get One, Set Up Billing & Keep It Safe (2026)

    Anthropic API Key: How to Get One, Set Up Billing & Keep It Safe (2026)

    Published: May 25, 2026 | Last fact-check: May 25, 2026 against Anthropic Console behavior and current API key format

    Quick Answer

    Get an Anthropic API key at console.anthropic.com → API Keys → Create Key. The key starts with sk-ant- and is shown once — copy and store it in a password manager immediately. Add billing credits before making API calls.

    Full setup, security, and usage walkthrough below.

    An Anthropic API key is the credential that lets your application, script, or tool call Claude programmatically. Whether you are wiring Claude into Claude Code, building an internal agent, or integrating Claude into a SaaS product, the API key is the first step. This walkthrough covers how to create one, how to keep it safe, and the most common mistakes people make in the first 48 hours after they have it.

    Anthropic API Pricing Tiers (June 2026)

    Model | API ID | Input (per MTok) | Output (per MTok) | Context
    Claude Opus 4.8 | claude-opus-4-8 | $5.00 | $25.00 | 1M tokens
    Claude Sonnet 4.6 | claude-sonnet-4-6 | $3.00 | $15.00 | 1M tokens
    Claude Haiku 4.5 | claude-haiku-4-5-20251001 | $1.00 | $5.00 | 200K tokens

    All models support 50% Batch API discount for non-real-time requests. Prices verified June 9, 2026.

    What an Anthropic API Key Is (and Isn’t)

    The Anthropic API key authenticates requests to the Anthropic Messages API. It identifies which workspace and organization is making the call, what model permissions it has, and where to bill the token usage.

    What an API key is not: a login. You cannot use an API key to sign into claude.ai. The web interface and the API are separate billing surfaces. Your Pro or Max subscription does not grant API credit by default; API usage requires its own billing setup.

    How to Get an Anthropic API Key

    The process takes three minutes if you already have an Anthropic account, ten if you do not.

    1. Go to console.anthropic.com. This is the Claude Console (sometimes called the Anthropic Console), the developer dashboard separate from the consumer claude.ai interface.
    2. Sign in or create an account. If you already use claude.ai, your login works here. New accounts require email verification.
    3. Click “API Keys” in the left sidebar. You may need to expand the navigation under your workspace name first.
    4. Click “Create Key.” Give the key a descriptive name (e.g., “Claude Code Laptop,” “Production Backend,” “Local Dev”). The name is for your reference only.
    5. Copy the key immediately. Anthropic shows the full key exactly once. After you close the modal, you cannot retrieve it — only revoke it and create a new one.
    6. Store it in a password manager or secret vault. 1Password, Bitwarden, AWS Secrets Manager, GCP Secret Manager — anywhere except a text file on your desktop or a committed .env in a public repo.

    Adding Billing Before You Can Use the Key

    A common surprise: a freshly created API key cannot make calls until you add a payment method and credits to your Anthropic account. The key exists, but every request returns a billing error.

    To add billing:

    1. In the Claude Console, click “Billing” or “Plans & Billing” in the left sidebar.
    2. Add a payment method (credit card; Anthropic also supports invoicing for enterprise).
    3. Either pre-purchase API credits or enable auto-recharge. Most users enable auto-recharge with a low threshold to avoid hitting empty mid-job.
    4. Set a monthly usage limit if you want a safety cap.

    Once billing is set up, your API key works.

    Anthropic API Key Format

    An Anthropic API key starts with the prefix sk-ant- followed by a long alphanumeric string. The full key is roughly 100 characters. If your key does not start with sk-ant-, you have copied something incomplete.

    Different key types exist:

    • Live keys (sk-ant-api...): Production calls, real billing.
    • Admin keys (sk-ant-admin...): Workspace admin operations, not for inference calls.

    Most developers only need a live key.
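
    A quick prefix check catches the usual copy-paste mistakes before a key reaches your config. This is a sketch only: it looks at the prefixes described above and nothing else, so it cannot tell you whether a key is valid or active. The function names are hypothetical.

```python
def key_kind(key):
    """Classify a key by prefix only; this does not check that the key is valid."""
    key = key.strip()
    if key.startswith("sk-ant-admin"):
        return "admin"
    if key.startswith("sk-ant-api"):
        return "live"
    if key.startswith("sk-ant-"):
        return "unknown-anthropic"
    return "not-anthropic"


def masked(key, visible=4):
    """Return a display-safe form: the first 10 characters and the last few."""
    key = key.strip()
    if len(key) <= 10 + visible:
        return "*" * len(key)
    return f"{key[:10]}...{key[-visible:]}"


if __name__ == "__main__":
    example = "sk-ant-api-EXAMPLE-NOT-A-REAL-KEY"  # placeholder, not a real key
    print(key_kind(example), masked(example))
```

    Use the masked form anywhere a key might be printed, such as startup banners or debug output.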

    Which Claude Models the API Key Works With

    A standard live API key gives you access to the current generation of Claude models:

    • Claude Opus 4.8 (claude-opus-4-8) — current flagship, released April 16 2026. $5/$25 per million tokens.
    • Claude Sonnet 4.6 (claude-sonnet-4-6) — released February 17 2026. $3/$15 per million tokens. The production default for most workloads.
    • Claude Haiku 4.5 (claude-haiku-4-5) — released October 15 2025. $1/$5 per million tokens. Fast and cheap for high-volume work.

    Earlier model versions (Sonnet 4, Opus 4.6, Haiku 3.5, etc.) are still callable by their specific snapshot IDs until Anthropic announces deprecation. Check the deprecation timeline in the Claude Console for any model you depend on in production.

    How to Use the API Key

    You pass the key in the x-api-key header on every request to the Messages API:

    # Other current model IDs: claude-sonnet-4-6, claude-haiku-4-5
    curl https://api.anthropic.com/v1/messages \
      --header "x-api-key: $ANTHROPIC_API_KEY" \
      --header "anthropic-version: 2023-06-01" \
      --header "content-type: application/json" \
      --data '{
        "model": "claude-opus-4-8",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": "Hello"}]
      }'

    In Python or Node.js, the official SDKs read ANTHROPIC_API_KEY from your environment automatically. You should never hardcode the key in source code.
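
    For readers who want to see the same request without the SDK, here is a standard-library sketch that mirrors the curl call: same URL, same three headers, same JSON body. The helper names `build_request` and `send` are illustrative, and the key is read from the environment, never from the source file.

```python
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"


def build_request(prompt, model="claude-sonnet-4-6", max_tokens=1024, api_key=None):
    """Build (but do not send) a Messages API request mirroring the curl call."""
    key = api_key or os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is not set")
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response.

    Needs network access and a funded account, so it is not called here.
    """
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

    Keeping the build and send steps separate makes it easy to inspect or log the request (with the key masked) before anything goes over the wire.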

    Security: How to Not Leak Your Key

    Anthropic API keys leak constantly. Most leaks happen the same way:

    1. Committing the key to a public GitHub repo. The single most common leak. GitHub scans for known credential patterns and notifies Anthropic; your key gets auto-revoked within minutes. You will know because your calls suddenly start failing.
    2. Pasting the key into a shared chat or document. Anyone with access becomes a credential holder.
    3. Putting the key in client-side JavaScript. A browser app shipping its API key to users is giving the key away. Always proxy through a backend.
    4. Logging the key. Any logging system that captures HTTP headers can leak the key. Mask sensitive headers in your logger config.

    The good rule: treat your API key like a credit card number, because that’s what it functions as.
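
    For the logging case in particular, a filter that scrubs anything shaped like a key is cheap insurance. The sketch below uses Python's standard `logging.Filter`; the regular expression's character class is an assumption about what a key can contain, and the class name is illustrative.

```python
import logging
import re

# Matches anything shaped like an Anthropic key; the character class is an assumption.
KEY_PATTERN = re.compile(r"sk-ant-[A-Za-z0-9_\-]+")


class RedactAnthropicKeys(logging.Filter):
    """Replace key-shaped substrings in log messages before they are emitted."""

    def filter(self, record):
        # Format the message first so keys passed as arguments are caught too.
        record.msg = KEY_PATTERN.sub("sk-ant-[REDACTED]", record.getMessage())
        record.args = ()
        return True  # keep the record, just with the key removed


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(RedactAnthropicKeys())
    log = logging.getLogger("app")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("outgoing headers: x-api-key=%s", "sk-ant-api-EXAMPLE-NOT-A-REAL-KEY")
```

    Attaching the filter to the handler means it applies to every logger that writes through that handler, including third-party libraries.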

    Rotating an Anthropic API Key

    You should rotate keys quarterly at minimum, and immediately if a key is suspected compromised. Rotation in the Claude Console:

    1. Go to API Keys.
    2. Create a new key with a fresh name (e.g., “Claude Code Laptop 2026 Q3”).
    3. Update your application’s environment variable or secret manager to use the new key.
    4. Verify the new key works.
    5. Revoke the old key.

    The five-minute rotation is far cheaper than dealing with a leaked key that was used by an attacker for hours before you noticed.

    Workspace and Organization Keys

    Anthropic accounts are organized as: Organization → Workspaces → API Keys. Most individuals only use one of each. Teams use multiple workspaces to separate environments (production, staging, dev) or projects.

    Each key belongs to one workspace. Billing rolls up to the organization. If you need separate billing visibility per project, separate workspaces are the lever.

    Monitoring API Key Usage

    The Claude Console shows per-key usage in the “Usage” section. You can see:

    • Token spend per key per day
    • Model breakdown (Opus, Sonnet, Haiku usage)
    • Input vs output token split
    • Cache usage (if you have prompt caching enabled)

    Set up usage alerts in Billing. The Anthropic console can email you when daily or monthly spend crosses a threshold. This is the cheapest insurance against a runaway loop or compromised key.

    Frequently Asked Questions

    How do I get an Anthropic API key?

    Sign in to console.anthropic.com, open API Keys in the sidebar, click Create Key, name it, and copy the key immediately. You cannot retrieve the full key after closing the creation modal.

    Is the Anthropic API key free?

    The key itself is free to generate. Using it costs money — Anthropic bills per token at the API pricing in effect. You must add billing credits before the key works.

    Does my Claude Pro or Max subscription include API credits?

    No. Pro and Max subscriptions cover the chat interface and Claude Code (with usage caps). API usage is billed separately against your Anthropic account.

    What does an Anthropic API key start with?

    Live API keys start with sk-ant-api. Admin keys start with sk-ant-admin. The key is roughly 100 characters long.

    What happens if my Anthropic API key gets leaked?

    Anyone with the key can use it to make API calls billed to your account until the key is revoked. If you suspect a leak, revoke immediately in the Claude Console and check Usage for any suspicious activity.

    Can I use the same API key for Claude Code and my own app?

    You can, but you should not. Use separate keys per environment (Claude Code Laptop, Production Backend, Local Dev). Separate keys make revocation surgical instead of catastrophic.

    Where should I store my Anthropic API key?

    In a password manager (1Password, Bitwarden) for personal use, or in a secret manager (AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault) for production. Never commit it to a repo or hardcode it in source.

    How do I rotate an Anthropic API key?

    Create a new key in the Claude Console, update your application to use the new key, verify it works, then revoke the old key. Rotate quarterly as a baseline.

    The Bottom Line

    Getting an Anthropic API key is a three-minute process. Keeping it safe is a discipline. Use a password manager, rotate quarterly, never put the key in client-side code, and set usage alerts in the Claude Console. Treat the key as production infrastructure, not a developer toy, and it will serve you for years without incident.


    Frequently Asked Questions

    How do I get an Anthropic API key?

    Go to console.anthropic.com, sign in or create an account, then navigate to Settings > API Keys. Click ‘Create Key’, give it a name, and copy the key immediately — it is only shown once. You’ll need to add a credit card and funds to your account before making API calls.

    Is there a free tier for the Anthropic API?

    Anthropic does not offer a persistent free tier for the API. New accounts may receive a small initial credit to test the API. After that, all usage is billed at standard token rates. The free tier of claude.ai (the chat interface) is separate from API access.

    How much does the Anthropic API cost?

    As of June 2026: Claude Haiku 4.5 costs $1 input / $5 output per million tokens. Claude Sonnet 4.6 costs $3 input / $15 output per million tokens. Claude Opus 4.8 costs $5 input / $25 output per million tokens. The Batch API offers 50% off for non-real-time workloads.

    How do I keep my Anthropic API key secure?

    Never commit API keys to version control. Store them in environment variables or a secrets manager (AWS Secrets Manager, GCP Secret Manager, Vault). Use separate keys per application so you can rotate or revoke them independently. Set spending limits in the Anthropic console to cap accidental runaway costs.

    What happens if my Anthropic API key is compromised?

    Go to console.anthropic.com > Settings > API Keys immediately and click Revoke next to the compromised key. Create a new key and rotate it into your applications. Review your usage logs for unexpected spend. Anthropic will not refund charges made with a compromised key unless you contact support promptly.

    Can I use my Anthropic API key with Claude Code and Claude Cowork?

    Claude Code (the CLI tool) uses your API key when you run it outside a claude.ai subscription context. Claude Cowork (the desktop app) uses your subscription, not a raw API key. For self-hosted integrations, scripts, and Agent SDK workflows, your API key from console.anthropic.com is what you need.

  • Claude Code Pricing in 2026: Pro vs Max vs API Costs Explained

    Claude Code Pricing in 2026: Pro vs Max vs API Costs Explained

    Published: June 9, 2026 | Last fact-check: June 10, 2026 against Anthropic’s pricing page. Rates change — always verify at anthropic.com/pricing before commitments.

    Quick Answer

    Claude Code is included with Pro ($20/month), Max 5x ($100/month), Max 20x ($200/month), and Team Premium seats ($100/seat annual, 5-seat minimum). Team Standard does NOT include Claude Code. API-only billing is also available: Sonnet 4.6 at $3/$15 per million tokens, Opus 4.8 at $5/$25, Haiku 4.5 at $1/$5. Most individual developers get the best value from Max 5x at $100/month.

    Full pricing breakdown and which tier fits which user below.

    Claude Code pricing in 2026 is structured around two paths: subscription plans (Pro, Max, Team) that include Claude Code with usage caps, and API-only access where you pay Anthropic per token used. Most users choose a subscription. Heavy enterprise users sometimes choose the API path, and some use both.

    This guide breaks down what each tier actually costs, what you get, and which path makes sense for which kind of user. The price ceiling sits at the Max $200/month plan for individuals, and at custom enterprise contracts above that.

    Claude Code Subscription Plans (2026)

    Claude Code pricing: model cost breakdown (June 2026)

    Model | Input $/MTok | Output $/MTok | Context | Best for in Claude Code
    Claude Fable 5 | $10 | $50 | 1M tokens | Most demanding reasoning, maximum capability
    Claude Opus 4.8 | $5 | $25 | 1M tokens | Complex refactors, long-horizon agentic coding
    Claude Sonnet 4.6 | $3 | $15 | 1M tokens | Daily development; best cost/capability ratio
    Claude Haiku 4.5 | $1 | $5 | 200k tokens | Fast lookups, simple completions, cost control

    Prices from platform.claude.com as of June 10, 2026. Batch API reduces costs by 50%. Prompt caching can reduce input costs significantly for repeated context. Claude Code bills through your Anthropic API account.

    Claude Code subscription vs API billing

    Option | How billed | Best for
    Claude Max plan | Flat monthly ($100 or $200) | Heavy daily Claude Code users who want predictable costs
    API pay-as-you-go | Per token used | Variable usage, cost-optimized workflows, teams
    API with caching | Per token (cached inputs discounted) | Long system prompts or repeated context (e.g., large codebase)

    Four of Anthropic’s self-serve plans include Claude Code: Pro, Max 5x, Max 20x, and Team Premium. The full lineup:

    Plan | Price | Best For
    Free | $0 | Trying Claude in the browser; not Claude Code
    Pro | $20/month ($17/month annual) | Light Claude Code use; focused coding sessions
    Max 5x | $100/month (monthly only) | Daily Claude Code users; solo devs and operators
    Max 20x | $200/month (monthly only) | Heavy users; multi-agent workflows; long sessions
    Team Standard | $25/seat/mo ($20 annual, 5-seat minimum) | Small teams; collaboration but NO Claude Code access
    Team Premium | $100/seat/month (annual, 5-seat minimum) | Engineering teams; required for Claude Code on Team plans
    Enterprise | Custom | Larger orgs with security/compliance needs

    Critical note for Team customers: Team Standard does NOT include Claude Code. You need Team Premium seats ($100/seat annual, $125/seat monthly) for any developer who needs Claude Code access. You can mix Standard and Premium seats on one team — useful when only part of your org codes.

    What Each Tier Actually Includes

    Pro: $20/month

    Pro gives you access to Claude.ai (the chat interface), Claude Desktop, and Claude Code via the CLI. Usage limits are tighter than most committed users prefer — running multi-file refactors or long agent sessions hits the cap quickly. Pro is reasonable as a starting point. It is not adequate for serious daily Claude Code work.

    Max 5x: $100/month

    The 5x designation refers to the rough multiplier on usage limits compared to Pro. For most individual developers who use Claude Code several hours per day, this tier provides enough headroom to work without running into limits constantly. It is the sweet spot for solo operators and small consultancies.

    Max 20x: $200/month

    20x headroom for users who run Claude Code as an always-on agent — overnight jobs, batch processing, multi-hour orchestration. If you find yourself routinely worried about hitting limits on the 5x tier, the 20x tier removes that worry.

    Team Standard: $20-25/seat/month (5-seat minimum)

    Team Standard gives a small group shared admin, SSO, SCIM, shared projects, usage analytics, and centralized billing. It is collaboration infrastructure. Crucially, Team Standard does not include Claude Code access — any developer who needs Claude Code must be on a Premium seat.

    Team Premium: $100-125/seat/month (5-seat minimum)

    Team Premium adds Claude Code to the Team Standard feature set. At $100/seat annual, the per-seat economics match individual Max 5x ($100/month) while adding team management. For an engineering team of 5+ developers using Claude Code daily, Team Premium is a straight upgrade over individual Max subscriptions. You can mix Standard and Premium seats on one team — non-coding teammates can sit on Standard while developers get Premium.

    Claude Code via API: Pay-Per-Token

    The alternative to a subscription is using Claude Code with API credentials directly. You provide an Anthropic API key, and your token usage gets billed against your Anthropic account at API rates.

    API pricing (per million tokens, May 2026 standard rates):

    • Claude Haiku 4.5: $1.00 input / $5.00 output — cheapest current-generation model, ideal for classification, routing, summarization at volume
    • Claude Sonnet 4.6: $3.00 input / $15.00 output — best price-to-quality ratio; the production default
    • Claude Opus 4.8: $5.00 input / $25.00 output — current flagship; complex reasoning and agentic coding
    • Prompt caching: cached reads at 10% of standard input rate — up to 90% savings on repeated context
    • Batch API: 50% off both input and output if you can wait up to 24 hours for results
    • Output:input ratio: consistently 5x across all current-generation models

    One catch with Opus 4.8: the list price is unchanged from the previous Opus release, but Anthropic shipped a new tokenizer that can produce up to 35% more tokens for the same input text. Your effective bill per request can go up even though the rate card did not. Worth knowing before you switch your default model.

    Always check anthropic.com/pricing for current rates — these change.

    For heavy users, the API path can be cheaper than Max, but you give up the predictability of a flat monthly fee. For lighter users, the API path is almost always more expensive than Pro.
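
    To put a number on that trade-off, here is a back-of-the-envelope sketch that converts a workload into monthly API dollars and finds the volume at which it matches a flat plan price. The function names and the example workload are assumptions; it applies no caching or batch discount, and it compares dollars only, so it says nothing about whether a subscription's usage cap would cover the same workload.

```python
def monthly_api_cost(requests_per_day, input_tokens, output_tokens,
                     in_rate, out_rate, days=30):
    """USD per month at per-million-token rates, with no caching or batch discount."""
    per_request = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return per_request * requests_per_day * days


def breakeven_requests_per_day(plan_price, input_tokens, output_tokens,
                               in_rate, out_rate, days=30):
    """Requests per day at which API spend equals the flat plan price."""
    per_request = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return plan_price / (per_request * days)


if __name__ == "__main__":
    # Hypothetical workload: 20,000 input + 2,000 output tokens per request on Sonnet.
    cost = monthly_api_cost(10, 20_000, 2_000, in_rate=3.00, out_rate=15.00)
    even = breakeven_requests_per_day(100, 20_000, 2_000, in_rate=3.00, out_rate=15.00)
    print(f"10 requests/day: ${cost:.2f}/month; break-even vs $100: {even:.0f} requests/day")
```

    In this hypothetical workload each request costs about $0.09 at Sonnet rates, so ten requests a day is about $27 a month and the $100 Max 5x price is matched at roughly 37 requests a day.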

    How to Decide: Subscription vs API

    The decision tree is simpler than it looks.

    • You use Claude Code less than an hour a day: Pro at $20/month.
    • You use Claude Code several hours a day: Max 5x at $100/month.
    • You run Claude Code as an unattended agent or for batch work: Max 20x at $200/month, or API with prompt caching enabled.
    • You’re a team of 5+ developers: Team Premium at $100/seat/month (annual; $125 monthly), or look at Enterprise.
    • You have unpredictable spikes: API with budget alerts gives you the most control.
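
    The same decision tree, written as a function. This is a sketch: the order in which the branches are checked and the one-hour cutoff for "several hours a day" are assumptions, and the function name is illustrative.

```python
def recommend_plan(hours_per_day, unattended=False, team_size=1, spiky=False):
    """Map a usage profile to the tier suggested by the list above.

    Branch order is an assumption: team size first, then spiky usage,
    then unattended work, then hours per day.
    """
    if team_size >= 5:
        return "Team Premium ($100/seat/month annual) or Enterprise"
    if spiky:
        return "API with budget alerts"
    if unattended:
        return "Max 20x ($200/month) or API with prompt caching"
    if hours_per_day < 1:
        return "Pro ($20/month)"
    return "Max 5x ($100/month)"


if __name__ == "__main__":
    print(recommend_plan(0.5))
    print(recommend_plan(4))
    print(recommend_plan(8, unattended=True))
```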

    What’s Not Included in Subscription Plans

    Even on Max 20x, a few things still cost extra or fall outside the standard plan:

    • Anthropic API tokens for non-Claude Code use: If you build apps that call the Anthropic API directly, those tokens bill against API credits, not your Max subscription.
    • Third-party MCP servers with their own costs: Many MCP servers are free, but some integrate with paid services that bill you separately.
    • Storage and infrastructure costs: Where you actually run Claude Code (your laptop, your cloud VM) still costs whatever it costs.

    Hidden Value: Why Max Pays Back Quickly

    $100/month sounds steep until you compare it to what Claude Code replaces. For an operator running multi-step content workflows, infrastructure automation, or coding tasks that would otherwise require additional contracting hours, the Max plan typically pays back inside the first week of the month.

    One concrete example: drafting and publishing a single SEO-optimized WordPress article with full schema, taxonomy, internal linking, and AEO/GEO optimization takes a human content team 3-5 hours. Running it through a Claude Code pipeline takes 15 minutes of supervised work. The output quality difference is small; the cost difference is large.

    This is the framing that matters: Claude Code pricing is not “how much does the AI cost.” It is “how much labor does the AI replace.” On that framing, Max 5x is the cheapest line item in most knowledge-work budgets.
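
    To put rough numbers on that framing, here is a minimal sketch of the payback arithmetic. The hourly rate is a hypothetical input you supply, not a figure from this article; the 3-5 hour and 15 minute figures come from the example above.

```python
import math


def articles_to_break_even(plan_cost_usd: float,
                           hourly_rate_usd: float,
                           human_hours: float,
                           supervised_hours: float = 0.25) -> int:
    """Number of articles needed before labor savings cover the plan cost.

    hourly_rate_usd is an assumption supplied by the caller.
    supervised_hours defaults to the 15 minutes quoted above.
    """
    hours_saved = human_hours - supervised_hours
    if hours_saved <= 0 or hourly_rate_usd <= 0:
        raise ValueError("no labor savings to amortize the plan against")
    savings_per_article = hours_saved * hourly_rate_usd
    return math.ceil(plan_cost_usd / savings_per_article)


if __name__ == "__main__":
    # Hypothetical $50/hour rate, low end of the 3-5 hour range.
    print(articles_to_break_even(100, 50, human_hours=3))
```

    With a lower hourly rate or a smaller time saving the break-even count rises, so plug in your own figures before relying on the result.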

    Annual vs Monthly Billing

    Anthropic offers a discount for annual prepayment on Pro and Max tiers — generally around 20% off. If you are confident in your usage pattern, the annual prepay is the right call. If you are still evaluating, monthly gives you flexibility to change tiers as your needs shift.

    New for June 15, 2026: the Agent SDK Credit Pool (Dual-Bucket Billing)

    Starting June 15, 2026, Anthropic splits subscription usage into two buckets: interactive Claude Code sessions keep drawing from your normal plan limits, while unattended Agent SDK work (claude -p, cron jobs, CI pipelines, scripts) draws from a new monthly credit pool — Pro $20, Max 5x $100, Max 20x $200, Team Standard $20/seat, Team Premium $100/seat — with overage billed at standard API rates.

    Practical impact: if you run any headless automation on a subscription today, that usage stops counting against your interactive limits and starts metering against the credit pool. Light automation — a nightly script or two — fits comfortably inside Pro’s $20 pool; sustained agent fleets will spill into API-rate overage, at which point a dedicated API key is usually easier to manage. Full mechanics, worked examples, and what to do before the cutover: Claude Agent SDK dual-bucket billing — what changes June 15, 2026. To model your own numbers, use the interactive calculator on our main Claude pricing page.
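
    The pool-then-overage mechanics described above can be sketched as follows. This assumes headless usage is metered in dollars at API rates against the pool amounts listed above, and that Team pools apply per seat; the function and dictionary names are illustrative, not an Anthropic API.

```python
# Monthly Agent SDK credit pools in USD, as listed in this section.
CREDIT_POOL_USD = {
    "Pro": 20.0,
    "Max 5x": 100.0,
    "Max 20x": 200.0,
    "Team Standard": 20.0,   # per seat
    "Team Premium": 100.0,   # per seat
}


def headless_overage(plan: str, metered_usd: float) -> float:
    """Return the USD amount that spills past the plan's credit pool.

    metered_usd is the month's unattended usage valued at API rates
    (an assumption about how the pool is drawn down).
    """
    pool = CREDIT_POOL_USD[plan]
    return max(0.0, metered_usd - pool)


if __name__ == "__main__":
    print(headless_overage("Pro", 12.50))   # fits inside the pool
    print(headless_overage("Pro", 35.00))   # spills into overage
```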

    Frequently Asked Questions

    How much does Claude Code cost per month?

    Claude Code is included with Claude Pro ($20/month), Max 5x ($100/month), or Max 20x ($200/month). API-only usage is billed per token at separate rates.

    Is there a free version of Claude Code?

    No. Claude Code requires either a paid Claude subscription (Pro, Max, or Team) or API credentials with a funded account. The Claude free tier does not include Claude Code.

    What’s the difference between Max 5x and Max 20x?

    The numbers refer to roughly how much usage you get relative to Pro. Max 5x ($100/month) suits daily developers. Max 20x ($200/month) suits heavy users running agent workflows or long batch jobs.

    Can I use Claude Code with just an API key instead of a subscription?

    Yes. Claude Code accepts an Anthropic API key for authentication. You pay per-token usage at API rates instead of a flat subscription fee.

    Is Claude Code cheaper than GitHub Copilot or Cursor?

    At the entry level, Copilot ($10/month) and Cursor Pro ($20/month) cost less than Max. Per unit of output for serious work, Claude Code on Max often comes out cheaper because of how much it can do per session.

    Does Team pricing include Claude Code?

    Only Team Premium ($100/seat annual, $125/seat monthly, 5-seat minimum) includes Claude Code. Team Standard does NOT include Claude Code. You can mix Standard and Premium seats on the same team so non-coding teammates can sit on Standard while developers get Premium.

    What happens if I hit my Claude Code usage limit?

    On Pro and Max, Claude Code slows or pauses until your usage window resets (typically rolling 5-hour windows on Pro, longer reset cadences on Max), unless you have enabled extra usage billed at standard API rates. You can also upgrade tiers anytime for immediate additional capacity.

    The Bottom Line on Claude Code Pricing

    For most serious users: Max 5x at $100/month. For light users: Pro at $20/month. For heavy agent workloads: Max 20x at $200/month or API with prompt caching. The pricing is competitive with other AI coding tools, and the value relative to labor it replaces makes Max the cheapest line item on most knowledge-work budgets.


    More Claude Code Pricing Questions: Plans, Seats, and Limits

    Is Claude Code free?

    Claude Code is not free. It requires a paid subscription: Pro ($20/month), Max 5x ($100/month), Max 20x ($200/month), or Team Premium seats ($100/seat/month annual). The Free tier does not include Claude Code. API-only access is also available at standard token rates.

    What is the cheapest plan that includes Claude Code?

    Pro at $20/month is the cheapest Claude subscription that includes Claude Code. However, Pro has tighter usage limits and heavy Claude Code sessions will hit the cap quickly. For daily developer use, Max 5x at $100/month provides much more headroom.

    Does Claude Code use API tokens from my subscription?

    Claude Code usage counts against your subscription plan’s included usage, not against separate API credits. Subscription plans and API access are billed separately — a Pro subscription does not give you API credits. If you need programmatic API access alongside Claude Code, you need both.

    How does Claude Code pricing compare to GitHub Copilot?

    GitHub Copilot costs $10–$19/month for individuals. Claude Code starts at $20/month (Pro) with usage limits, or $100/month (Max 5x) for heavier use. Claude Code offers a larger context window and stronger reasoning for complex multi-file tasks; Copilot has tighter IDE integration. For pure code completion, Copilot is cheaper. For agentic coding and large-context work, Claude Code is more capable.

    Can I use Claude Code on a Team Standard plan?

    No. Team Standard ($25/seat/month annual) does not include Claude Code. Only Team Premium seats ($100/seat/month annual) include Claude Code. You can mix Standard and Premium seats on one Team plan — assign Premium only to developers who need Claude Code.

    What happens to Claude Code usage when I hit my plan limit?

    When you hit your included usage limit on Pro, Max 5x, or Max 20x, you can keep working by enabling extra usage, which is billed at standard API rates up to a spending cap you set. The cap prevents surprise overages while keeping Claude Code available for critical work beyond your plan ceiling.

    Claude Code API and Model Questions

    How much does Claude Code cost in 2026?

    It depends on how you authenticate. On a subscription, Claude Code is included with Pro ($20/month), Max 5x ($100/month), or Max 20x ($200/month). With an API key, Claude Code bills through your Anthropic API account based on which model you use. As of June 2026: Claude Opus 4.8 costs $5/$25 per million input/output tokens; Claude Sonnet 4.6 costs $3/$15 per MTok; Claude Haiku 4.5 costs $1/$5 per MTok; Claude Fable 5 (the new June 2026 flagship) costs $10/$50 per MTok. There is no standalone Claude Code subscription. Heavy users may find the Claude Max plan ($100–$200/month flat) more cost-effective than per-token billing.
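
    The per-token arithmetic is easy to sketch. The rates below are the ones quoted in this answer; the dictionary keys are plain labels chosen for this example, not official API model identifiers.

```python
# USD per million tokens as (input, output), as quoted above for June 2026.
RATES_PER_MTOK = {
    "Opus 4.8": (5.0, 25.0),
    "Sonnet 4.6": (3.0, 15.0),
    "Haiku 4.5": (1.0, 5.0),
    "Fable 5": (10.0, 50.0),
}


def api_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate pay-as-you-go cost for one workload at standard rates."""
    in_rate, out_rate = RATES_PER_MTOK[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


if __name__ == "__main__":
    # Example workload: 2M input tokens and 400K output tokens.
    for name in RATES_PER_MTOK:
        print(name, api_cost_usd(name, 2_000_000, 400_000))
```

    The estimate ignores caching and batch discounts, so it is best read as an upper bound for a given token volume.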

    What is the cheapest way to use Claude Code?

    Use Claude Haiku 4.5 ($1/$5 per MTok) for simple tasks and Claude Sonnet 4.6 ($3/$15 per MTok) for most development work. Enable prompt caching for large codebases — repeated context (like a long system prompt or frequently referenced file) is cached and billed at a significant discount. Use the Message Batches API for non-real-time work to get 50% off standard rates. Reserve Opus 4.8 or Fable 5 for tasks that genuinely require maximum capability.
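
    As a rough illustration of the batch saving, the sketch below applies the 50% discount mentioned above to a standard-rate estimate. The function name and parameters are made up for this example, and the caching discount is left out because this article does not state its size.

```python
BATCH_DISCOUNT = 0.5  # "50% off standard rates", per the paragraph above


def workload_cost_usd(input_tokens: int, output_tokens: int,
                      in_rate: float, out_rate: float,
                      batch: bool = False) -> float:
    """Cost of a workload given per-MTok rates, optionally batch-discounted."""
    standard = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return standard * (1 - BATCH_DISCOUNT) if batch else standard


if __name__ == "__main__":
    # Haiku 4.5 rates from this article: $1 input, $5 output per MTok.
    print(workload_cost_usd(10_000_000, 2_000_000, 1.0, 5.0))
    print(workload_cost_usd(10_000_000, 2_000_000, 1.0, 5.0, batch=True))
```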

    Does Claude Code have a subscription plan?

    Claude Code does not have its own standalone subscription. It is included with Claude Pro ($20/month), Claude Max ($100/month for 5x usage limits, or $200/month for 20x limits), and Team Premium seats, or it bills per token through your Anthropic API account when you authenticate with an API key. If you’re using Claude Code heavily every day, Max may be more cost-effective than pure pay-as-you-go API billing. Check platform.claude.com/docs/en/about-claude/pricing for current plan details.
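
    One way to reason about where a flat fee overtakes pay-as-you-go is to compute the token volume at which API spend would equal the fee. The sketch below does that under a stated assumption about the input/output mix; `output_share` is a hypothetical knob, and subscription limits are not expressed in tokens, so this only sizes the API side of the comparison.

```python
def breakeven_mtok(flat_fee_usd: float, in_rate: float, out_rate: float,
                   output_share: float) -> float:
    """Millions of tokens per month at which API spend equals the flat fee.

    output_share is the assumed fraction of tokens that are output (0 to 1).
    Rates are USD per million tokens.
    """
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    blended_rate = in_rate * (1 - output_share) + out_rate * output_share
    return flat_fee_usd / blended_rate


if __name__ == "__main__":
    # Max 5x fee against Sonnet 4.6 rates, assuming 20% of tokens are output.
    print(round(breakeven_mtok(100, 3.0, 15.0, 0.2), 2))
```

    Below that volume, per-token billing costs less than the flat fee at the assumed mix; above it, the flat fee is cheaper as long as the plan limits cover the work.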

    Which Claude model should I use with Claude Code?

    Claude Sonnet 4.6 is the best default for most Claude Code workflows: it offers near-Opus intelligence at 60% of the price ($3 vs $5 per input MTok, $15 vs $25 per output MTok) and supports extended thinking. Use Claude Opus 4.8 for complex multi-file refactors or architecturally difficult problems where output quality is worth the premium. Claude Fable 5 (launched June 10, 2026) is available for maximum capability tasks. Use Haiku 4.5 for fast, cheap lookups and simple completions.

    Does Claude Code support prompt caching?

    Yes. Claude Code supports Anthropic’s prompt caching feature. For workflows where you repeatedly pass the same large context — a codebase system prompt, a long CLAUDE.md file, frequently referenced documentation — prompt caching stores that context and bills repeated reads at a discounted rate. This can significantly reduce costs for projects with large persistent context. See platform.claude.com/docs/en/build-with-claude/prompt-caching for implementation details.
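
    For readers calling the API directly, the sketch below shows roughly how a Messages API request marks a large shared block as cacheable with a `cache_control` entry. It only builds the request body and makes no network call; the helper name, the `max_tokens` value, and the model placeholder are assumptions for illustration, so check the linked documentation for current details.

```python
def build_cached_request(model: str, shared_context: str, question: str) -> dict:
    """Build a Messages API request body with a cacheable system block.

    `model` is whatever model ID you use; no specific ID is assumed here.
    """
    return {
        "model": model,
        "max_tokens": 1024,  # arbitrary value for the example
        "system": [
            {
                "type": "text",
                "text": shared_context,
                # Marks this block as a cache breakpoint for later requests.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }


if __name__ == "__main__":
    body = build_cached_request("YOUR_MODEL_ID",
                                "Long project conventions go here...",
                                "Where is the retry logic defined?")
    print(body["system"][0]["cache_control"])
```

    With the official Python SDK, a body like this would typically be passed as keyword arguments to `client.messages.create`; keeping the shared block identical across requests is what lets later reads hit the cache.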

    How do I track my Claude Code API spending?

    Monitor usage at platform.claude.com — the console shows token usage and cost by model, date range, and API key. Set spending limits on your API key to cap maximum monthly spend. For teams, use separate API keys per project or environment to attribute costs. The usage dashboard updates in near-real time so you can catch runaway spend before it compounds.