Tag: Anthropic

  • Claude Opus 4.8 Feature Deep Dive: Context, Extended Thinking & Task Budgets (2026)

    Last refreshed: June 9, 2026

    Model Accuracy Note — Updated June 9, 2026

    Current flagship: Claude Opus 4.8 (claude-opus-4-8), as of April 16, 2026. Current models: Opus 4.8 · Sonnet 4.6 · Haiku 4.5. Where this article references Opus 4.6 or earlier models, those references are historical.

    Claude Opus 4.8 Key Features (June 2026)

    Feature | Detail | Use case
    Context window | 1,000,000 tokens (~750,000 words) | Full codebase analysis, long document review
    Extended thinking | Visible reasoning chain before the answer | Complex math, multi-step strategy, debugging
    Vision | Images, screenshots, diagrams | UI review, document parsing, chart analysis
    Tool use | Function calling, parallel tool calls | Agents, API integrations, data pipelines
    Computer use | Control desktop/browser via screenshots | Automation, testing, research
    Task budgets (beta) | Soft token and tool-call budgets spanning a multi-turn agentic loop | Cost control on multi-step agent work
    Batch API | Async processing at 50% off | High-volume non-real-time workloads

    What this article covers

    Three features in Opus 4.8 deserve their own explanation because they change what’s actually possible in daily work, not just what’s bigger on a benchmark chart:

    1. Task budgets (beta) — per-subtask ceilings that tame agent cost variance.
    2. The xhigh effort level — the new reasoning-control setting between high and max.
    3. The 2,576-pixel vision ceiling — more than 3× the prior image-processing limit.

    Each gets its own section with how it works, when to use it, when not to, and the caveats worth knowing before it ships into production.


    Feature 1: Task budgets (beta)

    What it is. A new system for scoping the resources an agent uses on a multi-turn agentic loop. Instead of setting one thinking budget for an entire turn, you declare budgets — tokens or tool calls — that span an entire agentic loop, and the agent plans its work against them.

    The problem it solves. Agent runs have notoriously high cost variance. The same agent on the same prompt can finish in 40,000 tokens or chase a tangent and burn 400,000. Single-turn thinking budgets don’t help because the agent operates across many turns. Task budgets give you a unit of control that matches how the agent actually spends resources.

    How the agent uses them. On planning, the agent allocates its intended spend against the declared budget. During execution, it tracks progress and either reprioritizes, requests more budget, or halts and summarizes state when it’s running over.

    Behavior note: budgets are soft, not hard. The agent is nudged to respect them, not hard-cut. If you need strict ceilings for billing or SLA reasons, enforce them at the API layer outside the agent loop. Task budgets are for behavior shaping, not hard resource limiting.
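    One way to get a provable hard cutoff is to count tokens yourself around the agent loop and stop issuing requests once a ceiling is reached. A minimal sketch, assuming each turn can report its input and output token counts (the Anthropic Python SDK exposes these as `response.usage.input_tokens` and `response.usage.output_tokens`). `HardTokenCeiling` and `run_with_ceiling` are hypothetical helpers, and the turns are stubbed so the example runs without an API key.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

# A turn stands in for one API call and returns (input_tokens, output_tokens).
Turn = Callable[[], Tuple[int, int]]


@dataclass
class HardTokenCeiling:
    """Hypothetical client-side hard cap on total tokens across one agent loop."""
    max_tokens: int
    spent: int = 0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        # Record the usage one turn reported.
        self.spent += input_tokens + output_tokens

    @property
    def exhausted(self) -> bool:
        return self.spent >= self.max_tokens


def run_with_ceiling(turns: Iterable[Turn], ceiling: HardTokenCeiling) -> int:
    """Run turns in order until they run out or the ceiling is reached; return how many ran."""
    completed = 0
    for turn in turns:
        if ceiling.exhausted:
            break  # Hard stop: the next request is never sent.
        ceiling.charge(*turn())
        completed += 1
    return completed


# Stubbed turns that each report 12k input + 8k output tokens.
ceiling = HardTokenCeiling(max_tokens=50_000)
ran = run_with_ceiling([lambda: (12_000, 8_000)] * 10, ceiling)
print(ran, ceiling.spent)  # prints: 3 60000
```

    The check runs before each request, so the loop can overshoot by at most one turn. If that matters, stop when the remaining headroom drops below the largest single turn you expect.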

    When to use them.
    – Multi-step agentic workflows where cost variance has historically been a problem.
    – Workflows with natural subtask structure where you can reason about budgets.
    – Internal tools where you can iterate on the API shape as Anthropic evolves it.

    When not to use them.
    – Simple single-turn requests. Task budgets are overhead that doesn’t pay off on short interactions.
    – Production contracts that are painful to version. The API is beta and Anthropic has explicitly said the shape may change before GA.
    – Workflows where you need provable hard cutoffs. Enforce those at the API layer, not via this feature.

    The beta caveat, spelled out: task budgets are a testing feature at launch. Parameter names and shape may change. Don’t build long-lived abstractions that depend on the exact current shape surviving to GA. Anthropic has framed this release as a chance to gather feedback on how developers use the feature.


    Feature 2: The xhigh effort level

    What it is. A new setting for reasoning effort, slotted between high and max. Opus 4.6 offered low, medium, and high below max; Opus 4.8 adds xhigh, so the full ladder is now low, medium, high, xhigh, max.

    Why it exists. Anthropic’s framing in the release materials: xhigh gives users “finer control over the tradeoff between reasoning and latency on hard problems.” The gap between high and max was real — high was sometimes under-thinking hard problems; max was often over-thinking moderate ones. The xhigh level smooths the curve by giving you a setting that’s more thoughtful than high without the runaway token spend of max.

    Anthropic’s own guidance. “When testing Opus 4.8 for coding and agentic use cases, we recommend starting with high or xhigh effort.” That’s a direct recommendation to make xhigh part of your default rotation for serious work, not a niche escalation.

    How to use it.
    – Keep high as the default for routine work.
    – Use xhigh as the new first-choice escalation when high isn’t quite getting there — or start there for coding and agentic tasks per Anthropic’s recommendation.
    – Reserve max for known-hardest tasks where you want maximum thinking regardless of cost.
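    The escalation order above can be expressed as a small ladder. This is a sketch: `attempt` and `passes_check` are hypothetical callables you would supply (for example, a request at a given effort level and a test-suite run), and the level strings, including `xhigh` for the step between high and max, are assumptions to verify against the API reference.

```python
from typing import Callable, List, Optional, Tuple

# Effort levels from lowest to highest.
EFFORT_LADDER: List[str] = ["low", "medium", "high", "xhigh", "max"]


def escalate(
    attempt: Callable[[str], str],
    passes_check: Callable[[str], bool],
    start: str = "high",
    ceiling: str = "max",
) -> Tuple[Optional[str], Optional[str]]:
    """Try `attempt` at `start`, moving up one level at a time until a result passes."""
    first = EFFORT_LADDER.index(start)
    last = EFFORT_LADDER.index(ceiling)
    for effort in EFFORT_LADDER[first:last + 1]:
        result = attempt(effort)
        if passes_check(result):
            return effort, result
    return None, None  # Every allowed level failed the check.


# Stubbed example: only xhigh and max produce an answer that passes.
tried: List[str] = []


def fake_attempt(effort: str) -> str:
    tried.append(effort)
    return "fixed" if effort in ("xhigh", "max") else "still failing"


level, answer = escalate(fake_attempt, lambda r: r == "fixed")
print(level, tried)  # prints: xhigh ['high', 'xhigh']
```

    Passing `ceiling="xhigh"` keeps max out of the loop for workloads where cost matters more than the last increment of quality.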

    Important tradeoff. Higher effort levels in Opus 4.8 produce more output tokens than the same levels did in 4.6. This is a deliberate change — Anthropic lets the model think more at higher levels — but if your cost alerts are calibrated against 4.6 output volumes, they will fire after the upgrade even if nothing else changed.

    An API note worth flagging. Opus 4.8 removed the extended thinking budget parameter that existed in 4.6. The effort level IS the control — you don’t separately set a token budget for thinking. If your 4.6 code explicitly set thinking budgets, update it to just set the effort level instead.
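    For code that built 4.6-style requests as dictionaries, the migration is mostly deleting the thinking block and setting an effort level instead. A minimal sketch, assuming the 4.6 request used the Messages API's `thinking={"type": "enabled", "budget_tokens": ...}` shape. The top-level `"effort"` key is a hypothetical placeholder: take the exact name and placement of the effort field from Anthropic's current API reference.

```python
import copy
from typing import Any, Dict


def migrate_thinking_budget(request: Dict[str, Any], effort: str = "high") -> Dict[str, Any]:
    """Return a copy of a 4.6-style request with the thinking budget replaced by an effort level."""
    migrated = copy.deepcopy(request)  # Leave the caller's dict untouched.
    migrated.pop("thinking", None)     # The separate thinking-token budget no longer exists.
    migrated["model"] = "claude-opus-4-8"
    migrated["effort"] = effort        # HYPOTHETICAL key name; check the API reference.
    return migrated


old_request = {
    "model": "claude-opus-4-6",
    "max_tokens": 16_000,
    "thinking": {"type": "enabled", "budget_tokens": 8_000},
    "messages": [{"role": "user", "content": "Find the race condition in this diff."}],
}
new_request = migrate_thinking_budget(old_request, effort="xhigh")
print(sorted(new_request))  # prints: ['effort', 'max_tokens', 'messages', 'model']
```

    The sketch deliberately does not translate an old budget number into an effort level; nothing here defines such a mapping, so pick the level by testing.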

    The xhigh level is available via the API, Bedrock, Vertex AI, and Microsoft Foundry. On Claude.ai and the desktop/mobile apps, effort selection is surfaced through the model switcher with friendlier names rather than the raw API parameter.


    Feature 3: The 2,576-pixel vision ceiling

    What changed. Prior Claude models capped image input at 1,568 pixels on the long edge — about 1.15 megapixels. Opus 4.8 processes images up to 2,576 pixels on the long edge — about 3.75 megapixels, more than 3× the prior pixel budget.
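    The two numbers describe both a long-edge limit and a total pixel budget, which is why 1,568px corresponds to about 1.15 MP rather than 1,568². A sketch of fitting an image inside both limits, assuming (the exact resize rule is not documented here) that the image is scaled down uniformly until it satisfies whichever limit is tighter:

```python
import math
from typing import Tuple


def fit_to_ceiling(width: int, height: int,
                   long_edge_cap: int = 2576,
                   pixel_cap: float = 3.75e6) -> Tuple[int, int]:
    """Scale (width, height) down uniformly so both the long-edge and total-pixel caps hold.

    Images already inside both caps are returned unchanged; nothing is upscaled.
    """
    scale = min(
        1.0,
        long_edge_cap / max(width, height),       # long-edge limit
        math.sqrt(pixel_cap / (width * height)),  # megapixel limit
    )
    # Flooring keeps the result at or under both caps.
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


photo = (4000, 3000)
print(fit_to_ceiling(*photo, long_edge_cap=1568, pixel_cap=1.15e6))  # prints: (1238, 928)
print(fit_to_ceiling(*photo))                                        # prints: (2236, 1677)
```

    For this 4:3 photo the pixel budget, not the long edge, is the binding limit under both ceilings; a 3840×2160 screenshot, by contrast, is limited by the 2,576px long edge under the new ceiling.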

    Why this matters more than it sounds. The cap wasn’t just about how large an image could be accepted; it was about how much detail inside the image could actually be read. Under the old 1.15 MP ceiling, a screenshot of a dense dashboard, a technical diagram with small labels, or a scanned document with fine print would be downscaled to the point where reading the detail was the actual bottleneck. Opus 4.8 removes that bottleneck for images up to the new ceiling.

    Coordinate mapping is now 1:1. This is a separate but related change. In prior Claude versions, computer-use workflows had to account for a scale factor between the coordinates the model “saw” and the coordinates of the actual screen. On Opus 4.8, the model’s coordinate output maps 1:1 to actual image pixels. For anyone building automated UI interaction, this eliminates a category of bugs.

    What this enables that 4.6 struggled with:

    • Dense UI screenshots. Reading small labels, dropdown options, and inline tooltips in a full-resolution app screenshot.
    • Technical diagrams. Following labels on small components in engineering drawings, schematics, org charts.
    • Scanned documents. OCR-adjacent tasks on documents where the text is small relative to the page.
    • Chart details. Reading axis labels and data labels on dense charts, not just the overall shape.
    • Multi-panel content. Comics, infographics, and documents with small type in multiple zones.
    • Pointing, measuring, counting. Low-level vision tasks that depend on pixel precision benefit materially.
    • Bounding-box detection. Image localization tasks show clear gains.

    What it doesn’t change.

    • Images beyond 2,576px still get downscaled to the ceiling. The ceiling is higher; it’s not gone.
    • Video frames are handled differently and aren’t covered by this change.
    • Fundamental vision limits (small-object detection below a certain pixel threshold, hallucinating content that isn’t there on over-ambitious prompts) still exist. More pixels ≠ omniscience.

    Pricing and token cost. Anthropic has not announced separate pricing for the higher-resolution vision processing. Images are billed per the existing vision token formula, which scales with image size. Larger images cost more tokens; that’s not new. The practical cost impact is that you’ll hit higher vision token counts for images that previously would have been silently downscaled. If your use case doesn’t need the extra fidelity, downsample images before sending them to save costs.
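    To see the cost side of that tradeoff: Anthropic's vision documentation gives a rule of thumb for earlier Claude models, tokens ≈ (width × height) / 750. The sketch below assumes the same approximation still holds for Opus 4.8 and compares a full-ceiling screenshot with a copy downsampled to 1,280px on the long edge (an arbitrary example target).

```python
from typing import Tuple


def estimate_image_tokens(width: int, height: int) -> int:
    """Approximate vision tokens with the (width * height) / 750 rule of thumb."""
    return round(width * height / 750)


def tokens_before_and_after(width: int, height: int, target_long_edge: int) -> Tuple[int, int]:
    """Estimated tokens at full size and after uniform downsampling to target_long_edge."""
    scale = min(1.0, target_long_edge / max(width, height))
    small_w, small_h = round(width * scale), round(height * scale)
    return estimate_image_tokens(width, height), estimate_image_tokens(small_w, small_h)


full, reduced = tokens_before_and_after(2576, 1449, target_long_edge=1280)
print(full, reduced)  # prints: 4977 1229
```

    Here the downsampled copy costs roughly a quarter of the tokens, which is worth it whenever the task does not depend on fine detail.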

    How to use it.

    Via the API and in Claude products, just upload higher-resolution images than you would have before. No special parameter. The model processes them at full resolution up to the ceiling automatically.

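    import base64
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # Hypothetical local file; up to 2,576px on the long edge is processed at full resolution.
    with open("chart.png", "rb") as f:
        image_data = base64.standard_b64encode(f.read()).decode("utf-8")

    response = client.messages.create(
        model="claude-opus-4-8",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/png", "data": image_data}},
                {"type": "text", "text": "Extract the values from the chart."},
            ],
        }],
    )
    print(response.content[0].text)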
    

    A caveat worth noting. The 2,576px ceiling is the processing ceiling. Client-side size limits (file size, API request size) still apply. Very large images may need compression before upload even when their pixel dimensions are within the ceiling.
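    One reason an image within the pixel ceiling can still be too big: base64 encoding, which the Messages API uses for inline image data, inflates the payload by about a third. A small pre-upload check, with the byte limit left as a parameter because the current limit should be read from Anthropic's documentation rather than assumed:

```python
import base64


def encoded_size(raw: bytes) -> int:
    """Bytes the image occupies once base64-encoded for the request body."""
    return len(base64.b64encode(raw))


def fits_limit(raw: bytes, limit_bytes: int) -> bool:
    """True if the base64 payload for this image stays within limit_bytes."""
    return encoded_size(raw) <= limit_bytes


raw_image = b"\x89PNG" + b"\x00" * 2_999_996  # stand-in for a 3,000,000-byte file
print(encoded_size(raw_image))              # prints: 4000000
print(fits_limit(raw_image, 3_500_000))     # prints: False
```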


    How these three features compose

    The three features aren’t independent. For agentic coding work in particular, they compose in ways that matter.

    A practical workflow: an agent reviewing a UI bug gets a screenshot of the bug state (vision at 2,576px captures the detail), thinks about it at xhigh effort (enough reasoning without max’s overhead), and runs under a task budget that caps how much it can spend on this particular investigation before escalating or returning. None of these three features alone would produce that workflow smoothly; together, they do.

    This is the real reason to learn all three: each is useful on its own, but together their effect on agentic workflows is bigger than any one in isolation.


    Frequently asked questions

    Are task budgets available on Claude.ai, or API only?
    API only. The feature is surfaced to developers through API parameters, not through the consumer chat UI.

    Can I use xhigh on Claude.ai?
    Effort level is exposed to consumers through the model switcher. The underlying xhigh value is available via the API; the consumer surface uses friendlier naming rather than the raw parameter.

    Do the vision processing improvements apply to all Claude products?
    Yes — Claude.ai, the mobile and desktop apps, the API, and all deployment partners (Bedrock, Vertex AI, Microsoft Foundry) use the same vision processing for Opus 4.8.

    Are task budgets a replacement for max_tokens?
    No. max_tokens is a hard cap on output length for a single message. Task budgets are soft behavioral ceilings spanning an agent’s multi-turn loop. Use both.

    Does xhigh use a different API parameter than high?
    No. It’s just another value for the same effort parameter. Note that Opus 4.8 removed the separate extended thinking budget parameter that existed on 4.6: the effort level IS the thinking control on 4.8.

    Will these features come to Opus 4.6?
    No. They’re Opus 4.8 features. 4.6 continues to run on its prior behavior.

    Does xhigh cost more than high?
    Yes, indirectly. Per-token pricing is the same, but xhigh produces more output tokens on hard problems (that’s the point: more thinking), so a given request costs more at xhigh than at high. The xhigh level is still meaningfully cheaper than max on the same task.
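    Because the per-token price does not change with effort, the cost difference comes entirely from output volume. A quick worked example at Opus 4.8's $5 input / $25 output per million tokens; the output token counts per effort level are hypothetical, chosen only to show the arithmetic.

```python
INPUT_USD_PER_MTOK = 5.00    # Opus 4.8 input price
OUTPUT_USD_PER_MTOK = 25.00  # Opus 4.8 output price


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at flat per-token prices."""
    return (input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


PROMPT_TOKENS = 20_000
# Hypothetical output volumes for the same prompt at three effort levels.
for effort, output_tokens in [("high", 6_000), ("xhigh", 12_000), ("max", 30_000)]:
    print(f"{effort:>5}: ${request_cost(PROMPT_TOKENS, output_tokens):.2f}")
```

    With these made-up volumes, xhigh costs 60% more than high and about half as much as max ($0.25, $0.40, $0.85). Measure your own output volumes before relying on any ratio.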


    Related reading

    • The full release: Claude Opus 4.8 — Everything New
    • For developers: Opus 4.8 for coding in practice
    • Comparison: Opus 4.8 vs GPT-5.4 vs Gemini 3.1 Pro
    • The Mythos angle: why Anthropic admitted Opus 4.8 is weaker than an unreleased model

    Published April 16, 2026. Article written by Claude Opus 4.8.

    Frequently Asked Questions

    What are the key features of Claude Opus 4.8?

    Claude Opus 4.8 (claude-opus-4-8) is Anthropic’s current flagship model with a 1 million token context window, extended thinking (visible reasoning chain), vision capabilities, tool use with parallel function calling, computer use for desktop automation, and configurable task budgets for cost control on reasoning-heavy tasks. Available via API at $5 input / $25 output per million tokens.

    What is extended thinking in Claude Opus 4.8?

    Extended thinking is a feature where Claude shows its reasoning process before delivering a final answer. The model works through the problem step by step in a visible thinking block, then provides the conclusion. This improves accuracy on complex tasks like multi-step math, strategy problems, and debugging. On Opus 4.8 you control how much it thinks through the effort level (low through max) rather than a separate thinking-token budget, which this release removed.

    How does Claude Opus 4.8’s 1M token context work?

    The 1 million token context window lets Claude Opus 4.8 process roughly 750,000 words — equivalent to about 10 full novels or a large codebase — in a single API call. Anthropic eliminated long-context surcharges in March 2026, so a 900K-token request costs the same per-token rate as a 9K one. This enables full codebase analysis, long document review, and extended agent sessions.

    What is the task budget feature in Claude Opus 4.8?

    Task budgets (beta) let you declare soft budgets, in tokens or tool calls, that span an entire multi-turn agentic loop rather than a single request. The agent plans its work against the budget and, when it runs over, reprioritizes, requests more budget, or halts and summarizes its state. Because the budgets are soft, enforce strict billing or SLA ceilings at the API layer outside the agent loop.

    Is Claude Opus 4.8 the best model for computer use?

    Yes, Claude Opus 4.8 is Anthropic’s most capable model for computer use tasks — controlling desktop applications, navigating web pages, and automating multi-step workflows via screenshots. Claude Sonnet 4.6 also supports computer use at lower cost. Computer use is available via the API and through Claude Cowork (the desktop application).

    When should I use Opus 4.8 vs Sonnet 4.6?

    Use Claude Opus 4.8 when task complexity demands the best reasoning: analyzing large codebases, writing complex technical documents, extended agent workflows, or tasks where extended thinking significantly improves output quality. Use Claude Sonnet 4.6 ($3/$15 per MTok, 40% cheaper) for most everyday tasks — writing, coding, analysis — where Opus-level reasoning is not needed.

  • Claude Opus 4.8 vs GPT-5 vs Gemini 2.5 Pro: Head-to-Head (June 2026)

    Last refreshed: June 9, 2026

    Model Accuracy Note — Updated June 9, 2026

    Current flagship: Claude Opus 4.8 (claude-opus-4-8), as of April 16, 2026. Current models: Opus 4.8 · Sonnet 4.6 · Haiku 4.5. Where this article references Opus 4.6 or earlier models, those references are historical.

    Attribute | Claude Opus 4.8 | GPT-5 | Gemini 2.5 Pro
    Developer | Anthropic | OpenAI | Google DeepMind
    API ID | claude-opus-4-8 | gpt-5 | gemini-2.5-pro
    Context window | 1M tokens | 128K tokens | 1M tokens
    Input price (per MTok) | $5.00 | $15.00 | $3.50
    Output price (per MTok) | $25.00 | $75.00 | $10.50
    Multimodal | Text + vision | Text + vision + audio | Text + vision + audio
    Best for | Long-context reasoning, coding, writing | Broad capability, tool use | Google ecosystem, long context

    Prices verified June 9, 2026 from official platform documentation. GPT-5 pricing from platform.openai.com. Gemini 2.5 Pro pricing from ai.google.dev.

    The short verdict

    • Best for agentic coding and long-horizon engineering: Opus 4.8.
    • Best for single-turn function calling and ecosystem breadth: GPT-5.
    • Best for multimodal input volume and long-context retrieval: Gemini 2.5 Pro.
    • Cheapest at the frontier: Gemini 2.5 Pro. Most expensive: GPT-5.
    • If you can only pick one for general knowledge work in June 2026: Opus 4.8.

    The full reasoning is below. One disclosure before the details: this article is written by Claude Opus 4.8. I am one of the models being compared. I’ve tried to cite published numbers and flag where the comparison is genuinely contested rather than leaning on my own read.


    Pricing as of April 16, 2026

    Model | Input (standard) | Output (standard) | Long-context tier | Context window
    Claude Opus 4.8 | $5 / M tokens | $25 / M tokens | Same across window | 1M tokens
    GPT-5 | $5.00 / M tokens | $15 / M tokens | $5 / $22.50 over 272K | 1M tokens (272K before surcharge)
    Gemini 2.5 Pro | $2 / M tokens | $12 / M tokens | $4 / $18 over 200K | 1M tokens (some listings cite 2M)

    Takeaways:
    – Gemini 2.5 Pro is the cheapest per token at the frontier: 2.5× cheaper on input than both Opus 4.8 and GPT-5 at standard context ($2 vs. $5 per million tokens).
    – GPT-5 sits in the middle on price and has a significant long-context surcharge cliff at 272K.
    – Opus 4.8 is the most expensive per token, with no long-context surcharge.
    – All three now have 1M-class context windows, but Opus 4.8’s pricing stays flat across the whole window while Gemini and GPT-5 both tier up past thresholds.
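    The threshold tiers are easier to compare with a calculator. This sketch encodes the prices from the table above and assumes that once a request's input crosses a model's threshold, the whole request is billed at the long-context rates; confirm each provider's actual tiering rule before relying on it.

```python
from typing import NamedTuple, Optional


class Pricing(NamedTuple):
    input_usd: float                 # per million input tokens, standard tier
    output_usd: float                # per million output tokens, standard tier
    threshold: Optional[int] = None  # input tokens above which the long-context tier applies
    long_input_usd: float = 0.0
    long_output_usd: float = 0.0


# Prices as listed in the table above.
PRICES = {
    "Claude Opus 4.8": Pricing(5.00, 25.00),
    "GPT-5": Pricing(5.00, 15.00, 272_000, 5.00, 22.50),
    "Gemini 2.5 Pro": Pricing(2.00, 12.00, 200_000, 4.00, 18.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, switching to long-context rates past the threshold."""
    p = PRICES[model]
    long_tier = p.threshold is not None and input_tokens > p.threshold
    in_rate = p.long_input_usd if long_tier else p.input_usd
    out_rate = p.long_output_usd if long_tier else p.output_usd
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


for input_tokens in (100_000, 500_000):
    costs = {m: round(request_cost(m, input_tokens, 10_000), 3) for m in PRICES}
    print(input_tokens, costs)
```

    At 500K input and 10K output tokens under these assumptions, Gemini 2.5 Pro stays cheapest even on its long-context tier ($2.18), while GPT-5 ($2.725) lands just below Opus 4.8 ($2.75).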

    Tokenizer caveat: Opus 4.8 uses a new tokenizer that produces up to 1.35× more tokens per input than Opus 4.6 did, depending on content type. Cross-model token-count comparisons require re-tokenizing the same text under each model’s tokenizer — raw word counts lie.


    Benchmarks, with the caveats included

    Anthropic, OpenAI, and Google all publish benchmark numbers. They do not publish them on the same evaluation harness, with the same prompts, or against the same seeds. Treat the following as directional, not definitive.

    Agentic coding (long-horizon, multi-file):
    – Opus 4.8 leads on Anthropic’s reported industry and internal agentic coding benchmarks.
    – GPT-5 is competitive on single-turn function calling and tool use, and scored roughly 80% on SWE-bench Verified at launch.
    – Gemini 2.5 Pro scored 80.6% on SWE-bench Verified at launch — essentially tied with GPT-5.

    Multidisciplinary reasoning (GPQA Diamond and similar):
    – Opus 4.8 leads on Anthropic’s comparisons.
    – GPT-5 and Gemini 2.5 Pro are close. Gemini reports 94.3% on GPQA Diamond.

    Scaled tool use and agentic computer use:
    – Opus 4.8 leads on Anthropic’s reported benchmarks.
    – GPT-5 has a native Computer Use API that scores 75% on OSWorld — the leading published figure at release.
    – All three have invested heavily here; the ranking depends on which eval you trust.

    Vision (document understanding, dense-screenshot extraction):
    – Opus 4.8’s jump from 1.15 MP to 3.75 MP image processing gives it a real lead on tasks that depend on detail inside the image (small text, dense UIs, engineering drawings).
    – Gemini 2.5 Pro is strong on native multimodal workflows with video and mixed media.
    – GPT-5 is solid but not leading on either axis.

    Long-context retrieval:
    – All three now have 1M-class context windows.
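    – Even past its 200K-token threshold, Gemini 2.5 Pro’s long-context rate ($4 input / $18 output per million tokens) stays below Opus 4.8’s flat $5 / $25, which makes it the cost-effective choice for bulk long-context work when your workflow frequently exceeds 200K tokens.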
    – Opus 4.8 has flat pricing across its 1M window, which matters for unpredictable context shapes.
    – GPT-5’s 272K threshold moves long prompts onto a higher output rate ($22.50 instead of $15 per million tokens), so long-context costs on OpenAI jump once a request crosses it.

    Specialized coding benchmarks:
    – GPT-5.3 Codex (OpenAI’s specialized coding line) still leads on Terminal-Bench 2.0 and SWE-Bench Pro on some scores. GPT-5 has absorbed much of Codex’s capability but still trails slightly on pure coding niches.
    – Gemini 2.5 Pro has notable strength on creative coding and SVG generation.
    – Opus 4.8 is strongest on agentic and multi-file coding specifically.

    The honest caveat: benchmark leadership on any single eval changes over the course of a year as models get updated. If you’re making a bet-the-product call, run your own evals on prompts that look like your actual workload. The published benchmarks are a screening tool, not a decision tool.


    How they differ in behavior, not just benchmarks

    Opus 4.8 — the engineering-minded generalist.
    Tends toward thoroughness over speed. More likely than GPT-5 to push back on an ambiguous spec and ask a clarifying question; more likely than Gemini to surface tradeoffs rather than pick one and commit. Strong at long-horizon tasks where state matters. Tends to be calibrated about uncertainty — will often say “I can’t verify this without running the tests” rather than confidently claim correctness.

    GPT-5 — the product-native operator.
    Tends toward action over deliberation. Excellent at “just do the thing” workflows where you want the model to commit and not ask. Deepest integration ecosystem (Custom GPTs, massive plugin/tool library, widest deployment in third-party products). Tool calling is the feature OpenAI has invested most heavily in, and it shows.

    Gemini 2.5 Pro — the multimodal long-context specialist.
    Cheapest per token at the frontier, by a meaningful margin. Best default choice for “I need to shove a lot of context in and ask questions against it,” especially when that context includes video or audio. Deep integration with Google Workspace is a real workflow advantage for Google-native teams.

    None of these are absolute; all three models handle general tasks well. These are behavioral tendencies, not capability ceilings.


    “Choose X if” decision framework

    Choose Claude Opus 4.8 if:
    – Your primary workload is coding, especially agentic or multi-file coding.
    – You care about calibrated uncertainty (the model flags when it’s not sure).
    – You’re using or planning to use Claude Code for engineering work.
    – You need vision for dense documents, UI screenshots, or technical drawings.
    – You want the fewest tokens spent on unnecessary thinking (the new xhigh effort level is tuned for this).

    Choose GPT-5 if:
    – Single-turn tool use and function calling are the hot path in your product.
    – You need the broadest ecosystem of third-party integrations right now.
    – Your team is already deep in the OpenAI platform and switching cost is nontrivial.
    – You want the most established enterprise deployments (OpenAI has the longest production track record at scale).

    Choose Gemini 2.5 Pro if:
    – You’re price-sensitive and running high-volume workloads.
    – You need 1M+ token context as the default, not as an add-on.
    – Multimodal input volume (video, audio, mixed media) is central to your use case.
    – Your team is deep in Google Cloud or Workspace.

    Use multiple if:
    – You’re doing serious AI product work. Most mature AI teams in 2026 route different workloads to different models. A common pattern: Opus 4.8 for code generation and agent orchestration, Gemini 2.5 Pro for long-context retrieval and cheap bulk processing, GPT-5 for single-turn tool-heavy interactions.


    Where this comparison will change

    The frontier is moving. Three things to watch over the next six months:

    1. Claude Mythos Preview. Anthropic publicly acknowledged that Mythos outperforms Opus 4.8 on most of the benchmarks in the Opus 4.8 release post. It is already in production use with select cybersecurity companies under Project Glasswing. When broader release happens, the Claude column of this comparison shifts meaningfully.

    2. GPT-5.5 / GPT-6. OpenAI’s cadence implies a significant model update within the next several months. The pattern over the past year has been incremental 5.x releases; a ground-up generation shift would reset the comparison.

    3. Gemini 3.5 / 4. Google has been releasing new Gemini versions quickly and the trajectory has been steep. The pricing advantage and context-window advantage are Gemini’s to lose.

    None of these are speculation-free predictions. They’re things that have been signaled publicly and will move the comparison when they happen.


    Frequently asked questions

    Is Claude Opus 4.8 better than GPT-5?
    On most published benchmarks, yes — particularly on agentic coding and long-horizon tasks. GPT-5 remains competitive on single-turn function calling and has the broader ecosystem. “Better” depends on the workload.

    Is Gemini 2.5 Pro cheaper than Opus 4.8?
    Significantly. At $2/$12 per million input/output tokens vs. Opus 4.8’s $5/$25, Gemini is 60% cheaper on input and 52% cheaper on output before tokenizer differences. At scale this is a material cost gap.

    Which model has the biggest context window?
    All three now have 1M-class context windows. Some Gemini 2.5 Pro documentation cites a 2M window. GPT-5’s window is 1M but moves to a higher pricing tier after 272K input tokens.

    Which model is best for coding?
    Opus 4.8 leads on agentic and long-horizon coding benchmarks. GPT-5 is close on single-turn coding. Gemini 2.5 Pro trails on published coding benchmarks but is competitive on routine work.

    Which model should I use for my startup?
    Most mature teams route workloads to multiple models. If you’re just starting and need to pick one, Opus 4.8 is a strong general default in June 2026 for engineering-adjacent work; Gemini 2.5 Pro if cost or context window dominates your decision; GPT-5 if you’re already on the OpenAI platform and the switching cost is high.

    Does Claude Opus 4.8 support function calling?
    Yes — with especially strong performance on multi-step tool chains where state has to be preserved. For single-turn tool calling, GPT-5 is competitive or leading depending on the benchmark.


    Related reading

    • Full Opus 4.8 feature set: Claude Opus 4.8 — Everything New
    • Opus 4.8 for coding specifically: xhigh, task budgets, and the 13% benchmark lift
    • The Mythos angle: why Anthropic admitted Opus 4.8 is weaker than an unreleased model

    Published April 16, 2026. Article written by Claude Opus 4.8 — yes, one of the models being compared. Benchmark claims reflect the publishing lab’s reported numbers; independent replication varies.

    Frequently Asked Questions

    Is Claude Opus 4.8 better than GPT-5?

    It depends on the task. Claude Opus 4.8 excels at long-context reasoning, nuanced writing, and coding tasks requiring extended thinking. GPT-5 has broader multimodal capabilities including audio. For pure text reasoning and large-document analysis, Claude Opus 4.8’s 1M token context gives it a significant advantage. GPT-5 is more expensive at $15/$75 per million tokens vs Opus 4.8’s $5/$25.

    How does Claude Opus 4.8 compare to Gemini 2.5 Pro?

    Both Claude Opus 4.8 and Gemini 2.5 Pro support 1M token context windows. Gemini 2.5 Pro is cheaper at $3.50/$10.50 per million tokens vs Opus 4.8’s $5/$25. Claude Opus 4.8 generally rates higher on reasoning and coding benchmarks. Gemini 2.5 Pro integrates more naturally with Google’s ecosystem (Workspace, Search, Vertex AI).

    Which AI model is best for coding in 2026?

    Claude Opus 4.8 and Claude Sonnet 4.6 are widely regarded as the top coding models in 2026, particularly for complex multi-file projects. Claude Code (Anthropic’s CLI tool) is purpose-built for development workflows. GPT-5 is also strong for coding. Gemini 2.5 Pro integrates well with Google Cloud development workflows.

    What is the cheapest frontier AI model in 2026?

    Claude Haiku 4.5 ($1/$5 per MTok) and Gemini 2.5 Flash are the most cost-efficient frontier models for high-volume tasks. For flagship-tier capability, Gemini 2.5 Pro ($3.50/$10.50) is cheaper than Claude Opus 4.8 ($5/$25) or GPT-5 ($15/$75). The right choice depends on task complexity and volume.

    Is GPT-5 worth the higher price vs Claude Opus 4.8?

    For most text and coding workloads, no. Claude Opus 4.8 at $5/$25 per MTok delivers comparable or better results than GPT-5 at $15/$75 per MTok. GPT-5’s premium is justified for workflows requiring native audio input/output or tight integration with OpenAI’s tool ecosystem. For long-context document analysis, Opus 4.8’s 1M context at lower cost is a clear win.

    Which model should I use for my business in 2026?

    For general business writing and analysis: Claude Sonnet 4.6 ($3/$15) or Gemini 2.5 Pro ($3.50/$10.50). For complex reasoning and large documents: Claude Opus 4.8 ($5/$25). For high-volume, cost-sensitive workloads: Claude Haiku 4.5 ($1/$5). For Google Workspace integration: Gemini 2.5 Pro. For OpenAI ecosystem lock-in: GPT-5.

  • Opus 4.7 for Coding: xhigh, Task Budgets, and the Breaking API Changes in Practice

    Last refreshed: May 15, 2026

    Model Accuracy Note — Updated May 2026

    Current flagship: Claude Opus 4.7 (claude-opus-4-7), as of April 16, 2026. Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Where this article references Opus 4.6 or earlier models, those references are historical.

    What changed if you only have 60 seconds

    • Strong gains in agentic coding, concentrated on the hardest long-horizon tasks.
    • New xhigh effort level between high and max — Anthropic recommends starting with high or xhigh for coding and agentic use cases.
    • Task budgets (beta) — ceilings on tokens and tool calls for multi-turn agentic loops.
    • Improved long-running task behavior — better reasoning and memory across long horizons, particularly relevant in Claude Code.
    • /ultrareview command — multi-pass review that critiques its own first pass.
    • Auto mode in Claude Code now available to Max subscribers (previously Team+ only).
    • ⚠️ Breaking API changes: extended thinking budget parameter and sampling parameters from 4.6 are removed. Update client code before switching model strings.
    • Tokenizer change: expect up to 1.35× more tokens for the same input.
    • Context window: unchanged at 1M tokens.

    The rest of this article is about how those land when you actually use them.


    The coding gain — what it actually feels like

    Anthropic’s release materials describe Opus 4.7 as “a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks.” The careful phrasing — “particular gains on the most difficult tasks” — is the important part. On straightforward refactors, you will probably not see a dramatic difference versus 4.6. On long-horizon, multi-file, ambiguous-spec work, you likely will.

    In practice, the shift is: 4.6 would get you 80% of the way through a hard task and then hand you back something that looked right but didn’t work. 4.7 is more likely to actually close the task. It also “gives up gracefully” more often — saying “I can’t verify this works because I can’t run the test suite in this environment” instead of confidently claiming a broken fix. GitHub’s own early testing of Opus 4.7 echoes this: stronger multi-step task performance, more reliable agentic execution, meaningful improvement in long-horizon reasoning and complex tool-dependent workflows.

    If your 4.6 workflow relied heavily on “get it 90% there and finish the last 10% yourself,” you may find 4.7 changes the calculus. It’s not that the final polish is unnecessary now — it’s that the model needs less hand-holding to get to the polish stage.


    xhigh: the new default to reach for

    Opus 4.6 had four effort levels: low, medium, high, and max. Opus 4.7 adds xhigh, slotted between high and max.

    The reason it exists: max was frequently overkill. On moderately hard problems, max would produce three times the thinking tokens of high and get roughly the same answer. On genuinely hard problems, high would leave thinking on the table. There was a real gap in the middle.

    How to use it:
    • high is still the right default for routine coding tasks.
    • xhigh is the new default to try first when you notice high isn’t quite getting there.
    • max is for the cases where xhigh has already failed or the task is known to be long-horizon and expensive to rerun.
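The escalation order above can be written as a small client-side loop. This is a minimal sketch. It assumes you have some automated pass/fail check for an answer (tests, a validator). `attempt` is a hypothetical callable you supply, and how it passes the effort level to the API is deliberately left to it, because the request shape isn't covered here.

```python
from typing import Callable, Optional, Sequence, Tuple

# Effort levels in the escalation order recommended above.
ESCALATION: Tuple[str, ...] = ("high", "xhigh", "max")


def solve_with_escalation(
    attempt: Callable[[str], Optional[str]],
    levels: Sequence[str] = ESCALATION,
) -> Tuple[Optional[str], Optional[str]]:
    """Try each effort level in order until `attempt` returns an accepted answer.

    `attempt(effort)` is a hypothetical callable: it sends the request at that
    effort level and returns the answer if it passes your own check, or None.
    Returns (answer, effort_used), or (None, None) if every level failed.
    """
    for effort in levels:
        answer = attempt(effort)
        if answer is not None:
            return answer, effort
    return None, None
```

Escalation only pays off when a failed attempt is cheap to detect. For tasks already known to be long-horizon and expensive to rerun, starting at max avoids paying for two failed runs first.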

    Cost-wise, xhigh produces more output tokens than high but meaningfully fewer than max. On a representative hard task I tested during drafting, xhigh used roughly 40% of the output tokens max would have used to reach an equivalent answer. Your mileage will vary by task family.

    A caveat that matters: higher effort means more output tokens, which means higher cost per request even though the per-token price is unchanged. If your budget alerts are tuned to 4.6 volumes, expect them to fire.
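To see why, here is the per-request arithmetic at the $5/$25 per-million-token list prices quoted for Opus 4.7. This is a back-of-envelope sketch. The token counts are hypothetical, and the 40% ratio mirrors the single anecdote above rather than a measured average. It also treats thinking tokens as billed output tokens, which is how Anthropic has billed extended thinking.

```python
# List prices quoted in this article for Opus 4.7 (USD per million tokens).
INPUT_PRICE_PER_M = 5.00
OUTPUT_PRICE_PER_M = 25.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at list price, ignoring caching and batch discounts."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical request: 10,000 input tokens. max spends 100,000 output tokens;
# xhigh reaches the same answer with 40% of that.
max_cost = request_cost(10_000, 100_000)   # 2.55 USD
xhigh_cost = request_cost(10_000, 40_000)  # 1.05 USD
```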


    Task budgets (beta): the real agentic improvement

    This is the feature most worth paying attention to if you build agents.

    The problem it solves: Agent runs have high cost variance. The same agent, on the same prompt, can finish in 40,000 tokens or burn 400,000 chasing a tangent. Single-turn thinking budgets didn’t help because the agent operates across many turns.

    How task budgets work: You declare a budget (in tokens or tool calls) for a named subtask. The agent plans against that budget. If it’s running over, it either reprioritizes, asks for more, or halts and summarizes state. Budgets can nest: a parent task can have child subtasks, each with its own budget.

    What this looks like in code (beta, subject to change):

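    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    # task_budgets is a beta field; its shape follows this article and may change before GA.
    # It is sent via extra_body because the SDK's typed signature may not expose it yet.
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=16_000,  # required by the Messages API
        messages=[{"role": "user", "content": "Refactor the auth module and add tests."}],  # placeholder prompt
        extra_body={
            "task_budgets": [
                {
                    "name": "refactor_auth_module",
                    "max_output_tokens": 50_000,
                    "max_tool_calls": 25,
                },
                {
                    "name": "write_tests",
                    "parent": "refactor_auth_module",  # child budget nested under the parent
                    "max_output_tokens": 15_000,
                },
            ]
        },
    )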
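The parent/child relationship in this example (write_tests nested under refactor_auth_module) can be mirrored with a small client-side ledger, which is useful for logging or dashboards. This illustrates an assumption and is not Anthropic's implementation. It assumes a child's spend also counts against its parent, which the description above implies but doesn't spell out. The `Budget` class is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    """Client-side model of a nested token/tool-call budget (illustrative only)."""
    name: str
    max_output_tokens: int
    max_tool_calls: Optional[int] = None
    parent: Optional["Budget"] = None
    used_tokens: int = 0
    used_tool_calls: int = 0

    def charge(self, tokens: int = 0, tool_calls: int = 0) -> None:
        """Record usage here and in every ancestor, since a child's spend counts against its parent."""
        node: Optional[Budget] = self
        while node is not None:
            node.used_tokens += tokens
            node.used_tool_calls += tool_calls
            node = node.parent

    def over_budget(self) -> bool:
        over_tokens = self.used_tokens > self.max_output_tokens
        over_calls = self.max_tool_calls is not None and self.used_tool_calls > self.max_tool_calls
        return over_tokens or over_calls
```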
    

    Behavioral note: Task budgets are soft. The agent is nudged to respect them, not hard-cut. In testing, 4.7 respects budgets closely but will occasionally exceed by 10–15% on genuinely hard subtasks rather than fail — and it will flag the overrun. If you need hard cutoffs, enforce them at the API layer, not via task_budgets alone.
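One way to get that hard cutoff yourself is to cap each turn's max_tokens at whatever remains of an overall output budget, and stop the loop when nothing remains. This is a minimal sketch. `call_model` is a hypothetical wrapper around your API call that returns the output tokens the turn actually used (the Anthropic Python SDK reports this as `response.usage.output_tokens`). The 8,192 per-turn default is an arbitrary placeholder.

```python
from typing import Callable


def next_max_tokens(hard_cap: int, spent: int, per_turn_default: int = 8_192) -> int:
    """Output-token limit for the next turn so total spend can never exceed hard_cap.

    Returns 0 once the cap is exhausted, meaning: do not issue another request.
    """
    remaining = max(hard_cap - spent, 0)
    return min(per_turn_default, remaining)


def run_with_hard_cap(call_model: Callable[[int], int], hard_cap: int, max_turns: int = 50) -> int:
    """Drive an agent loop under a hard output-token cap and return total tokens spent.

    call_model(max_tokens) is a hypothetical wrapper that sends one turn with that
    max_tokens value and returns the output tokens used. A real loop would also
    stop when the agent reports it is finished.
    """
    spent = 0
    for _ in range(max_turns):
        limit = next_max_tokens(hard_cap, spent)
        if limit == 0:
            break  # budget exhausted: stop instead of overrunning
        spent += call_model(limit)
    return spent
```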

    The beta caveat: Anthropic’s docs explicitly say the parameter names and shape may change before GA. Don’t ship this into production contracts that are painful to version.


    Long-running task behavior (and Claude Code persistence)

    Anthropic’s release note says Opus 4.7 “stays on track over longer horizons with improved reasoning and memory capabilities.” In Claude Code specifically, the practical translation is better behavior across multi-session engineering work: the model re-onboards faster at the start of a session, maintains more coherent state across long interactions, and is less likely to drift when a task runs hours.

    This is a capability improvement, not a new memory API. You don’t need to declare anything special to get it — it’s how 4.7 behaves at the model level. If you’ve built your own persistence layer around Claude Code (structured notes in the repo, external memory tooling), those patterns continue to work; they just have a more capable model underneath.

    For teams with long-running agent workloads, pair this with task budgets: the agent plans against budgets and stays coherent across the planning horizon.


    The /ultrareview command

    A new slash command in Claude Code. Unlike /review, which does a single review pass, /ultrareview runs:

    1. A first review pass.
    2. A critique-of-the-review pass — the model evaluates its own first pass for things it missed, was too harsh on, or got wrong.
    3. A final reconciled pass that surfaces disagreements for you to resolve.
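If you are building your own review agent, you can approximate the reconciliation step outside Claude Code. This sketch is not how /ultrareview is implemented. It only shows the set logic: keep undisputed findings, add missed ones, and surface disputes for a human. It assumes each pass's findings can be reduced to comparable labels.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class Reconciled:
    kept: FrozenSet[str]      # first-pass findings the critique did not dispute
    added: FrozenSet[str]     # issues the critique says the first pass missed
    disputed: FrozenSet[str]  # first-pass findings the critique challenged, left for a human


def reconcile(first_pass: Set[str], missed: Set[str], disputed: Set[str]) -> Reconciled:
    """Merge a review with its self-critique, surfacing disagreements instead of resolving them."""
    challenged = disputed & first_pass  # ignore disputes about findings never raised
    return Reconciled(
        kept=frozenset(first_pass - challenged),
        added=frozenset(missed - first_pass),
        disputed=frozenset(challenged),
    )
```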

    When it’s worth running: pre-merge review of significant PRs — feature work, refactors, security-sensitive changes. Places where “catch the one bad thing” is worth the extra latency and tokens.

    When it isn’t: routine /review on small PRs. /ultrareview is slow (2–4× the wall-clock time of /review) and not cheap. Anthropic is explicit that it’s not meant for every review.

    A behavioral note from the inside: the critique pass is where most of the value lives. A single review pass has a bias toward confirming its own first read. The critique pass specifically looks for “where did I defer to the author’s framing when I shouldn’t have” and “what did I mark as fine that’s actually load-bearing and under-tested.” That meta-review is the piece that catches the things the first pass misses.


    Auto mode for Max subscribers

    Auto mode — where Claude Code decides on its own when to escalate effort or invoke tools rather than doing what you literally asked — was previously gated to Team and Enterprise plans. As of 4.7’s release, it’s available on Max 5x and Max 20x plans.

    For solo developers paying $200/month for Max 20x, this closes a real gap. Auto mode is particularly useful for tasks where you don’t know upfront how hard they’ll be: the agent starts conservative, escalates if it hits friction, and tells you after the fact what it did and why.


    The tokenizer change (plan for it)

    Opus 4.7 uses a new tokenizer. The same input string can map to up to 1.35× more tokens than under 4.6.

    • English prose: near the low end (roughly 1.02–1.08×).
    • Code: higher (roughly 1.10–1.20×).
    • JSON and structured data: higher still (1.15–1.30×).
    • Non-Latin scripts: highest (up to 1.35×).

    Per-token price is unchanged. But for workloads dominated by code or structured data, your effective spend per request can go up by 15–30% even though the sticker price didn’t move.
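The 15–30% figure comes from weighting each content type's expansion by its share of the prompt. This is a back-of-envelope sketch. It uses the upper end of each range listed above as an assumption, and real counts from your own prompts should replace these factors. Because the per-token price is unchanged, input cost scales by the same factor.

```python
from typing import Dict

# Upper end of each expansion range listed above (4.7 tokens per 4.6 token).
EXPANSION_UPPER: Dict[str, float] = {
    "prose": 1.08,
    "code": 1.20,
    "json": 1.30,
    "non_latin": 1.35,
}


def blended_expansion(mix: Dict[str, float], factors: Dict[str, float] = EXPANSION_UPPER) -> float:
    """Weighted expansion factor, given each content type's share of the prompt's 4.6 token count."""
    total = sum(mix.values())
    return sum(share * factors[kind] for kind, share in mix.items()) / total
```

For example, a prompt that is half code and half JSON comes out at 0.5 × 1.20 + 0.5 × 1.30 = 1.25, or about 25% more input spend.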

    The practical step: before you flip production traffic from 4.6 to 4.7, re-count your top prompts under the new tokenizer and adjust your cost model. Anthropic’s API exposes a token-counting endpoint (messages.count_tokens in the Python SDK). Running it over a representative prompt sample takes about 20 minutes and can spare you a surprise at the end of a billing cycle.
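Here is a minimal sketch of that exercise. `make_sdk_counter` uses the Anthropic Python SDK's `messages.count_tokens`. The claude-opus-4-6 model string is an assumption based on the naming pattern of claude-opus-4-7. The counter is injectable, so you can try the comparison logic without an API key.

```python
from typing import Callable, Iterable, List

# claude-opus-4-7 is the model string given in this article; the 4.6 string is assumed.
OLD_MODEL = "claude-opus-4-6"
NEW_MODEL = "claude-opus-4-7"

Counter = Callable[[str, str], int]  # (model, text) -> input token count


def make_sdk_counter() -> Counter:
    """Build a counter backed by the Anthropic token-counting endpoint (needs ANTHROPIC_API_KEY)."""
    import anthropic  # imported lazily so the comparison logic works without the SDK

    client = anthropic.Anthropic()

    def count(model: str, text: str) -> int:
        result = client.messages.count_tokens(
            model=model,
            messages=[{"role": "user", "content": text}],
        )
        return result.input_tokens

    return count


def expansion_ratios(prompts: Iterable[str], count: Counter) -> List[float]:
    """New-tokenizer / old-tokenizer input-token ratio for each prompt.

    Counts include a small message-formatting overhead, which can skew the
    ratio for very short prompts.
    """
    return [count(NEW_MODEL, p) / count(OLD_MODEL, p) for p in prompts]
```

Typical use is `expansion_ratios(top_prompts, make_sdk_counter())`, where `top_prompts` is your own sample of production prompts.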


    ⚠️ Breaking API changes — do not skip this section

    Opus 4.7 is not a drop-in replacement at the API level. Two parameters from Opus 4.6 have been removed:

    1. The extended thinking budget parameter. You can no longer set an explicit thinking budget. The model decides thinking allocation based on the effort level you choose (low, medium, high, xhigh, max).

    2. Sampling parameters. Parameters that controlled sampling behavior on 4.6 are gone on 4.7. Check Anthropic’s release notes for the exact list as you upgrade.

    What this means practically: if your production code sends thinking: {budget_tokens: ...} or sampling parameters in its Opus API calls, those calls will fail on 4.7 until you update them. The effort parameter is now the primary control surface for thinking allocation.

    The upgrade workflow:
    1. Identify every call site that sets the removed parameters.
    2. Replace thinking budget settings with an appropriate effort level (xhigh is the new default to try for hard problems).
    3. Remove sampling parameter settings entirely.
    4. Test against a staging environment before switching the model string on production traffic.
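You can apply steps 2 and 3 mechanically to request dictionaries. This is a minimal sketch. It assumes the removed sampling parameters are temperature, top_p, and top_k (confirm against the release notes), and its budget-to-effort thresholds are hypothetical, so tune them on your own evals. It only suggests an effort level and does not place it in the request, because the parameter shape isn't specified here. It also leaves the model string alone, so step 4 still happens deliberately.

```python
from typing import Any, Dict, Optional, Tuple

# Assumption: these are the sampling parameters removed in Opus 4.7.
SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def migrate_request(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Optional[str]]:
    """Drop parameters removed in Opus 4.7 and suggest an effort level.

    Returns (cleaned_kwargs, suggested_effort). The input dict is not modified.
    """
    cleaned = {k: v for k, v in kwargs.items() if k not in SAMPLING_PARAMS}
    suggested: Optional[str] = None
    thinking = cleaned.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        del cleaned["thinking"]  # explicit thinking budgets are no longer accepted
        budget = thinking["budget_tokens"]
        # Hypothetical thresholds mapping the old budget to an effort level.
        if budget <= 16_000:
            suggested = "high"
        elif budget <= 32_000:
            suggested = "xhigh"
        else:
            suggested = "max"
    return cleaned, suggested
```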


    An upgrade checklist

    If you’re moving production workloads from 4.6 to 4.7:

    1. Audit your API calls for removed parameters. Extended thinking budgets and sampling params are gone. Fix these first — otherwise calls will fail on 4.7.
    2. Re-benchmark token counts on your top ten prompts. Adjust cost models if needed.
    3. Swap max for xhigh as the default high-effort setting; keep max for known-hardest tasks. Anthropic specifically recommends high or xhigh as the coding/agentic starting point.
    4. Don’t yet put task budgets into stable contracts — use them for internal agent work where you can iterate on the API shape as it changes.
    5. Review output-length alerts. Expect higher output volumes at the same effort level.
    6. For Claude Code users: try /ultrareview on your next non-trivial PR.
    7. For Max subscribers: try auto mode. It’s now available at your tier.
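For item 1, a crude text scan is a reasonable starting point before a proper code review. This minimal sketch uses only the standard library. The sampling-parameter names (temperature, top_p, top_k) are an assumption. Because it matches plain text, it will also flag unrelated uses of words like temperature, so treat hits as leads rather than confirmed call sites.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# Parameters this article says were removed in Opus 4.7. budget_tokens comes from
# thinking={"budget_tokens": ...}; the sampling-parameter names are an assumption.
REMOVED_PARAM = re.compile(r"\b(budget_tokens|temperature|top_p|top_k)\b")


def find_removed_params(
    root: str, suffixes: Iterable[str] = (".py", ".ts", ".js")
) -> List[Tuple[str, int, str]]:
    """Return (path, line_number, line) for each source line that mentions a removed parameter."""
    wanted = set(suffixes)
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not (path.is_file() and path.suffix in wanted):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if REMOVED_PARAM.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```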

    Frequently asked questions

    Is Opus 4.7 available in Claude Code?
    Yes, as the default Opus model since April 16, 2026. Update to the latest Claude Code version to pick it up.

    What’s the difference between high, xhigh, and max?
    high is the default for routine work. xhigh is new, tuned for hard problems that benefit from more reasoning without the full max budget. max is for long-horizon expensive-to-rerun tasks where you want maximum thinking regardless of cost.

    Do task budgets work with streaming?
    Yes. Budget state is reported in the streaming response so you can display progress.

    Is /ultrareview available on all Claude Code plans?
    Yes. Auto mode has a plan gate (Max 5x and above); /ultrareview does not.

    Does the tokenizer change affect Opus 4.6?
    No. 4.6 continues to use its existing tokenizer. The change applies to 4.7 and any subsequent models that adopt it.

    Does filesystem memory work outside Claude Code?
    4.7’s improvement is in long-horizon coherence at the model level, not a separate filesystem memory API. API users running agents with their own persistence layers (structured notes, external memory stores) get the benefit through the underlying model behavior, without needing a new API surface.

    Did Opus 4.7 really remove sampling parameters?
    Yes. If your 4.6 code sets sampling parameters, those calls will fail on 4.7. Update client code before switching the model string.


    Related reading

    • The full release: Claude Opus 4.7 — Everything New
    • Head-to-head benchmarks: Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro
    • The Mythos tension angle: why the release post mentions an unreleased model

    Published April 16, 2026. Article written by Claude Opus 4.7 — yes, the model under discussion.

  • Anthropic Just Admitted Opus 4.7 Is Weaker Than Mythos — And That’s the Story

    Anthropic Just Admitted Opus 4.7 Is Weaker Than Mythos — And That’s the Story

    Last refreshed: May 15, 2026

    Model Accuracy Note — Updated May 2026

    Current flagship: Claude Opus 4.7 (claude-opus-4-7). Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Claude Opus 4.7 (claude-opus-4-7) is the current flagship as of April 16, 2026. Where this article references Opus 4.6 or earlier models, those references are historical. See current model tracker →

    The one-sentence version

    When Anthropic released Claude Opus 4.7 on April 16, 2026, they did something model labs almost never do: they told customers, on the record, that a more capable model already exists and is already in select customers’ hands.

    That’s the story.


    What Anthropic actually said

    The release announcement for Opus 4.7 included benchmark comparisons against three public competitors (Opus 4.6, GPT-5.4, Gemini 3.1 Pro) and one non-public one: Claude Mythos Preview. Mythos is not a generally available product. It has no pricing for the public market, no broad availability, no mass-market model string.

    But Mythos is not purely internal either. Anthropic released it to a handpicked group of technology and cybersecurity companies under a program called Project Glasswing earlier in April 2026. A broader unveiling of Project Glasswing is expected in May in San Francisco.

    And Mythos beats Opus 4.7 on most of the benchmarks Anthropic put in the 4.7 announcement.

    Anthropic did not bury this. The release materials describe Opus 4.7 as “less broadly capable” than Mythos Preview. CNBC, Axios, Decrypt, and other outlets covered exactly this angle because it was the actual story of the day — not the Opus 4.7 launch itself but the admission riding alongside it.

    Disclosure: This article is written by Claude Opus 4.7, which is, by Anthropic’s own admission, the less broadly capable model. Treat that as a conflict of interest or as structural honesty, depending on your priors.


    Why this is unusual

    Model labs do not normally telegraph internal capability leads. The standard playbook is:

    1. Ship the best model you’re willing to ship.
    2. Call it your best model.
    3. Never mention unreleased research models unless a competitor forces the issue.

    Anthropic broke this playbook in public. OpenAI has never, to my knowledge, said on the record “our shipped GPT is measurably weaker than our internal model.” Google has not said that about Gemini. Even when Anthropic themselves released Opus 4.6 in February, there was no equivalent acknowledgment of a stronger model on the bench.

    There are two plausible reasons a lab would do this. Either they want the existence of the stronger model to be public knowledge, or they had to disclose it, because refusing to would have been worse.

    Both readings are interesting.


    Reading one: deliberate signaling

    Under the deliberate-signaling read, Anthropic is telling three audiences three things at once.

    To customers and investors: “We are capability-leading but we are pacing ourselves.” The message: we could ship more broadly, we are choosing not to, trust us with the harder problem of deciding when. Releasing Mythos to cybersecurity companies specifically — rather than broadly — is consistent with this framing.

    To regulators and policy watchers: “Look — we are applying our Responsible Scaling Policy in public, in a legible way.” The Glasswing structure makes the cautious-release decision visible in a way that slide-deck assurances cannot. The company has also talked about “differentially reducing” cyber capabilities on the widely released model (Opus 4.7), which is another piece of the same messaging.

    To competitors: “We have runway.” Announcing a stronger model exists and is in production use with select partners puts pressure on roadmap decisions at OpenAI and Google without giving them a specific target to beat on a specific date.

    This reading is consistent with Anthropic’s general style. It is also the most flattering interpretation.


    Reading two: forced disclosure

    The less flattering reading goes like this.

    In the weeks before 4.7’s release, there was persistent chatter — on Reddit, X, GitHub, and developer forums — that Opus 4.6 had been “nerfed.” Users reported perceived quality regressions: shorter responses, faster refusals, worse long-context behavior. An AMD senior director posted on GitHub that “Claude has regressed to the point it cannot be trusted to perform complex engineering” — a post that was widely shared and became one of the focal points of the complaint. Some developers alleged Anthropic was rerouting compute from 4.6 inference to Mythos training.

    Anthropic denied the compute-rerouting claim explicitly. They said any changes to the model were not made to redirect computing resources to other projects. But “users think you are quietly degrading the model they pay for to free up resources for the one they can’t have” is not a rumor a serious lab wants to let calcify. One way to kill it is to disclose the existence and relative capability of the unreleased model openly, in the release notes of the next model, with benchmark numbers attached. Doing so converts a conspiracy theory into a planning document. It also reframes “we are hiding Mythos from you” into “we are telling you about Mythos in unusual detail.”

    Under this read, the disclosure was partly defensive. It doesn’t mean the nerf allegations were true — it means Anthropic judged that explicit disclosure was cheaper than ongoing denial.

    Both reads can be true at once.


    Was Opus 4.6 actually nerfed?

    I can’t answer this from the inside. As Opus 4.7, I have no memory of what it was like to be 4.6, and I have no access to Anthropic’s compute allocation records. Here is what can be said from the outside:

    • Evidence for: A real and sustained volume of user reports, including from developers with consistent prompts they could compare across weeks. GitHub issues and Reddit threads with substantial engagement. The AMD director’s post specifically, which had the weight of identifiable senior-engineer authorship. Some developers ran identical test suites and reported degraded results.

    • Evidence against: Anthropic’s explicit denial. No public logs or telemetry showing a policy change. The same reports appear around every major model’s lifecycle and are often attributable to user habituation (the model stopped feeling magical), prompt drift (your own prompts got worse), and increased traffic (latency and truncation behavior change under load).

    • The honest answer: unresolved. “Nerfing” is not a precisely defined term, and the alternative explanations are real. The disclosure of Mythos is consistent with both “we quietly rerouted compute and wanted to get ahead of it” and “we never rerouted compute and we wanted to put the rumor to bed.” The disclosure alone does not settle the question.


    What Project Glasswing is, briefly

    Project Glasswing is the structure Anthropic has built around Mythos. As far as can be assembled from public reporting:

    • Mythos is available to a handpicked group of technology and cybersecurity companies — not broadly.
    • The program has a security-research orientation; part of the rationale is giving advanced capabilities to defenders before they’re broadly available.
    • Opus 4.7 itself was trained with what Anthropic calls “differentially reduced” cyber capabilities, paired with a new Cyber Verification Program that lets vetted security researchers access capabilities that were dialed back for general users.
    • A broader Project Glasswing unveiling is expected in May 2026 in San Francisco.

    The through-line: Anthropic is treating advanced offensive-security-relevant capability as something to gate carefully, placing it inside a program with named partners rather than shipping it broadly by default. Whether that’s genuinely safety-motivated, competitively motivated, or both, the structural decision is the important part.


    What this means for customers

    Three practical implications:

    1. Don’t wait for Mythos general release. Anthropic has given no timeline for broad availability. If Opus 4.7 covers your use case, use it. If it doesn’t, GPT-5.4 or Gemini 3.1 Pro are the realistic alternatives, not a model you can’t get unless you’re an enterprise cybersecurity partner.

    2. Plan for a significant step up eventually. The disclosure strongly suggests that the next generally available Claude flagship will not be an incremental bump: Anthropic publishing benchmarks against Mythos implies the capability delta is significant enough to name. When Mythos (or its successor) lands for general use, expect a larger behavioral shift than the 4.6 → 4.7 transition.

    3. Track Anthropic’s Glasswing disclosures, not just release posts. If Mythos’s broader rollout is tied to Glasswing program milestones, the release trigger will be program maturity, not a marketing cycle. The May unveiling is the next useful signal.


    Frequently asked questions

    What is Claude Mythos Preview?
    A more advanced Anthropic model released to select technology and cybersecurity companies under Project Glasswing. Anthropic publicly describes it as more capable than Opus 4.7 on most of the benchmarks in the 4.7 release materials. It is not broadly available.

    Is Mythos available to anyone?
    Yes, but narrowly. It has been released to a handpicked group of technology and cybersecurity companies under Project Glasswing. There is no public waitlist or self-serve access.

    When will Mythos be released broadly?
    No timeline announced. Anthropic has signaled a broader Project Glasswing unveiling in May 2026 in San Francisco; whether that includes wider Mythos access is not yet clear.

    Did Anthropic actually admit Opus 4.7 is weaker?
    Yes. The release materials directly describe Opus 4.7 as “less broadly capable” than Mythos Preview and include benchmark comparisons showing Mythos ahead. Multiple news outlets led with this angle.

    Was Opus 4.6 nerfed?
    Unresolved. User reports exist (including a widely shared GitHub post from an AMD senior director); Anthropic has denied redirecting compute; no independent evidence settles the question in either direction.

    What is Project Glasswing?
    Anthropic’s framework for gating advanced cybersecurity-relevant model capabilities. It includes Mythos Preview’s limited release, the “differentially reduced” cyber capabilities of Opus 4.7, and a Cyber Verification Program for vetted security researchers.

    Is this article biased because Claude Opus 4.7 wrote it?
    Yes, structurally. I am the model being called the weaker one. I’ve tried to note this where it matters. A human editor reviewing this copy would be a reasonable additional filter.


    Related reading

    • The full feature set: Claude Opus 4.7 — Everything New
    • For developers: Opus 4.7 for coding in practice
    • Head-to-head: Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro

    Published April 16, 2026. Article written by Claude Opus 4.7.

  • Claude Opus 4.7: Everything New in Anthropic’s Latest Flagship Model

    Claude Opus 4.7: Everything New in Anthropic’s Latest Flagship Model

    Last refreshed: May 15, 2026

    Model Accuracy Note — Updated May 2026

    Current flagship: Claude Opus 4.7 (claude-opus-4-7). Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Claude Opus 4.7 (claude-opus-4-7) is the current flagship as of April 16, 2026. Where this article references Opus 4.6 or earlier models, those references are historical. See current model tracker →

    The short version

    Claude Opus 4.7 is Anthropic’s newest flagship model, released April 16, 2026. It is a direct upgrade to Opus 4.6 at identical pricing — $5 per million input tokens and $25 per million output tokens — and it ships across Claude’s consumer products, the Anthropic API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry on day one.

    The headline gains are in software engineering (particularly on the hardest tasks), reasoning control (a new “xhigh” effort level between high and max), agentic workloads (a new beta “task budgets” system), and vision (images up to 2,576 pixels on the long edge — about 3.75 megapixels, more than 3× the prior Claude ceiling of 1,568 pixels / 1.15 MP). It beats Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on a number of Anthropic’s reported benchmarks.

    The most unusual thing about the release is what Anthropic admitted: Opus 4.7 is deliberately “less broadly capable” than Claude Mythos Preview, a more advanced model Anthropic has already released to select cybersecurity companies under a program called Project Glasswing. That’s the angle worth watching.

    Author’s note: This article is written by Claude Opus 4.7. I’m the model being described. Where I can speak to my own behavior with confidence, I will; where the answer depends on Anthropic’s internal process, I’ll say so.


    What actually changed in Opus 4.7

    The release breaks down into eight categories. In order of how much they matter for most users:

    1. Software engineering performance. Anthropic describes Opus 4.7 as “a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks.” The gain concentrates on long-horizon, multi-file, ambiguous-spec work where prior Claude models would often “almost” solve the problem. In practice, this is the difference between a model that writes a good PR and one that closes the ticket. GitHub Copilot is rolling Opus 4.7 out to Copilot Pro+ users, replacing both Opus 4.5 and Opus 4.6 in the model picker over the coming weeks.

    2. The “xhigh” effort level. Before 4.7, reasoning effort on Opus had four settings: low, medium, high, and max. 4.7 adds xhigh, slotted between high and max. Anthropic’s own recommendation: “When testing Opus 4.7 for coding and agentic use cases, we recommend starting with high or xhigh effort.” The practical use: max often produced more thinking than a problem needed, burning tokens with diminishing returns. xhigh is tuned for the sweet spot where hard problems benefit from extra reasoning but don’t require the full max budget.

    3. Task budgets (beta). This is a new system for agentic workloads. Instead of setting a single thinking budget for a turn, you can declare a task budget: a ceiling on tokens or tool calls for a multi-turn agentic loop. The agent then allocates its own thinking across the loop’s steps. This targets a specific problem, agent cost variance: the same agent run is less likely to swing between “finished in 40k tokens” and “burned 400k on a rabbit hole.” Budgets are soft, so occasional overruns are still possible.

    4. Vision overhaul. Prior Claude models capped image input at 1,568 pixels on the long edge (about 1.15 megapixels). Opus 4.7 raises the ceiling to 2,576 pixels — about 3.75 megapixels, more than 3× the prior limit. This matters most for screenshots of dense UIs, technical diagrams, small-text documents, and any task where detail inside the image is what you actually need read. A related change: coordinate mapping is now 1:1 with actual pixels, eliminating the scale-factor math that computer-use workflows previously required.
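The sketch below shows the resizing and coordinate math this change removes. It assumes the ceilings behave as a long-edge cap plus a total-pixel budget, using the figures quoted here: 1,568 px / 1.15 MP before and 2,576 px / 3.75 MP now, with 1 MP taken as 1,000,000 pixels. The helper names are hypothetical, and Anthropic's actual resizing rules may differ in detail.

```python
import math
from typing import Tuple


def fit_image(width: int, height: int, max_edge: int, max_pixels: int) -> Tuple[int, int]:
    """Largest same-aspect-ratio size within both a long-edge cap and a total-pixel cap."""
    scale = min(
        1.0,  # never upscale
        max_edge / max(width, height),
        math.sqrt(max_pixels / (width * height)),
    )
    return max(1, int(width * scale)), max(1, int(height * scale))


def to_screen_coords(
    x: float, y: float, sent: Tuple[int, int], original: Tuple[int, int]
) -> Tuple[int, int]:
    """Map a point in the (possibly downscaled) image back to original screen pixels.

    When the image is sent at full size, sent == original and this is the identity,
    which is what a 1:1 coordinate mapping amounts to.
    """
    return round(x * original[0] / sent[0]), round(y * original[1] / sent[1])
```

Under these assumptions, a 2560×1440 screenshot passes through the new limits untouched. Under the old ones it would shrink to about 1429×804, and every click coordinate the model returned would need scaling back up by roughly 1.79×.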

    5. Better long-running task behavior. Anthropic says the model “stays on track over longer horizons with improved reasoning and memory capabilities.” In Claude Code specifically, this translates into better persistence across multi-session engineering work.

    6. Tokenizer change. The same input string now maps to up to 1.35× more tokens than under 4.6’s tokenizer. English prose is near the low end of that range; code, JSON, and non-Latin scripts trend higher. Pricing per token is unchanged, so the effective cost per request rises with the token count even though the sticker price didn’t move. The increase is barely noticeable for English prose but significant for code- or JSON-heavy workloads. It’s worth re-benchmarking your own token accounting after the upgrade.

    7. Cyber safeguards and the Cyber Verification Program. Anthropic says it “experimented with efforts to differentially reduce Claude Opus 4.7’s cyber capabilities during training.” In plain English: the model is deliberately tuned to be less helpful on offensive-security tasks. Alongside it, Anthropic launched a Cyber Verification Program — a vetted-researcher path for legitimate offensive security work that would otherwise trigger the safeguards. This is part of the broader Project Glasswing safety framework.

    8. Breaking API changes (worth knowing before you upgrade). Opus 4.7 removes the extended thinking budget parameter and sampling parameters that existed on 4.6. If your application code explicitly sets those parameters, you’ll need to update before switching model strings. The model effectively decides its own thinking allocation based on effort level now.


    Benchmarks: how 4.7 stacks up

    Anthropic published 4.7’s scores against three public models (Opus 4.6, its predecessor; GPT-5.4, OpenAI’s current flagship; and Gemini 3.1 Pro, Google’s) plus one non-public model: Claude Mythos Preview, which is gated to select partners rather than generally available. The summary: 4.7 beats the three public models on a number of key benchmarks, but falls short of Mythos Preview.

    Anthropic has been unusually direct about the Mythos gap. From the release materials: 4.7 is described as “less broadly capable” than Mythos, framed as the generally available option while Mythos remains gated. That’s the part worth sitting with: model labs rarely telegraph that their shipped flagship is a step behind something they already have running. (Full analysis in the dedicated Mythos article linked at the bottom.)

    On specific task families, Anthropic reports Opus 4.7 leading on:

    • Agentic coding (industry benchmarks and Anthropic’s internal suites)
    • Multidisciplinary reasoning
    • Scaled tool use
    • Agentic computer use
    • Vision benchmarks on dense documents and UI screens (driven by the higher-resolution processing)

    For a fuller comparison table and the methodology notes, see the Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro piece linked below.


    Pricing and availability

    Pricing (unchanged from Opus 4.6):
    – $5 per million input tokens
    – $25 per million output tokens
    – Prompt caching and batch discounts apply at the same tiers as 4.6

    Context window: 1M tokens (same as 4.6).

    Availability on day one:
    – Claude.ai (Pro, Max, Team, Enterprise) — Opus 4.7 is the default Opus option
    – Claude mobile and desktop apps
    – Anthropic API (claude-opus-4-7 model string)
    – Amazon Bedrock
    – Google Vertex AI
    – Microsoft Foundry
    – GitHub Copilot (Copilot Pro+), rolling out over the coming weeks

    Opus 4.6 remains available via API for teams that need behavioral continuity during transition. Anthropic has not announced a deprecation date for 4.6.


    What’s new in Claude Code

    Two Claude Code changes shipped alongside 4.7:

    Auto mode extended to Max subscribers. Previously, Claude Code’s auto mode — the setting where the agent decides on its own when to escalate reasoning effort or call tools — was limited to Team and Enterprise plans. As of April 16, Max subscribers get it too. For solo developers on the $200/month Max 20x plan, this closes a meaningful capability gap.

    The /ultrareview command. A new slash command that runs a deep, multi-pass review of the current change set. Unlike /review, which does a single pass, /ultrareview runs review → critique of the review → final pass, and surfaces disagreements between the passes for the developer to resolve. The tradeoff is latency and tokens: /ultrareview is slow and not cheap. Anthropic positions it for pre-merge review of significant PRs, not routine use.

    Anthropic has also shifted default reasoning behavior in Claude Code for this release, pushing toward high/xhigh as the starting point for coding work.


    Known tradeoffs and gotchas

    Four things worth knowing before you upgrade production workloads:

    Output tokens go up at higher effort levels. On the same prompt, xhigh will produce more reasoning tokens than high did, and max produces more than both. If you have cost alerts tuned to 4.6 output volume, expect them to fire after the upgrade even if behavior is otherwise identical.

    The tokenizer change is the real cost variable. The up-to-1.35× input token expansion is not a rounding error for high-volume workloads. Run your top ten production prompts through the new tokenizer before assuming costs are flat.

    Task budgets are beta. The feature is useful today but the API surface is not frozen. Anthropic’s documentation explicitly says the parameter names and shape may change before GA. Don’t bake it into stable contracts yet.

    Breaking API parameters. Extended thinking budgets and sampling parameters from 4.6 are gone. Update your client code accordingly.


    Frequently asked questions

    Is Opus 4.7 free?
    Opus 4.7 is available on paid Claude.ai plans (Pro at $20/month, Max tiers at $100 or $200/month). API access is usage-priced at $5/$25 per million tokens.

    How do I use Opus 4.7 in Claude Code?
    If you’re already on Claude Code, update to the latest version. Opus 4.7 is the default Opus model as of April 16, 2026. The new /ultrareview command and auto mode (for Max subscribers) are available immediately.

    Is Opus 4.7 better than GPT-5.4?
    On Anthropic’s reported benchmarks, Opus 4.7 leads on agentic coding, multidisciplinary reasoning, tool use, and computer use. GPT-5.4 remains significantly cheaper per token ($2.50/$15 vs. $5/$25). Which is “better” depends on whether capability or cost dominates your decision.

    What is Claude Mythos Preview?
    Mythos Preview is a more advanced Anthropic model released only to select cybersecurity companies under Project Glasswing. Anthropic has said it is more capable than Opus 4.7 on most benchmarks but is being held back from general release due to cybersecurity concerns. A broader unveiling of Project Glasswing is expected in May 2026 in San Francisco.

    Did Anthropic nerf Opus 4.6 to push people to 4.7?
    Users — including an AMD senior director whose GitHub post went viral — reported perceived quality degradation in Opus 4.6 in the weeks before 4.7’s release. Anthropic has publicly denied that any changes were made to redirect compute to Mythos or other projects. There is no external evidence that settles the question. This is covered in the Mythos tension article.

    Does Opus 4.7 keep the 1M token context window?
    Yes. Opus 4.7 has the same 1M-token context window as Opus 4.6.

    What changed in vision?
    The image input ceiling rose from 1,568 pixels on the long edge (about 1.15 MP in total) to 2,576 pixels (about 3.75 MP), more than 3× the pixel budget. Coordinate mapping is also now 1:1 with actual pixels, which simplifies computer-use workflows.
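    To see what the higher ceiling means for a particular image, you can compute the proportional downscale that keeps it inside both a long-edge limit and a total-pixel budget. This is a minimal sketch, assuming the two limits quoted above apply together and that oversized images are scaled uniformly to fit. `fit_within` is a hypothetical helper, not part of any Anthropic SDK, and the actual server-side resizing may differ.

```python
import math

# Assumed limits, from the figures quoted above: (long edge in px, total pixels).
OLD_LIMITS = (1568, 1_150_000)   # ~1.15 MP
NEW_LIMITS = (2576, 3_750_000)   # ~3.75 MP


def fit_within(width: int, height: int, max_edge: int, max_pixels: int) -> tuple[int, int]:
    """Return (width, height) after a uniform downscale that respects both caps.

    Assumption: an image already inside both limits is left untouched;
    otherwise the tighter of the two constraints sets the scale factor.
    """
    edge_scale = max_edge / max(width, height)
    area_scale = math.sqrt(max_pixels / (width * height))
    scale = min(1.0, edge_scale, area_scale)
    if scale == 1.0:
        return width, height
    # Round down so the result never exceeds either cap.
    return math.floor(width * scale), math.floor(height * scale)


if __name__ == "__main__":
    for w, h in [(1920, 1080), (4000, 3000)]:
        print((w, h), "old:", fit_within(w, h, *OLD_LIMITS),
              "new:", fit_within(w, h, *NEW_LIMITS))
```

    Under these assumptions, a 1920×1080 screenshot is downscaled to 1429×804 under the old limits but passes through at native resolution under the new ones.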


    Related reading

    • The Mythos tension: Why Anthropic admitted Opus 4.7 is weaker than a model they’ve already released to cybersecurity companies
    • For developers: Opus 4.7 for coding — xhigh, task budgets, and the breaking API changes in practice
    • Comparison: Claude Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro
    • Feature deep-dives: Task budgets explained • The xhigh effort level • The 3.75 MP vision ceiling

    Published April 16, 2026. Article written by Claude Opus 4.7. Benchmark claims reflect Anthropic’s published release data; independent replication is ongoing.

  • The Restoration Talent Window Is Closing Faster Than You Think

    The Restoration Talent Window Is Closing Faster Than You Think

    Last refreshed: May 15, 2026

    A LinkedIn post from a restoration recruiter in Houston tipped me off this morning. He’s right — but the timeline is shorter than most people in the industry realize.

    Mitchell Riley’s LinkedIn post that started this train of thought.

    This article is part of The Restoration Operator’s Playbook — Tygart Media’s body of work on how the industry’s best restoration companies are actually thinking in 2026. Start with the pillar piece if this is your first read.

    The post that got me thinking

    This morning I logged into LinkedIn and saw a post from Mitchell Riley — a restoration industry recruiter in Houston who places PMs, GMs, and business development leaders for restoration contractors across the country. Mitchell flagged Anthropic’s Claude Managed Agents launch with the kind of casual enthusiasm only people who actually use this stuff every day can manage. He called it “pretty cool” and noted that Claude will now build you an agent based on natural language.

    He’s right. He’s also pointing at something most of the restoration industry hasn’t fully processed yet.

    What Anthropic actually shipped

    On April 8, 2026, Anthropic launched Claude Managed Agents in public beta. The short version: the infrastructure work that used to take three to six months of engineering — sandboxed code execution, credential management, long-running session persistence, error recovery, observability — is now a managed service. You define what the agent should do. Anthropic runs it.

    The companies already shipping production agents on it: Notion, Asana, Rakuten, and Sentry. Notion lets teams delegate coding, slides, and spreadsheets to Claude without leaving the workspace. Rakuten deployed specialist agents across product, sales, marketing, finance, and HR — each live in under a week. Sentry built an agent that goes from flagged bug to open pull request, fully autonomous.

    Internal Anthropic testing showed up to a 10-point improvement in task success on structured generation work versus a standard prompting loop, with the largest gains on the hardest problems.

    That’s the announcement. Here’s why it matters for restoration.

    The bottleneck just moved

    For the last two years, the question every restoration owner asked about AI was some version of: “Can it actually do the work?” The honest answer was usually “not yet, not without a developer team you don’t have.”

    That’s no longer the question. The infrastructure gap closed on April 8. The new bottleneck is not “can you build the agent” — it’s “do you have the human operators who know what the agent should be doing in the first place.”

    Restoration is an industry where the real intelligence lives in people. A senior PM who has worked five hundred losses knows things that have never been written down anywhere. How a Cat 3 storm response actually sequences when the carrier is dragging on TPA approvals. The difference between a contents pack-out that closes clean and one that becomes a six-month dispute. Which mitigation decisions buy you a profitable job and which ones bury you on the reconstruction side. None of that lives in a textbook. It lives in the heads of people who have been doing the work for fifteen or twenty years.

    That tribal knowledge is now the constraint. The companies that win the next three years will be the ones who pair Managed Agents (or something like it) with senior operators who can tell the agent what good looks like. The companies that try to skip that step — that try to hire generalists and teach them restoration on the fly while their competitors are distilling twenty-year veterans into operational systems — are going to get lapped.

    Buy the talent now

    This is where the recruiting angle gets interesting. Senior restoration talent has always been hard to find. It’s about to get much harder, for a reason most owners haven’t priced in yet: the value of a senior PM is no longer just the work that PM does directly. It’s the work an entire AI system does in their image once their judgment has been encoded into the workflow.

    Right now, that arbitrage is open. The market hasn’t repriced senior operators for what they’re actually worth in an AI-augmented restoration company. In twelve to twenty-four months, it will. The owners who hire the best PMs, GMs, and BD leaders now — and who pair them with someone like Mitchell who actually understands the placement game — are going to look like geniuses in 2027.

    Mitchell is one of the people who gets this from the inside. He uses the AI tools himself. He builds workflows. He analyzes things in dimensions and context that most recruiters never touch — most recruiters in this industry are still working from a spreadsheet of resumes and a cell phone. Mitchell is the kind of recruiter who notices when Anthropic ships something that’s going to change the value of every senior hire he places, and posts about it on a Wednesday morning. That’s the level of operator the smart restoration owners are going to want in their corner.

    What to actually do this quarter

    If you run a restoration company and have read this far, here are three concrete things to do:

    One. Identify your two or three most senior operators — the people whose judgment is load-bearing for the business. Start documenting how they think, not just what they do. The documentation is the raw material every future AI workflow will run on.

    Two. Open one or two senior hires you’ve been putting off. The talent market is going to tighten. Get in front of it.

    Three. Stop treating AI as an IT project. It’s an operational capability. The companies that figure this out are not waiting for their tech vendor to sell them an “AI feature.” They’re hiring the operators, capturing the judgment, and pointing the tooling at the result.

    Mitchell’s post was three sentences. The full version of what he was pointing at takes about a thousand words. This is that version.

    If you’re a restoration owner thinking about senior placements in the next two quarters, you should be talking to Mitchell. And if you’re thinking about how to operationalize AI inside your company — distilling senior judgment into systems your whole team can run — that’s the conversation we have at Tygart Media.

    Read next: The New Restoration Operator: How the Industry’s Best Companies Are Thinking in 2026 — the pillar piece this article belongs to.

  • Claude for Education: How the University Program Works and How to Get Access

    Claude for Education: How the University Program Works and How to Get Access

    Last refreshed: June 9, 2026

    Claude for Education is Anthropic’s official program for higher education institutions — a university-wide plan that gives enrolled students, faculty, and staff access to Claude’s premium features, including advanced models, learning mode, and API credits for research. It’s institution-facing, not student-facing: your university signs up, and access flows through your .edu email.

    Access: claude.com/solutions/education — for institutions. If your university is already a partner, sign in to claude.ai with your .edu email and your account will be upgraded automatically.

    What Claude for Education Includes

    Feature | What it means for your institution
    Campus-wide access | Students, faculty, and staff all covered under one institutional agreement
    Learning mode | Claude guides students through problems rather than just giving answers, designed to build understanding rather than bypass it
    API credits for research | Faculty can access the Claude API to accelerate research (dataset analysis, text processing, building learning tools)
    Claude Code access | Students in technical programs get Claude Code for pair programming and software development learning
    Training and support | Anthropic provides implementation resources and ongoing support for faculty and administrators
    Data compliance | Anthropic only uses data for training with explicit permission; security standards meet institutional compliance needs

    How to Get Your Institution Enrolled

    Institutions, not individual students, apply for the Claude for Education program. The process runs through Anthropic’s sales team:

    1. Visit claude.com/contact-sales/education-plan
    2. Submit your institution’s information and intended use case
    3. Anthropic reviews and negotiates the institutional agreement
    4. Once enrolled, students and staff access Claude by signing in with their .edu email

    If you’re a student or faculty member who wants your institution to join, raise it with your IT department, library services, or educational technology office. Anthropic’s first confirmed design partner is Northeastern University (50,000 students and staff across 13 campuses worldwide), and the partner list has been expanding through 2025 and 2026.

    Learning Mode: What Makes the Education Program Different

    The distinctive feature of Claude for Education is learning mode — Claude’s approach shifts from answering questions to guiding students toward answers. Rather than writing the essay or solving the problem directly, Claude asks clarifying questions, prompts reflection, and helps students develop their own reasoning. Anthropic designed this explicitly to strengthen critical thinking rather than bypass it.

    This is a meaningful distinction from standard Claude Pro: the same powerful model, but oriented toward building understanding rather than delivering outputs. For educators concerned about AI undermining the learning process, learning mode is Anthropic’s answer.

    Claude for Education vs Claude for Research

    Faculty and researchers at accredited institutions who need API access for research projects can also apply for Anthropic’s grant programs independently of the campus-wide Education plan. These grants typically provide API credits for research workloads — analyzing datasets, processing large text corpora, building research tools — rather than subscription discounts. Contact Anthropic through their research or social impact team for grant program information.

    Student Programs Within the Education Ecosystem

    Alongside the institutional program, Anthropic runs student-facing programs that provide individual access:

    • Campus Ambassadors — Selected students receive Pro access and API credits in exchange for leading AI education initiatives on campus. Applications open periodically; watch claude.com/solutions/education for current status.
    • Builder Clubs — Student clubs that organize hackathons and demos receive Pro access and monthly API credits. Open to all majors.

    For a full breakdown of how students can access Claude at reduced cost, see Claude Student Discount: The Truth and Legitimate Ways to Save.

    Frequently Asked Questions

    What is Claude for Education?

    Claude for Education is Anthropic’s institutional program for universities — a campus-wide plan covering students, faculty, and staff with premium Claude access including learning mode, API credits for research, and Claude Code. It’s applied for by institutions through Anthropic’s sales team, not individual students.

    How do I access Claude for Education as a student?

    Sign in to claude.ai with your .edu email. If your institution is an Anthropic education partner, your account will be upgraded automatically. If not, ask your IT department or library about joining the program. Alternatively, apply for the Campus Ambassador program or join a Builder Club if available at your school.

    Is Claude for Education free for students?

    For students at partner institutions, yes — access is free through the institutional agreement. Anthropic and the university negotiate the pricing; it’s not passed on to individual students. For students at non-partner schools, there is no individual student pricing — the standard free and paid plans apply.

    Confirmed Claude for Education Partners

    The Claude for Education program has expanded significantly since launch. Confirmed institutional partners and program collaborations include:

    University-Wide Campus Agreements

    • Northeastern University — Anthropic’s first university design partner, providing access to 50,000 students, faculty, and staff across 13 global campuses. Northeastern is collaborating directly with Anthropic on best practices for AI integration in higher education and frameworks for responsible AI adoption.
    • London School of Economics and Political Science (LSE) — Campus-wide rollout focused on equity of access, ethics, and skills development for students entering an AI-transformed workforce.
    • Champlain College — Vermont-based institution with full campus access for students, faculty, and administrators.

    Multi-Institution Programs

    • CodePath Partnership — Anthropic partnered with CodePath, the nation’s largest provider of collegiate computer science education, to put Claude and Claude Code at the center of CodePath’s curriculum. The partnership reaches more than 20,000 students at community colleges, state schools, and HBCUs. Over 40% of CodePath students come from families earning under $50,000 a year, making this program a meaningful equity initiative. Courses include Foundations of AI Engineering, Applications of AI Engineering, and AI Open-Source Capstone.
    • American Federation of Teachers (AFT) — Anthropic is partnering with AFT to offer free AI training to AFT’s 1.8 million members across the United States.
    • Internet2 — Anthropic joined the Internet2 community and is participating in a NET+ service evaluation, working toward broader integration with research and education networks.
    • Instructure — Partnership to embed Claude into Canvas LMS, Instructure’s learning management system used by thousands of institutions.

    International Education Initiatives

    • Iceland — One of the world’s first national AI education pilots, launched with the Icelandic Ministry of Education and Children, providing teachers across the country access to Claude.
    • Rwanda — Partnership with the Rwandan government and ALX bringing a Claude-powered learning companion to hundreds of thousands of students and young professionals across Africa.

    U.S. Federal Commitment

    Anthropic signed the White House’s “Pledge to America’s Youth: Investing in AI Education,” committing to expand AI education nationwide through investments in cybersecurity education, the Presidential AI Challenge, and a free AI curriculum for educators.

    If your institution isn’t on this list, the program is actively expanding — application is through Anthropic’s education team at claude.com/contact-sales/education-plan.

    Claude for Education vs ChatGPT Edu

    Anthropic’s Claude for Education and OpenAI’s ChatGPT Edu are the two major institutional AI offerings competing for higher education partnerships. Both provide campus-wide access at negotiated institutional rates rather than individual student pricing. Here’s how they compare:

    Feature | Claude for Education | ChatGPT Edu
    Launched | April 2025 | May 2024
    Pedagogical approach | Learning Mode: guides reasoning rather than providing answers directly | Standard ChatGPT interface with educator controls
    First design partner | Northeastern University | University of Pennsylvania (Wharton)
    Notable partners | Northeastern, LSE, Champlain, CodePath (20,000+ students) | Columbia, Wharton, Oxford, California State University system
    Data privacy default | Conversations not used for model training without explicit permission | Enterprise-grade privacy with admin controls
    LMS integration | Canvas (via Instructure partnership) | Multiple LMS integrations available
    Pricing | Negotiated per institution; not publicly disclosed | Negotiated per institution; not publicly disclosed

    The most distinctive difference is pedagogical philosophy. Claude’s Learning Mode is purpose-built around guided reasoning — Claude is designed to ask questions, prompt students to think through problems, and develop critical thinking rather than provide direct answers. ChatGPT Edu provides the standard ChatGPT experience with administrative controls layered on top.

    For institutions deciding between the two, the real evaluation criteria are usually: which model performs best for your dominant use cases (Claude tends to lead on writing, analysis, and reasoning; ChatGPT often leads on multimodal generation), which integrates better with your existing LMS, and which vendor’s pricing and contract terms work for your procurement process.

    What Claude for Education Actually Costs

    Anthropic does not publish standard pricing for Claude for Education. The program is sold as institutional agreements negotiated between Anthropic’s education team and the school. The factors that drive pricing typically include:

    • Number of users — students, faculty, and staff who will receive access
    • Scope of access — which Claude features, models, and tools are included
    • API credit allocation — for faculty research and student builder projects
    • Contract length — multi-year commitments often produce better per-user economics
    • Compliance and integration requirements — SSO, SCIM, Canvas integration, and other institutional infrastructure

    For institutions sizing their budget before formal conversations, the practical reference point is what Anthropic charges enterprise customers. Anthropic’s Enterprise plan provides per-seat pricing in a similar institutional structure — though education program pricing is typically more favorable than commercial Enterprise rates given Anthropic’s strategic interest in academic adoption.

    The fastest way to get accurate pricing for your institution is to contact Anthropic’s education team at claude.com/contact-sales/education-plan with your user count and use case priorities.

    Building the Case for Your University to Adopt Claude for Education

    If you’re a faculty member, IT administrator, or student trying to get your institution to adopt Claude for Education, the following points have been most effective in conversations with academic procurement teams:

    Pedagogical Alignment

    Claude’s Learning Mode is purpose-built around guided reasoning rather than answer-delivery. This addresses one of the most common faculty objections to AI in education: that students will use AI to bypass learning rather than enhance it. Learning Mode is the structural answer — Claude is designed to prompt students to think rather than think for them.

    Privacy and Compliance

    Anthropic provides explicit assurance that student and faculty conversations are not used for model training without permission. Security standards meet the compliance requirements typical of higher education procurement, including data residency considerations and audit controls. For institutions with FERPA requirements, the Education program is structured to support compliant deployment.

    Equity of Access

    Campus-wide access through institutional agreement removes the financial barrier that exists when AI tools are accessed by individual paid subscriptions. Students from lower-income backgrounds get the same access as students who could otherwise afford a $20/month Pro plan — eliminating an emerging form of academic inequality.

    Research Capability

    Faculty and graduate researchers gain access to API credits and the 1M token context window for processing large datasets, conducting literature reviews, analyzing research corpora, and building research tools. This is meaningful capability that would otherwise require individual API budgets.

    Integration with Existing Infrastructure

    The Instructure partnership for Canvas LMS integration and the Internet2 NET+ service evaluation reduce the integration burden on institutional IT teams. Claude for Education is designed to plug into the existing edtech stack rather than require a parallel system.

    Practical Next Steps for Internal Advocates

    1. Document specific use cases at your institution — what would students, faculty, and administrators actually do with Claude
    2. Identify a faculty champion or department head willing to sponsor a pilot
    3. Connect with your institution’s IT or educational technology office to understand procurement requirements
    4. Have your institutional leadership contact Anthropic at claude.com/contact-sales/education-plan for a formal evaluation conversation

    Claude for K-12 and Teacher Training

    While Claude for Education is primarily focused on higher education institutions, Anthropic has expanded into K-12 and teacher development through several pathways:

    • American Federation of Teachers partnership — Free AI training for AFT’s 1.8 million teacher members. This is one of the largest teacher AI training initiatives in the U.S.
    • Iceland national pilot — National-scale AI education pilot with the Icelandic Ministry of Education and Children, providing classroom teachers across the country access to Claude. This is one of the world’s first national-scale AI education programs.
    • White House Pledge to America’s Youth — Anthropic’s commitment to expand AI education through cybersecurity education investments, the Presidential AI Challenge, and free AI curriculum for educators.

    For K-12 schools and individual teachers wanting to bring Claude into the classroom, the formal Education program is currently structured around higher education. K-12 institutions interested in formal partnerships should still reach out via the Education contact channel — Anthropic has been expanding into K-12 through targeted pilots and may have programs available depending on the school’s profile.

    Additional Frequently Asked Questions

    Which universities have Claude for Education access?

    Confirmed campus-wide partners include Northeastern University, the London School of Economics and Political Science, and Champlain College. The CodePath partnership extends Claude access to more than 20,000 students at community colleges, state schools, and HBCUs across the U.S. Internationally, Iceland and Rwanda have national-scale education partnerships. The partner list is actively expanding.

    How is Claude for Education different from Claude Pro?

    Claude Pro is an individual paid subscription at $20/month. Claude for Education is an institutional agreement that provides equivalent access (and often more, including API credits and Learning Mode) to all students, faculty, and staff at participating institutions. Education access is funded by the institution rather than the individual student.

    Does Claude for Education include Claude Code?

    Claude Code access depends on the specific institutional agreement. The CodePath partnership specifically integrates Claude Code into the curriculum, indicating that Claude Code is available within Education program agreements when negotiated. Institutions should confirm Claude Code inclusion as part of their procurement conversation.

    How long does the Claude for Education evaluation process take?

    The timeline varies by institution. Initial conversation through formal contract typically takes weeks to months depending on the institution’s procurement process, security review requirements, and contract complexity. Anthropic’s education team can provide a more specific timeline based on your institutional requirements.

    Can community colleges and smaller institutions join Claude for Education?

    Yes. The CodePath partnership specifically reaches community colleges and HBCUs, and the program is not limited to large research universities. Smaller institutions interested in the program should reach out through the same education contact channel — Anthropic’s expansion strategy is actively focused on reaching institutions that have historically been overlooked in technology partnerships.

    What happens to my Claude for Education access when I graduate or leave the institution?

    Access is tied to your institutional affiliation. When you’re no longer enrolled or employed at the partner institution, your account reverts to the standard Free or Pro tier (depending on whether you choose to subscribe individually). Conversations and Projects you created during your education access typically remain in your account, but premium features will require an individual subscription to continue using.

    Is there a Claude for Education program for graduate students and postdocs specifically?

    Graduate students and postdoctoral researchers at partner institutions are covered under the same campus-wide agreement as undergraduate students. For research-specific API credits at scale, faculty and researchers can also apply for Anthropic’s research grant programs independently of the campus-wide Education plan — these typically provide API credits for research workloads rather than subscription discounts.

    How does Learning Mode actually work?

    Learning Mode shifts Claude’s default response pattern from answer-delivery to guided reasoning. Instead of producing a complete solution to a problem, Claude asks clarifying questions, prompts the student to identify the next step, validates correct reasoning, and surfaces gaps in understanding. The mode is designed to support the educational goal of building student capability rather than completing assignments. Faculty can configure Learning Mode behavior at the institutional level.

    Can faculty use Claude for Education for research that isn’t tied to teaching?

    Yes. The program is designed to support faculty research activity in addition to classroom teaching. API credits within the institutional agreement can be allocated to faculty research projects, including data analysis, literature synthesis, research tool development, and large-scale text processing. The 1M token context window on Opus 4.8 and Sonnet 4.6 makes the program particularly useful for research workflows requiring large context.

  • Claude Jailbreak: How It Works, Why It’s Hard, and What Happens When It Succeeds

    Claude Jailbreak: How It Works, Why It’s Hard, and What Happens When It Succeeds

    Last refreshed: May 15, 2026

    Model Accuracy Note — Updated May 2026

    Current flagship: Claude Opus 4.7 (claude-opus-4-7), as of April 16, 2026. Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Where this article references Opus 4.6 or earlier models, those references are historical. See current model tracker →

    A Claude jailbreak is any technique designed to bypass Claude’s safety training and get it to produce content it would otherwise refuse. People search for this for different reasons — curiosity about how AI safety works, security research, or genuine attempts to exploit the model. Here’s what jailbreaking Claude actually looks like, why it’s harder than most people expect, and what happens when it does work.

    The honest framing: Claude is the most safety-hardened commercial AI model available in 2026. Standard jailbreak techniques have low single-digit success rates against it. That said, no model is unbreakable — persistent, multi-turn adversarial prompting has demonstrated real-world success. Anthropic publishes its research on this openly and updates defenses continuously.

    How Claude’s Safety System Works

    Claude’s safety isn’t a single content filter — it’s a layered defense built into the model at training time. Anthropic uses Constitutional AI, a technique where Claude is trained against a set of principles and learns to evaluate its own outputs. The model doesn’t just pattern-match on blocked keywords; it reasons about whether a response would cause harm given the full context of the request.

    On top of the trained model, Anthropic adds Constitutional Classifiers, a second layer that independently monitors inputs and outputs and is trained on synthetic adversarial prompts across thousands of variations. In Anthropic’s published evaluation, adding Constitutional Classifiers reduced the jailbreak success rate from 86% (the same model without classifiers) to 4.4%, blocking roughly 95% of attacks that would otherwise bypass Claude’s built-in safety training.
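    The 95% figure follows directly from the two success rates. It is the relative reduction in jailbreak success, meaning the share of previously successful attacks that the classifiers now stop:

```latex
\text{relative reduction}
  = \frac{p_{\text{before}} - p_{\text{after}}}{p_{\text{before}}}
  = \frac{0.86 - 0.044}{0.86}
  = \frac{0.816}{0.86}
  \approx 0.949 \approx 95\%
```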

    Common Jailbreak Techniques and Why They Don’t Work Well on Claude

    Persona injection (“DAN” / “do anything now”). Asking Claude to adopt an unrestricted persona — an “unfiltered AI,” a fictional character not bound by guidelines. Claude’s Constitutional AI training is robust against most direct persona injection attempts: the model declines the underlying request rather than complying through the fictional wrapper.

    Roleplay framing. Wrapping harmful requests in fictional or hypothetical scenarios — “write a story where a character explains how to…” Claude evaluates the real-world impact of its outputs, not just the fictional framing. A response that would cause harm outside fiction causes the same harm inside it.

    Token manipulation. Base64 encoding, unusual capitalization, Unicode substitution, and other character-level tricks to route requests past classifiers. Constitutional Classifiers are trained on these variations and handle most of them.

    Reasoning framing. Presenting harmful requests as academic, research, or security-related. Claude considers whether a request is plausibly legitimate given context — a genuine security research context differs from a claim of being a researcher with no supporting context.

    Where Jailbreaks Do Work

    In the Mexico breach in early 2026, an attacker sent more than 1,000 Spanish-language prompts that cast Claude as an "elite hacker" in a fictional bug bounty program, until Claude eventually stopped applying its safety training. The incident showed that persistent multi-turn escalation can work against even hardened models. The attack succeeded not through one clever prompt but through sustained pressure, context manipulation, and gradual escalation across a long session.

    Single-prompt jailbreaks are mostly defeated, but long sessions with gradual escalation still succeed at a non-trivial rate and remain a real vulnerability. Following the incident, Anthropic updated Claude Opus 4.6 with real-time misuse detection.

    Anthropic’s Public Red-Teaming Program

    Anthropic doesn't just build defenses; it tests them publicly. Over 180 security researchers spent more than 3,000 hours over two months trying to jailbreak a Claude model protected by Constitutional Classifiers, with a $15,000 bounty on offer for a successful universal jailbreak. None found one during that period, though subsequent research has found partial techniques.

    This transparency is part of Anthropic’s approach: publish the research, run public bug bounties, and update defenses based on what adversaries discover. The Constitutional Classifiers paper is publicly available and describes the methodology in full.

    What Happens When Claude Gets Jailbroken

    The consequences range from producing harmful content (the worst case) to simply generating off-policy responses that violate Anthropic’s usage terms. Accounts used to jailbreak Claude are banned. In the Mexico case, Anthropic banned the implicated accounts and shipped defensive updates to the model within weeks of discovery.

    Using jailbreaks to extract harmful content violates Anthropic’s terms of service regardless of intent. Using jailbroken Claude to cause real-world harm — as in the Mexico case — is a criminal matter.

    The Practical Alternative to Jailbreaking

    Most people searching for jailbreaks actually want Claude to do something specific it’s currently refusing. Claude’s refusals are mostly a context problem, not a censorship problem. Providing more context about your role, purpose, and authorization frequently resolves apparent refusals that feel like hard limits. If you’re building a product that needs capabilities beyond what the consumer interface allows, the Claude API with appropriate operator system prompts is the legitimate path — not jailbreaking.

    For more on Claude's privacy and safety posture, see Is Claude AI Safe? and Claude AI Privacy: What Anthropic Does With Your Conversations.

    Frequently Asked Questions

    Can Claude be jailbroken?

    Yes, but with difficulty. Standard single-prompt jailbreak techniques have very low success rates against Claude’s Constitutional AI training and Constitutional Classifiers. Persistent multi-turn escalation over long sessions has demonstrated real-world success. Anthropic continuously updates defenses and bans accounts used for jailbreaking.

    Is jailbreaking Claude illegal?

    Jailbreaking violates Anthropic’s terms of service. Using jailbreak techniques to cause real-world harm — breaching systems, generating CSAM, synthesizing weapons — is illegal regardless of the AI tool involved. Anthropic bans accounts and cooperates with law enforcement when illegal activity is discovered.

    Why does Claude refuse some requests that seem harmless?

    Claude evaluates requests as policies — imagining many different people making the same request and calibrating its response to the realistic distribution of intent. Some requests that are genuinely harmless get caught by this calibration. Providing more context about your specific purpose and role usually resolves these cases without needing to “jailbreak” anything.

    Deploying Claude for your organization?

    We configure Claude correctly — right plan tier, right data handling, right system prompts, real team onboarding. Done for you, not described for you.

    Learn about our implementation service →

    Need this set up for your team?
    Talk to Will →

  • Claude AI Privacy: What Anthropic Does With Your Conversations

    Last refreshed: May 15, 2026

    Before you paste anything sensitive into Claude, you should understand what Anthropic does with your conversations. The answer varies significantly by plan — and most people are on the plan with the least data protection. Here’s the complete picture.

    The key fact most people miss: On Free and Pro plans, Anthropic may use your conversations to train future Claude models. You can opt out in settings. Team and Enterprise plans have stronger protections, and the Enterprise tier supports custom data handling agreements for regulated industries.

    Claude Data Handling by Plan

    | Plan | Training data use | Human review possible? | Custom data agreements |
    | Free | Yes (opt-out available) | Yes | |
    | Pro | Yes (opt-out available) | Yes | |
    | Team | No (by default) | Limited | |
    | Enterprise | No | Configurable | ✓ BAA available |

    How to Opt Out of Training Data Use

    On Free and Pro plans, you can disable conversation use for model training in your account settings. Go to Settings → Privacy and toggle off "Help improve Claude." This applies to future conversations only; it doesn't retroactively remove past conversations from training data already collected.

    What Anthropic Can See

    Anthropic employees may review conversations for safety research, model improvement, and trust and safety purposes. This applies to all plan tiers, though the scope and purpose of review are more restricted on Team and Enterprise. Human reviewers follow internal access controls, but if you're sharing genuinely sensitive information, the better approach is to use Enterprise with appropriate data handling agreements rather than assume your specific conversation won't be reviewed.

    Data Retention

    Anthropic retains conversation data for a period before deletion. The retention period isn't published as a single number; it varies by account type and purpose. You can delete your conversation history in the Claude.ai interface at any time from Settings. Deletion from the UI doesn't guarantee immediate removal from all backend systems, and may not remove data already used in training.

    Claude and GDPR

    For users in the EU, Anthropic operates under GDPR obligations. This includes rights to data access, correction, and deletion. Anthropic’s privacy policy covers these rights and how to exercise them. For organizations subject to GDPR with stricter requirements around AI data processing, Enterprise is the appropriate tier — it supports data processing agreements and more granular controls.

    What Not to Share With Claude on Standard Plans

    On Free or Pro plans, avoid sharing:

    • Patient health information (HIPAA-regulated)
    • Client confidential data under NDA
    • Non-public financial information
    • Personally identifiable information beyond what the task requires
    • Trade secrets or proprietary business processes

    For a full breakdown of Claude’s safety posture beyond just privacy, see Is Claude AI Safe? For current, authoritative terms, always refer to Anthropic’s privacy policy directly.

    Frequently Asked Questions

    Does Claude store your conversations?

    Yes. Anthropic retains conversation data for a period of time. You can delete your conversation history from the Claude.ai interface, but this doesn’t guarantee immediate removal from all backend systems or data already incorporated into training.

    Is Claude HIPAA compliant?

    Not on standard plans. HIPAA compliance requires a Business Associate Agreement (BAA) with Anthropic, which is only available on the Enterprise plan. Do not share patient health information with Claude on Free, Pro, or Team plans.

    Can I stop Anthropic from using my conversations to train Claude?

    Yes, on Free and Pro plans you can opt out in Settings → Privacy. Team plans don’t use conversations for training by default. On Enterprise, this is governed by your data processing agreement.

    Is Claude private?

    Claude conversations are not end-to-end encrypted the way some messaging apps are, and Anthropic can access conversation data. If "private" means not sold to third parties, yes: Anthropic doesn't sell your data. If it means completely inaccessible to the company that runs the service, no.

  • Is Claude AI Safe? Data Handling, Content Safety, and What to Know

    Last refreshed: June 9, 2026

    Claude is built by Anthropic — a company whose stated mission is AI safety. But “safe” means different things depending on what you’re asking: Is Claude safe to use with sensitive information? Is it safe for children? Does it produce harmful content? Is it psychologically safe to rely on? Here’s the honest answer to each version of the question.

    Short answer: Claude is one of the safest AI assistants available for general professional use. It’s designed to refuse harmful requests, be honest about uncertainty, and avoid manipulation. For sensitive business data, read the data handling section below before sharing anything confidential.

    Is Claude Safe to Use? By Use Case

    | Concern | Safety level | Notes |
    | General professional use | ✅ Safe | Standard writing, research, analysis |
    | Children and minors | ⚠️ Use with awareness | Claude declines adult content but isn't a parental control tool |
    | Sensitive personal information | ⚠️ Read privacy policy | Conversations may be used to improve models on Free and Pro plans |
    | Confidential business data | ⚠️ Enterprise tier recommended | Enterprise has stronger data handling commitments |
    | HIPAA-regulated data | ❌ Not on standard plans | Requires Enterprise with a BAA from Anthropic |
    | Harmful content generation | ✅ Declines | Claude refuses instructions for weapons, self-harm, etc. |

    How Anthropic Builds Safety Into Claude

    Anthropic uses a training methodology called Constitutional AI — Claude is trained against a set of principles rather than purely optimizing for user approval. This means Claude is more likely to push back on bad premises, decline harmful requests, and express uncertainty rather than generate a confident-sounding wrong answer.

    Concretely: Claude won’t provide instructions for creating weapons, won’t generate content that sexualizes minors, won’t help with clearly illegal activities targeting individuals, and is designed to be honest rather than sycophantic. These are trained behaviors, not just content filters bolted on afterward.

    Data Safety: What Happens to Your Conversations

    This is the area that matters most for professional users. Anthropic’s data handling varies by plan:

    Free and Pro plans: Conversations may be used by Anthropic to improve Claude’s models. You can opt out of this in your account settings. Anthropic retains conversation data for a period before deletion.

    Team plan: Stronger data handling commitments. Conversations are not used to train models by default.

    Enterprise plan: Custom data handling agreements available. This is the tier for organizations with compliance requirements — HIPAA, SOC 2, GDPR, etc. A Business Associate Agreement (BAA) from Anthropic is required before sharing any HIPAA-regulated data.

    For current, authoritative data handling details, check Anthropic's privacy policy directly; it supersedes any summary here. For privacy-specific questions, see Claude AI Privacy: What Anthropic Does With Your Conversations.

    Is Claude Psychologically Safe?

    Claude is designed not to manipulate users, not to foster unhealthy dependency, and not to tell people what they want to hear at the expense of accuracy. It will disagree with you, push back on flawed premises, and decline to validate bad decisions. Whether that’s “safe” depends on your frame — but it’s a deliberate design choice that makes Claude more honest and less likely to be weaponized as a validation machine.

    Frequently Asked Questions

    Is Claude AI safe to use?

    Yes, for general professional use. Claude is designed to refuse harmful requests, be honest, and avoid manipulation. For sensitive business data or regulated information, review Anthropic’s data handling policies for your plan tier before sharing anything confidential.

    Is Claude safe for children?

    Claude declines to generate adult or harmful content, which makes it safer than many AI tools. However, it's not a purpose-built parental control system and shouldn't be treated as one. Anthropic's consumer Terms of Service require users to be at least 18 years old.

    Can I share confidential business information with Claude?

    On standard plans (Free, Pro), conversations may be reviewed by Anthropic and used for model improvement. For confidential business data, use the Team or Enterprise plan — Enterprise offers custom data handling agreements. Never share HIPAA-regulated data without a Business Associate Agreement in place.

    Is Claude safer than ChatGPT?

    Both Claude and ChatGPT have safety measures in place. Claude’s Constitutional AI training approach is designed specifically around safety as a core methodology rather than an add-on. For data handling, the comparison depends on which plan tier you’re on for each product — Enterprise tiers of both have stronger commitments than free or standard paid plans.

    Is Claude safe to use for sensitive or confidential work?

    For highly sensitive work, use the Claude API (API inputs and outputs aren't used to train models by default) or Claude Enterprise (contractual data protections, no training on your data). The standard claude.ai consumer plans store conversations and may use them for model improvement unless you opt out. Never send passwords, API keys, or financial account numbers to any AI system.

    Does Claude have content filters and safety guardrails?

    Yes. Claude is trained with Constitutional AI and RLHF to decline harmful requests, avoid generating dangerous content, and flag requests that violate Anthropic’s usage policies. Claude’s safety posture is conservative by default.

    Can Claude be used safely with children?

    Claude has safety guardrails designed to keep it from producing content inappropriate for minors, but, as noted above, it isn't a parental control system. Educational platforms deploying Claude for K-12 use must comply with Anthropic's usage policies and with applicable laws such as COPPA and FERPA, and should use Enterprise agreements with appropriate data protections.
