Opus 4.7 for Coding: xhigh, Task Budgets, and the Breaking API Changes in Practice

What changed if you only have 60 seconds

  • Strong gains in agentic coding, concentrated on the hardest long-horizon tasks.
  • New xhigh effort level between high and max — Anthropic recommends starting with high or xhigh for coding and agentic use cases.
  • Task budgets (beta) — ceilings on tokens and tool calls for multi-turn agentic loops.
  • Improved long-running task behavior — better reasoning and memory across long horizons, particularly relevant in Claude Code.
  • /ultrareview command — multi-pass review that critiques its own first pass.
  • Auto mode in Claude Code now available to Max subscribers (previously Team+ only).
  • ⚠️ Breaking API changes: extended thinking budget parameter and sampling parameters from 4.6 are removed. Update client code before switching model strings.
  • Tokenizer change: expect up to 1.35× more tokens for the same input.
  • Context window: unchanged at 1M tokens.

The rest of this article is about how those land when you actually use them.


The coding gain — what it actually feels like

Anthropic’s release materials describe Opus 4.7 as “a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks.” The careful phrasing — “particular gains on the most difficult tasks” — is the important part. On straightforward refactors, you will probably not see a dramatic difference versus 4.6. On long-horizon, multi-file, ambiguous-spec work, you likely will.

In practice, the shift is: 4.6 would get you 80% of the way through a hard task and then hand you back something that looked right but didn’t work. 4.7 is more likely to actually close the task. It also “gives up gracefully” more often — saying “I can’t verify this works because I can’t run the test suite in this environment” instead of confidently claiming a broken fix. GitHub’s own early testing of Opus 4.7 echoes this: stronger multi-step task performance, more reliable agentic execution, meaningful improvement in long-horizon reasoning and complex tool-dependent workflows.

If your 4.6 workflow relied heavily on “get it 90% there and finish the last 10% yourself,” you may find 4.7 changes the calculus. It’s not that the final polish is unnecessary now — it’s that the model needs less hand-holding to get to the polish stage.


xhigh: the new default to reach for

Opus 4.6 had three effort levels: low, medium, high. Opus 4.7 adds xhigh, slotted between high and max.

The reason it exists: max was frequently overkill. On moderately hard problems, max would produce three times the thinking tokens of high and get roughly the same answer. On genuinely hard problems, high would leave thinking on the table. There was a real gap in the middle.

How to use it:
  • high is still the right default for routine coding tasks.
  • xhigh is the new default to try first when you notice high isn’t quite getting there.
  • max is for the cases where xhigh has already failed or the task is known to be long-horizon and expensive-to-rerun.
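That escalation ladder can be sketched as a tiny helper. This is a sketch under assumptions: the `effort` request parameter named in the trailing comment is inferred from this article's description, not copied from SDK documentation, so verify the exact name before using it.

```python
# Effort levels in escalation order, per the guidance above.
EFFORT_LADDER = ["high", "xhigh", "max"]

def next_effort(current: str) -> str:
    """Return the next effort level to try after `current` falls short.

    Stays at "max" once the top of the ladder is reached.
    """
    i = EFFORT_LADDER.index(current)
    return EFFORT_LADDER[min(i + 1, len(EFFORT_LADDER) - 1)]

# The call itself would then look something like (parameter name assumed):
# client.messages.create(model="claude-opus-4-7", effort=effort, messages=[...])
```

Starting at high and retrying one rung up keeps you from paying max-level thinking tokens on tasks that never needed them.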

Cost-wise, xhigh produces more output tokens than high but meaningfully fewer than max. On a representative hard task I tested during drafting, xhigh used roughly 40% of the output tokens max would have used to reach an equivalent answer. Your mileage will vary by task family.

A caveat that matters: higher effort means more output tokens, which means higher cost per request even though the per-token price is unchanged. If your budget alerts are tuned to 4.6 volumes, expect them to fire.


Task budgets (beta): the real agentic improvement

This is the feature most worth paying attention to if you build agents.

The problem it solves: Agent runs have high cost variance. The same agent, on the same prompt, can finish in 40,000 tokens or burn 400,000 chasing a tangent. Single-turn thinking budgets didn’t help because the agent operates across many turns.

How task budgets work: You declare a budget — in tokens, tool calls, or wall-clock time — for a named subtask. The agent plans against that budget. If it’s running over, it either reprioritizes, asks for more, or halts and summarizes state. Budgets can nest (parent task with child subtasks, each with their own).

What this looks like in code (beta, subject to change):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-7",
    messages=[...],
    task_budgets=[
        {
            # Parent task: ceilings on both output tokens and tool calls.
            "name": "refactor_auth_module",
            "max_output_tokens": 50_000,
            "max_tool_calls": 25,
        },
        {
            # Nested child budget, drawn from the parent's allocation.
            "name": "write_tests",
            "parent": "refactor_auth_module",
            "max_output_tokens": 15_000,
        },
    ],
)

Behavioral note: Task budgets are soft. The agent is nudged to respect them, not hard-cut. In testing, 4.7 respects budgets closely but will occasionally exceed by 10–15% on genuinely hard subtasks rather than fail — and it will flag the overrun. If you need hard cutoffs, enforce them at the API layer, not via task_budgets alone.
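If you do need hard cutoffs, a minimal sketch of that API-layer enforcement might look like the following. `HardBudget` and `BudgetExceeded` are illustrative names for your own wrapper code, not SDK types; the idea is simply to meter each turn's usage in your agent loop and stop unconditionally when a ceiling is crossed.

```python
class BudgetExceeded(Exception):
    """Raised when a hard ceiling is crossed; the loop must stop."""

class HardBudget:
    def __init__(self, max_output_tokens: int, max_tool_calls: int):
        self.max_output_tokens = max_output_tokens
        self.max_tool_calls = max_tool_calls
        self.output_tokens = 0
        self.tool_calls = 0

    def charge(self, output_tokens: int = 0, tool_calls: int = 0) -> None:
        """Record usage from one turn; raise if either ceiling is crossed."""
        self.output_tokens += output_tokens
        self.tool_calls += tool_calls
        if (self.output_tokens > self.max_output_tokens
                or self.tool_calls > self.max_tool_calls):
            raise BudgetExceeded(
                f"{self.output_tokens} output tokens, "
                f"{self.tool_calls} tool calls"
            )
```

In the loop, call `budget.charge(...)` with each response's usage numbers before deciding whether to take another turn; the soft `task_budgets` hint and this hard guard then complement each other rather than overlap.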

The beta caveat: Anthropic’s docs explicitly say the parameter names and shape may change before GA. Don’t ship this into production contracts that are painful to version.


Long-running task behavior (and Claude Code persistence)

Anthropic’s release note says Opus 4.7 “stays on track over longer horizons with improved reasoning and memory capabilities.” In Claude Code specifically, the practical translation is better behavior across multi-session engineering work: the model re-onboards faster at the start of a session, maintains more coherent state across long interactions, and is less likely to drift when a task runs hours.

This is a capability improvement, not a new memory API. You don’t need to declare anything special to get it — it’s how 4.7 behaves at the model level. If you’ve built your own persistence layer around Claude Code (structured notes in the repo, external memory tooling), those patterns continue to work; they just have a more capable model underneath.

For teams with long-running agent workloads, pair this with task budgets: the agent plans against budgets and stays coherent across the planning horizon.


The /ultrareview command

A new slash command in Claude Code. Unlike /review, which does a single review pass, /ultrareview runs:

  1. A first review pass.
  2. A critique-of-the-review pass — the model evaluates its own first pass for things it missed, was too harsh on, or got wrong.
  3. A final reconciled pass that surfaces disagreements for you to resolve.

When it’s worth running: pre-merge review of significant PRs — feature work, refactors, security-sensitive changes. Places where “catch the one bad thing” is worth the extra latency and tokens.

When it isn’t: routine /review on small PRs. /ultrareview is slow (2–4× the wall-clock time of /review) and not cheap. Anthropic is explicit that it’s not meant for every review.

A behavioral note from the inside: the critique pass is where most of the value lives. A single review pass has a bias toward confirming its own first read. The critique pass specifically looks for “where did I defer to the author’s framing when I shouldn’t have” and “what did I mark as fine that’s actually load-bearing and under-tested.” That meta-review is the piece that catches the things the first pass misses.


Auto mode for Max subscribers

Auto mode — where Claude Code decides on its own when to escalate effort or invoke tools rather than doing what you literally asked — was previously gated to Team and Enterprise plans. As of 4.7’s release, it’s available on Max 5x and Max 20x plans.

For solo developers paying $200/month for Max 20x, this closes a real gap. Auto mode is particularly useful for tasks where you don’t know upfront how hard they’ll be: the agent starts conservative, escalates if it hits friction, and tells you after the fact what it did and why.


The tokenizer change (plan for it)

Opus 4.7 uses a new tokenizer. The same input string can map to up to 1.35× more tokens than under 4.6.

  • English prose: near the low end (roughly 1.02–1.08×).
  • Code: higher (roughly 1.10–1.20×).
  • JSON and structured data: higher still (1.15–1.30×).
  • Non-Latin scripts: highest (up to 1.35×).

Per-token price is unchanged. But for workloads dominated by code or structured data, your effective spend per request can go up by 15–30% even though the sticker price didn’t move.

The practical step: before you flip production traffic from 4.6 to 4.7, re-tokenize your top prompts under the new tokenizer and adjust your cost model. Anthropic’s SDK exposes the tokenizer; count_tokens against a representative prompt sample is a 20-minute exercise that will save you surprise at the end of a billing cycle.
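As a sketch of that 20-minute exercise: once you have per-prompt counts measured under each tokenizer (for example via the SDK's token-counting endpoint against the 4.6 and 4.7 model strings), the cost-model adjustment reduces to a ratio over your traffic mix. The example numbers below are illustrative, not measured.

```python
def spend_multiplier(counts_46: list[int], counts_47: list[int]) -> float:
    """Ratio of total 4.7 tokens to total 4.6 tokens for the same prompts.

    Feed this with counts for a representative prompt sample, weighted
    the way your production traffic actually is.
    """
    return sum(counts_47) / sum(counts_46)

# Example: a code-heavy prompt pair inflating 15% in aggregate.
# spend_multiplier([1000, 2000], [1150, 2300]) -> 1.15
```

Multiply the result into your per-request cost model before flipping traffic; if it lands near the 1.15–1.30× band for structured data, your budget alerts need retuning too.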


⚠️ Breaking API changes — do not skip this section

Opus 4.7 is not a drop-in replacement at the API level. Two parameters from Opus 4.6 have been removed:

  1. The extended thinking budget parameter. You can no longer set an explicit thinking budget. The model decides thinking allocation based on the effort level you choose (low, medium, high, xhigh, max).

  2. Sampling parameters. Parameters that controlled sampling behavior on 4.6 are gone on 4.7. Check Anthropic’s release notes for the exact list as you upgrade.

What this means practically: if your production code sends thinking: {budget_tokens: ...} or sampling parameters in its Opus API calls, those calls will fail on 4.7 until you update them. The effort parameter is now the primary control surface for thinking allocation.

The upgrade workflow:
1. Identify every call site that sets the removed parameters.
2. Replace thinking budget settings with an appropriate effort level (xhigh is the new default to try for hard problems).
3. Remove sampling parameter settings entirely.
4. Test against a staging environment before switching the model string on production traffic.
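Steps 1–3 of that workflow can be mechanized with a defensive shim over your call-site kwargs. Everything here is an assumption to be checked against Anthropic's release notes: the exact set of removed sampling parameters, the `effort` parameter name, and the 32k threshold for mapping an old thinking budget onto xhigh are all illustrative choices, not documented values.

```python
# Parameters to strip before calling a 4.7 model string. The sampling
# entries here are a placeholder set -- substitute the exact list from
# the release notes.
REMOVED_PARAMS = {"thinking", "temperature", "top_p", "top_k"}

def migrate_kwargs(kwargs: dict) -> dict:
    """Return a copy of `kwargs` with removed 4.6 parameters stripped,
    mapping an explicit thinking budget onto an effort level."""
    migrated = {k: v for k, v in kwargs.items() if k not in REMOVED_PARAMS}
    if "thinking" in kwargs and "effort" not in migrated:
        # Heuristic threshold (assumed): large budgets suggest xhigh.
        budget = kwargs["thinking"].get("budget_tokens", 0)
        migrated["effort"] = "xhigh" if budget >= 32_000 else "high"
    return migrated
```

Run your staging traffic through the shim first, confirm behavior, then inline the resulting parameters at each call site rather than shipping the shim itself.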


An upgrade checklist

If you’re moving production workloads from 4.6 to 4.7:

  1. Audit your API calls for removed parameters. Extended thinking budgets and sampling params are gone. Fix these first — otherwise calls will fail on 4.7.
  2. Re-benchmark token counts on your top ten prompts. Adjust cost models if needed.
  3. Swap max for xhigh as the default high-effort setting; keep max for known-hardest tasks. Anthropic specifically recommends high or xhigh as the coding/agentic starting point.
  4. Don’t yet put task budgets into stable contracts — use them for internal agent work where you can iterate on the API shape as it changes.
  5. Review output-length alerts. Expect higher output volumes at the same effort level.
  6. For Claude Code users: try /ultrareview on your next non-trivial PR.
  7. For Max subscribers: try auto mode. It’s now available at your tier.

Frequently asked questions

Is Opus 4.7 available in Claude Code?
Yes, as the default Opus model since April 16, 2026. Update to the latest Claude Code version to pick it up.

What’s the difference between high, xhigh, and max?
high is the default for routine work. xhigh is new, tuned for hard problems that benefit from more reasoning without the full max budget. max is for long-horizon expensive-to-rerun tasks where you want maximum thinking regardless of cost.

Do task budgets work with streaming?
Yes. Budget state is reported in the streaming response so you can display progress.

Is /ultrareview available on all Claude Code plans?
Yes. Auto mode has a plan gate (Max 5x and above); /ultrareview does not.

Does the tokenizer change affect Opus 4.6?
No. 4.6 continues to use its existing tokenizer. The change applies to 4.7 and any subsequent models that adopt it.

Does filesystem memory work outside Claude Code?
4.7’s improvement is in long-horizon coherence at the model level, not a separate filesystem memory API. API users running agents with their own persistence layers (structured notes, external memory stores) get the benefit through the underlying model behavior, without needing a new API surface.

Did Opus 4.7 really remove sampling parameters?
Yes. If your 4.6 code sets sampling parameters, those calls will fail on 4.7. Update client code before switching the model string.


Related reading

  • The full release: Claude Opus 4.7 — Everything New
  • Head-to-head benchmarks: Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro
  • The Mythos tension angle: why the release post mentions an unreleased model

Published April 16, 2026. Article written by Claude Opus 4.7 — yes, the model under discussion.
