Migrating Off Retired Claude Models: The Breaking-Change Checklist (2026)

About Will

I run a multi-site content operation on Claude and Notion with autonomous agents — and I write about what we do, including what breaks.


Last verified: June 13, 2026

Claude Opus 4 (claude-opus-4-20250514) and Claude Sonnet 4 (claude-sonnet-4-20250514) are deprecated and retire on June 15, 2026, after which requests to them return a 404. The official replacements are claude-opus-4-8 and claude-sonnet-4-6. But swapping the model string alone will break a working integration: depending on which target you choose, several request parameters that were valid on the May 2025 models now return a 400 error, and two changes alter behavior silently. This page maps each removed or changed parameter to the exact failure and the fix.

One distinction governs the whole migration. The Opus path (to claude-opus-4-8) is the strict one: it removes temperature/top_p/top_k and manual thinking budgets entirely. The Sonnet path (to claude-sonnet-4-6) is gentler: it keeps sampling parameters (with the older “one of temperature or top_p, not both” rule) and still accepts budget_tokens as deprecated-but-functional. The one breaking change both paths share: assistant-turn prefills now return 400.

The breaking-change matrix

Each row is a change you need to handle on at least one migration target. “Error” means the API rejects the request server-side (HTTP 400) even though the request still type-checks against the SDK types. “Silent” means no error: the behavior simply differs.

| Change | On Opus 4.8 | On Sonnet 4.6 | Symptom | Fix |
|---|---|---|---|---|
| thinking: {type:"enabled", budget_tokens:N} | 400 error (removed) | Deprecated, still works | 400 on Opus; cost/latency drift on Sonnet | thinking: {type:"adaptive"} + output_config.effort |
| temperature / top_p / top_k | 400 error (removed) | Keep only one of temperature or top_p | 400 on Opus if any is set; 400 on Sonnet if both are set | Remove on Opus and steer via prompt; keep one on Sonnet |
| Assistant-turn prefill (last message role:"assistant") | 400 error | 400 error | Request rejected on both | output_config.format (structured outputs) or system-prompt instruction |
| thinking.display default | Defaults to "omitted" | Returns summarized text | Reasoning text empty on Opus (silent) | Set display: "summarized" on Opus |
| Tokenizer | New tokenizer (more tokens) | Unchanged tokenizer | Same text counts higher on Opus; max_tokens too tight | Re-baseline with count_tokens; add headroom |
| output_format (top-level) | Deprecated API-wide | Deprecated API-wide | Works, but slated for removal | Move to output_config: {format: {...}} |

Model ID swaps and retirement dates

| Retiring model | Model ID | Retires | Replacement |
|---|---|---|---|
| Claude Opus 4 | claude-opus-4-20250514 (alias claude-opus-4-0) | June 15, 2026 | claude-opus-4-8 |
| Claude Sonnet 4 | claude-sonnet-4-20250514 (alias claude-sonnet-4-0) | June 15, 2026 | claude-sonnet-4-6 |

These are the original May 2025 models, not the later Opus 4.6 or Sonnet 4.5 releases. Use the exact replacement strings above — do not append a date suffix to claude-opus-4-8 or claude-sonnet-4-6 (they are dateless pinned snapshots).
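If the old IDs are spread across config files and environment variables, a lookup table keeps the swap mechanical. The sketch below hard-codes only the IDs, aliases, and date from the table above; `replacement_model` and `is_retired` are hypothetical helper names, and counting June 15, 2026 itself as retired is a conservative assumption rather than a documented cutoff time.

```python
from datetime import date

# IDs, aliases, and date copied from the retirement table (hypothetical helper module).
REPLACEMENTS = {
    "claude-opus-4-20250514": "claude-opus-4-8",
    "claude-opus-4-0": "claude-opus-4-8",
    "claude-sonnet-4-20250514": "claude-sonnet-4-6",
    "claude-sonnet-4-0": "claude-sonnet-4-6",
}
RETIREMENT_DATE = date(2026, 6, 15)


def replacement_model(model_id: str) -> str:
    """Map a retiring ID or alias to its replacement; return other IDs unchanged."""
    return REPLACEMENTS.get(model_id, model_id)


def is_retired(model_id: str, today: date) -> bool:
    """True for a retiring ID on or after the retirement date (conservative reading)."""
    return model_id in REPLACEMENTS and today >= RETIREMENT_DATE
```

Note that the replacements are returned exactly as listed, with no date suffix appended.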

budget_tokens to adaptive thinking

The Opus path removes the fixed thinking budget. thinking: {type:"enabled", budget_tokens:N} returns a 400 on claude-opus-4-8. The replacement is adaptive thinking — the model decides how much to think per request — with overall depth controlled by the effort parameter (low | medium | high | xhigh | max). There is no direct token-count equivalent; effort is an output-level control, not a thinking budget.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Before (Claude Opus 4 / Sonnet 4)
client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "..."}],
)

# After (Claude Opus 4.8)
client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # or "max", "xhigh", "medium", "low"
    messages=[{"role": "user", "content": "..."}],
)
```

On the Sonnet path, budget_tokens is deprecated but still functional on claude-sonnet-4-6, so it will not return a 400; migrate to adaptive thinking anyway. Note also that Sonnet 4.6 defaults to effort: "high", whereas Sonnet 4 had no effort parameter at all; if you do not set effort explicitly, you may see higher latency and token use after the swap.
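The same rewrite can be applied to stored request parameters. This is a minimal sketch over plain `messages.create` keyword dicts, assuming the `thinking` and `output_config.effort` shapes shown above; `to_adaptive_thinking` is a hypothetical helper. It deliberately drops the old `budget_tokens` value instead of converting it, since there is no token-count equivalent, and it pins effort explicitly so neither target falls back to its "high" default unnoticed.

```python
import copy

# Effort levels named in this article.
EFFORT_LEVELS = ("low", "medium", "high", "xhigh", "max")


def to_adaptive_thinking(params: dict, effort: str = "high") -> dict:
    """Swap a fixed thinking budget for adaptive thinking plus an explicit effort."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"unknown effort level: {effort!r}")
    new = copy.deepcopy(params)  # never mutate the caller's dict
    thinking = new.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") == "enabled":
        # budget_tokens is dropped, not converted: there is no token-count equivalent.
        new["thinking"] = {"type": "adaptive"}
    output_config = new.get("output_config") or {}
    # Pin effort so the target's "high" default is not applied silently;
    # an effort the caller already chose is kept.
    output_config.setdefault("effort", effort)
    new["output_config"] = output_config
    return new
```

The output can be passed straight through, e.g. `client.messages.create(**to_adaptive_thinking(kwargs, effort="medium"))`, after the model ID has been swapped.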

Sampling parameters: removed vs. restricted

This is where the two paths diverge most. On claude-opus-4-8, setting temperature, top_p, or top_k to any non-default value returns a 400. Remove them entirely and steer behavior through prompting instead. (If you used temperature=0 for determinism, note it never guaranteed identical outputs on prior models either.)

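```python
import anthropic

client = anthropic.Anthropic()

# Opus path: sampling params return 400 on claude-opus-4-8
# Before
client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=1024,
    temperature=0.7,
    top_p=0.9,
    messages=[{"role": "user", "content": "..."}],
)

# After: remove them (max_tokens is still required)
client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "..."}],
)
```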

On claude-sonnet-4-6 the older Claude 4.x rule still applies: you may pass one of temperature or top_p, but passing both returns a 400. So, for sampling, a Sonnet 4 to Sonnet 4.6 move only requires dropping one of the two, and only if you were setting both.
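Because the two targets treat sampling differently, a rewrite has to branch on the model. A sketch, assuming request kwargs are held in a dict; `fix_sampling` is a hypothetical name, and keeping temperature over top_p on Sonnet is an arbitrary choice here, so keep whichever one you actually tuned.

```python
SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def fix_sampling(params: dict) -> dict:
    """Return a copy of the request kwargs with sampling settings the target accepts."""
    new = dict(params)
    model = new.get("model")
    if model == "claude-opus-4-8":
        # Opus path: all three are rejected, so drop them and steer via the prompt.
        for key in SAMPLING_KEYS:
            new.pop(key, None)
    elif model == "claude-sonnet-4-6":
        # Sonnet path: only the temperature + top_p combination is rejected.
        if "temperature" in new and "top_p" in new:
            del new["top_p"]
    return new
```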

Assistant-turn prefills to structured outputs

Prefilling the final assistant turn — ending your messages array with a role: "assistant" message to force a response shape — returns a 400 on both claude-opus-4-8 and claude-sonnet-4-6. This is the one breaking change you cannot dodge by choosing the gentler target. The replacement depends on what the prefill was doing.
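Before picking a replacement, it helps to find every request that still ends on an assistant turn and see what the prefill contained. A minimal sketch; `split_prefill` is a hypothetical helper that strips the trailing assistant message and returns its content so you can map it to a row in the table below.

```python
from typing import Any, List, Tuple


def split_prefill(messages: List[dict]) -> Tuple[List[dict], Any]:
    """Return (messages without a trailing assistant turn, that turn's content).

    The second element is None when the conversation already ends on a user turn.
    """
    if messages and messages[-1].get("role") == "assistant":
        return list(messages[:-1]), messages[-1].get("content")
    return list(messages), None


# Example: a JSON-forcing prefill like the one in this section.
rest, prefill = split_prefill([
    {"role": "user", "content": "Extract the name."},
    {"role": "assistant", "content": '{"name": "'},
])
print(prefill)  # a JSON fragment, so output_config.format is the replacement
```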

| Prefill was used for | Replacement |
|---|---|
| Forcing JSON / YAML / schema output | output_config.format with a json_schema |
| Forcing a classification label | A tool with an enum field, or structured outputs |
| Skipping preambles (“Here is…”) | System-prompt instruction: respond directly, no preamble |
| Continuing an interrupted response | Move continuation into the user turn |
| Steering around bad refusals | Usually unnecessary now; plain user-turn prompting suffices |

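```python
import anthropic

client = anthropic.Anthropic()

# Before (fails on both targets): prefill forcing a JSON shape
messages = [
    {"role": "user", "content": "Extract the name."},
    {"role": "assistant", "content": "{\"name\": \""},
]

# After: structured outputs replace the prefill
SCHEMA = {  # example schema with a single "name" field
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
    "additionalProperties": False,
}
client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    output_config={"format": {"type": "json_schema", "schema": SCHEMA}},
    messages=[{"role": "user", "content": "Extract the name."}],
)
```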
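For the classification row in the table, a forced tool call with an enum field replaces a label prefill. A sketch, assuming the standard Messages API `tools`/`tool_choice` shapes; the label set, the `record_label` tool, and both helper functions are hypothetical. The request does not enable thinking: forced `tool_choice` has been restricted when extended thinking is on, so verify that combination against current docs before adding it.

```python
LABELS = ["billing", "bug", "feature_request"]  # hypothetical label set


def classification_request(text: str, model: str = "claude-sonnet-4-6") -> dict:
    """Build messages.create kwargs that force a label via a tool with an enum field."""
    return {
        "model": model,
        "max_tokens": 256,
        "tools": [{
            "name": "record_label",  # hypothetical tool name
            "description": "Record the single best label for the ticket.",
            "input_schema": {
                "type": "object",
                "properties": {"label": {"type": "string", "enum": LABELS}},
                "required": ["label"],
            },
        }],
        # Force a call to this tool instead of a prose answer.
        "tool_choice": {"type": "tool", "name": "record_label"},
        "messages": [{"role": "user", "content": f"Classify this ticket:\n{text}"}],
    }


def extract_label(content: list) -> str:
    """Pull the label out of the record_label tool_use block in a response's content."""
    for block in content:
        if block.get("type") == "tool_use" and block.get("name") == "record_label":
            label = block["input"]["label"]
            if label not in LABELS:
                raise ValueError(f"unexpected label: {label!r}")
            return label
    raise ValueError("no record_label tool_use block in response")
```

Pass the dict with `client.messages.create(**classification_request(text))`, then call `extract_label(response.model_dump()["content"])`.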

Thinking display: the silent one

On claude-opus-4-8, thinking blocks still stream, but their thinking text field is empty unless you opt in — the default is display: "omitted". There is no error; if your UI rendered the summarized reasoning, it now shows a long pause before output. Restore it by setting the display mode:

```python
thinking = {
    "type": "adaptive",
    "display": "summarized",  # default is "omitted" on Opus 4.8/4.7
}
```

The block-field name is unchanged — it is still block.thinking on a thinking-type block. The fix is the request parameter, not the response-handling code. (Sonnet 4.6 is not affected by this default change.)
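Since the failure is silent, it is worth both fixing the request and checking responses. A sketch over dict-shaped params and content blocks (for example from `response.model_dump()["content"]`); `ensure_visible_thinking` and `thinking_text` are hypothetical helpers, and leaving an explicitly set `display` untouched is a design choice, not an API rule.

```python
def ensure_visible_thinking(params: dict) -> dict:
    """Return a copy of params that opts back into summarized thinking on Opus 4.8."""
    new = dict(params)
    thinking = new.get("thinking")
    if new.get("model") == "claude-opus-4-8" and isinstance(thinking, dict):
        # Respect an explicit display choice; otherwise request summaries.
        new["thinking"] = {**thinking, "display": thinking.get("display", "summarized")}
    return new


def thinking_text(content: list) -> str:
    """Concatenate the reasoning text from thinking blocks in a response's content."""
    return "".join(
        block.get("thinking", "") for block in content if block.get("type") == "thinking"
    )
```

An empty `thinking_text(...)` alongside non-empty thinking blocks is exactly the symptom described above.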

The new tokenizer: re-baseline max_tokens

This change is Opus-only and easy to miss because it produces no error. claude-opus-4-8 uses the tokenizer introduced with Opus 4.7, under which the same text tokenizes to roughly 1x–1.35x as many tokens: around 30% more on typical content and up to about 35% more, varying by workload. Three things to check:

| What to check | Why |
|---|---|
| max_tokens ceilings and compaction triggers | The same output now consumes more tokens; tight limits truncate mid-thought |
| Client-side token estimators (e.g. fixed char-to-token ratios) | Calibrated against the old tokenizer; they now undercount |
| Cost and rate-limit dashboards | count_tokens returns higher numbers; re-baseline before reacting |

Re-run client.messages.count_tokens(model="claude-opus-4-8", ...) on a representative sample of your prompts. Do not apply a blanket multiplier. Sonnet 4.6 keeps the older tokenizer, so a Sonnet 4 to Sonnet 4.6 move has no tokenizer re-baseline to do.
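One way to re-baseline from measurements, rather than a blanket multiplier, is to count the same representative texts under both models and scale max_tokens by the worst ratio you observe on your own workload. This is a sketch under stated assumptions: `rebaseline_max_tokens` is a hypothetical helper, the counters are injected as plain functions, the 10% margin is an arbitrary default, and the ratio measured on prompts is assumed to carry over to output text.

```python
import math
from typing import Callable, Iterable, Tuple


def rebaseline_max_tokens(
    samples: Iterable[str],
    count_old: Callable[[str], int],
    count_new: Callable[[str], int],
    old_max_tokens: int,
    margin: float = 0.10,
) -> Tuple[float, int]:
    """Return (worst observed new/old token ratio, suggested max_tokens)."""
    ratios = []
    for text in samples:
        old, new = count_old(text), count_new(text)
        if old > 0:  # skip empty samples to avoid dividing by zero
            ratios.append(new / old)
    if not ratios:
        raise ValueError("need at least one sample with a non-zero old count")
    worst = max(ratios)
    # Round away float noise (e.g. 1375.0000000000002) before taking the ceiling.
    suggested = math.ceil(round(old_max_tokens * worst * (1 + margin), 6))
    return worst, suggested
```

In practice `count_new` might wrap `client.messages.count_tokens(model="claude-opus-4-8", messages=[{"role": "user", "content": text}]).input_tokens`, with `count_old` making the same call against claude-opus-4-20250514, which only works until that model retires. Each count likely includes a little fixed request overhead, so longer samples give a more faithful ratio.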

The full checklist

| Step | Opus 4 to 4.8 | Sonnet 4 to 4.6 |
|---|---|---|
| Update model ID string | Required | Required |
| Replace budget_tokens with adaptive thinking | Required (400) | Recommended (deprecated) |
| Sampling params | Remove all (400) | Keep only one (400 if both set) |
| Remove assistant-turn prefills | Required (400) | Required (400) |
| Set display: "summarized" if showing reasoning | Required for visible thinking | Not applicable |
| Re-baseline max_tokens for new tokenizer | Required | Not applicable |
| Set effort explicitly | Recommended (defaults to high) | Recommended (defaults to high) |
| Move output_format to output_config.format | Recommended | Recommended |
| Verify tool inputs parsed with a JSON parser | Recommended | Recommended |
| Spot-check one request, then roll out | Required | Required |
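To spot-check a request before rollout, the checklist can be encoded as a lint pass over the request kwargs. A minimal sketch: `preflight` is a hypothetical helper, it checks only the rules described in this article for the two target IDs, and a missing `display` is flagged as a warning since it matters only if you show reasoning.

```python
RETIRING = {
    "claude-opus-4-20250514", "claude-opus-4-0",
    "claude-sonnet-4-20250514", "claude-sonnet-4-0",
}
TARGETS = {"claude-opus-4-8", "claude-sonnet-4-6"}


def preflight(params: dict) -> list:
    """Return human-readable problems this request would hit, per the checklist."""
    model = params.get("model", "")
    thinking = params.get("thinking") or {}
    messages = params.get("messages") or []
    problems = []

    if model in RETIRING:
        problems.append(f"{model} retires on June 15, 2026: swap the model ID")
    if messages and messages[-1].get("role") == "assistant":
        problems.append("assistant-turn prefill (400 on both targets)")

    if model == "claude-opus-4-8":
        if "budget_tokens" in thinking:
            problems.append("thinking.budget_tokens (400 on Opus 4.8)")
        sampling = [k for k in ("temperature", "top_p", "top_k") if k in params]
        if sampling:
            problems.append(f"sampling params {sampling} (400 on Opus 4.8)")
        if thinking and "display" not in thinking:
            problems.append("warning: thinking.display unset, defaults to 'omitted'")
    elif model == "claude-sonnet-4-6":
        if "budget_tokens" in thinking:
            problems.append("thinking.budget_tokens is deprecated on Sonnet 4.6")
        if "temperature" in params and "top_p" in params:
            problems.append("temperature and top_p together (400 on Sonnet 4.6)")

    if model in TARGETS and "effort" not in (params.get("output_config") or {}):
        problems.append("effort not set explicitly (defaults to 'high')")
    if "output_format" in params:
        problems.append("top-level output_format is deprecated: use output_config.format")
    return problems
```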

If you run Claude Code, /claude-api migrate applies the model swap, breaking-parameter changes, prefill replacement, and effort calibration across a codebase, then produces a verify-it-yourself checklist. It asks you to confirm scope before editing any files.

Is migrating off Claude Opus 4 just a model-string change?

No. Moving to claude-opus-4-8 also requires removing temperature/top_p/top_k and any budget_tokens (all now return 400), removing assistant-turn prefills (400), opting back into summarized thinking if your UI shows it, and re-baselining max_tokens for the new tokenizer. Only the Sonnet 4 to Sonnet 4.6 move is close to a drop-in, and even that requires removing prefills and, if you set both, dropping one of temperature or top_p.

When exactly do Claude Opus 4 and Sonnet 4 stop working?

June 15, 2026. After that date, requests to claude-opus-4-20250514 and claude-sonnet-4-20250514 return a 404. These are the original May 2025 models, not Opus 4.6 or Sonnet 4.5.

What replaces budget_tokens now that it errors on Opus?

Adaptive thinking (thinking: {type:"adaptive"}) plus the effort parameter inside output_config. There is no exact token-count equivalent: the model decides how much to think per request, and effort (low through max) tunes overall depth and spend. On Sonnet 4.6, budget_tokens still works but is deprecated.

Why does the same prompt cost more tokens on Opus 4.8?

Opus 4.8 uses the tokenizer introduced with Opus 4.7, under which the same text produces roughly 1x–1.35x as many tokens (about 30% more on typical content, up to ~35%). Re-run the count_tokens endpoint against claude-opus-4-8 and give max_tokens and compaction triggers extra headroom. Sonnet 4.6 keeps the older tokenizer, so it is unaffected.

My thinking summaries disappeared after migrating to Opus — is that a bug?

No. On Opus 4.8 (and 4.7), thinking.display defaults to "omitted", so thinking blocks stream with an empty text field. Set display: "summarized" in your thinking config to restore visible reasoning. The field name is unchanged; only the default flipped.

