Migrating Off Retired Claude Models: The Breaking-Change Checklist (2026)

Last verified: June 13, 2026

Claude Opus 4 (claude-opus-4-20250514) and Claude Sonnet 4 (claude-sonnet-4-20250514) are deprecated and retire on June 15, 2026, after which requests to them return a 404. The official replacements are claude-opus-4-8 and claude-sonnet-4-6. But swapping the model string alone will break a working integration: depending on which target you choose, several request parameters that were valid on the May 2025 models now return a 400 error, and two changes alter behavior silently. This page maps each removed or changed parameter to the exact failure and the fix.

One distinction governs the whole migration. The Opus path (to claude-opus-4-8) is the strict one: it removes temperature/top_p/top_k and manual thinking budgets entirely. The Sonnet path (to claude-sonnet-4-6) is gentler: it keeps sampling parameters (with the older “one of temperature or top_p, not both” rule) and still accepts budget_tokens as deprecated-but-functional. The one rule both paths share: assistant-turn prefills now return 400.

The breaking-change matrix

Each row is a change that breaks on at least one migration target. “Error” means the API rejects the request server-side (HTTP 400) even though the SDK request type still type-checks. “Silent” means no error — the behavior simply differs.

Change	On Opus 4.8	On Sonnet 4.6	Symptom	Fix
`thinking: {type:"enabled", budget_tokens:N}`	400 error (removed)	Deprecated, still works	400 on Opus; cost/latency drift on Sonnet	`thinking: {type:"adaptive"}` + `output_config.effort`
`temperature` / `top_p` / `top_k`	400 error (removed)	Keep only one of `temperature` or `top_p`	400 on Opus if any is set to a non-default value; 400 on Sonnet if both are set	Remove on Opus; steer via prompt. Keep one on Sonnet
Assistant-turn prefill (last message `role:"assistant"`)	400 error	400 error	Request rejected on both	`output_config.format` (structured outputs) or system-prompt instruction
`thinking.display` default	Defaults to `"omitted"`	Returns summarized text	Reasoning text empty on Opus (silent)	Set `display: "summarized"` on Opus
Tokenizer	New tokenizer (more tokens)	Unchanged tokenizer	Same text counts higher on Opus; `max_tokens` too tight	Re-baseline with `count_tokens`; add headroom
`output_format` (top-level)	Deprecated API-wide	Deprecated API-wide	Works, but slated for removal	Move to `output_config: {format: {...}}`
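
The matrix lends itself to a pre-flight check. The sketch below is a hypothetical helper (`find_breaking_params` is not part of any SDK) that inspects a plain request dict and reports which rows of the matrix apply to a given target. Its rules are transcribed from this article rather than read from the API, so treat it as a starting point and not as a validator.

```python
from typing import Any

OPUS_TARGET = "claude-opus-4-8"
SONNET_TARGET = "claude-sonnet-4-6"


def find_breaking_params(request: dict[str, Any], target: str) -> list[str]:
    """Return findings for a request dict, following the matrix above.

    Hypothetical helper: the rules mirror this article, not the live API.
    """
    findings: list[str] = []
    thinking = request.get("thinking") or {}
    sampling = [k for k in ("temperature", "top_p", "top_k") if k in request]
    messages = request.get("messages") or []

    if "budget_tokens" in thinking:
        if target == OPUS_TARGET:
            findings.append("error: budget_tokens is removed; use adaptive thinking")
        else:
            findings.append("warning: budget_tokens is deprecated")

    # Conservative: flags any sampling key that is present on the Opus path,
    # even one that happens to carry a default value.
    if target == OPUS_TARGET and sampling:
        findings.append(f"error: remove sampling params {sampling}")
    if target == SONNET_TARGET and {"temperature", "top_p"} <= set(sampling):
        findings.append("error: keep only one of temperature or top_p")

    if messages and messages[-1].get("role") == "assistant":
        findings.append("error: assistant-turn prefill is rejected")

    if target == OPUS_TARGET and thinking and "display" not in thinking:
        findings.append("silent: thinking.display defaults to 'omitted'")

    if "output_format" in request:
        findings.append("warning: move output_format to output_config.format")

    return findings
```

Running it over logged request payloads before the cutover gives a rough count of how many call sites each row affects.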

Model ID swaps and retirement dates

Retiring model	Model ID	Retires	Replacement
Claude Opus 4	`claude-opus-4-20250514` (alias `claude-opus-4-0`)	June 15, 2026	`claude-opus-4-8`
Claude Sonnet 4	`claude-sonnet-4-20250514` (alias `claude-sonnet-4-0`)	June 15, 2026	`claude-sonnet-4-6`

These are the original May 2025 models, not the later Opus 4.6 or Sonnet 4.5 releases. Use the exact replacement strings above — do not append a date suffix to claude-opus-4-8 or claude-sonnet-4-6 (they are dateless pinned snapshots).
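
If model IDs are scattered across configuration, a single lookup keeps the swap consistent. This is a minimal sketch using only the IDs and aliases from the table above; `replacement_model` is a hypothetical name, and any ID not listed is returned unchanged.

```python
# Retiring IDs and aliases mapped to the replacements named in the table.
REPLACEMENTS = {
    "claude-opus-4-20250514": "claude-opus-4-8",
    "claude-opus-4-0": "claude-opus-4-8",
    "claude-sonnet-4-20250514": "claude-sonnet-4-6",
    "claude-sonnet-4-0": "claude-sonnet-4-6",
}


def replacement_model(model_id: str) -> str:
    """Return the replacement for a retiring model ID, else the ID unchanged."""
    return REPLACEMENTS.get(model_id, model_id)
```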

budget_tokens to adaptive thinking

The Opus path removes the fixed thinking budget: thinking: {type:"enabled", budget_tokens:N} returns a 400 on claude-opus-4-8. The replacement is adaptive thinking, in which the model decides how much to think per request, with overall depth controlled by the effort parameter (low | medium | high | xhigh | max) inside output_config. There is no direct token-count equivalent: effort tunes overall depth and spend for the response rather than capping thinking at a fixed number of tokens.

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Before (Claude Opus 4 / Sonnet 4)
client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "..."}],
)

# After (Claude Opus 4.8)
client.messages.create(
    model="claude-opus-4-8",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "high"},  # or "max", "xhigh", "medium", "low"
    messages=[{"role": "user", "content": "..."}],
)

On the Sonnet path, budget_tokens is deprecated but still functional on claude-sonnet-4-6, so it will not 400 — but you should still migrate to adaptive thinking. Note also that Sonnet 4.6 defaults to effort: "high" where Sonnet 4 had no effort parameter at all; if you do not set it explicitly you may see higher latency and token use after the swap.
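
The same rewrite can be applied to request dicts programmatically. The sketch below assumes requests are built as plain dicts before being passed to the SDK; `to_adaptive_thinking` is a hypothetical helper. Because there is no token-count equivalent, it does not try to map a budget to an effort level: the caller chooses the effort explicitly.

```python
from typing import Any

VALID_EFFORTS = ("low", "medium", "high", "xhigh", "max")


def to_adaptive_thinking(request: dict[str, Any], effort: str) -> dict[str, Any]:
    """Return a copy of `request` with a manual thinking budget replaced.

    Hypothetical helper. `effort` is supplied by the caller; no
    budget-to-effort conversion is attempted.
    """
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}")

    migrated = dict(request)  # shallow copy; nested dicts are replaced, not mutated
    thinking = migrated.get("thinking")
    if not thinking or thinking.get("type") != "enabled":
        return migrated  # no manual budget to convert

    # Keep any other thinking keys (for example, display) and drop the budget.
    kept = {k: v for k, v in thinking.items() if k not in ("type", "budget_tokens")}
    migrated["thinking"] = {"type": "adaptive", **kept}
    migrated["output_config"] = {**migrated.get("output_config", {}), "effort": effort}
    return migrated
```

Setting the effort explicitly also sidesteps the default-effort change on the Sonnet path.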

Sampling parameters: removed vs. restricted

This is where the two paths diverge most. On claude-opus-4-8, setting temperature, top_p, or top_k to any non-default value returns a 400. Remove them entirely and steer behavior through prompting instead. (If you used temperature=0 for determinism, note it never guaranteed identical outputs on prior models either.)

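# Opus path: sampling params return 400 on claude-opus-4-8
# Before
client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=1024,  # required by the Messages API
    temperature=0.7,
    top_p=0.9,
    messages=[{"role": "user", "content": "..."}],
)

# After: remove them
client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": "..."}],
)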

On claude-sonnet-4-6 the older Claude 4.x rule still applies: you may pass one of temperature or top_p, but passing both returns a 400. So a Sonnet 4 to Sonnet 4.6 move only requires dropping one of the two if you were setting both.
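
Both rules fit in one small function. The sketch below is a hypothetical helper operating on plain request dicts. Preferring temperature over top_p on the Sonnet path is an arbitrary assumption made for illustration; keep whichever one your prompts were tuned against. It leaves top_k untouched on Sonnet because this article states no rule for it there.

```python
from typing import Any

SAMPLING_KEYS = ("temperature", "top_p", "top_k")


def fix_sampling_params(request: dict[str, Any], target: str) -> dict[str, Any]:
    """Return a copy of `request` with sampling params adjusted for `target`."""
    migrated = dict(request)
    if target == "claude-opus-4-8":
        # Opus path: remove every sampling parameter.
        for key in SAMPLING_KEYS:
            migrated.pop(key, None)
    elif target == "claude-sonnet-4-6":
        # Sonnet path: passing both is rejected, so drop one.
        # Assumption: temperature is the one to keep.
        if "temperature" in migrated and "top_p" in migrated:
            migrated.pop("top_p")
    return migrated
```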

Assistant-turn prefills to structured outputs

Prefilling the final assistant turn — ending your messages array with a role: "assistant" message to force a response shape — returns a 400 on both claude-opus-4-8 and claude-sonnet-4-6. This is the one breaking change you cannot dodge by choosing the gentler target. The replacement depends on what the prefill was doing.

Prefill was used for	Replacement
Forcing JSON / YAML / schema output	`output_config.format` with a `json_schema`
Forcing a classification label	A tool with an `enum` field, or structured outputs
Skipping preambles (“Here is…”)	System-prompt instruction: respond directly, no preamble
Continuing an interrupted response	Include the partial response in the user turn and ask the model to continue from it
Steering around bad refusals	Usually unnecessary now — plain user-turn prompting suffices

# Before (fails on both targets) — prefill forcing JSON shape
messages=[
    {"role": "user", "content": "Extract the name."},
    {"role": "assistant", "content": "{\"name\": \""},
]

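# After: structured outputs replace the prefill
# SCHEMA is an illustrative JSON Schema for this example.
SCHEMA = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
    "additionalProperties": False,
}
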
client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    output_config={"format": {"type": "json_schema", "schema": SCHEMA}},
    messages=[{"role": "user", "content": "Extract the name."}],
)
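
Before choosing a replacement from the table, it helps to find where prefills occur at all. A minimal sketch, assuming messages are plain dicts: the hypothetical `split_prefill` separates a trailing assistant message from the rest so the prefill text can be reviewed and mapped to a replacement by hand.

```python
from typing import Any, Optional


def split_prefill(
    messages: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], Optional[Any]]:
    """Return (messages without a trailing assistant turn, prefill content or None)."""
    if messages and messages[-1].get("role") == "assistant":
        return list(messages[:-1]), messages[-1].get("content")
    return list(messages), None
```

The helper only detects and removes the prefill. Whether the replacement is a schema, a tool with an enum field, or a system-prompt instruction still depends on what the prefill was doing.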

Thinking display: the silent one

On claude-opus-4-8, thinking blocks still stream, but the thinking field on each block is empty unless you opt in, because the default is display: "omitted". There is no error. If your UI rendered the summarized reasoning, users now see only a long pause before output. Restore it by setting the display mode:

thinking = {
    "type": "adaptive",
    "display": "summarized",  # default is "omitted" on Opus 4.8/4.7
}

The block-field name is unchanged — it is still block.thinking on a thinking-type block. The fix is the request parameter, not the response-handling code. (Sonnet 4.6 is not affected by this default change.)
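
Because the failure is silent, a small check on the response side can catch it in tests or logs. The sketch below assumes content blocks expose `type` and `thinking` attributes as described above. The `SimpleNamespace` objects are stand-ins for illustration, not real SDK types, and both function names are hypothetical.

```python
from types import SimpleNamespace
from typing import Any, Sequence


def thinking_texts(content: Sequence[Any]) -> list[str]:
    """Collect the non-empty reasoning text from thinking-type blocks."""
    return [
        block.thinking
        for block in content
        if getattr(block, "type", None) == "thinking" and getattr(block, "thinking", "")
    ]


def reasoning_was_omitted(content: Sequence[Any]) -> bool:
    """True if thinking blocks are present but every one has empty text."""
    blocks = [b for b in content if getattr(b, "type", None) == "thinking"]
    return bool(blocks) and not thinking_texts(blocks)


# Stand-in content lists shaped like the two cases described above.
omitted = [
    SimpleNamespace(type="thinking", thinking=""),
    SimpleNamespace(type="text", text="Answer."),
]
summarized = [
    SimpleNamespace(type="thinking", thinking="Compare both options."),
    SimpleNamespace(type="text", text="Answer."),
]
```

A `True` result from `reasoning_was_omitted` on a response from claude-opus-4-8 points at the request's display setting, not at the rendering code.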

The new tokenizer: re-baseline max_tokens

This change is Opus-only and easy to miss because it produces no error. claude-opus-4-8 uses the tokenizer introduced with Opus 4.7, under which the same text tokenizes to roughly 1x–1.35x as many tokens — up to about 35% more, around 30% on typical content, varying by workload. Three consequences:

What to check	Why
`max_tokens` ceilings and compaction triggers	The same output now consumes more tokens; tight limits truncate mid-thought
Client-side token estimators (e.g. fixed char-to-token ratios)	Calibrated against the old tokenizer; now undercount
Cost and rate-limit dashboards	`count_tokens` returns higher numbers; re-baseline before reacting

Re-run client.messages.count_tokens(model="claude-opus-4-8", ...) on a representative sample of your prompts. Do not apply a blanket multiplier. Sonnet 4.6 keeps the older tokenizer, so a Sonnet 4 to Sonnet 4.6 move has no tokenizer re-baseline to do.
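
Once you have counts for the same prompts under both models, the re-baseline is plain arithmetic. The sketch below is a hypothetical helper that scales an existing max_tokens ceiling by the worst ratio measured on your own sample, plus a headroom percentage. Two assumptions to note: the old-model counts must be collected before retirement, and count_tokens measures input tokens, so the ratio is only a proxy for how output length changes.

```python
import math
from fractions import Fraction
from typing import Sequence


def rebaseline_max_tokens(
    old_max_tokens: int,
    old_counts: Sequence[int],
    new_counts: Sequence[int],
    headroom_pct: int = 10,
) -> int:
    """Scale a max_tokens ceiling by the worst measured tokenizer ratio.

    old_counts and new_counts are token counts for the same prompts under
    the old and new models, in the same order. Exact fractions avoid
    floating-point rounding in the final ceiling.
    """
    if not old_counts or len(old_counts) != len(new_counts):
        raise ValueError("need two non-empty samples of equal length")
    if any(count <= 0 for count in old_counts):
        raise ValueError("old counts must be positive")

    worst_ratio = max(Fraction(new, old) for old, new in zip(old_counts, new_counts))
    scaled = old_max_tokens * worst_ratio * Fraction(100 + headroom_pct, 100)
    return math.ceil(scaled)
```

Using the worst measured ratio from your own sample keeps the adjustment tied to your workload instead of a blanket multiplier.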

The full checklist

Step	Opus 4 to 4.8	Sonnet 4 to 4.6
Update model ID string	Required	Required
Replace `budget_tokens` with adaptive thinking	Required (400)	Recommended (deprecated)
Sampling params	Remove all (400)	Keep only one (both 400)
Remove assistant-turn prefills	Required (400)	Required (400)
Set `display: "summarized"` if showing reasoning	Required for visible thinking	Not applicable
Re-baseline `max_tokens` for new tokenizer	Required	Not applicable
Set `effort` explicitly	Defaults to `high`	Defaults to `high`
Move `output_format` to `output_config.format`	Recommended	Recommended
Verify tool inputs parsed with a JSON parser	Recommended	Recommended
Spot-check one request, then roll out	Required	Required

If you run Claude Code, /claude-api migrate applies the model swap, breaking-parameter changes, prefill replacement, and effort calibration across a codebase, then produces a verify-it-yourself checklist. It asks you to confirm scope before editing any files.

Is migrating off Claude Opus 4 really not just a model-string change?

No. Moving to claude-opus-4-8 also requires removing temperature/top_p/top_k and any budget_tokens (all now return 400), removing assistant-turn prefills (400), opting back into summarized thinking if your UI shows it, and re-baselining max_tokens for the new tokenizer. Only the Sonnet 4 to Sonnet 4.6 move is close to a drop-in — and even that requires removing prefills.

When exactly do Claude Opus 4 and Sonnet 4 stop working?

June 15, 2026. After that date, requests to claude-opus-4-20250514 and claude-sonnet-4-20250514 return a 404. These are the original May 2025 models, not Opus 4.6 or Sonnet 4.5.

What replaces budget_tokens now that it errors on Opus?

Adaptive thinking (thinking: {type:"adaptive"}) plus the effort parameter inside output_config. There is no exact token-count equivalent: the model decides how much to think per request, and effort (low through max) tunes overall depth and spend. On Sonnet 4.6, budget_tokens still works but is deprecated.

Why does the same prompt cost more tokens on Opus 4.8?

Opus 4.8 uses the tokenizer introduced with Opus 4.7, under which the same text produces roughly 1x–1.35x as many tokens (about 30% more on typical content, up to ~35%). Re-run the count_tokens endpoint against claude-opus-4-8 and give max_tokens and compaction triggers extra headroom. Sonnet 4.6 keeps the older tokenizer, so it is unaffected.

My thinking summaries disappeared after migrating to Opus — is that a bug?

No. On Opus 4.8 (and 4.7), thinking.display defaults to "omitted", so thinking blocks stream with an empty text field. Set display: "summarized" in your thinking config to restore visible reasoning. The field name is unchanged; only the default flipped.
