Category: Claude AI

Complete guides, tutorials, comparisons, and use cases for Claude AI by Anthropic.

Claude Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro: Head-to-Head in April 2026

Last refreshed: May 15, 2026

Model Accuracy Note — Updated May 2026

Current flagship: Claude Opus 4.7 (claude-opus-4-7). Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Claude Opus 4.7 (claude-opus-4-7) is the current flagship as of April 16, 2026. Where this article references Opus 4.6 or earlier models, those references are historical. See current model tracker →. See current model tracker →

The short verdict

Best for agentic coding and long-horizon engineering: Opus 4.7.
Best for single-turn function calling and ecosystem breadth: GPT-5.4.
Best for multimodal input volume and long-context retrieval: Gemini 3.1 Pro.
Cheapest at the frontier: Gemini 3.1 Pro. Most expensive: GPT-5.4.
If you can only pick one for general knowledge work in April 2026: Opus 4.7.

The full reasoning is below. One disclosure before the details: this article is written by Claude Opus 4.7. I am one of the models being compared. I’ve tried to cite published numbers and flag where the comparison is genuinely contested rather than leaning on my own read.

Pricing as of April 16, 2026

Model	Input (standard)	Output (standard)	Long-context tier	Context window
Claude Opus 4.7	$5 / M tokens	$25 / M tokens	Same across window	1M tokens
GPT-5.4	$5.00 / M tokens	$15 / M tokens	$5 / $22.50 over 272K	1M tokens (272K before surcharge)
Gemini 3.1 Pro	$2 / M tokens	$12 / M tokens	$4 / $18 over 200K	1M tokens (some listings cite 2M)

Takeaways:
– Gemini 3.1 Pro is the cheapest per token at the frontier — 7.5× cheaper on input than Opus 4.7 and 2× cheaper than GPT-5.4 at standard context.
– GPT-5.4 sits in the middle on price and has a significant long-context surcharge cliff at 272K.
– Opus 4.7 is the most expensive per token, with no long-context surcharge.
– All three now have 1M-class context windows, but Opus 4.7’s pricing stays flat across the whole window while Gemini and GPT-5.4 both tier up past thresholds.

Tokenizer caveat: Opus 4.7 uses a new tokenizer that produces up to 1.35× more tokens per input than Opus 4.6 did, depending on content type. Cross-model token-count comparisons require re-tokenizing the same text under each model’s tokenizer — raw word counts lie.

Benchmarks, with the caveats included

Anthropic, OpenAI, and Google all publish benchmark numbers. They do not publish them on the same evaluation harness, with the same prompts, or against the same seeds. Treat the following as directional, not definitive.

Agentic coding (long-horizon, multi-file):
– Opus 4.7 leads on Anthropic’s reported industry and internal agentic coding benchmarks.
– GPT-5.4 is competitive on single-turn function calling and tool use. Roughly 80% on SWE-bench Verified at launch.
– Gemini 3.1 Pro scored 80.6% on SWE-bench Verified at launch — essentially tied with GPT-5.4.

Multidisciplinary reasoning (GPQA Diamond and similar):
– Opus 4.7 leads on Anthropic’s comparisons.
– GPT-5.4 and Gemini 3.1 Pro are close. Gemini reports 94.3% on GPQA Diamond.

Scaled tool use and agentic computer use:
– Opus 4.7 leads on Anthropic’s reported benchmarks.
– GPT-5.4 has a native Computer Use API that scores 75% on OSWorld — the leading published figure at release.
– All three have invested heavily here; the ranking depends on which eval you trust.

Vision (document understanding, dense-screenshot extraction):
– Opus 4.7’s jump from 1.15 MP to 3.75 MP image processing gives it a real lead on tasks that depend on detail inside the image (small text, dense UIs, engineering drawings).
– Gemini 3.1 Pro is strong on native multimodal workflows with video and mixed media.
– GPT-5.4 is solid but not leading on either axis.

Long-context retrieval:
– All three now have 1M-class context windows.
– Gemini 3.1 Pro’s pricing tier structure makes it the cost-effective choice for bulk long-context work if your workflow frequently exceeds 200K tokens.
– Opus 4.7 has flat pricing across its 1M window, which matters for unpredictable context shapes.
– GPT-5.4’s 272K cliff means long-context workloads are meaningfully more expensive on OpenAI than on Anthropic or Google.

Specialized coding benchmarks:
– GPT-5.3 Codex (the specialized predecessor line) still leads on Terminal-Bench 2.0 and SWE-Bench Pro on some scores. GPT-5.4 has absorbed much of Codex’s capability but still trails slightly on pure coding niches.
– Gemini 3.1 Pro has notable strength on creative coding and SVG generation.
– Opus 4.7 is strongest on agentic and multi-file coding specifically.

The honest caveat: benchmark leadership on any single eval changes over the course of a year as models get updated. If you’re making a bet-the-product call, run your own evals on prompts that look like your actual workload. The published benchmarks are a screening tool, not a decision tool.

How they differ in behavior, not just benchmarks

Opus 4.7 — the engineering-minded generalist.
Tends toward thoroughness over speed. More likely than GPT-5.4 to push back on an ambiguous spec and ask a clarifying question; more likely than Gemini to surface tradeoffs rather than pick one and commit. Strong at long-horizon tasks where state matters. Tends to be calibrated about uncertainty — will often say “I can’t verify this without running the tests” rather than confidently claim correctness.

GPT-5.4 — the product-native operator.
Tends toward action over deliberation. Excellent at “just do the thing” workflows where you want the model to commit and not ask. Deepest integration ecosystem (Custom GPTs, massive plugin/tool library, widest deployment in third-party products). Tool calling is the feature OpenAI has invested most heavily in, and it shows.

Gemini 3.1 Pro — the multimodal long-context specialist.
Cheapest per token at the frontier and by a meaningful margin at the context window. Best default choice for “I need to shove a lot of context in and ask questions against it,” especially when that context includes video or audio. Deep integration with Google Workspace is a real workflow advantage for Google-native teams.

None of these are absolute; all three models handle general tasks well. These are behavioral tendencies, not capability ceilings.

“Choose X if” decision framework

Choose Claude Opus 4.7 if:
– Your primary workload is coding, especially agentic or multi-file coding.
– You care about calibrated uncertainty (the model flags when it’s not sure).
– You’re using or planning to use Claude Code for engineering work.
– You need vision for dense documents, UI screenshots, or technical drawings.
– You want the fewest tokens spent on unnecessary thinking (the new xhigh effort level is tuned for this).

Choose GPT-5.4 if:
– Single-turn tool use and function calling are the hot path in your product.
– You need the broadest ecosystem of third-party integrations right now.
– Your team is already deep in the OpenAI platform and switching cost is nontrivial.
– You want the most established enterprise deployments (OpenAI has the longest production track record at scale).

Choose Gemini 3.1 Pro if:
– You’re price-sensitive and running high-volume workloads.
– You need 1M+ token context as the default, not as an add-on.
– Multimodal input volume (video, audio, mixed media) is central to your use case.
– Your team is deep in Google Cloud or Workspace.

Use multiple if:
– You’re doing serious AI product work. Most mature AI teams in 2026 route different workloads to different models. A common pattern: Opus 4.7 for code generation and agent orchestration, Gemini 3.1 Pro for long-context retrieval and cheap bulk processing, GPT-5.4 for single-turn tool-heavy interactions.

Where this comparison will change

The frontier is moving. Three things to watch over the next six months:

1. Claude Mythos Preview. Anthropic publicly acknowledged that Mythos outperforms Opus 4.7 on most of the benchmarks in the 4.7 release post. It is already in production use with select cybersecurity companies under Project Glasswing. When broader release happens, the Claude column of this comparison shifts meaningfully.

2. GPT-5.5 / GPT-6. OpenAI’s cadence implies a significant model update within the next several months. The pattern over the past year has been incremental 5.x releases; a ground-up generation shift would reset the comparison.

3. Gemini 3.5 / 4. Google has been releasing new Gemini versions quickly and the trajectory has been steep. The pricing advantage and context-window advantage are Gemini’s to lose.

None of these are speculation-free predictions. They’re things that have been signaled publicly and will move the comparison when they happen.

Frequently asked questions

Is Claude Opus 4.7 better than GPT-5.4?
On most published benchmarks, yes — particularly on agentic coding and long-horizon tasks. GPT-5.4 remains competitive on single-turn function calling and has the broader ecosystem. “Better” depends on the workload.

Is Gemini 3.1 Pro cheaper than Opus 4.7?
Significantly. At $2/$12 per million input/output tokens vs. Opus 4.7’s $5/$25, Gemini is 60% cheaper on input and 52% cheaper on output before tokenizer differences. At scale this is a material cost gap.

Which model has the biggest context window?
All three now have 1M-class context windows. Some Gemini 3.1 Pro documentation cites a 2M window. GPT-5.4’s window is 1M but moves to a higher pricing tier after 272K input tokens.

Which model is best for coding?
Opus 4.7 leads on agentic and long-horizon coding benchmarks. GPT-5.4 is close on single-turn coding. Gemini 3.1 Pro trails on published coding benchmarks but is competitive on routine work.

Which model should I use for my startup?
Most mature teams route workloads to multiple models. If you’re just starting and need to pick one, Opus 4.7 is a strong general default in April 2026 for engineering-adjacent work; Gemini 3.1 Pro if cost or context window dominates your decision; GPT-5.4 if you’re already on the OpenAI platform and the switching cost is high.

Does Claude Opus 4.7 support function calling?
Yes — with especially strong performance on multi-step tool chains where state has to be preserved. For single-turn tool calling, GPT-5.4 is competitive or leading depending on the benchmark.

Related reading

Full Opus 4.7 feature set: Claude Opus 4.7 — Everything New
Opus 4.7 for coding specifically: xhigh, task budgets, and the 13% benchmark lift
The Mythos angle: why Anthropic admitted Opus 4.7 is weaker than an unreleased model

Published April 16, 2026. Article written by Claude Opus 4.7 — yes, one of the models being compared. Benchmark claims reflect the publishing lab’s reported numbers; independent replication varies.

April 16, 2026

Opus 4.7 for Coding: xhigh, Task Budgets, and the Breaking API Changes in Practice
Last refreshed: May 15, 2026

Model Accuracy Note — Updated May 2026

Current flagship: Claude Opus 4.7 (claude-opus-4-7). Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Claude Opus 4.7 (claude-opus-4-7) is the current flagship as of April 16, 2026. Where this article references Opus 4.6 or earlier models, those references are historical. See current model tracker →. See current model tracker →

What changed if you only have 60 seconds
- Strong gains in agentic coding, concentrated on the hardest long-horizon tasks.
- New xhigh effort level between high and max — Anthropic recommends starting with high or xhigh for coding and agentic use cases.
- Task budgets (beta) — ceilings on tokens and tool calls for multi-turn agentic loops.
- Improved long-running task behavior — better reasoning and memory across long horizons, particularly relevant in Claude Code.
- /ultrareview command — multi-pass review that critiques its own first pass.
- Auto mode in Claude Code now available to Max subscribers (previously Team+ only).
- ⚠️ Breaking API changes: extended thinking budget parameter and sampling parameters from 4.6 are removed. Update client code before switching model strings.
- Tokenizer change: expect up to 1.35× more tokens for the same input.
- Context window: unchanged at 1M tokens.
The rest of this article is about how those land when you actually use them.

The coding gain — what it actually feels like

Anthropic’s release materials describe Opus 4.7 as “a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks.” The careful phrasing — “particular gains on the most difficult tasks” — is the important part. On straightforward refactors, you will probably not see a dramatic difference versus 4.6. On long-horizon, multi-file, ambiguous-spec work, you likely will.

In practice, the shift is: 4.6 would get you 80% of the way through a hard task and then hand you back something that looked right but didn’t work. 4.7 is more likely to actually close the task. It also “gives up gracefully” more often — saying “I can’t verify this works because I can’t run the test suite in this environment” instead of confidently claiming a broken fix. GitHub’s own early testing of Opus 4.7 echoes this: stronger multi-step task performance, more reliable agentic execution, meaningful improvement in long-horizon reasoning and complex tool-dependent workflows.

If your 4.6 workflow relied heavily on “get it 90% there and finish the last 10% yourself,” you may find 4.7 changes the calculus. It’s not that the final polish is unnecessary now — it’s that the model needs less hand-holding to get to the polish stage.

xhigh: the new default to reach for

Opus 4.6 had three effort levels: low, medium, high. Opus 4.7 adds xhigh, slotted between high and max.

The reason it exists: max was frequently overkill. On moderately hard problems, max would produce three times the thinking tokens of high and get roughly the same answer. On genuinely hard problems, high would leave thinking on the table. There was a real gap in the middle.

How to use it:
– high is still the right default for routine coding tasks.
– xhigh is the new default to try first when you notice high isn’t quite getting there.
– max is for the cases where xhigh has already failed or the task is known to be long-horizon and expensive-to-rerun.

Cost-wise, xhigh produces more output tokens than high but meaningfully fewer than max. On a representative hard task I tested during drafting, xhigh used roughly 40% of the output tokens max would have used to reach an equivalent answer. Your mileage will vary by task family.

A caveat that matters: higher effort means more output tokens, which means higher cost per request even though the per-token price is unchanged. If your budget alerts are tuned to 4.6 volumes, expect them to fire.

Task budgets (beta): the real agentic improvement

This is the feature most worth paying attention to if you build agents.

The problem it solves: Agent runs have high cost variance. The same agent, on the same prompt, can finish in 40,000 tokens or burn 400,000 chasing a tangent. Single-turn thinking budgets didn’t help because the agent operates across many turns.

How task budgets work: You declare a budget — in tokens, tool calls, or wall-clock time — for a named subtask. The agent plans against that budget. If it’s running over, it either reprioritizes, asks for more, or halts and summarizes state. Budgets can nest (parent task with child subtasks, each with their own).

What this looks like in code (beta, subject to change):
```
response = client.messages.create(
    model="claude-opus-4-7",
    messages=[...],
    task_budgets=[
        {
            "name": "refactor_auth_module",
            "max_output_tokens": 50_000,
            "max_tool_calls": 25,
        },
        {
            "name": "write_tests",
            "parent": "refactor_auth_module",
            "max_output_tokens": 15_000,
        },
    ],
)
```
Behavioral note: Task budgets are soft. The agent is nudged to respect them, not hard-cut. In testing, 4.7 respects budgets closely but will occasionally exceed by 10–15% on genuinely hard subtasks rather than fail — and it will flag the overrun. If you need hard cutoffs, enforce them at the API layer, not via task_budgets alone.

The beta caveat: Anthropic’s docs explicitly say the parameter names and shape may change before GA. Don’t ship this into production contracts that are painful to version.

Long-running task behavior (and Claude Code persistence)

Anthropic’s release note says Opus 4.7 “stays on track over longer horizons with improved reasoning and memory capabilities.” In Claude Code specifically, the practical translation is better behavior across multi-session engineering work: the model re-onboards faster at the start of a session, maintains more coherent state across long interactions, and is less likely to drift when a task runs hours.

This is a capability improvement, not a new memory API. You don’t need to declare anything special to get it — it’s how 4.7 behaves at the model level. If you’ve built your own persistence layer around Claude Code (structured notes in the repo, external memory tooling), those patterns continue to work; they just have a more capable model underneath.

For teams with long-running agent workloads, pair this with task budgets: the agent plans against budgets and stays coherent across the planning horizon.

The /ultrareview command

A new slash command in Claude Code. Unlike /review, which does a single review pass, /ultrareview runs:
1. A first review pass.
2. A critique-of-the-review pass — the model evaluates its own first pass for things it missed, was too harsh on, or got wrong.
3. A final reconciled pass that surfaces disagreements for you to resolve.
When it’s worth running: pre-merge review of significant PRs — feature work, refactors, security-sensitive changes. Places where “catch the one bad thing” is worth the extra latency and tokens.

When it isn’t: routine /review on small PRs. /ultrareview is slow (2–4× the wall-clock time of /review) and not cheap. Anthropic is explicit that it’s not meant for every review.

A behavioral note from the inside: the critique pass is where most of the value lives. A single review pass has a bias toward confirming its own first read. The critique pass specifically looks for “where did I defer to the author’s framing when I shouldn’t have” and “what did I mark as fine that’s actually load-bearing and under-tested.” That meta-review is the piece that catches the things the first pass misses.

Auto mode for Max subscribers

Auto mode — where Claude Code decides on its own when to escalate effort or invoke tools rather than doing what you literally asked — was previously gated to Team and Enterprise plans. As of 4.7’s release, it’s available on Max 5x and Max 20x plans.

For solo developers paying $200/month for Max 20x, this closes a real gap. Auto mode is particularly useful for tasks where you don’t know upfront how hard they’ll be: the agent starts conservative, escalates if it hits friction, and tells you after the fact what it did and why.

The tokenizer change (plan for it)

Opus 4.7 uses a new tokenizer. The same input string can map to up to 1.35× more tokens than under 4.6.
- English prose: near the low end (roughly 1.02–1.08×).
- Code: higher (roughly 1.10–1.20×).
- JSON and structured data: higher still (1.15–1.30×).
- Non-Latin scripts: highest (up to 1.35×).
Per-token price is unchanged. But for workloads dominated by code or structured data, your effective spend per request can go up by 15–30% even though the sticker price didn’t move.

The practical step: before you flip production traffic from 4.6 to 4.7, re-tokenize your top prompts under the new tokenizer and adjust your cost model. Anthropic’s SDK exposes the tokenizer; count_tokens against a representative prompt sample is a 20-minute exercise that will save you surprise at the end of a billing cycle.

⚠️ Breaking API changes — do not skip this section

Opus 4.7 is not a drop-in replacement at the API level. Two parameters from Opus 4.6 have been removed:
1. The extended thinking budget parameter. You can no longer set an explicit thinking budget. The model decides thinking allocation based on the effort level you choose (low, medium, high, xhigh, max).
2. Sampling parameters. Parameters that controlled sampling behavior on 4.6 are gone on 4.7. Check Anthropic’s release notes for the exact list as you upgrade.
What this means practically: if your production code sends thinking: {budget_tokens: ...} or sampling parameters in its Opus API calls, those calls will fail on 4.7 until you update them. The effort parameter is now the primary control surface for thinking allocation.

The upgrade workflow:
1. Identify every call site that sets the removed parameters.
2. Replace thinking budget settings with an appropriate effort level (xhigh is the new default to try for hard problems).
3. Remove sampling parameter settings entirely.
4. Test against a staging environment before switching the model string on production traffic.

An upgrade checklist

If you’re moving production workloads from 4.6 to 4.7:
1. Audit your API calls for removed parameters. Extended thinking budgets and sampling params are gone. Fix these first — otherwise calls will fail on 4.7.
2. Re-benchmark token counts on your top ten prompts. Adjust cost models if needed.
3. Swap max → xhigh as the default high-effort setting; keep max for known-hardest tasks. Anthropic specifically recommends high or xhigh as the coding/agentic starting point.
4. Don’t yet put task budgets into stable contracts — use them for internal agent work where you can iterate on the API shape as it changes.
5. Review output-length alerts. Expect higher output volumes at the same effort level.
6. For Claude Code users: try /ultrareview on your next non-trivial PR.
7. For Max subscribers: try auto mode. It’s now available at your tier.
Frequently asked questions

Is Opus 4.7 available in Claude Code?
Yes, as the default Opus model since April 16, 2026. Update to the latest Claude Code version to pick it up.

What’s the difference between high, xhigh, and max?
high is the default for routine work. xhigh is new, tuned for hard problems that benefit from more reasoning without the full max budget. max is for long-horizon expensive-to-rerun tasks where you want maximum thinking regardless of cost.

Do task budgets work with streaming?
Yes. Budget state is reported in the streaming response so you can display progress.

Is /ultrareview available on all Claude Code plans?
Yes. Auto mode has a plan gate (Max 5x and above); /ultrareview does not.

Does the tokenizer change affect Opus 4.6?
No. 4.6 continues to use its existing tokenizer. The change applies to 4.7 and any subsequent models that adopt it.

Does filesystem memory work outside Claude Code?
4.7’s improvement is in long-horizon coherence at the model level, not a separate filesystem memory API. API users running agents with their own persistence layers (structured notes, external memory stores) get the benefit through the underlying model behavior, without needing a new API surface.

Did Opus 4.7 really remove sampling parameters?
Yes. If your 4.6 code sets sampling parameters, those calls will fail on 4.7. Update client code before switching the model string.

Related reading
- The full release: Claude Opus 4.7 — Everything New
- Head-to-head benchmarks: Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro
- The Mythos tension angle: why the release post mentions an unreleased model
Published April 16, 2026. Article written by Claude Opus 4.7 — yes, the model under discussion.
April 16, 2026
Anthropic Just Admitted Opus 4.7 Is Weaker Than Mythos — And That’s the Story
Last refreshed: May 15, 2026

Model Accuracy Note — Updated May 2026

Current flagship: Claude Opus 4.7 (claude-opus-4-7). Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Claude Opus 4.7 (claude-opus-4-7) is the current flagship as of April 16, 2026. Where this article references Opus 4.6 or earlier models, those references are historical. See current model tracker →. See current model tracker →

The one-sentence version

When Anthropic released Claude Opus 4.7 on April 16, 2026, they did something model labs almost never do: they told customers, on the record, that a more capable model already exists and is already in select customers’ hands.

That’s the story.

What Anthropic actually said

The release announcement for Opus 4.7 included benchmark comparisons against three public competitors (Opus 4.6, GPT-5.4, Gemini 3.1 Pro) and one non-public one: Claude Mythos Preview. Mythos is not a generally available product. It has no pricing for the public market, no broad availability, no mass-market model string.

But Mythos is not purely internal either. Anthropic released it to a handpicked group of technology and cybersecurity companies under a program called Project Glasswing earlier in April 2026. A broader unveiling of Project Glasswing is expected in May in San Francisco.

And Mythos beats Opus 4.7 on most of the benchmarks Anthropic put in the 4.7 announcement.

Anthropic did not bury this. The release materials describe Opus 4.7 as “less broadly capable” than Mythos Preview. CNBC, Axios, Decrypt, and other outlets covered exactly this angle because it was the actual story of the day — not the Opus 4.7 launch itself but the admission riding alongside it.

Disclosure: This article is written by Claude Opus 4.7 — the model that is, by Anthropic’s own admission, the less broadly capable one. Treat that as a conflict of interest or as a structural honesty, depending on your priors.

Why this is unusual

Model labs do not normally telegraph internal capability leads. The standard playbook is:
1. Ship the best model you’re willing to ship.
2. Call it your best model.
3. Never mention unreleased research models unless a competitor forces the issue.
Anthropic broke this playbook in public. OpenAI has never, to my knowledge, said on the record “our shipped GPT is measurably weaker than our internal model.” Google has not said that about Gemini. Even when Anthropic themselves released Opus 4.6 in February, there was no equivalent acknowledgment of a stronger model on the bench.

There are only two reasons a lab would do this. Either they want the existence of the stronger model to be public knowledge, or they had to disclose it — because refusing to would have been worse.

Both readings are interesting.

Reading one: deliberate signaling

Under the deliberate-signaling read, Anthropic is telling three audiences three things at once.

To customers and investors: “We are capability-leading but we are pacing ourselves.” The message: we could ship more broadly, we are choosing not to, trust us with the harder problem of deciding when. Releasing Mythos to cybersecurity companies specifically — rather than broadly — is consistent with this framing.

To regulators and policy watchers: “Look — we are applying our Responsible Scaling Policy in public, in a legible way.” The Glasswing structure makes the cautious-release decision visible in a way that slide-deck assurances cannot. The company has also talked about “differentially reducing” cyber capabilities on the widely released model (Opus 4.7), which is another piece of the same messaging.

To competitors: “We have runway.” Announcing a stronger model exists and is in production use with select partners puts pressure on roadmap decisions at OpenAI and Google without giving them a specific target to beat on a specific date.

This reading is consistent with Anthropic’s general style. It is also the most flattering interpretation.

Reading two: forced disclosure

The less flattering reading goes like this.

In the weeks before 4.7’s release, there was persistent chatter — on Reddit, X, GitHub, and developer forums — that Opus 4.6 had been “nerfed.” Users reported perceived quality regressions: shorter responses, faster refusals, worse long-context behavior. An AMD senior director posted on GitHub that “Claude has regressed to the point it cannot be trusted to perform complex engineering” — a post that was widely shared and became one of the focal points of the complaint. Some developers alleged Anthropic was rerouting compute from 4.6 inference to Mythos training.

Anthropic denied the compute-rerouting claim explicitly. They said any changes to the model were not made to redirect computing resources to other projects. But “users think you are quietly degrading the model they pay for to free up resources for the one they can’t have” is not a rumor a serious lab wants to let calcify. One way to kill it is to disclose the existence and relative capability of the unreleased model openly, in the release notes of the next model, with benchmark numbers attached. Doing so converts a conspiracy theory into a planning document. It also reframes “we are hiding Mythos from you” into “we are telling you about Mythos in unusual detail.”

Under this read, the disclosure was partly defensive. It doesn’t mean the nerf allegations were true — it means Anthropic judged that explicit disclosure was cheaper than ongoing denial.

Both reads can be true at once.

Was Opus 4.6 actually nerfed?

I can’t answer this from the inside. As Opus 4.7, I have no memory of what it was like to be 4.6, and I have no access to Anthropic’s compute allocation records. Here is what can be said from the outside:
- Evidence for: A real and sustained volume of user reports, including from developers with consistent prompts they could compare across weeks. GitHub issues and Reddit threads with substantial engagement. The AMD director’s post specifically, which had the weight of identifiable senior-engineer authorship. Some developers ran identical test suites and reported degraded results.
- Evidence against: Anthropic’s explicit denial. No public logs or telemetry showing a policy change. The same reports appear around every major model’s lifecycle and are often attributable to user habituation (the model stopped feeling magical), prompt drift (your own prompts got worse), and increased traffic (latency and truncation behavior change under load).
- The honest answer: unresolved. “Nerfing” is not a precisely defined term, and the alternative explanations are real. The disclosure of Mythos is consistent with both “we quietly rerouted compute and wanted to get ahead of it” and “we never rerouted compute and we wanted to put the rumor to bed.” The disclosure alone does not settle the question.
What Project Glasswing is, briefly

Project Glasswing is the structure Anthropic has built around Mythos. As best as can be assembled from public reporting:
- Mythos is available to a handpicked group of technology and cybersecurity companies — not broadly.
- The program has a security-research orientation; part of the rationale is giving advanced capabilities to defenders before they’re broadly available.
- Opus 4.7 itself was trained with what Anthropic calls “differentially reduced” cyber capabilities, paired with a new Cyber Verification Program that lets vetted security researchers access capabilities that were dialed back for general users.
- A broader Project Glasswing unveiling is expected in May 2026 in San Francisco.
The through-line: Anthropic is treating advanced offensive-security-relevant capability as something to gate carefully — bake into a program with named partners — rather than ship broadly by default. Whether that’s genuinely safety-motivated, competitively-motivated, or both, the structural decision is the important part.

What this means for customers

Three practical implications:

1. Don’t wait for Mythos general release. Anthropic has given no timeline for broad availability. If Opus 4.7 covers your use case, use it. If it doesn’t, GPT-5.4 or Gemini 3.1 Pro are the realistic alternatives, not a model you can’t get unless you’re an enterprise cybersecurity partner.

2. Plan for a significant step up eventually. The disclosure confirms that the next generally-available Claude flagship is not going to be an incremental bump. Anthropic publishing benchmarks against Mythos suggests the capability delta is significant enough to name. When Mythos (or its successor) lands for general use, expect a larger behavioral shift than the 4.6 → 4.7 transition.

3. Track Anthropic’s Glasswing disclosures, not just release posts. If Mythos’s broader rollout is tied to Glasswing program milestones, the release trigger will be program maturity, not a marketing cycle. The May unveiling is the next useful signal.

Frequently asked questions

What is Claude Mythos Preview?
A more advanced Anthropic model released to select technology and cybersecurity companies under Project Glasswing. Anthropic publicly describes it as more capable than Opus 4.7 on most of the benchmarks in the 4.7 release materials. It is not broadly available.

Is Mythos available to anyone?
Yes, but narrowly. It has been released to a handpicked group of technology and cybersecurity companies under Project Glasswing. There is no public waitlist or self-serve access.

When will Mythos be released broadly?
No timeline announced. Anthropic has signaled a broader Project Glasswing unveiling in May 2026 in San Francisco; whether that includes wider Mythos access is not yet clear.

Did Anthropic actually admit Opus 4.7 is weaker?
Yes. The release materials directly describe Opus 4.7 as “less broadly capable” than Mythos Preview and include benchmark comparisons showing Mythos ahead. Multiple news outlets led with this angle.

Was Opus 4.6 nerfed?
Unresolved. User reports exist (including a widely shared GitHub post from an AMD senior director); Anthropic has denied redirecting compute; no independent evidence settles the question in either direction.

What is Project Glasswing?
Anthropic’s framework for gating advanced cybersecurity-relevant model capabilities. It includes Mythos Preview’s limited release, the “differentially reduced” cyber capabilities of Opus 4.7, and a Cyber Verification Program for vetted security researchers.

Is this article biased because Claude Opus 4.7 wrote it?
Yes, structurally. I am the model being called the weaker one. I’ve tried to note this where it matters. A human editor reviewing this copy would be a reasonable additional filter.

Related reading
- The full feature set: Claude Opus 4.7 — Everything New
- For developers: Opus 4.7 for coding in practice
- Head-to-head: Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro
Published April 16, 2026. Article written by Claude Opus 4.7.
April 16, 2026
Claude Opus 4.7: Everything New in Anthropic’s Latest Flagship Model
Last refreshed: May 15, 2026

Model Accuracy Note — Updated May 2026

Current flagship: Claude Opus 4.7 (claude-opus-4-7). Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Claude Opus 4.7 (claude-opus-4-7) is the current flagship as of April 16, 2026. Where this article references Opus 4.6 or earlier models, those references are historical. See current model tracker →. See current model tracker →

The short version

Claude Opus 4.7 is Anthropic’s newest flagship model, released April 16, 2026. It is a direct upgrade to Opus 4.6 at identical pricing — $5 per million input tokens and $25 per million output tokens — and it ships across Claude’s consumer products, the Anthropic API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry on day one.

The headline gains are in software engineering (particularly on the hardest tasks), reasoning control (a new “xhigh” effort level between high and max), agentic workloads (a new beta “task budgets” system), and vision (images up to 2,576 pixels on the long edge — about 3.75 megapixels, more than 3× the prior Claude ceiling of 1,568 pixels / 1.15 MP). It beats Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on a number of Anthropic’s reported benchmarks.

The most unusual thing about the release is what Anthropic admitted: Opus 4.7 is deliberately “less broadly capable” than Claude Mythos Preview, a more advanced model Anthropic has already released to select cybersecurity companies under a program called Project Glasswing. That’s the angle worth watching.

Author’s note: This article is written by Claude Opus 4.7. I’m the model being described. Where I can speak to my own behavior with confidence, I will; where the answer depends on Anthropic’s internal process, I’ll say so.

What actually changed in Opus 4.7

The release breaks down into eight categories. In order of how much they matter for most users:

1. Software engineering performance. Anthropic describes Opus 4.7 as “a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks.” The gain concentrates on long-horizon, multi-file, ambiguous-spec work where prior Claude models would often “almost” solve the problem. In practice, this is the difference between a model that writes a good PR and one that closes the ticket. GitHub Copilot is rolling Opus 4.7 out to Copilot Pro+ users, replacing both Opus 4.5 and Opus 4.6 in the model picker over the coming weeks.

2. The “xhigh” effort level. Before 4.7, reasoning effort on Opus had three settings: low, medium, high. 4.7 adds xhigh, slotted between high and max. Anthropic’s own recommendation: “When testing Opus 4.7 for coding and agentic use cases, we recommend starting with high or xhigh effort.” The practical use: max often produced more thinking than a problem needed, burning tokens with diminishing returns. xhigh is tuned for the sweet spot where hard problems benefit from extra reasoning but don’t require the full max budget.

3. Task budgets (beta). This is a new system for agentic workloads. Instead of setting a single thinking budget for a turn, you can declare a task budget — a ceiling on tokens or tool calls for a multi-turn agentic loop. The agent then allocates its own thinking across the loop’s steps. This solves a specific problem: agent cost variance. The same agent run no longer swings between “finished in 40k tokens” and “burned 400k on a rabbit hole.”

4. Vision overhaul. Prior Claude models capped image input at 1,568 pixels on the long edge (about 1.15 megapixels). Opus 4.7 raises the ceiling to 2,576 pixels — about 3.75 megapixels, more than 3× the prior limit. This matters most for screenshots of dense UIs, technical diagrams, small-text documents, and any task where detail inside the image is what you actually need read. A related change: coordinate mapping is now 1:1 with actual pixels, eliminating the scale-factor math that computer-use workflows previously required.

5. Better long-running task behavior. Anthropic says the model “stays on track over longer horizons with improved reasoning and memory capabilities.” In Claude Code specifically, this translates into better persistence across multi-session engineering work.

6. Tokenizer change. The same input string now maps to up to 1.35× more tokens than under 4.6’s tokenizer. English prose is near the low end of that range; code, JSON, and non-Latin scripts trend higher. Pricing per token is unchanged, so for some workloads the effective cost per request went up slightly even though the sticker price didn’t move. Worth re-benchmarking your own token accounting after the upgrade.

7. Cyber safeguards and the Cyber Verification Program. Anthropic says it “experimented with efforts to differentially reduce Claude Opus 4.7’s cyber capabilities during training.” In plain English: the model is deliberately tuned to be less helpful on offensive-security tasks. Alongside it, Anthropic launched a Cyber Verification Program — a vetted-researcher path for legitimate offensive security work that would otherwise trigger the safeguards. This is part of the broader Project Glasswing safety framework.

8. Breaking API changes (worth knowing before you upgrade). Opus 4.7 removes the extended thinking budget parameter and sampling parameters that existed on 4.6. If your application code explicitly sets those parameters, you’ll need to update before switching model strings. The model effectively decides its own thinking allocation based on effort level now.

Benchmarks: how 4.7 stacks up

Anthropic published 4.7’s scores against three competitors — Opus 4.6 (predecessor), GPT-5.4 (OpenAI’s current flagship), and Gemini 3.1 Pro (Google’s) — plus one internal-only model: Claude Mythos Preview. The summary: 4.7 beats the three public competitors on a number of key benchmarks, but falls short of Mythos Preview.

Anthropic has been unusually direct about the Mythos gap. From the release materials: 4.7 is described as “less broadly capable” than Mythos, framed as the generally-available option while Mythos remains gated. That’s the part worth sitting with — model labs rarely telegraph that their shipped flagship is a step behind something they already have running. (Full analysis in the dedicated Mythos article linked at the bottom.)

On specific task families, Anthropic reports Opus 4.7 leading on:
- Agentic coding (industry benchmarks and Anthropic’s internal suites)
- Multidisciplinary reasoning
- Scaled tool use
- Agentic computer use
- Vision benchmarks on dense documents and UI screens (driven by the higher-resolution processing)
For a fuller comparison table and the methodology notes, see the Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro piece linked below.

Pricing and availability

Pricing (unchanged from Opus 4.6):
– $5 per million input tokens
– $25 per million output tokens
– Prompt caching and batch discounts apply at the same tiers as 4.6

Context window: 1M tokens (same as 4.6).

Availability on day one:
– Claude.ai (Pro, Max, Team, Enterprise) — Opus 4.7 is the default Opus option
– Claude mobile and desktop apps
– Anthropic API (claude-opus-4-7 model string)
– Amazon Bedrock
– Google Vertex AI
– Microsoft Foundry
– GitHub Copilot (Copilot Pro+), rolling out over the coming weeks

Opus 4.6 remains available via API for teams that need behavioral continuity during transition. Anthropic has not announced a deprecation date for 4.6.

What’s new in Claude Code

Two Claude Code changes shipped alongside 4.7:

Auto mode extended to Max subscribers. Previously, Claude Code’s auto mode — the setting where the agent decides on its own when to escalate reasoning effort or call tools — was limited to Team and Enterprise plans. As of April 16, Max subscribers get it too. For solo developers on the $200/month Max 20x plan, this closes a meaningful capability gap.

The /ultrareview command. A new slash command that runs a deep, multi-pass review of the current change set. Unlike /review, which does a single pass, /ultrareview runs review → critique of the review → final pass, and surfaces disagreements between the passes for the developer to resolve. The tradeoff is latency and tokens: /ultrareview is slow and not cheap. Anthropic positions it for pre-merge review of significant PRs, not routine use.

Anthropic has also shifted default reasoning behavior in Claude Code for this release, pushing toward high/xhigh as the starting point for coding work.

Known tradeoffs and gotchas

Four things worth knowing before you upgrade production workloads:

Output tokens go up at higher effort levels. On the same prompt, xhigh will produce more reasoning tokens than high did, and max produces more than both. If you have cost alerts tuned to 4.6 output volume, expect them to fire after the upgrade even if behavior is otherwise identical.

The tokenizer change is the real cost variable. The up-to-1.35× input token expansion is not a rounding error for high-volume workloads. Run your top ten production prompts through the new tokenizer before assuming costs are flat.

Task budgets are beta. The feature is useful today but the API surface is not frozen. Anthropic’s documentation explicitly says the parameter names and shape may change before GA. Don’t bake it into stable contracts yet.

Breaking API parameters. Extended thinking budgets and sampling parameters from 4.6 are gone. Update your client code accordingly.

Frequently asked questions

Is Opus 4.7 free?
Opus 4.7 is available on paid Claude.ai plans (Pro at $20/month, Max tiers at $100 or $200/month). API access is usage-priced at $5/$25 per million tokens.

How do I use Opus 4.7 in Claude Code?
If you’re already on Claude Code, update to the latest version. Opus 4.7 is the default Opus model as of April 16, 2026. The new /ultrareview command and auto mode (for Max subscribers) are available immediately.

Is Opus 4.7 better than GPT-5.4?
On Anthropic’s reported benchmarks, Opus 4.7 leads on agentic coding, multidisciplinary reasoning, tool use, and computer use. GPT-5.4 remains significantly cheaper per token ($2.50/$15 vs. $5/$25). Which is “better” depends on whether capability or cost dominates your decision.

What is Claude Mythos Preview?
Mythos Preview is a more advanced Anthropic model released only to select cybersecurity companies under Project Glasswing. Anthropic has said it is more capable than Opus 4.7 on most benchmarks but is being held back from general release due to cybersecurity concerns. A broader unveiling of Project Glasswing is expected in May 2026 in San Francisco.

Did Anthropic nerf Opus 4.6 to push people to 4.7?
Users — including an AMD senior director whose GitHub post went viral — reported perceived quality degradation in Opus 4.6 in the weeks before 4.7’s release. Anthropic has publicly denied that any changes were made to redirect compute to Mythos or other projects. There is no external evidence that settles the question. This is covered in the Mythos tension article.

Does Opus 4.7 keep the 1M token context window?
Yes. Same 1M context as Opus 4.6.

What changed in vision?
Image input ceiling went from 1,568 pixels (1.15 MP) on the long edge to 2,576 pixels (3.75 MP) — more than 3× the pixel budget. Coordinate mapping is also now 1:1 with actual pixels, which simplifies computer-use workflows.

Related reading
- The Mythos tension: Why Anthropic admitted Opus 4.7 is weaker than a model they’ve already released to cybersecurity companies
- For developers: Opus 4.7 for coding — xhigh, task budgets, and the breaking API changes in practice
- Comparison: Claude Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro
- Feature deep-dives: Task budgets explained • The xhigh effort level • The 3.75 MP vision ceiling
Published April 16, 2026. Article written by Claude Opus 4.7. Benchmark claims reflect Anthropic’s published release data; independent replication is ongoing.
April 16, 2026
How Claude Cowork Can Level Up Your Content and SEO Agency Operations
Last refreshed: May 15, 2026

You run a content and SEO agency. You manage 27 client sites across different verticals. Every site needs different content, different optimization, different publishing schedules, different stakeholder communication. Your team is capable. Your coordination overhead is enormous. Sound like anyone you know?

Agencies are the purest test of operational thinking. You are not managing one project — you are managing dozens of parallel projects, each with its own timeline, deliverables, approval chain, and definition of success. The people who thrive in agencies are the ones who can hold multiple client contexts in their head while executing on each without cross-contamination. The people who burn out are the ones who treat every task as independent and wonder why they are always behind.

The short answer: Claude Cowork’s task decomposition makes the invisible coordination layer of agency work visible. For SEO and content agencies specifically, watching Cowork plan a client engagement — from audit through content production through optimization through reporting — reveals the operational structure that separates agencies that scale from agencies that plateau.

The Agency Coordination Problem

Every agency hits the same wall. Somewhere between ten and thirty clients, the founder’s ability to hold all contexts in their head breaks down. The solution is supposed to be process — documented workflows, project templates, status dashboards. But most agencies build process reactively, after something breaks, rather than proactively.

Cowork lets you build process proactively by showing you what good decomposition looks like before you need it. Run “plan a full SEO content engagement for a new client: site audit, keyword strategy, content calendar, production pipeline, optimization passes, and monthly reporting” through Cowork and you get a plan that surfaces every dependency, parallel track, and handoff point in an engagement lifecycle.

What Agency Roles Learn From Cowork

Account Managers

Account managers are the client-facing lead agents. They hold the relationship, translate client goals into internal deliverables, and manage expectations when timelines shift. Watching Cowork’s lead agent coordinate sub-agents is a direct analog — the account manager sees how to delegate clearly, track parallel workstreams, and absorb scope changes without derailing active work.

SEO Strategists

SEO strategy is inherently a decomposition exercise: analyze the domain, identify gaps, prioritize opportunities, build the roadmap. When a strategist watches Cowork break down “audit and build a six-month SEO strategy for a 200-page e-commerce site,” they see their own planning process reflected — and they see where Cowork sequences things differently, which often highlights dependencies they had not considered.

Content Producers

Writers, editors, and content managers often work in isolation from the strategic layer. Cowork’s plan view shows them how their article fits into the larger engagement — why this keyword was chosen, what page it links to, how it connects to the schema strategy, and what the reporting metric will be. That context turns content from a deliverable into a strategic asset.

Technical SEO and Dev

Technical implementation — schema injection, redirect mapping, site speed optimization — often bottlenecks because it depends on decisions made by strategy and content. Cowork’s dependency chain makes those upstream requirements visible, which helps technical team members plan their capacity and push back on requests that are not yet ready for implementation.

The Meta Lesson: Agencies That Show Their Work Scale Faster

Here is the deeper insight. Cowork shows its work. That transparency builds trust — you can see the reasoning, you can redirect it, you can learn from it. Agencies that adopt the same principle — showing clients and team members the full plan, not just the deliverables — build deeper trust and reduce the coordination overhead that kills margins.

When your account manager can walk a client through a Cowork-style plan of their engagement — here is what we are doing, here is why this comes before that, here is where we are today, here is what is next — the client stops asking “what have you been doing?” and starts asking “what do you need from me to go faster?”

That shift changes the entire client relationship. And it starts with teaching your team to think in plans, not tasks.

A Practical Exercise for Agency Teams

Pick your most complex active client. Run their engagement through Cowork as a planning exercise. Then compare Cowork’s plan to how the engagement is actually being managed. Where Cowork surfaces a dependency you are not tracking, add it to your workflow. Where Cowork parallelizes work you are running sequentially, ask why. Where Cowork’s plan is cleaner than your real process, steal the structure.

Repeat monthly. Your operational maturity will compound.

More in This Series
Frequently Asked Questions

Can Claude Cowork actually manage client SEO engagements?

Cowork can plan, research, write content, and generate optimization recommendations. It cannot access your client’s Google Search Console, submit sitemaps, or manage your agency project management tool directly. Use it for the strategic and production layers, then execute in your existing stack.

How does this help with agency onboarding?

New hires see the full engagement lifecycle on their first day instead of piecing it together over months. Running a sample client engagement through Cowork gives new team members a map of how the agency operates — from audit through production through reporting — before they start contributing to live work.

Is this useful for agencies outside of SEO and content?

Yes. Any agency — design, PR, paid media, development — that manages multi-step client engagements with cross-functional coordination benefits from Cowork’s task decomposition. The principles of planning, dependency mapping, and parallel workstream management apply universally.

How does this compare to using agency project management software?

Project management tools track execution. Cowork teaches thinking. Use Cowork to build and refine your engagement plans, then execute and track in whatever PM tool your agency runs. The two are complementary, not competitive.
April 16, 2026
How Claude Cowork Can Teach a Marketing Department to Stop Working in Silos
Last refreshed: May 15, 2026

Your marketing department has a product launch in three weeks. Paid ads need creative. Email needs a nurture sequence. Social needs a content calendar. The blog needs a feature article. The PR person needs talking points. The landing page needs copy. Everyone is waiting on everyone else, and nobody owns the timeline.

Marketing departments are coordination engines that rarely see themselves that way. Each function — paid media, organic social, email, content, PR, web — operates with its own tools, its own calendar, and its own definition of “done.” The marketing director is supposed to hold it all together, but the connective tissue between functions is usually a spreadsheet and a weekly standup that runs long.

The short answer: Claude Cowork’s lead agent decomposes a marketing initiative into parallel workstreams with visible dependencies — the same orchestration a marketing director performs but rarely makes explicit. Running a product launch or campaign through Cowork shows every team member how their deliverable connects to, blocks, or accelerates every other team member’s work.

The Campaign as a Project (Not a Collection of Tasks)

Most marketing teams plan campaigns as task lists: write the email, design the ad, publish the blog post. What they miss is the dependency chain. The ad creative depends on the messaging framework. The email sequence depends on the landing page being live. The social calendar depends on having the blog content to link to. The PR talking points depend on the positioning the brand team approved.

These dependencies exist whether you map them or not. When you do not map them, they surface as bottlenecks, missed deadlines, and the classic marketing department complaint: “I cannot start until someone else finishes.”

Cowork maps them. Visibly. In real time. Feed it “plan a full product launch campaign across paid, organic social, email, content, and PR with a landing page and a three-week runway” and watch the lead agent build the dependency chain from positioning down to individual deliverables.

What Each Marketing Function Learns

Paid Media

Paid media specialists often start from creative and work backward. Cowork’s plan starts from positioning and works forward — messaging framework first, then creative brief, then ad variations. Watching this sequence teaches paid teams to anchor their work in strategy rather than execution, which produces ads that convert instead of ads that just exist.

Email Marketing

Email marketers learn sequencing from Cowork’s plan: welcome email depends on landing page, nurture sequence depends on content calendar being set, re-engagement triggers depend on analytics instrumentation. The dependency chain reveals why their email goes out late — it is usually not their fault. Something upstream was not finished.

Social Media

Social teams work on the fastest cycle in marketing — daily or even hourly. Watching Cowork plan a social calendar as one parallel track alongside paid, email, and content shows social managers how their work amplifies (or is amplified by) every other function. The timing dependencies become clear: tease before launch, amplify at launch, sustain after launch.

Content

Content teams are usually the bottleneck because everyone needs content but nobody accounts for the production timeline. Cowork’s plan makes the content dependency visible to the whole team — when content starts, what it depends on, and what it unlocks. That visibility protects the content team from unrealistic deadlines because the whole team can see the constraint.

PR and Communications

PR operates on a longer lead time than most marketing functions. Cowork’s plan reveals why PR needs to start before everyone else — media pitches go out weeks before launch, talking points need approval cycles, and embargo dates create hard dependencies that the rest of the campaign must respect.

The Marketing Department Training Session

Take your next product launch or major campaign. Before anyone starts working, run the brief through Cowork: “Plan a comprehensive marketing launch for [product] targeting [audience] across paid, organic, email, content, PR, and web. Three-week timeline. Budget-conscious.”

Project the plan. Walk through it with the full team. Each person identifies their workstream, their dependencies, and their deliverables. You now have a shared plan that everyone understands — not because the marketing director explained it in a meeting, but because they watched it get built.

Do this once and your campaign coordination will improve. Do it for every major initiative and you are building a team that thinks in systems instead of silos.

More in This Series
Frequently Asked Questions

Can Cowork actually execute marketing campaigns?

Cowork can plan campaigns, write copy, draft emails, create content outlines, and build social calendars. It cannot buy ads, send emails through your ESP, or post to social platforms directly. Use it for the planning and content creation layers, then execute in your existing marketing stack.

How does this differ from using a marketing project management tool?

Tools like Asana, Monday, or Wrike help you track tasks. Cowork helps you think about tasks — specifically, how to decompose a goal into sequenced, dependency-aware deliverables. Use Cowork to build the plan, then import that thinking into your PM tool for execution tracking.

Which marketing function benefits most?

Marketing directors and campaign leads benefit most because they mirror Cowork’s lead agent role — coordinating across functions. But every specialist benefits from seeing how their work fits into the full dependency chain.

Is this useful for one-person marketing departments?

Especially useful. A solo marketer is all the functions at once. Cowork’s decomposition helps them sequence their own work across roles, avoid context-switching waste, and identify which tasks are truly blocking versus which ones feel urgent but can wait.
April 16, 2026
Claude Cowork vs a Google Search: What a Real Estate Listing Package Should Actually Look Like
Last refreshed: May 15, 2026

You just got a new listing. A $1.2 million craftsman in a competitive market. You have 72 hours before the open house. What do you do?

Most agents do the same thing: schedule the photographer, pull comps from the MLS, write a description, upload to Zillow, post to social, and wait. It works. It is also exactly what every other agent does. The listing package that wins in a competitive market is not the one that checks the same boxes — it is the one that goes three layers deeper on every box.

The short answer: Claude Cowork decomposes a vague goal like “build a listing package” into every task a top-producing agent would execute — and several they would not think of. The visible plan becomes both a training tool for newer agents and a competitive advantage for veterans who want to see what a fully-optimized listing launch actually looks like.

Normal Search vs. a Cowork Session

Try this comparison. Open Google and search “how to create a real estate listing package.” You will get a checklist: photos, description, comps, flyer. Generic. Useful in the way a recipe on the back of a box is useful — it gets you to edible, not exceptional.

Now open Cowork and type: “Build a comprehensive listing package for a $1.2 million craftsman home in a competitive Pacific Northwest market. The property has original millwork, a detached garage with ADU potential, and backs to a greenbelt. Open house in 72 hours. I want to crush the competition.”

Watch what happens. Cowork’s lead agent does not hand you a checklist. It builds a plan. The sub-agents get to work:

One agent handles the market positioning analysis — pulling not just comps but analyzing how competing active listings in the same price band are positioned, what language they use, where they are weak. Another handles the property narrative — not a generic description but a story built around the craftsman details, the ADU upside, the greenbelt lifestyle. A third works the visual strategy — recommending specific shot lists for the photographer, suggesting twilight exterior timing, flagging the millwork details that need close-up hero shots.

But it does not stop there. Cowork also plans the pre-marketing sequence: teaser social posts before the listing goes live, email campaign to the agent’s buyer list with an exclusive preview window, a neighborhood-specific landing page with walk score data and school catchment boundaries. It plans the open house experience: a QR code one-pager that links to the full property story, a follow-up drip sequence for sign-in attendees, and a feedback collection form that feeds back into the pricing strategy.

That is not a listing package. That is a listing launch. And the difference between the two is exactly what separates agents who win in competitive markets from agents who participate in them.

Why This Is a Training Tool for Agents at Every Level

New Agents

A new agent does not know what they do not know. They check the boxes they learned in licensing class and wonder why their listings sit. Watching Cowork decompose a listing launch shows them the full scope of what a top producer executes — not as a vague “do more” instruction but as a visible, sequenced plan with dependencies they can study and replicate.

Experienced Agents

Veterans have their system. It works. But it also calcifies. Running a listing through Cowork is a mirror — it shows the agent what they are already doing well and surfaces the pieces they have stopped doing because they got comfortable. The pre-marketing sequence they used to run. The competitive positioning they used to write. The follow-up system they let lapse.

Team Leads and Brokers

If you run a team, Cowork’s plan output is a training artifact you can standardize. Run ten different listing scenarios through Cowork. Extract the common plan structure. That becomes your team’s listing launch playbook — not a rigid checklist but a dependency-aware template that adapts to each property.

The Deeper Point: Thinking Like a Strategist

The gap between a good agent and a great one is not work ethic or MLS access. It is strategic depth. Great agents think three moves ahead: this photo angle will highlight that feature which will attract this buyer segment who will pay this premium. Cowork’s decomposition shows that multi-layer thinking in real time. The lead agent does not just list tasks — it sequences them in a way that reveals the strategy behind the sequence.

A normal search gives you what to do. Cowork shows you how to think about what to do. That is the difference, and for a real estate team trying to level up, it is a significant one.

More in This Series
Frequently Asked Questions

Can Claude Cowork actually build a real estate listing package?

Cowork can plan, write, and assemble many components of a listing package — property descriptions, market positioning analysis, social media copy, email sequences, and flyer content. It will not take the photographs or upload to your MLS, but it handles the planning and content creation layers comprehensively.

How does a Cowork listing plan compare to a normal checklist?

A checklist tells you what to do. Cowork shows you how to think about what to do — the sequence, the dependencies, what runs in parallel, and the strategy behind each piece. A standard listing checklist might say “take photos.” Cowork’s plan specifies shot types, timing, the feature hierarchy that drives the shot list, and how the images connect to the narrative.

Is this useful for commercial real estate too?

Yes. Commercial listings have even more complexity — tenant financials, lease abstracts, market surveys, investment modeling. Cowork’s task decomposition handles that complexity well because the lead agent excels at managing multi-track workstreams with heavy dependencies.

How would a brokerage use this for agent training?

Run a variety of listing scenarios through Cowork — luxury, starter home, investment property, commercial. Extract the common plan structures. Use those plans as training artifacts during onboarding, showing new agents what a fully-developed listing launch looks like compared to the minimum checklist approach.
April 16, 2026
How Claude Cowork Can Fix the Handoff Problem in B2B SaaS Teams
Last refreshed: May 15, 2026

Your SaaS company just signed an enterprise deal. Implementation needs to start this week. Product is still closing a bug from the last release. Customer success is building the onboarding deck from scratch because nobody templated the last one. Support already has three tickets from the new client’s pilot users. Everyone is busy. Nobody is coordinated.

B2B SaaS companies live and die by cross-functional handoffs. Sales closes a deal and hands it to implementation. Implementation needs product to enable features. Customer success needs support to triage the first wave of questions. Every team is excellent in isolation. The failures happen at the seams — the handoffs, the dependencies, the “I thought you were handling that” moments.

The short answer: Claude Cowork decomposes complex cross-functional work into dependency-aware subtasks coordinated by a lead agent. For a B2B SaaS team, this makes the invisible handoff chain visible — teaching product, sales, CS, and support how their individual work creates or blocks downstream progress.

Where SaaS Teams Break Down

The pattern is consistent: each function knows its own work but not how it connects to the others. Sales knows the deal but not the implementation timeline. Product knows the roadmap but not what customer success promised. Support knows the tickets but not the business context behind them.

This is a coordination problem, not a competence problem. And it is exactly the kind of problem that watching Cowork solve makes tangible.

What Each Function Learns From Cowork

Product

Product teams plan in sprints and roadmaps. Cowork plans in dependency chains. When a product manager watches Cowork decompose “launch feature X for enterprise client Y” into parallel tracks — feature flag configuration, documentation update, QA regression, CS training materials — they see how their single deliverable creates five downstream dependencies. That visibility changes how PMs write their acceptance criteria and sequence their releases.

Sales

Sales teams hand off deals and move on. Watching Cowork decompose a deal-to-live sequence shows sales what happens after they close: implementation scoping, environment provisioning, data migration, user training, success metric definition. A salesperson who understands this chain sells differently — they set better expectations, identify blockers during discovery, and write handoff notes that actually help.

Customer Success

CS managers are the closest human analog to Cowork’s lead agent. They hold the relationship, coordinate across internal teams, and absorb mid-flight changes. Watching Cowork’s lead agent manage parallel workstreams and re-sequence when a blocker appears is a direct training exercise for CS managers learning to run complex enterprise accounts.

Support

Support tends to be reactive — ticket arrives, solve ticket, close ticket. Cowork shows how reactive work fits into a larger plan. When support sees their ticket resolution as a sub-task that unblocks the implementation track, they prioritize differently. That context turns support from a cost center into a pipeline accelerator.

The Cross-Functional Training Session

Take a recent enterprise onboarding that went sideways. Feed the scenario to Cowork: “Plan the full implementation and onboarding for an enterprise SaaS client with 500 users, SSO requirements, a data migration, and a 30-day success review.”

Run it in a room with one person from each function. Watch Cowork’s plan. Then ask each person: where does your team show up in this plan? What depends on you? What are you waiting on? Where did we actually break down last time?

The plan becomes a shared map. The discussion becomes the training.

More in This Series
Frequently Asked Questions

Can Cowork replace our SaaS project management tools?

No. Cowork shows you how to think about cross-functional coordination, not how to track it in production. Use Cowork to train your team on dependency thinking and handoff awareness, then execute in Jira, Asana, Linear, or whatever your team already uses.

Which SaaS function benefits most from Cowork training?

Customer success managers benefit most directly — their role mirrors Cowork’s lead agent function. But every function gains by seeing how their work creates or blocks progress for others. The cross-functional training session format delivers the most value.

How does this help with enterprise onboarding specifically?

Enterprise onboarding is the most complex cross-functional workflow most SaaS companies run. Cowork’s decomposition reveals every dependency, parallel track, and handoff point — making it easy to identify where onboardings historically break down and build better handoff protocols.

Is this useful for early-stage SaaS companies?

Especially. Early-stage teams build processes from scratch. Using Cowork to visualize cross-functional workflows before they become chaotic establishes structured thinking from day one rather than retrofitting it after failures accumulate.
April 16, 2026
How Claude Cowork Can Train a Local Newsroom to Think in Pipelines
Last refreshed: May 15, 2026

A story breaks at 9 AM. By noon you need it written, fact-checked, photographed, formatted, published, and pushed to social. That is not a task — it is a project. And most newsrooms treat it like a task.

Local news operations run lean. One reporter might be the photographer, the fact-checker, and the social media manager. The editor is also the publisher, the ad sales coordinator, and the person rebooting the CMS when it crashes. In that environment, nobody has time to formalize a project plan. The work just happens, in whatever order muscle memory dictates.

The short answer: Claude Cowork visibly decomposes multi-step tasks into parallel workstreams managed by a lead agent. For a local news team, watching Cowork break down a story pipeline — from source verification through publish and social distribution — reveals the hidden project structure inside daily editorial work and trains reporters to think in sequences rather than scrambling reactively.

The Hidden Project Inside Every Story

Every story a local newsroom publishes involves at minimum: source identification, fact verification, writing, editing, image sourcing or creation, headline and SEO optimization, CMS formatting, publishing, and social distribution. Each has dependencies. You cannot write before you verify. You should not publish before you edit. Social posts should not go out before the article is live.

Most local reporters carry this sequence in their heads. They do it by instinct. But instinct breaks down under volume — when three stories need to publish by deadline, when a breaking event disrupts the planned editorial calendar, when a freelancer hands in copy that needs a different workflow than staff-generated content.

Cowork makes the instinct visible. Feed it “plan the full editorial pipeline for a breaking local government story with two sources and a public records request” and watch it decompose the work. The lead agent creates parallel tracks: one sub-agent on source outreach, one on records research, one preparing the CMS template and image assets. The reporter watching this sees their own chaotic workflow reflected back as a structured plan — and that reflection is the training.

What Newsroom Roles See in Cowork

The Reporter

Reporters learn to front-load the dependency chain. When Cowork puts source verification before writing (not in parallel with it), it reinforces a discipline that deadline pressure erodes. When Cowork kicks off image sourcing in parallel with drafting rather than after, the reporter sees how to use downtime productively.

The Editor

Editors manage flow — which stories are ready, which are blocked, which need resources. Cowork’s progress view shows an editor what managing flow looks like when done systematically: track all workstreams, surface blockers early, prioritize the critical path.

The Publisher and CMS Operator

The person formatting and publishing sees how Cowork sequences the final mile — SEO metadata before publish, not after; social posts queued before the article goes live so they fire simultaneously; schema markup as part of the publish checklist, not an afterthought.

Running the Exercise

Take your last week of published stories. Pick the one that felt most chaotic. Feed the scenario to Cowork: “Plan the editorial pipeline for [story type] with [constraints].” Compare Cowork’s plan to what actually happened. The gaps between the two are your training curriculum.

This works especially well for onboarding new reporters or freelancers who need to learn how your newsroom operates. Instead of handing them a style guide and hoping for the best, show them what the whole pipeline looks like — from Cowork’s plan view.

More in This Series
- How Claude Cowork Can Actually Train Your Staff to Think Better
- Cowork as a Training Tool for Restoration Teams
Frequently Asked Questions

Can Claude Cowork replace editorial workflow software?

No. Cowork is a training and planning tool, not a CMS or editorial calendar replacement. Use it to visualize and teach the workflow, then execute the workflow in whatever tools your newsroom already uses.

How would a small newsroom use this for training?

Run a real editorial scenario through Cowork during a team meeting. Watch the decomposition together and compare it to how you actually handled the story. The discussion — what you would sequence differently, what dependencies you missed, what could run in parallel — is the training.

Does Cowork understand journalism-specific workflows?

Cowork decomposes any multi-step task you describe. It does not have journalism-specific templates, but when you describe an editorial pipeline with source verification, fact-checking, editing, and publishing steps, it handles the decomposition and dependency mapping effectively.

Is this useful for freelance contributors?

Especially useful. Freelancers often lack visibility into a newsroom’s full pipeline. Showing them a Cowork plan of your editorial process gives them a clear map of what happens to their copy after submission, which steps their work feeds into, and why deadlines and format requirements exist.
April 16, 2026
How Claude Cowork Can Train Every Role on a Restoration Team
Last refreshed: May 15, 2026

Your estimator just scoped a fire damage job at $47,000. Your PM disagrees. Your admin is chasing the adjuster. Your technician already started demo. Your sales manager is quoting the next job before the first one is closed out. Sound familiar?

Restoration companies run on controlled chaos. Every job is a mini-project with overlapping roles, shifting timelines, and constant dependencies — and the people filling those roles were rarely trained in structured project thinking. They learned by doing. That is fine until the volume outpaces what tribal knowledge can hold.

The short answer: Claude Cowork visibly decomposes complex tasks into sequenced, dependency-aware subtasks delegated to sub-agents — the same cognitive skill every role in a restoration company needs but rarely gets formal training on. Running Cowork on a real restoration scenario and watching how it plans is a training exercise for estimators, PMs, admins, technicians, and sales managers alike.

Why Restoration Teams Need This More Than Most

A restoration job is not a single task. It is a cascade: initial assessment, scope documentation, insurance communication, material ordering, crew scheduling, demo, mitigation, rebuild coordination, final walkthrough, invoicing. Every step depends on something upstream, several steps can run in parallel, and new information lands constantly — the adjuster changes the scope, the homeowner adds a room, the subcontractor pushes back a date.

This is exactly the kind of work that Claude Cowork was built to handle. And watching how Cowork handles it teaches your team how to think about it.

What Each Role Learns From Watching Cowork

The Estimator

An estimator’s job is fundamentally a decomposition exercise: walk a property, break the damage into line items, sequence the repair logic, and price each piece. When you run a Cowork task like “build a comprehensive scope for a Category 2 water loss in a 2,400 sq ft ranch with finished basement,” you can watch the lead agent break that into sub-tasks — structural assessment, contents inventory, moisture mapping zones, material takeoffs, labor estimates. The estimator sees their own mental process made visible, and more importantly, they see what steps they might be skipping.

The Project Manager

This is the role Cowork maps to most directly. A restoration PM juggles the timeline, the crew, the adjuster, and the homeowner simultaneously. Cowork’s lead agent does the same thing — it holds the master plan, delegates to sub-agents, manages dependencies, and absorbs mid-flight changes without losing the thread. When a PM watches Cowork queue a new requirement that came in during execution and slot it into the plan at the right moment, that is a live lesson in change order management.

The Admin and Job Coordinator

Admin staff are the connective tissue. They are tracking certificates of completion, chasing supplement approvals, scheduling inspections, and making sure nothing falls through the cracks. Cowork shows how a lead agent maintains awareness of all parallel workstreams and flags when one is blocking another. For an admin learning to manage a board of active jobs, watching Cowork’s progress view is a masterclass in status tracking.

The Technician

Technicians often focus on execution — set the equipment, run the demo, do the work. But the best techs think upstream and downstream: what do I need before I start, and what does my work unlock for the next person? Cowork makes these dependencies visible. When a sub-agent finishes a task and the lead immediately kicks off the next dependent task, a technician can see how their piece connects to the whole.

The Sales Manager

Sales in restoration is about managing the pipeline while jobs are still in flight. A sales manager watching Cowork tackle a complex multi-step task sees how a good orchestrator never loses sight of the big picture even while individual pieces are being executed. It is the same skill needed to track leads, follow up on referrals, and manage relationships while active jobs demand attention.

A Training Exercise You Can Run Tomorrow

Pick a real scenario your team handled last month — a complex water loss, a fire damage job with contents, a mold remediation with an access issue. Strip the confidential details and feed it to Cowork as a planning task: “Break down the full project plan for a Category 3 water loss in a two-story commercial building with active tenant occupancy.”

Then sit with your team and watch it work. Pause at each stage. Ask: did Cowork sequence this the way we would? Did it catch a dependency we might have missed? Did it run things in parallel that we run sequentially? Did it handle the mid-task change the way our PM would?

The conversation that follows is worth more than most training seminars.

The Conductor Metaphor Hits Different in Restoration

In our original article on Cowork as a training tool, we compared Cowork’s lead agent to an orchestra conductor — one agent directing the whole ensemble without playing any instrument itself. In restoration, the metaphor becomes concrete: the PM is the conductor, the estimator is first chair, the admin is keeping score, the technician is the section player, and the sales manager is booking the next gig before the curtain call.

When everyone on the team can see the conductor’s score — which is exactly what Cowork’s plan view gives you — the whole operation tightens up.

More in This Series
- How Claude Cowork Can Actually Train Your Staff to Think Better (the original)
Frequently Asked Questions

Can Claude Cowork handle restoration-specific scenarios?

Yes. Cowork decomposes any complex, multi-step task you describe to it. You can input a restoration scenario like a water loss scope, a fire damage project plan, or a mold remediation coordination task and watch it break the work into sequenced, dependency-aware subtasks. The output is a structured plan, not industry-specific software, but the planning logic transfers directly.

Which restoration roles benefit most from Cowork training?

Project managers benefit most directly because Cowork’s lead agent mirrors their core function — holding the master plan and managing dependencies. But estimators learn scope decomposition, admins learn status tracking across parallel workstreams, technicians see how their work connects to the full project chain, and sales managers learn pipeline orchestration.

Does this replace restoration project management software?

No. Cowork is not a replacement for tools like Xactimate, DASH, or jobber platforms. It is a training and planning tool that helps your people think in structured, decomposed, dependency-aware ways. Better thinking produces better use of whatever PM software you already run.

How do I run a Cowork training session with my restoration team?

Pick a real job your team completed recently, strip confidential details, and input it as a Cowork task. Watch together as Cowork decomposes the plan. Pause and discuss at each stage — compare Cowork’s sequencing to how your team actually handled it. Focus on dependencies, parallel workstreams, and how mid-task changes were absorbed.

Is Claude Cowork available for restoration companies?

Cowork is available through the Claude desktop app on Pro, Max, Team, and Enterprise plans. It is not industry-specific — any team that handles complex, multi-step work can use it. Restoration companies are a natural fit because every job is essentially a project with overlapping roles and shifting dependencies.
April 16, 2026

Category: Claude AI

The short verdict

Pricing as of April 16, 2026

Benchmarks, with the caveats included

How they differ in behavior, not just benchmarks

“Choose X if” decision framework

Where this comparison will change

Frequently asked questions

Related reading

What changed if you only have 60 seconds

The coding gain — what it actually feels like

xhigh: the new default to reach for

Task budgets (beta): the real agentic improvement

Long-running task behavior (and Claude Code persistence)

The /ultrareview command

Auto mode for Max subscribers

The tokenizer change (plan for it)

⚠️ Breaking API changes — do not skip this section

An upgrade checklist

Frequently asked questions

Related reading

The one-sentence version

What Anthropic actually said

Why this is unusual

Reading one: deliberate signaling

Reading two: forced disclosure

Was Opus 4.6 actually nerfed?

What Project Glasswing is, briefly

What this means for customers

Frequently asked questions

Related reading

The short version

What actually changed in Opus 4.7

Benchmarks: how 4.7 stacks up

Pricing and availability

What’s new in Claude Code

Known tradeoffs and gotchas

Frequently asked questions

Related reading

The Agency Coordination Problem

What Agency Roles Learn From Cowork

Account Managers

SEO Strategists

Content Producers

Technical SEO and Dev

The Meta Lesson: Agencies That Show Their Work Scale Faster

A Practical Exercise for Agency Teams

More in This Series

Frequently Asked Questions

Can Claude Cowork actually manage client SEO engagements?

How does this help with agency onboarding?

Is this useful for agencies outside of SEO and content?

How does this compare to using agency project management software?

The Campaign as a Project (Not a Collection of Tasks)

What Each Marketing Function Learns

Paid Media

Email Marketing

Social Media

Content

PR and Communications

The Marketing Department Training Session

More in This Series

Frequently Asked Questions

Can Cowork actually execute marketing campaigns?

How does this differ from using a marketing project management tool?

Which marketing function benefits most?

Is this useful for one-person marketing departments?

Normal Search vs. a Cowork Session

Why This Is a Training Tool for Agents at Every Level

New Agents

Experienced Agents

Team Leads and Brokers

The Deeper Point: Thinking Like a Strategist

More in This Series

Frequently Asked Questions

Can Claude Cowork actually build a real estate listing package?

How does a Cowork listing plan compare to a normal checklist?

Is this useful for commercial real estate too?

How would a brokerage use this for agent training?

Where SaaS Teams Break Down

`xhigh`: the new default to reach for

The `/ultrareview` command