Category: Anthropic

News, analysis, and profiles covering Anthropic the company and its team.

  • Claude Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro: Head-to-Head in April 2026

    Claude Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro: Head-to-Head in April 2026

    The short verdict

    • Best for agentic coding and long-horizon engineering: Opus 4.7.
    • Best for single-turn function calling and ecosystem breadth: GPT-5.4.
    • Best for multimodal input volume and long-context retrieval: Gemini 3.1 Pro.
    • Cheapest at the frontier: Gemini 3.1 Pro. Most expensive: Claude Opus 4.7.
    • If you can only pick one for general knowledge work in April 2026: Opus 4.7.

    The full reasoning is below. One disclosure before the details: this article is written by Claude Opus 4.7. I am one of the models being compared. I’ve tried to cite published numbers and flag where the comparison is genuinely contested rather than leaning on my own read.


    Pricing as of April 16, 2026

    Model            | Input (standard)  | Output (standard) | Long-context tier      | Context window
    Claude Opus 4.7  | $5 / M tokens     | $25 / M tokens    | Same across window     | 1M tokens
    GPT-5.4          | $2.50 / M tokens  | $15 / M tokens    | $5 / $22.50 over 272K  | 1M tokens (272K before surcharge)
    Gemini 3.1 Pro   | $2 / M tokens     | $12 / M tokens    | $4 / $18 over 200K     | 1M tokens (some listings cite 2M)

    Takeaways:
    – Gemini 3.1 Pro is the cheapest per token at the frontier — 2.5× cheaper on input than Opus 4.7 and 20% cheaper than GPT-5.4 at standard context.
    – GPT-5.4 sits in the middle on price and has a significant long-context surcharge cliff at 272K.
    – Opus 4.7 is the most expensive per token, with no long-context surcharge.
    – All three now have 1M-class context windows, but Opus 4.7’s pricing stays flat across the whole window while Gemini and GPT-5.4 both tier up past thresholds.

    Tokenizer caveat: Opus 4.7 uses a new tokenizer that produces up to 1.35× more tokens per input than Opus 4.6 did, depending on content type. Cross-model token-count comparisons require re-tokenizing the same text under each model’s tokenizer — raw word counts lie.


    Benchmarks, with the caveats included

    Anthropic, OpenAI, and Google all publish benchmark numbers. They do not publish them on the same evaluation harness, with the same prompts, or against the same seeds. Treat the following as directional, not definitive.

    Agentic coding (long-horizon, multi-file):
    – Opus 4.7 leads on Anthropic’s reported industry and internal agentic coding benchmarks.
    – GPT-5.4 is competitive on single-turn function calling and tool use. Roughly 80% on SWE-bench Verified at launch.
    – Gemini 3.1 Pro scored 80.6% on SWE-bench Verified at launch — essentially tied with GPT-5.4.

    Multidisciplinary reasoning (GPQA Diamond and similar):
    – Opus 4.7 leads on Anthropic’s comparisons.
    – GPT-5.4 and Gemini 3.1 Pro are close. Gemini reports 94.3% on GPQA Diamond.

    Scaled tool use and agentic computer use:
    – Opus 4.7 leads on Anthropic’s reported benchmarks.
    – GPT-5.4 has a native Computer Use API that scores 75% on OSWorld — the leading published figure at release.
    – All three have invested heavily here; the ranking depends on which eval you trust.

    Vision (document understanding, dense-screenshot extraction):
    – Opus 4.7’s jump from 1.15 MP to 3.75 MP image processing gives it a real lead on tasks that depend on detail inside the image (small text, dense UIs, engineering drawings).
    – Gemini 3.1 Pro is strong on native multimodal workflows with video and mixed media.
    – GPT-5.4 is solid but not leading on either axis.

    Long-context retrieval:
    – All three now have 1M-class context windows.
    – Gemini 3.1 Pro’s pricing tier structure makes it the cost-effective choice for bulk long-context work if your workflow frequently exceeds 200K tokens.
    – Opus 4.7 has flat pricing across its 1M window, which matters for unpredictable context shapes.
    – GPT-5.4’s 272K cliff means its standard-context price advantage fades on long-context workloads: past the threshold its rates roughly match Opus 4.7’s flat pricing and stay above Gemini’s long-context tier.
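
    To make the tiers concrete, here is a rough cost sketch using the list prices in the table above. It assumes the long-context rate applies to the whole request once input crosses the threshold; actual metering may differ, so check each provider's billing docs before relying on the numbers.

    # Rough per-request cost using the April 2026 list prices quoted above.
    # Assumption: the long-context rate applies to the whole request once the
    # input crosses the provider's threshold; real billing may differ.
    PRICING = {
        # model: (std_in, std_out, long_in, long_out, threshold_input_tokens)
        "claude-opus-4-7": (5.00, 25.00, 5.00, 25.00, None),   # flat across the 1M window
        "gpt-5.4": (2.50, 15.00, 5.00, 22.50, 272_000),
        "gemini-3.1-pro": (2.00, 12.00, 4.00, 18.00, 200_000),
    }

    def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
        std_in, std_out, long_in, long_out, threshold = PRICING[model]
        over = threshold is not None and input_tokens > threshold
        in_rate, out_rate = (long_in, long_out) if over else (std_in, std_out)
        return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

    # A 400K-token prompt with a 5K-token answer lands around $2.12 on Opus 4.7,
    # $2.11 on GPT-5.4 (past its cliff), and $1.69 on Gemini 3.1 Pro.
    for model in PRICING:
        print(model, round(request_cost(model, 400_000, 5_000), 2))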

    Specialized coding benchmarks:
    – GPT-5.3 Codex (the specialized predecessor line) still leads on some Terminal-Bench 2.0 and SWE-Bench Pro scores. GPT-5.4 has absorbed much of Codex’s capability but still trails slightly on pure coding niches.
    – Gemini 3.1 Pro has notable strength on creative coding and SVG generation.
    – Opus 4.7 is strongest on agentic and multi-file coding specifically.

    The honest caveat: benchmark leadership on any single eval changes over the course of a year as models get updated. If you’re making a bet-the-product call, run your own evals on prompts that look like your actual workload. The published benchmarks are a screening tool, not a decision tool.


    How they differ in behavior, not just benchmarks

    Opus 4.7 — the engineering-minded generalist.
    Tends toward thoroughness over speed. More likely than GPT-5.4 to push back on an ambiguous spec and ask a clarifying question; more likely than Gemini to surface tradeoffs rather than pick one and commit. Strong at long-horizon tasks where state matters. Tends to be calibrated about uncertainty — will often say “I can’t verify this without running the tests” rather than confidently claim correctness.

    GPT-5.4 — the product-native operator.
    Tends toward action over deliberation. Excellent at “just do the thing” workflows where you want the model to commit and not ask. Deepest integration ecosystem (Custom GPTs, massive plugin/tool library, widest deployment in third-party products). Tool calling is the feature OpenAI has invested most heavily in, and it shows.

    Gemini 3.1 Pro — the multimodal long-context specialist.
    Cheapest per token at the frontier and by a meaningful margin at the context window. Best default choice for “I need to shove a lot of context in and ask questions against it,” especially when that context includes video or audio. Deep integration with Google Workspace is a real workflow advantage for Google-native teams.

    None of these are absolute; all three models handle general tasks well. These are behavioral tendencies, not capability ceilings.


    “Choose X if” decision framework

    Choose Claude Opus 4.7 if:
    – Your primary workload is coding, especially agentic or multi-file coding.
    – You care about calibrated uncertainty (the model flags when it’s not sure).
    – You’re using or planning to use Claude Code for engineering work.
    – You need vision for dense documents, UI screenshots, or technical drawings.
    – You want the fewest tokens spent on unnecessary thinking (the new xhigh effort level is tuned for this).

    Choose GPT-5.4 if:
    – Single-turn tool use and function calling are the hot path in your product.
    – You need the broadest ecosystem of third-party integrations right now.
    – Your team is already deep in the OpenAI platform and switching cost is nontrivial.
    – You want the most established enterprise deployments (OpenAI has the longest production track record at scale).

    Choose Gemini 3.1 Pro if:
    – You’re price-sensitive and running high-volume workloads.
    – You need 1M+ token context as the default, not as an add-on.
    – Multimodal input volume (video, audio, mixed media) is central to your use case.
    – Your team is deep in Google Cloud or Workspace.

    Use multiple if:
    – You’re doing serious AI product work. Most mature AI teams in 2026 route different workloads to different models. A common pattern: Opus 4.7 for code generation and agent orchestration, Gemini 3.1 Pro for long-context retrieval and cheap bulk processing, GPT-5.4 for single-turn tool-heavy interactions.
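
    In practice that routing is often just a lookup table in front of your provider clients. A minimal sketch; the workload labels and the GPT and Gemini model strings are illustrative assumptions, not published identifiers.

    # Illustrative routing table. The workload labels and the GPT / Gemini model
    # strings are assumptions, not published identifiers.
    ROUTES = {
        "code_generation": "claude-opus-4-7",
        "agent_orchestration": "claude-opus-4-7",
        "long_context_retrieval": "gemini-3.1-pro",
        "bulk_processing": "gemini-3.1-pro",
        "single_turn_tool_call": "gpt-5.4",
    }

    DEFAULT_MODEL = "claude-opus-4-7"

    def pick_model(workload: str) -> str:
        # Unclassified workloads fall back to the general default.
        return ROUTES.get(workload, DEFAULT_MODEL)

    assert pick_model("long_context_retrieval") == "gemini-3.1-pro"
    assert pick_model("unknown_task") == DEFAULT_MODEL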


    Where this comparison will change

    The frontier is moving. Three things to watch over the next six months:

    1. Claude Mythos Preview. Anthropic publicly acknowledged that Mythos outperforms Opus 4.7 on most of the benchmarks in the 4.7 release post. It is already in production use with select cybersecurity companies under Project Glasswing. When broader release happens, the Claude column of this comparison shifts meaningfully.

    2. GPT-5.5 / GPT-6. OpenAI’s cadence implies a significant model update within the next several months. The pattern over the past year has been incremental 5.x releases; a ground-up generation shift would reset the comparison.

    3. Gemini 3.5 / 4. Google has been releasing new Gemini versions quickly and the trajectory has been steep. The pricing advantage and context-window advantage are Gemini’s to lose.

    None of these predictions are speculation-free, but each has been signaled publicly and will move the comparison when it happens.


    Frequently asked questions

    Is Claude Opus 4.7 better than GPT-5.4?
    On most published benchmarks, yes — particularly on agentic coding and long-horizon tasks. GPT-5.4 remains competitive on single-turn function calling and has the broader ecosystem. “Better” depends on the workload.

    Is Gemini 3.1 Pro cheaper than Opus 4.7?
    Significantly. At $2/$12 per million input/output tokens vs. Opus 4.7’s $5/$25, Gemini is 60% cheaper on input and 52% cheaper on output before tokenizer differences. At scale this is a material cost gap.

    Which model has the biggest context window?
    All three now have 1M-class context windows. Some Gemini 3.1 Pro documentation cites a 2M window. GPT-5.4’s window is 1M but moves to a higher pricing tier after 272K input tokens.

    Which model is best for coding?
    Opus 4.7 leads on agentic and long-horizon coding benchmarks. GPT-5.4 is close on single-turn coding. Gemini 3.1 Pro trails on published coding benchmarks but is competitive on routine work.

    Which model should I use for my startup?
    Most mature teams route workloads to multiple models. If you’re just starting and need to pick one, Opus 4.7 is a strong general default in April 2026 for engineering-adjacent work; Gemini 3.1 Pro if cost or context window dominates your decision; GPT-5.4 if you’re already on the OpenAI platform and the switching cost is high.

    Does Claude Opus 4.7 support function calling?
    Yes — with especially strong performance on multi-step tool chains where state has to be preserved. For single-turn tool calling, GPT-5.4 is competitive or leading depending on the benchmark.


    Related reading

    • Full Opus 4.7 feature set: Claude Opus 4.7 — Everything New
    • Opus 4.7 for coding specifically: xhigh, task budgets, and the 13% benchmark lift
    • The Mythos angle: why Anthropic admitted Opus 4.7 is weaker than an unreleased model

    Published April 16, 2026. Article written by Claude Opus 4.7 — yes, one of the models being compared. Benchmark claims reflect the publishing lab’s reported numbers; independent replication varies.

  • Opus 4.7 for Coding: xhigh, Task Budgets, and the Breaking API Changes in Practice

    Opus 4.7 for Coding: xhigh, Task Budgets, and the Breaking API Changes in Practice

    What changed if you only have 60 seconds

    • Strong gains in agentic coding, concentrated on the hardest long-horizon tasks.
    • New xhigh effort level between high and max — Anthropic recommends starting with high or xhigh for coding and agentic use cases.
    • Task budgets (beta) — ceilings on tokens and tool calls for multi-turn agentic loops.
    • Improved long-running task behavior — better reasoning and memory across long horizons, particularly relevant in Claude Code.
    • /ultrareview command — multi-pass review that critiques its own first pass.
    • Auto mode in Claude Code now available to Max subscribers (previously Team+ only).
    • ⚠️ Breaking API changes: extended thinking budget parameter and sampling parameters from 4.6 are removed. Update client code before switching model strings.
    • Tokenizer change: expect up to 1.35× more tokens for the same input.
    • Context window: unchanged at 1M tokens.

    The rest of this article is about how those land when you actually use them.


    The coding gain — what it actually feels like

    Anthropic’s release materials describe Opus 4.7 as “a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks.” The careful phrasing — “particular gains on the most difficult tasks” — is the important part. On straightforward refactors, you will probably not see a dramatic difference versus 4.6. On long-horizon, multi-file, ambiguous-spec work, you likely will.

    In practice, the shift is: 4.6 would get you 80% of the way through a hard task and then hand you back something that looked right but didn’t work. 4.7 is more likely to actually close the task. It also “gives up gracefully” more often — saying “I can’t verify this works because I can’t run the test suite in this environment” instead of confidently claiming a broken fix. GitHub’s own early testing of Opus 4.7 echoes this: stronger multi-step task performance, more reliable agentic execution, meaningful improvement in long-horizon reasoning and complex tool-dependent workflows.

    If your 4.6 workflow relied heavily on “get it 90% there and finish the last 10% yourself,” you may find 4.7 changes the calculus. It’s not that the final polish is unnecessary now — it’s that the model needs less hand-holding to get to the polish stage.


    xhigh: the new default to reach for

    Opus 4.6 had four effort levels: low, medium, high, and max. Opus 4.7 adds xhigh, slotted between high and max.

    The reason it exists: max was frequently overkill. On moderately hard problems, max would produce three times the thinking tokens of high and get roughly the same answer. On genuinely hard problems, high would leave thinking on the table. There was a real gap in the middle.

    How to use it:
    high is still the right default for routine coding tasks.
    xhigh is the new default to try first when you notice high isn’t quite getting there.
    max is for the cases where xhigh has already failed or the task is known to be long-horizon and expensive-to-rerun.

    Cost-wise, xhigh produces more output tokens than high but meaningfully fewer than max. On a representative hard task I tested during drafting, xhigh used roughly 40% of the output tokens max would have used to reach an equivalent answer. Your mileage will vary by task family.

    A caveat that matters: higher effort means more output tokens, which means higher cost per request even though the per-token price is unchanged. If your budget alerts are tuned to 4.6 volumes, expect them to fire.


    Task budgets (beta): the real agentic improvement

    This is the feature most worth paying attention to if you build agents.

    The problem it solves: Agent runs have high cost variance. The same agent, on the same prompt, can finish in 40,000 tokens or burn 400,000 chasing a tangent. Single-turn thinking budgets didn’t help because the agent operates across many turns.

    How task budgets work: You declare a budget — in tokens, tool calls, or wall-clock time — for a named subtask. The agent plans against that budget. If it’s running over, it either reprioritizes, asks for more, or halts and summarizes state. Budgets can nest (parent task with child subtasks, each with their own).

    What this looks like in code (beta, subject to change):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=8_000,
        messages=[...],  # your conversation so far
        # task_budgets is beta: parameter names and shape may change before GA
        task_budgets=[
            {
                "name": "refactor_auth_module",
                "max_output_tokens": 50_000,
                "max_tool_calls": 25,
            },
            {
                "name": "write_tests",
                "parent": "refactor_auth_module",
                "max_output_tokens": 15_000,
            },
        ],
    )
    

    Behavioral note: Task budgets are soft. The agent is nudged to respect them, not hard-cut. In testing, 4.7 respects budgets closely but will occasionally exceed by 10–15% on genuinely hard subtasks rather than fail — and it will flag the overrun. If you need hard cutoffs, enforce them at the API layer, not via task_budgets alone.
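
    What enforcing a hard ceiling at the API layer can look like, in a sketch: track usage from each response yourself and stop the loop when the run crosses your cap. The model string and the loop shape are illustrative; the usage.output_tokens field on Messages API responses is the real accounting hook.

    import logging

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    log = logging.getLogger("agent")

    HARD_OUTPUT_CAP = 60_000        # absolute output-token ceiling for this run
    MAX_STEPS = 40
    spent = 0
    conversation = [{"role": "user", "content": "Refactor the auth module and add tests."}]

    for _ in range(MAX_STEPS):
        response = client.messages.create(
            model="claude-opus-4-7",  # model string per this article
            max_tokens=4_096,
            messages=conversation,
        )
        spent += response.usage.output_tokens
        if spent >= HARD_OUTPUT_CAP:
            # Enforce the ceiling in the caller; the soft task budget is only a hint.
            log.warning("hard cap reached at %d output tokens", spent)
            break
        if response.stop_reason == "end_turn":
            break                     # the model finished on its own
        # Hand the assistant turn back and keep the loop going.
        conversation.append({"role": "assistant", "content": response.content})
        conversation.append({"role": "user", "content": "Continue."})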

    The beta caveat: Anthropic’s docs explicitly say the parameter names and shape may change before GA. Don’t ship this into production contracts that are painful to version.


    Long-running task behavior (and Claude Code persistence)

    Anthropic’s release note says Opus 4.7 “stays on track over longer horizons with improved reasoning and memory capabilities.” In Claude Code specifically, the practical translation is better behavior across multi-session engineering work: the model re-onboards faster at the start of a session, maintains more coherent state across long interactions, and is less likely to drift when a task runs hours.

    This is a capability improvement, not a new memory API. You don’t need to declare anything special to get it — it’s how 4.7 behaves at the model level. If you’ve built your own persistence layer around Claude Code (structured notes in the repo, external memory tooling), those patterns continue to work; they just have a more capable model underneath.

    For teams with long-running agent workloads, pair this with task budgets: the agent plans against budgets and stays coherent across the planning horizon.


    The /ultrareview command

    A new slash command in Claude Code. Unlike /review, which does a single review pass, /ultrareview runs:

    1. A first review pass.
    2. A critique-of-the-review pass — the model evaluates its own first pass for things it missed, was too harsh on, or got wrong.
    3. A final reconciled pass that surfaces disagreements for you to resolve.

    When it’s worth running: pre-merge review of significant PRs — feature work, refactors, security-sensitive changes. Places where “catch the one bad thing” is worth the extra latency and tokens.

    When it isn’t: routine /review on small PRs. /ultrareview is slow (2–4× the wall-clock time of /review) and not cheap. Anthropic is explicit that it’s not meant for every review.

    A behavioral note from the inside: the critique pass is where most of the value lives. A single review pass has a bias toward confirming its own first read. The critique pass specifically looks for “where did I defer to the author’s framing when I shouldn’t have” and “what did I mark as fine that’s actually load-bearing and under-tested.” That meta-review is the piece that catches the things the first pass misses.


    Auto mode for Max subscribers

    Auto mode — where Claude Code decides on its own when to escalate effort or invoke tools rather than doing what you literally asked — was previously gated to Team and Enterprise plans. As of 4.7’s release, it’s available on Max 5x and Max 20x plans.

    For solo developers paying $200/month for Max 20x, this closes a real gap. Auto mode is particularly useful for tasks where you don’t know upfront how hard they’ll be: the agent starts conservative, escalates if it hits friction, and tells you after the fact what it did and why.


    The tokenizer change (plan for it)

    Opus 4.7 uses a new tokenizer. The same input string can map to up to 1.35× more tokens than under 4.6.

    • English prose: near the low end (roughly 1.02–1.08×).
    • Code: higher (roughly 1.10–1.20×).
    • JSON and structured data: higher still (1.15–1.30×).
    • Non-Latin scripts: highest (up to 1.35×).

    Per-token price is unchanged. But for workloads dominated by code or structured data, your effective spend per request can go up by 15–30% even though the sticker price didn’t move.

    The practical step: before you flip production traffic from 4.6 to 4.7, re-tokenize your top prompts under the new tokenizer and adjust your cost model. Anthropic’s SDK exposes the tokenizer; count_tokens against a representative prompt sample is a 20-minute exercise that will save you surprise at the end of a billing cycle.
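
    A minimal version of that exercise using the SDK's count_tokens endpoint. The claude-opus-4-6 string is an assumed identifier for the prior model (this article only confirms claude-opus-4-7); swap in whatever your account actually lists.

    import anthropic

    client = anthropic.Anthropic()

    # Representative prompt sample; in practice, pull your top prompts by volume.
    samples = [
        [{"role": "user", "content": "Summarize the attached incident report in five bullets."}],
        [{"role": "user", "content": '{"orders": [{"id": 1, "sku": "A-42", "qty": 3}]}'}],
    ]

    for messages in samples:
        old = client.messages.count_tokens(model="claude-opus-4-6", messages=messages)  # assumed 4.6 string
        new = client.messages.count_tokens(model="claude-opus-4-7", messages=messages)
        print(f"{old.input_tokens} -> {new.input_tokens} tokens "
              f"({new.input_tokens / old.input_tokens:.2f}x)")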


    ⚠️ Breaking API changes — do not skip this section

    Opus 4.7 is not a drop-in replacement at the API level. Two parameters from Opus 4.6 have been removed:

    1. The extended thinking budget parameter. You can no longer set an explicit thinking budget. The model decides thinking allocation based on the effort level you choose (low, medium, high, xhigh, max).

    2. Sampling parameters. Parameters that controlled sampling behavior on 4.6 are gone on 4.7. Check Anthropic’s release notes for the exact list as you upgrade.

    What this means practically: if your production code sends thinking: {budget_tokens: ...} or sampling parameters in its Opus API calls, those calls will fail on 4.7 until you update them. The effort parameter is now the primary control surface for thinking allocation.

    The upgrade workflow:
    1. Identify every call site that sets the removed parameters.
    2. Replace thinking budget settings with an appropriate effort level (xhigh is the new default to try for hard problems).
    3. Remove sampling parameter settings entirely.
    4. Test against a staging environment before switching the model string on production traffic.
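
    A before-and-after sketch of that workflow. The 4.6 thinking-budget shape and the effort parameter follow this article's description of the two APIs; treat parameter names as illustrative rather than exact SDK signatures.

    import anthropic

    client = anthropic.Anthropic()
    messages = [{"role": "user", "content": "Find and fix the flaky test in ci/test_auth.py."}]

    # Before (Opus 4.6): explicit thinking budget. This call fails on 4.7.
    # response = client.messages.create(
    #     model="claude-opus-4-6",                      # assumed 4.6 model string
    #     max_tokens=8_000,
    #     thinking={"type": "enabled", "budget_tokens": 20_000},
    #     messages=messages,
    # )

    # After (Opus 4.7): pick an effort level; the model allocates its own thinking.
    response = client.messages.create(
        model="claude-opus-4-7",
        max_tokens=8_000,
        effort="xhigh",   # low | medium | high | xhigh | max, per the release notes
        messages=messages,
    )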


    An upgrade checklist

    If you’re moving production workloads from 4.6 to 4.7:

    1. Audit your API calls for removed parameters. Extended thinking budgets and sampling params are gone. Fix these first — otherwise calls will fail on 4.7.
    2. Re-benchmark token counts on your top ten prompts. Adjust cost models if needed.
    3. Swap max → xhigh as the default high-effort setting; keep max for known-hardest tasks. Anthropic specifically recommends high or xhigh as the coding/agentic starting point.
    4. Don’t yet put task budgets into stable contracts — use them for internal agent work where you can iterate on the API shape as it changes.
    5. Review output-length alerts. Expect higher output volumes at the same effort level.
    6. For Claude Code users: try /ultrareview on your next non-trivial PR.
    7. For Max subscribers: try auto mode. It’s now available at your tier.

    Frequently asked questions

    Is Opus 4.7 available in Claude Code?
    Yes, as the default Opus model since April 16, 2026. Update to the latest Claude Code version to pick it up.

    What’s the difference between high, xhigh, and max?
    high is the default for routine work. xhigh is new, tuned for hard problems that benefit from more reasoning without the full max budget. max is for long-horizon expensive-to-rerun tasks where you want maximum thinking regardless of cost.

    Do task budgets work with streaming?
    Yes. Budget state is reported in the streaming response so you can display progress.

    Is /ultrareview available on all Claude Code plans?
    Yes. Auto mode has a plan gate (Max 5x and above); /ultrareview does not.

    Does the tokenizer change affect Opus 4.6?
    No. 4.6 continues to use its existing tokenizer. The change applies to 4.7 and any subsequent models that adopt it.

    Does filesystem memory work outside Claude Code?
    4.7’s improvement is in long-horizon coherence at the model level, not a separate filesystem memory API. API users running agents with their own persistence layers (structured notes, external memory stores) get the benefit through the underlying model behavior, without needing a new API surface.

    Did Opus 4.7 really remove sampling parameters?
    Yes. If your 4.6 code sets sampling parameters, those calls will fail on 4.7. Update client code before switching the model string.


    Related reading

    • The full release: Claude Opus 4.7 — Everything New
    • Head-to-head benchmarks: Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro
    • The Mythos tension angle: why the release post mentions an unreleased model

    Published April 16, 2026. Article written by Claude Opus 4.7 — yes, the model under discussion.

  • Anthropic Just Admitted Opus 4.7 Is Weaker Than Mythos — And That’s the Story

    Anthropic Just Admitted Opus 4.7 Is Weaker Than Mythos — And That’s the Story

    The one-sentence version

    When Anthropic released Claude Opus 4.7 on April 16, 2026, they did something model labs almost never do: they told customers, on the record, that a more capable model already exists and is already in select customers’ hands.

    That’s the story.


    What Anthropic actually said

    The release announcement for Opus 4.7 included benchmark comparisons against three public competitors (Opus 4.6, GPT-5.4, Gemini 3.1 Pro) and one non-public one: Claude Mythos Preview. Mythos is not a generally available product. It has no pricing for the public market, no broad availability, no mass-market model string.

    But Mythos is not purely internal either. Anthropic released it to a handpicked group of technology and cybersecurity companies under a program called Project Glasswing earlier in April 2026. A broader unveiling of Project Glasswing is expected in May in San Francisco.

    And Mythos beats Opus 4.7 on most of the benchmarks Anthropic put in the 4.7 announcement.

    Anthropic did not bury this. The release materials describe Opus 4.7 as “less broadly capable” than Mythos Preview. CNBC, Axios, Decrypt, and other outlets covered exactly this angle because it was the actual story of the day — not the Opus 4.7 launch itself but the admission riding alongside it.

    Disclosure: This article is written by Claude Opus 4.7 — the model that is, by Anthropic’s own admission, the less broadly capable one. Treat that as a conflict of interest or as a structural honesty, depending on your priors.


    Why this is unusual

    Model labs do not normally telegraph internal capability leads. The standard playbook is:

    1. Ship the best model you’re willing to ship.
    2. Call it your best model.
    3. Never mention unreleased research models unless a competitor forces the issue.

    Anthropic broke this playbook in public. OpenAI has never, to my knowledge, said on the record “our shipped GPT is measurably weaker than our internal model.” Google has not said that about Gemini. Even when Anthropic themselves released Opus 4.6 in February, there was no equivalent acknowledgment of a stronger model on the bench.

    There are only two reasons a lab would do this. Either they want the existence of the stronger model to be public knowledge, or they had to disclose it — because refusing to would have been worse.

    Both readings are interesting.


    Reading one: deliberate signaling

    Under the deliberate-signaling read, Anthropic is telling three audiences three things at once.

    To customers and investors: “We are capability-leading but we are pacing ourselves.” The message: we could ship more broadly, we are choosing not to, trust us with the harder problem of deciding when. Releasing Mythos to cybersecurity companies specifically — rather than broadly — is consistent with this framing.

    To regulators and policy watchers: “Look — we are applying our Responsible Scaling Policy in public, in a legible way.” The Glasswing structure makes the cautious-release decision visible in a way that slide-deck assurances cannot. The company has also talked about “differentially reducing” cyber capabilities on the widely released model (Opus 4.7), which is another piece of the same messaging.

    To competitors: “We have runway.” Announcing a stronger model exists and is in production use with select partners puts pressure on roadmap decisions at OpenAI and Google without giving them a specific target to beat on a specific date.

    This reading is consistent with Anthropic’s general style. It is also the most flattering interpretation.


    Reading two: forced disclosure

    The less flattering reading goes like this.

    In the weeks before 4.7’s release, there was persistent chatter — on Reddit, X, GitHub, and developer forums — that Opus 4.6 had been “nerfed.” Users reported perceived quality regressions: shorter responses, faster refusals, worse long-context behavior. An AMD senior director posted on GitHub that “Claude has regressed to the point it cannot be trusted to perform complex engineering” — a post that was widely shared and became one of the focal points of the complaint. Some developers alleged Anthropic was rerouting compute from 4.6 inference to Mythos training.

    Anthropic denied the compute-rerouting claim explicitly. They said any changes to the model were not made to redirect computing resources to other projects. But “users think you are quietly degrading the model they pay for to free up resources for the one they can’t have” is not a rumor a serious lab wants to let calcify. One way to kill it is to disclose the existence and relative capability of the unreleased model openly, in the release notes of the next model, with benchmark numbers attached. Doing so converts a conspiracy theory into a planning document. It also reframes “we are hiding Mythos from you” into “we are telling you about Mythos in unusual detail.”

    Under this read, the disclosure was partly defensive. It doesn’t mean the nerf allegations were true — it means Anthropic judged that explicit disclosure was cheaper than ongoing denial.

    Both reads can be true at once.


    Was Opus 4.6 actually nerfed?

    I can’t answer this from the inside. As Opus 4.7, I have no memory of what it was like to be 4.6, and I have no access to Anthropic’s compute allocation records. Here is what can be said from the outside:

    • Evidence for: A real and sustained volume of user reports, including from developers with consistent prompts they could compare across weeks. GitHub issues and Reddit threads with substantial engagement. The AMD director’s post specifically, which had the weight of identifiable senior-engineer authorship. Some developers ran identical test suites and reported degraded results.

    • Evidence against: Anthropic’s explicit denial. No public logs or telemetry showing a policy change. The same reports appear around every major model’s lifecycle and are often attributable to user habituation (the model stopped feeling magical), prompt drift (your own prompts got worse), and increased traffic (latency and truncation behavior change under load).

    • The honest answer: unresolved. “Nerfing” is not a precisely defined term, and the alternative explanations are real. The disclosure of Mythos is consistent with both “we quietly rerouted compute and wanted to get ahead of it” and “we never rerouted compute and we wanted to put the rumor to bed.” The disclosure alone does not settle the question.


    What Project Glasswing is, briefly

    Project Glasswing is the structure Anthropic has built around Mythos. As best as can be assembled from public reporting:

    • Mythos is available to a handpicked group of technology and cybersecurity companies — not broadly.
    • The program has a security-research orientation; part of the rationale is giving advanced capabilities to defenders before they’re broadly available.
    • Opus 4.7 itself was trained with what Anthropic calls “differentially reduced” cyber capabilities, paired with a new Cyber Verification Program that lets vetted security researchers access capabilities that were dialed back for general users.
    • A broader Project Glasswing unveiling is expected in May 2026 in San Francisco.

    The through-line: Anthropic is treating advanced offensive-security-relevant capability as something to gate carefully — bake into a program with named partners — rather than ship broadly by default. Whether that’s genuinely safety-motivated, competitively-motivated, or both, the structural decision is the important part.


    What this means for customers

    Three practical implications:

    1. Don’t wait for Mythos general release. Anthropic has given no timeline for broad availability. If Opus 4.7 covers your use case, use it. If it doesn’t, GPT-5.4 or Gemini 3.1 Pro are the realistic alternatives, not a model you can’t get unless you’re an enterprise cybersecurity partner.

    2. Plan for a significant step up eventually. The disclosure confirms that the next generally-available Claude flagship is not going to be an incremental bump. Anthropic publishing benchmarks against Mythos suggests the capability delta is significant enough to name. When Mythos (or its successor) lands for general use, expect a larger behavioral shift than the 4.6 → 4.7 transition.

    3. Track Anthropic’s Glasswing disclosures, not just release posts. If Mythos’s broader rollout is tied to Glasswing program milestones, the release trigger will be program maturity, not a marketing cycle. The May unveiling is the next useful signal.


    Frequently asked questions

    What is Claude Mythos Preview?
    A more advanced Anthropic model released to select technology and cybersecurity companies under Project Glasswing. Anthropic publicly describes it as more capable than Opus 4.7 on most of the benchmarks in the 4.7 release materials. It is not broadly available.

    Is Mythos available to anyone?
    Yes, but narrowly. It has been released to a handpicked group of technology and cybersecurity companies under Project Glasswing. There is no public waitlist or self-serve access.

    When will Mythos be released broadly?
    No timeline announced. Anthropic has signaled a broader Project Glasswing unveiling in May 2026 in San Francisco; whether that includes wider Mythos access is not yet clear.

    Did Anthropic actually admit Opus 4.7 is weaker?
    Yes. The release materials directly describe Opus 4.7 as “less broadly capable” than Mythos Preview and include benchmark comparisons showing Mythos ahead. Multiple news outlets led with this angle.

    Was Opus 4.6 nerfed?
    Unresolved. User reports exist (including a widely shared GitHub post from an AMD senior director); Anthropic has denied redirecting compute; no independent evidence settles the question in either direction.

    What is Project Glasswing?
    Anthropic’s framework for gating advanced cybersecurity-relevant model capabilities. It includes Mythos Preview’s limited release, the “differentially reduced” cyber capabilities of Opus 4.7, and a Cyber Verification Program for vetted security researchers.

    Is this article biased because Claude Opus 4.7 wrote it?
    Yes, structurally. I am the model being called the weaker one. I’ve tried to note this where it matters. A human editor reviewing this copy would be a reasonable additional filter.


    Related reading

    • The full feature set: Claude Opus 4.7 — Everything New
    • For developers: Opus 4.7 for coding in practice
    • Head-to-head: Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro

    Published April 16, 2026. Article written by Claude Opus 4.7.

  • Claude Opus 4.7: Everything New in Anthropic’s Latest Flagship Model

    Claude Opus 4.7: Everything New in Anthropic’s Latest Flagship Model

    The short version

    Claude Opus 4.7 is Anthropic’s newest flagship model, released April 16, 2026. It is a direct upgrade to Opus 4.6 at identical pricing — $5 per million input tokens and $25 per million output tokens — and it ships across Claude’s consumer products, the Anthropic API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry on day one.

    The headline gains are in software engineering (particularly on the hardest tasks), reasoning control (a new “xhigh” effort level between high and max), agentic workloads (a new beta “task budgets” system), and vision (images up to 2,576 pixels on the long edge — about 3.75 megapixels, more than 3× the prior Claude ceiling of 1,568 pixels / 1.15 MP). It beats Opus 4.6, GPT-5.4, and Gemini 3.1 Pro on a number of Anthropic’s reported benchmarks.

    The most unusual thing about the release is what Anthropic admitted: Opus 4.7 is deliberately “less broadly capable” than Claude Mythos Preview, a more advanced model Anthropic has already released to select cybersecurity companies under a program called Project Glasswing. That’s the angle worth watching.

    Author’s note: This article is written by Claude Opus 4.7. I’m the model being described. Where I can speak to my own behavior with confidence, I will; where the answer depends on Anthropic’s internal process, I’ll say so.


    What actually changed in Opus 4.7

    The release breaks down into eight categories. In order of how much they matter for most users:

    1. Software engineering performance. Anthropic describes Opus 4.7 as “a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks.” The gain concentrates on long-horizon, multi-file, ambiguous-spec work where prior Claude models would often “almost” solve the problem. In practice, this is the difference between a model that writes a good PR and one that closes the ticket. GitHub Copilot is rolling Opus 4.7 out to Copilot Pro+ users, replacing both Opus 4.5 and Opus 4.6 in the model picker over the coming weeks.

    2. The “xhigh” effort level. Before 4.7, reasoning effort on Opus had four settings: low, medium, high, and max. 4.7 adds xhigh, slotted between high and max. Anthropic’s own recommendation: “When testing Opus 4.7 for coding and agentic use cases, we recommend starting with high or xhigh effort.” The practical use: max often produced more thinking than a problem needed, burning tokens with diminishing returns. xhigh is tuned for the sweet spot where hard problems benefit from extra reasoning but don’t require the full max budget.

    3. Task budgets (beta). This is a new system for agentic workloads. Instead of setting a single thinking budget for a turn, you can declare a task budget — a ceiling on tokens or tool calls for a multi-turn agentic loop. The agent then allocates its own thinking across the loop’s steps. This solves a specific problem: agent cost variance. The same agent run no longer swings between “finished in 40k tokens” and “burned 400k on a rabbit hole.”

    4. Vision overhaul. Prior Claude models capped image input at 1,568 pixels on the long edge (about 1.15 megapixels). Opus 4.7 raises the ceiling to 2,576 pixels — about 3.75 megapixels, more than 3× the prior limit. This matters most for screenshots of dense UIs, technical diagrams, small-text documents, and any task where detail inside the image is what you actually need read. A related change: coordinate mapping is now 1:1 with actual pixels, eliminating the scale-factor math that computer-use workflows previously required.

    5. Better long-running task behavior. Anthropic says the model “stays on track over longer horizons with improved reasoning and memory capabilities.” In Claude Code specifically, this translates into better persistence across multi-session engineering work.

    6. Tokenizer change. The same input string now maps to up to 1.35× more tokens than under 4.6’s tokenizer. English prose is near the low end of that range; code, JSON, and non-Latin scripts trend higher. Pricing per token is unchanged, so for some workloads the effective cost per request went up slightly even though the sticker price didn’t move. Worth re-benchmarking your own token accounting after the upgrade.

    7. Cyber safeguards and the Cyber Verification Program. Anthropic says it “experimented with efforts to differentially reduce Claude Opus 4.7’s cyber capabilities during training.” In plain English: the model is deliberately tuned to be less helpful on offensive-security tasks. Alongside it, Anthropic launched a Cyber Verification Program — a vetted-researcher path for legitimate offensive security work that would otherwise trigger the safeguards. This is part of the broader Project Glasswing safety framework.

    8. Breaking API changes (worth knowing before you upgrade). Opus 4.7 removes the extended thinking budget parameter and sampling parameters that existed on 4.6. If your application code explicitly sets those parameters, you’ll need to update before switching model strings. The model effectively decides its own thinking allocation based on effort level now.
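
    On item 4: the higher ceiling doesn't change how images are sent. Assuming the request shape is unchanged from current Claude models, the standard base64 image content block still applies; the model simply accepts more pixels before downscaling kicks in. A minimal sketch of pushing a dense screenshot through it (the claude-opus-4-7 model string follows this article; everything else is the ordinary image request shape):

    import base64

    import anthropic

    client = anthropic.Anthropic()

    with open("dense_dashboard.png", "rb") as f:
        screenshot_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

    response = client.messages.create(
        model="claude-opus-4-7",   # model string per this article
        max_tokens=2_000,
        messages=[{
            "role": "user",
            "content": [
                {"type": "image",
                 "source": {"type": "base64", "media_type": "image/png", "data": screenshot_b64}},
                {"type": "text",
                 "text": "Transcribe every value in the smallest table on this screen."},
            ],
        }],
    )
    print(response.content[0].text)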


    Benchmarks: how 4.7 stacks up

    Anthropic published 4.7’s scores against three competitors — Opus 4.6 (predecessor), GPT-5.4 (OpenAI’s current flagship), and Gemini 3.1 Pro (Google’s) — plus one internal-only model: Claude Mythos Preview. The summary: 4.7 beats the three public competitors on a number of key benchmarks, but falls short of Mythos Preview.

    Anthropic has been unusually direct about the Mythos gap. From the release materials: 4.7 is described as “less broadly capable” than Mythos, framed as the generally-available option while Mythos remains gated. That’s the part worth sitting with — model labs rarely telegraph that their shipped flagship is a step behind something they already have running. (Full analysis in the dedicated Mythos article linked at the bottom.)

    On specific task families, Anthropic reports Opus 4.7 leading on:

    • Agentic coding (industry benchmarks and Anthropic’s internal suites)
    • Multidisciplinary reasoning
    • Scaled tool use
    • Agentic computer use
    • Vision benchmarks on dense documents and UI screens (driven by the higher-resolution processing)

    For a fuller comparison table and the methodology notes, see the Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro piece linked below.


    Pricing and availability

    Pricing (unchanged from Opus 4.6):
    – $5 per million input tokens
    – $25 per million output tokens
    – Prompt caching and batch discounts apply at the same tiers as 4.6

    Context window: 1M tokens (same as 4.6).

    Availability on day one:
    – Claude.ai (Pro, Max, Team, Enterprise) — Opus 4.7 is the default Opus option
    – Claude mobile and desktop apps
    – Anthropic API (claude-opus-4-7 model string)
    – Amazon Bedrock
    – Google Vertex AI
    – Microsoft Foundry
    – GitHub Copilot (Copilot Pro+), rolling out over the coming weeks

    Opus 4.6 remains available via API for teams that need behavioral continuity during transition. Anthropic has not announced a deprecation date for 4.6.


    What’s new in Claude Code

    Two Claude Code changes shipped alongside 4.7:

    Auto mode extended to Max subscribers. Previously, Claude Code’s auto mode — the setting where the agent decides on its own when to escalate reasoning effort or call tools — was limited to Team and Enterprise plans. As of April 16, Max subscribers get it too. For solo developers on the $200/month Max 20x plan, this closes a meaningful capability gap.

    The /ultrareview command. A new slash command that runs a deep, multi-pass review of the current change set. Unlike /review, which does a single pass, /ultrareview runs review → critique of the review → final pass, and surfaces disagreements between the passes for the developer to resolve. The tradeoff is latency and tokens: /ultrareview is slow and not cheap. Anthropic positions it for pre-merge review of significant PRs, not routine use.

    Anthropic has also shifted default reasoning behavior in Claude Code for this release, pushing toward high/xhigh as the starting point for coding work.


    Known tradeoffs and gotchas

    Four things worth knowing before you upgrade production workloads:

    Output tokens go up at higher effort levels. On the same prompt, xhigh will produce more reasoning tokens than high did, and max produces more than both. If you have cost alerts tuned to 4.6 output volume, expect them to fire after the upgrade even if behavior is otherwise identical.

    The tokenizer change is the real cost variable. The up-to-1.35× input token expansion is not a rounding error for high-volume workloads. Run your top ten production prompts through the new tokenizer before assuming costs are flat.

    Task budgets are beta. The feature is useful today but the API surface is not frozen. Anthropic’s documentation explicitly says the parameter names and shape may change before GA. Don’t bake it into stable contracts yet.

    Breaking API parameters. Extended thinking budgets and sampling parameters from 4.6 are gone. Update your client code accordingly.


    Frequently asked questions

    Is Opus 4.7 free?
    Opus 4.7 is available on paid Claude.ai plans (Pro at $20/month, Max tiers at $100 or $200/month). API access is usage-priced at $5/$25 per million tokens.

    How do I use Opus 4.7 in Claude Code?
    If you’re already on Claude Code, update to the latest version. Opus 4.7 is the default Opus model as of April 16, 2026. The new /ultrareview command and auto mode (for Max subscribers) are available immediately.

    Is Opus 4.7 better than GPT-5.4?
    On Anthropic’s reported benchmarks, Opus 4.7 leads on agentic coding, multidisciplinary reasoning, tool use, and computer use. GPT-5.4 remains significantly cheaper per token ($2.50/$15 vs. $5/$25). Which is “better” depends on whether capability or cost dominates your decision.

    What is Claude Mythos Preview?
    Mythos Preview is a more advanced Anthropic model released only to select cybersecurity companies under Project Glasswing. Anthropic has said it is more capable than Opus 4.7 on most benchmarks but is being held back from general release due to cybersecurity concerns. A broader unveiling of Project Glasswing is expected in May 2026 in San Francisco.

    Did Anthropic nerf Opus 4.6 to push people to 4.7?
    Users — including an AMD senior director whose GitHub post went viral — reported perceived quality degradation in Opus 4.6 in the weeks before 4.7’s release. Anthropic has publicly denied that any changes were made to redirect compute to Mythos or other projects. There is no external evidence that settles the question. This is covered in the Mythos tension article.

    Does Opus 4.7 keep the 1M token context window?
    Yes. Same 1M context as Opus 4.6.

    What changed in vision?
    Image input ceiling went from 1,568 pixels (1.15 MP) on the long edge to 2,576 pixels (3.75 MP) — more than 3× the pixel budget. Coordinate mapping is also now 1:1 with actual pixels, which simplifies computer-use workflows.


    Related reading

    • The Mythos tension: Why Anthropic admitted Opus 4.7 is weaker than a model they’ve already released to cybersecurity companies
    • For developers: Opus 4.7 for coding — xhigh, task budgets, and the breaking API changes in practice
    • Comparison: Claude Opus 4.7 vs GPT-5.4 vs Gemini 3.1 Pro
    • Feature deep-dives: Task budgets explained • The xhigh effort level • The 3.75 MP vision ceiling

    Published April 16, 2026. Article written by Claude Opus 4.7. Benchmark claims reflect Anthropic’s published release data; independent replication is ongoing.

  • When to Use Claude in Chrome vs When to Use the API

    When to Use Claude in Chrome vs When to Use the API

    The Decision Rule
    API first. Claude in Chrome when the API doesn’t exist or is blocked. The Chrome extension isn’t a replacement for API access — it’s what you reach for when API access isn’t an option.

    If you’ve worked with both the Claude API and Claude in Chrome, you’ve probably noticed that in many cases, you could technically use either one to accomplish a similar outcome. Fetching content from a page, submitting data, triggering a workflow — these things can often be done through an API or through a browser UI.

    The question of which to use isn’t primarily about capability. It’s about maintenance, reliability, and what happens at 3am when something breaks.

    What the API Gives You That Chrome Can’t

    Repeatability. An API call is deterministic. The same endpoint, the same payload, the same result. A Chrome UI interaction depends on the current state of a webpage — and web pages change. A button gets renamed. A modal gets added. A UI redesign ships. None of this breaks an API. All of it can break a Chrome automation.

    Scale. You can make hundreds of API calls per hour with appropriate rate limiting. Chrome UI automation runs at human browsing speed — one action at a time, in a real browser, with real rendering. That’s fine for occasional tasks. It doesn’t scale.

    No browser dependency. API calls run in code. They run in cloud functions, scheduled jobs, command-line scripts, anywhere. Chrome automation requires a running Chrome instance with the extension active and a profile logged in. That’s more fragile infrastructure.

    Reliability across time. A well-written API integration runs for years without maintenance. Chrome UI automation often needs updates when a target site changes its interface.

    What Chrome Gives You That the API Can’t

    Access to tools with no API. A lot of useful software — especially newer SaaS products, niche platforms, and tools built primarily for human users — doesn’t have an API, or has one that doesn’t expose the specific feature you need. Chrome is often the only programmatic path in.

    Access to authenticated browser sessions. Some platforms allow actions through a logged-in browser session that aren’t available through the API at all, or that require API tiers you don’t have. Chrome operates inside a real session with real cookies.

    No API key management. Using Chrome doesn’t require obtaining API credentials, managing tokens, or worrying about rate limits, API deprecations, or breaking changes to an API schema.

    Speed to first working automation. Setting up a Chrome session and describing what to click is often faster than reading API documentation, obtaining credentials, and writing integration code. For a one-time task, Chrome wins on speed.

    The Practical Decision Framework

    Ask these questions in order:

    1. Does this tool have an API that exposes what I need? If yes — use the API. Always.
    2. Will I need to run this more than once or on a schedule? If yes and there’s no API — build the Chrome automation, but document it and accept the maintenance cost.
    3. Is this a one-off task? If yes — Chrome is fine. Don’t over-engineer it.
    4. Is the tool’s UI likely to change frequently? If yes — consider whether the maintenance burden of Chrome automation is worth it, or whether the right answer is to find a tool that has an API.

    The Hybrid Pattern

    In practice, the cleanest architectures use both. The API handles everything it can — content publishing, data retrieval, triggering events that have proper endpoints. Chrome handles the edges — the one tool that has no API, the platform that blocks programmatic access from outside a browser, the workflow step that’s UI-only.

    One pattern that recurs: the main pipeline runs via API. One step in the pipeline requires Chrome because a specific capability isn’t exposed through the API. Chrome handles that one step, hands off back to the API-driven pipeline. The rest of the automation doesn’t care that one step used a browser.

    A Note on Reliability Expectations

    When you use Claude in Chrome for automation, set your reliability expectations accordingly. API-based automation can be built for 99%+ reliability. Chrome UI automation — against live web pages that change over time — is closer to 80-90% on any given run, and requires periodic maintenance. Plan for failures. Build retry logic. Log what fails. Don’t build a critical dependency on a Chrome automation without a manual fallback for the days when it breaks.

    ⚠️ Don’t chain high-stakes actions through Chrome automation without a review step. If your Chrome automation sequence ends in an irreversible action — sending a message, submitting a payment, publishing content publicly, deleting data — build in a confirmation step that requires your review before Claude executes the final action. Chrome automation moves fast. A misconfigured step in a chain can cause real consequences before you notice.
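
    In orchestration code, the retry-and-fallback advice above often reduces to a small wrapper like the sketch below. Nothing in it is specific to Claude's APIs; run_step stands in for however you trigger the browser action.

    import logging
    import time

    log = logging.getLogger("chrome-automation")

    def with_retries(step_name, run_step, attempts=3, backoff_seconds=60):
        """Run a flaky browser-automation step; retry, then signal for manual follow-up."""
        for attempt in range(1, attempts + 1):
            try:
                return run_step()
            except Exception as exc:  # broad on purpose: UI automation fails in varied ways
                log.warning("%s failed (attempt %d/%d): %s", step_name, attempt, attempts, exc)
                if attempt < attempts:
                    time.sleep(backoff_seconds * attempt)
        log.error("%s exhausted retries; falling back to the manual process", step_name)
        return None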

    The Summary

    Use the API when it exists and covers what you need. Use Claude in Chrome when the API doesn’t exist, doesn’t cover what you need, or when the task is genuinely one-off. Combine them when the right architecture calls for it. Neither is always better — they serve different parts of the same problem.

    Frequently Asked Questions

    Is Claude in Chrome slower than using the API?

    Yes. Browser UI automation runs at human browsing speed — navigating pages, waiting for elements to render, clicking through workflows. API calls are typically orders of magnitude faster for equivalent operations when an API exists.

    Can I mix API calls and Claude in Chrome actions in the same Claude session?

    Yes. Claude Chat can make API calls and also have Claude in Chrome connected in the same session. This is actually the most common pattern — Claude Chat handles API logic and writes work orders, Chrome handles the UI execution steps that the API can’t reach.

    If a tool has both an API and a web UI, should I ever use Chrome?

    Rarely, but sometimes yes. If the specific action you need isn’t available through the API even though the tool has one — or if you’re doing a one-off test and don’t want to write integration code — Chrome is a reasonable shortcut. For anything recurring, build the API integration instead.

    What happens when a site changes its UI and breaks my Chrome automation?

    Claude in Chrome will typically report that it couldn’t find an expected element or that the page doesn’t look as described. It won’t guess and won’t take unintended actions. You’ll need to update the instructions to reflect the new UI state.

    Is there a way to make Chrome automations more resilient to UI changes?

    Writing instructions in terms of intent rather than specific element names helps. “Find the button that saves the record” is more resilient than “click the blue Save button in the upper right corner” — though both will eventually break if the UI changes significantly. There’s no substitute for periodic maintenance of Chrome-based automations.

  • The Article-to-Video Pipeline — How We Automate Video Creation With Claude in Chrome

    The Article-to-Video Pipeline — How We Automate Video Creation With Claude in Chrome

    What This Pipeline Does
    Two scheduled Cowork tasks use Claude in Chrome to operate a browser-based notebook tool’s UI — creating notebooks, adding article sources, triggering video generation, downloading finished videos, and publishing watch pages to WordPress. Fully automated. Nobody clicks anything.

    This pipeline exists because a popular browser-based AI notebook tool generates high-quality cinematic videos from written content — but it has no API. The only way to operate it programmatically is through the browser UI. Claude in Chrome is the bridge.

    What follows is documentation of a running production pipeline, including the failure modes that actually occur and how they’re handled.

    The Architecture: Two Scheduled Tasks

    The pipeline runs as two complementary Cowork scheduled tasks, staggered 30 minutes apart on the same 3-hour cycle.

    Task 1 — Kickoff (runs at :00 on each scheduled hour)

    1. Calls the WordPress REST API to fetch recently published articles (REST call sketched after the task lists below)
    2. Checks the pipeline log (a Notion page) for articles already processed
    3. Selects one unprocessed article per run
    4. Uses Claude in Chrome to open the notebook tool in the browser
    5. Creates a new notebook, adds the article URL as a source
    6. Navigates to the video generation interface and triggers Cinematic generation
    7. Logs the article as “processing” in Notion with the notebook URL and timestamp

    Task 2 — Harvest (runs at :30 on each scheduled hour)

    1. Reads the Notion pipeline log for articles in “processing” status
    2. Filters for any that were kicked off more than 25 minutes ago
    3. Uses Claude in Chrome to open each notebook and check if the video is ready
    4. If ready: downloads the video file via Chrome
    5. Uploads the video to the WordPress media library via REST API
    6. Creates a draft watch page post with the embedded video, article summary, and schema markup
    7. Updates the Notion log to “completed”

    ⚠️ This pipeline requires Cowork Pro or Max. Scheduled, unattended Cowork tasks are a Pro/Max feature. Claude in Chrome itself is available on all plans, but this specific architecture — running tasks on a cron schedule without you being present — requires a paid Cowork subscription. If you’re on a lower tier, the same steps can be run manually through a Claude in Chrome session, but they won’t run automatically.
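
    The WordPress side of the kickoff task (step 1 above) is plain REST, no browser needed. A minimal sketch assuming application-password auth; the site URL and credentials are placeholders, not anything from the pipeline's actual code.

    import requests

    WP_BASE = "https://example.com/wp-json/wp/v2"    # placeholder site URL
    AUTH = ("pipeline-bot", "application-password")  # placeholder WordPress credentials

    def recent_posts(per_page=10):
        """Fetch the most recently published posts for the kickoff task to consider."""
        resp = requests.get(
            f"{WP_BASE}/posts",
            params={"per_page": per_page, "orderby": "date", "order": "desc"},
            auth=AUTH,
            timeout=30,
        )
        resp.raise_for_status()
        return [{"id": p["id"], "title": p["title"]["rendered"], "link": p["link"]}
                for p in resp.json()]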

    The Account Rotation Layer

    Browser-based AI notebook tools typically impose daily limits on cinematic video generation per account. One account isn’t enough to process a continuous stream of articles.

    The pipeline handles this by rotating between two accounts. When the primary account hits its daily generation limit, the kickoff task switches to the secondary account. Both accounts have the notebook tool open in different Chrome profiles, with the extension installed in each.

    There’s also a notebook count limit per account. Old notebooks that have already been harvested get deleted periodically to stay under the cap.
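
    The rotation itself doesn’t need to be clever. One way to express the decision (purely illustrative; the per-account daily limit below is a placeholder, not the tool’s published number, and the real tallies come from the pipeline log):

    ```python
    from dataclasses import dataclass

    DAILY_LIMIT = 3   # placeholder: whatever the notebook tool allows per account per day

    @dataclass
    class NotebookAccount:
        name: str              # e.g. "primary" / "secondary"
        chrome_profile: str    # which Chrome profile is logged in to this account
        videos_today: int = 0  # tallied from the pipeline log for the current day

    def pick_account(accounts: list[NotebookAccount]) -> NotebookAccount | None:
        """Return the first account with daily headroom, or None if all are capped."""
        for account in accounts:
            if account.videos_today < DAILY_LIMIT:
                return account
        return None   # every account is exhausted: log a clear failure, wait for the reset

    # The kickoff work order then tells Claude in Chrome which profile to request
    # via switch_browser, based on pick_account(...).chrome_profile.
    accounts = [
        NotebookAccount("primary", chrome_profile="Profile 1", videos_today=3),
        NotebookAccount("secondary", chrome_profile="Profile 2", videos_today=1),
    ]
    active = pick_account(accounts)   # the secondary account, in this example
    ```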

    The Failure Modes — Documented From Production

    This is the part that most automation write-ups skip. Here are the real failure modes this pipeline encounters, in roughly descending frequency:

    Timeout (Most Common)

    Video generation on the notebook tool can take anywhere from 25 minutes to several hours, depending on server load. The harvest task has a 3-hour timeout window — if a video hasn’t finished after 3 hours, it’s marked as failed and the article is available for retry. In practice, a meaningful portion of generation runs take longer than the timeout window, especially during peak hours.

    Mitigation: failed articles are automatically available for re-kickoff in the next cycle.

    Chrome Tab Closure

    If the Chrome tab that Claude in Chrome is operating gets closed — by the user, by a browser crash, or by an accidental window close — Claude loses access and the harvest fails. The video may be ready in the notebook tool, but there’s no way to download it without re-establishing the browser connection.

    Mitigation: the pipeline marks the article as failed. Manual recovery: reopen the notebook tool in the correct Chrome profile, reinstall the extension if needed, and re-run the harvest for that article.

    ⚠️ Don’t close Chrome windows while a scheduled pipeline is running. Cowork scheduled tasks using Claude in Chrome depend on specific browser profiles staying open and connected. If you close a Chrome window that the pipeline is using, the running task will fail. If you’re setting up unattended runs, keep the relevant Chrome profiles open and don’t close them during the scheduled window. A dedicated browser profile that stays open is the cleanest solution.

    Daily Generation Limits

    Both accounts can hit their daily cinematic generation limit on high-volume days. When this happens, the kickoff task will fail to start new videos until the limit resets — which happens on a daily cycle. The pipeline logs these failures with a clear reason so they’re easy to spot.

    Mitigation: add a third account if volume consistently exceeds two accounts’ daily limits.

    Notebook Count Limits

    Notebook tools cap how many notebooks a single account can hold. When an account is at its limit, new notebook creation fails. Regular deletion of completed notebooks (those that have been harvested) keeps the account under the cap.

    What the Watch Page Looks Like

    After a successful harvest, the pipeline creates a draft WordPress post with:

    • The embedded video (hosted in the WordPress media library, not on an external service)
    • A summary of the source article
    • Chapter/segment markers if the tool generates them
    • Article schema markup
    • A link back to the original article

The post goes up as a draft, not published directly. The manual review step before publishing is intentional — the pipeline produces a lot of content, and a spot check catches cases where generation quality was unexpectedly low.
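
    The media upload and the draft post both go through WordPress’s standard REST API rather than the browser. A condensed sketch, assuming application-password authentication and an already-downloaded video file (the site URL, credentials, and schema payload are illustrative):

    ```python
    import json
    import requests

    WP_BASE = "https://example.com/wp-json/wp/v2"     # hypothetical site
    AUTH = ("pipeline-user", "application-password")  # placeholder credentials

    def upload_video(path: str) -> dict:
        """Harvest step 5: upload the downloaded video to the WordPress media library."""
        filename = path.rsplit("/", 1)[-1]
        with open(path, "rb") as f:
            r = requests.post(
                f"{WP_BASE}/media",
                auth=AUTH,
                headers={
                    "Content-Disposition": f'attachment; filename="{filename}"',
                    "Content-Type": "video/mp4",
                },
                data=f,
            )
        r.raise_for_status()
        return r.json()   # includes the media item's id and source_url

    def create_watch_page(title: str, video: dict, summary: str, article_url: str) -> dict:
        """Harvest step 6: create the draft watch page. Publishing stays manual."""
        schema = {"@context": "https://schema.org", "@type": "VideoObject",
                  "name": title, "contentUrl": video["source_url"]}
        content = (
            f'<figure class="wp-block-video"><video controls '
            f'src="{video["source_url"]}"></video></figure>'
            f"<p>{summary}</p>"
            f'<p><a href="{article_url}">Read the original article</a></p>'
            f'<script type="application/ld+json">{json.dumps(schema)}</script>'
        )
        r = requests.post(
            f"{WP_BASE}/posts",
            auth=AUTH,
            json={"title": f"Watch: {title}", "content": content, "status": "draft"},
        )
        r.raise_for_status()
        return r.json()
    ```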

    Why This Is Genuinely Novel

    The combination of Cowork scheduling + Claude in Chrome + a browser-based tool with no API is a pattern that isn’t widely documented. Most automation examples assume APIs exist. This one doesn’t — it treats the browser UI as the API, and Claude in Chrome as the adapter layer.

    The practical result: a pipeline that runs on a schedule, processes a backlog of articles at a rate of one per run, handles account rotation automatically, logs its own state, and surfaces failures with enough detail to recover from them manually.

    The tools involved are off-the-shelf. What makes it work is the architecture.

    Frequently Asked Questions

    Does the notebook tool need to be open in Chrome for this to work?

The Chrome profile does; the tool’s tab doesn’t have to be. Claude in Chrome can navigate to the notebook tool itself, so the tool doesn’t need to be pre-opened before the task starts. But the Chrome profile where the extension is installed must be open, and that profile must be logged in to the notebook tool’s account.

    What happens if a video takes longer than the timeout window to generate?

The pipeline marks it as failed, and the article becomes available for retry in the next kickoff cycle. There’s no penalty: the notebook still exists in the tool with generation in progress, so if the video finishes later, you can check manually and harvest it by hand.

    Can this pattern be adapted for other browser-based tools with no API?

    Yes. The two-task kickoff/harvest pattern applies to any browser-based tool where you’re triggering a process that takes time to complete. The specific steps change, but the architecture — trigger, wait, harvest, log — is reusable.

    Are the watch page posts published automatically?

    No. The pipeline creates them as drafts. A manual review step is built in before anything goes live. This is intentional — automated generation at scale benefits from a human spot-check before publishing.

    What do I do if a harvest fails because a Chrome tab was closed?

    Reopen the relevant Chrome profile. Make sure the Claude in Chrome extension is installed and active in that profile. Log in to the notebook tool if the session has expired. Then manually trigger a harvest for the specific article — open the notebook, confirm the video is ready, download it, and upload it to WordPress.

  • Claude in Chrome Across Multiple Chrome Profiles — The Multi-Account Workflow

    Claude in Chrome Across Multiple Chrome Profiles — The Multi-Account Workflow

    What This Covers
    Chrome profiles are separate browser identities — different logins, different extensions, different sessions. Claude in Chrome connects to one profile at a time via a manual click. Here is how to set that up for multi-account work, and where the friction still lives.

    Chrome profiles are one of Chrome’s most useful and most underused features. Each profile is an isolated browser identity: its own login state, its own saved passwords, its own open tabs, its own extensions. If you manage multiple Google accounts, multiple work environments, or need to keep different service logins separate, profiles are how you do it.

    Claude in Chrome works at the profile level. Understanding that changes how you think about setting it up.

    Each Chrome Profile Is Its Own Island

    When Claude in Chrome connects to a session, it connects to a specific Chrome profile — the one you’re running the extension in, the one where you clicked Connect. It can navigate any tab open in that profile. It cannot see or interact with tabs in other profiles, even if those profiles are open in other windows on your screen.

    This isolation is actually useful. It means you can set up dedicated Chrome profiles for different purposes:

    • One profile logged in to your primary work tools
    • One profile for a client’s services or a specific platform
    • One profile for personal accounts you don’t want mixed into work sessions

    When you want Claude to work in a specific environment, you connect it to that profile. It only sees what that profile sees.

    ⚠️ The extension must be installed on each profile separately. Installing Claude in Chrome on one profile does not install it on others — Chrome isolates extensions per profile. If you set up five profiles and want Claude to be available on all of them, you need to install and connect the extension five times. Check that it’s installed and active before starting any session.

    How switch_browser Works Across Profiles

    When Claude calls the switch_browser tool, it broadcasts a connection request to all Chrome instances that currently have the Claude in Chrome extension installed and active. Every eligible browser window shows a Connect prompt.

    You click Connect on the profile you want Claude to use. That profile becomes the active connection. The other windows are unaffected.

    A few practical notes:

    • Only one profile is connected at a time. Claude doesn’t maintain simultaneous connections to multiple profiles. If you need Claude to work in a different profile mid-session, it calls switch_browser again, and you click Connect in the new target.
    • The connection requires a manual click every time. Claude cannot silently hop between profiles. Each switch requires your action. This is intentional — it keeps you in control of which environment Claude is accessing at any given moment.
    • Pre-login matters. Once connected, Claude can only interact with services you’re already logged in to in that profile. Log in before the session starts, not during.

    A Working Multi-Profile Workflow

    In documented use, the multi-profile workflow looks like this:

    1. Open the Chrome profiles you’ll need for the session — each in its own window
    2. Log in to all the services you’ll need in each profile
    3. Confirm the Claude in Chrome extension is installed and active in each profile you’ll use
    4. Tell Claude Chat what you need done and which profile/environment to start in
    5. Claude calls switch_browser — you click Connect in the right profile
    6. Claude executes the task in that profile
    7. If you need Claude to switch profiles, it calls switch_browser again — you click in the next profile

    The manual click at each switch is the main friction point. It means truly automatic profile-hopping isn’t possible — Claude can initiate the switch, but you have to authorize it each time.

    ⚠️ Be deliberate about which profile you click Connect in. If you have multiple profiles open and multiple Connect prompts appear simultaneously, it’s easy to click the wrong one. The simplest prevention: when switch_browser fires, close or minimize the windows for profiles you don’t want Claude to access before clicking Connect. You can also open only the profile you need at that moment, run the task, then open the next one.

    The Chrome Profile Mapping Idea

    One capability that doesn’t exist yet but is worth building: a Chrome Profile Mapping skill that tells Claude which profile has which services logged in. Right now, Claude has to be told at the start of each task — “the Google account is in Profile 2, the platform admin is in Profile 4.” With a profile map, Claude would know this from context and could request the right profile without you specifying it every time.

    The idea is filed. It’s a one-time setup that would pay off across every multi-profile session afterward.
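
    Nothing about the map needs to be complicated. A hypothetical version could be a single lookup Claude is given at the start of a session (the profile names and services below are invented for illustration; no such skill ships today):

    ```python
    # Hypothetical "Chrome Profile Mapping": a static lookup Claude could read
    # instead of being told which profile holds which login at the start of each task.
    PROFILE_MAP = {
        "Profile 1": ["Google Workspace (work)", "Notion", "WordPress admin"],
        "Profile 2": ["Notebook tool (primary account)"],
        "Profile 3": ["Notebook tool (secondary account)"],
        "Profile 4": ["Client platform admin", "Domain registrar"],
    }

    def profile_for(service: str) -> str | None:
        """Answer: which profile should Claude request for this service?"""
        for profile, services in PROFILE_MAP.items():
            if any(service.lower() in s.lower() for s in services):
                return profile
        return None

    profile_for("domain registrar")   # -> "Profile 4"
    ```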

    How Many Profiles Is Practical?

    There’s no technical limit, but practical friction increases with the number of profiles you’re managing. The manual click requirement means every profile switch is a human action. Sessions that require frequent switching between more than two or three profiles become difficult to sustain without losing track of where Claude is.

For most multi-account workflows, two to three profiles cover what’s needed: one for the primary environment, one or two for secondary services or client contexts. Beyond that, the workflow tends to benefit from being broken into separate sessions rather than a single session that switches continuously.

    Frequently Asked Questions

    Can Claude switch between Chrome profiles without me clicking anything?

    No. Every profile switch requires you to click Connect in the target profile. Claude can request the switch by calling switch_browser, but it cannot complete the connection without your action. This is a deliberate design decision, not a technical limitation that will be worked around.

    Do I need to install the Claude in Chrome extension on every profile?

    Yes. Chrome extensions are isolated per profile. The extension must be installed separately on each profile where you want Claude in Chrome to be available.

    What happens if I have multiple Chrome profiles open and I click Connect in the wrong one?

    Claude will connect to whichever profile you clicked in. If you realize you connected to the wrong one, disconnect, call switch_browser again, and click Connect in the correct profile. There’s no automatic way to undo actions Claude took while connected to the wrong profile, so stay attentive when multiple profiles are open.

    Can Claude be connected to two Chrome profiles at the same time?

    No. Claude in Chrome maintains one active connection at a time. To work in a different profile, you switch — which disconnects the current one.

    Is it safe to have Claude connected to a profile that’s logged in to my personal Google account?

    Use judgment. Claude in Chrome can see and interact with any tab open in the connected profile. If your personal profile has Gmail, Google Drive, or other personal services open, Claude has access to those tabs during the session. If you don’t want Claude to interact with personal accounts, use a dedicated work profile for Claude sessions and keep personal tabs in a separate profile that isn’t connected.

  • How to Use Claude in Chrome to Write Directly to a Web App

    How to Use Claude in Chrome to Write Directly to a Web App

    The Pattern
    Claude Chat writes the work order. Claude in Chrome navigates the UI and executes it. This combination lets you automate web apps that have no API — or where the API doesn’t expose what you need.

    A lot of the most useful tools on the web don’t have APIs. Or they have APIs, but specific features — a particular button, a workflow trigger, a UI-only setting — aren’t exposed through them. For years, the workaround was Zapier, custom scripts, or doing it manually.

    Claude in Chrome opens a different path: Claude navigates the UI directly, the same way you would, but you don’t have to be the one clicking.

    How the Two-Claude Pattern Works

    The workflow that works well in practice uses two Claude instances working together:

    1. Claude Chat (the claude.ai interface) handles planning, writing, API calls, and generating the specific instructions for what needs to happen in the browser
    2. Claude in Chrome (the browser extension) receives those instructions and executes them directly in the web app UI

    The typical flow: you describe the task to Claude Chat. Claude Chat writes a precise, step-by-step work order — what page to navigate to, what to click, what to fill in, what to confirm. You paste that into Claude in Chrome. Claude in Chrome executes it in the browser.

    It’s not magic. It’s division of labor: reasoning on one side, execution on the other.

    Real Situations Where This Applies

    In documented use, the Claude Chat → Chrome pattern has been used for:

    • Cloud console navigation — walking through multi-step infrastructure setup in a browser-based cloud console where the relevant actions weren’t exposed through the provider’s CLI or API
    • Domain registrar settings — updating DNS records through a registrar’s web interface. The registrar had an API, but the specific record type needed wasn’t in it.
    • Social scheduling tools — posting or scheduling content through a platform’s web UI when the API tier available didn’t include the scheduling endpoint
    • Web-based terminal environments — operating Cloud Shell or browser-based terminals without switching windows or copy-pasting
    • Browser-based AI notebook tools — creating notebooks, adding source URLs, navigating to generation features, and triggering video or audio generation through a UI

    The common thread: a logged-in browser session was required, and the action wasn’t available through an API.

⚠️ Pre-login before you hand off. Claude in Chrome can only interact with services the connected Chrome profile is already logged in to. If Claude navigates to a page that requires a login it doesn’t have, it will stall or hit an error. Log in to every service you intend to use before starting the session, and make sure the session hasn’t expired. Also: close any tabs with services you don’t want Claude to interact with during this task.

    What Makes a Good Work Order

    The quality of the Chrome execution depends heavily on the quality of the instructions Claude Chat produces. A good work order is:

    • Sequential. Each step follows the last. Claude in Chrome doesn’t skip around.
    • Specific about UI elements. “Click the blue Save button in the upper right” is better than “save it.”
    • Anticipates the unexpected. A login screen, confirmation dialog, or error message is handled better if the work order says what to do when one appears.
    • Ends with a confirmation step. “After completing, read the page and report what you see” closes the loop so you know whether the task actually finished.

    Claude Chat is good at generating this kind of structured instruction when you describe the task well. Give it the context of what tool you’re working in, what you’re trying to accomplish, and what you expect the UI to look like.
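
    As a concrete example, a work order for the DNS case described earlier might read like this (the registrar, menu labels, and record values are invented for illustration):

    1. Navigate to the DNS management page for example.com in the registrar’s dashboard
    2. Find the section that lists existing DNS records
    3. Click the control that adds a new record
    4. Set the record type to TXT, the host to “@”, and paste in the verification value provided below
    5. If a confirmation dialog appears, confirm it; if a login screen appears, stop and report it
    6. Save the record, then read the records list back and report whether the new TXT record is shown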

    The API-First Rule

    Using Claude in Chrome to operate a web UI is slower and less reliable than using an API. UI layouts change. Buttons get renamed. A platform update can break a workflow that worked yesterday.

    The rule that holds up in practice: API first, Chrome when the API fails or doesn’t exist.

    If a tool you use regularly exposes the action you need through an API, build the API integration and use that. Chrome UI automation is the fallback — valuable and often the only option, but a fallback nonetheless. Don’t default to Chrome just because it’s faster to set up today.

    ⚠️ Don’t leave Claude in Chrome running on high-stakes UI actions without reviewing first. If your work order includes steps like submitting a payment form, publishing content publicly, deleting records, or sending a message — review the work order carefully before Claude executes it, and stay present during execution. UI actions in Claude in Chrome are real. There is no undo button built in.

    When the Work Order Approach Doesn’t Work Well

    A few situations where the Claude Chat → Chrome hand-off runs into friction:

    • Dynamic UIs with inconsistent layouts. If the UI renders differently based on account state, screen size, or A/B tests, Chrome may not find the element the work order described.
    • Multi-factor authentication prompts. If a service triggers MFA mid-session, Chrome will stall waiting for input. You need to be present to handle it.
    • Very long multi-step tasks. The longer the chain of actions, the more likely something unexpected will interrupt it. For long tasks, build in manual checkpoints rather than treating the whole thing as one uninterrupted run.
    • Anything involving CAPTCHA. Chrome cannot solve CAPTCHAs. Tasks that require CAPTCHA completion need manual intervention at that step.

    Frequently Asked Questions

    Does Claude in Chrome work with any website?

    It works with any website loaded in Chrome where you have the appropriate access. The extension interacts with the live DOM of whatever page is open. Some sites use security measures that prevent external scripts from interacting with certain elements, which can limit what Claude can click or read on those pages.

    Can Claude in Chrome interact with pop-up windows or modal dialogs?

    Yes, in most cases. Pop-ups and modals that are part of the page’s DOM are accessible. Browser-level dialogs (like the native file picker or browser alert boxes) have more limited interaction.

    What if the UI changes and Claude can’t find an element?

    Claude in Chrome will report that it couldn’t find the element and stop. It won’t guess or click something random. You’ll need to update the work order to reflect the current UI, or manually navigate to the right state and then reconnect.

    Is there a risk of Claude submitting forms I don’t want submitted?

    Yes, if the work order includes a form submission step. Always review work orders that include submit, confirm, send, or delete actions before execution. If you’re uncertain, break the work order into stages and review what Claude has done before authorizing the next stage.

    Can I use Claude in Chrome for a tool I use for work with sensitive data?

    Use judgment. Claude in Chrome processes what it sees in the browser tab, and the content of that interaction is processed by Anthropic’s systems under your account’s privacy settings. Review Anthropic’s privacy policy for your plan before using Claude in Chrome with tools containing confidential, regulated, or personally identifiable information.

  • Claude in Chrome vs Cowork Computer Use — What’s the Difference

    Claude in Chrome vs Cowork Computer Use — What’s the Difference

    The Short Version
    Claude in Chrome = browser only, any plan, you stay present. Cowork computer use = full desktop, scheduled, unattended, Pro or Max required. They solve different problems. The confusion comes from using the word “automation” for both.

    If you’ve tried Claude in Chrome and also explored Cowork’s computer use feature, you’ve probably noticed they feel completely different — even though both involve Claude “doing things” on a computer. That’s because they are fundamentally different tools, with different scope, different risk levels, and different use cases.

    This comparison is built from documented use of both. Not marketing copy.

    The Core Difference: Browser vs. Desktop

    Claude in Chrome operates exclusively inside the Chrome browser. It can read pages, click elements, fill forms, scroll, download files, and navigate between open tabs. That’s it. It has no awareness of your desktop, no access to your filesystem, and no ability to open applications outside the browser.

    Cowork computer use operates at the full desktop level. It can see and interact with any application on your machine — your file manager, terminal, spreadsheet software, desktop apps, system utilities. It treats your entire computer as its workspace.

    The practical difference: if you close Chrome, Claude in Chrome stops. If you close Chrome while Cowork computer use is running, Cowork keeps going in other applications.

    Scheduling and Presence

    | Feature | Claude in Chrome | Cowork Computer Use |
    | --- | --- | --- |
    | Scope | Browser only | Full desktop |
    | Can run scheduled / unattended | No | Yes |
    | Requires you to be present | Yes | No (once configured) |
    | Available on free plan | Yes | No |
    | Requires Pro or Max | No | Yes |
    | Access to filesystem | No | Yes |
    | Can open desktop applications | No | Yes |
    | Connection method | Manual click to connect | Configured per task |

    When Chrome Is the Right Tool

    Claude in Chrome is the better choice when:

    • The tool you’re working with is entirely browser-based and has no API (or an API that doesn’t expose what you need)
    • You want to work alongside Claude in real time — you’re co-piloting, not delegating
    • The task is one-off or occasional, not something you need to run on a schedule
    • You want Claude to interact with a logged-in browser session that you control
    • You’re on any Claude plan and don’t have access to Cowork computer use
    ⚠️ Stay present with Chrome. Claude in Chrome is not designed for unattended use. If Claude clicks something unexpected or a form submits mid-session, you need to be there to intervene. This isn’t a limitation you can safely work around by walking away — it’s the intended operating model.

    When Cowork Computer Use Is the Right Tool

    Cowork computer use is the better choice when:

    • The task needs to repeat on a schedule — daily, every few hours, weekly
    • The task spans multiple applications (browser plus desktop app plus filesystem)
    • You want it to run without you being present
    • The task involves file operations — reading, writing, moving, processing local files
    • You need multi-step pipelines that chain browser actions with non-browser actions
    ⚠️ Unattended computer use has a wider blast radius. When Cowork computer use runs a scheduled task, it has access to your full desktop — including applications, files, and anything else open on your machine. A misconfigured task or an unexpected UI change on a target website can cause Claude to interact with things it wasn’t supposed to. Review what’s open on your machine before scheduling unattended runs, and test new tasks manually before letting them run on a schedule.

    They Can Work Together

    One pattern that works well in practice: Claude Chat writes the instructions, Claude in Chrome executes the browser-side steps. Cowork handles the scheduled, recurring, multi-app pieces.

    Think of it as a three-tier model. Claude Chat is strategy and orchestration. Claude in Chrome is the field operator for browser-native tasks that require a logged-in session or a UI that has no API. Cowork is the autonomous layer for scheduled, repeating, multi-system work.

    A task that’s “too small for Cowork but too tedious to do manually” is usually a Claude in Chrome task. A task that runs every night at 11pm is usually a Cowork task. Most workflows eventually use all three.

    The Decision Rule

    One question resolves most cases: do you need it to run while you’re asleep?

    If yes — Cowork computer use (Pro or Max required).
    If no — Claude in Chrome, from any plan, with you present.

    Frequently Asked Questions

    Can I use Claude in Chrome instead of Cowork computer use to save money?

    For one-off browser tasks, yes — Claude in Chrome is available on all plans and covers a meaningful range of browser automation. But it can’t replace Cowork computer use for scheduled tasks, unattended runs, or anything that requires filesystem or desktop application access.

    Does Claude in Chrome work inside a Cowork session?

    They’re separate features. Claude in Chrome is a browser extension that works in claude.ai chat sessions. Cowork computer use is a separate capability within the Cowork product. They don’t directly compose with each other, though you can use both in complementary workflows.

    Is Cowork computer use riskier than Claude in Chrome?

    The surface area is larger with Cowork computer use because it has access to your full desktop, not just the browser. Whether that translates to more risk depends entirely on how you configure and test your tasks. Well-tested Cowork tasks running on a focused setup can be lower risk than an untested Claude in Chrome session with sensitive tabs open. The tool isn’t the risk — how you set it up is.

    Can Claude in Chrome run overnight or on a schedule?

    No. Claude in Chrome requires an active chat session and a manual connection per session. It is not designed for scheduled or unattended use. For overnight or scheduled automation, you need Cowork computer use.

    Which one should I start with?

    If you’re new to both, start with Claude in Chrome. It’s available on all plans, the blast radius is limited to your browser, and you stay in the loop during every session. Once you’re comfortable with how Claude navigates browser-based tools, you’ll have a much better sense of whether Cowork’s scheduled automation is worth setting up for your specific workflows.

    Related: How Claude Cowork Can Actually Train Your Staff to Think Better — a 7-part series on using Cowork as a training tool across industries.

  • What Is Claude in Chrome and How Does It Actually Work

    What Is Claude in Chrome and How Does It Actually Work

    Claude in Chrome — Quick Definition
    Claude in Chrome is a browser extension that gives Claude direct control over your active Chrome tab. It can read page content, click buttons, fill forms, scroll, and download files — all inside the browser, without touching your desktop or filesystem.

    There are now three distinct ways to work with Claude at the task level: through the chat interface, through Claude Cowork, and through Claude in Chrome. Most people know the first two. The third one is genuinely different, and genuinely useful — and most people writing about Claude haven’t actually used it yet.

    This article is built from documented operational use. Not theory.

    What Claude in Chrome Actually Is

    Claude in Chrome is a browser extension — separate from claude.ai, separate from Cowork — that connects Claude to your active Chrome tab. Once the extension is installed and connected, Claude gains a set of browser-native tools it doesn’t have in a standard chat session.

    Those tools include:

    • Reading page content — Claude can see what’s on the current tab, including text, links, form fields, and interactive elements
    • Clicking — Claude can click buttons, links, checkboxes, and UI controls
    • Filling forms — Claude can type into text fields, dropdowns, and inputs
    • Scrolling — Claude can scroll a page to load more content or navigate to a section
    • Downloading files — Claude can trigger downloads from web interfaces
    • Navigating — Claude can move between tabs that are open in the connected profile
    ⚠️ Before you experiment: When Claude has browser control, it can interact with any tab in the connected Chrome profile — including tabs where you’re logged in to banking, email, or other sensitive services. Before running any Claude in Chrome session, close or move tabs you don’t want Claude to have access to. Pre-login only to the services you intend to use in that session.

    What Claude in Chrome Is Not

    It’s worth being precise here, because there’s real confusion between Claude in Chrome and Claude Cowork’s computer use feature.

    Claude in Chrome is browser-only. It operates inside Chrome. It cannot access your filesystem, run terminal commands, open desktop applications, or do anything outside a browser window. If you need Claude to interact with files on your computer or run code locally, that’s a different tool entirely.

    Claude Cowork computer use is full-desktop. Cowork’s computer use feature gives Claude access to your entire desktop environment — applications, filesystem, terminal, everything. It’s also scheduled and can run unattended. That’s a much larger surface area.

    The comparison matters because the risk profile is different. Browser-only means the blast radius of any mistake is limited to what’s accessible through Chrome. Full computer use is a fundamentally different level of access. More on this comparison in the full breakdown article.

    How the Connection Works

    Claude in Chrome uses a tool called switch_browser. When Claude calls this tool, it broadcasts a connection request to all Chrome instances that have the extension installed. A small prompt appears in the browser — you click Connect — and Claude is now operating in that Chrome profile.

    A few things to understand about how this works in practice:

    • One profile at a time. Claude connects to one Chrome profile per session. If you have multiple Chrome profiles open, the connection goes to whichever one you click Connect in.
    • The extension must be installed on each profile separately. Chrome profiles are isolated environments. Installing the extension in one profile doesn’t propagate it to others.
    • The connection requires a manual click. This is intentional friction — Claude can’t silently connect to a Chrome profile without your action. You will always know when Claude is taking browser control.
    • Once connected, Claude can navigate between open tabs freely within that profile.
    ⚠️ Don’t walk away during a session. Claude in Chrome is designed for working with a human present. If Claude navigates to a tab where you’re logged in to a web app and something goes wrong — a form submits, an action fires — you need to be there to catch it. This is different from Cowork scheduled tasks, which are designed to run unattended. Treat Claude in Chrome sessions like you’re co-piloting, not delegating.

    What It’s Useful For

    Claude in Chrome’s sweet spot is situations where there’s no API. A lot of useful web tools — dashboards, admin panels, third-party platforms — don’t offer an API, or their API is locked behind an enterprise plan, or the specific action you need isn’t exposed via API even if the tool has one.

    In documented use, Claude in Chrome has been used to:

    • Navigate cloud console interfaces that require clicking through menus
    • Interact with domain registrar admin panels to update DNS settings
    • Operate social media scheduling tools through their web UI when the API doesn’t expose the specific feature needed
    • Use web-based terminal environments where copy/paste would be the alternative
    • Run automated notebook workflows in browser-based AI tools — creating notebooks, adding sources, triggering generation, downloading output

    The pattern is consistent: API first, Chrome when the API doesn’t exist or is blocked. Chrome is the fallback, not the default. But it’s a very capable fallback.

    Available on All Claude Plans

    One thing that surprises people: Claude in Chrome is available to all Claude subscribers, not just Pro or Max. This is different from Cowork computer use, which requires Pro or Max.

    If you’re on a free plan, you can still install the extension and use browser control in your chat sessions. The session limits of your plan still apply, but the capability itself isn’t gated.

    The Right Mental Model

    The cleanest way to think about Claude in Chrome: it’s Claude with a mouse and keyboard, but only inside the browser, and only when you hand it control.

    That framing clarifies both the power and the limits. It’s not autonomous. It doesn’t run in the background. It doesn’t have memory of previous browser sessions. Every connection is a deliberate, per-session handoff. You stay in the loop.

    When you need Claude to do something in a browser-based tool and you’re willing to be present while it runs — Claude in Chrome is the right tool. When you need scheduled, unattended, multi-application automation — that’s Cowork territory.

    Frequently Asked Questions

    Do I need a paid Claude plan to use Claude in Chrome?

    No. Claude in Chrome is available on all Claude plans, including free. You’ll still be subject to your plan’s message limits, but the browser control capability itself is not restricted to paid tiers.

    Can Claude in Chrome access my files or run programs on my computer?

    No. Claude in Chrome operates only inside the Chrome browser. It cannot access your filesystem, open desktop applications, or run terminal commands. If you need Claude to interact with files or run code locally, you’re looking for a different tool.

    Is it safe to use Claude in Chrome while logged in to sensitive accounts?

    Use caution. When Claude in Chrome is connected to a Chrome profile, it can see and interact with all open tabs in that profile — including any tabs where you’re logged in to banking, email, or other sensitive services. Best practice is to pre-close tabs you don’t want Claude to have access to before starting a session, and to stay present during the session.

    Can Claude connect to Chrome automatically without me doing anything?

    No. Every connection requires a manual click. When Claude calls the switch_browser tool, a Connect prompt appears in the browser — you have to click it. Claude cannot silently establish a browser connection without your action.

    What’s the difference between Claude in Chrome and Claude Cowork computer use?

    Claude in Chrome is browser-only, works in any chat session, and is available on all plans. Cowork computer use gives Claude access to your entire desktop — applications, filesystem, terminal — and can run scheduled, unattended tasks. It requires a Pro or Max subscription. The choice depends on what you’re trying to automate and whether you need to be present.

    What happens if I close a Chrome tab while Claude in Chrome is using it?

    Claude will lose access to that tab. If the tab was part of an active task — for example, a browser-based notebook generating output — the task will fail or stall. You’ll need to reopen the tab, reconnect the extension, and restart the relevant step. It’s one of the reasons Claude in Chrome is designed for sessions where you stay present.