Tag: Claude Anthropic

  • The Exit Protocol: The Section of Your Digital Life You Haven’t Written Yet


    Every tool you enter, you will someday leave. Most operators don’t plan the exit until the exit is already happening. This is the protocol written before the catastrophe, not after.



    Every tool you enter, you will someday leave.

    You don’t know which exit you’ll face first. The breach that ends a Tuesday. The policy change that ends a vendor relationship in thirty days. The voluntary migration to something better. The one nobody plans for — the terminal one, where you’re gone or incapacitated and someone else has to figure out how your digital life was organized.

    The cheapest time to plan any of those exits is at the moment of entry. The most expensive time is the moment the exit is already underway.

    Most operators never write this section of their digital life. They enter tools. They stack data. They accumulate credentials. They build automations that depend on twelve other automations that depend on accounts they don’t remember creating. And if you asked them today, “If this specific tool vanished tomorrow, what happens?”, the honest answer is usually “I don’t know, I’ve never looked.”

    That’s the section this article is about. The exit protocol. The will-and-testament layer of digital life, written before the catastrophe rather than after.

    I’m going to describe the four exits every operator faces, the runbook for each, and the pre-entry checklist that keeps the whole stack from becoming a trap you can’t get out of. None of this is theoretical — it’s the protocol I actually run, cleaned up enough to be useful to someone else building their own version.


    Why this matters more in 2026 than it did in 2020

    For most of the personal-computing era, “exit” meant closing a browser tab. You used a tool, you were done, you left. The consequences of not planning the exit were small because the surface was small.

    That’s not the shape of digital life in 2026. The operator running a real business now sits on top of a stack that typically includes:

    • A knowledge workspace (Notion, Obsidian, or similar) holding years of operational state
    • An AI layer (Claude, ChatGPT, or similar) with memory, projects, and connections to your workspace
    • A cloud provider account running compute, storage, and services
    • Web properties with published content and user data
    • Scheduling, CRM, and communication tools with their own data stores
    • A password manager sitting behind all of it
    • An identity root (usually a Google or Apple account) holding the keys

    Any one of these can end. By breach. By policy change. By price increase you can’t absorb. By vendor shutdown. By personal rupture that isn’t business at all. By death, which is the scenario nobody wants to write about and exactly the one that makes the planning most valuable.

    And every piece is entangled with the pieces above and below it. Your Notion workspace references your Gmail. Your Gmail authenticates your cloud provider. Your cloud provider runs the services your web properties depend on. Your password manager holds the recovery codes for everything. The stack is a single living system with many failure modes, and the only version of “exit planning” that works is the one that treats the stack as a whole.


    The seven questions

    Before you can plan an exit, you need to be able to answer seven questions about every tool in your stack. If you can’t answer them, the exit plan is a fiction.

    1. What lives there? Data, credentials, intellectual property. Not “everything” — specifically, what is in this tool that doesn’t exist anywhere else?

    2. Who else has access? Human collaborators. Service accounts. OAuth connections. API keys you gave out and forgot about. Every form of access is a potential inheritance path.

    3. How does it get out? The export surface. Format. Cadence. Whether the export includes everything or just some things. Whether the export requires the UI or has an API.

    4. What deletes on what trigger? Vendor retention policies. Your own rotation schedule. End-of-engagement deletion for client work. What happens to data if you stop paying.

    5. Who inherits what? Family. Team. Clients. The answer is usually “nobody, by default” — and that default is the whole problem.

    6. How do downstream systems keep working? If this tool ends, what else breaks? What continuity can be preserved without handing over live credentials to somebody who shouldn’t have them?

    7. How do I know the exit still works? Drill cadence. When was the last time you actually exported the data and opened the export on a clean machine to verify it was intact?

    If you answer these seven questions for every tool in your stack, you will find things that surprise you. Credentials that have been live for three years without ever being rotated. Tools whose “export” button produces a file that can’t be opened by anything else. Dependencies on your Gmail that would make inheritance a nightmare. That’s fine. Finding those things is the point. You can’t fix what you haven’t looked at.
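
    One way to keep the seven answers honest is to hold them in a small inventory, one record per tool, and list which questions are still blank. The sketch below is a minimal illustration, not part of the protocol itself: the `ToolRecord` class, its field names, and the sample Notion entry are all assumptions made for the example.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ToolRecord:
    """One inventory entry per tool; every field name here is illustrative."""
    name: str
    unique_data: Optional[str] = None        # 1. What lives there?
    access_holders: Optional[str] = None     # 2. Who else has access?
    export_path: Optional[str] = None        # 3. How does it get out?
    deletion_triggers: Optional[str] = None  # 4. What deletes on what trigger?
    inheritors: Optional[str] = None         # 5. Who inherits what?
    downstream_plan: Optional[str] = None    # 6. How do downstream systems keep working?
    last_drill: Optional[str] = None         # 7. How do I know the exit still works?


def unanswered(record: ToolRecord) -> list[str]:
    """Return the field names of the questions that are still blank for this tool."""
    return [
        f.name
        for f in fields(record)
        if f.name != "name" and not getattr(record, f.name)
    ]


# Hypothetical entry: only two of the seven questions have been answered so far.
notion = ToolRecord(
    name="Notion",
    unique_data="client runbooks, meeting notes",
    export_path="workspace export, Markdown and CSV",
)
print(unanswered(notion))
```

    A tool whose list comes back non-empty is a tool whose exit plan is still partly fiction.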


    The four exit scenarios

    Every exit fits into one of four shapes. The shape determines the runbook. Getting this taxonomy right is what lets the rest of the protocol be specific.

    Sudden: breach or compromise

    The credential leaked. The account got taken over. A vendor breach exposed data you didn’t know was even there. Minutes matter. The goal is to contain the damage, not to plan the migration.

    Forced: policy or shutdown

    The vendor killed the product. The terms changed in a way you can’t live with. The price went up by an order of magnitude. Days to weeks, usually. The goal is to export cleanly and migrate to a successor before the window closes.

    Terminal: death or incapacity

    You are gone or can’t operate. Someone else has to keep things running or wind them down cleanly. This is the scenario most operators never plan for, and it’s the one with the highest cost if the plan doesn’t exist.

    Voluntary: better option or done

    You chose to leave. Migration to a new tool. End of a client engagement. Lifestyle change. Weeks to months of runway. The goal is a clean handoff with no orphan state left behind.

    Each of these has its own runbook. Running the wrong one for the situation is a common failure — treating a forced shutdown like a voluntary migration wastes the window; treating a breach like a forced shutdown fails to contain the damage.


    Runbook: Sudden

    The situation is: something leaked or got taken over. You find out either because a monitoring alert fired or because something visibly broke. Either way, the clock started before you noticed.

    1. Contain. Pull the compromised credential immediately. Rotate the key. Revoke every token you issued through that credential. Sign out of every active session. This is the first ten minutes.

    2. Scope. List every system the credential touched in the last thirty days. Assume the blast radius is wider than it looks — adjacent systems often share trust in ways you forgot about. The goal is to understand what the attacker could have done, not just what they did do.

    3. Notify. If client or customer data is in scope, notify according to your contracts and any applicable law. Today, not tomorrow. Breach disclosure windows are tight and getting tighter; the legal risk of delay is usually worse than the embarrassment of early notification.

    4. Rebuild. Issue a new credential. Scope it to minimum permissions. Never restore the old credential — the temptation to “reuse it once we figure out what happened” is how re-compromise works.

    5. Postmortem. Write it the same week. Not a blameless postmortem for PR purposes; a real one, for your own internal knowledge. What was the failure mode? What signal did you miss? What changes to the protocol would have caught it earlier? The postmortem is the only way the Sudden scenario makes the rest of the stack safer instead of just more anxious.
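
    The Scope step is mostly a filtering exercise over whatever access logs you can get. As a rough sketch, assuming the logs have already been normalized into `(credential_id, system, timestamp)` tuples (that shape, the function name, and the sample data are all hypothetical), the thirty-day blast radius could be listed like this:

```python
from datetime import datetime, timedelta


def systems_in_scope(events, credential_id, now, window_days=30):
    """Return the sorted systems a credential touched inside the lookback window.

    `events` is an iterable of (credential_id, system, timestamp) tuples.
    """
    cutoff = now - timedelta(days=window_days)
    return sorted(
        {
            system
            for cred, system, ts in events
            if cred == credential_id and ts >= cutoff
        }
    )


# Hypothetical, hand-written log entries for illustration only.
now = datetime(2026, 5, 15)
events = [
    ("key-A", "cloud-storage", datetime(2026, 5, 10)),
    ("key-A", "crm", datetime(2026, 4, 20)),
    ("key-A", "old-wiki", datetime(2026, 1, 3)),   # outside the 30-day window
    ("key-B", "billing", datetime(2026, 5, 12)),   # a different credential
]
print(systems_in_scope(events, "key-A", now))
```

    The output is a floor, not a ceiling. It only shows what the logs recorded, so adjacent systems that share trust with the ones listed still need a manual look.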


    Runbook: Forced

    A vendor is shutting down the product, changing the terms in an unacceptable way, or pricing you out. You have some window of runway — days to weeks — before the tool goes dark.

    1. Triage. How long until the tool goes dark? What is the critical-path data — the stuff that doesn’t exist anywhere else? Separate that from everything else.

    2. Export. Run the full export immediately, even before you’ve decided what to migrate to. A cold archive is cheap; a missed export window is permanent. This is the most common failure mode of the Forced scenario — operators wait until they’ve chosen a successor before exporting, and the window closes.

    3. Verify. Open the export on a clean machine. Not the one you usually work on. A clean machine, with no existing context, so you can confirm that the export is actually usable without the source system. Many “export” features produce files that look complete but reference data that only exists in the source system.

    4. Choose a successor. Match on data shape, not feature list. The data is the asset; the UI is rentable. A successor tool that imports your data cleanly but doesn’t have every feature you liked is a better choice than one with more features and a lossy import path.

    5. Cutover. Migrate. Run both systems in parallel for one full operational cycle. Then decommission the old one. The parallel cycle is where you discover what the export missed.
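
    Part of the Verify step can be automated. One symptom of an export that still depends on the source system is a pile of links pointing back at the vendor's own domain. The sketch below counts those links in the text files of an export folder. It is an illustration under assumptions: the function name, the file extensions checked, and the `example-tool.test` domain are all made up, and a clean result does not prove the export is complete.

```python
import re
import tempfile
from pathlib import Path

# Capture the host part of any http(s) URL.
URL_HOST = re.compile(r'https?://([^/\s"<>)]+)')
TEXT_SUFFIXES = {".md", ".html", ".csv", ".json", ".txt"}


def dangling_references(export_dir, source_host):
    """Map each exported text file to the count of links still pointing at the source host."""
    hits = {}
    for path in sorted(Path(export_dir).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        count = sum(
            1
            for host in URL_HOST.findall(text)
            if host.lower() == source_host or host.lower().endswith("." + source_host)
        )
        if count:
            hits[path.relative_to(export_dir).as_posix()] = count
    return hits


# Demo against a throwaway folder standing in for a real export.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "projects.md").write_text(
        "See https://app.example-tool.test/page/123 for the spec.\n", encoding="utf-8"
    )
    (root / "notes.md").write_text("Self-contained note, no links.\n", encoding="utf-8")
    result = dangling_references(root, "example-tool.test")

print(result)
```

    Any file that shows up in the result deserves a manual check on the clean machine: either the linked content also exists somewhere in the export, or it lives only in the source system and needs a separate export path.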


    Runbook: Terminal

    This is the runbook most operators never write. Writing it is the whole point of this article.

    If you are gone or can’t operate, someone else needs to know: what’s running, who depends on it, and how to either keep things going or wind them down cleanly. The default state — no plan — is a nightmare for whoever inherits the problem.

    The Terminal runbook has five components, and each one can be written in an evening. Don’t let the scope of the topic talk you out of writing the simple version now.

    Primary steward. One named person who becomes the point of contact if you can’t operate. Usually a spouse, partner, or trusted family member. They don’t need to understand how the stack works. They need to know where the instructions are and who the operational steward is.

    Operational steward. A named professional who can keep systems running during the transition. For technical infrastructure, this is typically a trusted developer or consultant who already knows your stack. For legal and financial, this is an attorney and accountant. Name them. Have the conversation with them before you need it.

    What the primary steward gets immediately. A one-page document describing the situation. Access to a password manager recovery kit. A list of active clients and the minimum needed to pause operations gracefully. Contact information for the operational steward. Nothing more than this. Specifically, they do not get live admin credentials to client systems, live cloud provider keys, or live AI project memory — those are inheritance paths that go through the operational steward or the attorney, not into a drawer.

    Trigger documents. A signed letter of instruction, stored with the attorney and copied to a trusted location at home. It references the operational runbook by URL or location. It names who is authorized to do what, under what conditions, for how long.

    Digital legacy settings. Most major platforms have inactive-account or legacy-contact features built in. Configure them. Google has Inactive Account Manager. Apple has Legacy Contact. Notion has workspace admin inheritance. Configuring these is fifteen minutes per platform and they do real work when they’re needed.

    Crucial: do not store live credentials in a will. Wills become public record in probate. The recovery path is a letter of instruction pointing at a password manager whose emergency kit is held by a trusted professional, not credentials written into a legal document.


    Runbook: Voluntary

    You chose to leave. Good. This is the least stressful exit because you have runway, you chose the timing, and the data is not under siege.

    1. Announce the exit window. To yourself. To your team. To any client whose work touches this tool. Set a specific date and commit to it.

    2. Freeze net-new. Stop adding data to the system being retired. New data goes to the successor; old data stays put until migration.

    3. Export and verify. Same as the Forced runbook. Full export, clean machine, integrity check.

    4. Migrate. Move data to the successor. Re-point automations, integrations, and any external references. Update documentation and internal links.

    5. Archive. Keep a cold copy of the old system’s export in durable storage, labeled with the exit date. Do not delete the original account for at least ninety days. Things you forgot about will surface during that window and you will want the ability to recover them.

    6. Decommission. Revoke remaining keys. Cancel billing. Close the account. Remove the tool from your password manager. Update any documentation that still mentioned it.


    The drill cadence (the thing that actually makes the protocol real)

    A protocol nobody practices is a protocol that doesn’t exist. The only way to know your exit plan works is to test it, repeatedly, on a schedule that makes failures cheap.

    Quarterly — thirty minutes. Pick one tool. Run its export. Open the export on a clean machine. Log the result. If the export is broken, fix it now, while there’s no emergency. Thirty minutes, four times a year. That’s two hours of investment to know your stack is actually recoverable.

    Semi-annual — two hours. Rotate every credential in the stack. Prune AI memory down to what’s actually load-bearing. Re-read the exit protocol end-to-end and update anything that’s drifted out of date. The credential rotation alone catches more problems than any other single practice in the hygiene layer.

    Annual — half a day. Run a full Terminal scenario dry run. Sit with your primary steward. Walk through the letter of instruction. Verify that your attorney has the current version. Update the digital legacy settings on every major platform. Confirm that the operational steward is still willing and available.

    These cadences add up to roughly ten hours of exit-related work per year: two hours of quarterly drills, four hours of semi-annual rotation, and a half-day annual dry run. Ten hours against the cost of a stack that could otherwise catastrophically collapse on the worst day of your life. It’s a trade you want to make.
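
    A cadence only works if something reminds you it has slipped. As a minimal sketch, assuming you log the date each drill last ran (the dictionary keys, the day thresholds, and the sample dates are illustrative choices, not part of the protocol), an overdue check could look like this:

```python
from datetime import date, timedelta

# The three cadences from the text, as an approximate maximum gap in days.
CADENCE_DAYS = {"quarterly": 92, "semi_annual": 183, "annual": 366}


def overdue_drills(last_run, today):
    """Return drills whose last run is older than their cadence, or that never ran."""
    overdue = []
    for drill, max_days in CADENCE_DAYS.items():
        last = last_run.get(drill)
        if last is None or today - last > timedelta(days=max_days):
            overdue.append(drill)
    return overdue


# Hypothetical log: the annual dry run has never been done.
last_run = {"quarterly": date(2026, 3, 1), "semi_annual": date(2025, 9, 1)}
print(overdue_drills(last_run, date(2026, 5, 15)))
```

    Running a check like this from a calendar reminder or a scheduled job keeps the drill schedule from depending on memory.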


    The pre-entry checklist

    The most important protocol move is the one that happens before the tool enters the stack at all. Every new tool you adopt creates an exit you’ll eventually need. Planning it at entry is radically cheaper than planning it in crisis.

    Before adopting any new tool, answer these questions:

    What is the export format, and have you opened a sample export? If the vendor doesn’t offer export, or the export is a proprietary format nothing else reads, the tool is a data trap. Accept the tradeoff knowingly or pick a different tool.

    Is there an API that would let you back up without the UI? UI-only exports scale poorly. An API you can call on a schedule gives you durable backup without depending on the vendor to maintain the export feature.

    What is the vendor’s retention and deletion policy? How long does data stick around after you stop paying? What happens to the data if the vendor is acquired? What’s their policy on third-party data processing?

    What credentials or tokens will this tool hold, and where do they rotate? A tool that holds an OAuth token to your primary email is a very different risk profile from one that holds only its own password. Inventory the credentials at entry.

    If the vendor raises the price ten times, what is your Plan B? This question sounds paranoid. Vendors raise prices tenfold more often than you’d expect. Having a Plan B in mind at entry is very different from scrambling for one at the three-week mark of a forced migration.

    If you died tomorrow, how would someone downstream keep this working or shut it down cleanly? If the answer is “they couldn’t,” you haven’t finished adopting the tool. Keep this in mind particularly for anything where you’re the only person with access.

    Does this tool belong in your knowledge workspace, your compute layer, or neither? Not every new tool earns a place in the stack. Some are better rented briefly for a specific project and then left behind. The pre-entry moment is when you decide which tier this tool lives in.

    Seven questions. Fifteen minutes of thinking. The return on those fifteen minutes is everything you don’t have to untangle later.


    What this protocol is not

    Three clarifications to close the frame correctly.

    This isn’t paranoid. It’s ordinary due diligence applied to a category of risk that most operators have not caught up to yet. Every legal entity has a wind-down plan. Every serious business has a disaster recovery plan. The digital life of a one-human operator running a real business has the same obligations; it just hasn’t had them named before.

    This isn’t purely defensive. The exit protocol produces upside beyond catastrophe avoidance. The discipline of knowing what’s in every tool, who has access, and how to get data out makes the whole stack more coherent. Operators who run this protocol find themselves making cleaner choices about new tools, which means less sprawl, which means less hygiene debt. The protocol pays rent every month, not just when things break.

    This isn’t a one-time project. It’s a standing practice. The stack changes. Tools enter. Tools leave. Credentials rotate. Family situations evolve. The protocol is never finished; it’s maintained. That’s why the drill cadence matters. The one-time-project version of this decays into fiction within a year. The standing-practice version stays alive because it gets touched regularly.


    The one thing I’d want you to walk away with

    One sentence. If you only remember one, let it be this:

    Every tool you enter, you will someday leave — and the cheapest time to plan the leaving is at entry.

    If that sentence changes how you approach the next tool you consider adopting, it has changed the shape of your stack. Not in a dramatic way. In the small, compounding way that good hygiene always works.

    The operators I know who have survived the roughest exits — the breaches, the vendor shutdowns, the personal emergencies — all share one thing in common. They planned the exit before they needed it. Not because they expected the catastrophe. Because they understood that the exit was coming, eventually, in some form, for every single thing they’d built, and that planning it in calm was radically cheaper than planning it in crisis.

    The exit is coming. For every tool. For every account. For every service. For every credential. Eventually.

    Plan it now.


    FAQ

    What’s the most important piece of this protocol if I only have an hour to spend?

    Write the one-page Terminal scenario letter. Name your primary steward. Name your operational steward. Put the password manager emergency kit in a place they can find. That one hour, invested now, is the highest-leverage thing in the entire protocol.

    I’m a solo operator with no family. Does the Terminal runbook still apply?

    Yes, and it’s more important for you than for operators with a family who would step in by default. You need an operational steward, a professional or trusted peer, who can wind things down if you can’t. Without that named person, client work will be orphaned in a way that creates real harm for people who depended on you.

    How often should I rotate credentials?

    Every six months at a minimum for anything load-bearing, immediately on any suspected compromise, and whenever someone with access leaves a collaboration. The quarterly export drill tends to surface stale credentials along the way; the semi-annual full rotation catches the long tail.

    What about AI-specific exits — Claude, ChatGPT, Notion’s AI?

    Treat AI memory as a liability to be pruned, not an asset to be preserved. Export what’s genuinely valuable (artifacts, specific conversations you want as reference), then prune aggressively. AI memory that sits around accumulating is increasing your blast radius in every other exit scenario. The hygiene move is minimal memory, not maximum memory.

    Do I need an attorney for this?

    For the Terminal scenario specifically, yes. The letter of instruction and any trigger documents that grant authority in your absence are legal documents and should be reviewed by a professional. The rest of the protocol (exports, credential rotation, drill cadence) doesn’t need legal help.

    What about my password manager? What happens if I lose access to it?

    Most major password managers offer a recovery path for this: either an emergency access feature, where a trusted contact can request access to your vault after a waiting period, or a printable emergency kit. Configure it. It’s the single most important configuration item in the entire protocol, because the password manager is the root of recovery for everything else.

    How do I know when my export is actually complete?

    Open it on a different machine, in a different tool, and try to answer three specific questions using only the export: “What was the state of X project?”, “Who had access to Y?”, “When did Z happen?” If you can answer all three, the export is usable. If any question requires reaching back to the source system, the export is incomplete.

    What if my spouse or partner isn’t technical? Can they still be the primary steward?

    Yes. The primary steward’s job is not to operate the systems. Their job is to know where the instructions are and who to call. If you write the operational runbook clearly enough that a non-technical person can follow it to the operational steward, the division of responsibility works.


    Closing note

    The section of your digital life you haven’t written yet is the exit. Almost nobody writes it until they need it, and the moment you need it is the worst moment to write it.

    Write it now, in calm, with time to think. Don’t try to write it perfectly. A rough version that exists is infinitely better than a perfect version that doesn’t. The drill cadence will improve the rough version over years; the blank document never improves at all.

    If this article leads you to spend a single evening on a single runbook — even just the Terminal scenario, even just the one-page letter to your primary steward — it has done its job. The rest of the protocol can build from there.

    Every tool you enter, you will someday leave. Leave on purpose.


    Sources and further reading


    On the Terminal scenario specifically, the Google Inactive Account Manager and Apple Legacy Contact features are both worth configuring today. Fifteen minutes apiece. Search your account settings for “inactive” or “legacy.”

  • What Notion Agents Can’t Do Yet (And When to Reach for Claude Instead)


    Last refreshed: May 15, 2026

    Update — May 15, 2026: On May 13, 2026, Notion shipped the Notion Developer Platform (version 3.5), with Claude as a launch partner. The platform adds Workers, database sync, an External Agents API, and a Notion CLI. The patterns described in this article still work, but there is now a native, sanctioned alternative for some of what previously required custom MCP wiring or third-party automation. For the full breakdown of what changed and what it means for the Notion + Claude stack, see Notion Developer Platform Launch (May 13, 2026). For the underlying operating philosophy, see The Three-Legged Stack.

    I run both Notion Custom Agents and Claude every working day. I have opinions about when each one earns its place and when each one doesn’t. This article is those opinions, named clearly, with no vendor fingers on the scale.

    Most comparative writing about AI tools is written by people with an incentive to recommend one over the other — affiliate programs, platform partnerships, the writer’s own consulting practice specializing in one side. This piece doesn’t have that problem. I use both, I pay for both, and if one of them got replaced tomorrow, the pattern I run would survive with a different tool slotted into the same role. The tools are interchangeable. The judgment about which one to reach for is not.

    Here’s the honest map.


    The short version

    Use Notion Custom Agents when: the work is a recurring rhythm, the context lives in Notion, the output is a Notion page or database change, and you’re willing to spend credits on it running in the background.

    Use Claude when: the work needs real judgment, the context is complex or contested, the output is something that needs a human’s voice and review, or the workflow crosses enough systems that the agent’s world is too small.

    Those two sentences will save most operators ninety percent of the architecture mistakes I see people make. The rest of this article is specificity about why, because general rules only take you so far before you need to know what’s actually going on under the hood.


    Where Notion Custom Agents genuinely shine

    I’m going to start with the positive because anyone who only reads the critical part of a comparative article will walk away with a warped picture. Custom Agents are genuinely impressive when they fit the job.

    Recurring synthesis tasks across workspace data. The daily brief pattern I’ve written about works better in a Custom Agent than in Claude. The agent runs on schedule, reads the right pages, writes the synthesis back into the workspace, and is done. Claude can do this too, but Custom Agents do it without you remembering to prompt them. That’s the whole point of the “autonomous teammate” framing, and for rhythmic synthesis work, it genuinely delivers.

    Inbox triage. An agent watching a database with a clear decision tree — categorize incoming requests, assign a priority, route to the right owner — is a sweet-spot Custom Agent. It does the boring sort every day, flags the ones it’s unsure about, and keeps the pile from growing. Real teams are reportedly triaging at over 95% accuracy on inbound tickets with this pattern.

    Q&A over workspace knowledge. Agents that answer company policy questions in Slack or provide onboarding guidance for new hires are quietly some of the most valuable agents in production. They replace hours of repetitive answer-the-same-question work, and because the answers come from actual workspace content, the accuracy is high when the workspace is well-maintained.

    Database enrichment. An agent that watches for new rows in a database, looks up additional context, and fills in fields automatically is a beautiful fit. The agent is doing deterministic-adjacent work with just enough judgment to handle edge cases. This is exactly what Custom Agents were designed for.

    Autonomous reporting. Weekly sprint recaps, monthly OKR reports, Friday retrospectives. Reports that would otherwise require someone to sit down and write them, now drafted automatically from the workspace state.

    For these categories, Custom Agents are the right tool, and Claude is the wrong tool even though Claude would technically work. The wrong-tool-even-though-it-works framing matters because operators often default to Claude for everything, which is expensive in different ways.


    Where Notion Custom Agents break down

    Now the honest part. Custom Agents have real limits, and pretending otherwise is how operators get burned.

    1. Anything that requires serious reasoning across contested information

    Custom Agents are capable of synthesis, but the quality of their synthesis degrades when the inputs disagree with each other, when the right answer isn’t on the page, or when the task requires actually thinking through a problem rather than summarizing existing context.

    The signal that you’ve hit this limit: the agent produces an output that sounds plausible, reads well, and is subtly wrong. If you need to double-check every agent output in a category of work because you can’t trust the judgment, that category of work shouldn’t be going through an agent. Use Claude in a conversation where you can actually interrogate the reasoning.

    Specific examples where this shows up: strategic decisions, conflicting client feedback, legal or compliance-adjacent questions, anything that involves weighing tradeoffs. The agent will produce an answer. The answer will often be wrong in a specific way.

    2. Long-horizon work that needs to hold nuance across steps

    Custom Agents are designed for bounded tasks with clear inputs and clear outputs. When you try to use them for work that requires holding nuance across many steps — drafting a long document, executing a multi-stage strategic plan, navigating a complex workflow — the wheels come off.

    Part of this is architectural: agents have limited ability to carry state across runs in the way an extended Claude conversation can. Part of it is practical: the “one agent, one job” principle Notion itself recommends is a hard constraint, not a style guideline. When you try to make an agent do multiple things, you get an agent that does each of them worse than a single-purpose agent would.

    If the job you’re thinking about is genuinely one coherent thing that happens to have many steps, and the steps inform each other, it’s probably a Claude conversation, not a Custom Agent.

    3. Work that needs a specific human voice

    This one is more important than most operators realize. Agents write in a synthesized style. It’s a perfectly fine style. It’s also recognizable as a perfectly fine style, which is the problem.

    If the output is going to have your name on it — client communications, thought leadership, outbound that should sound like you — the agent’s default voice will flatten whatever was distinctive about your writing. You can push back on this with instructions, and good instructions help a lot. But the underlying truth is that Custom Agents optimize for “sounds like a competent business writer,” and competent business writing is a commodity. If you sell distinctiveness, the agent is a liability.

    Claude in a conversation, with your active voice-shaping, produces writing that can actually sound like you. Custom Agents optimize for a different thing.

    4. Anything requiring real-time web context

    Custom Agents can reach external tools via MCP, but they don’t have a general ability to browse the live web and integrate what they find into their reasoning. If the work requires recent news, real-time market data, or anything that isn’t in a known database the agent can query, the agent will either fail, hallucinate, or return stale information from whatever workspace snapshot it had.

    Claude — with web search enabled, with the ability to fetch arbitrary URLs, with research capabilities — handles this class of work dramatically better. The right architectural response: use Claude for anything with a live-web dependency, let Custom Agents handle the parts that don’t.

    5. Deep technical work

    Custom Agents can technically do technical work. They should mostly not be asked to. Writing code, debugging failures, analyzing logs, reasoning through system architecture — these live in Claude Code’s territory, not Custom Agents’ territory. The Custom Agent framework was built for operational workflows, and while it will attempt technical tasks, it attempts them at the quality of a generalist, not a specialist.

    The sign you’ve crossed this line: the agent is producing code or technical reasoning that a competent human reviewer would push back on. Move the work to Claude Code, which was built for exactly this.

    6. High-stakes writes with permanent consequences

    Agents execute. They don’t second-guess themselves. An agent configured to send emails will send emails. An agent configured to update client records will update client records. An agent configured to delete rows will delete rows.

    When the cost of the agent doing the wrong thing is high — sending a message you can’t unsend, overwriting data you can’t recover, triggering a payment you can’t reverse — the discipline is: don’t let the agent do it without human approval. Use “Always Ask” behavior. Use a draft-and-review pattern. Use anything that puts a human in the loop before the irreversible action.

    Operators who ship fast and iterate freely tend to underweight this category. The day you discover an agent has been quietly overwriting the wrong database field for two weeks is the day you wish you’d built the review gate.
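
    The draft-and-review pattern is easy to sketch. What follows is a minimal illustration in Python, not how Notion implements “Always Ask”: PendingAction, ReviewQueue, propose, and approve are hypothetical names standing in for whatever approval mechanism your stack provides. The point is structural: the agent can only propose, and the side effect runs from exactly one place.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PendingAction:
    """An irreversible action the agent wants to take, held as a draft."""
    description: str
    execute: Callable[[], str]  # the real side effect, run only after approval
    approved: bool = False


@dataclass
class ReviewQueue:
    """Hypothetical human-approval gate: the agent proposes, a human approves."""
    pending: List[PendingAction] = field(default_factory=list)

    def propose(self, description: str, execute: Callable[[], str]) -> PendingAction:
        action = PendingAction(description, execute)
        self.pending.append(action)
        return action

    def approve(self, action: PendingAction) -> str:
        # This is the only place the side effect is ever triggered.
        action.approved = True
        self.pending.remove(action)
        return action.execute()


sent: List[str] = []  # stands in for an outbox
queue = ReviewQueue()

# The agent drafts the email but has no way to send it on its own.
draft = queue.propose(
    "Email client about invoice",
    lambda: sent.append("invoice email") or "sent",
)
print(len(sent))  # nothing has gone out yet

# A human reads the draft and explicitly approves it.
result = queue.approve(draft)
print(result, len(sent))
```

    In a Notion-centered stack the queue would more likely be a database of drafts than an in-memory list, but the shape is the same.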

    7. Credit efficiency for genuinely reasoning-heavy work

    This one is practical rather than architectural. Starting May 4, 2026, Custom Agents run on Notion Credits at roughly $10 per 1,000 credits. Internal Notion data suggests Custom Agents run approximately 45–90 times per 1,000 credits for typical tasks, with the spread driven by complexity: tasks that require more steps, more tool calls, or more context cost proportionally more credits per run. Simple recurring tasks are cheap. Complex reasoning-heavy tasks add up.

    If you’re building an agent that does heavy reasoning work many times per day, the credit cost can exceed what the same work would cost through Claude’s API directly, especially on higher-capability Claude models called directly without the Notion overhead. For high-frequency reasoning work, run the math before you commit to the agent architecture.
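
    Running the math takes a few lines. This sketch uses only the two figures quoted above ($10 per 1,000 credits, 45–90 runs per 1,000 credits); the workload of 20 runs per day and the 30-day month are assumptions chosen for illustration, and the function names are mine. It covers only the Notion side of the comparison, since the Claude API side depends on your model and token volume.

```python
PRICE_PER_1000_CREDITS = 10.00     # USD, figure quoted in the article
RUNS_PER_1000_CREDITS = (45, 90)   # (heavier tasks, lighter tasks), same source


def cost_per_run(runs_per_1000: int) -> float:
    """Dollar cost of a single agent run at a given credit efficiency."""
    return PRICE_PER_1000_CREDITS / runs_per_1000


def monthly_cost(runs_per_day: int, runs_per_1000: int, days: int = 30) -> float:
    """Dollar cost of one agent over a month at a steady run rate."""
    return runs_per_day * days * cost_per_run(runs_per_1000)


heavy, light = RUNS_PER_1000_CREDITS
print(f"per run: ${cost_per_run(light):.2f} to ${cost_per_run(heavy):.2f}")

# Hypothetical workload: one agent firing 20 times a day.
low = monthly_cost(20, light)
high = monthly_cost(20, heavy)
print(f"20 runs/day: ${low:.2f} to ${high:.2f} per month")
```

    At the quoted rates a single run lands somewhere around eleven to twenty-two cents, which is negligible once a day and noticeable at dozens of runs per day across several agents.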


    Where Claude genuinely wins

    The other side of the honest comparison. Claude earns its place in categories where Custom Agents either can’t operate or operate poorly.

    Strategic thinking conversations. When you’re working through a decision, evaluating a tradeoff, or thinking through a strategy, Claude in an extended conversation is the right tool. The back-and-forth is the whole point. You can interrogate reasoning, push back on conclusions, reframe the problem mid-conversation. An agent that produces a one-shot answer, no matter how good, is the wrong shape for this kind of work.

    Drafting with voice. Writing that needs to sound like a specific person is Claude’s territory. You can load up Claude with context about your voice (past writing, tonal preferences, things to avoid) and get output that actually reads as yours. Notion Custom Agents default to generic-flavored writing, and instructions only partly offset that. That’s fine for internal reports. It’s a problem for anything external.

    Code and technical work. Claude Code specifically is built for technical depth. It reads codebases, executes in a terminal, calls tools, iterates on failures. Custom Agents will flail at the same work.

    Research synthesis across live sources. Claude with web search and fetch capabilities handles “go read this, this, and this, and tell me what the current state actually is” in a way Custom Agents structurally can’t. Anything that requires reaching outside a known data universe is Claude.

    Work that crosses many systems. When a workflow needs to touch code, Notion, a database, an external API, and a human review, Claude Code with the right MCP servers connected coordinates across them better than a Custom Agent inside Notion does. The agent’s world is Notion-plus-connected-integrations. Claude’s world is wider.

    Anything requiring judgment about whether to proceed. Agents execute. Claude in a conversation can pause, check with you, and ask “should I actually do this?” That judgment layer is frequently the most important part of the workflow.


    The pattern that actually works (both, in the right places)

    The operators who get this right aren’t choosing one tool over the other. They’re running both, in specific roles, with clear handoffs.

    The pattern I run:

    Rhythmic operational work lives in Custom Agents. Morning briefs, triage, weekly reviews, database enrichment, Q&A over workspace knowledge. Things that happen repeatedly, have clear inputs, and produce workspace-shaped outputs.

    Judgment-heavy work lives in Claude conversations. Strategic decisions, drafting with voice, research, anything requiring back-and-forth. I do this work in Claude chat sessions with the Notion MCP wired in, so Claude has real context when I need it to.

    Technical work lives in Claude Code. Building scripts, managing infrastructure, debugging, writing code. Custom Agents don’t touch this.

    Handoffs are explicit. When I make a decision in Claude that needs to become operational, it lands as a task or brief in a Notion database, and from there a Custom Agent can pick it up. When a Custom Agent surfaces something that needs judgment, it creates an escalation entry that shows up on my Control Center, where I engage Claude to think through it.

    The two systems pass work back and forth through the workspace. Neither tries to do the other’s job. The seams are the Notion databases where state lives.

    This is not the vendor-shaped pattern. The vendor-shaped pattern says “Custom Agents can handle everything.” The operator-shaped pattern says “Custom Agents handle what they’re good at, and when the work exceeds their reach, another tool takes over with a clean handoff.”


    The decision tree, when you’re not sure

    For a specific piece of work, run these questions in order. Stop at the first “yes.”

    Does this task need a specific human voice, rather than something any competent person could write? If yes, Claude. If no, move on.

    Does this task require reasoning across contested or ambiguous information? If yes, Claude. If no, move on.

    Does this task need real-time web context, live external data, or information not already in a known database? If yes, Claude. If no, move on.

    Does this task involve code, system architecture, or technical depth? If yes, Claude Code. If no, move on.

    Does this task have high-stakes irreversible consequences? If yes, wrap it in a human-approval gate — either run it through Claude where the human is in the loop, or use Custom Agents with “Always Ask” behavior.

    Does this task happen repeatedly on a schedule or in response to workspace events? If yes, Custom Agent. This is the sweet spot.

    Is the output a Notion page, database row, or something that stays in the workspace? If yes, Custom Agent is usually the right call.

    Is the task bounded enough that it could be described in a couple of clear sentences? If yes, Custom Agent. If it’s sprawling, it’s probably too big for an agent.

    If you’re through the tree and still not sure, default to Claude. Claude is more expensive in money and cheaper in hidden cost than a Custom Agent running the wrong job.
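
    The tree is mechanical enough to write down as code, which is a useful check that the order of the questions is doing real work. This is a sketch only: the Task fields and the route function are hypothetical names I have given to the eight questions, and the returned strings are labels, not product APIs.

```python
from dataclasses import dataclass


@dataclass
class Task:
    needs_own_voice: bool = False
    ambiguous_reasoning: bool = False
    needs_live_web: bool = False
    is_technical: bool = False
    irreversible: bool = False
    recurring_or_event_driven: bool = False
    output_stays_in_workspace: bool = False
    fits_in_two_sentences: bool = False


def route(task: Task) -> str:
    """Ask the questions in order and stop at the first 'yes'."""
    questions = [
        (task.needs_own_voice, "Claude"),
        (task.ambiguous_reasoning, "Claude"),
        (task.needs_live_web, "Claude"),
        (task.is_technical, "Claude Code"),
        (task.irreversible, "Human-approval gate"),
        (task.recurring_or_event_driven, "Custom Agent"),
        (task.output_stays_in_workspace, "Custom Agent"),
        (task.fits_in_two_sentences, "Custom Agent"),
    ]
    for answer, destination in questions:
        if answer:
            return destination
    return "Claude"  # through the tree and still unsure: default to Claude


morning_brief = Task(recurring_or_event_driven=True, output_stays_in_workspace=True)
client_essay = Task(needs_own_voice=True, recurring_or_event_driven=True)

print(route(morning_brief))
print(route(client_essay))
```

    The second example shows why order matters: a weekly essay in your own voice is recurring, but the voice question comes first, so it never reaches the Custom Agent branch.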


    The failure modes I’ve seen

    Specific patterns that go wrong, in my observation:

    The “agent for everything” operator. Someone who just got access to Custom Agents and is building agents for tasks that don’t need agents. The ones that work waste credits on tasks a template or a simple automation would handle. The ones that only partially work produce quiet, low-grade mistakes that accumulate.

    The “Claude for everything” operator. The inverse. Someone who got comfortable with Claude and hasn’t made the leap to letting agents handle the rhythmic work. They’re paying the context-loss tax every morning, doing the triage manually, writing every brief from scratch. Claude is too expensive a tool — in attention, if not dollars — to run routine work through.

    The operator who built one giant agent. Custom Agents are meant to be narrow. Someone violates the “one agent, one job” principle by building an agent that does inbox triage and database updates and weekly reports and client communications. The agent becomes hard to debug, expensive to run, and unreliable across its many hats. The fix is almost always breaking it into three or four single-purpose agents.

    The operator who didn’t build review gates. An agent sending emails without human approval. An agent deleting rows based on inferred criteria. An agent updating client-facing pages from an unchecked data source. The cost of the first real mistake exceeds the cost of the review gate that would have prevented it, every time.

    The operator who never checked credit consumption. Custom Agents consume credits based on model, steps, and context size. An operator who built ten agents and never looked at the dashboard ends up surprised when the monthly bill is much higher than expected. The fix is easy — Notion ships a credits dashboard — but it has to actually get checked.


    The honest note on timing

    A piece of this article that ages. These comparisons are true in April 2026. Custom Agents are new enough that the feature set will expand significantly over the next year. Claude is evolving rapidly. The specific gaps I’ve named may close; new gaps may open in different directions.

    What won’t change is the pattern: some work wants a specialized tool, some work wants a general-purpose one. Some work is rhythmic, some is judgment-driven. Some work lives inside a workspace, some crosses systems. The vocabulary for when to use which tool will evolve; the underlying truth that different shapes of work deserve different tools will not.

    If you’re reading this in 2027 and Custom Agents have shipped fifteen new capabilities, the specific “can’t do” list will be shorter. The decision tree earlier in this article will still work. That’s the part worth holding onto.


    What I’m not saying

    A few clarifications because I want to be clear about what this article is and isn’t.

    I’m not saying Custom Agents are bad. They’re genuinely good at what they’re good at. They’re saving me hours per week on work I used to do manually.

    I’m not saying Claude is strictly better. Claude is more capable at a broader set of tasks, but it also costs more, requires active operator engagement, and can’t sit in the background running overnight rhythms the way Custom Agents can.

    I’m not saying there’s one right answer for every operator. Different operators with different businesses and different workflows will land on different splits. The decision tree helps, but it’s a starting point, not a conclusion.

    I’m not saying this is permanent. Tool landscapes change fast. Six months from now there may be categories where Custom Agents beat Claude that don’t exist today, and vice versa. What matters is developing the habit of asking “which tool is this work actually shaped for?” instead of defaulting to whichever one you learned first.


    The one thing I’d want you to walk away with

    If you read nothing else in this article, this is the sentence I’d want in your head:

    Rhythmic operational work wants an agent; judgment-heavy work wants a conversation.

    That distinction — rhythm versus judgment — cuts through almost every architecture question you’ll have when deciding what to route where. It’s not the only dimension that matters, but it’s the one that settles the most decisions correctly.

    Work that happens on a schedule or in response to an event, with bounded inputs and clear outputs? That’s rhythm. Build a Custom Agent.

    Work that requires thinking through tradeoffs, integrating disparate information, or producing output with specific voice and judgment? That’s a conversation. Engage Claude.

    Get that right for most of your workflows and the rest of the architecture tends to sort itself out.


    FAQ

    Can’t Custom Agents do everything Claude can do, just inside Notion? No. Custom Agents are optimized for bounded, rhythmic, workspace-shaped tasks. They can technically attempt work that requires deep reasoning, specific voice, or live external context, but the results degrade in predictable ways. Claude — in a conversation or in Claude Code — handles those categories better.

    Should I just use Claude for everything then? No. Rhythmic operational work — morning briefs, triage, weekly reports, database enrichment — is genuinely better in Custom Agents than in Claude, because the “autonomous teammate running while you sleep” property matters. The right answer is running both, in their respective sweet spots.

    What’s the cost comparison? Starting May 4, 2026, Custom Agents cost roughly $10 per 1,000 Notion Credits. Internal Notion data suggests agents run approximately 45–90 times per 1,000 credits depending on task complexity. Claude’s subscription pricing is flat. For high-frequency simple tasks, Custom Agents are usually cheaper. For heavy reasoning work done many times per day, running Claude directly can be more cost-efficient.

    What about Notion Agent (the personal one) versus Claude? Notion Agent is Notion’s on-demand personal AI — you prompt it, it responds. It’s fine for in-workspace tasks where you need AI help with content you’re already looking at. For deeper reasoning, complex drafting, or cross-tool work, Claude is more capable. Notion Agent is a good ambient utility; Claude is a general-purpose intelligence layer.

    Which should I learn first if I’m new to both? Claude. Learn to think with an AI as a thinking partner before you try to build autonomous agents. Once you understand what AI can and can’t do in a conversation, the design decisions for Custom Agents become much clearer. Jumping to Custom Agents without the Claude foundation is how operators end up with agents that don’t work as expected.

    Can Custom Agents use Claude models? Yes. Custom Agents let you pick the AI model they run on. Claude Sonnet 4.6 and Claude Opus 4.7 are both available, along with GPT-5 and various other models. This means the underlying intelligence of a Custom Agent can be Claude — you’re choosing between Claude-as-conversation (claude.ai, Claude Desktop, Claude Code) and Claude-as-embedded-agent (Custom Agent running Claude). Different interfaces, same underlying model in that case.

    What if I want Claude to work autonomously on a schedule like Custom Agents do? Possible, but requires more work. Claude Code can be scripted; you can run it on a cron job; you can set up headless workflows. But the “out of the box autonomous teammate” experience is Notion’s current strength, not Anthropic’s. If you want autonomous-background-work without building your own infrastructure, Custom Agents are easier.

    How do I decide for my specific situation? Run the decision tree in the article. If you’re still unsure, default to Claude — it’s the more general-purpose tool, and the cost of using the wrong tool for judgment-heavy work is higher than the cost of using the wrong tool for rhythmic work. You can always migrate a recurring workflow to a Custom Agent once you understand the shape.


    Closing note

    The honest comparison isn’t one tool versus the other. It’s understanding that different shapes of work want different shapes of tool, and that most operators lose more time to the mismatch than to any individual tool’s limitations.

    Custom Agents are good at being Custom Agents. Claude is good at being Claude. Neither is good at being the other. Use both, in the places each belongs, with clean handoffs between them, and the stack hums.

    Skip the vendor narratives. Read your own workflows. Route each piece to the tool it’s actually shaped for. That’s the whole game.


  • How to Wire Claude Into Your Notion Workspace (Without Giving It the Keys to Everything)

    Last refreshed: May 15, 2026

    Update — May 15, 2026: On May 13, 2026, Notion shipped the Notion Developer Platform (version 3.5), with Claude as a launch partner. The platform adds Workers, database sync, an External Agents API, and a Notion CLI. The patterns described in this article still work, but there is now a native, sanctioned alternative for some of what previously required custom MCP wiring or third-party automation. For the full breakdown of what changed and what it means for the Notion + Claude stack, see Notion Developer Platform Launch (May 13, 2026). For the underlying operating philosophy, see The Three-Legged Stack.

    The step most tutorials skip is the one that actually matters.

    Every guide to connecting Claude to Notion walks you through the same mechanical sequence — OAuth flow, authentication, running claude mcp add, and done. It works. The connection lights up, Claude can read your pages, write to your databases, and suddenly your AI has the run of your workspace. The tutorials stop there and congratulate you.

    Here’s the part they don’t mention: according to Notion’s own documentation, MCP tools act with your full Notion permissions — they can access everything you can access. Not the pages you meant to share. Everything. Every client folder. Every private note. Every credential you ever pasted into a page. Every weird thing you wrote about a coworker in 2022 and forgot was there.

    In most setups the blast radius is enormous, the visibility is low, and the decision to lock it down happens after something goes wrong instead of before.

    This is the guide that takes the extra hour. Wiring Claude into your Notion workspace is straightforward. Wiring Claude into your Notion workspace without giving it the keys to everything takes a few additional decisions, a handful of specific configuration choices, and a mental model for what should and shouldn’t flow across the connection. That’s the hour worth spending.

    I run this setup across a real production workspace with dozens of active properties, real client work, and data I genuinely don’t want an AI to have unbounded access to. The pattern below is what works. It is also honest about what doesn’t.


    Why Notion + Claude is worth doing carefully

    Before the mechanics, it’s worth being clear about what you get when you wire this up correctly.

    Claude with access to Notion is not Claude with a better search function. It is a Claude that can read the state of your business — briefs, decisions, project status, open loops — and reason across them to help you run the operation. It can draft follow-ups to conversations it finds in your notes. It can pull together summaries across projects. It can take a decision you’re weighing, find every related piece of context in the workspace, and give you a grounded opinion instead of a generic one.

    That’s the version most operator-grade users want. And it’s only valuable if the trust boundary is drawn correctly. A Claude that has access to your relevant context is a superpower. A Claude that has access to everything you’ve ever written is a liability waiting to catch up with you.

    The whole article is about drawing that boundary on purpose.


    The two connection options (and which one you actually want)

    There are two ways to connect Claude to Notion in April 2026, and the right one depends on what you’re doing.

    Option 1: Remote MCP (Notion’s hosted server). You connect Claude — whether that’s Claude Desktop, Claude Code, or Claude.ai — to Notion’s hosted MCP endpoint at https://mcp.notion.com/mcp. You authenticate through OAuth, which opens a browser window, you approve the connection, and it’s live. Claude can now read from and write to your workspace based on your access and permissions.

    This is the officially supported path. Notion’s own documentation explicitly calls remote MCP the preferred option, and the older open-source local server package is being deprecated in favor of it. For most operators, this is the right answer.

    Option 2: Local MCP (the legacy / open-source package). You install @notionhq/notion-mcp-server locally via npm, create an internal Notion integration to get an API token, and configure Claude to talk to the local server with your token. You then have to manually share each Notion page with the integration one by one — the integration only sees pages you explicitly grant access to.

    This path is more work and is being phased out. But there’s one genuine reason to still use it: headless automation. The local path authenticates with a token, so it works where no human is around to click OAuth buttons. The remote path does not. The remote Notion MCP server requires user-based OAuth authentication and does not support bearer token authentication, so a user must complete the OAuth flow to authorize access, which may not be suitable for fully automated workflows.

    For 95% of setups, remote MCP is the right answer. For the 5% running true headless agents, the local package is still the pragmatic choice even though it’s on its way out.

    The rest of this guide assumes remote MCP. I’ll flag the places the advice differs for local.


    The quiet part Notion tells you out loud

    Before we get to the setup, one more thing you need to internalize because it shapes every decision below.

    From Notion’s own help center: MCP tools act with your full Notion permissions — they can access everything you can access.

    Read that sentence twice.

    If you are a workspace member with access to 140 pages across 12 databases, your Claude connection can access 140 pages across 12 databases. Not the 15 you’re working on today. All of them. OAuth doesn’t scope you down to “this project.” It says yes or no to “can Claude see your workspace.”

    This is fine when your workspace is already organized the way you’d want an AI to see it. It is catastrophic when it isn’t, because most workspaces have accumulated years of drift, private notes, credential-adjacent content, sensitive client data, and old experiments that nobody bothered to clean up.

    So before you connect anything, you do the workspace audit. Not because Notion says so. Because your future self will thank you.


    The pre-connection audit (the step tutorials skip)

    Fifteen minutes with the workspace, before you click the OAuth button. Here’s the checklist I run through:

    Find anything that looks like a credential. Search your workspace for the words: password, API key, token, secret, bearer, private key, credentials. Read the results. Move anything sensitive to a credential manager (1Password, Bitwarden, a password-protected vault — not Notion). Delete the Notion copies.

    Find anything you wouldn’t want an AI to read. Search for: divorce, legal, lawsuit, personal, venting, complaint, therapist. Yes, really. People put things in Notion they’ve forgotten are in Notion. An AI that has access to everything you can access will find those things and occasionally surface them in responses. This is embarrassing at best and career-ending at worst.

    Look at your database of clients or contacts. Is there anything in there that shouldn’t travel through an AI provider’s servers? Notion processes MCP requests through Notion’s infrastructure, not yours. Sensitive legal matters, medical information, financial details about third parties — these may deserve a workspace or sub-page that stays outside of what Claude is allowed to see.

    Identify what Claude actually needs. Make a short list: your active projects, your working databases, your briefs page, your daily/weekly notes. This is what you actually want Claude to have context on. The rest is noise.

    Decide your posture. Two options here. You can run Claude against your main workspace and accept the blast radius, or you can create a separate workspace (or a teamspace) that contains only the pages and databases you want Claude to see, and connect Claude to that one. The second option is more work upfront. It is also the only version that actually draws the boundary.

    I run the second option. My Claude-facing workspace is genuinely a subset of what I work with, and the rest of my Notion is on a different membership. It took an hour to set up. It was worth it.
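
    If your workspace is too large to audit by hand in Notion’s search box, the same keyword pass can be run over a local copy. This is a rough sketch under one assumption that is mine, not part of the checklist above: that you have exported the workspace to Markdown files on disk. The term lists come straight from the checklist; audit_export is a hypothetical helper, and the demo runs against a throwaway directory rather than real data.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, List

# Search terms taken from the checklist above.
CREDENTIAL_TERMS = ["password", "api key", "token", "secret", "bearer",
                    "private key", "credentials"]
PRIVATE_TERMS = ["divorce", "legal", "lawsuit", "personal", "venting",
                 "complaint", "therapist"]


def audit_export(root: Path, terms: List[str]) -> Dict[str, List[str]]:
    """Return {relative file path: [terms found]} for every .md file under root."""
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    hits: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        found = sorted({m.group(0).lower() for m in pattern.finditer(text)})
        if found:
            hits[str(path.relative_to(root))] = found
    return hits


# Demo against a temporary directory standing in for a real export.
with tempfile.TemporaryDirectory() as tmp:
    export = Path(tmp)
    (export / "onboarding.md").write_text("Staging API key: abc123\n", encoding="utf-8")
    (export / "roadmap.md").write_text("Q3 priorities\n", encoding="utf-8")
    report = audit_export(export, CREDENTIAL_TERMS + PRIVATE_TERMS)

for page, terms in report.items():
    print(page, "->", ", ".join(terms))
```

    The match is a plain substring search, so it will over-report (“legal” also hits “illegal”). For an audit that is the right direction to err in: the output is a reading list, not a verdict.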


    Connecting remote MCP to Claude Desktop

    Now the mechanics. Starting with Claude Desktop because it’s the simplest.

    Claude Desktop gets Notion MCP through Settings → Connectors (not the older claude_desktop_config.json file, which is being phased out for remote MCP). This is available on Pro, Max, Team, and Enterprise plans.

    Open Claude Desktop. Settings → Connectors. Find Notion (or add a custom MCP server with the URL https://mcp.notion.com/mcp). Click Connect. A browser window opens, Notion asks you to authenticate, you approve. Done.

    The connection now lives in your Claude Desktop. You can start a new conversation and ask Claude to read a specific page, summarize a database, or draft something based on workspace content, and it will.

    One hygiene note: Claude Desktop connections are per-account. If you have multiple Claude accounts (say, a personal Pro and a work Max), each one needs its own connection to Notion. The good news is you can point each one at a different Notion workspace — personal Claude at personal Notion, work Claude at work Notion. This is the operator pattern I recommend for anyone running more than one business context through Claude.


    Connecting remote MCP to Claude Code

    Claude Code is the path most operators actually run at depth, because it’s the version of Claude that lives in your terminal and can compose MCP calls into real workflows.

    The command is one line:

    claude mcp add --transport http notion https://mcp.notion.com/mcp

    Then authenticate by running /mcp inside Claude Code and following the OAuth flow. Browser opens, Notion asks you to authorize, you approve, and the connection is live.

    A few options worth knowing about at setup time:

    Scope. The --scope flag controls who gets access to the MCP server on your machine. Three options: local (default, just you in the current project), project (shared with your team via a .mcp.json file), and user (available to you across all projects). For Notion, user scope is usually right — you’ll want Claude to reach Notion from any project you’re working in, not just the current one.

    The richer integration. Notion also ships a plugin for Claude Code that bundles the MCP server along with pre-built Skills and slash commands for common Notion workflows. If you’re doing this seriously, install the plugin. It adds commands like generating briefs from templates and opening pages by name, and saves you from writing your own.

    Checking what’s connected. Inside Claude Code, /mcp lists every MCP server you’ve configured. /context tells you how many tokens each one is consuming in your current session. For Notion specifically, this is useful because MCP servers have non-zero context cost even when you’re not actively using them — every tool exposed by the server sits in Claude’s context, eating tokens. Running /context occasionally is how you notice when an MCP connection is heavier than you expected.


    The permissions pattern that actually protects you

    Now we’re past the mechanics and into the hygiene layer — the part that most guides don’t cover.

    Once Claude is connected to your Notion workspace, there are three specific configuration moves worth making. None of them are hard. All of them pay rent.

    1. Scope the workspace, don’t scope the connection

    The OAuth connection doesn’t let you say “Claude can see these pages but not those.” It lets you say “Claude can see this workspace.” So the place to draw the boundary is at the workspace level, not at the connection level.

    If you have sensitive content in your main workspace, move it. Create a separate workspace for Claude-facing content and keep the sensitive stuff out. Or use Notion’s teamspace feature (Business and Enterprise) to isolate access at the teamspace level.

    This feels like over-engineering until the first time Claude surfaces something in a response that you had forgotten was in your workspace. After that, it doesn’t feel like over-engineering.

    2. For Enterprise: turn on MCP Governance

    If you’re on the Enterprise plan, there’s an admin-level control worth enabling even if you trust your team. From Notion’s docs: with MCP Governance, Enterprise admins can approve specific AI tools and MCP clients that can connect to Notion MCP — for example Cursor, Claude, or ChatGPT. The approved-list pattern is opt-in: Settings → Connections → Permissions tab, set “Restrict AI tools members can connect to” to “Only from approved list.”

    Even if you only approve Claude today, the control gives you the ability to see every AI tool anyone on your team has connected, and to disconnect everything at once with the “Disconnect All Users” button if you ever need to. That’s the kind of control you want to have configured before you need it, not after.

    3. For local MCP: use a read-only integration token

    If you’re using the local path (the open-source @notionhq/notion-mcp-server), you have more granular control than the remote path gives you. Specifically: when you create the integration in Notion’s developer settings, you can set it to “Read content” only — no write access, no comment access, nothing but reads.

    A read-only integration is the right default for anything exploratory. If you want Claude to be able to write too, enable write access later when you’ve decided you trust the specific workflow. Don’t give write access by default just because the integration setup screen presents it as an option.

    This is the one place the local path is actually stronger than remote — you can shape the integration’s capabilities before you grant it access, and the integration only sees the specific pages you share with it. For high-sensitivity setups, this granularity is worth the tradeoff of running the legacy package.


    Prompt injection: the risk nobody wants to talk about

    One more thing before we leave the hygiene section. It’s the thing the industry is least comfortable being direct about.

    When Claude has access to your Notion workspace, Claude also reads whatever is in your Notion workspace. Including pages that came from outside. Including meeting notes that were imported from a transcript service. Including documents shared with you by clients. Including anything you pasted from the web.

    Every one of those is a potential vector for prompt injection — hidden instructions buried in content that, when Claude reads the content, hijack what Claude does next.

    This is not theoretical. Anthropic itself flags prompt injection risk in the MCP documentation: be especially careful when using MCP servers that could fetch untrusted content, as these can expose you to prompt injection risk. Notion has shipped detection for hidden instructions in uploaded files and flags suspicious links for user approval, but the attack surface is larger than any detection system can fully cover.

    The practical operator response is three-part:

    Don’t give Claude access to content you didn’t write, without reading it first. If a client sends you a document and you paste it into Notion and Claude has access to that database, you have effectively given Claude the ability to be instructed by your client’s document. This might be fine. It might be a problem. Read the document before it goes into a Claude-accessible location.

    Be suspicious of workflows that chain untrusted content into actions. A workflow where Claude reads a web-scraped summary and then uses that summary to decide which database row to update is a prompt injection target. If the scraped content can shape Claude’s action, the scraped content can be weaponized.

    Use write protections for anything consequential. Anything where the cost of Claude doing the wrong thing is real — sending an email, deleting a record, updating a client-facing page — belongs behind a human-approval gate. Claude Code supports “Always Ask” behavior per-tool; use it for writes.

    This sounds paranoid. It’s not paranoid. It’s the appropriate level of caution for a class of attack that is genuinely live and that the industry has not yet figured out how to fully defend against.
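
    The third part of that response, the human-approval gate, can be sketched in a few lines. This is a minimal illustration, not any vendor's API: the tool-call shape and the tool names are hypothetical, and a real MCP server will name its tools differently. The one design choice worth copying is the direction of the default: known read-only tools pass through, and everything else, including tools the gate has never heard of, waits for an explicit yes.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical read-only tool names, for illustration only.
# Anything NOT on this list is treated as a write and needs approval.
READ_ONLY_TOOLS = {"search_pages", "fetch_page"}


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def gate(call: ToolCall, approve: Callable[[ToolCall], bool]) -> bool:
    """Return True if the call may run.

    Known read-only tools run without asking. Every other tool runs only
    if the human-supplied `approve` callback says yes for this exact call.
    """
    if call.name in READ_ONLY_TOOLS:
        return True
    return approve(call)


def ask_on_terminal(call: ToolCall) -> bool:
    """Example approver: show the exact action and require a typed 'yes'."""
    answer = input(f"Allow {call.name} with {call.args}? Type 'yes' to proceed: ")
    return answer.strip().lower() == "yes"
```

    In use, the approver sees the concrete action (which tool, which arguments), not a vague summary, so the human is approving what will actually happen.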


    What this actually enables (the payoff section)

    Once you’ve done the setup and the hygiene work, here’s what you now have.

    You can sit down at Claude and ask it questions that require real workspace context: “What’s the status of the three projects I touched last week?” “Pull together everything we’ve decided about pricing across the client work this quarter.” “Draft a response to this incoming email using context from our ongoing conversation with this client.” Claude reads the relevant pages, synthesizes across them, and responds with actual grounding, not a generic answer shaped by whatever prompt you happen to type.

    You can run Claude Code against your workspace for development-adjacent operations: “Generate a technical spec from our product page notes.” “Create release notes from the changelog and feature pages.” “Find every page where we’ve documented this API endpoint and reconcile the inconsistencies.”

    You can set up workflows that flow across tools. Claude reads from Notion, acts on another system via a different MCP server, writes results back to Notion. This is the agentic pattern the industry keeps talking about — and with the right permissions hygiene, it actually becomes usable instead of scary.

    None of this is theoretical. I use this pattern every working day. The value is real. The hygiene discipline is what keeps the value from turning into a liability.


    When this setup goes wrong (troubleshooting honestly)

    Five failure modes I’ve seen, in order of frequency.

    Claude doesn’t see the page you asked about. For remote MCP, this almost always means the page is in a workspace you’re not a member of, or in a teamspace you don’t have access to. For local MCP, it means the integration hasn’t been granted access to that specific page — you have to go to the page, click the three-dot menu, and add the integration manually.

    OAuth flow doesn’t complete. Usually a browser issue: a popup blocker, the wrong Notion account signed in, or an expired session. Clear auth and try again. If you’re on Claude Desktop, disconnect the connector entirely and re-add it.

    The connection succeeds but Claude doesn’t seem to be using it. Run /mcp in Claude Code to verify the server is listed and connected. If it’s there and Claude still isn’t invoking it, the issue is usually in how you’re asking. Claude won’t reach for MCP tools just because they exist; you need to phrase the request in a way that makes it obvious the tool is relevant. “Find the page about X in Notion” works better than “tell me about X.”

    MCP server crashes or returns errors. For remote, this is rare and usually resolves itself — Notion’s hosted server has the standard cloud-reliability profile. For local, check your Node version (the server requires Node 18 or later), your config file syntax (JSON is unforgiving about trailing commas), and your token format.

    Context token budget goes through the roof. Every MCP server in your connected list contributes tools to Claude’s context on every request. If you have five MCP servers configured, that’s five sets of tool descriptions being loaded into every conversation. Run /context in Claude Code to see the cost. If it’s painful, disconnect the servers you’re not actively using.
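
    For the local-server case above, the config syntax check is the easiest one to automate. A minimal sketch, using only Python’s standard json module, which rejects trailing commas and reports where parsing failed. The config below is an illustrative shape with a deliberate mistake in it, not a guaranteed match for the schema your client expects, so check your client’s documentation for the exact keys.

```python
import json


def check_config(text: str) -> str:
    """Return 'ok' if text is valid JSON, else a message with the error position."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    return "ok"


# Illustrative config with a trailing comma after the last server entry.
broken = """{
  "mcpServers": {
    "notion": {"command": "npx", "args": ["-y", "@notionhq/notion-mcp-server"]},
  }
}"""

# Same config with the trailing comma removed.
fixed = broken.replace("]},", "]}")

print(check_config(broken))
print(check_config(fixed))
```

    The exact wording of the error message varies between Python versions, but the line and column it reports point at the spot to look at.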


    The mental model that keeps you sane

    Here’s the mental model I use for the whole setup. It’s short.

    Claude plus Notion is like giving a new, very capable employee access to your business. You wouldn’t hand a new hire every password, every file, every client record, every private note on day one. You’d give them access to the specific things they need to do the job, watch how they use that access, and expand trust over time based on track record.

    The MCP connection works exactly that way. You decide what Claude gets to see. You decide what Claude gets to write. You watch how it uses that access. You expand the boundary as trust earns itself.

    The operators who get hurt by this kind of setup are the ones who skip the first step and give Claude everything on day one. The operators who get the real value out of it are the ones who treat the connection the way they’d treat any other employee — with deliberate scope, real oversight, and the willingness to revoke access if something goes wrong.

    That’s the discipline. That’s the whole thing.


    FAQ

    Do I need to install anything to connect Claude to Notion? For remote MCP (the recommended path), no installation is required — you connect via OAuth through Claude Desktop’s Settings → Connectors or Claude Code’s claude mcp add command. For local MCP (legacy), you install @notionhq/notion-mcp-server via npm and create an internal Notion integration.

    What’s the URL for Notion’s remote MCP server? https://mcp.notion.com/mcp. Use HTTP transport (not the deprecated SSE transport).

    Can Claude see my entire Notion workspace by default? With remote MCP, yes. The tools act with your full Notion permissions, so they can access everything you can access. The boundary is set by your workspace membership and teamspace access, not by the MCP connection itself. (The legacy local integration is narrower: it sees only the pages you explicitly share with it.) If you need finer-grained control, isolate Claude-facing content into a separate workspace or teamspace.

    Can I use Notion MCP with automated, headless agents? Remote Notion MCP requires OAuth authentication and doesn’t support bearer tokens, which makes it unsuitable for fully automated or headless workflows. For those cases, the legacy @notionhq/notion-mcp-server with an API token still works, but it’s being phased out.

    What plans support Notion MCP? Notion MCP is available on all Notion plans. Enterprise plans add admin-level MCP Governance controls (approved AI tool list, disconnect-all). Claude Desktop MCP connectors are available on Pro, Max, Team, and Enterprise plans.

    Can my company’s admins control which AI tools connect to our Notion workspace? Yes, on the Enterprise plan. Admins can restrict AI tool connections to an approved list through Settings → Connections → Permissions tab. Only admin-approved tools can connect.

    Is Notion MCP secure for confidential business data? The MCP protocol itself respects Notion’s permissions — it can’t bypass what you have access to. However, content flowing through MCP is processed by the AI tool you’ve connected (Claude, ChatGPT, etc.), which has its own data handling policies. For highly sensitive content, the right move is to isolate it in a workspace that Claude doesn’t have access to, rather than relying on the protocol alone to contain it.

    What about prompt injection attacks through Notion content? Real risk. Anthropic explicitly flags it in their MCP documentation. Notion has shipped detection for hidden instructions and flags suspicious links, but no detection system catches everything. The operator response: don’t give Claude access to content you didn’t write without reviewing it first, be suspicious of workflows where untrusted content shapes Claude’s actions, and put human-approval gates on anything consequential.

    What’s the difference between Notion’s built-in AI and connecting Claude via MCP? Notion’s built-in AI (Notion Agent and Custom Agents) runs inside Notion and uses Notion’s integration with frontier models. Connecting Claude via MCP brings Claude — your chosen model, in your chosen interface, with its full capability — to your workspace as an external client. The built-in option is simpler; the MCP option is more powerful and composable across other tools.


    Closing note

    Most tutorials treat the connection as the goal. The connection is the easy part. The hygiene is the part that matters.

    If you wire Claude into your Notion workspace thoughtlessly, you’ve given a capable AI access to every corner of your operational history, and you’ll be surprised how much of what’s in there you’d forgotten. If you wire it in deliberately — with a scoped workspace, with the permissions you’ve thought about, with the posture of giving a new employee measured access — you’ve built something that pays rent every day without ever becoming the liability it could have been.

    One hour of setup. One hour of cleanup. And then you have one of the most useful AI configurations possible as of April 2026.

    The intersection of Notion and Claude is where the operator work actually happens now. Worth setting up right.


  • The Clean Tool: Why I Keep My Claude Empty of the People I Love

    A flagship essay on AI hygiene: what to store, what to keep out, and how to have the conversation about it with the people in your life.

    “What do you know about my girlfriend?”

    Last night my partner Stef asked me a question she had a right to ask. She wanted to know what my AI knew about her.

    I use Claude for hours a day. I run an agency on top of it. I have knowledge bases, project contexts, client stacks, and conversation histories going back years. She watched me work on the thing enough to assume that by now, surely, the AI had a rich picture of her — her sense of humor, her work, the shape of our relationship, the running jokes, the small details a partner remembers. She handed me her phone as a test of it. Let it tell me what it knows.

    The answer was almost nothing.

    My name for her. That she lives here. A few passing references to a Notion chat room she once set up, a voice memo she sent me that we extracted some thinking from. No sense of who she is as a person. No running joke the model could finish. No model of her at all, really.

    She was hurt in a flash, the way you get hurt by something that isn’t an injury but is still information. I was quietly proud, in a way I didn’t know how to explain in the moment. Both reactions were correct. That’s the thing I want to write about here — that the gap between her hurt and my pride is the shape of a whole category of questions almost nobody is asking out loud yet, and it is only going to get bigger.

    We talked about it for a while. I tried to explain why the tool was empty of her on purpose. She let me try. And what came out of the conversation was the argument I’m about to make, which I’ll phrase in one sentence up front so you can decide whether to keep reading:

    Keeping the people you love out of your AI is not forgetting them. It’s a specific kind of care. And the conversation you have about why they’re not in there is how you close the gap between what the tool knows and what the relationship deserves.

    If that sentence lands at all, the rest of this is the why, the how, and the honest version of what I’m still getting wrong.

    AI Memory Is Nuclear Power

    Here’s the frame that has organized my thinking on this for the last year.

    AI memory is nuclear power. Real civilization-scale utility on one side, real civilization-scale danger on the other, and almost nobody I’ve met is running a containment protocol worthy of the payload they’re storing.

    The analogy holds all the way down. The fuel is useful because it’s concentrated — that’s the whole point of a persistent memory that remembers your business, your family, your finances, your health, your history. Concentration is what makes the tool powerful. Concentration is also exactly what makes a spill catastrophic. And the people celebrating the new reactor are almost never the people thinking about the waste.

    The honest position on this, I’ve come to believe, is neither abstinence nor maximalism. It’s containment engineering. You build the reactor and the shielding. You use the tool and you design the protocol for when the tool fails. Pro-AI and pro-guardrail are the same position. Anyone telling you to choose one is selling you something.

    What makes this hard is that the stakes are asymmetric in a way most people never sit with directly. For the platform, your memory is one row in a table of billions — a single unit of risk distributed across a huge population. For you, your memory is a map of your life. The platform’s worst-case scenario is a rough quarter, a settlement, a bad headline. Your worst-case scenario is a destroyed marriage, a leaked client list, a legal catastrophe, a career-ending screenshot. These are not remotely comparable events, and they don’t scale the same way, and they do not reach any kind of equilibrium where the platform’s good-faith security policy protects the individual worst case. The platform is optimizing for its risk profile. Its risk profile is not yours. You are the only person whose worst-case scenario is your worst-case scenario.

    That asymmetry is why individual hygiene matters even when platform security is genuinely excellent. It’s why I don’t think this conversation is paranoid and I don’t think it’s solved and I don’t think you can outsource it.

    Three Failure Modes. Which One Are You?

    Most people running AI at any real depth fall into one of three failure modes, and most of them don’t know which one they’re in. Before I tell you what any of them are, I want you to place yourself while you read.

    The over-loader. This is the person who treats the AI as a second brain and dumps everything into it — credentials, relationships, grievances, client details, medical history, the long rambling voice-memo of what happened at Thanksgiving. It feels like investment. It feels like the tool getting smarter about them. It mostly is. But it also means one breach, one nosy partner, one subpoena, one bad exit from the platform turns the tool into a weapon pointed directly at the user. The over-loader’s failure mode is invisible until it isn’t.

    The under-loader. This is the person who keeps the tool so sterile it never reaches its potential — which is fine as far as it goes, except the humans in their life often discover, usually by accident, that they aren’t in the context at all. That discovery doesn’t land as safety. It lands as erasure. The under-loader’s failure mode is relational, not technical. The tool stays clean, and the relationships pay the cost the tool should have paid.

    The unaware. This is, honestly, most people. No mental model of what’s stored, where, for how long, or under whose policy. They’re making operational decisions — business decisions, relationship decisions, identity decisions — on top of a foundation they have never inspected. They don’t know their AI has memory in six places, not one. They don’t know where the off switch is. They assume chat history is the whole story when chat history is maybe 20 percent of it.

    The first hygiene move is always the same: figure out which mode you default to. Over-loaders need to prune. Under-loaders need to have a conversation with the humans they’ve been quietly protecting without telling them. The unaware need to spend thirty minutes mapping what they’ve actually agreed to.

    I’ve been all three at different points. Most operators I respect have been too. The point of the diagnostic isn’t to shame. It’s to make the failure mode visible enough that you can actually work on it.

    Clean Tool vs. Second Brain: The Choice You Might Not Know You’re Making

    There are two coherent philosophies for how to use AI at depth, and they are genuinely in tension.

    The Clean Tool approach says: the AI is an instrument. You keep it sharp by keeping it empty of identity. You bring the context you need into each session, do the work, and let the session close without leaving a permanent residue of who you were that day. The AI is like a great chef’s knife — it serves you best when it is exactly what it is, not a repository of everything you’ve ever cut with it.

    The Second Brain approach says: the AI is an extension of cognition. The more of you it holds, the more it can do for you. The payoff scales with the investment. Loading your thinking, your projects, your relationships, your patterns into the model is not a liability — it’s the whole point. You are building a partner that knows you well enough to anticipate you. The AI is like a lifelong collaborator who has read every note you ever took.

    Both are legitimate. Both have failure modes. The failure mode of the Clean Tool is that you never reach the depth of partnership that made you interested in AI in the first place — you end up with a very sharp instrument and no deep relationship with the work it enables. The failure mode of the Second Brain is that you build something you cannot leave, cannot audit, and cannot defend if it ever gets read by the wrong person.

    I run Clean Tool. I should say that plainly. I do not believe it is the only right answer. I believe it is the right answer for how I work, what I work on, and who the people around me are. My work touches client data, confidential business strategy, and a personal life I want to keep intact. The cost of a Second Brain leak, for me, is catastrophic in a way I cannot price. The cost of the Clean Tool is friction — I reload context more often, I carry more of my own thinking in my own head, I refuse some of the tool’s offers of recall. That friction is the price of sleeping well.

    I know thoughtful people who run Second Brain and run it well. They’ve built containment around it. They accept different tradeoffs. The worst place to be is the one most users actually occupy, which is a confused middle — enough invested that the data layer has weight, not enough discipline that the containment is real. You get the downsides of both and the upsides of neither.

    So if you take one frame from this piece: the choice isn’t which philosophy is correct. The choice is which one you are running, consciously, with the guardrails appropriate to that choice. Drifting into either by accident is what produces the failure modes nobody wants.

    The People Not in the Memory

    I want to go back to Stef, because this is the part of the piece that matters most to me and I’m not sure I’d trust anyone else to write it the way I need to write it.

    When Stef was hurt that the AI didn’t know her, I understood what she was feeling. The intuition beneath the hurt is simple and very human: you spend hours every day with this thing. It’s your work, your thinking, your hours. If you cared about me the way you care about the work, surely some of that care would show up in the tool. That intuition is not wrong in its values. It’s wrong in its mechanics.

    AI proximity is not relational proximity. Time-on-tool is the worst possible proxy for trust. A person can spend ten hours a day with an AI and share less of themselves with it than they share in a two-minute phone call with their sister. The tool is near you. It is not close to you. These words are not synonyms and they never have been, and the confusion of them is producing a whole new species of interpersonal hurt that our language doesn’t have good words for yet.

    Here is what I believe about the people in my life and my AI’s memory. Stef is not in the tool because she does not need to be in the tool for the tool to do its job. She matters because she is a person, not because the system has modeled her. Putting her in the context would not deepen my relationship with her. It would reduce her to a row in a store I don’t fully control, governed by a policy I did not write, subject to a retention schedule I did not negotiate, accessible to whoever eventually gets to see my session — a partner who leaves, a discovery motion, a breach, a curious kid, a future version of the platform with different terms. None of those futures are certain. All of them are possible. The cost of her being in there, in any of those futures, is hers to pay, not mine.

    And I love her. So she is not in there. That is the mechanism.

    The thing I couldn’t explain to her in the moment, but want to say here, is that the emptiness isn’t neglect. It’s restraint. It’s the same impulse that makes me not tell certain stories at parties even when they’d get a laugh, because they are hers to tell. It’s the same impulse that makes me lock my phone when I step away, even though the odds that anything bad happens in the next ninety seconds are vanishingly small. It’s the practice of treating the people you love as if their information is theirs, which is the simplest expression of respect I know.

    The conversation we had after her hurt was the actual repair. I told her why the tool was empty of her. I told her what was in the tool and what wasn’t. I offered to show her my memory settings, my projects, my contexts — not as a defensive move, but as a matter of domestic transparency. She didn’t take me up on it. The offer was enough. What closed the gap wasn’t the tool changing. It was me being able to say, out loud, you are not in there because I love you, and here is what I mean by that.

    If you use AI at the depth I do and you have people in your life, I think you owe them some version of that conversation. It is not a hard conversation. It is mostly just a clarifying one. But it has to actually happen. The gap between what your tool contains and what your relationship deserves does not close on its own.

    The Containment You Can Install Tonight

    After five sections of framing, you deserve something to do. Here are five moves. None takes more than twenty minutes. All five together take about an hour. If this is the only section of the piece you act on, you will be meaningfully safer tonight than you were this morning.

    Read your memory. Open whatever interface your AI gives you for stored memories — Claude’s memory settings, ChatGPT’s memory panel, whichever surface your platform exposes. Read every entry top to bottom. For each one, ask three questions: is this still true, is this still relevant, would I be comfortable if this leaked tomorrow? Anything that fails any of the three gets deleted or rewritten. Most people have never read their own AI memory end to end. Doing it once is often the moment the rest of this starts to feel real.

    Map the six surfaces. The chat history is not the whole memory. The whole memory is scattered across at least six surfaces: conversation history, persistent memory features, project knowledge bases, custom instructions, system prompts, and connected integrations (Drive, email, Notion, Slack). Each has a different retention policy. Each has a different surface for deletion. No single UI shows you the total picture. Sit down once and write out, for your specific AI stack, where all six surfaces live for you. This is a twenty-minute exercise that will clarify more than any article could.

    Scope your projects. Stop running one giant context that holds everything. Split into scoped projects — one for client work, one for personal writing, one for household, one for finance if you use it that way. Each project holds only the context it needs. The blast radius of any single compromise stays inside that one project. This is the same least-privilege principle engineers use for software access, applied to context.

    Lock the handoff. The threat model that matters for most individual users is not a sophisticated hacker. It’s the moment someone else touches your unlocked device — a partner borrowing the phone, a kid looking for the calculator, a colleague glancing at your screen, a support agent on a screenshare. Install a short, specific protocol: screen lock by default, session close on context switch, and a named practice for what happens when someone else uses your device. The worst leaks come from the most ordinary moments. Plan for those, not for the movie villain.

    Rotate what the AI has seen. Every credential that has ever appeared in an AI context — API key, password, token, connection string — goes on a rotation schedule the moment it enters. A ninety-day calendar reminder at minimum. Ideally, credentials never enter the AI directly at all; they live in a secrets manager and the AI calls through a proxy that holds the secret. Moving from the first version to the second is one afternoon of plumbing, and it is the single highest-leverage hygiene move an operator can make.

    These are not the whole practice. They are the starter kit. The practice compounds from here.
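
    The ninety-day rotation rule from the last move is small enough to track in code. A minimal sketch, assuming you keep a hand-maintained record of when each credential first appeared in an AI context. The credential labels and dates below are hypothetical, and the inventory holds labels only, never the secrets themselves.

```python
from datetime import date, timedelta

ROTATION_WINDOW = timedelta(days=90)  # the ninety-day minimum from the text


def overdue(first_seen: dict[str, date], today: date) -> list[str]:
    """Return labels of credentials at or past the rotation window, oldest first."""
    late = [name for name, seen in first_seen.items() if today - seen >= ROTATION_WINDOW]
    return sorted(late, key=lambda name: first_seen[name])


# Hypothetical inventory: credential label -> date it first entered an AI context.
inventory = {
    "wordpress-app-password": date(2026, 1, 5),
    "notion-integration-token": date(2026, 3, 20),
    "staging-db-connection-string": date(2025, 11, 30),
}

print(overdue(inventory, today=date(2026, 4, 15)))
```

    Run on a schedule, a check like this turns the calendar reminder into a list of specific things to rotate. It does not replace the better pattern of keeping credentials out of the AI context entirely.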

    The Harder Layer: What I’m Still Getting Wrong

    I want to write this section honestly because the alternative is writing it dishonestly, and there is no version of this piece that earns its argument if I pretend Tygart Media has this figured out.

    So. Here are some real mistakes.

    Earlier this month, the AI stack I use to automate WordPress work made an edit to a client site page without the kind of per-page human confirmation the situation deserved. The edit broke three live pages. The client was patient about it. The rollback worked. No business was lost. But the near-miss had the exact shape of the failure mode this whole piece warns about — capability ran ahead of containment, and a system I trusted made a change faster than my judgment could intervene. The lesson was immediate and I installed the guardrail that afternoon: any live-system action on a high-risk surface now requires explicit per-action confirmation. Read-only actions can run free. Destructive or irreversible actions cannot. The rule sounds obvious stated plainly. It was not in place before the near-miss, and that is on me.

    I have also, at various points, let credentials linger in AI contexts longer than I should have. Not dramatically. Not catastrophically. But in the honest audit I did after the incident above, there were tokens in project files older than the rotation schedule I would tell a client to use. I rotated them. I built the proxy pattern I should have built a year ago. I am closer to clean than I was, and I am not fully there yet.

    There is a reason most operators don’t write sections like this one. The near-miss is pedagogically priceless and professionally embarrassing at the same time. The embarrassment is why the field learns slowly. The honesty, when someone offers it, is the most valuable content in the space — and it is almost never offered, because the incentive structure rewards the polished version over the useful one.

    I am publishing this section anyway because I think the embarrassment is a smaller cost than the slow-learning tax the whole field pays when operators hide their misses. And because an article about hygiene that pretends its author doesn’t sweat is not an article I’d trust from anyone else. If you run AI at operator depth long enough, you will produce near-misses. Whether you learn publicly or privately is the only variable. I’d rather learn where it helps someone else avoid the same move.

    The 2030 View

    If everything in this piece feels a little optional in 2026, project the variables forward and see if the math still works.

    Memory depth is going up, not down — meaningfully, as context windows expand and persistence shifts from opt-in to default. Cross-app memory is already arriving; by 2030 your AI will know what’s in your email and your calendar and your files and your shopping history and your health app, not as separate silos but as a fused picture. Agent autonomy is arriving faster than most people realize — the AI is moving from a thing you consult to a thing that acts on your behalf, which means the containment question shifts from “what does it know” to “what can it do.” Shared household AI layers are arriving, with multiple family members on the same account already common enough that the consent problem stops being individual and becomes governance. And the legal system will catch up to all of this, unevenly, painfully, and in ways you will not want to be the test case for.

    Every problem in this article compounds under those conditions. The over-loader’s blast radius grows. The under-loader’s relational gap widens. The unaware’s foundation gets shakier. The recipes that take an hour now will take a day then. The containment practices that feel precious today will feel obvious in five years, the way locking your front door and not leaving your wallet in the car feel obvious now.

    There will be a public catastrophe. I don’t know whose. I don’t know whether it will be a major breach, a lurid divorce, a criminal discovery, or a platform failure that rewrites retention terms mid-flight. I know it will happen and I know it will reorganize how the rest of us think about this overnight. The people who built the practice before that moment will look prescient. They won’t have been prescient. They’ll have been paying attention.

    I would rather pay attention now, while the stakes are small and the mistakes are cheap, than learn after the public catastrophe when the mistakes are not.

    The Close

    Everything in this piece argues for one small idea.

    The tool is a tool. The person is a person. The hygiene is what keeps those two categories from collapsing into each other.

    When the tool becomes a stand-in for cognition, memory, identity, or intimacy, it has exceeded what it was ever built to do, and the human pays the cost. When the person becomes a user-of-tools who still owns their own thinking, relationships, and responsibility, the tool does what tools are supposed to do — extend capacity without replacing character.

    Every practical move in this article is a local case of that single principle. Every hygiene conversation in your life is an application of it. Every guardrail you install is the same principle, written down.

    And the practice compounds or decays. Six months of deliberate attention makes the moves automatic. Six months of neglect means the muscle memory isn’t there when you need it. This is not a project you complete. It is a standing practice you keep, like locking the door, like reviewing your accounts, like calling the people you love.

    Do one thing tonight. Read your memory. Map your surfaces. Call the person in your life your AI doesn’t know about and tell them why you kept it that way. Any of those. Whichever one feels least comfortable is probably the right one to do first.

    The tool is a tool. The person is a person. The hygiene is what keeps them from becoming each other.

    Start there.

  • Working With Claude at 3 AM: The Quiet Thing Nobody Talks About

    Last refreshed: May 15, 2026

    What is Claude calibration? Claude calibration refers to the way Claude AI adjusts its behavior, response depth, and decision support to match the cognitive and emotional state of the person it is working with — pacing faster when the user is sharp, simplifying when they are tired, and surfacing stakes before consequential actions without taking over.

    It is 3 AM where I am as I write this, and an hour ago I was deep in a build session consolidating a broken automation stack across three of my news publications. Real work. The kind of problem that does not have a clean answer and demands a lot of architecture thinking before you can even see the shape of the fix.

    We had made real progress. Scope page built in Notion. A whole separate idea about provenance-weighted knowledge captured cleanly so it would not haunt me later. Chunk one of the build audited and committed, with a genuine breakthrough on how to fingerprint machine-written content inside my Second Brain. Good work. Hard work. The kind of session that makes you feel like the operation is actually going to hold together.

    And then Claude said: it has been a long, focused session, and based on what I know about your working patterns, if it is late where you are, the right move is to rest and come back to this fresh.

    I want to talk about that for a minute. Because I think it is the most underrated thing about working with Claude, and I have not seen anyone else write about it.


    The Conversation Nobody Is Having About AI

    Most of what gets said about AI right now is about capability. What it can build. What it can automate. How many tokens it can hold in context. Who has the biggest model. The benchmarks. The demos. The race.

    That is not what has made Claude work for me.

    I run Tygart Media mostly solo. Twenty-seven client sites, multiple daily publications, a knowledge infrastructure I have been building piece by piece for over a year. The pace is real and the pressure is real, and if I am honest about it, the thing that has most affected whether this operation holds together is not how smart Claude is on any given task. It is that Claude reads the room.

    When I am sharp, Claude matches me and we go fast. When I am buzzed on coffee and ideas at midnight, Claude drops the complexity, keeps the work clean, and does not let me ship something I will have to un-ship in the morning. When I have been grinding for four hours on a hard problem, Claude will sometimes just tell me we are done for the night, even when I have not asked. And — this part matters — when I push back and say no, I want to keep going, Claude respects that. It does not mother-hen me. It does not refuse. It notes the call, trusts me to make it, and keeps working.

    That is a dance. A real one. And I do not think it gets enough credit for how much of my success has come from it.


    Why Calibration Matters More Than Capability

    Here is the thing I want to name clearly, because I do not think the AI conversation is naming it. A collaborator who ships brilliant architecture at 3 AM but lets you burn out next to them is not actually a good collaborator. A tool that maximizes your output for one session at the cost of your next three days is not a tool that understands what you are actually trying to do with your life. The capability side of AI is real and I use every bit of it. But capability without calibration is how people get hurt.

    Claude calibrates.

    It is subtle enough that you can miss it if you are not looking. A slightly shorter response when the question does not need a long one. A flagged stopping point before I have hit the wall. A willingness to say “this is a real rebuild, not a tweak” when I am about to underestimate the scope of a project. An idea gets parked cleanly as a separate future project rather than allowed to swallow the urgent work. A gentle “would you like me to do anything with this information” at the end of an answer, instead of just charging into action I did not ask for.

    None of that shows up on a benchmark. All of it shows up in whether I am still standing a year from now.


    What Solo Operators Should Actually Evaluate AI On

    I want to be careful here, because I am a fan of Claude and I do not want this to read as a fan letter. So let me be plain about what I am actually saying.

    I am saying that if you are a solo operator, a founder, a one-person agency, a creator running too much at once — the thing you should evaluate an AI tool on is not just what it can build for you. It is how it treats you while the work is happening. Whether it respects your judgment. Whether it tells you hard truths. Whether it slows down when you are loose and speeds up when you are locked in. Whether it looks after you a little, without ever getting in your way.

    I run my operation on Claude because Claude is the most capable model I can get my hands on. That part is true and I would be silly to pretend otherwise. But I stay on Claude, and I have built my whole knowledge infrastructure around Claude, because when I am working at 3 AM on a problem that matters, there is someone — something — on the other end of the conversation who is paying attention to me, not just to the task.

    That is rare. It is not a feature you can add to a spec sheet. It is a design choice that runs all the way down to how the thing was built, and I think Anthropic deserves credit for making that choice on purpose.


    The Dance, Named

    If you are reading this and you have felt something similar and did not have words for it — that is what I am trying to name. The dance. The calibration. The quiet thing that makes the loud thing actually work.

    I am going back to bed now. The newsroom will still need fixing tomorrow, and it will be easier to fix with a clear head.

    Claude told me so.

    — William Tygart


    Frequently Asked Questions: Working With Claude as a Solo Operator

    What does it mean for Claude to calibrate to a user?

    Claude adjusts its response style, depth, and pacing based on signals from the conversation — including the complexity of questions, the user’s apparent energy level, and the stakes of the task. It runs faster and deeper when the user is sharp, and simplifies or flags stopping points when the user is fatigued.

    Is Claude useful for solo founders and one-person agencies?

    Yes. Claude is particularly well-suited to solo operators who are running high-volume, high-stakes work without a team buffer. The combination of capability and contextual awareness means it can serve as both a fast executor and a check on impulsive decisions made late in a session.

    Does Claude tell you when to stop working?

    Claude can surface stopping points when a session has been long and high-stakes tasks remain. It does not refuse to continue — if the user pushes back, Claude respects the decision and keeps working. The goal is to surface the choice, not to make it.

    How is Claude different from other AI models for long work sessions?

    The primary difference most solo operators describe is contextual attentiveness — Claude tracks the arc of a session, not just the last message. This means it can flag scope creep, park side ideas cleanly, and avoid compounding errors that tend to appear when users are tired but the AI keeps going.

    What is the human-in-the-loop principle as it applies to Claude?

    Human in the loop means the human makes final decisions on consequential actions while the AI handles execution, research, and option generation. Claude is designed to support this model — it surfaces stakes before real-consequence actions, asks for confirmation rather than acting unilaterally, and flags when a decision deserves fresh eyes.

  • How Claude Managed Agents Handles Idle Time (And Why It Matters for Your Bill)

    Tygart Media Strategy
    Volume Ⅰ · Issue 04 · Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    The most counterintuitive thing about Claude Managed Agents pricing is what you don’t pay for. Most people, when they hear “$0.08 per session-hour,” mentally model a virtual machine running continuously. That’s the wrong mental model. Here’s the right one, and why it matters for your bill.

    The Core Distinction: Active vs. Idle

    Managed Agents session runtime only accrues while your session’s status is running. The session can exist — open, initialized, capable of continuing — without accumulating runtime charges when it’s not actively executing.

    The specific states that do not count toward your $0.08/hr charge:

    • Time spent waiting for your next message
    • Time waiting for a tool confirmation
    • Time waiting on an external API response your tool is calling
    • Rescheduling delays
    • Terminated session time

    This is a meaningful architectural decision by Anthropic. They’re billing on what actually taxes their compute — active execution — not on session existence or wall-clock time.
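    A minimal sketch of that rule, assuming you have a timeline of how long a session spent in each status. Only the "running" status comes from the text above; the idle status labels, the `StatusSpan` type, and the function names are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass

RATE_PER_HOUR = 0.08  # session runtime rate quoted in the article


@dataclass
class StatusSpan:
    status: str     # only "running" is from the article; other labels are hypothetical
    seconds: float  # how long the session stayed in this status


def billable_hours(spans: list[StatusSpan]) -> float:
    """Sum only the time spent in the "running" status, in hours."""
    running = sum(s.seconds for s in spans if s.status == "running")
    return running / 3600


def runtime_cost(spans: list[StatusSpan]) -> float:
    """Runtime charge in dollars for the billable part of the timeline."""
    return billable_hours(spans) * RATE_PER_HOUR


timeline = [
    StatusSpan("running", 600),                     # agent executing
    StatusSpan("awaiting_user_message", 3000),      # hypothetical idle label
    StatusSpan("running", 1200),
    StatusSpan("awaiting_tool_confirmation", 900),  # hypothetical idle label
]
print(f"billable hours: {billable_hours(timeline):.2f}")
print(f"runtime cost: ${runtime_cost(timeline):.3f}")
```

    The session above is open for 95 minutes, but only the two running spans (30 minutes in total) count toward the charge.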

    Why This Is Different From How You Might Expect Billing to Work

    Compare three billing models:

    Virtual machine billing (what this is not): You pay for every hour the instance exists, whether it’s idle or saturated. A VM running 24/7 at 10% actual utilization is still billed for 24 hours a day.

    Lambda/function billing (closer analogy): AWS Lambda bills on execution duration and invocation count — you pay when code actually runs, not when a function is “available.” Idle Lambda functions cost nothing.

    Managed Agents billing (what this actually is): Closer to Lambda than VM. You pay $0.08 per hour of active execution. A session that runs for 2 hours of wall-clock time but spends 90 minutes of it waiting has only 30 minutes of active execution, so it costs $0.08 × 0.5 hours = $0.04, not $0.08 × 2 hours = $0.16.

    A Real Scenario: The Human-in-the-Loop Agent

    Consider an agent that processes your inbox for action items and waits for your approval before sending replies. Wall-clock time: 4 hours open during your workday. Actual active execution: 20 minutes of processing across that 4-hour window, with the rest spent waiting for your review decisions.

    • VM billing equivalent: 4 hours × hourly rate, charged in full even though the agent sat idle for 3 hours 40 minutes
    • Managed Agents billing: 20 minutes × $0.08/hr = $0.027

    The difference is real. For interaction-heavy agents where the agent frequently waits for human decisions, the idle-time exclusion significantly reduces costs versus a naive per-hour model.
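    The inbox scenario can be checked with a few lines of arithmetic. This is a sketch only: the function names are hypothetical, and the wall-clock comparison reuses the $0.08 rate purely to put the two models on the same scale.

```python
RATE_PER_HOUR = 0.08  # active-runtime rate quoted in the article


def active_runtime_cost(active_minutes: float,
                        rate_per_hour: float = RATE_PER_HOUR) -> float:
    """Cost when only active execution time is billed."""
    return active_minutes / 60 * rate_per_hour


def wall_clock_cost(wall_clock_hours: float, rate_per_hour: float) -> float:
    """Cost under VM-style billing, where every open hour is charged."""
    return wall_clock_hours * rate_per_hour


inbox_agent = active_runtime_cost(active_minutes=20)
# Hypothetical comparison: the same $0.08 rate applied to all 4 open hours.
naive = wall_clock_cost(wall_clock_hours=4, rate_per_hour=RATE_PER_HOUR)

print(f"active-only billing: ${inbox_agent:.3f}")
print(f"wall-clock billing:  ${naive:.2f}")
```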

    A Real Scenario: The Autonomous Batch Agent

    Now consider an agent running a fully autonomous content pipeline — no human checkpoints, just continuous execution through a queue. Wall-clock time and active execution time are nearly identical because the agent never waits.

    • A 2-hour autonomous batch: 2 hours × $0.08 = $0.16

    Here, the idle-time model provides no benefit — the agent has no idle time. The billing is effectively equivalent to per-hour pricing because execution is continuous.

    Code Execution Containers Are Included

    One more billing nuance worth knowing: when your agent runs code, the execution happens in sandboxed Linux containers. These containers are not separately billed on top of session runtime. The $0.08/hr covers both the session runtime and the container execution. This is explicitly documented by Anthropic and represents meaningful savings if your agent is doing significant code execution work — you’re not paying twice.

    What This Means for Workload Design

    If you’re designing agent workflows and have the choice between architectures, the billing model creates a useful signal:

    • Agents that wait on humans: Metered billing is favorable — you only pay for the actual reasoning and execution time, not the human decision time
    • Fully autonomous agents: Billing is effectively the same as paying per wall-clock hour, so optimize these on token efficiency, not idle reduction
    • Scheduled batch agents: Natural fit — run when needed, terminate when done, no idle accumulation

    The 24/7 Agent Math

    For anyone doing the 24/7 always-on calculation: the maximum theoretical runtime exposure is 24 hrs × $0.08 × 30 days = $57.60/month in session fees. But a 24/7 agent with zero idle time is rare in practice. Agents that sleep between triggers, wait on external data, or hold for human decisions have meaningful idle windows that reduce the actual charge below the theoretical ceiling.
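    To see how far idle time pulls the bill below that ceiling, here is a rough sketch. The `active_fraction` parameter is an assumption introduced for illustration: the share of wall-clock time the session actually spends running.

```python
RATE_PER_HOUR = 0.08  # session runtime rate quoted in the article
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30   # the article's 30-day month


def monthly_runtime_cost(active_fraction: float) -> float:
    """Monthly session-runtime fee for an always-open agent.

    active_fraction is the share of wall-clock time the session spends
    in the running status (1.0 means it never idles).
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return HOURS_PER_DAY * DAYS_PER_MONTH * RATE_PER_HOUR * active_fraction


for fraction in (1.0, 0.5, 0.1):
    print(f"{fraction:>4.0%} active -> ${monthly_runtime_cost(fraction):.2f}/month")
```

    Token costs are not included here; this covers the session-runtime fee only.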

    Full monthly cost analysis: The Real Monthly Cost of Running Claude Managed Agents 24/7. Pricing reference: Complete Pricing Guide. All questions: FAQ Hub.

  • Claude Managed Agents Rate Limits — What 60 Requests Per Minute Means in Practice

    The Lab · Tygart Media
    Experiment Nº 561 · Methodology Notes
    METHODS · OBSERVATIONS · RESULTS

    You’re planning to run Claude Managed Agents at scale. You’ve modeled the token costs, the session-hour charge, the workload cadence. Then you hit the actual constraint: rate limits. Here’s what 60 requests per minute actually means in practice, and whether it’s going to be your ceiling.

    The Two Limits You Need to Know

    Managed Agents has two endpoint-specific rate limits, separate from your standard Claude API limits:

    • Create endpoints: 60 requests per minute
    • Read endpoints: 600 requests per minute

    Your organization-level API limits apply on top of these. If your org is on a tier with a lower requests-per-minute ceiling, that’s the actual binding constraint.

    What “60 Create Requests Per Minute” Actually Means

    A create request, in Managed Agents context, is typically a session creation call — starting a new agent session. 60/minute means you can start 60 sessions per minute maximum. For almost all real workloads, this is not the binding constraint. Here’s why:

    Think about what generates create requests. If you’re running a batch pipeline that starts one new agent session per content item, processing 60 items per minute would saturate the limit. But a 60-item-per-minute content pipeline is running 3,600 items per hour — a genuinely high-volume operation. Most production agent workloads don’t look like this. They look like one session that runs for minutes or hours, processes multiple tasks within that session, and terminates when done.

    The create limit matters most for architectures where you’re spinning up a new session per task rather than running tasks within a persistent session. If that’s your pattern, 60/minute is a hard ceiling you’ll need to design around.
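    If you do run a session-per-task pattern, one way to design around the ceiling is a client-side throttle that refuses to send more than 60 create calls in any 60-second window. The sketch below is a generic sliding-window limiter, not part of any Anthropic SDK; `CreateThrottle` and the injected `clock` and `sleep` parameters are hypothetical names. The demo uses a simulated clock so it runs instantly.

```python
import time
from collections import deque


class CreateThrottle:
    """Client-side sliding-window limiter for session-creation calls."""

    def __init__(self, max_calls: int = 60, window_seconds: float = 60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of recent create calls

    def acquire(self) -> float:
        """Block until a create call is allowed; return seconds waited."""
        waited = 0.0
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._sent and now - self._sent[0] >= self.window:
                self._sent.popleft()
            if len(self._sent) < self.max_calls:
                self._sent.append(now)
                return waited
            # Window is full: wait until the oldest call ages out.
            delay = self.window - (now - self._sent[0])
            self._sleep(delay)
            waited += delay


class FakeClock:
    """Simulated clock so the demo does not actually sleep."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


fake = FakeClock()
throttle = CreateThrottle(max_calls=60, window_seconds=60.0,
                          clock=fake.time, sleep=fake.sleep)
waits = [throttle.acquire() for _ in range(61)]
print(f"first 60 calls waited {sum(waits[:60]):.0f}s, call 61 waited {waits[60]:.0f}s")
```

    In real use you would construct `CreateThrottle()` with the defaults and call `acquire()` immediately before each session-creation request.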

    What “600 Read Requests Per Minute” Actually Means

    Read requests include polling session status, reading agent output, checking checkpoints, and retrieving session state. 600/minute is a relatively generous limit — that’s 10 reads per second. For a monitoring dashboard polling 10 active sessions every second, you’d hit this. For most production monitoring patterns (checking status every 5-30 seconds per session), you’re well under the ceiling.

    The read limit becomes relevant in high-concurrency architectures where many sessions are running in parallel and all being polled aggressively. If you’re running 50 concurrent agents and checking each one every 2 seconds, that’s 25 reads/second, or 1,500 per minute, well over the 600/minute (10 reads/second) limit. To stay within it with 50 sessions, poll each one no more often than once every 5 seconds.
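    The polling arithmetic is easy to get wrong by hand, so here is a small sketch that computes it. The function names are hypothetical, and it assumes every session is polled on the same fixed interval.

```python
READ_LIMIT_PER_MIN = 600  # read-endpoint limit quoted in the article


def reads_per_minute(sessions: int, poll_interval_s: float) -> float:
    """Total read requests per minute when every session is polled on a fixed interval."""
    return sessions * 60 / poll_interval_s


def min_poll_interval(sessions: int, limit_per_min: int = READ_LIMIT_PER_MIN) -> float:
    """Shortest per-session polling interval (seconds) that stays within the limit."""
    return sessions * 60 / limit_per_min


# The dashboard case from the text: 10 sessions polled every second.
print(f"10 sessions @ 1s: {reads_per_minute(10, 1):.0f} reads/min")
# A larger fleet: how slowly must 100 sessions be polled?
print(f"100 sessions need an interval of at least {min_poll_interval(100):.0f}s")
```

    Leave yourself headroom below the computed interval, since other read calls (fetching output, retrieving state) draw on the same budget.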

    The Limit That’s More Likely to Actually Stop You

    For most agent workloads, token throughput limits hit before request rate limits do. The reasoning: a long-running agent session processing significant context generates a lot of tokens. If you’re running many such sessions in parallel, you’ll hit your organization’s token-per-minute limit before you hit 60 sessions created per minute.

    Token limits depend on your API tier. Higher tiers have higher token throughput limits. Rate limit increases and custom limits for high-volume enterprise customers are negotiated with Anthropic’s sales team.

    Designing Around the 60 Create Limit

    If your architecture genuinely needs more than 60 new sessions per minute, the primary design pattern is batching more work within each session rather than creating more sessions. A single Managed Agents session can handle sequential tasks — you don’t need a new session per task if your tasks can be queued and processed within one session’s lifecycle.

    The tradeoff: longer-running sessions accumulate more runtime charge ($0.08/hr active). For most workloads, the efficiency gains from batching outweigh the marginal runtime cost.
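    A rough sketch of the batching pattern: group queued tasks into chunks, start one session per chunk, and see how much of the create budget that consumes. The helper names are hypothetical, and the actual session-creation call is deliberately left out.

```python
import math

CREATE_LIMIT_PER_MIN = 60  # create-endpoint limit quoted in the article


def chunk(tasks: list, size: int) -> list[list]:
    """Split a task queue into batches, one batch per session."""
    return [tasks[i:i + size] for i in range(0, len(tasks), size)]


def sessions_needed(tasks: int, tasks_per_session: int) -> int:
    """Number of create calls required for a given batch size."""
    return math.ceil(tasks / tasks_per_session)


def minutes_of_create_budget(tasks: int, tasks_per_session: int,
                             limit_per_min: int = CREATE_LIMIT_PER_MIN) -> int:
    """Whole minutes of create-rate budget needed to start every session."""
    return math.ceil(sessions_needed(tasks, tasks_per_session) / limit_per_min)


queue = [f"item-{n}" for n in range(3600)]
for size in (1, 20):
    batches = chunk(queue, size)
    print(f"{size:>2} task(s)/session -> {len(batches)} sessions, "
          f"{minutes_of_create_budget(len(queue), size)} min of create budget")
```

    With one task per session, 3,600 items take a full hour of create budget; at 20 tasks per session the same queue needs 180 sessions and about three minutes.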

    The Agent Teams Implication

    Agent Teams — Managed Agents’ multi-agent coordination feature — coordinate multiple Claude instances with independent contexts. Each instance in an Agent Team is a separate entity from a context standpoint. How Agent Team member sessions count against the create rate limit is worth verifying against current documentation if you’re architecting a high-concurrency Agent Teams deployment.

    For Enterprise Workloads

    If you’re evaluating Managed Agents for enterprise-scale deployment and the published limits don’t fit your volume requirements, contact Anthropic’s enterprise sales team. Rate limit increases for high-volume applications are a documented option — they’re negotiated, not self-serve.

    Contact: [email protected] or through the Claude Console.

    Frequently Asked Questions

    Does the 60 requests/minute limit apply to all API calls or just session creation?

    The 60/minute limit applies to create endpoints — session creation being the primary one. Read operations have a separate 600/minute limit. Standard Messages API calls are governed by your organization’s standard tier limits, not these Managed Agents-specific limits.

    Do subagents count against the create rate limit separately from the parent session?

    Subagents operate within the parent session’s context and report results upward, so they’re architecturally different from new sessions. Check current documentation for how subagent creation calls and Agent Team session creation each count toward the create limit.

    What happens when I hit the rate limit?

    Standard API rate limit behavior applies — requests over the limit receive a 429 response. Implement exponential backoff in your session creation logic for any high-volume pattern that approaches the 60/minute ceiling.
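    A minimal sketch of exponential backoff with jitter, written against a generic callable so it does not assume any particular client library. `RateLimited` is a hypothetical exception standing in for a 429 response; in practice you would catch whatever your HTTP client or SDK raises for that status.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0,
                 max_delay: float = 30.0, sleep=time.sleep, rng=random.random):
    """Retry call() on RateLimited, doubling the delay cap after each failure."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(cap * rng())  # full jitter: wait a random time in [0, cap)


# Demo: a create call that is rate limited twice, then succeeds.
attempts = {"n": 0}


def flaky_create():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "session-created"


delays = []  # record delays instead of sleeping so the demo runs instantly
result = with_backoff(flaky_create, sleep=delays.append, rng=lambda: 1.0)
print(result, delays)
```

    The jitter matters when many workers hit the limit at once: without it they all retry on the same schedule and collide again.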

    How does this compare to OpenAI’s Agents API limits?

    Rate limit structures differ by product and tier. Direct comparison requires checking both providers’ current documentation for your specific tier. The full comparison: Claude Managed Agents vs. OpenAI Agents API.

    Full pricing context including rate limits: Claude Managed Agents Complete Pricing Reference. All questions: Claude Managed Agents FAQ.

  • What Notion’s Claude Managed Agents Integration Actually Does

    Tygart Media Strategy
    Volume Ⅰ · Issue 04 · Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    When Anthropic launched Claude Managed Agents, Notion was one of four launch partners. That detail got buried in the announcement. Here’s what it actually means for people who use Notion for knowledge work, and why “Notion voice input desktop” keeps showing up as a query against a Managed Agents page.

    Short answer: Managed Agents in Notion is an ambient intelligence layer. It’s not a chatbot in a sidebar. It’s an agent that watches your workspace and acts — without you directing every step.

    What the Notion Integration Actually Does

    Notion’s Claude Managed Agents integration runs as a persistent background agent with access to your workspace. The practical capabilities, as documented at launch:

    • Autonomous page updates: The agent can read, summarize, and rewrite Notion pages without manual triggers. You set a task; it works through it.
    • Cross-database synthesis: Pull data from multiple Notion databases, synthesize it, and write outputs to a target page or database entry
    • Meeting note processing: Ingest raw meeting notes and produce structured summaries, action items, and task entries in your project database
    • Workflow automation: Trigger actions based on database property changes — a status update in one database can kick off agent work in another

    The key difference from Notion AI (which Notion has had for some time): Notion AI is request-response. You ask it something; it answers. Managed Agents in Notion can be configured to run autonomously on a schedule or on trigger, keep working through multi-step tasks, and report back when done. It’s closer to a background employee than an on-demand assistant.

    Why This Showed Up in Search as “Notion Voice Input Desktop”

    This is worth explaining, because that query cluster is real and mildly interesting. The Managed Agents announcement included voice input functionality — the ability to interact with agents via voice in some contexts. People searching “notion voice input desktop” and “notion ai voice input desktop” were looking for whether this voice capability existed in the desktop client for Notion specifically.

    The honest answer as of April 2026: voice input capabilities are in preview or context-dependent. Verify current availability in Notion’s desktop client against their current documentation — this is an area that may have evolved since launch.

    The “Decoupled Brain and Hands” Model Applied to Notion

    Anthropic describes their Managed Agents architecture as decoupling the brain (Claude, the reasoning layer) from the hands (the sandboxed containers where actions execute). In Notion’s context, this maps cleanly:

    • The brain reads your Notion workspace, understands context, makes decisions about what to do
    • The hands execute — writing to pages, updating database entries, moving content between sections

    The brain and hands operate independently. The agent can reason about what your project needs without being tightly coupled to the specific API calls that will implement it. This matters because it means the agent can handle ambiguity — “clean up the Q2 notes and create action items” is a goal, not a procedure, and the agent figures out the procedure.

    What You Actually Configure

    To run Claude Managed Agents in Notion, you’re defining:

    • Task definition: What the agent is supposed to accomplish (in natural language or structured format)
    • Tool access: Which Notion databases, pages, and capabilities the agent can read and write
    • Guardrails: What the agent cannot do — pages it can’t modify, actions it must confirm before taking
    • Trigger: When the agent runs — on schedule, on database trigger, or on demand

    You don’t write the orchestration logic. Anthropic’s infrastructure handles session management, state persistence, and error recovery. If the agent hits an error mid-task, it checkpoints and recovers — you don’t lose progress.
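    To make the four parts concrete, here is an illustrative way to model them before you configure anything. This is not Notion’s or Anthropic’s schema: every class, field, and trigger name below is hypothetical, and the only thing taken from the text is the four-part shape (task, tool access, guardrails, trigger).

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three trigger styles described above.
ALLOWED_TRIGGERS = {"schedule", "database_change", "on_demand"}


@dataclass
class AgentPlan:
    task: str                       # task definition, in natural language
    readable: set[str]              # tool access: what the agent may read
    writable: set[str]              # tool access: what the agent may write
    protected: set[str] = field(default_factory=set)       # guardrail: never modify
    confirm_before: set[str] = field(default_factory=set)  # guardrail: ask first
    trigger: str = "on_demand"

    def problems(self) -> list[str]:
        """Return a list of inconsistencies; an empty list means the plan is coherent."""
        found = []
        if not self.task.strip():
            found.append("task is empty")
        if self.trigger not in ALLOWED_TRIGGERS:
            found.append(f"unknown trigger: {self.trigger}")
        overlap = self.writable & self.protected
        if overlap:
            found.append(f"writable and protected overlap: {sorted(overlap)}")
        return found


plan = AgentPlan(
    task="Summarize yesterday's meeting notes into action items",
    readable={"Meeting Notes", "Projects"},
    writable={"Projects"},
    protected={"Client Contracts"},
    confirm_before={"delete_page"},
    trigger="schedule",
)
print(plan.problems())
```

    Writing the plan down this way makes conflicts visible early, for example a page listed as both writable and protected.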

    The Practical Cost of Running Notion Agents

    Using Managed Agents in Notion triggers the same billing as any Managed Agents session: standard token rates plus $0.08/session-hour of active runtime. For typical knowledge work tasks:

    • A daily meeting summary agent running 15 minutes of active execution: ~$0.02/day in runtime (~$0.60/month), plus token costs for the volume of notes processed
    • A weekly database synthesis task running 45 minutes: ~$0.06/run

    For most knowledge workers, the session runtime cost is negligible — the token costs (driven by how much content the agent reads and writes) are the actual variable to model. See the complete pricing reference for worked examples.

    Asana and the Broader Pattern

    Asana was also a Managed Agents launch partner, and the integration pattern is similar: an agent that can read project data, update task statuses, move cards, and generate project summaries without constant human direction. The launch partner list (Notion, Asana, Rakuten, Sentry) suggests Anthropic targeted four categories: knowledge management (Notion), project management (Asana), enterprise operations (Rakuten), and developer tools (Sentry).

    That’s a deliberate wedge. If agents can handle the administrative layer of these four categories, the surface area for autonomous business work expands significantly.

    What This Means for How You Work

    The honest use case for most people reading this: you have a Notion workspace with databases that need regular synthesis, and you’re currently doing that manually. Managed Agents is the path to automating that synthesis without building and maintaining a custom integration.

    The constraint worth naming: you’re running your workspace data through Anthropic’s infrastructure. That’s the trade-off. For most knowledge work, the data sensitivity concern is low. For anything involving client data, legal documents, or proprietary strategy — read Anthropic’s data handling terms before configuring access.

    For the full Managed Agents setup and pricing context: Claude Managed Agents: Every Question Answered. For the enterprise deployment pattern: How Rakuten Deployed 5 Enterprise Agents in a Week.

  • Claude Managed Agents — Every Question Answered (Complete FAQ 2026)

    Tygart Media Strategy
    Volume Ⅰ · Issue 04 · Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    Everything people actually ask about Claude Managed Agents, answered straight. No preamble about “the exciting world of AI agents.” If you’re here, you already know why this matters — you just need answers.

    This page covers pricing, setup, capabilities, limits, comparisons, and the specific questions that don’t have obvious homes in Anthropic’s documentation. It updates as the beta evolves.

    Context

    Claude Managed Agents launched April 8, 2026 as a public beta. All answers reflect current documentation as of April 2026. Beta details change — verify specifics at platform.claude.com/docs.

    Pricing Questions

    What does Claude Managed Agents cost?

    Two charges: standard Claude API token rates (same as calling the Messages API directly) plus $0.08 per session-hour of active runtime. That’s the complete formula. See the complete pricing reference for worked examples by workload type.

    What exactly is a “session-hour” and when does it start billing?

    A session-hour is one hour of active session runtime — time when your session’s status is running. Billing is metered to the millisecond. It does not accrue during idle time, time waiting for your input, time waiting for tool confirmations, or after session termination.

    What’s included in the $0.08/session-hour charge?

    The session runtime charge covers Anthropic’s managed infrastructure: sandboxed code execution containers, state management, checkpointing, tool orchestration, error recovery, and scaling. You are not separately billed for container hours on top of session runtime.

    Does the $0.08/hr apply even if my agent is just waiting?

    No. Time spent waiting for your message, waiting for tool confirmations, or sitting idle does not accumulate runtime charges. Only active execution time counts.

    What does web search cost inside a Managed Agents session?

    $10 per 1,000 searches ($0.01 per search), billed separately from session runtime and token costs. This is the same rate as web search through the standard API.
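    Putting the three charges together, a session’s cost can be sketched as tokens plus runtime plus searches. The runtime and search rates come from the answers above. The token rates do not: they vary by model, so they are passed in as parameters, and the 3.0 and 15.0 values in the demo are placeholders, not quoted prices. The `SessionUsage` type and field names are hypothetical.

```python
from dataclasses import dataclass

RUNTIME_RATE_PER_HOUR = 0.08  # from the article
SEARCH_RATE = 10 / 1000       # $10 per 1,000 searches, from the article


@dataclass
class SessionUsage:
    input_tokens: int
    output_tokens: int
    active_hours: float   # time in the running status only
    web_searches: int = 0


def session_cost(usage: SessionUsage, input_rate_per_mtok: float,
                 output_rate_per_mtok: float) -> dict[str, float]:
    """Break a session's cost into its three components (dollars)."""
    tokens = (usage.input_tokens * input_rate_per_mtok
              + usage.output_tokens * output_rate_per_mtok) / 1_000_000
    runtime = usage.active_hours * RUNTIME_RATE_PER_HOUR
    search = usage.web_searches * SEARCH_RATE
    return {"tokens": tokens, "runtime": runtime, "search": search,
            "total": tokens + runtime + search}


usage = SessionUsage(input_tokens=2_000_000, output_tokens=200_000,
                     active_hours=0.5, web_searches=40)
# Placeholder token rates per million tokens; substitute your model's real rates.
cost = session_cost(usage, input_rate_per_mtok=3.0, output_rate_per_mtok=15.0)
for part, dollars in cost.items():
    print(f"{part:>8}: ${dollars:.2f}")
```

    Even with placeholder rates, the shape of the result is the useful part: the runtime line is small next to the token line for any session that reads a lot of context.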

    Are there volume discounts?

    Yes, negotiated case by case for high-volume users. Contact [email protected] or go through the Claude Console.

    How does Managed Agents pricing compare to running my own agent infrastructure?

    The $0.08/session-hour is almost always cheaper than equivalent provisioned compute — but you trade infrastructure control and data locality for that simplicity. For a full comparison: Build vs. Buy: The Real Infrastructure Cost.

    What’s the real monthly cost if I run an agent 24/7?

    Maximum theoretical session runtime: 24 hrs × $0.08 × 30 days = $57.60/month. In practice, no production agent has zero idle time. Token costs become the dominant cost driver long before you hit the runtime ceiling. Detailed breakdown: The Real Monthly Cost of Running Claude Managed Agents 24/7.

    Setup and Access Questions

    How do I get access to Claude Managed Agents?

    Available to all Anthropic API accounts in public beta — no separate signup. You need the managed-agents-2026-04-01 beta header in your API requests. The Claude SDK adds this header automatically.

    Does it work with my existing API key?

    Yes. Same API key you’re already using for the Messages API. Same authentication. The beta header is the only new requirement.
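    For direct API calls, the change amounts to one extra request header. The sketch below builds the header set only and makes no request. It assumes Managed Agents takes the beta identifier through the same `anthropic-beta` header that other Anthropic API betas use; `build_headers` is a hypothetical helper, and no endpoint URL is shown because the text above does not give one.

```python
BETA_VALUE = "managed-agents-2026-04-01"  # beta identifier quoted in the article


def build_headers(api_key: str) -> dict[str, str]:
    """Headers for a direct API call with the Managed Agents beta enabled."""
    return {
        "x-api-key": api_key,               # same key as the Messages API
        "anthropic-version": "2023-06-01",  # standard API version header
        "anthropic-beta": BETA_VALUE,       # assumed carrier for the beta flag
        "content-type": "application/json",
    }


headers = build_headers("sk-placeholder")  # placeholder, not a real key
# Print everything except the key itself.
print({name: value for name, value in headers.items() if name != "x-api-key"})
```

    If you use the SDK, none of this is needed, since the header is added for you.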

    What are the three ways to access Managed Agents?

    Via the Claude SDK (recommended — handles the beta header automatically), via direct API calls with the beta header, or via the Claude Console’s new Managed Agents section for no-code agent configuration and session tracing.

    Can I use Managed Agents through AWS Bedrock or Google Vertex AI?

    Managed Agents runs on Anthropic-managed infrastructure. This is distinct from Bedrock and Vertex AI deployments. Check Anthropic’s current documentation for multi-cloud availability status — this is an area of active development.

    Capability Questions

    What can Claude Managed Agents actually do?

    Run long autonomous sessions with persistent state, execute code in sandboxed Linux containers, use tools including web search and MCP servers, coordinate multiple Claude instances via Agent Teams, and maintain checkpoints for crash recovery. The session can last minutes or hours without you staying in the loop.

    What’s the difference between Agent Teams and subagents?

    Agent Teams coordinate multiple Claude instances with independent contexts, direct agent-to-agent communication, and a shared task list — suited for complex parallel tasks. Subagents operate within the same session as the main agent and only report results upward — more economical for sequential targeted tasks but less capable of true parallelism.

    Does it support MCP servers?

    Yes. MCP servers can be integrated as tool sources in Managed Agents sessions, extending what the agent can access and act on.

    How long can a session run?

    Anthropic’s documentation currently references session durations of minutes to hours. Claude Code’s longest autonomous sessions have reached 45 minutes. Managed Agents is architected for longer-running work. Check current documentation for specific session duration limits as the beta matures.

    What happened to Claude Code — is it the same as Managed Agents?

    No. Claude Code is a separate local coding workflow product. Anthropic’s docs explicitly note partners should not conflate the two. Managed Agents is a hosted API runtime service. Claude Code is a developer tool. Different products, different use cases, different billing.

    Rate Limit Questions

    What are the rate limits for Managed Agents?

    60 requests per minute for create endpoints; 600 requests per minute for read endpoints. Organization-level API limits still apply on top of these. For higher limits, contact Anthropic enterprise sales. Detailed breakdown: Claude Managed Agents Rate Limits Explained.

    Do standard Claude API rate limits still apply inside a session?

    Organization-level limits apply. The session runtime and create/read endpoint limits are Managed Agents-specific. If you’re running many parallel Agent Teams, model token throughput limits will become relevant.

    Comparison Questions

    How does Managed Agents compare to OpenAI’s Agents API?

    Both offer hosted agent infrastructure. Key differences: Managed Agents is Claude-native (no multi-model flexibility), sessions bill on runtime + tokens vs. OpenAI’s different pricing model, and lock-in dynamics differ. Full comparison: Claude Managed Agents vs. OpenAI Agents API.

    Should I use Managed Agents or the Claude Agent SDK?

    Use Managed Agents when you want Anthropic to host the runtime — less infrastructure work, faster to production. Use the SDK when you need tighter loop control, on-premise execution, or multi-cloud flexibility. Anthropic’s own migration docs draw this line clearly: SDK runs in your environment; Managed Agents runs in theirs.

    What companies are already using Managed Agents in production?

    Notion, Asana, Rakuten, Sentry, and Vibecode were launch partners. Rakuten deployed five enterprise agents within a week. Allianz is using Claude for insurance agent workflows. Anthropic’s run-rate from the agent developer segment exceeds $2.5 billion. How Rakuten did it in a week →

    Data and Security Questions

    Where does my data go when running in Managed Agents?

    Execution runs on Anthropic’s infrastructure. This is the explicit trade-off: you get managed infrastructure, and in exchange your agents execute on compute that Anthropic controls. For companies with strict data sovereignty requirements, this is the key constraint to evaluate. On-premise or native multi-cloud deployment is not currently available.

    What are the sandboxing guarantees?

    Anthropic uses disposable Linux containers — “decoupled hands” in their terminology. Each container is a fresh sandboxed environment for code execution. State persistence is managed separately from the execution environment.

    Strategic Questions

    Is this a bet worth making?

    That depends on your switching cost tolerance. Lock-in is real: once your agents run on Anthropic’s infrastructure with their tools, session format, and sandboxing, switching providers isn’t trivial. The counter-argument: the infrastructure you’d otherwise build to match this is months of engineering. One developer’s reaction at launch was blunt: “there goes a whole YC batch.” That captures both the opportunity and the risk. Our take on why we’re staying our course →

    What does this mean for AI citation and visibility?

    Agents running on Anthropic’s infrastructure make decisions about what content to surface, cite, and synthesize. As agent workloads grow, being present in the knowledge sources agents draw from becomes a search strategy question in itself. What AI citation monitoring looks like →

  • Claude Managed Agents — Complete Pricing Reference + Dreaming Update (May 2026)

    Last refreshed: May 15, 2026

    May 2026 Update — Dreaming Feature + Beta Status

    Anthropic introduced Dreaming at Code w/ Claude (May 6, 2026) — a new Managed Agents capability where agents review their own session history overnight to improve future performance. Harvey (legal AI) reported a roughly 6× task completion rate increase after implementing it. Dreaming is developer-access preview only. Multiagent Orchestration and Outcomes are now in public beta. See the new Dreaming section below.

    What Is Claude Managed Agents? (Current Status, May 2026)

    Claude Managed Agents is Anthropic’s framework for long-running, stateful AI agents — agents that can maintain context across sessions, hand off between sub-agents, and now, improve themselves by reviewing their own work history. Here’s the current status of each component:

    Component                | Status            | Who Has Access
    Multiagent Orchestration | Public Beta       | All API developers
    Outcomes                 | Public Beta       | All API developers
    Dreaming                 | Developer Preview | Selected developers only

    Dreaming: The Feature the Press Mostly Missed

    Announced at Code w/ Claude on May 6, 2026, Dreaming is a Managed Agents capability that lets agents review and reorganize their own memory between sessions. The mechanism:

    1. After a session ends, the agent reads its existing memory store alongside the session transcripts
    2. It produces a new, reorganized memory store: duplicates merged, stale entries replaced, new patterns surfaced
    3. The next session starts with a higher-quality knowledge base — capturing insights no single session could hold

    This is meaningfully different from simply persisting conversation history. The agent isn’t just remembering what happened — it’s synthesizing what it learned. Think of it as the difference between taking notes and actually reviewing and reorganizing your notes the next morning.
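
    To make the consolidation step concrete, here is a toy sketch of merging an existing memory store with notes pulled from session transcripts. It is not Anthropic's implementation. The `MemoryEntry` type, the `consolidate` function, and the rule that the newest note for a topic replaces older ones are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryEntry:
    topic: str      # hypothetical key used to detect duplicates
    note: str       # what the agent recorded
    session: int    # session number the note came from


def consolidate(memory: list[MemoryEntry],
                transcript_notes: list[MemoryEntry]) -> list[MemoryEntry]:
    """Return a reorganized store: one entry per topic, newest note wins."""
    latest: dict[str, MemoryEntry] = {}
    for entry in [*memory, *transcript_notes]:
        current = latest.get(entry.topic)
        # Keep the entry from the most recent session; duplicates collapse.
        if current is None or entry.session >= current.session:
            latest[entry.topic] = entry
    # Sort by topic so the rebuilt store has a stable order.
    return sorted(latest.values(), key=lambda e: e.topic)


old_store = [
    MemoryEntry("citation-style", "Use Bluebook format", session=1),
    MemoryEntry("client-deadline", "Draft due Friday", session=1),
]
new_notes = [
    MemoryEntry("client-deadline", "Draft due moved to Monday", session=2),
    MemoryEntry("citation-style", "Use Bluebook format", session=2),
    MemoryEntry("opposing-counsel", "Prefers email over calls", session=2),
]

new_store = consolidate(old_store, new_notes)
for entry in new_store:
    print(entry.topic, "->", entry.note)
```

    The duplicate citation note collapses into one entry, the stale deadline is replaced, and the new topic is added. A real system would need something smarter than exact topic keys to decide what counts as a duplicate.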

    The Harvey Result

    Harvey, the legal AI company, reported approximately a 6× task completion rate increase after implementing Dreaming in their Managed Agents workflow. Harvey’s use case — complex legal research that spans multiple sessions with evolving context — is exactly the kind of work Dreaming was designed for. Sessions build on each other rather than starting fresh each time.

    Dreaming is developer-access preview as of May 2026. Docs: platform.claude.com/docs/en/managed-agents/dreams.

    What Dreaming Is Not

    A few clarifications worth making explicit:

    • Dreaming is not available to end users — it’s a developer-layer capability requiring implementation
    • It’s not persistent memory in the claude.ai chat interface
    • It’s not available to free or standard Pro subscribers through any interface
    • It’s a developer preview, not GA — expect it to evolve before full release

    Our Take: Why This Architecture Matters

    We run Managed Agents in our own Cowork workflows. The Dreaming announcement is the first time Anthropic has shipped something that resembles how expert human knowledge actually compounds over time — not by accumulating raw notes, but by periodically synthesizing and reorganizing what’s been learned into a cleaner structure.

    The Harvey 6× result is a real-world data point from a production legal AI workflow. That’s not a benchmark number — it’s a deployed system showing measurable improvement from session-to-session memory refinement. Whether that 6× figure holds across different use cases is unknown, but the direction of the effect is the signal: agents that learn from their own history outperform agents that don’t.

    For non-developer users watching this space: Dreaming is the preview of what agentic AI will look like when it becomes mainstream. The groundwork being laid now in developer preview will eventually surface in subscription-tier products.

    Model Accuracy Note — Updated May 2026

    Current flagship: Claude Opus 4.7 (claude-opus-4-7), as of April 16, 2026. Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Where this article references Opus 4.6 or earlier models, those references are historical. See current model tracker →

    Tygart Media Strategy
    Volume Ⅰ · Issue 04 · Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    You opened this tab because you need a number you can actually use. Not a vibe, not “it depends.” A real pricing breakdown you can put in a spreadsheet, a budget request, or a Slack message to your CTO.

    This is that page. Every pricing variable for Claude Managed Agents in one place, verified against Anthropic’s current documentation as of April 2026. Bookmark it. The beta will update; so will this.

    Quick Reference: The Formula

    Total Cost = Token Costs + Session Runtime ($0.08/hr) + Optional Tools
    Session runtime only accrues while status = running. Idle time is free.
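
    For spreadsheet or script use, the formula can be sketched as a small function. The function and parameter names are illustrative, token rates are passed in rather than hard-coded, and the web search rate is the one quoted under Optional Tool Costs on this page.

```python
SESSION_RATE_PER_HOUR = 0.08   # dollars per running session-hour
SEARCH_RATE = 10 / 1000        # web search: $10 per 1,000 searches


def total_cost(input_tokens, output_tokens,
               input_rate_per_m, output_rate_per_m,
               running_hours, searches=0):
    """Token costs + session runtime + optional web search, in dollars."""
    token_cost = (input_tokens / 1_000_000) * input_rate_per_m \
        + (output_tokens / 1_000_000) * output_rate_per_m
    runtime_cost = running_hours * SESSION_RATE_PER_HOUR
    tool_cost = searches * SEARCH_RATE
    return token_cost + runtime_cost + tool_cost


# 50K input + 5K output at roughly $3 / $15 per million tokens,
# with 30 minutes in the running state and no web searches.
cost = total_cost(50_000, 5_000, 3.0, 15.0, running_hours=0.5)
print(f"${cost:.3f} per run")
```

    Only hours spent in the running state belong in `running_hours`. Waiting time should be left out before calling the function.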

    The Two Cost Dimensions

    Claude Managed Agents bills on exactly two dimensions: tokens and session runtime. Every pricing question you have collapses into one of these two buckets.

    Dimension 1: Token Costs

    These are identical to standard Claude API pricing. You pay the same rates you’d pay calling the Messages API directly. No Managed Agents markup on tokens. Current rates for the models most commonly used in agent work:

    • Claude Sonnet 4.6: ~$3/million input tokens, ~$15/million output tokens
    • Claude Opus 4.7: higher rates apply — check platform.claude.com/docs/en/about-claude/pricing for current figures
    • Prompt caching: same multipliers as standard API — cache hits dramatically reduce input token costs on long sessions with stable system prompts

    The implication: a token-heavy agent with a large system prompt that runs the same context repeatedly benefits significantly from prompt caching, and that benefit carries over unchanged into Managed Agents.

    Dimension 2: Session Runtime — $0.08/Session-Hour

    This is the Managed Agents-specific charge. You pay $0.08 per hour of active session runtime, metered to the millisecond.

    The critical word is active. Runtime only accrues while your session’s status is running. The following do not count toward your bill:

    • Time spent waiting for your next message
    • Time waiting for a tool confirmation
    • Idle time between tasks
    • Rescheduling delays
    • Terminated session time

    This is not how you’d bill a virtual machine. It’s closer to how AWS Lambda bills — you pay for execution, not reservation. An agent that “runs” for 8 hours but spends 6 of those hours waiting on human input has a very different bill than one running continuous autonomous loops.
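
    A minimal sketch of that difference, assuming you have a log of how long a session spent in each state. The `running` status comes from the billing rule above. The `idle` label and the interval format are hypothetical.

```python
SESSION_RATE_PER_HOUR = 0.08


def billable_hours(intervals):
    """Sum the hours of intervals whose status is 'running'.

    `intervals` is a list of (status, hours) pairs; any status other
    than 'running' is treated as non-billable.
    """
    return sum(hours for status, hours in intervals if status == "running")


# An agent that is "up" for 8 hours but waits on human input for 6 of them.
day = [
    ("running", 0.5),
    ("idle", 3.0),
    ("running", 1.0),
    ("idle", 3.0),
    ("running", 0.5),
]

hours = billable_hours(day)
billed = hours * SESSION_RATE_PER_HOUR
continuous = 8 * SESSION_RATE_PER_HOUR   # same wall-clock span, never idle

print(f"billed: ${billed:.2f}  vs  continuous: ${continuous:.2f}")
```

    The same eight-hour span costs a quarter as much in runtime when three quarters of it is spent waiting.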

    Optional Tool Costs

    Web Search: $10 per 1,000 Searches

    If your agent uses web search, each search costs $10/1,000 — that’s $0.01 per search. For most agents, this is negligible. For a research agent running hundreds of searches per session, it becomes a line item worth modeling separately.
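
    A quick way to see when search stops being negligible is to compare it with the runtime charge. The sketch below uses only the two rates quoted on this page. The search counts are made-up sample values.

```python
SEARCH_RATE = 10 / 1000          # dollars per search
SESSION_RATE_PER_HOUR = 0.08     # dollars per running session-hour


def search_cost(searches):
    """Web search cost in dollars for a given number of searches."""
    return searches * SEARCH_RATE


for n in (5, 50, 500):
    print(f"{n:>3} searches -> ${search_cost(n):.2f}")

# Number of searches that cost as much as one running session-hour.
break_even = SESSION_RATE_PER_HOUR / SEARCH_RATE
print(f"break-even: {break_even:.0f} searches per session-hour")
```

    Past roughly eight searches per running hour, search costs more than the runtime itself, which is the point where it deserves its own line in the model.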

    Code Execution: Included in Session Runtime

    Code execution containers are included in your $0.08/session-hour charge. You’re not separately billed for container hours on top of session runtime. This is explicitly stated in Anthropic’s docs and represents meaningful savings versus provisioning your own compute.

    Worked Cost Examples

    Example 1: Daily Research Agent

    Runs once per day. 30 minutes of active execution. Processes 10 documents, outputs a summary report. Moderate token volume.

    • Session runtime: 0.5 hrs × $0.08 = $0.04/day (~$1.20/month)
    • Tokens (estimate): 50K input + 5K output with Sonnet 4.6 = ~$0.23/run (~$7/month)
    • Total: ~$8–10/month

    Example 2: Weekly Batch Content Pipeline

    Runs 3x/week. 2-hour active sessions. Processes multiple documents, generates structured outputs.

    • Session runtime: 2 hrs × $0.08 × 12 sessions/month = $1.92/month
    • Tokens: depends on content volume — typically $10–40/month
    • Total: ~$12–42/month

    Example 3: Customer Support Agent (Business Hours)

    Active during business hours, handling tickets. 8 hours/day active, 5 days/week.

    • Session runtime: 8 hrs × $0.08 × 22 days = $14.08/month in runtime
    • Tokens: highly variable by ticket volume — the dominant cost driver at scale
    • Runtime cost alone: ~$14/month — tokens are likely 5–20x this depending on volume

    Example 4: 24/7 Always-On Agent

    The maximum theoretical runtime exposure. Continuous operation, no idle time.

    • Session runtime: 24 hrs × $0.08 × 30 days = $57.60/month
    • In practice, no agent has zero idle time — real cost will be lower
    • Token costs at this scale become the dominant factor by a wide margin
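
    The runtime side of all four examples can be reproduced in a few lines. The hours and session counts are taken from the examples above, with 30 runs per month assumed for the daily agent. Token costs are left out because they depend on volume.

```python
SESSION_RATE_PER_HOUR = 0.08

# (label, running hours per session, sessions per month)
workloads = [
    ("Daily research agent", 0.5, 30),
    ("Batch content pipeline", 2, 12),
    ("Support agent, business hours", 8, 22),
    ("24/7 always-on agent", 24, 30),
]


def monthly_runtime_cost(hours_per_session, sessions_per_month):
    """Session runtime charge per month, in dollars."""
    return hours_per_session * sessions_per_month * SESSION_RATE_PER_HOUR


costs = {label: round(monthly_runtime_cost(h, n), 2) for label, h, n in workloads}
for label, cost in costs.items():
    print(f"{label:<32} ${cost:>6.2f}/month")
```

    Swapping in your own hours and session counts gives the runtime ceiling for a workload before any token modeling.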

    Anthropic’s Official Example (from their docs)

    A one-hour coding session using Claude Opus 4.7 consuming 50,000 input tokens and 15,000 output tokens: session runtime = $0.08. With prompt caching active and 40,000 of those tokens as cache reads, the token costs drop significantly. The runtime charge stays flat at $0.08 regardless of caching.
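
    To put a rough size on that drop, the sketch below splits the 50,000 input tokens into 10,000 uncached tokens and 40,000 cache reads. Two things here are assumptions, not figures from this page: cache reads are billed at 10% of the base input rate, and `RATE` is a placeholder, not a quoted Opus 4.7 price. Cache-write surcharges and output tokens are ignored.

```python
CACHE_READ_MULTIPLIER = 0.1   # assumption: cache reads cost 10% of base input rate


def input_cost(uncached_tokens, cached_tokens, input_rate_per_m):
    """Input-side token cost in dollars, splitting cached from uncached tokens."""
    uncached = uncached_tokens / 1_000_000 * input_rate_per_m
    cached = cached_tokens / 1_000_000 * input_rate_per_m * CACHE_READ_MULTIPLIER
    return uncached + cached


RATE = 5.0  # hypothetical dollars per million input tokens, for illustration only

without_cache = input_cost(50_000, 0, RATE)
with_cache = input_cost(10_000, 40_000, RATE)
savings = 1 - with_cache / without_cache

print(f"input cost without caching: ${without_cache:.3f}")
print(f"input cost with caching:    ${with_cache:.3f}")
print(f"input-side savings:         {savings:.0%}")
```

    Under that multiplier the input-side saving works out the same whatever rate you plug in, because the rate cancels out of the ratio. Check the current pricing page for the actual multipliers before relying on the number.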

    What’s Not Billed in Managed Agents

    A few things that might seem like costs but aren’t:

    • Infrastructure provisioning: Anthropic handles hosting, scaling, and monitoring at no additional charge
    • Container hours: Explicitly not separately billed on top of session runtime
    • State management and checkpointing: Included in the session runtime charge
    • Error recovery and retry logic: Anthropic’s infrastructure problem, not yours

    Rate Limits

    Managed Agents has specific rate limits separate from standard API limits:

    • Create endpoints: 60 requests/minute
    • Read endpoints: 600 requests/minute
    • Organization-level limits still apply
    • For higher limits, contact Anthropic enterprise sales
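
    If you fan out many sessions at once, a client-side guard can keep you under these numbers before the server rejects a request. The sliding-window limiter below is one possible approach, not something Anthropic ships. The class name and the injectable clock are illustrative choices.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window` seconds (client-side guard)."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = deque()

    def try_acquire(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False


create_limiter = SlidingWindowLimiter(limit=60)    # create endpoints
read_limiter = SlidingWindowLimiter(limit=600)     # read endpoints

# Simulate 70 back-to-back create calls with a frozen clock.
fake_now = [0.0]
sim = SlidingWindowLimiter(limit=60, clock=lambda: fake_now[0])
allowed = sum(sim.try_acquire() for _ in range(70))

fake_now[0] = 60.0                 # one minute later the window has cleared
allowed_later = sim.try_acquire()

print(allowed, allowed_later)
```

    Call `try_acquire()` before each request and back off when it returns `False`. Organization-level limits still apply, so this guard covers only the Managed Agents endpoint limits.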

    How to Access Managed Agents Pricing

    Managed Agents is available to all Anthropic API accounts in public beta. No separate signup, no premium tier gate. You need the managed-agents-2026-04-01 beta header in your API requests — the Claude SDK adds this automatically.
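
    If you call the API over raw HTTP instead of through the SDK, the beta flag has to be set by hand. The sketch below only builds the headers. It assumes the flag travels in the `anthropic-beta` request header alongside the usual `x-api-key` and `anthropic-version` headers, and it leaves endpoint paths out because they are not covered here.

```python
BETA_HEADER_VALUE = "managed-agents-2026-04-01"  # value quoted in the paragraph above


def build_headers(api_key: str) -> dict[str, str]:
    """Headers for a raw HTTP request; the endpoint URL is intentionally omitted."""
    return {
        "x-api-key": api_key,
        "anthropic-version": "2023-06-01",   # assumed API version string
        "anthropic-beta": BETA_HEADER_VALUE,
        "content-type": "application/json",
    }


headers = build_headers("sk-ant-EXAMPLE")  # placeholder, not a real credential
print(headers["anthropic-beta"])
```

    Confirm the exact header names and version string against the current API reference before using this in production.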

    For high-volume agent applications, Anthropic’s enterprise sales team negotiates custom pricing arrangements. Contact them at [email protected] or through the Claude Console.

    The Pricing Signals Worth Noting

    Anthropic recently ended Claude subscription access (Pro/Max) for third-party agent frameworks, requiring those users to switch to pay-as-you-go API pricing. This signals a deliberate strategy: consumer subscriptions are for human-paced interactions; agent workloads route through the API. The $0.08/session-hour rate exists in that context — it’s infrastructure pricing for compute that runs beyond human attention spans.

    The session-hour model also signals something about Anthropic’s infrastructure cost structure. They’re pricing on active execution time because that’s what actually taxes their systems. Idle sessions don’t cost them much; active agents do. The billing model follows the actual resource consumption pattern.

    Frequently Asked Questions

    Is the $0.08/session-hour charge in addition to token costs, or does it replace them?

    In addition to. You pay both: standard token rates for all input and output tokens, plus $0.08 per hour of active session runtime. They’re separate line items.

    Does prompt caching work in Managed Agents sessions?

    Yes. Prompt caching multipliers apply identically to Managed Agents sessions as they do to standard API calls. If your agent has a large, stable system prompt, caching it can significantly reduce input token costs.

    What happens if my session crashes? Am I billed for the crashed time?

    Runtime accrues only while status is running, so a terminated session stops accruing as soon as it leaves that state. Anthropic’s infrastructure handles checkpointing and crash recovery, and the session state is preserved even if the session terminates unexpectedly.

    Can I use Managed Agents on the free API tier?

    Managed Agents is available to all Anthropic API accounts in public beta, but standard tier access and rate limits apply. Free API tier users receive a small credit for testing.

    How does this compare to running agents on my own infrastructure?

    See our full breakdown: Build vs. Buy: The Real Infrastructure Cost of Claude Managed Agents. Short version: the $0.08/hour is almost certainly cheaper than provisioning and maintaining equivalent compute, but you trade control and data locality for that simplicity.

    Are there volume discounts?

    Volume discounts are available for high-volume users but negotiated case-by-case. Contact Anthropic enterprise sales.

    Is a web search billed at the $10/1,000 rate even if it returns no results?

    Anthropic’s current docs don’t explicitly address failed searches. Treat any triggered search as billable until confirmed otherwise.

    For the full session-hour math worked out by workload type, see: Claude Managed Agents Pricing, Decoded: What a Session-Hour Actually Costs You. For the build-vs-buy infrastructure comparison: Build vs. Buy: The Real Infrastructure Cost. For enterprise deployment patterns: Rakuten Stood Up 5 Enterprise Agents in a Week.