What Notion Agents Can’t Do Yet (And When to Reach for Claude Instead)

I run both Notion Custom Agents and Claude every working day. I have opinions about when each one earns its place and when each one doesn’t. This article is those opinions, named clearly, with no vendor fingers on the scale.

Most comparative writing about AI tools is written by people with an incentive to recommend one over the other — affiliate programs, platform partnerships, the writer’s own consulting practice specializing in one side. This piece doesn’t have that problem. I use both, I pay for both, and if one of them got replaced tomorrow, the pattern I run would survive with a different tool slotted into the same role. The tools are interchangeable. The judgment about which one to reach for is not.

Here’s the honest map.


The short version

Use Notion Custom Agents when: the work is a recurring rhythm, the context lives in Notion, the output is a Notion page or database change, and you’re willing to spend credits on it running in the background.

Use Claude when: the work needs real judgment, the context is complex or contested, the output is something that needs a human’s voice and review, or the workflow crosses enough systems that the agent’s world is too small.

Those two sentences will save most operators ninety percent of the architecture mistakes I see people make. The rest of this article is specificity about why, because general rules only take you so far before you need to know what’s actually going on under the hood.


Where Notion Custom Agents genuinely shine

I’m going to start with the positive because anyone who only reads the critical part of a comparative article will walk away with a warped picture. Custom Agents are genuinely impressive when they fit the job.

Recurring synthesis tasks across workspace data. The daily brief pattern I’ve written about works better in a Custom Agent than in Claude. The agent runs on schedule, reads the right pages, writes the synthesis back into the workspace, and is done. Claude can do this too, but Custom Agents do it without you remembering to prompt them. That’s the whole point of the “autonomous teammate” framing, and for rhythmic synthesis work, it genuinely delivers.

Inbox triage. An agent watching a database with a clear decision tree — categorize incoming requests, assign a priority, route to the right owner — is a sweet-spot Custom Agent. It does the boring sort every day, flags the ones it’s unsure about, and keeps the pile from growing. Real teams are reportedly triaging at over 95% accuracy on inbound tickets with this pattern.

Q&A over workspace knowledge. Agents that answer company policy questions in Slack or provide onboarding guidance for new hires are quietly some of the most valuable agents in production. They replace hours of repetitive answer-the-same-question work, and because the answers come from actual workspace content, the accuracy is high when the workspace is well-maintained.

Database enrichment. An agent that watches for new rows in a database, looks up additional context, and fills in fields automatically is a beautiful fit. The agent is doing deterministic-adjacent work with just enough judgment to handle edge cases. This is exactly what Custom Agents were designed for.

Autonomous reporting. Weekly sprint recaps, monthly OKR reports, Friday retrospectives. Reports that would otherwise require someone to sit down and write them, now drafted automatically from the workspace state.

For these categories, Custom Agents are the right tool, and Claude is the wrong tool even though Claude would technically work. The wrong-tool-even-though-it-works framing matters because operators often default to Claude for everything, which is expensive in different ways.


Where Notion Custom Agents break down

Now the honest part. Custom Agents have real limits, and pretending otherwise is how operators get burned.

1. Anything that requires serious reasoning across contested information

Custom Agents are capable of synthesis, but the quality of their synthesis degrades when the inputs disagree with each other, when the right answer isn’t on the page, or when the task requires actually thinking through a problem rather than summarizing existing context.

The signal that you’ve hit this limit: the agent produces an output that sounds plausible, reads well, and is subtly wrong. If you need to double-check every agent output in a category of work because you can’t trust the judgment, that category of work shouldn’t be going through an agent. Use Claude in a conversation where you can actually interrogate the reasoning.

Specific examples where this shows up: strategic decisions, conflicting client feedback, legal or compliance-adjacent questions, anything that involves weighing tradeoffs. The agent will produce an answer. The answer will often be wrong in a specific way.

2. Long-horizon work that needs to hold nuance across steps

Custom Agents are designed for bounded tasks with clear inputs and clear outputs. When you try to use them for work that requires holding nuance across many steps — drafting a long document, executing a multi-stage strategic plan, navigating a complex workflow — the wheels come off.

Part of this is architectural: agents have limited ability to carry state across runs in the way an extended Claude conversation can. Part of it is practical: the “one agent, one job” principle Notion itself recommends is a hard constraint, not a style guideline. When you try to make an agent do multiple things, you get an agent that does each of them worse than a single-purpose agent would.

If the job you’re thinking about is genuinely one coherent thing that happens to have many steps, and the steps inform each other, it’s probably a Claude conversation, not a Custom Agent.

3. Work that needs a specific human voice

This one is more important than most operators realize. Agents write in a synthesized style. It’s a perfectly fine style. It’s also recognizable as a perfectly fine style, which is the problem.

If the output is going to have your name on it — client communications, thought leadership, outbound that should sound like you — the agent’s default voice will flatten whatever was distinctive about your writing. You can push back on this with instructions, and good instructions help a lot. But the underlying truth is that Custom Agents optimize for “sounds like a competent business writer,” and competent business writing is a commodity. If you sell distinctiveness, the agent is a liability.

Claude in a conversation, with your active voice-shaping, produces writing that can actually sound like you. Custom Agents optimize for a different thing.

4. Anything requiring real-time web context

Custom Agents can reach external tools via MCP, but they don’t have a general ability to browse the live web and integrate what they find into their reasoning. If the work requires recent news, real-time market data, or anything that isn’t in a known database the agent can query, the agent will either fail, hallucinate, or return stale information from whatever workspace snapshot it had.

Claude — with web search enabled, with the ability to fetch arbitrary URLs, with research capabilities — handles this class of work dramatically better. The right architectural response: use Claude for anything with a live-web dependency, let Custom Agents handle the parts that don’t.

5. Deep technical work

Custom Agents can technically do technical work. They should mostly not be asked to. Writing code, debugging failures, analyzing logs, reasoning through system architecture — these live in Claude Code’s territory, not Custom Agents’ territory. The Custom Agent framework was built for operational workflows, and while it will attempt technical tasks, it attempts them at the quality of a generalist, not a specialist.

The sign you’ve crossed this line: the agent is producing code or technical reasoning that a competent human reviewer would push back on. Move the work to Claude Code, which was built for exactly this.

6. High-stakes writes with permanent consequences

Agents execute. They don’t second-guess themselves. An agent configured to send emails will send emails. An agent configured to update client records will update client records. An agent configured to delete rows will delete rows.

When the cost of the agent doing the wrong thing is high — sending a message you can’t unsend, overwriting data you can’t recover, triggering a payment you can’t reverse — the discipline is: don’t let the agent do it without human approval. Use “Always Ask” behavior. Use a draft-and-review pattern. Use anything that puts a human in the loop before the irreversible action.

Operators who ship fast and iterate freely tend to underweight this category. The day you discover it’s been quietly overwriting the wrong database field for two weeks is the day you wish you’d built the review gate.

7. Credit efficiency for genuinely reasoning-heavy work

This one is practical rather than architectural. Starting May 4, 2026, Custom Agents run on Notion Credits at roughly $10 per 1,000 credits. Internal Notion data suggests Custom Agents run approximately 45–90 times per 1,000 credits for typical tasks — meaning tasks that require more steps, more tool calls, or more context cost proportionally more credits per run. That means simple recurring tasks are cheap. Complex reasoning-heavy tasks add up.

If you’re building an agent that does heavy reasoning work many times per day, the credit cost can exceed what the same work would cost through Claude’s API directly, especially on higher-capability Claude models called directly without the Notion overhead. For high-frequency reasoning work, run the math before you commit to the agent architecture.


Where Claude genuinely wins

The other side of the honest comparison. Claude earns its place in categories where Custom Agents either can’t operate or operate poorly.

Strategic thinking conversations. When you’re working through a decision, evaluating a tradeoff, or thinking through a strategy, Claude in an extended conversation is the right tool. The back-and-forth is the whole point. You can interrogate reasoning, push back on conclusions, reframe the problem mid-conversation. An agent that produces a one-shot answer, no matter how good, is the wrong shape for this kind of work.

Drafting with voice. Writing that needs to sound like a specific person is Claude’s territory. You can load up Claude with context about your voice — past writing, tonal preferences, things to avoid — and get output that actually reads as yours. Notion Custom Agents will always produce generic-flavored writing. That’s fine for internal reports. It’s a problem for anything external.

Code and technical work. Claude Code specifically is built for technical depth. It reads codebases, executes in a terminal, calls tools, iterates on failures. Custom Agents will flail at the same work.

Research synthesis across live sources. Claude with web search and fetch capabilities handles “go read this, this, and this, and tell me what the current state actually is” in a way Custom Agents structurally can’t. Anything that requires reaching outside a known data universe is Claude.

Work that crosses many systems. When a workflow needs to touch code, Notion, a database, an external API, and a human review, Claude Code with the right MCP servers connected coordinates across them better than a Custom Agent inside Notion does. The agent’s world is Notion-plus-connected-integrations. Claude’s world is wider.

Anything requiring judgment about whether to proceed. Agents execute. Claude in a conversation can pause, check with you, and ask “should I actually do this?” That judgment layer is frequently the most important part of the workflow.


The pattern that actually works (both, in the right places)

The operators who get this right aren’t choosing one tool over the other. They’re running both, in specific roles, with clear handoffs.

The pattern I run:

Rhythmic operational work lives in Custom Agents. Morning briefs, triage, weekly reviews, database enrichment, Q&A over workspace knowledge. Things that happen repeatedly, have clear inputs, and produce workspace-shaped outputs.

Judgment-heavy work lives in Claude conversations. Strategic decisions, drafting with voice, research, anything requiring back-and-forth. I do this work in Claude chat sessions with the Notion MCP wired in, so Claude has real context when I need it to.

Technical work lives in Claude Code. Building scripts, managing infrastructure, debugging, writing code. Custom Agents don’t touch this.

Handoffs are explicit. When I make a decision in Claude that needs to become operational, it lands as a task or brief in a Notion database, and from there a Custom Agent can pick it up. When a Custom Agent surfaces something that needs judgment, it creates an escalation entry that shows up on my Control Center, where I engage Claude to think through it.

The two systems pass work back and forth through the workspace. Neither tries to do the other’s job. The seams are the Notion databases where state lives.

This is not the vendor-shaped pattern. The vendor-shaped pattern says “Custom Agents can handle everything.” The operator-shaped pattern says “Custom Agents handle what they’re good at, and when the work exceeds their reach, another tool takes over with a clean handoff.”


The decision tree, when you’re not sure

For a specific piece of work, run these questions in order. Stop at the first “yes.”

Does this task need a specific human voice, or could it be written by any competent person? If it needs your voice, reach for Claude. If it doesn’t, move on.

Does this task require reasoning across contested or ambiguous information? If yes, Claude. If no, move on.

Does this task need real-time web context, live external data, or information not already in a known database? If yes, Claude. If no, move on.

Does this task involve code, system architecture, or technical depth? If yes, Claude Code. If no, move on.

Does this task have high-stakes irreversible consequences? If yes, wrap it in a human-approval gate — either run it through Claude where the human is in the loop, or use Custom Agents with “Always Ask” behavior.

Does this task happen repeatedly on a schedule or in response to workspace events? If yes, Custom Agent. This is the sweet spot.

Is the output a Notion page, database row, or something that stays in the workspace? If yes, Custom Agent is usually the right call.

Is the task bounded enough that it could be described in a couple of clear sentences? If yes, Custom Agent. If it’s sprawling, it’s probably too big for an agent.

If you’re through the tree and still not sure, default to Claude. Claude is more expensive in money and cheaper in hidden cost than a Custom Agent running the wrong job.


The failure modes I’ve seen

Specific patterns that go wrong, in my observation:

The “agent for everything” operator. Someone who just got access to Custom Agents and is building agents for tasks that don’t need agents. The agents mostly work. The ones that mostly work waste credits on tasks a template or a simple automation would handle. The ones that partially work produce quiet low-grade mistakes that accumulate.

The “Claude for everything” operator. The inverse. Someone who got comfortable with Claude and hasn’t made the leap to letting agents handle the rhythmic work. They’re paying the context-loss tax every morning, doing the triage manually, writing every brief from scratch. Claude is too expensive a tool — in attention, if not dollars — to run routine work through.

The operator who built one giant agent. Custom Agents are meant to be narrow. Someone violates the “one agent, one job” principle by building an agent that does inbox triage and database updates and weekly reports and client communications. The agent becomes hard to debug, expensive to run, and unreliable across its many hats. The fix is almost always breaking it into three or four single-purpose agents.

The operator who didn’t build review gates. An agent sending emails without human approval. An agent deleting rows based on inferred criteria. An agent updating client-facing pages from an unchecked data source. The cost of the first real mistake exceeds the cost of the review gate that would have prevented it, every time.

The operator who never checked credit consumption. Custom Agents consume credits based on model, steps, and context size. An operator who built ten agents and never looked at the dashboard ends up surprised when the monthly bill is much higher than expected. The fix is easy — Notion ships a credits dashboard — but it has to actually get checked.


The timing honest note

A piece of this article that ages. These comparisons are true in April 2026. Custom Agents are new enough that the feature set will expand significantly over the next year. Claude is evolving rapidly. The specific gaps I’ve named may close; new gaps may open in different directions.

What won’t change is the pattern: some work wants a specialized tool, some work wants a general-purpose one. Some work is rhythmic, some is judgment-driven. Some work lives inside a workspace, some crosses systems. The vocabulary for when to use which tool will evolve; the underlying truth that different shapes of work deserve different tools will not.

If you’re reading this in 2027 and Custom Agents have shipped fifteen new capabilities, the specific “can’t do” list will be shorter. The decision tree at the top of this article will still work. That’s the part worth holding onto.


What I’m not saying

A few clarifications because I want to be clear about what this article is and isn’t.

I’m not saying Custom Agents are bad. They’re genuinely good at what they’re good at. They’re saving me hours per week on work I used to do manually.

I’m not saying Claude is strictly better. Claude is more capable at a broader set of tasks, but it also costs more, requires active operator engagement, and can’t sit in the background running overnight rhythms the way Custom Agents can.

I’m not saying there’s one right answer for every operator. Different operators with different businesses and different workflows will land on different splits. The decision tree helps, but it’s a starting point, not a conclusion.

I’m not saying this is permanent. Tool landscapes change fast. Six months from now there may be categories where Custom Agents beat Claude that don’t exist today, and vice versa. What matters is developing the habit of asking “which tool is this work actually shaped for?” instead of defaulting to whichever one you learned first.


The one thing I’d want you to walk away with

If you read nothing else in this article, this is the sentence I’d want in your head:

Rhythmic operational work wants an agent; judgment-heavy work wants a conversation.

That distinction — rhythm versus judgment — cuts through almost every architecture question you’ll have when deciding what to route where. It’s not the only dimension that matters, but it’s the one that settles the most decisions correctly.

Work that happens on a schedule or in response to an event, with bounded inputs and clear outputs? That’s rhythm. Build a Custom Agent.

Work that requires thinking through tradeoffs, integrating disparate information, or producing output with specific voice and judgment? That’s a conversation. Engage Claude.

Get that right for most of your workflows and the rest of the architecture tends to sort itself out.


FAQ

Can’t Custom Agents do everything Claude can do, just inside Notion? No. Custom Agents are optimized for bounded, rhythmic, workspace-shaped tasks. They can technically attempt work that requires deep reasoning, specific voice, or live external context, but the results degrade in predictable ways. Claude — in a conversation or in Claude Code — handles those categories better.

Should I just use Claude for everything then? No. Rhythmic operational work — morning briefs, triage, weekly reports, database enrichment — is genuinely better in Custom Agents than in Claude, because the “autonomous teammate running while you sleep” property matters. The right answer is running both, in their respective sweet spots.

What’s the cost comparison? Starting May 4, 2026, Custom Agents cost roughly $10 per 1,000 Notion Credits. Internal Notion data suggests agents run approximately 45–90 times per 1,000 credits depending on task complexity. Claude’s subscription pricing is flat. For high-frequency simple tasks, Custom Agents are usually cheaper. For heavy reasoning work done many times per day, running Claude directly can be more cost-efficient.

What about Notion Agent (the personal one) versus Claude? Notion Agent is Notion’s on-demand personal AI — you prompt it, it responds. It’s fine for in-workspace tasks where you need AI help with content you’re already looking at. For deeper reasoning, complex drafting, or cross-tool work, Claude is more capable. Notion Agent is a good ambient utility; Claude is a general-purpose intelligence layer.

Which should I learn first if I’m new to both? Claude. Learn to think with an AI as a thinking partner before you try to build autonomous agents. Once you understand what AI can and can’t do in a conversation, the design decisions for Custom Agents become much clearer. Jumping to Custom Agents without the Claude foundation is how operators end up with agents that don’t work as expected.

Can Custom Agents use Claude models? Yes. Custom Agents let you pick the AI model they run on. Claude Sonnet and Claude Opus are both available, along with GPT-5 and various other models. This means the underlying intelligence of a Custom Agent can be Claude — you’re choosing between Claude-as-conversation (claude.ai, Claude Desktop, Claude Code) and Claude-as-embedded-agent (Custom Agent running Claude). Different interfaces, same underlying model in that case.

What if I want Claude to work autonomously on a schedule like Custom Agents do? Possible, but requires more work. Claude Code can be scripted; you can run it on a cron job; you can set up headless workflows. But the “out of the box autonomous teammate” experience is Notion’s current strength, not Anthropic’s. If you want autonomous-background-work without building your own infrastructure, Custom Agents are easier.

How do I decide for my specific situation? Run the decision tree in the article. If you’re still unsure, default to Claude — it’s the more general-purpose tool, and the cost of using the wrong tool for judgment-heavy work is higher than the cost of using the wrong tool for rhythmic work. You can always migrate a recurring workflow to a Custom Agent once you understand the shape.


Closing note

The honest comparison isn’t one tool versus the other. It’s understanding that different shapes of work want different shapes of tool, and that most operators lose more time to the mismatch than to any individual tool’s limitations.

Custom Agents are good at being Custom Agents. Claude is good at being Claude. Neither is good at being the other. Use both, in the places each belongs, with clean handoffs between them, and the stack hums.

Skip the vendor narratives. Read your own workflows. Route each piece to the tool it’s actually shaped for. That’s the whole game.


Sources and further reading

Related Tygart Media pieces:

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *