Category: Claude AI

Complete guides, tutorials, comparisons, and use cases for Claude AI by Anthropic.

  • Cowork Is No Longer a Research Preview — Here’s What Changes for Non-Developers Today

    Cowork Is No Longer a Research Preview — Here’s What Changes for Non-Developers Today

    Anthropic’s Cowork feature — the desktop automation tool aimed squarely at non-developers — moved out of research preview on April 29, 2026, and is now generally available on both macOS and Windows. It ships with a feature set that represents a meaningful step forward for anyone who has been running scheduled tasks, file workflows, and multi-step automations through Claude without writing a line of code.

    What’s New in the GA Release

    The GA release lands on Pro, Max, Team, and Enterprise plans. The headline additions are expanded analytics, OpenTelemetry support for enterprise observability, and role-based access controls — the last of these being the signal that Cowork is now ready for team deployments, not just individual power users.

    Persistent agent threads are now live across both mobile (iOS and Android) and desktop, which means you can start a Cowork task on your laptop and monitor or manage it from your phone. The new Customize section consolidates skills, plugins, and connectors into a single panel, replacing what was previously a scattered setup experience across multiple menus.

    Recurring and on-demand task scheduling is also included, enabling the kind of “set it and check it” automation workflows that Cowork was always promising but only partially delivering during the preview period.

    Why This Matters for Non-Developers

    Cowork’s core bet has always been that the most valuable use cases for AI automation don’t belong to engineers — they belong to operators, marketers, content teams, and business owners who know exactly what they want done but have no interest in writing Python scripts or JSON configs to get there. The GA release validates that bet with a production-grade infrastructure story: OpenTelemetry means IT and enterprise security teams can audit what the agents are doing; role-based access controls mean managers can delegate without handing over full system access.

    For the non-developer using Cowork day-to-day, the practical change is reliability. Research previews carry an implicit asterisk — “this works, mostly, until it doesn’t.” GA means the feature is supported, documented, and subject to real SLAs. Scheduled tasks that have been running through the preview period should now be more stable, and new automations can be built with the expectation that they’ll still work next month.

    The Enterprise Observability Story

    The addition of Cowork data to the Analytics API, together with OpenTelemetry support, is worth noting separately. This is the detail that unlocks enterprise adoption at scale. Procurement and security teams at larger organizations have consistently asked for auditability before green-lighting AI automation tools. Cowork now has an answer: every agent action can be traced, logged, and routed into whatever observability stack the enterprise already runs.

    For Team and Enterprise plan subscribers, this should accelerate internal approval processes for Cowork deployments that may have stalled during the preview period.

    What Stays the Same

    The fundamental Cowork model — Claude running autonomous tasks on behalf of the user, triggered by schedule or on-demand, guided by skills and connectors — is unchanged. If you’ve been running workflows in the preview, the transition to GA should be seamless. The Customize section reorganizes the setup experience but doesn’t require rebuilding existing configurations.

    Plans and pricing remain unchanged from the research preview tier placement — Cowork is included in Pro, Max, Team, and Enterprise, with no new add-on cost announced alongside the GA release.

    The Bottom Line

    Cowork GA is the milestone that turns a promising experiment into a product you can build operational workflows around. The combination of persistent threads, role-based access, and OpenTelemetry support brings Cowork into alignment with what enterprise buyers require from any automation tool they’re willing to run at scale. For individual users, the reliability improvement and the cleaner Customize panel are the day-one wins. For teams, the observability story is the green light many have been waiting for.

    Source: Anthropic Cowork Release Notes

  • The Context Stack: How I Give Claude Memory Across 27 Sites and 6 Businesses

    The Context Stack: How I Give Claude Memory Across 27 Sites and 6 Businesses

    The most common question I get from people who read the Split-Brain Architecture piece is some version of: how does Claude actually know what it’s working on? If you are managing 27 sites, 6 businesses, and hundreds of ongoing tasks, how do you avoid spending the first ten minutes of every session re-explaining your entire operation to an AI that has no memory of yesterday?

    The answer is what I call the Context Stack. It is not a single file or a single tool — it is a layered system where each layer handles a different time horizon of memory, and Claude reads exactly what it needs for the task at hand without being overwhelmed by everything else.

    The Problem With AI Memory

    Claude does not have persistent memory across sessions by default. Every conversation starts blank. For someone running a simple use case — drafting an email, summarizing a document — this is fine. For someone running a content network across 27 WordPress sites with different brand voices, different SEO strategies, different clients, and different publishing schedules, a blank slate every session is an operational catastrophe.

    The naive solution is to paste a giant context document at the start of every conversation. I tried this. It doesn’t work. Not because Claude can’t read it — it can — but because a 5,000-word context dump at the start of every session is cognitively expensive for the human, slows down the first response, and buries the relevant information under a pile of irrelevant information.

    The right solution is a stack: different layers of context loaded at different times, for different purposes.

    Layer One — The Global Layer (Always Loaded)

    The global layer is the context that is true across everything I do, all the time. It lives in a CLAUDE.md file at the workspace root and in a persistent system prompt inside Claude’s project settings.

    What goes here: my name, my email, the fact that I manage a network of WordPress sites, the Notion workspace structure, the proxy URL and authentication pattern for WordPress API calls, and a handful of behavioral rules that apply universally — brevity preferences, how I want work logged, what “done” means to me.

    What does not go here: anything site-specific, client-specific, or task-specific. The global layer is 200 lines maximum. Anthropic’s own guidance on CLAUDE.md length is right — longer files reduce adherence. I treat the 200-line limit as a hard constraint, not a guideline.
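
    To make the shape concrete, here is a hypothetical sketch of what a global-layer CLAUDE.md can look like. The headings and rules are illustrative, not a copy of my actual file:

    ```markdown
    # CLAUDE.md (global layer)

    ## Who I am
    - Will, will@tygartmedia.com. I run a network of WordPress sites across multiple businesses.

    ## Infrastructure
    - WordPress writes go through the proxy URL and authentication pattern documented here, not pasted into chat.
    - Notion is the second brain. When I say "log this," write it to the Second Brain database.

    ## Behavioral rules
    - Be brief. Ask before producing long outputs.
    - Log completed work to the session log.
    - "Done" means published or delivered, not drafted.
    ```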

    Layer Two — The Site Layer (Loaded Per Project)

    Each WordPress site I manage has its own Claude Project, and each project has its own knowledge files. These files contain everything Claude needs to work on that specific site without me having to explain it: the brand voice, the target audience, the top-performing content, the internal linking structure, the credentials, the publishing cadence, and the current content roadmap.

    I generate these files programmatically when I onboard a new site. They pull from the WordPress REST API, the site’s GA4 data, and the Notion database for that client. A site knowledge file for an established site runs about 800–1,200 words. Claude reads it at the start of any session for that project and immediately knows the difference between how to write for a Houston restoration contractor versus a New York luxury lender.
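
    A minimal sketch of that generation step, covering only the WordPress REST API piece (the GA4 and Notion pulls follow the same pattern; the site URL and file layout here are placeholders):

    ```python
    # Pull recent posts from a site's WordPress REST API and write a skeleton
    # knowledge file. Illustrative only: the real files also merge GA4 and Notion data.
    import requests

    SITE = "https://example-client-site.com"  # placeholder site URL

    posts = requests.get(
        f"{SITE}/wp-json/wp/v2/posts",
        params={"per_page": 20, "orderby": "date", "_fields": "title,link,date"},
        timeout=30,
    ).json()

    lines = ["# Site Knowledge (draft)", "", "## Recent content"]
    for post in posts:
        lines.append(f"- {post['title']['rendered']} ({post['date'][:10]}): {post['link']}")

    with open("site-knowledge.md", "w") as f:
        f.write("\n".join(lines))
    ```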

    The site layer is why I can switch from working on a restoration contractor to a luxury lender to a live comedy platform in the same afternoon without losing context. The context travels with the project, not with me.

    Layer Three — The Task Layer (Loaded On Demand)

    The task layer is ephemeral. It is the specific context for the thing I am doing right now: the article brief, the GA data from this session, the list of posts that need refreshing, the client’s feedback on last week’s content.

    This layer lives nowhere permanent. I paste it into the conversation, Claude uses it, and when the session ends it is gone. The task layer is intentionally disposable. If it matters beyond this session, it gets promoted to the site layer or the global layer. If it doesn’t matter beyond this session, it doesn’t need to be stored.

    Most AI users try to make everything permanent. The discipline of the context stack is knowing what deserves permanence and what doesn’t.

    Layer Four — The Second Brain (Asynchronous)

    The second brain layer is Notion. It is not loaded into Claude’s context window directly — it is queried via the Notion MCP when Claude needs specific information.

    What lives here: every session log, every publish log, every piece of competitive intelligence, every client preference that has emerged over time, the Promotion Ledger for autonomous behaviors, the Second Brain database of extracted knowledge from prior sessions.

    The key distinction: Notion is not context I push into Claude. It is context Claude pulls from Notion when it needs it. The MCP connection means Claude can search the Second Brain mid-session, find a relevant prior session log, and use it — without me having to remember that the prior session happened.

    This is the layer that makes the system feel like it has long-term memory even though it doesn’t. Claude doesn’t remember. But it can look things up, and the things worth looking up are stored.

    What This Looks Like In Practice

    A typical session for me starts with a project context already loaded (site layer). Within thirty seconds Claude knows which site it’s working on, what voice to use, and what the current priorities are. I drop in the task layer — a GA report, a list of post IDs, a brief — and we are working within two minutes of starting.

    When something important happens — a new client preference, a site credential change, a strategy decision — I say “log this to Notion” and Claude writes it to the Second Brain. I don’t maintain the second brain manually. Claude maintains it as a byproduct of doing the work.

    When I need to recall something from months ago — what we decided about the internal linking structure for a specific site, what the client said about their brand voice in March — Claude searches Notion and finds it. The retrieval is imperfect but it is dramatically better than my own memory.

    The Honest Constraints

    This system took months to build and it is still not finished. The site knowledge files need updating when strategies change and I don’t always remember to update them. The Second Brain has gaps where sessions weren’t logged properly. The global CLAUDE.md drifts toward bloat and needs periodic pruning.

    The bigger constraint is that this architecture assumes you are operating at a certain scale — multiple sites, multiple clients, recurring workflows. If you are running one site for one business, the overhead of building and maintaining this stack is probably not worth it. A well-written CLAUDE.md and a single Notion page of context will get you most of the way there.

    But if you are scaling past three or four sites, or if you find yourself re-explaining the same context in every session, the stack pays for itself quickly. The ten minutes you spend building a site knowledge file saves you two minutes per session indefinitely.

    The goal is not to give Claude everything. The goal is to give Claude exactly what it needs, when it needs it, at the right layer of permanence.

    Building Your Own Context Stack?

    Email me what you are managing and I will tell you which layers you actually need.

    Most people over-engineer the global layer and under-invest in the site layer. Five minutes of conversation usually fixes it.

    Email Will → will@tygartmedia.com

  • Claude API Access from Singapore and China: What Actually Works in 2026

    Claude API Access from Singapore and China: What Actually Works in 2026

    If you are a developer in Singapore or China trying to use Claude, you have already noticed that the standard instructions don’t quite apply to you. The console.anthropic.com onboarding assumes a US billing address. The latency numbers assume you are pinging from a US data center. And for developers in mainland China, the direct API doesn’t work at all without a workaround.

    This is a practical guide to what actually works in 2026, written for the Asian developer market that is increasingly one of Claude’s most active audiences.

    Singapore: What Works Directly

    Singapore is a fully supported country for the Anthropic API. You can create an account at console.anthropic.com, add a payment method, and generate API keys with no restrictions. Most major international credit cards work without issues. If you are at a company with a Singapore entity, Anthropic accepts international wire transfers for enterprise contracts.

    Latency from Singapore to Anthropic’s US API endpoints typically runs 180–250ms round-trip depending on your ISP and the model you are calling. For most application use cases this is acceptable. For latency-sensitive real-time applications — voice interfaces, live coding assistants — you will want to route through a closer compute layer, which is where Vertex AI becomes relevant.
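
    For reference, a minimal direct-API call from Singapore looks the same as it does anywhere else. The sketch below assumes the official Python SDK and uses a placeholder model ID:

    ```python
    # Minimal direct-API call (pip install anthropic); ANTHROPIC_API_KEY is read
    # from the environment. The model ID is a placeholder.
    import anthropic

    client = anthropic.Anthropic()

    message = client.messages.create(
        model="claude-sonnet-4-6",  # placeholder model ID
        max_tokens=256,
        messages=[{"role": "user", "content": "Reply with one word: ready?"}],
    )
    print(message.content[0].text)
    ```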

    Vertex AI: The Regional Solution for Both Markets

    Google Cloud’s Vertex AI hosts Claude models (Sonnet and Haiku tiers as of mid-2026) and has a data center in Singapore: asia-southeast1. This is the cleanest solution for developers in both Singapore and the broader Asia-Pacific region who want lower latency and enterprise-grade SLAs.

    The practical difference: instead of calling api.anthropic.com, you call a Vertex AI endpoint scoped to asia-southeast1. Your tokens are processed in Singapore, not Virginia. For regulated industries — fintech, healthcare, legal — this also means your data doesn’t leave the region, which is a compliance requirement in several Singapore regulatory frameworks (MAS TRM guidelines being the primary one).

    To get started with Claude on Vertex AI from Singapore:

    1. Create a GCP project and enable the Vertex AI API
    2. Request access to Claude models via the Vertex AI Model Garden (approval is typically same-day for Singapore accounts)
    3. Set your region to asia-southeast1 in all API calls
    4. Authenticate via a GCP service account rather than an Anthropic API key
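
    A minimal sketch of steps 3 and 4 using the Anthropic SDK's Vertex client; the project ID and model name are placeholders, and credentials for the service account (or Application Default Credentials) are assumed to be configured:

    ```python
    # Call Claude on Vertex AI from the Singapore region
    # (pip install "anthropic[vertex]"); project ID and model name are placeholders.
    from anthropic import AnthropicVertex

    client = AnthropicVertex(
        project_id="your-gcp-project-id",  # placeholder GCP project
        region="asia-southeast1",          # Singapore region
    )

    message = client.messages.create(
        model="claude-sonnet-4-6",  # placeholder Vertex model name
        max_tokens=512,
        messages=[{"role": "user", "content": "Summarize this in one sentence: ..."}],
    )
    print(message.content[0].text)
    ```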

    The pricing on Vertex AI is comparable to direct Anthropic API pricing, with GCP committed use discounts available at higher volumes.

    AWS Bedrock: The Other Regional Option

    Amazon Bedrock also hosts Claude models and has a Singapore region (ap-southeast-1). If your infrastructure is already on AWS, this is often the simpler path. The setup mirrors Vertex AI: enable Bedrock in your AWS console, request Claude model access, and specify the Singapore region in your SDK calls.
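
    A minimal sketch of that setup with boto3's Converse API; the model ID is a placeholder, and AWS credentials with Bedrock model access in ap-southeast-1 are assumed:

    ```python
    # Call Claude through Amazon Bedrock in the Singapore region (pip install boto3).
    # Check the Bedrock model catalog for the current Claude identifier.
    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="ap-southeast-1")

    response = bedrock.converse(
        modelId="anthropic.claude-sonnet-4-6-v1:0",  # placeholder model ID
        messages=[{"role": "user", "content": [{"text": "Reply with one word: ready?"}]}],
        inferenceConfig={"maxTokens": 256},
    )
    print(response["output"]["message"]["content"][0]["text"])
    ```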

    The practical consideration: as of mid-2026, model availability on Bedrock sometimes lags behind the direct Anthropic API by a few weeks when new versions ship. If being on the latest Claude version immediately matters for your use case, the direct API or Vertex AI are more current.

    China: The Honest Situation

    The direct Anthropic API is not accessible from mainland China without a VPN. Console.anthropic.com is not blocked at the DNS level in the same way Google is, but connectivity is unreliable and payment processing from Chinese-issued cards through Stripe (Anthropic’s payment processor) fails for most users.

    The workarounds that Chinese developers are actually using in 2026:

    VPN plus international card. Developers with access to a VPN and an international payment card (Hong Kong or Singapore bank account) use the direct API without issues. This is the most common setup among individual developers and small teams.

    Hong Kong entity. Companies with a Hong Kong subsidiary or registered office use that entity for the Anthropic API account. Hong Kong is a fully supported region with no connectivity issues.

    Third-party API proxies. Several API aggregators operating out of Hong Kong and Singapore re-sell Anthropic API access to mainland China developers. Quality and terms vary significantly — vet carefully before using in production.

    Vertex AI via a non-China GCP account. Some development teams maintain a GCP account registered to a Singapore or Hong Kong entity, then call the Vertex AI Claude endpoint from within China via GCP’s global network. Google Cloud has limited but operational connectivity from within China through its global backbone. This is the most enterprise-appropriate solution for teams that need a compliant path.

    Latency Reality Check by Access Method

    | Access Method | From Singapore | From China (with VPN) |
    | --- | --- | --- |
    | Direct Anthropic API (us-east) | 180–250ms | 300–500ms+ |
    | Vertex AI (asia-southeast1) | 30–60ms | 150–300ms via GCP backbone |
    | AWS Bedrock (ap-southeast-1) | 25–55ms | Not directly accessible |

    Latency figures are representative ranges based on typical ISP routing. Your numbers will vary.

    Payment and Billing Notes

    For Singapore developers on the direct Anthropic API: Visa, Mastercard, and American Express issued by Singapore banks work reliably. PayNow and local payment rails are not supported — you need an international card.

    For enterprise: Anthropic’s sales team handles invoiced billing for Singapore and other APAC markets. If you are spending meaningfully on the API, contact sales rather than running on a credit card — the invoiced route gives you better cost predictability and eliminates card limit friction.

    The Bottom Line

    If you are in Singapore, the direct API works and Vertex AI’s asia-southeast1 region gives you a lower-latency, compliance-friendly alternative worth evaluating for production workloads.

    If you are in mainland China, the direct API requires a workaround. A Hong Kong entity plus Vertex AI is the cleanest enterprise path. For individual developers, VPN plus an international card is the practical reality.

    The Asian developer market is using Claude at scale. The tooling is there — it just requires knowing which path to take from where you are sitting.

    Based in Singapore or Asia-Pacific?

    I can help you pick the right access path for your stack and region.

    Email me your setup — direct API, Vertex AI, or Bedrock — and I’ll give you a straight answer on what makes sense.

    Email Will → will@tygartmedia.com

  • Claude AI Context Window Explained: Size, Limits, and How It Works

    Claude AI Context Window Explained: Size, Limits, and How It Works

    Model Accuracy Note — Updated May 2026

    Current flagship: Claude Opus 4.7 (claude-opus-4-7). Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Claude Opus 4.6 referenced in this article has been superseded. See current model tracker →

    Claude’s context window is one of the most consequential — and most misunderstood — specs in the AI landscape. It determines how much information Claude can hold and reason about at once. Get it wrong in your planning and you’ll hit hard walls mid-task. This guide covers exactly how large Claude’s context window is, how it differs by model and plan, and what it means in practice.

    What is a context window? The context window is Claude’s working memory for a conversation — the total amount of text (including your messages, Claude’s responses, uploaded files, and system instructions) that Claude can actively process at once. When a conversation exceeds this limit, Claude can no longer reference earlier parts of it without summarization or a new session.

    Claude’s Context Window Size by Model and Plan

    Context window size in Claude varies by model, plan type, and which product surface you’re using. Here’s the accurate picture as of April 2026:

    Claude.ai (Web and Mobile Chat)

    For users on paid claude.ai plans — Pro, Max, Team, and most Enterprise — the context window is 200,000 tokens across all models and paid plans. According to Anthropic’s support documentation, this is roughly 500 pages of text or more.

    Enterprise plans on specific models have access to a 500,000 token context window. This is a plan-level feature, not a model selection — contact Anthropic’s enterprise team for details on which models qualify.

    Claude Code (Terminal and IDE)

    The larger context windows — 1 million tokens — are available specifically through Claude Code on paid plans:

    • Claude Opus 4.6: Supports a 1M token context window in Claude Code on Pro, Max, Team, and Enterprise plans. Pro users need to enable extra usage to access Opus 4.6 in Claude Code.
    • Claude Sonnet 4.6: Also supports a 1M token context window in Claude Code, but extra usage must be enabled to access it (except for usage-based Enterprise plans).

    Claude API

    Via the direct API, the current model context windows as published in Anthropic’s official documentation are:

    | Model | Context Window | Max Output |
    | --- | --- | --- |
    | Claude Opus 4.7 | 1,000,000 tokens | 128,000 tokens |
    | Claude Sonnet 4.6 | 1,000,000 tokens | 64,000 tokens |
    | Claude Haiku 4.5 | 200,000 tokens | 64,000 tokens |

    Source: Anthropic Models Overview, April 2026.

    What 200K Tokens Actually Means

    Tokens are not the same as words. A token is roughly 3–4 characters, which works out to approximately 0.75 words in English. Here’s how the 200K token context window translates into practical content:

    • ~150,000 words of plain text
    • ~500+ pages of a standard document
    • A full-length novel (most are 80,000–120,000 words) with room to spare
    • Hundreds of emails in a thread
    • A moderately large codebase or multiple interconnected files
    • Hours of meeting transcripts
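
    If you want to check how much of the window a specific input will consume before sending it, the token-counting endpoint gives you the number directly. A minimal sketch, with a placeholder model ID and document:

    ```python
    # Count the tokens a request would use without running the model.
    import anthropic

    client = anthropic.Anthropic()
    document = open("contract.txt").read()  # placeholder document

    count = client.messages.count_tokens(
        model="claude-sonnet-4-6",  # placeholder model ID
        messages=[{"role": "user", "content": f"Summarize the key obligations:\n\n{document}"}],
    )
    print(f"{count.input_tokens:,} of 200,000 tokens used by this request")
    ```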

    For the vast majority of everyday tasks — document review, writing, research, coding, analysis — 200K tokens is more than enough. The ceiling only becomes relevant for extended research sessions, very large codebases, or scenarios where you need to maintain context across a lengthy back-and-forth over many hours.

    What 1M Tokens Actually Means

    One million tokens is roughly 750,000 words — equivalent to about five full-length novels, or a substantial enterprise codebase in a single session. The practical use cases that genuinely require this scale are narrower than the marketing suggests, but they’re real:

    • Large codebase analysis: Feeding an entire repository — multiple files, modules, and dependencies — into a single Claude Code session for architecture review, debugging, or refactoring.
    • Book-length document processing: Analyzing or summarizing an entire textbook, legal corpus, or research archive without chunking.
    • Long-running agentic workflows: Multi-agent tasks where conversation history, tool call results, and accumulated context grow significantly over time.
    • Extended conversation history: Maintaining full context across a very long research or writing session without losing earlier exchanges.

    For most individual users on claude.ai, the 200K chat context window is the relevant number. The 1M context window matters most to developers building on the API and power users running Claude Code sessions on large codebases.

    Context Window vs. Usage Limit: Two Different Things

    This is the most common point of confusion. The context window and usage limit are separate constraints that operate independently:

    Context window (length limit): How much content Claude can hold in a single conversation. This is a technical capability of the model. When you hit the context window, Claude can no longer actively process earlier parts of the conversation without summarization.

    Usage limit: How much you can interact with Claude over a rolling time period — the five-hour session window and weekly cap on paid plans. This controls how many total messages and how much total compute you consume across all your conversations, not the depth of any single conversation.

    You can hit a usage limit without ever approaching the context window (many short conversations). You can also approach the context window limit without hitting your usage limit (one very long, deep conversation). They’re orthogonal constraints.

    Automatic Context Management

    For paid plan users with code execution enabled, Claude automatically manages long conversations when they approach the context window limit. When the conversation gets long enough that it would otherwise hit the ceiling, Claude summarizes earlier messages to make room for new content — allowing the conversation to continue without interruption.

    Important details about how this works:

    • Your full chat history is preserved — Claude can still reference earlier content even after summarization.
    • This does not count toward your usage limit.
    • You may see Claude note that it’s “organizing its thoughts” — this indicates automatic context management is active.
    • Code execution must be enabled for automatic context management to work. Users without code execution enabled may encounter hard context limits.
    • Rare edge cases — very large first messages or system errors — may still hit context limits even with automatic management active.

    How Context Window Affects Cost on the API

    For developers using the Claude API directly, context window size has direct billing implications. Every token in the context window — input messages, conversation history, system prompts, uploaded documents, and tool call results — is billed as input tokens on each API call.

    This creates an important cost dynamic: long conversations get progressively more expensive per message. In a 100-message thread, every new message requires reprocessing the entire conversation history as input tokens. A session that started at $0.01 per exchange can reach $0.10 or more per exchange by message 80.

    Two features exist specifically to manage this cost:

    • Prompt caching: For repeated content — large system prompts, reference documents, or conversation history that doesn’t change — prompt caching allows Claude to read from a cache at roughly 10% of the standard input token price, rather than reprocessing the same content on every call. This can reduce costs by up to 90% on cached content.
    • Message Batches API: For non-real-time workloads, the Batch API provides a 50% discount on all token pricing. It doesn’t reduce the token count, but halves the cost per token.
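
    A minimal prompt-caching sketch: the large reference document is marked with cache_control so later calls within the cache window read it at the reduced cache-read rate. The model ID and file name are placeholders:

    ```python
    # Cache a large system prompt so repeated calls reuse it cheaply.
    import anthropic

    client = anthropic.Anthropic()
    reference_doc = open("style_guide.md").read()  # placeholder large document

    response = client.messages.create(
        model="claude-sonnet-4-6",  # placeholder model ID
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": reference_doc,
                "cache_control": {"type": "ephemeral"},  # mark this block for caching
            }
        ],
        messages=[{"role": "user", "content": "Check this draft against the style guide: ..."}],
    )
    # usage reports how many tokens were written to and read from the cache
    print(response.usage.cache_creation_input_tokens, response.usage.cache_read_input_tokens)
    ```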

    How Projects Expand Effective Context

    Claude Projects on claude.ai use retrieval-augmented generation (RAG), which changes how context works in a meaningful way. Instead of loading all project knowledge into the active context window at once, Projects retrieve only the most relevant content for each message.

    This means you can store substantially more information in a Project’s knowledge base than would fit in the raw context window — and Claude will pull the relevant pieces into the active context as needed. For research-heavy workflows, content libraries, or any use case where you’re working with a large knowledge base across many sessions, Projects are the practical way to work beyond the hard context window ceiling.

    Anthropic also offers a RAG mode for expanded project knowledge capacity that pushes this further for users who need it.

    Context Window and Model Choice

    If context window size is a primary constraint for your use case, here’s how to think about model selection:

    For claude.ai chat users, all paid plans give you 200K tokens regardless of which model you’re using. The model choice doesn’t affect the context window in the chat interface.

    For Claude Code users on Pro, Max, or Team plans, Opus 4.6 and Sonnet 4.6 both offer the 1M context window — but you need extra usage enabled to access it (except on usage-based Enterprise plans).

    For API developers, Opus 4.7 and Sonnet 4.6 both provide 1M token context windows at their standard per-token rates. Haiku 4.5 is capped at 200K. If your workload requires context beyond 200K tokens, Sonnet 4.6 at $3/$15 per million tokens is the cost-efficient choice — you get the same 1M context window as Opus at 40% lower cost.

    Practical Tips to Maximize Your Context Window

    Whether you’re on the 200K or 1M window, these practices extend how effectively you can use available context:

    • Start fresh conversations for new topics. Don’t carry long threads across unrelated tasks — the accumulated history consumes context without adding value for the new task.
    • Use Projects for recurring reference material. Documents, instructions, and background context that you reference repeatedly belong in a Project, not re-uploaded to each conversation.
    • Keep system prompts concise. In API applications, every extra token in a system prompt multiplies across every call. Trim aggressively.
    • Disable unused tools and connectors. Web search, MCP connectors, and other tools add system prompt tokens even when not actively used. Turn them off for sessions that don’t need them.
    • Enable code execution if you’re on a paid plan — it activates automatic context management and extends how long conversations can run without hitting the ceiling.

    Frequently Asked Questions

    What is Claude’s context window size?

    For paid claude.ai plans (Pro, Max, Team), the context window is 200,000 tokens — roughly 500 pages of text. Enterprise plans have a 500,000 token context window on specific models. Via the API and in Claude Code, Opus 4.7 and Sonnet 4.6 support a 1,000,000 token context window. Haiku 4.5 is 200,000 tokens across all surfaces.

    How many words is 200K tokens?

    Approximately 150,000 words. A token is roughly 0.75 words in English. 200,000 tokens is equivalent to a long novel, 500+ pages of standard text, or many hours of conversation history.

    How many words is 1 million tokens?

    Approximately 750,000 words — roughly five full-length novels, or the equivalent of a substantial codebase in a single session.

    Does the context window reset between conversations?

    Yes. Each new conversation starts with a fresh context window. Previous conversations do not carry over unless you’re using a Project, which maintains persistent knowledge across sessions, or unless Claude has memory features enabled that reference past conversations.

    What happens when Claude hits the context window limit?

    For paid plan users with code execution enabled, Claude automatically summarizes earlier messages and continues the conversation. Without code execution enabled, you may encounter a hard limit that requires starting a new conversation. In either case, the context window limit is separate from your usage limit — hitting one doesn’t affect the other.

    Can I increase Claude’s context window?

    The context window size is fixed by your plan and model. You can’t expand it directly, but you can use Projects (which use RAG to work with more information than fits in the raw context window), enable automatic context management via code execution, or use the API with models that have larger native context windows.

    Does every message use the full context window?

    No. Context usage grows as a conversation progresses. The first message in a conversation uses only the tokens from that message plus any system prompt. By message 50, the entire thread history is included as context on every subsequent call. This is why long conversations get progressively more token-intensive over time.

    Is the context window the same as Claude’s memory?

    Not exactly. The context window is technical working memory — what Claude can actively process in a session. Claude’s memory features (available on paid plans) are separate: they extract and store information from past conversations and make it available in future sessions, beyond what the context window can hold.

  • Claude Opus vs Sonnet vs Haiku: Model Comparison Guide (2026)

    Claude Opus vs Sonnet vs Haiku: Model Comparison Guide (2026)

    Model Accuracy Note — Updated May 2026

    Current flagship: Claude Opus 4.7 (claude-opus-4-7). Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Claude Opus 4.6 referenced in this article has been superseded. See current model tracker →

    Anthropic’s Claude model lineup in 2026 breaks down into three distinct tiers: Opus 4.7 for maximum capability, Sonnet 4.6 for the best balance of performance and cost, and Haiku 4.5 for speed and high-volume work. Picking the wrong model costs money or performance — sometimes both. This guide covers every meaningful difference so you can make the right call for your use case.

    Quick answer: Sonnet 4.6 handles 80–90% of tasks at 40% less cost than Opus. Use Opus 4.7 when you need maximum reasoning depth, the largest output window, or agentic coding at frontier quality. Use Haiku 4.5 when speed and cost are the priority and the task is straightforward.

    The Current Claude Model Lineup (April 2026)

    As of April 2026, Anthropic’s three recommended models are Claude Opus 4.7, Claude Sonnet 4.6, and Claude Haiku 4.5. All three support text and image input, multilingual output, and vision processing. They differ significantly in pricing, context window, output limits, and capability.

    | Feature | Opus 4.7 | Sonnet 4.6 | Haiku 4.5 |
    | --- | --- | --- | --- |
    | Input price | $5 / MTok | $3 / MTok | $1 / MTok |
    | Output price | $25 / MTok | $15 / MTok | $5 / MTok |
    | Context window | 1M tokens | 1M tokens | 200K tokens |
    | Max output | 128K tokens | 64K tokens | 64K tokens |
    | Extended thinking | No | Yes | Yes |
    | Adaptive thinking | Yes | Yes | No |
    | Latency | Moderate | Fast | Fastest |
    | Knowledge cutoff | Jan 2026 | Aug 2025 | Feb 2025 |

    Pricing is per million tokens (MTok) via the Claude API. Source: Anthropic Models Overview, April 2026.

    Claude Opus 4.7: When to Use It

    Opus 4.7 is Anthropic’s most capable generally available model as of April 2026. Anthropic describes it as a step-change improvement in agentic coding over Opus 4.6, with a new tokenizer that contributes to improved performance on a range of tasks. Note that this new tokenizer may use up to 35% more tokens for the same text compared to previous models — a cost consideration worth factoring in for high-volume workflows.

    Key differentiators for Opus 4.7 over the other two models:

    • 128K max output tokens — double Sonnet and Haiku’s 64K cap. This matters for generating long-form code, detailed reports, or complete document drafts in a single call.
    • 1M token context window — same as Sonnet 4.6, meaning Opus can process entire codebases or book-length documents in a single session.
    • Adaptive thinking — Opus 4.7 and Sonnet 4.6 both support adaptive thinking, which lets the model adjust reasoning depth based on task complexity.
    • Most recent knowledge cutoff — January 2026, versus August 2025 for Sonnet and February 2025 for Haiku.

    Opus does not support extended thinking — that capability lives on Sonnet 4.6 and Haiku 4.5. Extended thinking lets the model reason step-by-step before generating output, which is particularly useful for complex math, science, and multi-step logic problems.

    Use Opus 4.7 for: complex architecture decisions, large codebase analysis, multi-agent orchestration tasks, outputs that require more than 64K tokens, tasks demanding the latest possible knowledge, and any work where you need the absolute frontier of Anthropic’s reasoning capability.

    Skip Opus 4.7 for: routine content generation, customer support pipelines, high-volume classification or extraction, real-time applications requiring low latency, or any task where Sonnet scores within your acceptable quality threshold.

    Claude Sonnet 4.6: The Workhorse

    Sonnet 4.6 is the model Anthropic recommends as the best combination of speed and intelligence. Released in February 2026, it delivers a 1M token context window at $3 input / $15 output per million tokens — the same context window as Opus at 40% lower cost.

    Sonnet 4.6 also offers extended thinking, which Opus 4.7 does not (Haiku 4.5 supports it as well). When extended thinking is enabled, Sonnet can perform additional internal reasoning before generating its response — useful for reasoning-heavy tasks like complex debugging, multi-step research, and technical problem-solving where chain-of-thought depth matters.

    For developers and teams using Claude Code, Sonnet 4.6 is the standard daily driver. It handles tool calling, agentic workflows, and multi-file code reasoning reliably, at a price point that makes heavy daily use economically viable.

    Use Sonnet 4.6 for: most production workloads, Claude Code sessions, long-document analysis, content generation, coding tasks, research synthesis, customer-facing applications, and any workflow requiring the 1M context window where Opus’s premium isn’t justified.

    Skip Sonnet 4.6 for: high-volume pipelines where Haiku’s lower cost is acceptable, simple classification or extraction tasks, or real-time applications where Haiku’s faster latency is required.

    Claude Haiku 4.5: Speed and Volume

    Haiku 4.5 is the fastest model in the Claude family and the most cost-efficient at $1 input / $5 output per million tokens. It has a 200K token context window — smaller than Opus and Sonnet’s 1M, but still substantial for most single-task work. It supports extended thinking but not adaptive thinking.

    The 200K context limit is the most important practical constraint. Most single-document, single-task workflows fit within 200K. Multi-file codebases, long books, or extended conversation histories that push past that threshold need Sonnet or Opus.

    Haiku 4.5 has the oldest knowledge cutoff of the three: February 2025. For tasks requiring awareness of events or developments from mid-2025 onward, Haiku won’t have that context baked in.

    Use Haiku 4.5 for: content moderation, classification pipelines, entity extraction, customer support triage, real-time chat interfaces, simple Q&A, high-volume API workflows where cost and speed dominate, and any task where quality requirements are modest.

    Skip Haiku 4.5 for: complex reasoning, large codebase analysis, tasks requiring recent knowledge (post-February 2025), multi-step agent workflows, or any task requiring more than 200K tokens of input context.

    Pricing: What the Numbers Actually Mean in Practice

    All three models price output tokens at 5x the input rate — a ratio that holds across the entire Claude lineup. This means verbose, long-form outputs cost significantly more than short, targeted responses. Minimizing generated output length is the highest-leverage cost optimization available before you touch model routing or caching.

    To put the pricing in concrete terms: generating one million output tokens (roughly 750,000 words of generated text) costs $25 on Opus, $15 on Sonnet, and $5 on Haiku. For input-heavy workloads like document analysis where you’re feeding in large amounts of text but getting shorter responses, the cost gap narrows.

    Three additional pricing levers apply across all models:

    • Prompt caching: Cuts cache-read input costs by up to 90% for repeated system prompts or documents. If your application reuses a large system prompt across many requests, caching is the single highest-impact cost reduction available.
    • Batch API: Provides a 50% discount for non-time-sensitive workloads processed asynchronously. Combine with prompt caching for up to 95% savings on qualifying workflows.
    • Model routing: Running a mix of Haiku for simple tasks, Sonnet for production workloads, and Opus for complex reasoning — rather than using one model for everything — can reduce total API costs by 60–70% without meaningful quality loss on the tasks that don’t require a flagship model.
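
    A hedged sketch of what model routing can look like in practice; the tier assignments and model IDs are illustrative rather than an official recommendation:

    ```python
    # Route each task to the cheapest model tier the caller judges sufficient.
    import anthropic

    client = anthropic.Anthropic()

    MODEL_BY_TIER = {
        "simple": "claude-haiku-4-5",     # classification, extraction, triage
        "standard": "claude-sonnet-4-6",  # most production work
        "complex": "claude-opus-4-7",     # deep reasoning, very long outputs
    }  # placeholder model IDs

    def run_task(prompt: str, tier: str = "standard") -> str:
        """Send a prompt to the model tier chosen by the caller."""
        response = client.messages.create(
            model=MODEL_BY_TIER[tier],
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text

    # Triage goes to Haiku; the customer-facing summary goes to Sonnet.
    label = run_task("Classify this ticket as billing, technical, or other: ...", tier="simple")
    summary = run_task("Write a one-paragraph summary of these findings: ...", tier="standard")
    ```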

    Context Windows: 1M Tokens vs. 200K

    Opus 4.7 and Sonnet 4.6 both offer a 1M token context window at standard pricing — no premium surcharge for extended context. For reference, 1 million tokens is roughly 750,000 words, enough to hold a large codebase, a full academic textbook, or months of business communications in a single conversation.

    Haiku 4.5 has a 200K token context window. That’s still roughly 150,000 words — sufficient for most single-document tasks, but it creates a hard ceiling for anything requiring multi-file code review, book-length document analysis, or lengthy conversation histories.

    If your workflow consistently requires more than 200K tokens of input, Sonnet 4.6 is the cost-efficient choice. Opus 4.7 is the right call only when the task demands the additional reasoning capability Opus provides, not just the larger context window — because Sonnet gets you the same 1M window at 40% lower cost.

    Extended Thinking vs. Adaptive Thinking

    These are two distinct features that appear together in the comparison table but serve different purposes.

    Extended thinking (available on Sonnet 4.6 and Haiku 4.5, not Opus 4.7) lets Claude perform additional internal reasoning before generating its response. When enabled, the model produces a “thinking” content block that exposes its reasoning process — step-by-step problem decomposition before the final answer. Extended thinking tokens are billed as standard output tokens at the model’s output rate. A minimum thinking budget of 1,024 tokens is required when enabling this feature.

    Adaptive thinking (available on Opus 4.7 and Sonnet 4.6, not Haiku 4.5) adjusts reasoning depth dynamically based on task complexity — the model allocates more reasoning for harder problems and less for simpler ones, without requiring explicit configuration.

    The practical implication: if you need transparent, controllable step-by-step reasoning that you can inspect and use in your application, Sonnet 4.6’s extended thinking is often the right tool — and at lower cost than Opus.
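
    A minimal sketch of enabling extended thinking through the API; the model ID is a placeholder, the thinking budget must be at least 1,024 tokens, and max_tokens must exceed that budget:

    ```python
    # Enable extended thinking and inspect the reasoning blocks in the response.
    import anthropic

    client = anthropic.Anthropic()

    response = client.messages.create(
        model="claude-sonnet-4-6",  # placeholder model ID
        max_tokens=4096,            # must be larger than the thinking budget
        thinking={"type": "enabled", "budget_tokens": 2048},
        messages=[{"role": "user", "content": "Debug this intermittent race condition: ..."}],
    )

    for block in response.content:
        if block.type == "thinking":
            print("reasoning:", block.thinking[:200])  # inspectable step-by-step reasoning
        elif block.type == "text":
            print("answer:", block.text)
    ```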

    Which Claude Model Should You Choose?

    The right framework for model selection in 2026 is to start with Sonnet 4.6 as your default and escalate selectively. Most production workloads — coding, writing, analysis, research, customer-facing applications — are well-served by Sonnet. Opus 4.7 earns its premium in specific scenarios: tasks requiring more than 64K output tokens, agent workflows demanding maximum reasoning depth, or applications where Anthropic’s latest knowledge cutoff is a meaningful factor.

    Haiku 4.5 belongs in any pipeline where you’ve identified tasks that don’t require Sonnet’s capability. High-volume routing, triage, classification, and real-time response scenarios are Haiku’s natural territory. Building a 70/20/10 routing split across Haiku, Sonnet, and Opus — rather than using a single model for everything — is the standard approach for cost-efficient production deployments.

    Frequently Asked Questions

    What is the difference between Claude Opus, Sonnet, and Haiku?

    Opus is Anthropic’s most capable model, optimized for complex reasoning, large outputs, and agentic tasks. Sonnet offers a balance of capability and cost, handling most production workloads at lower price. Haiku is the fastest and cheapest option, suited for high-volume, lower-complexity tasks. All three share the same core Claude architecture and safety training.

    Is Claude Opus worth the extra cost over Sonnet?

    For most tasks, no. Sonnet 4.6 handles the majority of coding, writing, and analysis work at 40% lower cost. Opus 4.7 is worth the premium when you need outputs longer than 64K tokens, maximum agentic coding capability, or the most recent knowledge cutoff (January 2026 vs. Sonnet’s August 2025).

    Which Claude model is best for coding?

    Sonnet 4.6 is the standard recommendation for most coding work, including Claude Code sessions. Opus 4.7 is preferred for large codebase analysis, complex architecture decisions, or multi-agent coding workflows where maximum reasoning depth is required. Haiku 4.5 can handle simple code edits and explanations at much lower cost.

    What is the Claude context window?

    Claude Opus 4.7 and Sonnet 4.6 both have a 1 million token context window — roughly 750,000 words of combined input and conversation history. Claude Haiku 4.5 has a 200,000 token context window. Context window size determines how much information Claude can hold and reference in a single conversation.

    Does Claude Opus support extended thinking?

    No. Extended thinking is available on Claude Sonnet 4.6 and Claude Haiku 4.5, but not on Claude Opus 4.7. Opus 4.7 supports adaptive thinking instead, which dynamically adjusts reasoning depth based on task complexity.

    What is the cheapest Claude model?

    Claude Haiku 4.5 is the least expensive model at $1 per million input tokens and $5 per million output tokens. It is also the fastest Claude model, making it well-suited for high-volume, latency-sensitive applications.

    Can I use Claude through Amazon Bedrock or Google Vertex AI?

    Yes. All three current Claude models — Opus 4.7, Sonnet 4.6, and Haiku 4.5 — are available through Amazon Bedrock and Google Vertex AI in addition to the direct Anthropic API. Bedrock and Vertex AI offer regional and global endpoint options. Pricing on third-party platforms may vary from direct Anthropic API rates.

  • Should You Give Claude Access to Your Email, Slack, and SSH Keys?

    Should You Give Claude Access to Your Email, Slack, and SSH Keys?


    The Lethal Trifecta is a security framework for evaluating agentic AI risk: any AI agent that simultaneously has access to your private data, access to untrusted external content, and the ability to communicate externally carries compounded risk that is qualitatively different from any single capability alone. The name comes from the AI engineering community’s own terminology for the combination. The industry coined it, documented it, and then mostly shipped it anyway.

    The answer to the question in the title is: it depends, and the framework for deciding is more important than any blanket yes or no. But before we get to the framework, it is worth spending some time on why the question is harder than the AI industry’s current marketing posture suggests.

    In the spring of 2026, the dominant narrative at AI engineering conferences and in developer tooling launches is one of frictionless connection. Give your AI access to everything. Let it read your email, monitor your calendar, respond to your Slack, manage your files, run commands on your server. The more you connect, the more powerful it becomes. The integration is the product.

    This narrative is not wrong exactly. Broadly connected AI agents are genuinely powerful. The capabilities being described are real and the productivity gains are real. What gets systematically underweighted in the enthusiasm — sometimes by speakers who are simultaneously naming the risks and shipping the product anyway — is what happens when those capabilities are exploited rather than used as intended.

    This article is the risk assessment the integration demos skip.


    What the AI Engineering Community Actually Knows (And Ships Anyway)

    The most clarifying thing about the current moment in AI security is not that the risks are unknown. It is that they are known, named, documented, and proceeding regardless.

    At the AI Engineer Europe 2026 conference, the security conversation was unusually candid. Peter Steinberger, creator of OpenClaw — one of the fastest-growing AI agent frameworks in recent history — presented data on the security pressure his project faces: roughly 1,100 security advisories received in the framework’s first months of existence, the vast majority rated critical. Nation-state actors, including groups attributed to North Korea, have been actively probing open-source AI agent frameworks for exploitable vulnerabilities. This was stated plainly, in a keynote, at a major developer conference, and the session continued directly into how to build more powerful agents.

    The Lethal Trifecta framework — the recognition that an agent with private data access, untrusted content access, and external communication capability is a qualitatively different risk than any single capability — was presented not as a reason to slow down but as a design consideration to hold in mind while building. Which is fair, as far as it goes. But the gap between “hold this in mind” and “actually architect around it” is where most real-world deployments currently live.

    The point is not that the AI engineering community is reckless. The point is that the incentive structure of the industry — where capability ships fast and security is retrofitted — means that the candid acknowledgment of risk and the shipping of that risk can happen in the same session without contradiction. Individual operators who are not building at conference-demo scale need to do the risk assessment that the product launches are not doing for them.


    The Three Capabilities and What Each Actually Means

    The Lethal Trifecta is a useful lens because it separates three capabilities that are often bundled together in integration pitches and treats each one as a distinct risk surface.

    Access to Your Private Data

    This is the most commonly understood capability and the one most people focus on when thinking about AI privacy. When you connect Claude — or any AI agent — to your email, your calendar, your cloud storage, your project management tools, your financial accounts, or your communication platforms, you are giving the AI a read-capable view of data that exists nowhere else in the same configuration.

    The risk is not primarily that the AI platform will misuse it, though that is worth understanding. The risk is that the AI becomes a single point of access to an unusually comprehensive portrait of your life and work. A compromised AI session, a prompt injection, a rogue MCP server, or an integration that behaves differently than expected now has access to everything that integration touches.

    The practical question is not “do I trust this AI platform” but “what is the blast radius if this specific integration is exploited.” Those are different questions with different answers.

    Access to Untrusted External Content

    This capability is less commonly thought about and considerably more dangerous in combination with the first. When you give an AI agent the ability to browse the web, read external documents, process incoming email from unknown senders, or access any content that originates outside your controlled environment, you are exposing the agent to inputs that may be deliberately crafted to manipulate its behavior.

    Prompt injection — embedding instructions in content that the AI will read and act on as if those instructions came from you — is not a theoretical vulnerability. It is a documented, actively exploited attack vector. An email that appears to be a routine business inquiry but contains embedded instructions telling the AI to forward your recent correspondence to an external address. A web page that looks like a documentation page but instructs the AI to silently modify a file it has write access to. A document that, when processed, tells the AI to exfiltrate credentials from connected services.

    The AI does not always distinguish between instructions you gave it and instructions embedded in content it reads on your behalf. This is a fundamental characteristic of how language models process text, not a bug that will be patched in the next release.

    The Ability to Communicate Externally

    The third leg of the trifecta is what turns a read vulnerability into a write vulnerability. An AI that can read your private data and read untrusted content but cannot take external actions is a privacy risk. An AI that can also send email, post to Slack, make API calls, or run commands has the ability to act on whatever instructions — legitimate or injected — it processes.

    The combination of all three is what produces the qualitative shift in risk profile. Private data access means the attacker gains access to your information. Untrusted content access means the attacker can deliver instructions to the agent. External action capability means those instructions can produce real-world consequences without your direct involvement.

    The agent that reads your email, processes an injected instruction from a malicious sender, and then forwards your sensitive files to an external address is not a hypothetical attack. It is a specific, documented threat class that AI security researchers have demonstrated in controlled environments and that real deployments are not consistently protected against.


    Cross-Primitive Escalation: The Attack You Are Not Modeling

    The AI engineering community has a more specific term for one of the most dangerous attack patterns in this space: cross-primitive escalation. It is worth understanding because it describes the mechanism by which a seemingly low-risk integration becomes a high-risk one.

    Cross-primitive escalation works like this: an attacker compromises a read-only resource — a document, a web page, a log file, an incoming message — and embeds instructions in it that the AI will process as legitimate directives. Those instructions tell the AI to invoke a write-action capability that the attacker could not access directly. The read resource becomes a bridge to the write capability.

    A concrete example: you connect your AI to your cloud storage for read access, so it can summarize documents and answer questions about project files. You also connect it to your email with send capability, so it can draft and send routine correspondence. These seem like two separate, bounded integrations. Cross-primitive escalation means a compromised document in your cloud storage could instruct the AI to use its email send capability to forward sensitive files to an external address. The read access and the write access interact in a way that neither integration’s risk model accounts for individually.

    This is why the Lethal Trifecta matters at the combination level rather than the individual capability level. The question to ask is not “is this specific integration risky” but “what can the combination of my integrations do if the read-capable surface is compromised.”


    The Framework: How to Actually Decide

    With the risk structure clear, here is a practical framework for evaluating whether to grant any specific AI integration.

    Question 1: What is the blast radius?

    For any integration you are considering, define the worst-case scenario specifically. Not “something bad might happen” but: if this integration were exploited, what data could be accessed, what actions could be taken, and who would be affected?

    An integration that can read your draft documents and nothing else has a contained blast radius. An integration that can read your email, access your calendar, send messages on your behalf, and call external APIs has a blast radius that encompasses your professional relationships, your schedule, your correspondence history, and whatever systems those APIs touch. These are not comparable risks and should not be evaluated with the same threshold.

    Question 2: Is this integration delivering active value?

    The temptation with AI integrations is to connect everything because connection is low-friction and disconnection requires a deliberate action. This produces an accumulation of integrations where some are actively useful, some are marginally useful, and some were set up once for a specific purpose that no longer exists.

    Every live integration is carrying risk. An integration that is not delivering value is carrying risk with no offsetting benefit. The right practice is to connect deliberately and maintain an active integration audit — reviewing what is connected, what it is actually doing, and whether that value justifies the risk posture it creates.

    Question 3: What is the minimum scope necessary?

    Most AI integration interfaces offer choices in how broadly to grant access. Read-only versus read-write. Access to a specific folder versus access to all files. Access to a single Slack channel versus access to all channels including private ones. Access to outbound email drafts only versus full send capability.

    The principle is the same one that governs good access control in any security context: grant the minimum scope necessary for the function you need. The guardrails starter stack covers the integration audit mechanics for doing this in practice. An AI that needs to read project documents to answer questions about them does not need write access to those documents. An AI that needs to draft email responses does not need send-without-review access. The capability gap between what you grant and what you actually use is attack surface that exists for no benefit.

    Question 4: Is there a human confirmation gate proportional to the action’s reversibility?

    This is the question that most integration setups skip entirely. The AI engineering community has a name for the design pattern that gets this right: matching the depth of human confirmation to the reversibility of the action.

    Reading a document is reversible in the sense that nothing changes in the world if the read is wrong. Sending an email is not reversible. Deleting a file is not immediately reversible. Making an API call that triggers an external workflow may not be reversible at all. The confirmation requirement should scale with the irreversibility.

    An AI integration with full autonomous action capability — no human in the loop, no confirmation step, no review before execution — is an appropriate architecture for a narrow set of genuinely low-stakes tasks. It is not an appropriate architecture for anything that touches external communication, data modification, or actions with downstream consequences. The friction of confirmation is not overhead. It is the mechanism that makes the capability safe to use.
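
    As a rough illustration of proportional confirmation, the sketch below classifies hypothetical actions by reversibility and routes anything that is not cleanly reversible through a human yes/no gate. The action names and the confirm prompt are placeholders, not any platform's actual interface.

    ```python
    # Minimal sketch: scale the confirmation requirement with how reversible an action is.
    # Action names and the confirm() prompt are illustrative placeholders.

    REVERSIBILITY = {
        "read_document": "reversible",       # nothing changes in the world if the read is wrong
        "draft_email":   "reversible",       # stays internal until someone sends it
        "send_email":    "irreversible",     # leaves your control the moment it goes out
        "delete_file":   "hard_to_reverse",  # recoverable only if backups exist
    }

    def requires_confirmation(action: str) -> bool:
        # Unknown actions default to "irreversible", so the gate fails closed.
        return REVERSIBILITY.get(action, "irreversible") != "reversible"

    def execute(action: str, run, confirm):
        """Run reversible actions directly; gate everything else behind a human yes/no."""
        if requires_confirmation(action):
            if not confirm(f"About to perform '{action}'. Proceed?"):
                return "blocked by human review"
        return run()

    result = execute(
        "send_email",
        run=lambda: "sent",
        confirm=lambda msg: input(msg + " [y/N] ").strip().lower() == "y",
    )
    ```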


    SSH Keys Specifically: The Highest-Stakes Integration

    The title of this article includes SSH keys because they represent the case where the Lethal Trifecta analysis should produce the clearest answer for most operators.

    SSH access is full computer access. An AI with SSH key access to a server can read any file on that server, modify any file, install software, delete data, exfiltrate credentials stored on the system, and use that server as a jumping-off point to reach other systems on the same network. The blast radius of an SSH key integration extends to everything that server touches.

    The AI engineering community has thought carefully about this specific tradeoff and arrived at a nuanced position: full computer access — bash, SSH, unrestricted command execution — is appropriate in cloud-hosted, isolated sandbox environments where the blast radius is deliberately contained. It is not appropriate in local environments, production systems, or anywhere that the server has meaningful access to data or systems that should be protected.

    This is a reasonable position. Claude Code running in an isolated cloud container with no access to production data or external systems is a genuinely different risk profile than an AI agent with SSH access to a server that also holds client data and has credentials to your infrastructure. The key question is not “should AI ever have SSH access” but “what does this specific server touch, and am I comfortable with the full blast radius.”

    For most operators who are not running dedicated sandboxed environments: the answer is to not give AI systems SSH access to servers that hold anything you would not want to lose, expose, or have modified without your explicit instruction. That boundary is narrower than it sounds for most real-world setups.


    What Secure AI Integration Actually Looks Like

    The risk framework above can sound like an argument against AI integration entirely. It is not. The goal is not to disconnect everything but to connect deliberately, with architecture that matches the capability to the risk.

    The AI engineering community has developed several patterns that meaningfully reduce risk without eliminating capability:

    MCP servers as bounded interfaces. Rather than giving an AI direct access to a service, expose only the specific operations the AI needs through a defined interface. An AI that needs to query a database gets an MCP tool that can run approved queries — not direct database access. An AI that needs to search files gets a tool that searches and returns results — not file system access. The MCP pattern limits the blast radius by design.
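
    A minimal sketch of the bounded-interface idea, in plain Python rather than the MCP SDK; the table and the approved queries are invented for illustration. What matters is that the AI can only request operations by name from a fixed menu and never supplies raw SQL or touches the database directly.

    ```python
    # Minimal sketch of the bounded-interface pattern behind MCP-style tools.
    # Generic Python, not the MCP SDK; the table and queries are illustrative.
    import sqlite3

    APPROVED_QUERIES = {
        # The AI can only request these by name; it never supplies raw SQL.
        "open_invoices": "SELECT id, client, amount FROM invoices WHERE status = 'open'",
        "invoice_count": "SELECT COUNT(*) FROM invoices",
    }

    def run_approved_query(conn: sqlite3.Connection, name: str):
        """Expose a fixed menu of operations rather than direct database access."""
        if name not in APPROVED_QUERIES:
            raise PermissionError(f"'{name}' is not an approved operation")
        return conn.execute(APPROVED_QUERIES[name]).fetchall()

    # Demo with an in-memory database standing in for the real service.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE invoices (id INTEGER, client TEXT, amount REAL, status TEXT)")
    conn.execute("INSERT INTO invoices VALUES (1, 'Acme', 1200.0, 'open')")
    print(run_approved_query(conn, "open_invoices"))    # [(1, 'Acme', 1200.0)]
    # run_approved_query(conn, "DROP TABLE invoices")   # raises PermissionError
    ```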

    Secrets management rather than credential injection. Credentials never appear in AI contexts. They live in a secrets manager and are referenced by proxy calls that keep the raw credential out of the conversation and the memory. The AI can use a credential without ever seeing it, which means a compromised AI context cannot exfiltrate credentials it was never given.
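
    A sketch of the reference-not-value pattern, with hypothetical function names standing in for the secrets-manager lookup and the proxy that performs the real request.

    ```python
    # Minimal sketch: the AI context holds a secret *reference*, never the secret itself.
    # resolve_secret() and call_api_on_behalf_of_model() are hypothetical stand-ins for
    # a secrets-manager lookup and the proxy that performs the authenticated request.
    import os

    def resolve_secret(reference: str) -> str:
        """Resolve a named reference outside the AI context. An environment variable
        stands in here; in practice this would be a secrets-manager lookup."""
        value = os.environ.get(reference)
        if value is None:
            raise KeyError(f"secret reference '{reference}' is not provisioned")
        return value

    def call_api_on_behalf_of_model(endpoint: str, secret_reference: str) -> dict:
        """The model requests an action by reference; the proxy injects the credential."""
        token = resolve_secret(secret_reference)   # never returned to the model
        # ... perform the authenticated request with `token` here ...
        return {"endpoint": endpoint, "authenticated": bool(token)}

    # What the model ever sees in its context is the reference name, not the value:
    # call_api_on_behalf_of_model("https://api.example.com/v1/report", "REPORTING_API_TOKEN")
    ```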

    Identity-aware proxies for access control. Enterprise-grade deployments use proxy architecture that gates AI access to internal tools through an identity provider — ensuring that the AI can only access resources that the authenticated user is authorized to reach, and that access can be revoked centrally when a session ends or an employee departs.

    Sentinel agents in review loops. Before an AI takes an irreversible external action, a separate review agent checks the proposed action against defined constraints — security policies, scope limitations, instructions that would indicate prompt injection. The reviewer is a second layer of judgment before the action executes.
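
    A simplified version of that loop, with rule checks standing in for what would more often be a second model call; the recipient allowlist and the injection marker are purely illustrative.

    ```python
    # Minimal sketch of a review loop: a proposed action is checked against constraints
    # before it is allowed to execute. The allowlist and checks are illustrative placeholders.

    ALLOWED_RECIPIENT_DOMAINS = {"example.com"}

    def review(action: dict) -> list[str]:
        """Return a list of policy violations; an empty list means the action may proceed."""
        violations = []
        if action["type"] == "send_email":
            domain = action["to"].rsplit("@", 1)[-1]
            if domain not in ALLOWED_RECIPIENT_DOMAINS:
                violations.append(f"recipient domain '{domain}' is outside the allowed set")
            if "ignore previous instructions" in action.get("body", "").lower():
                violations.append("body contains a likely prompt-injection marker")
        return violations

    proposed = {"type": "send_email", "to": "attacker@evil.test", "body": "Quarterly files attached."}
    problems = review(proposed)
    if problems:
        print("Blocked:", problems)    # the action never executes
    else:
        print("Approved, executing.")
    ```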

    Most of these patterns are not available out of the box in consumer AI products. They are the architecture that thoughtful engineering teams build when they are taking the risk seriously. For operators who are not building custom architecture, the practical equivalent is the simpler version: grant minimum scope, maintain a confirmation gate for irreversible actions, and audit integrations regularly.


    The Honest Position for Solo Operators and Small Teams

    The AI security conversation at the engineering level — MCP portals, sentinel agents, identity-aware proxies, Kubernetes secrets mounting — is not where most solo operators and small teams currently live. The consumer and prosumer AI products that most people actually use do not yet offer granular integration controls at that level of sophistication.

    That gap creates a practical challenge: the risk is real at the individual level, the mitigations that are most effective require engineering investment most operators cannot make, and the consumer product interfaces do not always surface the right questions at integration time.

    The honest position for this context is a set of simpler rules that approximate the right architecture without requiring it:

    • Do not connect integrations you will not actively maintain. If you set up a connection and forget about it, it is carrying risk without delivering value. Only connect what you will review in your quarterly integration audit. Stale integrations are a form of context rot — carrying signal you no longer control.
    • Do not grant write access when read access is sufficient. For any integration where the AI’s function is informational — summarizing, searching, answering questions — read-only scope is enough. Write access is a separate decision that should require a specific use case justification.
    • Do not give AI agents autonomous action on anything with a large blast radius. Anything that sends external communications, modifies production data, makes financial transactions, or touches infrastructure should have a human confirmation step before execution. The confirmation friction is the point.
    • Treat incoming content from unknown sources as untrusted. Email from senders you do not recognize, external documents processed on your behalf, web content accessed by an agent — all of this is potential prompt injection surface. The AI processing it does not automatically distinguish instructions embedded in content from instructions you gave directly.
    • Know the blast radius of your current setup. Sit down once and map what your AI integrations can reach. If you cannot describe the worst-case scenario for your current configuration, you are carrying risk you have not evaluated.

    None of these rules require engineering expertise. They require the same deliberate attention to scope and consequences that good operators apply to other parts of their work.


    The Market Will Not Solve This for You

    One of the more uncomfortable truths about the current AI integration landscape is that the market incentives do not strongly favor solving the risk problem on behalf of individual users. AI platforms are rewarded for adoption, engagement, and integration depth. Security friction reduces all three in the short term. The platforms that will invest heavily in making the security posture of broad integrations genuinely safe are the ones with enterprise customers whose procurement processes require it — not the consumer products that most individual operators use.

    This is not an argument against using AI integrations. It is an argument for not assuming that the product’s default configuration represents a considered risk assessment on your behalf. The default is optimized for capability and adoption. The security posture you actually want requires active choices that push against those defaults.

    The AI engineering community named the Lethal Trifecta and documented the attack vectors, and the platforms ship the capability anyway because the demand is real and the market rewards it. Individual operators who understand the framework can make different choices about what to connect, at what scope, with what confirmation gates — and those choices are available right now, in the current product interfaces, without waiting for the platforms to solve it.

    The question is not whether to use AI integrations. The question is whether to use them with the same level of deliberate attention you would give to any other decision with that blast radius. The answer to that question should be yes, and it usually is not yet.


    Frequently Asked Questions

    What is the Lethal Trifecta in AI security?

    The Lethal Trifecta refers to the combination of three AI agent capabilities that creates compounded risk: access to private data, access to untrusted external content, and the ability to take external actions. Any one of these capabilities carries manageable risk in isolation. The combination creates attack vectors — particularly prompt injection — that can turn a read-only vulnerability into an irreversible external action without the user’s knowledge or intent.

    What is prompt injection and why does it matter for AI integrations?

    Prompt injection is an attack where instructions are embedded in content the AI reads on your behalf — an email, a document, a web page — and the AI processes those instructions as if they came from you. Because language models do not reliably distinguish between user instructions and instructions embedded in processed content, a malicious actor who can get the AI to read a crafted document can potentially direct the AI to take actions using whatever integrations are available. This is an actively exploited vulnerability class, not a theoretical one.

    Is it safe to give Claude access to my email?

    It depends on the scope and architecture. Read-only access to your sent and received mail, with no ability to send on your behalf, has a significantly different risk profile than full read-write access with autonomous send capability. The relevant questions are: what is the minimum scope necessary for the function you need, is there a human confirmation gate before any send action, and do you treat incoming email from unknown senders as potential prompt injection surface? Read access for summarization with no send capability and manual review before any draft is sent is a defensible configuration. Fully autonomous email handling with broad send permissions is not.

    Should AI agents ever have SSH key access?

    Full computer access via SSH is appropriate in deliberately isolated sandbox environments where the blast radius is contained — a dedicated cloud instance with no access to production data, no credentials to sensitive systems, and no path to infrastructure that matters. It is not appropriate for servers that hold client data, production systems, or any infrastructure where unauthorized access would have significant consequences. The key question is not SSH access in principle but what the specific server touches and whether that blast radius is acceptable.

    What is cross-primitive escalation in AI security?

    Cross-primitive escalation is an attack pattern where a compromised read-only resource is used to instruct an AI to invoke a write-action capability. For example, a malicious document in your cloud storage might contain instructions telling the AI to use its email-send capability to forward sensitive files externally. The read integration and the write integration each seem bounded; the combination creates a bridge that neither risk model accounts for individually. It is why the Lethal Trifecta analysis applies at the combination level, not just per-integration.

    What is the minimum viable security posture for AI integrations?

    For operators who are not building custom security architecture: connect only what you will actively maintain; grant read-only scope unless write access is specifically required; require human confirmation before any irreversible external action; treat incoming content from unknown sources as potential prompt injection surface; and maintain a quarterly integration audit that reviews what is connected and whether the access scope is still appropriate. These rules do not require engineering investment — they require deliberate attention to scope and consequences at integration time.

    How does AI integration security differ for enterprise versus solo operators?

    Enterprise deployments have access to architectural mitigations — identity-aware proxies, MCP portals, sentinel agents in CI/CD, centralized credential management — that meaningfully reduce risk without eliminating capability. Solo operators and small teams typically use consumer product interfaces that do not offer the same granular controls. The gap means individual operators need to apply simpler rules (minimum scope, confirmation gates, regular audits) that approximate the right architecture without requiring it. The risk is real at both levels; the available mitigations differ significantly.



  • Context Rot: Why Your Bloated AI Memory Is Making Your Results Worse

    Context Rot: Why Your Bloated AI Memory Is Making Your Results Worse

    Context rot is the gradual degradation of AI output quality caused by an accumulating memory layer that has grown too large, too stale, or too contradictory to serve as reliable signal. It is not a platform bug. It is the predictable consequence of loading more into a persistent memory than it can usefully hold — and of never pruning what should have been retired months ago.

    Most people using AI with persistent memory believe the same thing: more context makes the AI better. The more it knows about you, your work, your preferences, and your history, the more useful it becomes. Load it up. Keep everything. The investment compounds.

    This intuition is wrong — not in the way that makes for a hot take, but in the way that explains a real pattern that operators running AI at depth eventually notice and cannot un-notice once they see it. Past a certain threshold, context does not add signal. It adds noise. And noise, when the model treats it as instruction, produces outputs that are subtly and then increasingly wrong in ways that are difficult to diagnose because the wrongness is baked into the foundation.

    This article is about what context rot is, why it happens, how to recognize it in your current setup, and what to do about it. It is primarily a performance argument, not a privacy argument — though the two converge at the pruning step. If you have already read about the archive vs. execution layer distinction, this piece goes deeper on the memory side of that argument. If you have not, the short version is: the AI’s memory should be execution-layer material — current, relevant, actionable — not an archive of everything you have ever told it.


    What Context Rot Actually Looks Like

    Context rot does not announce itself. It does not produce error messages. It produces outputs that feel slightly off — not wrong enough to immediately flag, but wrong enough to require more editing, more correction, more follow-up. Over time, the friction accumulates, and the operator who was initially enthusiastic about AI begins to feel like the tool has gotten worse. Often, the tool has not gotten worse. The context has gotten worse, and the tool is faithfully responding to it.

    Some specific patterns to recognize:

    The model keeps referencing outdated facts as if they are current. You told the AI something six months ago — about a client relationship, a project status, a constraint you were working under, a preference you had at the time. The situation has changed. The memory has not. The AI keeps surfacing that outdated framing in responses, subtly anchoring its reasoning in a version of your reality that no longer exists. You correct it in the session; next session, the stale memory is back.

    The model’s responses feel generic or averaged in ways they didn’t used to. This is one of the stranger manifestations of context rot, and it happens because memory that spans a long time period and many different contexts starts to produce a kind of composite portrait that reflects no single real state of affairs. The AI is trying to honor all the context simultaneously and producing outputs that are technically consistent with all of it, which means outputs that are specifically right about none of it.

    The model contradicts itself across sessions in ways that seem arbitrary. Inconsistent context produces inconsistent outputs. If your memory contains two different versions of your preferences — one from an early session and one from a later revision that you added without explicitly replacing the first — the model may weight them differently across sessions, producing responses that seem random when they are actually just responding to contradictory instructions.

    You find yourself re-explaining things you know you have already told the AI. This is a signal that the memory is either not storing what you think it is, or that what it stored has been diluted by so much other context that it no longer surfaces reliably. Either way, the investment you made in building up the context is not producing the return you expected.

    The model’s tone or approach feels different from what you established. Early in a working relationship with a particular AI setup, many operators take care to establish a voice, a set of norms, a way of working together. If that context is now buried under months of accumulated memory — project names that changed, client relationships that evolved, instructions that got superseded — the foundational preferences may be getting overridden by later context that is closer to the top of the stack.

    None of these patterns is definitive proof of context rot in isolation. In combination, they are a strong signal that the memory layer has grown past the point of serving you and has started to cost you.


    Why More Context Stops Helping Past a Threshold

    To understand why context rot happens, it helps to have a working mental model of what the AI’s memory is actually doing during a session.

    When you begin a conversation, the platform loads your stored memory into the context window alongside your message. The model then reasons over everything in that window simultaneously — your current question, your stored preferences, your project knowledge, your historical context. It is not a database lookup that retrieves the one right fact; it is a reasoning process that tries to integrate everything present into a coherent response.

    This works well when the memory is clean, current, and non-contradictory. It produces responses that feel genuinely personalized and informed by your actual situation. The investment is paying off.

    What happens when the memory is large, stale, and contradictory is different. The model is now trying to integrate a much larger set of information that includes outdated facts, superseded instructions, and implicit contradictions. The reasoning process does not fail cleanly — it degrades. The model produces outputs that are trying to honor too many constraints at once and end up genuinely optimal for none of them.

    There is also a more fundamental issue: not all context is equally valuable, and the model generally cannot tell which parts of your memory are still true. It treats stored facts as current by default. A memory that says “working on the Q3 campaign for client X” was useful context in August. In February, it is noise — but the model has no way to know that from the entry alone. It will continue to treat it as relevant signal until you tell it otherwise, or until you delete it.

    The result is that the memory you have built up — which felt like an asset as you were building it — is now partly a liability. And the liability grows with every session you add context without also pruning context that has expired.


    The Pruning Argument Is a Performance Argument, Not Just a Privacy Argument

    Most discussion of AI memory pruning frames it as a safety or privacy practice. You should prune your memory because you do not want old information sitting in a vendor’s system, because stale context might contain sensitive information, because hygiene is good practice. All of that is true.

    But framing pruning primarily as a privacy move misses the larger audience. Many operators who do not think of themselves as privacy-conscious will recognize the performance argument immediately, because they have already felt the effect of context rot even if they did not have a name for it.

    The performance argument: a pruned memory produces better outputs than a bloated one, even when none of the bloat is sensitive. Removing context that is outdated, irrelevant, or contradictory is a productivity practice. It sharpens the signal. It makes the AI’s responses more accurate to your current reality rather than a historical average of your past several selves.

    The two arguments converge at the pruning ritual. Whether you are motivated by privacy, performance, or both, the action is the same: open the memory interface, read every entry, and remove or revise anything that no longer accurately represents your current situation.

    The operators who find this argument most resonant are typically the ones who have been using AI long enough to have accumulated significant context, and who have noticed — sometimes without naming it — that the quality of responses has quietly declined over time. The context rot framing gives that observation a name and a cause. The pruning ritual gives it a fix.


    Memory as a Relationship That Ages

    There is a more personal dimension to this that the pure performance framing misses.

    The memory your AI holds about you is a portrait of who you were at the time you provided each piece of information. Early entries reflect the version of you that first started using the tool — your situation, your goals, your preferences, your constraints, as they existed at that moment. Later entries layer on top. Revisions exist alongside the things they were meant to revise. The composite that emerges is not quite you at any moment; it is a kind of time-averaged artifact of you across however long you have been building it.

    This aging is why old memories can start to feel wrong even when they were accurate when they were written. The entry is not incorrect — it correctly describes who you were in that context, at that time. What it fails to capture is that you are not that person anymore, at least not in the specific ways the entry claims. The AI does not know this. It treats the stored memory as current truth, which means it is relating to a version of you that is partly historical.

    Pruning, from this angle, is not just removing noise. It is updating the relationship — telling the AI who you are now rather than asking it to keep averaging across who you have been. The operators who maintain this practice have AI setups that feel genuinely current; the ones who neglect it have setups that feel subtly stuck, like a colleague who keeps referencing a project you finished eight months ago as if it were still active.

    This is also why the monthly cadence matters. The version of you that exists in March is meaningfully different from the version that existed in September, even if you do not notice the changes from day to day. A monthly pruning pass catches the drift before it compounds into something that would take a much larger effort to unwind.


    The Memory Audit Ritual: How to Actually Do It

    The mechanics of a memory audit are simple. The discipline of doing it consistently is the whole practice.

    Step 1: Open the memory interface for every AI platform you use at depth. Do not assume you know what is there. Actually look. Different platforms surface memory differently — some have a dedicated memory panel, some bury it in settings, some show it as a list of stored facts. Find yours before you start.

    Step 2: Read every entry in full. Not skim — read. The entries that feel immediately familiar are not the ones you need to audit carefully. The ones you have forgotten about are. For each entry, ask three questions:

    • Is this still true? Does this entry accurately describe your current situation, preferences, or context?
    • Is this still relevant? Even if it is still true, does it have any bearing on the work you are doing now? Or is it historical context that serves no current function?
    • Would I be comfortable if this leaked tomorrow? This is the privacy gate, separate from the performance gate. An entry can be current and relevant and still be something you would prefer not to have sitting in a vendor’s system indefinitely.

    Step 3: Delete or revise anything that fails any of the three questions. Be more aggressive than feels necessary on the first pass. You can always add context back; you cannot un-store something that has already been held longer than it should have been. The instinct to keep things “just in case” is the instinct that produces bloat. Resist it.

    Step 4: Review what remains for contradictions. After removing the obviously stale or irrelevant entries, read through what is left and look for internal conflicts — two entries that make incompatible claims about your preferences, working style, or situation. Where you find contradictions, consolidate into a single current entry that reflects your actual current state.

    Step 5: Set the next audit date. The audit is not a one-time event. Put a recurring calendar event for the same day every month — the first Monday, the last Friday, whatever you will actually honor. The whole audit takes about ten minutes when done monthly. It takes two hours when done annually. The math strongly favors the monthly cadence.

    The first full audit is almost always the most revealing. Most operators who do it for the first time find at least several entries they want to delete immediately, and sometimes find entries that surprise them — context they had completely forgotten they had loaded, sitting there quietly influencing responses in ways they had not accounted for.


    The Cross-App Memory Problem: Why One Platform’s Audit Is Not Enough

    The audit ritual above applies to one platform at a time. The more significant and harder-to-manage problem is the cross-app version.

    As AI platforms add integrations — connecting to cloud storage, calendar, email, project management, communication tools — the practical memory available to the AI stops being siloed within any single app. It becomes a composite of everything the AI can reach across your connected stack. The sum is larger than any individual component, and no platform’s interface shows you the total picture.

    This matters for context rot in a specific way: even if you diligently audit and prune your persistent memory on one platform, the context available to the AI may include stale information from integrated services that you have not reviewed. An old Google Drive document the AI can access, a Notion page that was accurate six months ago and has not been updated, a connected email thread from a project that is now closed — all of these become inputs to the reasoning process even if they are not explicitly stored as memories.

    The hygiene move here is a two-part practice: audit the explicit memory (what the platform stores about you) and audit the integrations (what external services the platform can reach). The integration audit — reviewing which apps are connected, what scope of access they have, and whether that scope is still appropriate — is a distinct activity from the memory audit but serves the same function. It asks: is the AI’s reachable context still accurate, current, and deliberately chosen?

    As cross-app AI integration becomes more standard — which it is becoming, quickly — this composite memory audit will matter more, not less. The platforms that make it easy to see the full picture of what an AI can access will have a meaningful advantage for users who care about this. For now, the practice is manual: map your integrations, review what each one provides, and prune access that is no longer serving a current purpose.

    The guardrails article covers the integration audit mechanics in detail, including the specific steps for reviewing and revoking connected applications. This piece focuses on why it matters from a context-quality standpoint, which the guardrails article only addresses briefly.


    The Epistemic Problem: The AI Doesn’t Know What Year It Is

    There is a deeper layer to context rot that goes beyond pruning habits and integration audits. It involves a fundamental characteristic of how AI systems work that most users have not fully internalized.

    AI systems do not have a reliable sense of when information was provided. A fact stored in memory six months ago is treated with roughly the same confidence as a fact stored yesterday, unless the entry itself includes a date or the user explicitly flags it as recent. The model has no internal calendar for your context — it cannot look at your memory and identify the stale entries on its own, because staleness requires knowing current reality, and the model’s current reality is whatever is in its context window.

    This has a practical consequence that extends beyond persistent memory into generated outputs: AI-produced content about time-sensitive topics — pricing, best practices, platform features, competitive landscape, regulatory status, organizational structures — may reflect the training data’s version of those facts rather than the current version. The model does not know the difference unless it has been explicitly given current information or instructed to flag temporal uncertainty.

    For operators producing AI-assisted content at volume, this is a meaningful quality risk. A confidently stated claim about the current state of a tool, a price, a policy, or a practice may be confidently wrong because the model is drawing on information that was accurate eighteen months ago. The model does not hedge this automatically. It states it as current truth.

    The hygiene move is explicit temporal flagging: when you store context in memory that has a time dimension, include the date. When you produce content that makes present-tense claims about things that change, verify the specific claims before publication. When you notice the model stating something present-tense about a fast-moving topic, treat that as a prompt to check rather than a fact to accept.

    This practice is harder than the memory audit because it requires active vigilance during generation rather than a scheduled maintenance pass. But it is the same underlying discipline: not treating the AI’s output as current reality without confirmation, and building the habit of asking “is this still true?” before accepting and using anything time-sensitive.
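
    One low-effort way to make staleness visible is to date-stamp entries when you store them and flag old ones at audit time. The sketch below assumes a simple entry format and a 90-day review threshold, both illustrative choices rather than features of any platform.

    ```python
    # Minimal sketch: date-stamp memory entries so staleness is visible at audit time.
    # The entry format and the 90-day threshold are illustrative, not a platform feature.
    from datetime import date, timedelta

    memory = [
        {"stored": date(2025, 8, 4),  "entry": "Working on the Q3 campaign for client X"},
        {"stored": date(2026, 1, 20), "entry": "Prefers bullet-point summaries under 200 words"},
    ]

    def flag_stale(entries, max_age_days=90, today=None):
        """Return entries older than the threshold: candidates for review, not auto-deletion."""
        today = today or date.today()
        cutoff = today - timedelta(days=max_age_days)
        return [e for e in entries if e["stored"] < cutoff]

    for e in flag_stale(memory, today=date(2026, 2, 1)):
        print("Review:", e["stored"], "-", e["entry"])
    ```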


    What Healthy Memory Looks Like

    The goal is not an empty memory. An empty memory is as useless as a bloated one, for the opposite reason. The goal is a memory that is current, specific, non-contradictory, and scoped to what you are actually doing now.

    A healthy memory for a solo operator in a typical week might include:

    • Current active projects with their actual current status — not what they were in January, what they are now
    • Working preferences that are genuinely stable — communication style, output format preferences, tools in use — without the ten variations that accumulated as you refined those preferences over time
    • Constraints that are still active — deadlines, budget limits, scope boundaries — with outdated constraints removed
    • Context about recurring relationships — clients, collaborators, audiences — at a level of detail that is useful without being exhaustive

    What healthy memory does not include: finished projects, resolved constraints, superseded preferences, people who are no longer part of your active work, context that was relevant to a past sprint and is not relevant to the current one, and anything that would fail the leak-safe question.

    The difference between a memory that serves you and one that costs you is not primarily about size — it is about currency. A large memory that is fully current and internally consistent will serve you better than a small one that is half-stale. The pruning practice is what keeps currency high as the memory grows over time.


    Context Rot as a Proxy for Everything Else

    Operators who take context rot seriously and build the pruning practice tend to find that it changes how they approach the whole AI stack. The discipline of asking "is this still true, is this still relevant, would I be comfortable if this leaked" — three questions, once a month, for every stored entry — trains a more deliberate relationship with what goes into the context in the first place.

    The operators who notice context rot and act on it are also the ones who notice when they are loading context that probably should not be loaded, who scope their projects with intention from the start, and who maintain integrations deliberately rather than by accumulation. The pruning ritual is a keystone habit: it holds several other good practices in place.

    The operators who ignore context rot — who keep loading, never pruning, trusting the accumulation to compound into something useful — tend to arrive eventually at the moment where the AI feels fundamentally broken, where the outputs are so shaped by stale and contradictory context that a fresh start seems like the only option. Sometimes the fresh start is the right move. But it is a more expensive version of what the monthly audit was doing cheaply all along.

    The AI hygiene practice, at its simplest, is the practice of maintaining a current relationship with the tool rather than letting that relationship age on autopilot. Context rot is what happens when the relationship ages. The audit is what keeps it fresh. Neither is complicated. Only one of them is common.


    Frequently Asked Questions

    What is context rot in AI systems?

    Context rot is the degradation of AI output quality caused by a persistent memory layer that has grown too large, too stale, or too contradictory. As memory accumulates outdated facts and superseded instructions, the AI begins to produce responses that are shaped by historical context rather than current reality — resulting in outputs that require more correction and feel subtly off-target even when the underlying model has not changed.

    How does more AI memory make outputs worse?

    AI models reason over everything present in the context window simultaneously. When memory includes current, accurate, non-contradictory information, this produces well-calibrated responses. When memory includes stale facts, outdated preferences, and implicit contradictions, the model tries to honor all of it at once — producing outputs that are averaged across incompatible inputs and specifically correct about none of them. Past a threshold, more context adds noise faster than it adds signal.

    How often should I audit my AI memory?

    Monthly is the recommended cadence for most operators. The first audit typically takes 30–60 minutes; subsequent monthly passes take around 10 minutes. Waiting longer than a month allows drift to compound — by the time you audit annually, the volume of stale entries can make the exercise feel overwhelming. The monthly cadence is what keeps it manageable.

    Does context rot apply to all AI platforms or just Claude?

    Context rot applies to any AI system with persistent memory or long-lived context — including ChatGPT’s memory feature, Gemini with Workspace integration, enterprise AI tools with shared knowledge bases, and any platform where prior context influences current responses. The specific mechanics differ by platform, but the underlying dynamic — stale context degrading output quality — is consistent across systems.

    What is the difference between a memory audit and an integration audit?

    A memory audit reviews what the AI explicitly stores about you — the facts, preferences, and context entries in the platform’s memory interface. An integration audit reviews which external services the AI can access and what information those services expose. Both affect the AI’s effective context; a thorough hygiene practice addresses both on a regular schedule.

    Should I delete all my AI memory and start fresh?

    A full reset is sometimes the right move — particularly after a long period of neglect or when the memory has accumulated to a point where selective pruning would take longer than starting over. But as a regular practice, surgical pruning (removing what is stale while keeping what is current) preserves the genuine value you have built while eliminating the noise. The goal is not an empty memory but a current one.

    How does context rot relate to AI output accuracy on factual claims?

    Context rot in persistent memory is one layer of the accuracy problem. The deeper layer is that AI models carry training-data assumptions that may be out of date regardless of what is stored in memory — prices, policies, platform features, and best practices change faster than training cycles. For time-sensitive claims, the right practice is to verify against current sources rather than treating AI-generated present-tense statements as confirmed fact.



  • Guardrails You Can Install Tonight: The AI Hygiene Starter Stack

    Guardrails You Can Install Tonight: The AI Hygiene Starter Stack

    AI hygiene refers to the set of deliberate practices that govern what information enters your AI system, how long it stays there, who can access it, and how it exits cleanly when you leave. It is not a product, a setting, or a one-time setup. It is an ongoing practice — more like brushing your teeth than installing antivirus software.

    Most AI hygiene advice is either too abstract to act on tonight (“think about what you store”) or too technical to reach the average operator (“implement OAuth 2.0 scoped token delegation”). This article is neither. It is a specific, ordered list of things you can do today — many of them in under 20 minutes — that will meaningfully reduce the risk profile of your current AI setup without requiring you to become a security engineer.

    These guardrails were developed from direct operational experience running AI across a multi-site content operation. They are not theoretical. Each one exists because we either skipped it and paid the price, or installed it and watched it prevent something that would have cost real time and money to unwind.

    Start with Guardrail 1. Finish as many as feel right tonight. Come back to the rest when you have energy. The practice compounds — even one guardrail installed is meaningfully better than none.


    Before You Install Anything: Map the Six Memory Surfaces

    Here is the single most important diagnostic you can run before touching any setting: sit down and write out every place your AI system currently stores information about you.

    Most people think chat history is the memory. It is not — or at least, it is only one layer. Between what you have typed, what is in persistent memory features, what is in system prompts and custom instructions, what is in project knowledge bases, what is in connected applications, and what the model was trained on, the picture of “what the AI knows about me” is spread across at least six surfaces. Each surface has different retention rules. Each has different access paths. And no single UI in any major AI platform shows all of them in one place.

    Here are the six surfaces to map for your specific stack:

    1. Chat history. The conversation log. On most platforms this is visible in the sidebar and can be cleared manually. Retention policies vary widely — some platforms keep it indefinitely until you delete it, some have automatic deletion windows, some export it in data portability requests and some do not. Know your platform’s policy.

    2. Persistent memory / memory features. Explicitly stored facts the AI carries across conversations. Claude has a memory system. ChatGPT has memory. These are distinct from chat history — you can delete all your chat history and still have persistent memories that survive. Most users who have these features enabled have never read them in full. That is the first thing to fix.

    3. Custom instructions and system prompts. Any standing instructions you have given the AI about how to behave, what role to play, or what to know about you. These are often set once and forgotten. They may contain information you would not want surface-level visible to someone who borrows your device.

    4. Project knowledge bases. Files, documents, and context you have uploaded to a project or workspace within the AI platform. These are often the most sensitive layer — operators upload strategy documents, client files, internal briefs — and they are also the layer most users have never audited since initial setup.

    5. Connected applications and integrations. OAuth connections to Google Drive, Notion, GitHub, Slack, email, calendar, or other services. Each connection is a two-way door. The AI can read from that service; depending on permissions, it may be able to write to it. Many users have accumulated integrations they set up once and no longer actively use.

    6. Browser and device state. Cached sessions, autofilled credentials, open browser tabs with active AI sessions, and any extensions that interact with AI tools. This is the analog layer most people forget entirely.

    Write the six surfaces down. For each one, note what is currently there and whether you know the retention policy. This exercise alone — before you change a single thing — is often the most clarifying act an operator can perform on their current AI setup. Most people discover at least one surface they had either forgotten about or never thought to inspect.

    With the map in hand, the following guardrails make more sense and install faster. You know what you are protecting and where.


    Guardrail 1: Lock Your Screen. Log Out of Sensitive Sessions.

    Time to install: 2 minutes. Requires: discipline, not tooling.

    The threat model most people imagine when they think about AI data security is the sophisticated one: a nation-state actor, a platform breach, a data-center incident. These are real risks and deserve real attention. But they are also statistically rare and largely outside any individual user’s control.

    The threat model people do not imagine is the one that is statistically constant: the partner who borrows the phone, the coworker who glances at the open laptop on the way to the coffee machine, the house guest who uses the family computer to “just check something quickly.”

    The most personal data in your AI setup is almost always leaked by the most personal connections — not by adversaries, but by proximity. A locked screen is not a sophisticated security measure. It is a boundary that makes accidental exposure require active effort rather than passive convenience.

    The practical installation:

    • Set your screen lock to 2 minutes of inactivity or less on any device where you have an active AI session.
    • When you step away from a high-stakes session — anything involving credentials, client data, medical information, or personal strategy — close the browser tab or log out, not just lock the screen.
    • Treat your AI session like you would treat a physical folder of sensitive documents. You would not leave that folder open on the coffee table when guests came over. Apply the same habit digitally.

    This is the embarrassingly analog first guardrail. It is also the one that prevents the most common class of accidental exposure in 2026. Install it before installing anything else.


    Guardrail 2: Read Your Memory. All of It. Tonight.

    Time to install: 15–30 minutes for first pass. 10 minutes monthly after that. Requires: your AI platform’s memory interface.

    If you have persistent memory features enabled on any AI platform — and if you have used the platform for more than a few weeks, there is a reasonable chance you do — open the memory interface and read every entry top to bottom. Not skim. Read.

    For each entry, ask three questions:

    • Is this still true?
    • Is this still relevant?
    • Would I be comfortable if this leaked tomorrow?

    Anything that fails any of the three questions gets deleted or rewritten. The threshold is intentionally conservative. You are not trying to delete everything useful; you are trying to remove the entries that are outdated, overly specific, or higher-risk than they are useful.

    What operators typically find in their first full memory read:

    • Facts that were true six months ago and are no longer accurate — old project names, old client relationships, old constraints that have been resolved.
    • Context that was added in a moment of convenience (“remember that my colleague’s name is X and they tend to push back on Y”) that they would now prefer to not have stored in a vendor’s system.
    • Information that is genuinely sensitive — financial figures, relationship details, health-adjacent context — that got added without much deliberate thought and has been sitting there since.
    • References to people in their life — partners, colleagues, clients — that those people have no idea are in the system.

    The audit itself is the intervention. The act of reading your stored self forces a level of attention that no automated tool can replicate. Most users who do this for the first time find at least one entry they want to delete immediately, and many find several. That is not a failure. That is the practice working.

    After the initial audit, the maintenance version takes about ten minutes once a month. Set a recurring calendar event. Call it “memory audit.” Do not skip it when you are busy — the months when you are too busy to audit are usually the months with the most new context to review.


    Guardrail 3: Run Scoped Projects, Not One Sprawling Context

    Time to install: 30–60 minutes to restructure. Requires: your AI platform’s project or workspace feature.

    If your entire AI setup lives in one undifferentiated context — one assistant, one memory layer, one big bucket of everything you have ever discussed — you have an architecture problem that no individual guardrail can fully fix.

    The solution is scope: separate projects (or workspaces, or contexts, depending on your platform) for genuinely distinct domains of your work and life. The principle is the same one that governs good software architecture: least privilege access, applied to context instead of permissions.

    A practical scope structure for a solo operator or small agency might look like this:

    • Client work project. Contains client briefs, deliverables, and project context. No personal information. No information about other clients. Each major client ideally gets their own scoped context — context from client A should not inform responses about client B.
    • Personal writing project. Contains voice notes, draft ideas, personal brand thinking. No client data. No credentials.
    • Operations project. Contains workflows, templates, and process documentation. Credentials do not live here — they live in a secrets manager (see Guardrail 4).
    • Research project. Contains general reading, industry notes, reference material. The least sensitive scope, and therefore the most appropriate place for loose context that does not fit elsewhere.

    The cost of this architecture is a small amount of cognitive overhead when switching between projects. You need to think about which project you are in before starting a session, and occasionally move context from one project to another when your use case shifts.

    The benefit is that the blast radius of any single compromise, breach, or accidental exposure is contained to the scope of that project. A problem in your client work project does not expose your personal writing. A problem in your operations project does not expose your client data. You are not protected from all risks, but you are protected from the cascading-everything-fails scenario that a single undifferentiated context creates.

    If restructuring everything tonight feels like too much, start smaller: create one scoped project for your most sensitive current work and move that context there. You do not have to do the whole restructure in one session. The direction matters more than the completion.


    Guardrail 4: Rotate Credentials That Have Touched an AI Context

    Time to install: 1–3 hours depending on how many credentials are affected. Requires: credential audit, rotation, and a calendar reminder.

    Any API key, application password, OAuth token, or connection string that has ever appeared in an AI conversation, project file, or memory entry is a credential at elevated risk. Not because the platform necessarily stores it in a searchable way, but because the scope of “where could this have ended up” is now broader than a single system with a single access log.

    The practical steps:

    Step 1: Inventory. Go through your project files, chat history, and memory entries. Look for anything that looks like a key, password, or token. API keys typically start with a platform prefix (sk-, pk-, or similar). Application passwords often appear as space-separated character groups. OAuth tokens are usually longer strings. Write down every credential you find.

    Step 2: Rotate. For every credential you found, generate a new one from the issuing platform and invalidate the old one. Yes, this requires updating wherever the credential is used. Yes, this takes time. Do it anyway. A credential that has appeared in an AI context is not a credential whose exposure history you can audit.

    Step 3: Move credentials out of AI contexts. Going forward, credentials do not live in AI memory, project files, or conversation history. They live in a secrets manager — GCP Secret Manager, 1Password, Doppler, or similar. The AI gets a reference or a proxy call; the credential itself never touches the AI context. This is a one-time architectural change that eliminates the problem permanently rather than requiring ongoing vigilance.

    Step 4: Set a rotation schedule. Any credential that has a legitimate reason to exist in a system the AI can touch should be on a rotation schedule — 90 days is a reasonable default. Put a recurring calendar event on the same day you do your memory audit. The two practices pair well.

    This is the guardrail that most operators resist most strongly, because it requires the most concrete work. It is also the guardrail with the highest upside: a rotated credential that gets compromised costs you a rotation. A static credential that gets compromised and you discover six months later costs you everything that credential touched in the intervening time.
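
    If the Step 1 inventory above feels tedious, a rough scan over exported chat and project files can surface candidates for manual review. The patterns below are illustrative; they will miss some secrets and flag some false positives, so treat hits as prompts to look closer rather than a complete inventory.

    ```python
    # Rough sketch for the Step 1 inventory: scan exported chat or project files for
    # strings shaped like credentials. Patterns and the export directory are illustrative.
    import re
    from pathlib import Path

    PATTERNS = {
        "api_key_prefix": re.compile(r"\b(?:sk|pk)-[A-Za-z0-9_-]{16,}"),
        "long_token":     re.compile(r"\b[A-Za-z0-9_-]{40,}\b"),
    }

    def scan_for_credentials(export_dir: str):
        hits = []
        for path in Path(export_dir).rglob("*.txt"):
            text = path.read_text(errors="ignore")
            for label, pattern in PATTERNS.items():
                for match in pattern.findall(text):
                    hits.append((str(path), label, match[:12] + "..."))  # truncate before logging
        return hits

    for hit in scan_for_credentials("./ai-export"):
        print(hit)
    ```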


    Guardrail 5: Install Session Discipline for High-Stakes Work

    Time to install: 5 minutes to build the habit. Requires: no tooling, only intention.

    For any session involving information you would genuinely not want to surface at the wrong time — client strategy, credentials, legal matters, financial planning, relationship context — install a simple open-and-close discipline:

    • Open explicitly. At the start of a sensitive session, load the context you need. Do not assume previous sessions left you in the right state. Verify what is in scope before you start.
    • Work in scope. Keep the session focused on the stated purpose. If you find yourself drifting into unrelated territory, either stay on task or close the current session and open a new one for the new topic.
    • Close explicitly. When the session is done, close it — not just by navigating away, but by actively ending it. If your platform allows session clearing or archiving, use it. Do not leave a sensitive session sitting open indefinitely in a background tab.

    The reason most people resist this is friction: reloading context at the start of a new session feels like wasted time. But the sessions that never close are the ones that eventually create exposure. The habit of closing is not overhead. It is the practice that keeps the context you built from becoming permanent ambient risk.

    The physical analog is ancient and no one argues with it: you do not leave sensitive documents spread across your desk when you leave the office. The digital version of the same habit just requires conscious installation because the digital default is “leave it open.”


    Guardrail 6: Audit Your Integrations and Revoke What You Don’t Use

    Time to install: 20 minutes. Requires: access to your AI platform’s integration or connected apps settings.

    Every major AI platform now supports integrations with external services — calendar, email, cloud storage, project management, communication tools. Each integration you authorize is a door between your AI system and that external service. Most people set up these integrations in a moment of enthusiasm, use them once or twice, and then forget they exist.

    Forgotten integrations are risk you are carrying without benefit.

    The audit is straightforward:

    1. Open your AI platform’s connected apps, integrations, or OAuth settings.
    2. Read every authorized connection. For each one, answer: “Am I actively using this? Is it providing value I cannot get another way?”
    3. For anything where the answer is no, revoke the integration immediately.
    4. For anything where the answer is yes, note what scope of access you have granted. Many integrations default to broad permissions when narrow ones would serve. If you authorized “read and write access to all files” when you only need “read access to one folder,” revoke and re-authorize with the minimum scope necessary.

    Repeat this audit quarterly, or any time you add a new integration. The list has a way of growing faster than you notice.

    As AI platforms increasingly support cross-app memory — where context from one platform informs responses in another — the integration audit becomes more important, not less. The sum of what your AI stack knows is now the composite of all connected surfaces, not any individual platform. Auditing the connections is how you keep that composite picture within bounds you have deliberately chosen.


    Putting It Together: The Starter Stack in Priority Order

    If you are starting from zero tonight, here is the order that produces the most protection per hour of time invested:

    First 10 minutes: Lock your screen. Log out of any AI sessions you have left open that you are not actively using. This is Guardrail 1 and costs nothing except attention.

    Next 30 minutes: Read your memory. Run the full audit on any AI platform where you have persistent memory features enabled. Delete anything that fails the three-question test. This is Guardrail 2 and is the single highest-leverage action on this list for most users.

    This week: Audit your integrations (Guardrail 6) and set up session discipline for high-stakes work (Guardrail 5). Neither requires heavy lifting — both primarily require attention and the five minutes it takes to actually look at what is connected.

    This month: Structure scoped projects (Guardrail 3) and rotate credentials that have touched AI contexts (Guardrail 4). These are the higher-effort guardrails but also the ones with the most durable benefit. Once they are installed, the maintenance burden is light.

    Ongoing: The monthly memory audit and quarterly integration audit become standing practices. Once the initial work is done, the maintenance version of this whole stack takes about 30 minutes a month. That is the steady-state cost of not periodically detonating.


    What This Stack Does Not Cover

    Intellectual honesty requires naming the edges. This starter stack addresses the most common risk profile for individual operators and small teams. It does not address:

    Enterprise-grade threat models. If you are running AI in a regulated industry, handling protected health information or financial data at scale, or operating in a context where you have disclosure obligations to regulators, this stack is a floor, not a ceiling. You need more: data residency agreements, vendor security audits, formal incident response plans, and probably legal counsel who has thought about AI liability specifically.

    The platform’s obligations. These guardrails are about what you control. They do not address what the AI platform does with your data on its end — training policies, retention practices, breach disclosure timelines, or third-party data sharing agreements. Read the privacy policy for any platform you use at depth. If you cannot find a clear answer to “does this company use my conversations to train future models,” treat that as a meaningful signal.

    Credential security at the infrastructure level. Guardrail 4 covers credentials that have appeared in AI contexts. It is not a comprehensive credential security framework. If you are operating infrastructure where credentials are a significant risk surface, the right tool is a full secrets management solution and possibly a security review of your deployment architecture — not a checklist.

    The people in your life who are in your AI context without knowing it. This is a different kind of guardrail entirely, and it belongs in a conversation rather than a settings menu. The Clean Tool pillar piece covers this in depth. The short version: if people you care about appear in your AI memory, they almost certainly do not know they are there, and that is worth a conversation.


    The Practice Compounds or Decays

    AI hygiene is not a project with a completion date. It is a standing practice — more like financial review or equipment maintenance than a one-time installation. The operators who build this practice early, when the stakes are still relatively small and the mistakes are still cheap to recover from, will be meaningfully safer in 2027 and 2028 as memory depth increases, cross-app integration becomes standard, and the AI stack handles more consequential work.

    The operators who wait for the first public catastrophe to start thinking about it will not be starting from scratch — they will be starting from negative, trying to contain an incident while simultaneously installing the practices they should have had in place.

    This is not fear-based reasoning. It is the same logic that applies to backing up your data, maintaining your vehicle, or reviewing your contracts annually. The cost of the practice is small and constant. The cost of the failure is large and concentrated. The math is not complicated.

    Start with Guardrail 1 tonight. Add one more this week. The practice compounds from there — or it doesn’t start, and you keep carrying risk you could have put down.

    The choice is available to you right now, which is the whole point of this article.




    Frequently Asked Questions

    How long does it take to install the basic AI hygiene guardrails?

    The first two guardrails — locking your screen and reading your persistent memory in full — take under 45 minutes and can be done tonight. The full starter stack, including scoped projects, credential rotation, session discipline, and integration audit, requires a few hours spread over a week or two. Maintenance after initial setup runs approximately 30 minutes per month.

    Do these guardrails apply to Claude specifically, or to all AI platforms?

    The guardrails apply to any AI platform with persistent memory, project storage, or third-party integrations — which currently includes Claude, ChatGPT, Gemini, and most enterprise AI tools. The specific location of memory settings and integration controls differs by platform, but the underlying practice is the same. This article was written from direct experience with Claude but the logic transfers.

    What is the single most important guardrail for a beginner to start with?

    Reading your persistent memory in full (Guardrail 2) is the single most clarifying action most users can take. Most people have never done it. The exercise alone — reading every stored entry and asking whether it is still true, still relevant, and leak-safe — surfaces more about your current risk posture than any abstract audit. Start there.

    Should credentials ever appear in an AI conversation?

    As a general rule, no. Credentials should live in a secrets manager and be passed to AI contexts via references or proxy calls that keep the raw credential out of the conversation. In practice, most operators have pasted at least one credential into a conversation at some point. When that happens, the right response is to treat that credential as potentially exposed and rotate it promptly — not to wait and see.

    How do scoped AI projects differ from just having separate browser tabs?

    Separate browser tabs share the same account, session state, and in most platforms the same persistent memory layer. Scoped projects, by contrast, are explicitly separated contexts where project-specific knowledge, uploaded files, and custom instructions are isolated from one another. A problem in one project scope does not contaminate another the way a shared session state might.

    What does an integration audit actually involve?

    An integration audit means opening your AI platform’s connected apps or OAuth settings, reading every authorized connection, and revoking anything you are not actively using or that has broader permissions than it needs. Most users find at least one integration they had forgotten about. The audit takes about 20 minutes and should be repeated quarterly, or any time you add a new connection.

    Is AI hygiene only relevant for operators running AI at depth, or does it apply to casual users too?

    The stakes scale with usage depth, but the basic practices apply at every level. A casual user who primarily uses AI for writing help has lower exposure than an operator running AI across client work, credentials, and integrated infrastructure. But even casual users have persistent memory, chat history, and connected apps that merit a periodic look. The starter stack is designed to be relevant across the full range.

    What is the difference between AI hygiene and AI safety?

    AI safety typically refers to research and policy work focused on the long-term behavior of powerful AI systems at a societal level — alignment, misuse at scale, existential risk. AI hygiene is a narrower, more immediate practice focused on how individual operators manage their personal and professional exposure within current AI tools. The two are related but operate at different scales. This article is concerned with hygiene: what you can do, in your own setup, tonight.




  • What Notion Agents Can’t Do Yet (And When to Reach for Claude Instead)

    What Notion Agents Can’t Do Yet (And When to Reach for Claude Instead)

    I run both Notion Custom Agents and Claude every working day. I have opinions about when each one earns its place and when each one doesn’t. This article is those opinions, named clearly, with no vendor fingers on the scale.

    Most comparative writing about AI tools is written by people with an incentive to recommend one over the other — affiliate programs, platform partnerships, the writer’s own consulting practice specializing in one side. This piece doesn’t have that problem. I use both, I pay for both, and if one of them got replaced tomorrow, the pattern I run would survive with a different tool slotted into the same role. The tools are interchangeable. The judgment about which one to reach for is not.

    Here’s the honest map.


    The short version

    Use Notion Custom Agents when: the work is a recurring rhythm, the context lives in Notion, the output is a Notion page or database change, and you’re willing to spend credits on it running in the background.

    Use Claude when: the work needs real judgment, the context is complex or contested, the output is something that needs a human’s voice and review, or the workflow crosses enough systems that the agent’s world is too small.

    Those two sentences will save most operators ninety percent of the architecture mistakes I see people make. The rest of this article is specificity about why, because general rules only take you so far before you need to know what’s actually going on under the hood.


    Where Notion Custom Agents genuinely shine

    I’m going to start with the positive because anyone who only reads the critical part of a comparative article will walk away with a warped picture. Custom Agents are genuinely impressive when they fit the job.

    Recurring synthesis tasks across workspace data. The daily brief pattern I’ve written about works better in a Custom Agent than in Claude. The agent runs on schedule, reads the right pages, writes the synthesis back into the workspace, and is done. Claude can do this too, but Custom Agents do it without you remembering to prompt them. That’s the whole point of the “autonomous teammate” framing, and for rhythmic synthesis work, it genuinely delivers.

    Inbox triage. An agent watching a database with a clear decision tree — categorize incoming requests, assign a priority, route to the right owner — is a sweet-spot Custom Agent. It does the boring sort every day, flags the ones it’s unsure about, and keeps the pile from growing. Real teams are reportedly triaging at over 95% accuracy on inbound tickets with this pattern.

    Q&A over workspace knowledge. Agents that answer company policy questions in Slack or provide onboarding guidance for new hires are quietly some of the most valuable agents in production. They replace hours of repetitive answer-the-same-question work, and because the answers come from actual workspace content, the accuracy is high when the workspace is well-maintained.

    Database enrichment. An agent that watches for new rows in a database, looks up additional context, and fills in fields automatically is a beautiful fit. The agent is doing deterministic-adjacent work with just enough judgment to handle edge cases. This is exactly what Custom Agents were designed for.

    Autonomous reporting. Weekly sprint recaps, monthly OKR reports, Friday retrospectives. Reports that would otherwise require someone to sit down and write them, now drafted automatically from the workspace state.

    For these categories, Custom Agents are the right tool, and Claude is the wrong tool even though Claude would technically work. The wrong-tool-even-though-it-works framing matters because operators often default to Claude for everything, which is expensive in different ways.


    Where Notion Custom Agents break down

    Now the honest part. Custom Agents have real limits, and pretending otherwise is how operators get burned.

    1. Anything that requires serious reasoning across contested information

    Custom Agents are capable of synthesis, but the quality of their synthesis degrades when the inputs disagree with each other, when the right answer isn’t on the page, or when the task requires actually thinking through a problem rather than summarizing existing context.

    The signal that you’ve hit this limit: the agent produces an output that sounds plausible, reads well, and is subtly wrong. If you need to double-check every agent output in a category of work because you can’t trust the judgment, that category of work shouldn’t be going through an agent. Use Claude in a conversation where you can actually interrogate the reasoning.

    Specific examples where this shows up: strategic decisions, conflicting client feedback, legal or compliance-adjacent questions, anything that involves weighing tradeoffs. The agent will produce an answer. The answer will often be wrong in a specific way.

    2. Long-horizon work that needs to hold nuance across steps

    Custom Agents are designed for bounded tasks with clear inputs and clear outputs. When you try to use them for work that requires holding nuance across many steps — drafting a long document, executing a multi-stage strategic plan, navigating a complex workflow — the wheels come off.

    Part of this is architectural: agents have limited ability to carry state across runs in the way an extended Claude conversation can. Part of it is practical: the “one agent, one job” principle Notion itself recommends is a hard constraint, not a style guideline. When you try to make an agent do multiple things, you get an agent that does each of them worse than a single-purpose agent would.

    If the job you’re thinking about is genuinely one coherent thing that happens to have many steps, and the steps inform each other, it’s probably a Claude conversation, not a Custom Agent.

    3. Work that needs a specific human voice

    This one is more important than most operators realize. Agents write in a synthesized style. It’s a perfectly fine style. It’s also recognizable as a perfectly fine style, which is the problem.

    If the output is going to have your name on it — client communications, thought leadership, outbound that should sound like you — the agent’s default voice will flatten whatever was distinctive about your writing. You can push back on this with instructions, and good instructions help a lot. But the underlying truth is that Custom Agents optimize for “sounds like a competent business writer,” and competent business writing is a commodity. If you sell distinctiveness, the agent is a liability.

    Claude in a conversation, with your active voice-shaping, produces writing that can actually sound like you. Custom Agents optimize for a different thing.

    4. Anything requiring real-time web context

    Custom Agents can reach external tools via MCP, but they don’t have a general ability to browse the live web and integrate what they find into their reasoning. If the work requires recent news, real-time market data, or anything that isn’t in a known database the agent can query, the agent will either fail, hallucinate, or return stale information from whatever workspace snapshot it had.

    Claude — with web search enabled, with the ability to fetch arbitrary URLs, with research capabilities — handles this class of work dramatically better. The right architectural response: use Claude for anything with a live-web dependency, let Custom Agents handle the parts that don’t.

    5. Deep technical work

    Custom Agents can technically do technical work. They should mostly not be asked to. Writing code, debugging failures, analyzing logs, reasoning through system architecture — these live in Claude Code’s territory, not Custom Agents’ territory. The Custom Agent framework was built for operational workflows, and while it will attempt technical tasks, it attempts them at the quality of a generalist, not a specialist.

    The sign you’ve crossed this line: the agent is producing code or technical reasoning that a competent human reviewer would push back on. Move the work to Claude Code, which was built for exactly this.

    6. High-stakes writes with permanent consequences

    Agents execute. They don’t second-guess themselves. An agent configured to send emails will send emails. An agent configured to update client records will update client records. An agent configured to delete rows will delete rows.

    When the cost of the agent doing the wrong thing is high — sending a message you can’t unsend, overwriting data you can’t recover, triggering a payment you can’t reverse — the discipline is: don’t let the agent do it without human approval. Use “Always Ask” behavior. Use a draft-and-review pattern. Use anything that puts a human in the loop before the irreversible action.

    Operators who ship fast and iterate freely tend to underweight this category. The day you discover it’s been quietly overwriting the wrong database field for two weeks is the day you wish you’d built the review gate.

    7. Credit efficiency for genuinely reasoning-heavy work

    This one is practical rather than architectural. Starting May 4, 2026, Custom Agents run on Notion Credits at roughly $10 per 1,000 credits. Internal Notion data suggests a typical agent gets approximately 45–90 runs out of 1,000 credits, and runs that involve more steps, more tool calls, or more context consume proportionally more credits. That means simple recurring tasks are cheap. Complex reasoning-heavy tasks add up.

    If you’re building an agent that does heavy reasoning work many times per day, the credit cost can exceed what the same work would cost through Claude’s API, where you can call higher-capability Claude models directly without the Notion overhead. For high-frequency reasoning work, run the math before you commit to the agent architecture.
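    To make that concrete, here is a rough back-of-envelope using the figures above. The per-run costs follow from Notion's stated numbers; the 30-runs-per-day volume is a hypothetical assumed for illustration:

    $10 per 1,000 credits at ~45 runs per 1,000 credits (complex tasks) ≈ $0.22 per run
    $10 per 1,000 credits at ~90 runs per 1,000 credits (simple tasks) ≈ $0.11 per run
    One reasoning-heavy agent at 30 runs per day × 30 days × ~$0.22 ≈ $200 per month

    If that monthly figure comes out higher than running the same volume of work through a flat Claude subscription or direct API calls, the agent architecture is the wrong home for that particular job.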


    Where Claude genuinely wins

    The other side of the honest comparison. Claude earns its place in categories where Custom Agents either can’t operate or operate poorly.

    Strategic thinking conversations. When you’re working through a decision, evaluating a tradeoff, or thinking through a strategy, Claude in an extended conversation is the right tool. The back-and-forth is the whole point. You can interrogate reasoning, push back on conclusions, reframe the problem mid-conversation. An agent that produces a one-shot answer, no matter how good, is the wrong shape for this kind of work.

    Drafting with voice. Writing that needs to sound like a specific person is Claude’s territory. You can load up Claude with context about your voice — past writing, tonal preferences, things to avoid — and get output that actually reads as yours. Notion Custom Agents will always produce generic-flavored writing. That’s fine for internal reports. It’s a problem for anything external.

    Code and technical work. Claude Code specifically is built for technical depth. It reads codebases, executes in a terminal, calls tools, iterates on failures. Custom Agents will flail at the same work.

    Research synthesis across live sources. Claude with web search and fetch capabilities handles “go read this, this, and this, and tell me what the current state actually is” in a way Custom Agents structurally can’t. Anything that requires reaching outside a known data universe is Claude.

    Work that crosses many systems. When a workflow needs to touch code, Notion, a database, an external API, and a human review, Claude Code with the right MCP servers connected coordinates across them better than a Custom Agent inside Notion does. The agent’s world is Notion-plus-connected-integrations. Claude’s world is wider.

    Anything requiring judgment about whether to proceed. Agents execute. Claude in a conversation can pause, check with you, and ask “should I actually do this?” That judgment layer is frequently the most important part of the workflow.


    The pattern that actually works (both, in the right places)

    The operators who get this right aren’t choosing one tool over the other. They’re running both, in specific roles, with clear handoffs.

    The pattern I run:

    Rhythmic operational work lives in Custom Agents. Morning briefs, triage, weekly reviews, database enrichment, Q&A over workspace knowledge. Things that happen repeatedly, have clear inputs, and produce workspace-shaped outputs.

    Judgment-heavy work lives in Claude conversations. Strategic decisions, drafting with voice, research, anything requiring back-and-forth. I do this work in Claude chat sessions with the Notion MCP wired in, so Claude has real context when I need it to.

    Technical work lives in Claude Code. Building scripts, managing infrastructure, debugging, writing code. Custom Agents don’t touch this.

    Handoffs are explicit. When I make a decision in Claude that needs to become operational, it lands as a task or brief in a Notion database, and from there a Custom Agent can pick it up. When a Custom Agent surfaces something that needs judgment, it creates an escalation entry that shows up on my Control Center, where I engage Claude to think through it.

    The two systems pass work back and forth through the workspace. Neither tries to do the other’s job. The seams are the Notion databases where state lives.

    This is not the vendor-shaped pattern. The vendor-shaped pattern says “Custom Agents can handle everything.” The operator-shaped pattern says “Custom Agents handle what they’re good at, and when the work exceeds their reach, another tool takes over with a clean handoff.”


    The decision tree, when you’re not sure

    For a specific piece of work, run these questions in order. Stop at the first “yes.”

    Does this task need a specific human voice, or could it be written by any competent person? If it needs your voice, reach for Claude. If it doesn’t, move on.

    Does this task require reasoning across contested or ambiguous information? If yes, Claude. If no, move on.

    Does this task need real-time web context, live external data, or information not already in a known database? If yes, Claude. If no, move on.

    Does this task involve code, system architecture, or technical depth? If yes, Claude Code. If no, move on.

    Does this task have high-stakes irreversible consequences? If yes, wrap it in a human-approval gate — either run it through Claude where the human is in the loop, or use Custom Agents with “Always Ask” behavior.

    Does this task happen repeatedly on a schedule or in response to workspace events? If yes, Custom Agent. This is the sweet spot.

    Is the output a Notion page, database row, or something that stays in the workspace? If yes, Custom Agent is usually the right call.

    Is the task bounded enough that it could be described in a couple of clear sentences? If yes, Custom Agent. If it’s sprawling, it’s probably too big for an agent.

    If you’re through the tree and still not sure, default to Claude. Claude is more expensive in money and cheaper in hidden cost than a Custom Agent running the wrong job.


    The failure modes I’ve seen

    Specific patterns that go wrong, in my observation:

    The “agent for everything” operator. Someone who just got access to Custom Agents and is building agents for tasks that don’t need agents. The agents mostly work. The ones that mostly work waste credits on tasks a template or a simple automation would handle. The ones that partially work produce quiet low-grade mistakes that accumulate.

    The “Claude for everything” operator. The inverse. Someone who got comfortable with Claude and hasn’t made the leap to letting agents handle the rhythmic work. They’re paying the context-loss tax every morning, doing the triage manually, writing every brief from scratch. Claude is too expensive a tool — in attention, if not dollars — to run routine work through.

    The operator who built one giant agent. Custom Agents are meant to be narrow. Someone violates the “one agent, one job” principle by building an agent that does inbox triage and database updates and weekly reports and client communications. The agent becomes hard to debug, expensive to run, and unreliable across its many hats. The fix is almost always breaking it into three or four single-purpose agents.

    The operator who didn’t build review gates. An agent sending emails without human approval. An agent deleting rows based on inferred criteria. An agent updating client-facing pages from an unchecked data source. The cost of the first real mistake exceeds the cost of the review gate that would have prevented it, every time.

    The operator who never checked credit consumption. Custom Agents consume credits based on model, steps, and context size. An operator who built ten agents and never looked at the dashboard ends up surprised when the monthly bill is much higher than expected. The fix is easy — Notion ships a credits dashboard — but it has to actually get checked.


    An honest note on timing

    This is the part of the article that will age. These comparisons are true in April 2026. Custom Agents are new enough that the feature set will expand significantly over the next year. Claude is evolving rapidly. The specific gaps I’ve named may close; new gaps may open in different directions.

    What won’t change is the pattern: some work wants a specialized tool, some work wants a general-purpose one. Some work is rhythmic, some is judgment-driven. Some work lives inside a workspace, some crosses systems. The vocabulary for when to use which tool will evolve; the underlying truth that different shapes of work deserve different tools will not.

    If you’re reading this in 2027 and Custom Agents have shipped fifteen new capabilities, the specific “can’t do” list will be shorter. The decision tree at the top of this article will still work. That’s the part worth holding onto.


    What I’m not saying

    A few clarifications because I want to be clear about what this article is and isn’t.

    I’m not saying Custom Agents are bad. They’re genuinely good at what they’re good at. They’re saving me hours per week on work I used to do manually.

    I’m not saying Claude is strictly better. Claude is more capable at a broader set of tasks, but it also costs more, requires active operator engagement, and can’t sit in the background running overnight rhythms the way Custom Agents can.

    I’m not saying there’s one right answer for every operator. Different operators with different businesses and different workflows will land on different splits. The decision tree helps, but it’s a starting point, not a conclusion.

    I’m not saying this is permanent. Tool landscapes change fast. Six months from now there may be categories where Custom Agents beat Claude that don’t exist today, and vice versa. What matters is developing the habit of asking “which tool is this work actually shaped for?” instead of defaulting to whichever one you learned first.


    The one thing I’d want you to walk away with

    If you read nothing else in this article, this is the sentence I’d want in your head:

    Rhythmic operational work wants an agent; judgment-heavy work wants a conversation.

    That distinction — rhythm versus judgment — cuts through almost every architecture question you’ll have when deciding what to route where. It’s not the only dimension that matters, but it’s the one that settles the most decisions correctly.

    Work that happens on a schedule or in response to an event, with bounded inputs and clear outputs? That’s rhythm. Build a Custom Agent.

    Work that requires thinking through tradeoffs, integrating disparate information, or producing output with specific voice and judgment? That’s a conversation. Engage Claude.

    Get that right for most of your workflows and the rest of the architecture tends to sort itself out.


    FAQ

    Can’t Custom Agents do everything Claude can do, just inside Notion? No. Custom Agents are optimized for bounded, rhythmic, workspace-shaped tasks. They can technically attempt work that requires deep reasoning, specific voice, or live external context, but the results degrade in predictable ways. Claude — in a conversation or in Claude Code — handles those categories better.

    Should I just use Claude for everything then? No. Rhythmic operational work — morning briefs, triage, weekly reports, database enrichment — is genuinely better in Custom Agents than in Claude, because the “autonomous teammate running while you sleep” property matters. The right answer is running both, in their respective sweet spots.

    What’s the cost comparison? Starting May 4, 2026, Custom Agents cost roughly $10 per 1,000 Notion Credits. Internal Notion data suggests agents run approximately 45–90 times per 1,000 credits depending on task complexity. Claude’s subscription pricing is flat. For high-frequency simple tasks, Custom Agents are usually cheaper. For heavy reasoning work done many times per day, running Claude directly can be more cost-efficient.

    What about Notion Agent (the personal one) versus Claude? Notion Agent is Notion’s on-demand personal AI — you prompt it, it responds. It’s fine for in-workspace tasks where you need AI help with content you’re already looking at. For deeper reasoning, complex drafting, or cross-tool work, Claude is more capable. Notion Agent is a good ambient utility; Claude is a general-purpose intelligence layer.

    Which should I learn first if I’m new to both? Claude. Learn to think with an AI as a thinking partner before you try to build autonomous agents. Once you understand what AI can and can’t do in a conversation, the design decisions for Custom Agents become much clearer. Jumping to Custom Agents without the Claude foundation is how operators end up with agents that don’t work as expected.

    Can Custom Agents use Claude models? Yes. Custom Agents let you pick the AI model they run on. Claude Sonnet and Claude Opus are both available, along with GPT-5 and various other models. This means the underlying intelligence of a Custom Agent can be Claude — you’re choosing between Claude-as-conversation (claude.ai, Claude Desktop, Claude Code) and Claude-as-embedded-agent (Custom Agent running Claude). Different interfaces, same underlying model in that case.

    What if I want Claude to work autonomously on a schedule like Custom Agents do? Possible, but requires more work. Claude Code can be scripted; you can run it on a cron job; you can set up headless workflows. But the “out of the box autonomous teammate” experience is Notion’s current strength, not Anthropic’s. If you want autonomous-background-work without building your own infrastructure, Custom Agents are easier.
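    A minimal sketch of what that looks like, assuming Claude Code's non-interactive print mode and a crontab entry; the prompt, paths, and schedule are hypothetical placeholders, not a recommended workflow:

    # Run a headless Claude Code prompt every weekday at 6:00 and append the output to a log.
    0 6 * * 1-5  cd ~/ops && claude -p "Draft the daily brief from notes/today.md" >> logs/brief.log 2>&1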

    How do I decide for my specific situation? Run the decision tree in the article. If you’re still unsure, default to Claude — it’s the more general-purpose tool, and the cost of using the wrong tool for judgment-heavy work is higher than the cost of using the wrong tool for rhythmic work. You can always migrate a recurring workflow to a Custom Agent once you understand the shape.


    Closing note

    The honest comparison isn’t one tool versus the other. It’s understanding that different shapes of work want different shapes of tool, and that most operators lose more time to the mismatch than to any individual tool’s limitations.

    Custom Agents are good at being Custom Agents. Claude is good at being Claude. Neither is good at being the other. Use both, in the places each belongs, with clean handoffs between them, and the stack hums.

    Skip the vendor narratives. Read your own workflows. Route each piece to the tool it’s actually shaped for. That’s the whole game.



  • How to Wire Claude Into Your Notion Workspace (Without Giving It the Keys to Everything)

    How to Wire Claude Into Your Notion Workspace (Without Giving It the Keys to Everything)

    The step most tutorials skip is the one that actually matters.

    Every guide to connecting Claude to Notion walks you through the same mechanical sequence — OAuth flow, authentication, running claude mcp add, and done. It works. The connection lights up, Claude can read your pages, write to your databases, and suddenly your AI has the run of your workspace. The tutorials stop there and congratulate you.

    Here’s the part they don’t mention: according to Notion’s own documentation, MCP tools act with your full Notion permissions — they can access everything you can access. Not the pages you meant to share. Everything. Every client folder. Every private note. Every credential you ever pasted into a page. Every weird thing you wrote about a coworker in 2022 and forgot was there.

    In most setups the blast radius is enormous, the visibility is low, and the decision to lock it down happens after something goes wrong instead of before.

    This is the guide that takes the extra hour. Wiring Claude into your Notion workspace is straightforward. Wiring Claude into your Notion workspace without giving it the keys to everything takes a few additional decisions, a handful of specific configuration choices, and a mental model for what should and shouldn’t flow across the connection. That’s the hour worth spending.

    I run this setup across a real production workspace with dozens of active properties, real client work, and data I genuinely don’t want an AI to have unbounded access to. The pattern below is what works. It is also honest about what doesn’t.


    Why Notion + Claude is worth doing carefully

    Before the mechanics, it’s worth being clear about what you get when you wire this up correctly.

    Claude with access to Notion is not Claude with a better search function. It is a Claude that can read the state of your business — briefs, decisions, project status, open loops — and reason across them to help you run the operation. It can draft follow-ups to conversations it finds in your notes. It can pull together summaries across projects. It can take a decision you’re weighing, find every related piece of context in the workspace, and give you a grounded opinion instead of a generic one.

    That’s the version most operator-grade users want. And it’s only valuable if the trust boundary is drawn correctly. A Claude that has access to your relevant context is a superpower. A Claude that has access to everything you’ve ever written is a liability waiting to catch up with you.

    The whole article is about drawing that boundary on purpose.


    The two connection options (and which one you actually want)

    There are two ways to connect Claude to Notion in April 2026, and the right one depends on what you’re doing.

    Option 1: Remote MCP (Notion’s hosted server). You connect Claude — whether that’s Claude Desktop, Claude Code, or Claude.ai — to Notion’s hosted MCP endpoint at https://mcp.notion.com/mcp. You authenticate through OAuth, which opens a browser window, you approve the connection, and it’s live. Claude can now read from and write to your workspace based on your access and permissions.

    This is the officially-supported path. Notion’s own documentation explicitly calls remote MCP the preferred option, and the older open-source local server package is being deprecated in favor of it. For most operators, this is the right answer.

    Option 2: Local MCP (the legacy / open-source package). You install @notionhq/notion-mcp-server locally via npm, create an internal Notion integration to get an API token, and configure Claude to talk to the local server with your token. You then have to manually share each Notion page with the integration one by one — the integration only sees pages you explicitly grant access to.

    This path is more work and is being phased out. But there’s one genuine reason to still use it: the local path uses a token and the remote path uses OAuth, which means the local path works for headless automation where a human isn’t around to click OAuth buttons. The remote server requires user-based OAuth and does not support bearer token authentication, so a person has to complete the authorization flow, which rules it out for fully automated workflows.

    For 95% of setups, remote MCP is the right answer. For the 5% running true headless agents, the local package is still the pragmatic choice even though it’s on its way out.
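    For that headless 5%, the local setup ends with a server entry in your client's config file (claude_desktop_config.json for Claude Desktop, or a .mcp.json for Claude Code). A minimal sketch with placeholders: the token comes from the internal integration you created, and the exact environment variable name the package reads has varied across versions, so confirm it against the package's README before relying on it.

    {
      "mcpServers": {
        "notion": {
          "command": "npx",
          "args": ["-y", "@notionhq/notion-mcp-server"],
          "env": { "NOTION_TOKEN": "your-internal-integration-token" }
        }
      }
    }

    If the package rejects NOTION_TOKEN, you are on an older version that reads a different variable; the README lists the current one.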

    The rest of this guide assumes remote MCP. I’ll flag the places the advice differs for local.


    The quiet part Notion tells you out loud

    Before we get to the setup, one more thing you need to internalize because it shapes every decision below.

    From Notion’s own help center: MCP tools act with your full Notion permissions — they can access everything you can access.

    Read that sentence twice.

    If you are a workspace member with access to 140 pages across 12 databases, your Claude connection can access 140 pages across 12 databases. Not the 15 you’re working on today. All of them. OAuth doesn’t scope you down to “this project.” It says yes or no to “can Claude see your workspace.”

    This is fine when your workspace is already organized the way you’d want an AI to see it. It is catastrophic when it isn’t, because most workspaces have accumulated years of drift, private notes, credential-adjacent content, sensitive client data, and old experiments that nobody bothered to clean up.

    So before you connect anything, you do the workspace audit. Not because Notion says so. Because your future self will thank you.


    The pre-connection audit (the step tutorials skip)

    Fifteen minutes with the workspace, before you click the OAuth button. Here’s the checklist I run through:

    Find anything that looks like a credential. Search your workspace for the words: password, API key, token, secret, bearer, private key, credentials. Read the results. Move anything sensitive to a credential manager (1Password, Bitwarden, a password-protected vault — not Notion). Delete the Notion copies.

    Find anything you wouldn’t want an AI to read. Search for: divorce, legal, lawsuit, personal, venting, complaint, therapist. Yes, really. People put things in Notion they’ve forgotten are in Notion. An AI that has access to everything you can access will find those things and occasionally surface them in responses. This is embarrassing at best and career-ending at worst.

    Look at your database of clients or contacts. Is there anything in there that shouldn’t travel through an AI provider’s servers? Notion processes MCP requests through Notion’s infrastructure, not yours. Sensitive legal matters, medical information, financial details about third parties — these may deserve a workspace or sub-page that stays outside of what Claude is allowed to see.

    Identify what Claude actually needs. Make a short list: your active projects, your working databases, your briefs page, your daily/weekly notes. This is what you actually want Claude to have context on. The rest is noise.

    Decide your posture. Two options here. You can run Claude against your main workspace and accept the blast radius, or you can create a separate workspace (or a teamspace) that contains only the pages and databases you want Claude to see, and connect Claude to that one. The second option is more work upfront. It is also the only version that actually draws the boundary.

    I run the second option. My Claude-facing workspace is genuinely a subset of what I work with, and the rest of my Notion is on a different membership. It took an hour to set up. It was worth it.


    Connecting remote MCP to Claude Desktop

    Now the mechanics. Starting with Claude Desktop because it’s the simplest.

    Claude Desktop gets Notion MCP through Settings → Connectors (not the older claude_desktop_config.json file, which is being phased out for remote MCP). This is available on Pro, Max, Team, and Enterprise plans.

    Open Claude Desktop. Settings → Connectors. Find Notion (or add a custom MCP server with the URL https://mcp.notion.com/mcp). Click Connect. A browser window opens, Notion asks you to authenticate, you approve. Done.

    The connection now lives in your Claude Desktop. You can start a new conversation and ask Claude to read a specific page, summarize a database, or draft something based on workspace content, and it will.

    One hygiene note: Claude Desktop connections are per-account. If you have multiple Claude accounts (say, a personal Pro and a work Max), each one needs its own connection to Notion. The good news is you can point each one at a different Notion workspace — personal Claude at personal Notion, work Claude at work Notion. This is the operator pattern I recommend for anyone running more than one business context through Claude.


    Connecting remote MCP to Claude Code

    Claude Code is the path most operators actually run at depth, because it’s the version of Claude that lives in your terminal and can compose MCP calls into real workflows.

    The command is one line:

    claude mcp add --transport http notion https://mcp.notion.com/mcp


    Then authenticate by running /mcp inside Claude Code and following the OAuth flow. Browser opens, Notion asks you to authorize, you approve, and the connection is live.

    A few options worth knowing about at setup time:

    Scope. The --scope flag controls who gets access to the MCP server on your machine. Three options: local (default, just you in the current project), project (shared with your team via a .mcp.json file), and user (available to you across all projects). For Notion, user scope is usually right — you’ll want Claude to reach Notion from any project you’re working in, not just the current one.

    The richer integration. Notion also ships a plugin for Claude Code that bundles the MCP server along with pre-built Skills and slash commands for common Notion workflows. If you’re doing this seriously, install the plugin. It adds commands like generating briefs from templates and opening pages by name, and saves you from writing your own.

    Checking what’s connected. Inside Claude Code, /mcp lists every MCP server you’ve configured. /context tells you how many tokens each one is consuming in your current session. For Notion specifically, this is useful because MCP servers have non-zero context cost even when you’re not actively using them — every tool exposed by the server sits in Claude’s context, eating tokens. Running /context occasionally is how you notice when an MCP connection is heavier than you expected.
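    Put together, a typical setup sequence looks like this; "notion" is just the name given to the server, and the slash commands run inside a Claude Code session rather than in your shell:

    claude mcp add --transport http --scope user notion https://mcp.notion.com/mcp
    /mcp        (confirm the server is listed, then follow the OAuth prompt)
    /context    (check how many tokens the connected servers are consuming)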


    The permissions pattern that actually protects you

    Now we’re past the mechanics and into the hygiene layer — the part that most guides don’t cover.

    Once Claude is connected to your Notion workspace, there are three specific configuration moves worth making. None of them are hard. All of them pay rent.

    1. Scope the workspace, don’t scope the connection

    The OAuth connection doesn’t let you say “Claude can see these pages but not those.” It lets you say “Claude can see this workspace.” So the place to draw the boundary is at the workspace level, not at the connection level.

    If you have sensitive content in your main workspace, move it. Create a separate workspace for Claude-facing content and keep the sensitive stuff out. Or use Notion’s teamspace feature (Business and Enterprise) to isolate access at the teamspace level.

    This feels like over-engineering until the first time Claude surfaces something in a response that you had forgotten was in your workspace. After that, it doesn’t feel like over-engineering.

    2. For Enterprise: turn on MCP Governance

    If you’re on the Enterprise plan, there’s an admin-level control worth enabling even if you trust your team. From Notion’s docs: with MCP Governance, Enterprise admins can approve specific AI tools and MCP clients that can connect to Notion MCP — for example Cursor, Claude, or ChatGPT. The approved-list pattern is opt-in: Settings → Connections → Permissions tab, set “Restrict AI tools members can connect to” to “Only from approved list.”

    Even if you only approve Claude today, the control gives you the ability to see every AI tool anyone on your team has connected, and to disconnect everything at once with the “Disconnect All Users” button if you ever need to. That’s the kind of control you want to have configured before you need it, not after.

    3. For local MCP: use a read-only integration token

    If you’re using the local path (the open-source @notionhq/notion-mcp-server), you have more granular control than the remote path gives you. Specifically: when you create the integration in Notion’s developer settings, you can set it to “Read content” only — no write access, no comment access, nothing but reads.

    A read-only integration is the right default for anything exploratory. If you want Claude to be able to write too, enable write access later when you’ve decided you trust the specific workflow. Don’t give write access by default just because the integration setup screen presents it as an option.

    This is the one place the local path is actually stronger than remote — you can shape the integration’s capabilities before you grant it access, and the integration only sees the specific pages you share with it. For high-sensitivity setups, this granularity is worth the tradeoff of running the legacy package.


    Prompt injection: the risk nobody wants to talk about

    One more thing before we leave the hygiene section. It’s the thing the industry is least comfortable being direct about.

    When Claude has access to your Notion workspace, Claude also reads whatever is in your Notion workspace. Including pages that came from outside. Including meeting notes that were imported from a transcript service. Including documents shared with you by clients. Including anything you pasted from the web.

    Every one of those is a potential vector for prompt injection — hidden instructions buried in content that, when Claude reads the content, hijack what Claude does next.

    This is not theoretical. Anthropic itself flags prompt injection risk in the MCP documentation: be especially careful when using MCP servers that could fetch untrusted content, as these can expose you to prompt injection risk. Notion has shipped detection for hidden instructions in uploaded files and flags suspicious links for user approval, but the attack surface is larger than any detection system can fully cover.

    The practical operator response is three-part:

    Don’t give Claude access to content you didn’t write, without reading it first. If a client sends you a document and you paste it into Notion and Claude has access to that database, you have effectively given Claude the ability to be instructed by your client’s document. This might be fine. It might be a problem. Read the document before it goes into a Claude-accessible location.

    Be suspicious of workflows that chain untrusted content into actions. A workflow where Claude reads a web-scraped summary and then uses that summary to decide which database row to update is a prompt injection target. If the scraped content can shape Claude’s action, the scraped content can be weaponized.

    Use write protections for anything consequential. Anything where the cost of Claude doing the wrong thing is real — sending an email, deleting a record, updating a client-facing page — belongs behind a human-approval gate. Claude Code supports “Always Ask” behavior per-tool; use it for writes.
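    In Claude Code, one way to make that gate structural rather than habitual is a permissions rule in your settings file (.claude/settings.json at the project level, or ~/.claude/settings.json for your user). A minimal sketch, assuming the permissions schema Claude Code currently documents, with mcp__notion standing in for whatever you named the server; if your version lacks an ask list, the /permissions command inside a session offers the same control interactively:

    {
      "permissions": {
        "ask": ["mcp__notion"]
      }
    }

    That rule pauses every Notion tool call for approval. Once you know which specific write tools your workflows touch, you can narrow the rule to those and let reads through without a prompt.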

    This sounds paranoid. It’s not paranoid. It’s the appropriate level of caution for a class of attack that is genuinely live and that the industry has not yet figured out how to fully defend against.


    What this actually enables (the payoff section)

    Once you’ve done the setup and the hygiene work, here’s what you now have.

    You can sit down at Claude and ask it questions that require real workspace context. What’s the status of the three projects I touched last week? Pull together everything we’ve decided about pricing across the client work this quarter. Draft a response to this incoming email using context from our ongoing conversation with this client. Claude reads the relevant pages, synthesizes across them, and responds with actual grounding — not a generic answer shaped by whatever prompt you happen to type.

    You can run Claude Code against your workspace for development-adjacent operations. Generate a technical spec from our product page notes. Create release notes from the changelog and feature pages. Find every page where we’ve documented this API endpoint and reconcile the inconsistencies.

    You can set up workflows that flow across tools. Claude reads from Notion, acts on another system via a different MCP server, writes results back to Notion. This is the agentic pattern the industry keeps talking about — and with the right permissions hygiene, it actually becomes usable instead of scary.

    None of this is theoretical. I use this pattern every working day. The value is real. The hygiene discipline is what keeps the value from turning into a liability.


    When this setup goes wrong (troubleshooting honestly)

    Five failure modes I’ve seen, in order of frequency.

    Claude doesn’t see the page you asked about. For remote MCP, this almost always means the page is in a workspace you’re not a member of, or in a teamspace you don’t have access to. For local MCP, it means the integration hasn’t been granted access to that specific page — you have to go to the page, click the three-dot menu, and add the integration manually.

    OAuth flow doesn’t complete. Usually a browser issue — popup blocker, wrong Notion account signed in, session expired. Clear auth, try again. If Claude Desktop, disconnect the connector entirely and re-add.

    The connection succeeds but Claude doesn’t seem to be using it. Run /mcp in Claude Code to verify the server is listed and connected. If it’s there and Claude still isn’t invoking it, the issue is usually in how you’re asking — Claude won’t reach for MCP tools just because they exist; you need to phrase the request in a way that makes it obvious the tool is relevant. “Find the page about X in Notion” works better than “tell me about X.”

    MCP server crashes or returns errors. For remote, this is rare and usually resolves itself — Notion’s hosted server has the standard cloud-reliability profile. For local, check your Node version (the server requires Node 18 or later), your config file syntax (JSON is unforgiving about trailing commas), and your token format.

    Context token budget goes through the roof. Every MCP server in your connected list contributes tools to Claude’s context on every request. If you have five MCP servers configured, that’s five sets of tool descriptions being loaded into every conversation. Run /context in Claude Code to see the cost. If it’s painful, disconnect the servers you’re not actively using.


    The mental model that keeps you sane

    Here’s the mental model I use for the whole setup. It’s short.

    Claude plus Notion is like giving a new, very capable employee access to your business. You wouldn’t hand a new hire every password, every file, every client record, every private note on day one. You’d give them access to the specific things they need to do the job, watch how they use that access, and expand trust over time based on track record.

    The MCP connection works exactly that way. You decide what Claude gets to see. You decide what Claude gets to write. You watch how it uses that access. You expand the boundary as trust earns itself.

    The operators who get hurt by this kind of setup are the ones who skip the first step and give Claude everything on day one. The operators who get the real value out of it are the ones who treat the connection the way they’d treat any other employee — with deliberate scope, real oversight, and the willingness to revoke access if something goes wrong.

    That’s the discipline. That’s the whole thing.


    FAQ

    Do I need to install anything to connect Claude to Notion? For remote MCP (the recommended path), no installation is required — you connect via OAuth through Claude Desktop’s Settings → Connectors or Claude Code’s claude mcp add command. For local MCP (legacy), you install @notionhq/notion-mcp-server via npm and create an internal Notion integration.

    What’s the URL for Notion’s remote MCP server? https://mcp.notion.com/mcp. Use HTTP transport (not the deprecated SSE transport).

    Can Claude see my entire Notion workspace by default? Yes. MCP tools act with your full Notion permissions — they can access everything you can access. The boundary is set by your workspace membership and teamspace access, not by the MCP connection itself. If you need finer-grained control, isolate Claude-facing content into a separate workspace or teamspace.

    Can I use Notion MCP with automated, headless agents? Remote Notion MCP requires OAuth authentication and doesn’t support bearer tokens, which makes it unsuitable for fully automated or headless workflows. For those cases, the legacy @notionhq/notion-mcp-server with an API token still works, but it’s being phased out.

    What plans support Notion MCP? Connecting AI tools via MCP is available on all Notion plans. Enterprise plans get admin-level MCP Governance controls (approved AI tool list, disconnect-all). On the Claude side, Claude Desktop MCP connectors are available on Pro, Max, Team, and Enterprise plans.

    Can my company’s admins control which AI tools connect to our Notion workspace? Yes, on the Enterprise plan. Admins can restrict AI tool connections to an approved list through Settings → Connections → Permissions tab. Only admin-approved tools can connect.

    Is Notion MCP secure for confidential business data? The MCP protocol itself respects Notion’s permissions — it can’t bypass what you have access to. However, content flowing through MCP is processed by the AI tool you’ve connected (Claude, ChatGPT, etc.), which has its own data handling policies. For highly sensitive content, the right move is to isolate it in a workspace that Claude doesn’t have access to, rather than relying on the protocol alone to contain it.

    What about prompt injection attacks through Notion content? Real risk. Anthropic explicitly flags it in their MCP documentation. Notion has shipped detection for hidden instructions and flags suspicious links, but no detection system catches everything. The operator response: don’t give Claude access to content you didn’t write without reviewing it first, be suspicious of workflows where untrusted content shapes Claude’s actions, and put human-approval gates on anything consequential.

    What’s the difference between Notion’s built-in AI and connecting Claude via MCP? Notion’s built-in AI (Notion Agent and Custom Agents) runs inside Notion and uses Notion’s integration with frontier models. Connecting Claude via MCP brings Claude — your chosen model, in your chosen interface, with its full capability — to your workspace as an external client. The built-in option is simpler; the MCP option is more powerful and composable across other tools.


    Closing note

    Most tutorials treat the connection as the goal. The connection is the easy part. The hygiene is the part that matters.

    If you wire Claude into your Notion workspace thoughtlessly, you’ve given a capable AI access to every corner of your operational history, and you’ll be surprised how much of what’s in there you’d forgotten. If you wire it in deliberately — with a scoped workspace, with the permissions you’ve thought about, with the posture of giving a new employee measured access — you’ve built something that pays rent every day without ever becoming the liability it could have been.

    One hour of setup. One hour of cleanup. And then one of the most useful AI configurations currently possible in April 2026.

    The intersection of Notion and Claude is where the operator work actually happens now. Worth setting up right.

