Category: AI Strategy

  • Claude Opus vs Sonnet vs Haiku: Model Comparison Guide (2026)

    Claude Opus vs Sonnet vs Haiku: Model Comparison Guide (2026)

    Model Accuracy Note — Updated May 2026

    Current flagship: Claude Opus 4.7 (claude-opus-4-7). Current models: Opus 4.7 · Sonnet 4.6 · Haiku 4.5. Claude Opus 4.6 referenced in this article has been superseded. See current model tracker →

    Anthropic’s Claude model lineup in 2026 breaks down into three distinct tiers: Opus 4.7 for maximum capability, Sonnet 4.6 for the best balance of performance and cost, and Haiku 4.5 for speed and high-volume work. Picking the wrong model costs money or performance — sometimes both. This guide covers every meaningful difference so you can make the right call for your use case.

    Quick answer: Sonnet 4.6 handles 80–90% of tasks at 40% less cost than Opus. Use Opus 4.7 when you need maximum reasoning depth, the largest output window, or agentic coding at frontier quality. Use Haiku 4.5 when speed and cost are the priority and the task is straightforward.

    The Current Claude Model Lineup (April 2026)

    As of April 2026, Anthropic’s three recommended models are Claude Opus 4.7, Claude Sonnet 4.6, and Claude Haiku 4.5. All three support text and image input, multilingual output, and vision processing. They differ significantly in pricing, context window, output limits, and capability.

    Feature Opus 4.7 Sonnet 4.6 Haiku 4.5
    Input price $5 / MTok $3 / MTok $1 / MTok
    Output price $25 / MTok $15 / MTok $5 / MTok
    Context window 1M tokens 1M tokens 200K tokens
    Max output 128K tokens 64K tokens 64K tokens
    Extended thinking No Yes Yes
    Adaptive thinking Yes Yes No
    Latency Moderate Fast Fastest
    Knowledge cutoff Jan 2026 Aug 2025 Feb 2025

    Pricing is per million tokens (MTok) via the Claude API. Source: Anthropic Models Overview, April 2026.

    Claude Opus 4.7: When to Use It

    Opus 4.7 is Anthropic’s most capable generally available model as of April 2026. Anthropic describes it as a step-change improvement in agentic coding over Opus 4.6, with a new tokenizer that contributes to improved performance on a range of tasks. Note that this new tokenizer may use up to 35% more tokens for the same text compared to previous models — a cost consideration worth factoring in for high-volume workflows.

    Key differentiators for Opus 4.7 over the other two models:

    • 128K max output tokens — double Sonnet and Haiku’s 64K cap. This matters for generating long-form code, detailed reports, or complete document drafts in a single call.
    • 1M token context window — same as Sonnet 4.6, meaning Opus can process entire codebases or book-length documents in a single session.
    • Adaptive thinking — Opus 4.7 and Sonnet 4.6 both support adaptive thinking, which lets the model adjust reasoning depth based on task complexity.
    • Most recent knowledge cutoff — January 2026, versus August 2025 for Sonnet and February 2025 for Haiku.

    Opus does not support extended thinking — that capability lives on Sonnet 4.6 and Haiku 4.5. Extended thinking lets the model reason step-by-step before generating output, which is particularly useful for complex math, science, and multi-step logic problems.

    Use Opus 4.7 for: complex architecture decisions, large codebase analysis, multi-agent orchestration tasks, outputs that require more than 64K tokens, tasks demanding the latest possible knowledge, and any work where you need the absolute frontier of Anthropic’s reasoning capability.

    Skip Opus 4.7 for: routine content generation, customer support pipelines, high-volume classification or extraction, real-time applications requiring low latency, or any task where Sonnet scores within your acceptable quality threshold.

    Claude Sonnet 4.6: The Workhorse

    Sonnet 4.6 is the model Anthropic recommends as the best combination of speed and intelligence. Released in February 2026, it delivers a 1M token context window at $3 input / $15 output per million tokens — the same context window as Opus at 40% lower cost.

    Sonnet 4.6 also uniquely offers extended thinking, which Opus 4.7 does not. When extended thinking is enabled, Sonnet can perform additional internal reasoning before generating its response — useful for reasoning-heavy tasks like complex debugging, multi-step research, and technical problem-solving where chain-of-thought depth matters.

    For developers and teams using Claude Code, Sonnet 4.6 is the standard daily driver. It handles tool calling, agentic workflows, and multi-file code reasoning reliably, at a price point that makes heavy daily use economically viable.

    Use Sonnet 4.6 for: most production workloads, Claude Code sessions, long-document analysis, content generation, coding tasks, research synthesis, customer-facing applications, and any workflow requiring the 1M context window where Opus’s premium isn’t justified.

    Skip Sonnet 4.6 for: high-volume pipelines where Haiku’s lower cost is acceptable, simple classification or extraction tasks, or real-time applications where Haiku’s faster latency is required.

    Claude Haiku 4.5: Speed and Volume

    Haiku 4.5 is the fastest model in the Claude family and the most cost-efficient at $1 input / $5 output per million tokens. It has a 200K token context window — smaller than Opus and Sonnet’s 1M, but still substantial for most single-task work. It supports extended thinking but not adaptive thinking.

    The 200K context limit is the most important practical constraint. Most single-document, single-task workflows fit within 200K. Multi-file codebases, long books, or extended conversation histories that push past that threshold need Sonnet or Opus.

    Haiku 4.5 has the oldest knowledge cutoff of the three: February 2025. For tasks requiring awareness of events or developments from mid-2025 onward, Haiku won’t have that context baked in.

    Use Haiku 4.5 for: content moderation, classification pipelines, entity extraction, customer support triage, real-time chat interfaces, simple Q&A, high-volume API workflows where cost and speed dominate, and any task where quality requirements are modest.

    Skip Haiku 4.5 for: complex reasoning, large codebase analysis, tasks requiring recent knowledge (post-February 2025), multi-step agent workflows, or any output requiring more than 200K tokens of input context.

    Pricing: What the Numbers Actually Mean in Practice

    All three models price output tokens at 5x the input rate — a ratio that holds across the entire Claude lineup. This means verbose, long-form outputs cost significantly more than short, targeted responses. Minimizing generated output length is the highest-leverage cost optimization available before you touch model routing or caching.

    To put the pricing in concrete terms: generating one million output tokens (roughly 750,000 words of generated text) costs $25 on Opus, $15 on Sonnet, and $5 on Haiku. For input-heavy workloads like document analysis where you’re feeding in large amounts of text but getting shorter responses, the cost gap narrows.

    Three additional pricing levers apply across all models:

    • Prompt caching: Cuts cache-read input costs by up to 90% for repeated system prompts or documents. If your application reuses a large system prompt across many requests, caching is the single highest-impact cost reduction available.
    • Batch API: Provides a 50% discount for non-time-sensitive workloads processed asynchronously. Combine with prompt caching for up to 95% savings on qualifying workflows.
    • Model routing: Running a mix of Haiku for simple tasks, Sonnet for production workloads, and Opus for complex reasoning — rather than using one model for everything — can reduce total API costs by 60–70% without meaningful quality loss on the tasks that don’t require a flagship model.

    Context Windows: 1M Tokens vs. 200K

    Opus 4.7 and Sonnet 4.6 both offer a 1M token context window at standard pricing — no premium surcharge for extended context. For reference, 1 million tokens is roughly 750,000 words, enough to hold a large codebase, a full academic textbook, or months of business communications in a single conversation.

    Haiku 4.5 has a 200K token context window. That’s still roughly 150,000 words — sufficient for most single-document tasks, but it creates a hard ceiling for anything requiring multi-file code review, book-length document analysis, or lengthy conversation histories.

    If your workflow consistently requires more than 200K tokens of input, Sonnet 4.6 is the cost-efficient choice. Opus 4.7 is the right call only when the input load requires the additional reasoning capability Opus provides, not just the context window size — because Sonnet gets you the same 1M window at 40% lower cost.

    Extended Thinking vs. Adaptive Thinking

    These are two distinct features that appear together in the comparison table but serve different purposes.

    Extended thinking (available on Sonnet 4.6 and Haiku 4.5, not Opus 4.7) lets Claude perform additional internal reasoning before generating its response. When enabled, the model produces a “thinking” content block that exposes its reasoning process — step-by-step problem decomposition before the final answer. Extended thinking tokens are billed as standard output tokens at the model’s output rate. A minimum thinking budget of 1,024 tokens is required when enabling this feature.

    Adaptive thinking (available on Opus 4.7 and Sonnet 4.6, not Haiku 4.5) adjusts reasoning depth dynamically based on task complexity — the model allocates more reasoning for harder problems and less for simpler ones, without requiring explicit configuration.

    The practical implication: if you need transparent, controllable step-by-step reasoning that you can inspect and use in your application, Sonnet 4.6’s extended thinking is often the right tool — and at lower cost than Opus.

    Which Claude Model Should You Choose?

    The right framework for model selection in 2026 is to start with Sonnet 4.6 as your default and escalate selectively. Most production workloads — coding, writing, analysis, research, customer-facing applications — are well-served by Sonnet. Opus 4.7 earns its premium in specific scenarios: tasks requiring more than 64K output tokens, agent workflows demanding maximum reasoning depth, or applications where Anthropic’s latest knowledge cutoff is a meaningful factor.

    Haiku 4.5 belongs in any pipeline where you’ve identified tasks that don’t require Sonnet’s capability. High-volume routing, triage, classification, and real-time response scenarios are Haiku’s natural territory. Building a 70/20/10 routing split across Haiku, Sonnet, and Opus — rather than using a single model for everything — is the standard approach for cost-efficient production deployments.

    Frequently Asked Questions

    What is the difference between Claude Opus, Sonnet, and Haiku?

    Opus is Anthropic’s most capable model, optimized for complex reasoning, large outputs, and agentic tasks. Sonnet offers a balance of capability and cost, handling most production workloads at lower price. Haiku is the fastest and cheapest option, suited for high-volume, lower-complexity tasks. All three share the same core Claude architecture and safety training.

    Is Claude Opus worth the extra cost over Sonnet?

    For most tasks, no. Sonnet 4.6 handles the majority of coding, writing, and analysis work at 40% lower cost. Opus 4.7 is worth the premium when you need outputs longer than 64K tokens, maximum agentic coding capability, or the most recent knowledge cutoff (January 2026 vs. Sonnet’s August 2025).

    Which Claude model is best for coding?

    Sonnet 4.6 is the standard recommendation for most coding work, including Claude Code sessions. Opus 4.7 is preferred for large codebase analysis, complex architecture decisions, or multi-agent coding workflows where maximum reasoning depth is required. Haiku 4.5 can handle simple code edits and explanations at much lower cost.

    What is the Claude context window?

    Claude Opus 4.7 and Sonnet 4.6 both have a 1 million token context window — roughly 750,000 words of combined input and conversation history. Claude Haiku 4.5 has a 200,000 token context window. Context window size determines how much information Claude can hold and reference in a single conversation.

    Does Claude Opus support extended thinking?

    No. Extended thinking is available on Claude Sonnet 4.6 and Claude Haiku 4.5, but not on Claude Opus 4.7. Opus 4.7 supports adaptive thinking instead, which dynamically adjusts reasoning depth based on task complexity.

    What is the cheapest Claude model?

    Claude Haiku 4.5 is the least expensive model at $1 per million input tokens and $5 per million output tokens. It is also the fastest Claude model, making it well-suited for high-volume, latency-sensitive applications.

    Can I use Claude through Amazon Bedrock or Google Vertex AI?

    Yes. All three current Claude models — Opus 4.7, Sonnet 4.6, and Haiku 4.5 — are available through Amazon Bedrock and Google Vertex AI in addition to the direct Anthropic API. Bedrock and Vertex AI offer regional and global endpoint options. Pricing on third-party platforms may vary from direct Anthropic API rates.

  • How to Get Hired Without Applying: The 30-Minute Daily Job-Seeking Protocol

    How to Get Hired Without Applying: The 30-Minute Daily Job-Seeking Protocol

    The short version: If you want a job in a flooded market, stop trying to be employable in general. Pick one specific corner of your industry. Spend 30 minutes in the morning learning it. Spend the day forgetting most of what you read. Spend 30 minutes at night posting about whatever survived. The forgetting is the filter. The publishing is the proof. Six months in, you are not looking for a job. The job is looking for you.

    Most career advice is built around a quiet lie: that the way to stand out is to be a little better at everything everyone else is also a little better at. Sharpen your resume. Add a certification. Take another course. Write another cover letter. Put it all on LinkedIn and hope the algorithm notices.

    It does not work. It cannot work. The market is not short on generalists. It is starving for specialists, especially specialists who have visibly done the thing in public.

    What follows is a job-seeking strategy that takes about an hour a day, requires no extra money, and exploits two pieces of cognitive science most career coaches do not mention: spaced repetition and spaced retrieval. The whole point is to use forgetting as a feature, not a bug — and to publish the part that survives.

    The four-step protocol

    1. Pick three things from your industry that are the most valuable. Not the most popular. Not the most discussed. The three problems that, when someone solves them, money moves.
    2. Pick one of the three you actually want to become an expert on. The one you would willingly read about on a Sunday with no one watching.
    3. Spend 30 minutes in the morning researching it. Read primary sources. Take rough notes. Do not try to remember everything. You will not.
    4. Spend 30 minutes in the evening posting about it. Whatever you can still articulate without notes is the thing worth publishing. The rest was noise.

    That is the entire system. It is shorter than most morning routines. It will outperform almost any other career-building activity you can do in the same time.

    Why morning study and evening publishing actually works

    The forgetting is doing the editing

    When you study something in the morning and then go live a normal day, your brain runs a quiet triage process. Most of what you read decays. The handful of things that connect to something you already understand — or that genuinely surprised you, or that you can imagine using — survive.

    By evening, what is left in your head is not a complete summary of what you read. It is the signal of what you read. The compression happened automatically.

    This is why the evening publishing step matters. You are not trying to teach the morning’s full reading. You are publishing what survived eight hours of normal life. That is, by definition, the part most likely to be useful, memorable, and original.

    Spaced repetition is one of the most-validated learning techniques in cognitive science

    The morning-then-evening rhythm is a lightweight version of spaced repetition, the practice of revisiting information at intervals rather than cramming it in one session. A 2024 prospective cohort study published through the American Board of Family Medicine tracked thousands of practicing physicians and found spaced repetition produced significantly better long-term knowledge retention than repeated study sessions.

    A separate quasi-experimental study at Jawaharlal Nehru Medical College found students using spaced repetition scored 16.24 versus 11.89 on post-test assessments compared to traditional study — a statistically significant difference (p < 0.0001) that held across multiple disciplines.

    The mechanism is not mysterious. Each time you successfully retrieve information after a delay, the neural pathway gets reinforced. Each time you fail to retrieve it, you learn something more important: that piece was not load-bearing. You can let it go.

    When you publish in the evening what you can still remember from the morning, you are running this loop in public. You are letting your brain tell you what mattered, then giving the world the part that mattered.

    The publishing layer is what changes your career

    Studying alone makes you smarter. Publishing what you study makes you findable.

    The career-changing leverage is in the second half. A junior marketer who quietly reads about LinkedIn ads for construction companies in rural areas for six months becomes a slightly better junior marketer. A junior marketer who publishes one short post per evening for six months about the same thing becomes the person every rural construction company finds when they search “how to run LinkedIn ads for a contractor.”

    That is not the same outcome. That is a different career.

    Specificity is the multiplier

    “LinkedIn ads” is a saturated topic. Hundreds of generalists post about it daily. Each new post fights for the same shrinking attention slice.

    “LinkedIn ads for construction companies in rural markets” is almost empty. The total competing supply of content might be a dozen serious posts a year. The total demand from rural construction company owners trying to figure this out is significant. The ratio is what makes the niche valuable.

    The specific corner you pick is the entire game. The narrower it is, the faster you become the visible expert in it. The narrower it is, the easier it is for the right buyer or hiring manager to find you. The narrower it is, the less you have to compete on resume and the more you compete on demonstrated thinking.

    What gets cited by AI is not what gets the most engagement

    There is a quiet shift happening in how hiring managers and buyers find people. They no longer search Google and scroll through ten blue links. They ask ChatGPT, Gemini, Perplexity, or Google’s AI Overview “who’s good at X?” and read what the AI says.

    The thing is — AI systems do not cite content based on follower count or engagement. They cite based on relevance, specificity, and structure. A short, well-structured LinkedIn article from someone with 200 followers is regularly cited above a viral post from someone with 200,000 followers, because the smaller account wrote something specific and useful.

    This is the most underpriced opportunity in personal branding right now. You do not need an audience. You need a corner you own and a publishing rhythm you can sustain. The AI does the distribution.

    What the evening 30 minutes should actually look like

    Do not overthink the format. The post is not the product. The practice is the product. Here is a workable template:

    • One observation from the morning’s reading. Not the main point. The thing that surprised you.
    • One concrete example of how it shows up in your specific niche.
    • One short opinion on what most people get wrong about it.

    That is roughly 150 to 250 words. It takes ten minutes to write if you let yourself write badly. The other twenty minutes are for the next day’s reading list and any replies to the previous day’s post.

    You do not need to post on LinkedIn. You can post anywhere your industry actually reads. But LinkedIn rewards consistent professional output more than almost any other platform, especially for B2B niches, and AI systems are increasingly citing LinkedIn articles in answer to professional queries. So the platform pays its own freight.

    Six months from now

    If you do this for six months — and almost no one does — three things are true at once.

    First, you actually know your niche better than 95% of the people who claim to. You have read primary sources every morning for 180 mornings. You have wrestled with the material publicly. You have gotten things wrong, gotten corrected by other practitioners, and updated your understanding in front of an audience.

    Second, you have a public record of that learning. Your LinkedIn — or whatever surface you chose — is now a longitudinal proof of competence in a specific area. Anyone vetting you can see exactly how you think about the problem they need solved.

    Third, the math has flipped. You are no longer trying to find a job. You are getting messages from people who need exactly what you have spent six months publishing about. Some of those messages are job offers. Some are consulting opportunities. Some are partnerships you would not have known existed.

    The whole strategy rests on a quiet observation: most people will not do this. Not because it is hard. Because it is slow at the start, requires saying things in public before you feel qualified, and pays nothing for the first few months. Most career advice optimizes around making people feel like they are doing something. This optimizes around making the market notice you have done something.

    The compounding loop

    The longer this runs, the better it gets. Six months of daily 30-minute morning study is roughly 90 hours of focused reading in a single domain — more than most working professionals invest in any specific topic outside of formal education. Six months of daily evening posting is roughly 180 short-form pieces of public-facing thinking in your niche.

    Compare that to the alternative: another resume rewrite, another certification, another generic course. None of those produce a public footprint. None of those compound. None of them make you findable to the people who are actually trying to solve the problem you have spent six months understanding.

    An hour a day. One narrow niche. Spaced repetition doing the editing. Evening publishing doing the marketing. The forgetting is the filter. The publishing is the proof. The compounding is what changes your career.

    Frequently asked questions

    How do I pick the right niche if I have not started a career yet?

    Pick the intersection of: a problem real businesses pay money to solve, an industry you find genuinely interesting, and an angle that is not already saturated. Specific is always better than general. “B2B SaaS marketing” is too broad. “Onboarding email sequences for vertical SaaS in healthcare” is the size of niche that wins.

    What if I already have a job and want to use this to switch fields?

    The protocol is identical. Do the morning study and evening publishing in the niche you want to move into, not the one you currently work in. Six months of public output in the new field is more credible to a hiring manager in that field than ten years of unrelated experience.

    What if I do not know enough to write anything yet?

    Write what you are learning, with that framing. “I have been studying X for two weeks. Here is the most surprising thing I have found so far.” Beginner-as-narrator is one of the most engaging voices on LinkedIn. People follow learning journeys. They scroll past finished experts.

    Does this work for technical fields too?

    Especially well. Engineers, scientists, and analysts who can publish clearly about their narrow domain are vanishingly rare and disproportionately valuable. The 30-minute evening post can be a code walkthrough, a paper summary, a debugging story, or a single counterintuitive finding. The format does not matter. The consistency does.

    What if I post for a month and nothing happens?

    Expected. The first 30 to 60 days are unread. The compounding starts somewhere between day 90 and day 180 for most people. The point of the practice is the practice. The audience is a side effect of the discipline, not the goal of it.

    How is this different from a traditional content marketing strategy?

    Traditional content marketing optimizes for traffic and conversions. This optimizes for being findable in the moment a buyer or hiring manager is searching for someone who understands their specific problem. It is closer to a slow-cooking authority strategy than a fast-twitch growth strategy. The output is the same — published material — but the goal is positioning, not pageviews.

    The bottom line

    The short post that became this article said: pick three things from your industry, choose one, study it 30 minutes in the morning, post about it 30 minutes at night. That is the whole strategy.

    What that short post did not say is why it works. The morning input gives your brain something to process. The day in between lets the trivial stuff fall away. The evening output forces you to publish what survived — which is, by the cleanest possible test, the part worth publishing. Repeat for six months. Pick the right niche. Watch what happens to your inbox.

    The career advice industry sells motion. This is the opposite. This is a small, slow, compounding bet on becoming visibly excellent at one specific thing. Almost no one will do it. That is what makes it work.


  • Pay for the Compute Once: How Saving Your AI Work Saves You Money

    Pay for the Compute Once: How Saving Your AI Work Saves You Money

    The Compute-Once Principle: Every AI response costs real infrastructure — GPU time, inference compute, and engineering overhead. When you discard that output without saving it, you pay the same cost again the next time the same question arises. Saving AI work to a structured knowledge base converts a recurring compute cost into a one-time investment.

    Pay for the Compute Once: How Saving Your AI Work Saves You Money

    Every time you open a new AI conversation and ask Claude or ChatGPT to research something, write something, or figure something out — you are paying for compute. Maybe you’re on a flat-rate subscription, so it doesn’t feel like a direct cost. But it is. The servers running inference on your query cost real money, and that cost is baked into whatever you’re paying monthly. More importantly, your time has a cost too. When you close that tab and that work disappears into the void, you’ve paid twice for the same problem the next time it comes up.

    This is the “pay for the compute twice” trap — and most people using AI tools are stuck in it without realizing it.

    What Does “Compute” Actually Mean in Plain Terms?

    When you send a message to an AI model, a server somewhere processes your request. It runs inference — meaning it uses a large language model to generate a response token by token. That inference costs electricity, GPU time, and engineering infrastructure. Whether you’re on a $20/month Claude Pro plan or building with the Anthropic API at $3 per million tokens, every response has a real compute cost attached to it.

    For API users, this is explicit — you see it on your bill. For subscription users, it’s implicit — it’s why your plan has usage limits and why the pricing tiers exist. The compute is never free. You are always paying for it, one way or another.

    The problem isn’t that compute costs money. The problem is that most people treat AI like a search engine — ask, get answer, close tab, repeat. That workflow throws away the value you just paid to generate.

    The Real Cost of Starting Over

    Here’s a real scenario. You spend 45 minutes with Claude building a competitive analysis for a new market you’re entering. Claude pulls together the key players, the positioning gaps, the pricing dynamics. It’s good work. You read it, feel informed, close the tab.

    Three weeks later, a colleague asks about that same market. You open a new Claude conversation and start over. Same 45 minutes. Same compute. Same cost. You’ve now paid for that analysis twice.

    Now multiply that across a team of five people over a year. The same research gets regenerated dozens of times. The same frameworks get rebuilt from scratch in every new session. The same onboarding context gets re-explained to the AI in every conversation. This is the silent tax on AI-native work — and it compounds fast.

    The Fix: Notion as Your AI Memory Layer

    The solution is deceptively simple: save the output before you close the tab. But simple doesn’t mean thoughtless. The way you save matters as much as whether you save.

    At Tygart Media, we use Notion as the AI memory layer for everything we build. The principle is straightforward: Notion is the storage layer, the publishing platform is the distribution layer, and cloud compute is where the inference happens. Nothing that Claude generates disappears without a home. Every research output, every strategic framework, every content brief, every integration spec — it goes to Notion first.

    This isn’t just about saving money on API calls. It’s about building institutional memory that compounds over time. When a piece of research lives in Notion with proper structure and tagging, it becomes a retrieval asset. Future conversations can reference it. Future team members can learn from it. Future AI sessions can build on it rather than rebuilding it.

    What’s Actually Worth Saving — and How to Structure It

    Not everything needs to be saved. A throwaway brainstorm session doesn’t need a permanent home. But anything that required real reasoning — research synthesis, strategic analysis, technical architecture decisions, content strategy frameworks — that’s compute you want to pay for exactly once.

    When you save AI work to Notion, structure matters. A flat dump of the conversation isn’t useful. What you want is:

    • A clear title that describes what was produced, not what was asked
    • Context at the top — what problem was being solved, what constraints existed
    • The actual output — the research, the framework, the decision, the artifact
    • Status and date — so you know if it’s still current
    • Next steps or open questions — so the work isn’t just archived but actionable

    This structure transforms a one-time AI output into a living knowledge asset. It’s the difference between a file you’ll never open again and a resource that actively makes future work faster.

    The ROI Math: What You Actually Save

    Let’s be concrete. If you’re on the Claude Max plan at $100/month and you spend an average of two hours per day doing meaningful AI-assisted work, your effective hourly compute rate is roughly $1.50/hour — just for the subscription cost, not counting your own time.

    If half of that work is regenerating things you’ve already generated — research you’ve lost, frameworks you’ve rebuilt, context you’ve re-explained — you’re burning roughly $50/month on duplicate compute. Over a year, that’s $600 in subscription costs paying for work you’ve already done.

    For a team of five using AI at similar intensity, duplicate compute waste can easily reach $3,000–$5,000 annually — just from not saving outputs systematically.

    But the time cost is the bigger number. A knowledge worker billing at $100/hour who regenerates 30 minutes of AI work three times per week is losing significant billable time to the compute-twice trap every month. The subscription cost is the small number. Your time is the big one.

    How to Build the Save Habit

    The save habit is behavioral before it’s technical. The hardest part isn’t setting up Notion — it’s remembering to save before you close the tab. A few practices that help:

    End every meaningful AI session with a save step. Before you close the conversation, ask yourself: did this session produce something I might need again? If yes, it goes to Notion before the tab closes. This takes 60 seconds and eliminates the compute-twice problem for that piece of work.

    Build a lightweight intake structure. Create a Notion database with a “Research & AI Outputs” category. Give it a Status field (Draft, Active, Archived) and a Date field. That’s enough to make your saved work searchable and retrievable without turning saving into a second job.

    Use the AI to write its own summary. At the end of a useful session, ask Claude: “Summarize what we just figured out in a format I can save to my knowledge base.” It will produce a clean, structured summary ready to paste into Notion. You paid for the compute to produce the work — use a few cents more of compute to make it saveable.

    Tag by problem type, not by date. Date is useful metadata, but problem type is what makes retrieval fast. “Competitive analysis,” “integration architecture,” “content strategy,” “cost modeling” — these are the tags that let you find the right output in six months when you need it again.

    Beyond Saving: Feeding Outputs Back to the AI

    Saving is the first half. The second half is retrieval — and this is where the real compounding happens.

    When you start a new AI session that needs context from previous work, you can paste the saved Notion output directly into the conversation. Claude can read it, build on it, and extend it without you having to re-explain everything from scratch. You’ve effectively given the AI persistent memory across sessions — something it doesn’t have natively.

    At scale, this is the difference between an AI that feels like a perpetual intern who never learns your business and an AI that feels like a senior colleague who knows your entire history. The AI gets smarter about your specific context with every session — because the outputs accumulate rather than evaporate.

    The Philosophy: Treat AI Output as an Asset

    The underlying shift here is philosophical. Most people treat AI conversations as disposable — a means to an end, like a Google search. You get the answer, you move on.

    The businesses that will build durable competitive advantage with AI are the ones that treat AI output as an asset class. Research is an asset. Frameworks are assets. Decision logs are assets. Competitive intelligence is an asset. Every meaningful AI conversation produces something that has value — and that value compounds when it’s saved, structured, and retrievable.

    Compute is a commodity. Knowledge is not. When you pay for compute once and preserve the knowledge it produces, you’re converting a recurring cost into a one-time investment. That’s the real economics of AI-native work — and it’s available to anyone willing to close the tab two minutes later than usual.

    Getting Started Today

    You don’t need a complex system to start capturing compute value. Start with this: create a single Notion page called “AI Research & Outputs.” Every time you have a meaningful AI conversation this week, paste the key output there before you close the tab. Do it for one week and look at what you’ve built. You’ll have a knowledge base worth more than the subscription that generated it — and you’ll never pay for the same compute twice again.

    Frequently Asked Questions

    What does “paying for AI compute” mean for subscription users?

    Even on flat-rate plans like Claude Pro or ChatGPT Plus, compute costs are real — they’re built into the subscription price. Usage limits, tier pricing, and rate caps all reflect the underlying infrastructure cost. Every conversation consumes real resources, whether you see an itemized bill or not.

    Why is Notion a good place to save AI outputs?

    Notion combines structured databases, free-form pages, searchable content, and team-sharing in one place. More importantly, it integrates with AI tools via API, meaning future AI sessions can read from your Notion knowledge base directly — turning saved outputs into active context rather than archived files.

    What types of AI work are worth saving?

    Anything that required substantive reasoning: competitive research, strategic frameworks, technical architecture decisions, content briefs, cost models, process documentation, and integration specs. Casual brainstorming and one-off quick answers generally aren’t worth the overhead of saving.

    How do I get Claude to summarize a session for saving?

    At the end of any useful conversation, simply ask: “Summarize the key outputs from this session in a structured format I can save to my knowledge base.” Claude will produce a clean, titled summary with context, outputs, and next steps — ready to paste directly into Notion.

    Can I feed saved Notion content back into future AI conversations?

    Yes. Paste the Notion content directly into a new Claude conversation as context. Claude will read it, build on it, and extend it without requiring you to re-explain the background. This is how you give AI persistent memory across sessions — something it doesn’t have natively.

    How much money does the compute-twice trap actually cost?

    For individual users, duplicate compute waste typically runs $50–$100/month in subscription value plus several hours of time. For teams of five or more using AI intensively, the annual cost of not saving outputs systematically can reach $5,000–$10,000 when both subscription waste and time cost are included.



  • Cortex, Hippocampus, and the Consolidation Loop: The Neuroscience-Grounded Architecture for AI-Native Workspaces

    Cortex, Hippocampus, and the Consolidation Loop: The Neuroscience-Grounded Architecture for AI-Native Workspaces

    I have been running a working second brain for long enough to have stopped thinking of it as a second brain.

    I have come to think of it as an actual brain. Not metaphorically. Architecturally. The pattern that emerged in my workspace over the last year — without me intending it, without me planning it, without me reading a single neuroscience paper about it — is structurally isomorphic to how the human brain manages memory. When I finally noticed the pattern, I stopped fighting it and started naming the parts correctly, and the system got dramatically more coherent.

    This article names the parts. It is the architecture I actually run, reported honestly, with the neuroscience analogy that made it click and the specific choices that make it work. It is not the version most operators build. Most operators build archives. This is closer to a living system.

    The pattern has three components: a cortex, a hippocampus, and a consolidation loop that moves signal between them. Name them that way and the design decisions start falling into place almost automatically. Fight the analogy and you will spend years tuning a system that never quite feels right because you are solving the wrong problem.

    I am going to describe each part in operator detail, explain why the analogy is load-bearing rather than decorative, and then give you the honest version of what it takes to run this for real — including the parts that do not work and the parts that took me months to get right.


    Why most second brains feel broken

    Before the architecture, the diagnosis.

    Most operators who have built a second brain in the personal-knowledge-management tradition report, eventually, that it does not feel right. They can not put words to exactly what is wrong. The system holds their notes. The search mostly works. The tagging is reasonable. But the system does not feel alive. It feels like a filing cabinet they are pretending is a collaborator.

    The reason is that the architecture they built is missing one of the three parts. Usually two.

    A classical second brain — the library-shaped archive built around capture, organize, distill, express — is a cortex without a hippocampus and without a consolidation loop. It is a place where information lives. It is not a system that moves information through stages of processing until it becomes durable knowledge. The absence of the other two parts is exactly why the system feels inert. Nothing is happening in there when you are not actively working in it. That is the feeling.

    An archive optimized for retrieval is not a brain. It is a library. Libraries are excellent. You can use a library to do good work. But a library is not the thing you want to be trying to replicate when you are trying to build an AI-native operating layer for a real business, because the operating layer needs to process information, not just hold it, and archives do not process.

    This diagnosis was the move that let me stop tuning my system and start re-architecting it. The system was not bad. The system was incomplete. It had one of the three parts built beautifully. It had the other two parts either missing or misfiled.


    Part one: the cortex

    In neuroscience, the cerebral cortex is the outer layer of the brain responsible for structured, conscious, working memory. It is where you hold what you are actively thinking about. It is not where everything you have ever known lives — that is deeper, and most of it is not available to conscious access at any given moment. The cortex is the working surface.

    In an AI-native workspace, your knowledge workspace is the cortex. For me, that is Notion. For other operators, it might be Obsidian, Roam, Coda, or something else. The specific tool is less important than the role: this is where structured, human-readable, conscious memory lives. It is where you open your laptop and see the state of the business. It is where you write down what you have decided. It is where active projects live and active clients are tracked and active thoughts get captured in a form you and an AI teammate can both read.

    The cortex has specific design properties that differ from the other two parts.

    It is human-readable first. Everything in the cortex is structured for you to look at. Pages have titles that make sense. Databases have columns that answer real questions. The architecture rewards a human walking through it. Optimize for legibility.

    It is relatively small. Not everything you have ever encountered lives in the cortex. It is the active working surface. In a human brain, the cortex holds at most a few thousand things at conscious access. In an AI-native workspace, your cortex probably wants to hold a few hundred to a few thousand pages — the active projects, the recent decisions, the current state. If it grows to tens of thousands of pages with everything you have ever saved, it is trying to do the hippocampus’s job badly.

    It is organized around operational objects, not knowledge topics. Projects, clients, decisions, deliverables, open loops. These are the real entities of running a business. The cortex is organized around them because that is what the conscious, working layer of your business is actually about.

    It is updated constantly. The cortex is where changes happen. A new decision. A status flip. A note from a call. The consolidation loop will pull things out of the cortex later and deposit them into the hippocampus, but the cortex itself is a churning working surface.

    If you have been building a second brain the classical way, this is probably the part you built best. You have a knowledge workspace. You have pages. You have databases. You have some organizing logic. Good. That is the cortex. Keep it. Do not confuse it for the whole brain.


    Part two: the hippocampus

    In neuroscience, the hippocampus is the structure that converts short-term working memory into long-term durable memory. It is the consolidation organ. When you remember something from last year, the path that memory took from your first experience of it into your long-term storage went through the hippocampus. Sleep plays a large role in this. Dreams may play a role. The mechanism is not entirely understood, but the function is: short-term becomes long-term through hippocampal processing.

    In an AI-native workspace, your durable knowledge layer is the hippocampus. For me, that is a cloud storage and database tier — a bucket of durable files, a data warehouse holding structured knowledge chunks with embeddings, and the services that write into it. For other operators it might be a different stack: a structured database, an embeddings store, a document warehouse. The specific tool is less important than the role: this is where information lives when it has been consolidated out of the cortex and into a durable form that can be queried at scale without loading the cortex.

    The hippocampus has different design properties than the cortex.

    It is machine-readable first. Everything in the hippocampus is structured for programmatic access. Embeddings. Structured records. Queryable fields. Schemas that enable AI and other services to reason across the whole corpus. Humans can access it too, but the primary consumer is a machine.

    It is large and growing. Unlike the cortex, the hippocampus is allowed to get big. Years of knowledge. Thousands or tens of thousands of structured records. The archive layer that the classical second brain wanted to be — but done correctly, as a queryable substrate rather than a navigable library.

    It is organized around semantic content, not operational state. Chunks of knowledge tagged with source, date, embedding, confidence, provenance. The operational state lives in the cortex; the semantic content lives in the hippocampus. This is the distinction most operators get wrong when they try to make their cortex also be their hippocampus.

    It is updated deliberately. The hippocampus does not change every minute. It changes on the cadence of the consolidation loop — which might be hourly, nightly, or weekly depending on your rhythm. This is a feature. The hippocampus is meant to be stable. Things in it have earned their place by surviving the consolidation process.

    Most operators do not have a hippocampus. They have a cortex that they keep stuffing with old information in the hope that the cortex can play both roles. It cannot. The cortex is not shaped for long-term queryable semantic storage; the hippocampus is not shaped for active operational state. Merging them is the architectural choice that makes systems feel broken.


    Part three: the consolidation loop

    In neuroscience, the process by which information moves from short-term working memory through the hippocampus into long-term storage is called memory consolidation. It happens constantly. It happens especially during sleep. It is not a single event; it is an ongoing loop that strengthens some memories, prunes others, and deposits the survivors into durable form.

    In an AI-native workspace, the consolidation loop is the set of pipelines, scheduled jobs, and agents that move signal from the cortex through processing into the hippocampus. This is the part most operators miss entirely, because the classical second brain paradigm does not include it. Capture, organize, distill, express — none of those stages are consolidation. They are all cortex-layer activities. The consolidation loop is what happens after that, to move the durable outputs into durable storage.

    The consolidation loop has its own design properties.

    It runs on a schedule, not on demand. This is the most important design choice. The consolidation loop should not be triggered by you manually pushing a button. It should run on a cadence — nightly, weekly, or whatever fits your rhythm — and do its work whether you are paying attention or not. Consolidation is background work. If it requires attention, it will not happen.

    It processes rather than moves. Consolidation is not a file-copy operation. It extracts, structures, summarizes, deduplicates, tags, embeds, and stores. The raw cortex content is not what ends up in the hippocampus; the processed, structured, queryable version is. This is the part that requires actual engineering work and is why most operators do not build it.

    It runs in both directions. Consolidation pushes signal from cortex to hippocampus. But once information is in the hippocampus, the consolidation loop also pulls it back into the cortex when it is relevant to current work. A canonical topic gets routed back to a Focus Room. A similar decision from six months ago gets surfaced on the daily brief. A pattern across past projects gets summarized into a new playbook. The loop is bidirectional because the brain is bidirectional.

    It has honest failure modes and health signals. A consolidation loop that is not working is worse than no loop at all, because it produces false confidence that information is getting consolidated when actually it is rotting somewhere between stages. You need visible health signals — how many items were consolidated in the last cycle, how many failed, what is stale, what is duplicated, what needs human attention. Without these, you do not know whether the loop is running or pretending to run.

    When I got the consolidation loop working, the cortex and hippocampus started feeling like a single system for the first time. Before that, they were two disconnected tools. The loop is what turns them into a brain.


    The topology, in one diagram

    If I were drawing the architecture for an operator who is considering building this, it would look roughly like this — and it does not matter which specific tools you use; the shape is what matters.

    Input streams flow in from the things that generate signal in your working life. Claude conversations where decisions got made. Meeting transcripts and voice notes. Client work and site operations. Reading and research. Personal incidents and insights that emerged mid-day.

    Those streams enter the consolidation loop first, not the cortex directly. The loop is a set of services that extract structured signal from raw input — a claude session extractor that reads a conversation and writes structured notes, a deep extractor that processes workspace pages, a session log pipeline that consolidates operational events. These run on schedule, produce structured JSON outputs, and route the outputs to the right destinations.

    From the consolidation loop, consolidated content lands in the cortex. New pages get created for active projects. Existing pages get updated with relevant new information. Canonical topics get routed to their right pages. This is how your working surface stays fresh without you having to manually copy things into it.

    The cortex and hippocampus exchange signal bidirectionally. The cortex sends completed operational state — finished projects, finalized decisions, archived work — down to the hippocampus for durable storage. The hippocampus sends back canonical topics, cross-references, and AI-accessible content when the cortex needs them. This bidirectional exchange is the part that most closely mirrors how neuroscience describes memory consolidation.

    Finally, output flows from the cortex to the places your work actually lands — published articles, client deliverables, social content, SOPs, operational rhythms. The cortex is also the execution layer I have written about before. That is not a contradiction with the cortex-as-conscious-memory framing; in a human brain, the cortex is both the working memory and the source of deliberate action. The analogy holds.


    The four-model convergence

    I want to pause and tell you something I did not know until I ran an experiment.

    A few weeks ago I gave four external AI models read access to my workspace and asked each one to tell me what was unique about it. I used four models from different vendors, deliberately, to catch blind spots from any single system.

    All four models converged on the same primary diagnosis. They did not agree on much else — their unique observations diverged significantly — but on the core architecture, they converged. The diagnosis, in their words translated into mine, was:

    The workspace is an execution layer, not an archive. The entries are system artifacts — decisions, protocols, cockpit patterns, quality gates, batch runs — that convert messy work into reusable machinery. The purpose is not to preserve thought. The purpose is to operate thought.

    This was the validation of the thesis I have been developing across this body of work, from an unexpected source. Four models, evaluated independently, landed on the same architectural observation. That was the moment I knew the cortex / hippocampus / consolidation-loop framing was not just mine — it was visible from the outside, to cold readers, as the defining feature of the system.

    I bring this up not to show off but to tell you that if you build this pattern correctly, external observers — human or AI — will be able to see it. The architecture is not a private aesthetic. It is a thing a well-designed system visibly is.


    Provenance: the fourth idea that makes the whole thing work

    There is a fourth component that I want to name even though it does not have a neuroscience analog as cleanly as the other three. It is the concept of provenance.

    Most second brain systems — and most RAG systems, and most retrieval-augmented AI setups — treat all knowledge chunks as equally weighted. A hand-written personal insight and a scraped web article are the same to the retrieval layer. A single-source claim and a multi-source verified fact carry the same weight. This is an enormous problem that almost nobody talks about.

    Provenance is the dimension that fixes it. Every chunk of knowledge in your hippocampus should carry not just what it means (the embedding) and where it sits semantically, but where it came from, how many sources converged on it, who wrote it, when it was verified, and how confident the system is in it. With provenance, a hand-written insight from an expert outweighs a scraped article from a low-quality source. With provenance, a multi-source claim outweighs a single-source one. With provenance, a fresh verified fact outweighs a stale unverified one.

    Without provenance, your second brain will eventually feed your AI teammate garbage from the hippocampus and your AI will confidently regurgitate it in responses. With provenance, your AI teammate knows what it can trust and what it cannot.

    Provenance is the architectural choice that separates a second brain that makes you smarter from one that quietly makes you stupider over time. Add it to your hippocampus schema. Weight every chunk. Let the retrieval layer respect the weights.


    The health layer: how you know the brain is working

    A brain that is working produces signals you can read. A brain that is broken produces silence, or worse, false confidence.

    I build in explicit health signals for each of the three components. The cortex is healthy when it is fresh, when pages are recently updated, when active projects have recent activity, and when stale pages are archived rather than accumulating. The hippocampus is healthy when the consolidation loop is running on schedule, when the corpus is growing without duplication, and when retrieval returns relevant results. The consolidation loop is healthy when its scheduled runs succeed, when its outputs are being produced, and when the error rate is low.

    I also track staleness — pages that have not been updated in too long, relative to how load-bearing they are. A canonical document more than thirty days stale is treated as a risk signal, because the reality it documents has almost certainly drifted from what the page describes. Staleness is not the same as unused; some pages are quietly load-bearing and need regular refreshes. A staleness heatmap across the workspace tells you which pages are most at risk of drifting out of reality.

    The health layer is the thing that lets you trust the system without having to re-check it constantly. A brain you cannot see the health of is a brain you will eventually stop trusting. A brain whose health is visible is one you can keep leaning on.


    What this costs to build

    I want to be honest about what actually getting this working takes. Not because it is prohibitive, but because the classical second-brain literature underestimates it and operators get blindsided.

    The cortex is the easy part. Any capable workspace tool, a few weeks of deliberate organization, and a commitment to keeping it small and operational. Cost: low. Most operators have some version of this already.

    The hippocampus is harder. You need durable storage. You need an embeddings layer. You need schemas that capture provenance and not just content. For a solo operator without technical capability, this is a real build project — probably a few weeks to months of focused work or a partnership with someone technical. It is also the part that, once built, becomes genuinely durable infrastructure.

    The consolidation loop is hardest. Because the loop is a set of services that extract, process, structure, and route, it is the most engineering-intensive part. This is where most operators stall. The solve is either to use tools that ship consolidation-like capabilities natively (Notion’s AI features are approximately this), or to build a small set of extractors and pipelines yourself with Claude Code or equivalent. For me, the loop took months of iteration to run reliably. It is now the highest-leverage part of the whole system.

    Total cost for an operator with moderate technical capability: a few months of evenings and weekends, some cloud infrastructure spend, and an ongoing maintenance commitment of maybe eight to ten percent of working hours. In exchange, you get an operating system that compounds with use rather than decaying.

    For operators who do not want to build the hippocampus and loop themselves, the vendor-shaped version of this architecture is starting to become available in 2026 — Notion’s Custom Agents edge toward a consolidation loop, Notion’s AI offers hippocampus-like capability at small scale, and various startups are working on the layers. None are complete yet. Most operators serious about this will need to build some of it.


    What goes wrong (the honest failure modes)

    Three failure modes are worth naming, because I have hit all three and the pattern recovered only because I caught them.

    The cortex that tries to be the hippocampus. Operators who get serious about a second brain often try to put everything in the cortex — every article they have ever read, every transcript of every meeting, every bit of research. The cortex then gets too big to be legible, starts running slowly, and the search stops returning useful results. The fix is to build the hippocampus separately and move the bulk of the corpus there. The cortex should be small.

    The hippocampus that gets polluted. Without provenance weighting and without deduplication, the hippocampus accumulates low-quality content that then gets retrieved and surfaced in AI responses. The fix is provenance, deduplication, and periodic hippocampal pruning. The archive is not sacred; some things earn their place and some things do not.

    The consolidation loop that nobody maintains. The loop is background infrastructure. Background infrastructure rots if nobody owns it. A consolidation loop that was working six months ago might be quietly broken today, and you only notice because your cortex is drifting out of sync with your operational reality. The fix is health signals, monitoring, and a weekly ritual of checking that the loop is running.

    None of these are dealbreakers. All of them are things the pattern has to work around.


    The one sentence I want you to walk away with

    If you take nothing else from this piece:

    A second brain is not a library. It is a brain. Build it with the three parts — cortex, hippocampus, consolidation loop — and it will behave like one.

    Most operators have built the cortex and called it a second brain. They have a library with the sign out front updated. The system feels broken because it is not a brain yet. Build the other two parts and the system stops feeling broken.

    If you can only add one part this month, add the consolidation loop, because the loop is the thing that makes everything else work together. A cortex without a loop is still a library. A cortex with a loop but no hippocampus is a library whose books walk into the back room and disappear. A cortex with a loop and a hippocampus is a brain.


    FAQ

    Is this just a metaphor, or does the neuroscience actually apply?

    It is a metaphor at the level of mechanism — the way neurons consolidate memories is not identical to the way a scheduled pipeline does. But the functional role of each component maps cleanly enough that the analogy is load-bearing rather than decorative. Where the architecture borrows from neuroscience, it inherits genuine design principles that compound the system’s coherence.

    Do I need all three parts to benefit?

    No. A well-built cortex alone is better than no system. A cortex plus a consolidation loop is significantly more powerful. Add the hippocampus when you have enough volume to justify it — usually once your cortex starts straining under its own weight, somewhere in the low thousands of pages.

    Which tool should I use for the cortex?

    The tool is less important than how you organize it. Notion is what I use and what I recommend for most operators because its database-and-template orientation maps cleanly to object-oriented operational state. Obsidian and Roam are better for pure knowledge work but weaker for operational state. Coda is similar to Notion. Pick the one whose grain matches how your brain already organizes work.

    Which tool should I use for the hippocampus?

    Any durable storage that supports embeddings. Cloud object storage plus a vector database. A cloud data warehouse like BigQuery or Snowflake if you want structured queries alongside semantic search. Managed services like Pinecone or Weaviate for pure vector workloads. The decision depends on what else you are running in your cloud environment and how technical you are.

    How do I actually build the consolidation loop?

    For operators with technical capability, a combination of Claude Code, scheduled cloud functions, and a few targeted extractors will get you there. For operators without technical capability, Notion’s built-in AI features approximate parts of the loop. For true coverage, you will eventually either need technical help or to wait for the vendor-shaped version to mature.

    Does this mean I need to rebuild my whole system?

    Not necessarily. If your existing workspace is serving as a cortex, keep it. Add a hippocampus as a separate layer underneath it. Build the consolidation loop between them. The cortex does not have to be rebuilt for the pattern to work; it has to be complemented.

    What if I just want a simpler version?

    A simpler version is fine. A cortex plus a lightweight consolidation loop that runs once a week is already far better than what most operators have. Do not let the fully-built pattern be the enemy of the partially-built version that still earns its place.


    Closing note

    The thing I want to convey in this piece more than anything else is that the architecture revealed itself to me over time. I did not sit down and design it. I built pieces, noticed they were not enough, built more pieces, noticed something was still missing, and eventually the neuroscience analogy clicked and the three-part structure became obvious.

    If you are building a second brain and it does not feel right, you are probably missing one or two of the three parts. Find them. Name them. Build them. The system starts feeling like a brain when it actually has the parts of a brain, and not before.

    This is the longest-running architectural idea in my workspace. I have been iterating on it for over a year. The version in this article is the one I would give a serious operator who was willing to do the work. It is not a quick start. It is an operating system.

    Run it if the shape fits you. Adapt it if some of the parts translate better to a different context. Reject it if you honestly think your current pattern works better. But if you are in the large middle ground where your system kind of works and kind of does not, the missing part is usually the hippocampus, the consolidation loop, or both.

    Go find them. Name them. Build them. Let your second brain actually be a brain.


    Sources and further reading

    Related pieces from this body of work:

    On the external validation: the cross-model convergent analysis referenced in this article was conducted using multiple frontier models evaluating workspace structure independently. The finding that the workspace behaves as an execution layer rather than an archive was independently surfaced by all evaluated models, which I took as meaningful corroboration of the internal architectural thesis.

    The neuroscience analogy is drawn from standard memory-consolidation literature, particularly work on hippocampal consolidation during sleep and the role of the cortex in conscious working memory. This article does not attempt to make rigorous claims about neuroscience; it borrows the functional analogy where the analogy is useful and drops it where it is not.

  • The Exit Protocol: The Section of Your Digital Life You Haven’t Written Yet

    The Exit Protocol: The Section of Your Digital Life You Haven’t Written Yet

    Every tool you enter, you will someday leave. Most operators don’t plan the exit until the exit is already happening. This is the protocol written before the catastrophe, not after.

    Target keyword: digital exit protocol Secondary: tool exit strategy, digital legacy planning, AI tool offboarding, operator continuity planning Categories: AI Hygiene, AI Strategy, Notion Tags: exit-protocol, ai-hygiene, operator-playbook, continuity, digital-legacy


    Every tool you enter, you will someday leave.

    You don’t know which exit you’ll face first. The breach that ends a Tuesday. The policy change that ends a vendor relationship in thirty days. The voluntary migration to something better. The one nobody plans for — the terminal one, where you’re gone or incapacitated and someone else has to figure out how your digital life was organized.

    The cheapest time to plan any of those exits is at the moment of entry. The most expensive time is the moment the exit is already underway.

    Most operators never write this section of their digital life. They enter tools. They stack data. They accumulate credentials. They build automations that depend on twelve other automations that depend on accounts they don’t remember creating. And if you asked them today, “if this specific tool vanished tomorrow, what happens?” — the honest answer is usually I don’t know, I’ve never looked.

    That’s the section this article is about. The exit protocol. The will-and-testament layer of digital life, written before the catastrophe rather than after.

    I’m going to describe the four exits every operator faces, the runbook for each, and the pre-entry checklist that keeps the whole stack from becoming a trap you can’t get out of. None of this is theoretical — it’s the protocol I actually run, cleaned up enough to be useful to someone else building their own version.


    Why this matters more in 2026 than it did in 2020

    For most of the personal-computing era, “exit” meant closing a browser tab. You used a tool, you were done, you left. The consequences of not planning the exit were small because the surface was small.

    That’s not the shape of digital life in 2026. The operator running a real business now sits on top of a stack that typically includes:

    • A knowledge workspace (Notion, Obsidian, or similar) holding years of operational state
    • An AI layer (Claude, ChatGPT, or similar) with memory, projects, and connections to your workspace
    • A cloud provider account running compute, storage, and services
    • Web properties with published content and user data
    • Scheduling, CRM, and communication tools with their own data stores
    • A password manager sitting behind all of it
    • An identity root (usually a Google or Apple account) holding the keys

    Any one of these can end. By breach. By policy change. By price increase you can’t absorb. By vendor shutdown. By personal rupture that isn’t business at all. By death, which is the scenario nobody wants to write about and exactly the one that makes the planning most valuable.

    And every piece is entangled with the pieces above and below it. Your Notion workspace references your Gmail. Your Gmail authenticates your cloud provider. Your cloud provider runs the services your web properties depend on. Your password manager holds the recovery codes for everything. The stack is a single living system with many failure modes, and the only version of “exit planning” that works is the one that treats the stack as a whole.


    The seven questions

    Before you can plan an exit, you need to be able to answer seven questions about every tool in your stack. If you can’t answer them, the exit plan is a fiction.

    1. What lives there? Data, credentials, intellectual property. Not “everything” — specifically, what is in this tool that doesn’t exist anywhere else?

    2. Who else has access? Human collaborators. Service accounts. OAuth connections. API keys you gave out and forgot about. Every form of access is a potential inheritance path.

    3. How does it get out? The export surface. Format. Cadence. Whether the export includes everything or just some things. Whether the export requires the UI or has an API.

    4. What deletes on what trigger? Vendor retention policies. Your own rotation schedule. End-of-engagement deletion for client work. What happens to data if you stop paying.

    5. Who inherits what? Family. Team. Clients. The answer is usually “nobody, by default” — and that default is the whole problem.

    6. How do downstream systems keep working? If this tool ends, what else breaks? What continuity can be preserved without handing over live credentials to somebody who shouldn’t have them?

    7. How do I know the exit still works? Drill cadence. When was the last time you actually exported the data and opened the export on a clean machine to verify it was intact?

    If you answer these seven questions for every tool in your stack, you will find things that surprise you. Credentials that have been in live rotation for three years. Tools whose “export” button produces a file that can’t be opened by anything else. Dependencies on your Gmail that would make inheritance a nightmare. That’s fine — finding those things is the point. You can’t fix what you haven’t looked at.


    The four exit scenarios

    Every exit fits into one of four shapes. The shape determines the runbook. Getting this taxonomy right is what lets the rest of the protocol be specific.

    Sudden: breach or compromise

    The credential leaked. The account got taken over. A vendor breach exposed data you didn’t know was even there. Minutes matter. The goal is to contain the damage, not to plan the migration.

    Forced: policy or shutdown

    The vendor killed the product. The terms changed in a way you can’t live with. The price went up by an order of magnitude. Days to weeks, usually. The goal is to export cleanly and migrate to a successor before the window closes.

    Terminal: death or incapacity

    You are gone or can’t operate. Someone else has to keep things running or wind them down cleanly. This is the scenario most operators never plan for, and it’s the one with the highest cost if the plan doesn’t exist.

    Voluntary: better option or done

    You chose to leave. Migration to a new tool. End of a client engagement. Lifestyle change. Weeks to months of runway. The goal is a clean handoff with no orphan state left behind.

    Each of these has its own runbook. Running the wrong one for the situation is a common failure — treating a forced shutdown like a voluntary migration wastes the window; treating a breach like a forced shutdown fails to contain the damage.


    Runbook: Sudden

    The situation is: something leaked or got taken over. You find out either because a monitoring alert fired or because something visibly broke. Either way, the clock started before you noticed.

    1. Contain. Pull the compromised credential immediately. Rotate the key. Revoke every token you issued through that credential. Sign out of every active session. This is the first ten minutes.

    2. Scope. List every system the credential touched in the last thirty days. Assume the blast radius is wider than it looks — adjacent systems often share trust in ways you forgot about. The goal is to understand what the attacker could have done, not just what they did do.

    3. Notify. If client or customer data is in scope, notify according to your contracts and any applicable law. Today, not tomorrow. Breach disclosure windows are tight and getting tighter; the legal risk of delay is usually worse than the embarrassment of early notification.

    4. Rebuild. Issue a new credential. Scope it to minimum permissions. Never restore the old credential — the temptation to “reuse it once we figure out what happened” is how re-compromise works.

    5. Postmortem. Write it the same week. Not a blameless postmortem for PR purposes; a real one, for your own internal knowledge. What was the failure mode? What signal did you miss? What changes to the protocol would have caught it earlier? The postmortem is the only way the Sudden scenario makes the rest of the stack safer instead of just more anxious.


    Runbook: Forced

    A vendor is shutting down the product, changing the terms in an unacceptable way, or pricing you out. You have some window of runway — days to weeks — before the tool goes dark.

    1. Triage. How long until the tool goes dark? What is the critical-path data — the stuff that doesn’t exist anywhere else? Separate that from everything else.

    2. Export. Run the full export immediately, even before you’ve decided what to migrate to. A cold archive is cheap; a missed export window is permanent. This is the most common failure mode of the Forced scenario — operators wait until they’ve chosen a successor before exporting, and the window closes.

    3. Verify. Open the export on a clean machine. Not the one you usually work on. A clean machine, with no existing context, so you can confirm that the export is actually usable without the source system. Many “export” features produce files that look complete but reference data that only exists in the source system.

    4. Choose a successor. Match on data shape, not feature list. The data is the asset; the UI is rentable. A successor tool that imports your data cleanly but doesn’t have every feature you liked is a better choice than one with more features and a lossy import path.

    5. Cutover. Migrate. Run both systems in parallel for one full operational cycle. Then decommission the old one. The parallel cycle is where you discover what the export missed.


    Runbook: Terminal

    This is the runbook most operators never write. Writing it is the whole point of this article.

    If you are gone or can’t operate, someone else needs to know: what’s running, who depends on it, and how to either keep things going or wind them down cleanly. The default state — no plan — is a nightmare for whoever inherits the problem.

    The Terminal runbook has five components, and each one can be written in an evening. Don’t let the scope of the topic talk you out of writing the simple version now.

    Primary steward. One named person who becomes the point of contact if you can’t operate. Usually a spouse, partner, or trusted family member. They don’t need to understand how the stack works. They need to know where the instructions are and who the operational steward is.

    Operational steward. A named professional who can keep systems running during the transition. For technical infrastructure, this is typically a trusted developer or consultant who already knows your stack. For legal and financial, this is an attorney and accountant. Name them. Have the conversation with them before you need it.

    What the primary steward gets immediately. A one-page document describing the situation. Access to a password manager recovery kit. A list of active clients and the minimum needed to pause operations gracefully. Contact information for the operational steward. Nothing more than this. Specifically, they do not get live admin credentials to client systems, live cloud provider keys, or live AI project memory — those are inheritance paths that go through the operational steward or the attorney, not into a drawer.

    Trigger documents. A signed letter of instruction, stored with the attorney and copied to a trusted location at home. It references the operational runbook by URL or location. It names who is authorized to do what, under what conditions, for how long.

    Digital legacy settings. Most major platforms have inactive-account or legacy-contact features built in. Configure them. Google has Inactive Account Manager. Apple has Legacy Contact. Notion has workspace admin inheritance. Configuring these is fifteen minutes per platform and they do real work when they’re needed.

    Crucial: do not store live credentials in a will. Wills become public record in probate. The recovery path is a letter of instruction pointing at a password manager whose emergency kit is held by a trusted professional, not credentials written into a legal document.


    Runbook: Voluntary

    You chose to leave. Good. This is the least stressful exit because you have runway, you chose the timing, and the data is not under siege.

    1. Announce the exit window. To yourself. To your team. To any client whose work touches this tool. Set a specific date and commit to it.

    2. Freeze net-new. Stop adding data to the system being retired. New data goes to the successor; old data stays put until migration.

    3. Export and verify. Same as the Forced runbook. Full export, clean machine, integrity check.

    4. Migrate. Move data to the successor. Re-point automations, integrations, and any external references. Update documentation and internal links.

    5. Archive. Keep a cold copy of the old system’s export in durable storage, labeled with the exit date. Do not delete the original account for at least ninety days. Things you forgot about will surface during that window and you will want the ability to recover them.

    6. Decommission. Revoke remaining keys. Cancel billing. Close the account. Remove the tool from your password manager. Update any documentation that still mentioned it.


    The drill cadence (the thing that actually makes the protocol real)

    A protocol nobody practices is a protocol that doesn’t exist. The only way to know your exit plan works is to test it, repeatedly, on a schedule that makes failures cheap.

    Quarterly — thirty minutes. Pick one tool. Run its export. Open the export on a clean machine. Log the result. If the export is broken, fix it now, while there’s no emergency. Thirty minutes, four times a year. That’s two hours of investment to know your stack is actually recoverable.

    Semi-annual — two hours. Rotate every credential in the stack. Prune AI memory down to what’s actually load-bearing. Re-read the exit protocol end-to-end and update anything that’s drifted out of date. The credential rotation alone catches more problems than any other single practice in the hygiene layer.

    Annual — half a day. Run a full Terminal scenario dry run. Sit with your primary steward. Walk through the letter of instruction. Verify that your attorney has the current version. Update the digital legacy settings on every major platform. Confirm that the operational steward is still willing and available.

    These cadences add up to roughly eight hours of exit-related work per year. Eight hours against the cost of a stack that could otherwise catastrophically collapse on the worst day of your life. It’s a trade you want to make.


    The pre-entry checklist

    The most important protocol move is the one that happens before the tool enters the stack at all. Every new tool you adopt creates an exit you’ll eventually need. Planning it at entry is radically cheaper than planning it in crisis.

    Before adopting any new tool, answer these questions:

    What is the export format, and have you opened a sample export? If the vendor doesn’t offer export, or the export is a proprietary format nothing else reads, the tool is a data trap. Accept the tradeoff knowingly or pick a different tool.

    Is there an API that would let you back up without the UI? UI-only exports scale poorly. An API you can call on a schedule gives you durable backup without depending on the vendor to maintain the export feature.

    What is the vendor’s retention and deletion policy? How long does data stick around after you stop paying? What happens to the data if the vendor is acquired? What’s their policy on third-party data processing?

    What credentials or tokens will this tool hold, and where do they rotate? A tool that holds an OAuth token to your primary email is a very different risk profile from one that holds only its own password. Inventory the credentials at entry.

    If the vendor raises the price ten times, what is your Plan B? This question sounds paranoid. Vendors raise prices tenfold more often than you’d expect. Having a Plan B in mind at entry is very different from scrambling for one at the three-week mark of a forced migration.

    If you died tomorrow, how would someone downstream keep this working or shut it down cleanly? If the answer is “they couldn’t,” you haven’t finished adopting the tool. Keep this in mind particularly for anything where you’re the only person with access.

    Does this tool belong in your knowledge workspace, your compute layer, or neither? Not every new tool earns a place in the stack. Some are better rented briefly for a specific project and then left behind. The pre-entry moment is when you decide which tier this tool lives in.

    Seven questions. Fifteen minutes of thinking. The return on those fifteen minutes is everything you don’t have to untangle later.


    What this protocol is not

    Three clarifications to close the frame correctly.

    This isn’t paranoid. It’s ordinary due diligence applied to a category of risk that most operators have not caught up to yet. Every legal entity has a wind-down plan. Every serious business has a disaster recovery plan. The digital life of a one-human operator running a real business has the same obligations; it just hasn’t had them named before.

    This isn’t purely defensive. The exit protocol produces upside beyond catastrophe avoidance. The discipline of knowing what’s in every tool, who has access, and how to get data out makes the whole stack more coherent. Operators who run this protocol find themselves making cleaner choices about new tools, which means less sprawl, which means less hygiene debt. The protocol pays rent every month, not just when things break.

    This isn’t a one-time project. It’s a standing practice. The stack changes. Tools enter. Tools leave. Credentials rotate. Family situations evolve. The protocol is never finished; it’s maintained. That’s why the drill cadence matters. The one-time-project version of this decays into fiction within a year. The standing-practice version stays alive because it gets touched regularly.


    The one thing I’d want you to walk away with

    One sentence. If you only remember one, let it be this:

    Every tool you enter, you will someday leave — and the cheapest time to plan the leaving is at entry.

    If that sentence changes how you approach the next tool you consider adopting, it changed the shape of your stack. Not in a dramatic way. In the small, compounding way that good hygiene always works.

    The operators I know who have survived the roughest exits — the breaches, the vendor shutdowns, the personal emergencies — all share one thing in common. They planned the exit before they needed it. Not because they expected the catastrophe. Because they understood that the exit was coming, eventually, in some form, for every single thing they’d built, and that planning it in calm was radically cheaper than planning it in crisis.

    The exit is coming. For every tool. For every account. For every service. For every credential. Eventually.

    Plan it now.


    FAQ

    What’s the most important piece of this protocol if I only have an hour to spend?

    Write the one-page Terminal scenario letter. Name your primary steward. Name your operational steward. Put the password manager emergency kit in a place they can find. That one hour, invested now, is the highest-leverage thing in the entire protocol.

    I’m a solo operator with no family. Does the Terminal runbook still apply?

    Yes, and it’s more important for you than for operators with a family who would step in by default. You need an operational steward — a professional or trusted peer — who can wind things down if you can’t. Without that named person, client work will orphan in a way that creates real harm for people who depended on you.

    How often should I rotate credentials?

    Every six months at a minimum for anything load-bearing, immediately on any suspected compromise, and whenever someone with access leaves a collaboration. The Quarterly drill cadence catches stale credentials on a regular rhythm; full rotation on Semi-annual catches the long-tail.

    What about AI-specific exits — Claude, ChatGPT, Notion’s AI?

    Treat AI memory as a liability to be pruned, not an asset to be preserved. Export what’s genuinely valuable (artifacts, specific conversations you want as reference), then prune aggressively. AI memory that sits around accumulating is increasing your blast radius in every other exit scenario. The hygiene move is minimal memory, not maximum memory.

    Do I need an attorney for this?

    For the Terminal scenario specifically, yes. The letter of instruction and any trigger documents that grant authority in your absence are legal documents and should be reviewed by a professional. The rest of the protocol (exports, credential rotation, drill cadence) doesn’t need legal help.

    What about my password manager? What happens if I lose access to it?

    Every major password manager has an emergency access feature — a trusted contact who can request access to your vault after a waiting period. Configure it. It’s the single most important configuration item in the entire protocol, because the password manager is the root of recovery for everything else.

    How do I know when my export is actually complete?

    Open it on a different machine, in a different tool, and try to answer three specific questions using only the export: “What was the state of X project?”, “Who had access to Y?”, “When did Z happen?” If you can answer all three, the export is usable. If any question requires reaching back to the source system, the export is incomplete.

    What if my spouse or partner isn’t technical? Can they still be the primary steward?

    Yes. The primary steward’s job is not to operate the systems. Their job is to know where the instructions are and who to call. If you write the operational runbook clearly enough that a non-technical person can follow it to the operational steward, the division of responsibility works.


    Closing note

    The section of your digital life you haven’t written yet is the exit. Almost nobody writes it until they need it, and the moment you need it is the worst moment to write it.

    Write it now, in calm, with time to think. Don’t try to write it perfectly. A rough version that exists is infinitely better than a perfect version that doesn’t. The drill cadence will improve the rough version over years; the blank document never improves at all.

    If this article leads you to spend a single evening on a single runbook — even just the Terminal scenario, even just the one-page letter to your primary steward — it has done its job. The rest of the protocol can build from there.

    Every tool you enter, you will someday leave. Leave on purpose.


    Sources and further reading

    Related pieces from this body of work:

    On the Terminal scenario specifically, the Google Inactive Account Manager and Apple Legacy Contact features are both worth configuring today. Fifteen minutes apiece. Search your account settings for “inactive” or “legacy.”

  • Archive vs Execution Layer: The Second Brain Mistake Most Operators Make

    Archive vs Execution Layer: The Second Brain Mistake Most Operators Make

    I owe Tiago Forte a thank-you note. His book and the frame he popularized saved a lot of people — including a younger version of me — from living entirely inside their email inbox. The second brain concept was the right idea for the era it emerged in. It taught a generation of knowledge workers that their thinking deserved a system, that notes were worth taking seriously, that personal knowledge management was a discipline and not a character flaw.

    But the era changed.

    Most operators still building second brains in April 2026 are investing in the wrong thing. Not because the second brain was ever a bad idea, but because the goal it was built around — archive your knowledge so you can retrieve it later — has been quietly eclipsed by a different goal that the same operators actually need. They haven’t noticed the eclipse yet, so they’re spending evenings tagging notes and building elaborate retrieval systems while the job underneath them has shifted.

    This article is about the shift. What the second brain was for, what it isn’t for anymore, and what it should be replaced with — or rather, what it should be promoted to, because the new goal isn’t the opposite of the second brain; it’s the next version.

    I’m going to use a single distinction that has saved me more architecture mistakes than any other in the last year: archive versus execution layer. Once you can tell them apart, most of the confusion about knowledge systems resolves itself.


    What the second brain actually was (and why it worked)

    Before the critique, credit where credit is due.

    The second brain frame, as Tiago Forte articulated it starting around 2019 and formalized in his 2022 book, was a response to a specific problem. Knowledge workers were drowning in information — articles to read, books to remember, meetings to process, ideas to capture. The brain, the original one, is not great at holding all of that. Things slipped. Valuable thinking got lost. The second brain proposed a systematic external memory: capture widely, organize intentionally (the PARA method — Projects, Areas, Resources, Archives), distill progressively, express creatively.

    It worked because it named the problem correctly. For someone whose job required integrating lots of information into creative output — writers, researchers, analysts, knowledge workers — the capture-organize-distill-express loop produced real leverage. Over 25,000 people took the course. The book was a bestseller. An entire productivity-content ecosystem grew up around it. Notion became popular partly because it was a good place to build a second brain. Obsidian and Roam Research exploded for the same reason.

    I want to be unambiguous: the second brain frame was a good idea, correctly articulated, in the right moment. If you built one between 2019 and 2023 and it served you, it served you. You weren’t wrong to do it.

    You just might be wrong to still be doing it the same way in 2026.


    The thing that quietly changed

    Here’s what shifted between the era the second brain frame emerged and now.

    In 2019, the bottleneck was retrieval. If you had captured a piece of information — an article, a quote, an insight — the question was whether you could find it again when you needed it. Your system had to help the future-you pull the right thing out of the archive at the right time. Tagging mattered. Folder structure mattered. Search mattered. The whole architecture was designed to solve the retrieval bottleneck.

    In 2026, retrieval is no longer a meaningful bottleneck. Claude can read your entire workspace in seconds. Notion’s AI can search across everything you’ve ever put in the system. Semantic search finds things your tagging couldn’t. If you captured it, you can find it — without ever having to think about where you put it or what you called it.

    The retrieval problem got solved.

    So now the question is: what is the knowledge system actually for?

    If its job was to help you retrieve things, and retrieval is a solved problem, then the whole architecture of a second brain — the capture discipline, the PARA hierarchy, the progressive summarization — is solving a problem that is no longer the binding constraint on your productivity.

    The new bottleneck, the one that actually determines whether an operator ships meaningful work, is not retrieval. It’s execution. Can you actually act on what you know? Can your system not just surface information but drive action? Can the thing you built help you run the operation, not just remember it?

    That’s a different job. And a system optimized for the first job is not automatically good at the second job. In fact, it’s often actively bad at it.


    Archive vs execution layer: the distinction

    Let me name the distinction clearly, because the whole article depends on it.

    An archive is a system whose primary job is to hold information faithfully so that it can be retrieved later. Libraries are archives. Filing cabinets are archives. A well-organized Google Drive is an archive. A second brain, in its classical formulation, is an archive — a carefully indexed personal library of captured thought.

    An execution layer is a system whose primary job is to drive the work actually happening right now. It holds the state of what’s in flight, what’s decided, what’s next. It surfaces what matters for current action. It interfaces with the humans and AI teammates who are doing the work. An operations console is an execution layer. A well-designed ticketing system is an execution layer. A Notion workspace set up as a control plane (which I’ve written about elsewhere in this body of work) is an execution layer.

    Both have their place. They are not competing for the same real estate. You need some archive capability — legal records, signed contracts, historical decisions worth preserving. You need some execution layer — for the actual work in motion.

    The mistake most operators make in 2026 is treating their entire knowledge system like an archive, when their bottleneck has become execution. They pour energy into capture, organization, and retrieval. They get very little back because those activities no longer compound into leverage the way they used to. Meanwhile, their execution layer — the thing that would actually move their work forward — is underbuilt, undertooled, and starved of attention.

    The shift isn’t abandoning archiving. It’s recognizing that archiving is now the boring, solved utility layer underneath, and the real system design question is about the execution layer above it.


    Why the second brain architecture actively gets in the way

    This is the part that’s going to be uncomfortable for some readers, and I want to name it directly.

    The classical second-brain architecture doesn’t just fail to produce leverage for operators. It actively fights against what you actually need your system to do.

    Capture everything becomes capture too much. The core discipline of a second brain is wide capture — save anything that might be useful, sort it out later. In a retrieval-bound world this was fine because the downside of over-capture was only disk space. In an AI-read world, over-capture has a new cost: the AI you’ve wired into your workspace now has to reason across a corpus full of things you shouldn’t have saved. Old half-formed ideas. Articles that turned out not to matter. Drafts of thinking you would never let see daylight. Your AI teammate is seeing all of it, weighting it in responses, occasionally surfacing it in ways that are embarrassing.

    PARA optimizes for archive navigation, not current action. Projects, Areas, Resources, Archives. It’s a taxonomy for finding things. A taxonomy for doing things looks different: what’s active, what’s on deck, what’s blocked, what’s decided, what’s watching. Many people’s PARA systems silently morph into graveyards where active projects die because the structure doesn’t surface them — it files them.

    Progressive summarization trains the wrong reflex. The Forte method of progressively bolding, highlighting, and distilling notes is brilliant for a future-retrieval world. The reflex it trains — “I’ll process this later, the value is in the distillation” — is poisonous for an execution world. The value now is in doing the work, not in preparing the notes for the work.

    The system becomes the job. The most common failure mode I’ve watched play out is operators who spend more time tending their second brain than they spend on actual output. Tagging. Reorganizing. Restructuring their PARA hierarchy for the fourth time this year. The second brain becomes a hobby that feels productive because it’s complicated, but produces nothing the world actually sees. This has always been a risk of personal knowledge management, but it compounds dramatically in 2026 because the system-tending is now competing with a different, higher-leverage use of the same time: building the execution layer.

    I am not saying these failure modes are inherent to Tiago’s teaching. He’s explicit that the system should serve the work, not become the work. But the architecture makes the wrong path easier than the right one, and a lot of practitioners take it.


    What an execution layer actually looks like

    If you’ve followed the rest of my writing this month, you’ve seen pieces of it. Let me name it directly now.

    An execution layer is a workspace organized around the actual objects of your business — projects, clients, decisions, open loops, deliverables — rather than around categories of knowledge. Each object has a status, an owner, a next action, and a surface where it lives. The system exists to drive those objects forward, not to hold them for contemplation.

    A functioning execution layer has:

    A Control Center. One page you open first every working day that surfaces the live state — what’s on fire, what’s moving, what needs your call. Not a dashboard in the BI sense. A living summary updated continuously, readable in ninety seconds.

    An object-oriented database spine. Projects, Tasks, Decisions, People (external), Deliverables, Open Loops. Each one a real operational entity. Each one with a clear status taxonomy. Each one answerable to the question “what changed recently and what does that mean I should do?”

    Rhythms embedded in the system itself. A daily brief that writes itself. A weekly review that drafts itself. A triage that sorts itself. The system does the operational rhythm work so the human can do the judgment work.

    A small, deliberate archive underneath. Yes, you still need to preserve some things. Completed project records. Signed contracts. Important decisions for the historical record. But the archive is the sub-basement of the execution layer, not the whole building. You visit it occasionally. You don’t live there.

    Wired-in intelligence. Claude, Notion AI, or whatever intelligence layer you’ve chosen, reading from and writing to the execution layer so it can actually participate in the work rather than just answering questions about your notes.

    Compare that to what a classical second brain prioritizes — capture discipline, PARA hierarchy, progressive summarization — and you can see the difference immediately. The second brain is a library. The execution layer is a workshop.

    Operators need workshops, not libraries. Libraries are lovely. Workshops get things built.


    The migration path (how to change without blowing up what you have)

    If this article has landed and you’re looking at your own carefully-built second brain and realizing it’s mostly an archive, here’s how I’d approach the transition. I’ve done this in my own system, so this isn’t theoretical.

    Don’t delete anything yet. The worst move is to blow up the existing structure and rebuild from scratch. You have years of context in there. You’ll lose some of it even if you try to be careful. The right move is a layered transition, where you build the execution layer above the archive while leaving the archive intact underneath.

    Build the Control Center first. Before you touch any existing content, create the new anchor. One page. Two screens long. Links to the databases you actually work from. Live state at the top. This is the new front door to your workspace.

    Identify the active objects. What are you actually working on? Which clients, projects, deliverables, decisions? Make clean new databases for those, separate from whatever PARA folders you’ve accumulated. Move live work into those new databases. Let dead work stay in the archive where it already is.

    Install one rhythm agent. Pick the one operational rhythm that costs you the most attention — usually the morning context-gathering. Build a Custom Agent that handles it. See what it changes. Add another agent only after the first one is actually working.

    Gradually migrate what matters, archive what doesn’t. Over time, anything in your old second-brain structure that you actually reference will reveal itself by showing up in searches and references. Move those into the execution layer. Anything that doesn’t come up in a year genuinely belongs in the archive, not in your working system.

    Accept that the archive will shrink in importance over time. Not because it’s useless, but because its role changes from “primary workspace” to “occasional reference.” That’s fine. The archive was never the point. You just thought it was because the frame you were working from told you so.

    The whole transition can happen over a month of evenings. It doesn’t require a weekend rebuild. It requires a mental shift from “the system is a library” to “the system is a workshop with a small library attached.”


    What this is not

    A few clarifications before the critique side of this article leaves the wrong impression.

    I’m not saying don’t take notes. Taking notes is still valuable. Capturing thinking is still valuable. The shift isn’t away from writing things down; it’s away from treating the collection of written-down things as the system’s point.

    I’m not saying Tiago Forte was wrong. He was right for the era. He’s also shifted with the era — his AI Second Brain announcement in March 2026 is an explicit acknowledgment that the frame needs to evolve. Anyone still teaching the pure 2022 version of second-brain methodology without integrating what AI changed is the one not keeping up. Tiago himself is keeping up.

    I’m not saying archives are obsolete. Some things deserve archiving. Legal records, contracts, finished projects you might revisit, historical decisions, creative work you’ve produced. Archives are still a useful subcomponent of a functioning operator system. They just aren’t the system anymore.

    I’m not saying everyone who built a second brain made a mistake. If yours is working for you, keep it. The question is whether, if you sat down to design a knowledge system from scratch in April 2026 knowing what you now know about AI-as-teammate, you would build the same thing. My guess is most operators honestly answering that question would say no. If that’s your answer, this article is for you. If it isn’t, you can ignore me and carry on.


    The generalization: every layer eventually gets demoted

    There’s a broader pattern here worth naming because it keeps happening and most operators don’t see it coming.

    Every system that was load-bearing in one era gets demoted to a utility layer in the next. This isn’t a failure of the old system; it’s evidence that something else got built on top.

    Filing cabinets were a primary interface to knowledge work in the mid-20th century. They’re now a sub-basement of most offices. Email was a revolution in the 1990s. It’s now a backchannel for notifications from actual productivity systems. Spreadsheets were the original personal computing killer app. They’re now mostly a data-plumbing layer underneath dashboards and applications.

    The second brain is on the same arc. In 2019 it was revolutionary. In 2026 it’s becoming the quiet plumbing underneath the actual workspace. The frame that wanted it to be the whole system is going to age badly. The frame that treats archiving as a useful utility layer under something more alive is going to age well.

    The prediction that matters: five years from now, the operators who get the most leverage will be running execution layers with archives attached, not archives with execution layers grafted on. The architecture will be inverted from the second-brain orientation, and the second-brain era will look like the phase where people learned they needed a system — before the system learned what it was for.


    The one thing I want you to walk away with

    If you only remember one sentence from this article, let it be this:

    Your system’s job is to drive action, not to preserve context.

    Preserving context is a useful secondary function. The whole point of the system — the thing that justifies the time, the maintenance, the architectural decisions, the discipline — is that it helps you act. Not remember. Not retrieve. Not feel organized. Act.

    Every design decision you make about your knowledge system should be tested against that criterion. Does this help me act on what matters? If yes, keep it. If no, archive it or remove it. The discipline is ruthless about what earns its place, because everything that doesn’t earn its place is stealing attention from the thing that would.

    Most second brains I see in 2026 fail that test for most of their bulk. That’s the polite version. The honest version is that many operators have built elaborate systems that feel productive to maintain but produce nothing measurable in the world.

    The execution layer is the fix. Not as a replacement for archiving, but as the shift in orientation: from “preserve knowledge” to “drive work,” from library to workshop, from the discipline of capture to the discipline of action.

    If you take one evening this week and spend it rebuilding your workspace around that question, you will get more leverage from that evening than from a month of tagging.


    FAQ

    Is the second brain dead? No. The frame — “build a system that serves as external memory for your thinking” — is still useful. What’s changed is that the architecture Tiago Forte taught was optimized for a retrieval-bound world, and retrieval is no longer the binding constraint. The concept lives on; the implementation has evolved.

    What about Tiago’s new AI Second Brain course? It’s an honest update to the frame. Tiago announced his AI Second Brain program in March 2026 as a response to the same shift this article describes — Claude Code, agent harnesses, and AI that can actually read and act on your files. His version and mine may differ in emphasis, but we’re pointing at the same underlying change.

    Should I delete my existing second brain? No. Build the execution layer on top of it, migrate what matters, let the rest stay archived. Deleting your historical work is a loss you can’t undo. Reorienting what you focus on going forward is a gain that doesn’t require destroying what you have.

    What if I’m not an operator? What if I’m a student, writer, or creative? The archive-versus-execution-layer distinction still applies but weights differently. Students and creatives may still benefit from an archive orientation because their work actually does involve deep research and synthesis that’s retrieval-bound. Operators running businesses have a different bottleneck. Match the system to the actual bottleneck in your specific work.

    What do you use for your own execution layer? Notion, with Claude wired in via MCP, and a handful of operational agents running in the background. The specific stack is described in my earlier articles in this series; the pattern is tool-independent. Any capable workspace plus a capable AI layer can implement it.

    What about systems like Obsidian, Roam, or Logseq? All excellent archives. Less suited to the execution-layer role because they were designed around the knowledge-graph-and-retrieval use case. You can build execution layers in them, but you’re fighting the grain of the tool. Notion’s database-and-template orientation is a better fit for the operator pattern.

    Isn’t this just reinventing project management? Partially, yes. The execution layer shares DNA with project management systems. The difference is that project management systems are typically built for teams coordinating across many people, while the operator execution layer is built for one human (or a very small team) leveraged by AI. The priorities and design choices differ accordingly.

    How long does this transition take? The minimum viable version — Control Center, object-oriented databases, one rhythm agent — is a week of part-time work. The full transition from a classical second brain to a working execution layer is usually two to three months of gradual iteration. You don’t have to do it all at once.


    Closing note

    I wrote this knowing some readers will push back, and pushback on this one will be easier to dismiss than to engage with. That’s worth flagging up front.

    The easy dismissal: “You’re attacking Tiago Forte.” I’m not. I’m updating the frame he built, using tools he didn’t have access to, for problems that weren’t the binding constraint when he built it. If he’s updated his own frame — and he has — then updating mine is just keeping honest.

    The harder dismissal: “My second brain works for me.” Great. Keep it. If it actually produces leverage you can measure, the article doesn’t apply to you. If you’re being defensive because you’ve invested time in something you suspect isn’t paying rent, sit with that honestly before rejecting the argument.

    The operators I most want to reach with this piece are the ones who have a working second brain but feel a quiet sense that it isn’t quite delivering what they thought it would. That feeling is signal. It’s telling you the bottleneck has moved. The system you built was right for the problem it was solving; the problem has shifted underneath it.

    Promote the archive to a utility. Build the execution layer above. Let the system drive the work instead of holding it for review. That’s the whole move.

    Thanks for reading. If this one lands for you, the rest of this body of work goes deeper into how to actually build what I’m describing. If it doesn’t, no harm — there are plenty of places to read the traditional frame, and I’m not trying to convert anyone who’s still getting value from that version.

    The point is to have the argument out loud, because most operators haven’t heard it yet, and knowing what the argument is gives you the ability to decide for yourself.


    Sources and further reading

    Related pieces from this body of work:

  • What Notion Agents Can’t Do Yet (And When to Reach for Claude Instead)

    What Notion Agents Can’t Do Yet (And When to Reach for Claude Instead)

    I run both Notion Custom Agents and Claude every working day. I have opinions about when each one earns its place and when each one doesn’t. This article is those opinions, named clearly, with no vendor fingers on the scale.

    Most comparative writing about AI tools is written by people with an incentive to recommend one over the other — affiliate programs, platform partnerships, the writer’s own consulting practice specializing in one side. This piece doesn’t have that problem. I use both, I pay for both, and if one of them got replaced tomorrow, the pattern I run would survive with a different tool slotted into the same role. The tools are interchangeable. The judgment about which one to reach for is not.

    Here’s the honest map.


    The short version

    Use Notion Custom Agents when: the work is a recurring rhythm, the context lives in Notion, the output is a Notion page or database change, and you’re willing to spend credits on it running in the background.

    Use Claude when: the work needs real judgment, the context is complex or contested, the output is something that needs a human’s voice and review, or the workflow crosses enough systems that the agent’s world is too small.

    Those two sentences will save most operators ninety percent of the architecture mistakes I see people make. The rest of this article is specificity about why, because general rules only take you so far before you need to know what’s actually going on under the hood.


    Where Notion Custom Agents genuinely shine

    I’m going to start with the positive because anyone who only reads the critical part of a comparative article will walk away with a warped picture. Custom Agents are genuinely impressive when they fit the job.

    Recurring synthesis tasks across workspace data. The daily brief pattern I’ve written about works better in a Custom Agent than in Claude. The agent runs on schedule, reads the right pages, writes the synthesis back into the workspace, and is done. Claude can do this too, but Custom Agents do it without you remembering to prompt them. That’s the whole point of the “autonomous teammate” framing, and for rhythmic synthesis work, it genuinely delivers.

    Inbox triage. An agent watching a database with a clear decision tree — categorize incoming requests, assign a priority, route to the right owner — is a sweet-spot Custom Agent. It does the boring sort every day, flags the ones it’s unsure about, and keeps the pile from growing. Real teams are reportedly triaging at over 95% accuracy on inbound tickets with this pattern.

    Q&A over workspace knowledge. Agents that answer company policy questions in Slack or provide onboarding guidance for new hires are quietly some of the most valuable agents in production. They replace hours of repetitive answer-the-same-question work, and because the answers come from actual workspace content, the accuracy is high when the workspace is well-maintained.

    Database enrichment. An agent that watches for new rows in a database, looks up additional context, and fills in fields automatically is a beautiful fit. The agent is doing deterministic-adjacent work with just enough judgment to handle edge cases. This is exactly what Custom Agents were designed for.

    Autonomous reporting. Weekly sprint recaps, monthly OKR reports, Friday retrospectives. Reports that would otherwise require someone to sit down and write them, now drafted automatically from the workspace state.

    For these categories, Custom Agents are the right tool, and Claude is the wrong tool even though Claude would technically work. The wrong-tool-even-though-it-works framing matters because operators often default to Claude for everything, which is expensive in different ways.


    Where Notion Custom Agents break down

    Now the honest part. Custom Agents have real limits, and pretending otherwise is how operators get burned.

    1. Anything that requires serious reasoning across contested information

    Custom Agents are capable of synthesis, but the quality of their synthesis degrades when the inputs disagree with each other, when the right answer isn’t on the page, or when the task requires actually thinking through a problem rather than summarizing existing context.

    The signal that you’ve hit this limit: the agent produces an output that sounds plausible, reads well, and is subtly wrong. If you need to double-check every agent output in a category of work because you can’t trust the judgment, that category of work shouldn’t be going through an agent. Use Claude in a conversation where you can actually interrogate the reasoning.

    Specific examples where this shows up: strategic decisions, conflicting client feedback, legal or compliance-adjacent questions, anything that involves weighing tradeoffs. The agent will produce an answer. The answer will often be wrong in a specific way.

    2. Long-horizon work that needs to hold nuance across steps

    Custom Agents are designed for bounded tasks with clear inputs and clear outputs. When you try to use them for work that requires holding nuance across many steps — drafting a long document, executing a multi-stage strategic plan, navigating a complex workflow — the wheels come off.

    Part of this is architectural: agents have limited ability to carry state across runs in the way an extended Claude conversation can. Part of it is practical: the “one agent, one job” principle Notion itself recommends is a hard constraint, not a style guideline. When you try to make an agent do multiple things, you get an agent that does each of them worse than a single-purpose agent would.

    If the job you’re thinking about is genuinely one coherent thing that happens to have many steps, and the steps inform each other, it’s probably a Claude conversation, not a Custom Agent.

    3. Work that needs a specific human voice

    This one is more important than most operators realize. Agents write in a synthesized style. It’s a perfectly fine style. It’s also recognizable as a perfectly fine style, which is the problem.

    If the output is going to have your name on it — client communications, thought leadership, outbound that should sound like you — the agent’s default voice will flatten whatever was distinctive about your writing. You can push back on this with instructions, and good instructions help a lot. But the underlying truth is that Custom Agents optimize for “sounds like a competent business writer,” and competent business writing is a commodity. If you sell distinctiveness, the agent is a liability.

    Claude in a conversation, with your active voice-shaping, produces writing that can actually sound like you. Custom Agents optimize for a different thing.

    4. Anything requiring real-time web context

    Custom Agents can reach external tools via MCP, but they don’t have a general ability to browse the live web and integrate what they find into their reasoning. If the work requires recent news, real-time market data, or anything that isn’t in a known database the agent can query, the agent will either fail, hallucinate, or return stale information from whatever workspace snapshot it had.

    Claude — with web search enabled, with the ability to fetch arbitrary URLs, with research capabilities — handles this class of work dramatically better. The right architectural response: use Claude for anything with a live-web dependency, let Custom Agents handle the parts that don’t.

    5. Deep technical work

    Custom Agents can technically do technical work. They should mostly not be asked to. Writing code, debugging failures, analyzing logs, reasoning through system architecture — these live in Claude Code’s territory, not Custom Agents’ territory. The Custom Agent framework was built for operational workflows, and while it will attempt technical tasks, it attempts them at the quality of a generalist, not a specialist.

    The sign you’ve crossed this line: the agent is producing code or technical reasoning that a competent human reviewer would push back on. Move the work to Claude Code, which was built for exactly this.

    6. High-stakes writes with permanent consequences

    Agents execute. They don’t second-guess themselves. An agent configured to send emails will send emails. An agent configured to update client records will update client records. An agent configured to delete rows will delete rows.

    When the cost of the agent doing the wrong thing is high — sending a message you can’t unsend, overwriting data you can’t recover, triggering a payment you can’t reverse — the discipline is: don’t let the agent do it without human approval. Use “Always Ask” behavior. Use a draft-and-review pattern. Use anything that puts a human in the loop before the irreversible action.

    Operators who ship fast and iterate freely tend to underweight this category. The day you discover it’s been quietly overwriting the wrong database field for two weeks is the day you wish you’d built the review gate.

    7. Credit efficiency for genuinely reasoning-heavy work

    This one is practical rather than architectural. Starting May 4, 2026, Custom Agents run on Notion Credits at roughly $10 per 1,000 credits. Internal Notion data suggests Custom Agents run approximately 45–90 times per 1,000 credits for typical tasks — meaning tasks that require more steps, more tool calls, or more context cost proportionally more credits per run. That means simple recurring tasks are cheap. Complex reasoning-heavy tasks add up.

    If you’re building an agent that does heavy reasoning work many times per day, the credit cost can exceed what the same work would cost through Claude’s API directly, especially on higher-capability Claude models called directly without the Notion overhead. For high-frequency reasoning work, run the math before you commit to the agent architecture.


    Where Claude genuinely wins

    The other side of the honest comparison. Claude earns its place in categories where Custom Agents either can’t operate or operate poorly.

    Strategic thinking conversations. When you’re working through a decision, evaluating a tradeoff, or thinking through a strategy, Claude in an extended conversation is the right tool. The back-and-forth is the whole point. You can interrogate reasoning, push back on conclusions, reframe the problem mid-conversation. An agent that produces a one-shot answer, no matter how good, is the wrong shape for this kind of work.

    Drafting with voice. Writing that needs to sound like a specific person is Claude’s territory. You can load up Claude with context about your voice — past writing, tonal preferences, things to avoid — and get output that actually reads as yours. Notion Custom Agents will always produce generic-flavored writing. That’s fine for internal reports. It’s a problem for anything external.

    Code and technical work. Claude Code specifically is built for technical depth. It reads codebases, executes in a terminal, calls tools, iterates on failures. Custom Agents will flail at the same work.

    Research synthesis across live sources. Claude with web search and fetch capabilities handles “go read this, this, and this, and tell me what the current state actually is” in a way Custom Agents structurally can’t. Anything that requires reaching outside a known data universe is Claude.

    Work that crosses many systems. When a workflow needs to touch code, Notion, a database, an external API, and a human review, Claude Code with the right MCP servers connected coordinates across them better than a Custom Agent inside Notion does. The agent’s world is Notion-plus-connected-integrations. Claude’s world is wider.

    Anything requiring judgment about whether to proceed. Agents execute. Claude in a conversation can pause, check with you, and ask “should I actually do this?” That judgment layer is frequently the most important part of the workflow.


    The pattern that actually works (both, in the right places)

    The operators who get this right aren’t choosing one tool over the other. They’re running both, in specific roles, with clear handoffs.

    The pattern I run:

    Rhythmic operational work lives in Custom Agents. Morning briefs, triage, weekly reviews, database enrichment, Q&A over workspace knowledge. Things that happen repeatedly, have clear inputs, and produce workspace-shaped outputs.

    Judgment-heavy work lives in Claude conversations. Strategic decisions, drafting with voice, research, anything requiring back-and-forth. I do this work in Claude chat sessions with the Notion MCP wired in, so Claude has real context when I need it to.

    Technical work lives in Claude Code. Building scripts, managing infrastructure, debugging, writing code. Custom Agents don’t touch this.

    Handoffs are explicit. When I make a decision in Claude that needs to become operational, it lands as a task or brief in a Notion database, and from there a Custom Agent can pick it up. When a Custom Agent surfaces something that needs judgment, it creates an escalation entry that shows up on my Control Center, where I engage Claude to think through it.

    The two systems pass work back and forth through the workspace. Neither tries to do the other’s job. The seams are the Notion databases where state lives.

    This is not the vendor-shaped pattern. The vendor-shaped pattern says “Custom Agents can handle everything.” The operator-shaped pattern says “Custom Agents handle what they’re good at, and when the work exceeds their reach, another tool takes over with a clean handoff.”


    The decision tree, when you’re not sure

    For a specific piece of work, run these questions in order. Stop at the first “yes.”

    Does this task need a specific human voice, or could it be written by any competent person? If it needs your voice, reach for Claude. If it doesn’t, move on.

    Does this task require reasoning across contested or ambiguous information? If yes, Claude. If no, move on.

    Does this task need real-time web context, live external data, or information not already in a known database? If yes, Claude. If no, move on.

    Does this task involve code, system architecture, or technical depth? If yes, Claude Code. If no, move on.

    Does this task have high-stakes irreversible consequences? If yes, wrap it in a human-approval gate — either run it through Claude where the human is in the loop, or use Custom Agents with “Always Ask” behavior.

    Does this task happen repeatedly on a schedule or in response to workspace events? If yes, Custom Agent. This is the sweet spot.

    Is the output a Notion page, database row, or something that stays in the workspace? If yes, Custom Agent is usually the right call.

    Is the task bounded enough that it could be described in a couple of clear sentences? If yes, Custom Agent. If it’s sprawling, it’s probably too big for an agent.

    If you’re through the tree and still not sure, default to Claude. Claude is more expensive in money and cheaper in hidden cost than a Custom Agent running the wrong job.


    The failure modes I’ve seen

    Specific patterns that go wrong, in my observation:

    The “agent for everything” operator. Someone who just got access to Custom Agents and is building agents for tasks that don’t need agents. The agents mostly work. The ones that mostly work waste credits on tasks a template or a simple automation would handle. The ones that partially work produce quiet low-grade mistakes that accumulate.

    The “Claude for everything” operator. The inverse. Someone who got comfortable with Claude and hasn’t made the leap to letting agents handle the rhythmic work. They’re paying the context-loss tax every morning, doing the triage manually, writing every brief from scratch. Claude is too expensive a tool — in attention, if not dollars — to run routine work through.

    The operator who built one giant agent. Custom Agents are meant to be narrow. Someone violates the “one agent, one job” principle by building an agent that does inbox triage and database updates and weekly reports and client communications. The agent becomes hard to debug, expensive to run, and unreliable across its many hats. The fix is almost always breaking it into three or four single-purpose agents.

    The operator who didn’t build review gates. An agent sending emails without human approval. An agent deleting rows based on inferred criteria. An agent updating client-facing pages from an unchecked data source. The cost of the first real mistake exceeds the cost of the review gate that would have prevented it, every time.

    The operator who never checked credit consumption. Custom Agents consume credits based on model, steps, and context size. An operator who built ten agents and never looked at the dashboard ends up surprised when the monthly bill is much higher than expected. The fix is easy — Notion ships a credits dashboard — but it has to actually get checked.


    The timing honest note

    A piece of this article that ages. These comparisons are true in April 2026. Custom Agents are new enough that the feature set will expand significantly over the next year. Claude is evolving rapidly. The specific gaps I’ve named may close; new gaps may open in different directions.

    What won’t change is the pattern: some work wants a specialized tool, some work wants a general-purpose one. Some work is rhythmic, some is judgment-driven. Some work lives inside a workspace, some crosses systems. The vocabulary for when to use which tool will evolve; the underlying truth that different shapes of work deserve different tools will not.

    If you’re reading this in 2027 and Custom Agents have shipped fifteen new capabilities, the specific “can’t do” list will be shorter. The decision tree at the top of this article will still work. That’s the part worth holding onto.


    What I’m not saying

    A few clarifications because I want to be clear about what this article is and isn’t.

    I’m not saying Custom Agents are bad. They’re genuinely good at what they’re good at. They’re saving me hours per week on work I used to do manually.

    I’m not saying Claude is strictly better. Claude is more capable at a broader set of tasks, but it also costs more, requires active operator engagement, and can’t sit in the background running overnight rhythms the way Custom Agents can.

    I’m not saying there’s one right answer for every operator. Different operators with different businesses and different workflows will land on different splits. The decision tree helps, but it’s a starting point, not a conclusion.

    I’m not saying this is permanent. Tool landscapes change fast. Six months from now there may be categories where Custom Agents beat Claude that don’t exist today, and vice versa. What matters is developing the habit of asking “which tool is this work actually shaped for?” instead of defaulting to whichever one you learned first.


    The one thing I’d want you to walk away with

    If you read nothing else in this article, this is the sentence I’d want in your head:

    Rhythmic operational work wants an agent; judgment-heavy work wants a conversation.

    That distinction — rhythm versus judgment — cuts through almost every architecture question you’ll have when deciding what to route where. It’s not the only dimension that matters, but it’s the one that settles the most decisions correctly.

    Work that happens on a schedule or in response to an event, with bounded inputs and clear outputs? That’s rhythm. Build a Custom Agent.

    Work that requires thinking through tradeoffs, integrating disparate information, or producing output with specific voice and judgment? That’s a conversation. Engage Claude.

    Get that right for most of your workflows and the rest of the architecture tends to sort itself out.


    FAQ

    Can’t Custom Agents do everything Claude can do, just inside Notion? No. Custom Agents are optimized for bounded, rhythmic, workspace-shaped tasks. They can technically attempt work that requires deep reasoning, specific voice, or live external context, but the results degrade in predictable ways. Claude — in a conversation or in Claude Code — handles those categories better.

    Should I just use Claude for everything then? No. Rhythmic operational work — morning briefs, triage, weekly reports, database enrichment — is genuinely better in Custom Agents than in Claude, because the “autonomous teammate running while you sleep” property matters. The right answer is running both, in their respective sweet spots.

    What’s the cost comparison? Starting May 4, 2026, Custom Agents cost roughly $10 per 1,000 Notion Credits. Internal Notion data suggests agents run approximately 45–90 times per 1,000 credits depending on task complexity. Claude’s subscription pricing is flat. For high-frequency simple tasks, Custom Agents are usually cheaper. For heavy reasoning work done many times per day, running Claude directly can be more cost-efficient.

    What about Notion Agent (the personal one) versus Claude? Notion Agent is Notion’s on-demand personal AI — you prompt it, it responds. It’s fine for in-workspace tasks where you need AI help with content you’re already looking at. For deeper reasoning, complex drafting, or cross-tool work, Claude is more capable. Notion Agent is a good ambient utility; Claude is a general-purpose intelligence layer.

    Which should I learn first if I’m new to both? Claude. Learn to think with an AI as a thinking partner before you try to build autonomous agents. Once you understand what AI can and can’t do in a conversation, the design decisions for Custom Agents become much clearer. Jumping to Custom Agents without the Claude foundation is how operators end up with agents that don’t work as expected.

    Can Custom Agents use Claude models? Yes. Custom Agents let you pick the AI model they run on. Claude Sonnet and Claude Opus are both available, along with GPT-5 and various other models. This means the underlying intelligence of a Custom Agent can be Claude — you’re choosing between Claude-as-conversation (claude.ai, Claude Desktop, Claude Code) and Claude-as-embedded-agent (Custom Agent running Claude). Different interfaces, same underlying model in that case.

    What if I want Claude to work autonomously on a schedule like Custom Agents do? Possible, but requires more work. Claude Code can be scripted; you can run it on a cron job; you can set up headless workflows. But the “out of the box autonomous teammate” experience is Notion’s current strength, not Anthropic’s. If you want autonomous-background-work without building your own infrastructure, Custom Agents are easier.

    How do I decide for my specific situation? Run the decision tree in the article. If you’re still unsure, default to Claude — it’s the more general-purpose tool, and the cost of using the wrong tool for judgment-heavy work is higher than the cost of using the wrong tool for rhythmic work. You can always migrate a recurring workflow to a Custom Agent once you understand the shape.


    Closing note

    The honest comparison isn’t one tool versus the other. It’s understanding that different shapes of work want different shapes of tool, and that most operators lose more time to the mismatch than to any individual tool’s limitations.

    Custom Agents are good at being Custom Agents. Claude is good at being Claude. Neither is good at being the other. Use both, in the places each belongs, with clean handoffs between them, and the stack hums.

    Skip the vendor narratives. Read your own workflows. Route each piece to the tool it’s actually shaped for. That’s the whole game.


    Sources and further reading

    Related Tygart Media pieces:

  • The Agency Stack in 2026: Notion + Claude + One Human

    The Agency Stack in 2026: Notion + Claude + One Human

    I’m going to describe the stack I actually run, and then I’m going to tell you honestly whether you should copy it.

    Most writing about “AI agencies” in April 2026 is either pitch deck vapor or hedged-everything consultant speak — pieces that tell you “AI is transforming agencies” without telling you which tools, which workflows, which tradeoffs. This article is the opposite. I’m going to name specifics. I’m going to say what’s working. I’m going to say what isn’t. I’m going to skip the part where I pretend this is a solved problem, because it isn’t, and pretending is how operators who listened to the pitch deck end up eighteen months into a rebuild.

    The stack that follows is what a real, paying-bills agency runs to manage dozens of active properties, real client relationships, and a content production operation that ships every day — with one human in the operator chair. It is not hypothetical. It is also not recommended for everyone, which is the part most of these articles leave out.

    Here’s the real version. You can decide whether it’s for you when we get to the bottom.


    The one-line version of the stack

    Notion is the control plane. Claude is the intelligence layer. A handful of operational services run the work. One human makes the calls.

    That’s it. That’s the whole stack at the summary level. Everything that follows is specificity about what each of those pieces does, why it’s there, and what happens when you try to run a real business through it.

    The four pieces are load-bearing in different ways. Notion holds the state of the business — what’s happening, what’s decided, what’s next. Claude provides the judgment and the synthesis when judgment is needed. The operational services (publishers, research tools, deployment pipelines) do the deterministic work that judgment shouldn’t be wasted on. The human reads, decides, approves, and occasionally gets out of the way.

    Fifteen years ago the same agency would have needed forty people. Ten years ago it would have needed twenty. Five years ago it would have needed eight. In April 2026 it needs one human plus the stack. That’s the thesis. The question is whether you can actually run it that way.


    What “AI-native” actually means in this context

    The phrase “AI-native” has been worn out enough that I need to be specific about what I mean.

    AI-native doesn’t mean “uses AI tools.” Every agency uses AI tools. Every freelancer uses AI tools. That bar is on the floor.

    AI-native means the operating model of the business assumes AI is a teammate, not a productivity tool. AI is in the loop on strategic thinking. AI is reading the state of the workspace and synthesizing it. AI is drafting, reviewing, triaging, and sometimes deciding — with human oversight, but as a continuous participant, not an occasional assistant you turn to when you get stuck.

    The practical difference: an agency that uses AI tools works the way agencies have always worked, but with ChatGPT open in a tab. An AI-native agency has rebuilt its workflows around the assumption that there’s a persistent intelligence layer in the substrate of the business.

    The stack below is what the second version looks like when you commit to it.


    The control plane: Notion

    Notion is where I live during the working day. Not where I put things when I’m done with them — where I actually do the work.

    The workspace is organized around the Control Center pattern I’ve written about before. A single root page that surfaces the live state of the business: what’s on fire today, what’s progressing, what’s waiting on me, what the week’s focus is. Under it sits a database spine that maps to the actual operational objects — properties, clients, projects, briefs, drafts, published work, decisions, open loops. Each database answers a specific question someone running the business would ask regularly.

    Every meaningful page in the workspace has a small JSON metadata block at the top — page type, status, summary, last updated. That metadata block is for the AI, not for me. It lets Claude read the state of a page in a hundred tokens instead of three thousand. Across a workspace of thousands of pages, the compounding context savings are enormous, and it changes what Claude can realistically see in a session.

    The workspace is sharded deliberately. The master context index lives as a small router page that points to larger domain-specific shards. When Claude needs to reason about a specific area of the business, it fetches the shard for that area. When it needs the whole picture, it fetches the router. This is not a product feature anyone has written about — it’s a pattern I arrived at after the main index page got too large to fit into Claude’s context window without truncation. It works. It’s probably what a lot of operators will end up doing.

    What Notion is great at: holding operational state, being legible to both humans and AI, letting you traverse the business by asking questions of the workspace rather than navigating folders, integrating cleanly with Claude via MCP, running background rhythms through Custom Agents.

    What Notion is not great at: being a database in the performance sense (anything heavy goes somewhere else), being the source of truth for code (version control is), being the source of truth for financial transactions (a real accounting system is), being reliable as the only source for anything mission-critical (it has an outage SLA, not an uptime guarantee).

    The rule I follow: Notion holds the operating company. It does not hold the substrate the operating company depends on. That distinction is what keeps the pattern stable.


    The intelligence layer: Claude

    Claude is the AI I actually run the business with. Not because Claude is strictly better than the alternatives at every task — at this point in 2026 the frontier models are all highly capable — but because Claude’s design posture matches what an operator actually needs.

    Specifically: Claude is thoughtful about uncertainty, tells me when it doesn’t know, asks for clarification instead of fabricating, and has a deep integration with Notion via MCP that makes the workspace-and-AI pattern actually work. Those qualities are worth more to me than any single-task benchmark. An AI that sometimes gets things wrong but tells me when it’s uncertain is far more useful than an AI that confidently hallucinates.

    The intelligence layer shows up in three configurations:

    Chat Claude — what I use for strategic thinking, drafting, review, and synthesis. A conversation on claude.ai or the desktop app with the Notion MCP wired in, so Claude can reach into the workspace to ground its answers in real context. This is where the high-judgment work happens. When I’m making a decision, I work through it in a Claude conversation before I commit to it.

    Claude Code — the terminal-based version that lives at the intersection of code and agent. This is where the more technical work happens — building publishers, writing scripts, managing infrastructure, executing multi-step workflows that touch multiple systems. Claude Code reads my codebase, reaches into Notion when it needs to, calls external services through MCP, and writes back run reports.

    Notion’s in-workspace AI (Custom Agents and Notion Agent) — the on-demand and autonomous agents that live inside Notion itself. These handle the rhythms: the daily brief that’s written before I wake up, the triage agent that sorts whatever lands in the inbox, the weekly review that gets drafted on Friday. I didn’t build these to be clever. I built them because I was doing the same small synthesis tasks over and over, and Custom Agents let me stop.

    Three configurations, three different jobs. Each one’s strengths map to a different kind of work. Together they cover the whole territory.

    What Claude is great at: synthesis across real context, drafting with judgment, reasoning through decisions, catching inconsistencies in my thinking, executing defined workflows with honest failure modes.

    What Claude is not great at: being the last line of defense on anything (always have a human gate), handling workflows where one error compounds (use deterministic tools for those), long-horizon autonomy without oversight (agents drift, supervise accordingly), making decisions that require context it doesn’t have access to.

    The mental model I use: Claude is a thoughtful senior teammate who happens to be infinitely patient and always awake. That framing gets the relationship right. Over-rely on it and you get hurt. Under-rely on it and you’ve hired a senior teammate and asked them to run errands.


    The operational services: the things that do the work

    The third layer is the part most agency-AI writeups skip, because it’s unglamorous. It’s the set of operational services that do the actual deterministic work. Publishing. Research. Deployment. Monitoring. The stuff that shouldn’t require judgment once you’ve set it up correctly.

    I’m going to describe the shape without naming specific tools, because the shape is what’s durable and the specific tools will change.

    Publishers — services that take content prepared upstream and push it to the properties where it needs to live. WordPress for editorial content, social media scheduling for distribution, email tools for outbound. The publisher’s job is to execute reliably and log honestly. When it fails, it fails loudly enough that I notice.

    Research infrastructure — services that pull structured data about keywords, competitors, search volumes, backlink profiles, and so on. This is where AI-native agencies diverge most sharply from traditional ones. Traditional agencies do research manually. AI-native agencies run research as a pipeline: the structured data comes in, gets processed, and lands in the workspace as briefs and intelligence reports that the human and the AI both read.

    Background pipelines — the scheduled services that keep the workspace fresh. New briefs get generated. Stale content gets flagged. Traffic data gets ingested. The kinds of things that an agency would traditionally ask a human to do on a weekly rhythm, running autonomously in the background.

    Deployment and monitoring — how the technical side ships. Version control holds the source of truth. Deployments run on triggers. When something breaks, it breaks to a channel I actually read.

    The principle that holds all of this together: deterministic work belongs in deterministic systems. Don’t use an AI agent to do something a script can do. An AI agent adds judgment, which is valuable when you need judgment, and costly when you don’t. The operational services do the work that has a right answer every time. The AI handles the work that requires judgment.

    Most agency-AI failures I’ve watched happen are cases where someone tried to use an AI agent for the deterministic work. The agent mostly succeeds, occasionally hallucinates, and introduces a class of silent failure that didn’t exist in the deterministic version. It feels like you’re being clever. You’re introducing unreliability.


    The one human in the chair

    This is the part the vendor writeups never include, and it’s the most important piece.

    There is one human in the operator chair. That human is non-optional. Every workflow, every agent, every pipeline eventually terminates at a human decision or a human review gate. The AI stack does not run the business. The AI stack is a lever that makes one human capable of running what used to take many.

    What the human does in this configuration is different from what they would have done in a traditional agency. The human is not writing every post. The human is not doing every bit of research. The human is not executing every workflow. The human is:

    Setting the posture. What are we working on this week? What’s the priority? What’s the theme? The AI is exceptional at executing against clarity. It is not exceptional at deciding what to be clear about.

    Reading the synthesis. The AI surfaces what matters. The human decides what to do about it. Every morning brief, every weekly review, every escalation flags lands in front of the human, who makes the call.

    Making the judgment calls. When a client needs a difficult conversation. When a strategy needs to change. When something the AI suggested is actually wrong. These are the moments the AI can’t be left alone with. The operator role is increasingly concentrated around exactly these moments.

    Holding the relationships. Clients don’t want to talk to an AI. They want to talk to a human who happens to be very well-supported by AI. The difference matters enormously in trust, tone, and staying power of the engagement.

    Maintaining the stack itself. The stack doesn’t maintain itself. Every week there are small adjustments, small rewirings, small improvements. The operator is also the architect of the operating company, and the architecture is a living thing.

    A person who thought they were buying “AI that runs my agency for me” is going to be disappointed. A person who understood they were buying “a lever that makes them ten times more effective at the parts of agency work that actually matter” is going to be delighted. The difference is what you think you’re getting.


    The daily rhythm (what it actually looks like)

    Let me describe a real working day in this stack, because the abstract description doesn’t convey what using it feels like.

    Morning. I open Notion. The Morning Brief Agent ran overnight; the top of today’s Daily page already has a three-paragraph synthesis of the state of the business, pulled from the active projects, the task database, yesterday’s run reports, and the overnight changes. I read it in ninety seconds. I know what’s on fire, what’s progressing, what’s waiting on me. The context tax that used to cost me the first hour of every day is already paid.

    Morning block. I work through the highest-leverage thing on the day’s priority list. If it’s strategic, I work through it in a Claude conversation with the Notion workspace wired in, because grounding the AI in real context produces dramatically better thinking than working in isolation. If it’s technical, I work in Claude Code, because the terminal version handles multi-step technical work better. Either way, I’m working with the AI as a thinking partner, not a tool I reach for occasionally.

    Mid-day. The triage agent has processed whatever landed in the inbox. I scan its decisions, override the ones I disagree with, and dispatch anything important into its real database. The escalation agent has flagged the three things that need my attention today. I make the calls. These are the moments the stack needs a human for — no amount of clever configuration replaces them.

    Afternoon block. Content operations. Research intelligence lands as structured data in the workspace. Briefs get drafted. I review them. Approved briefs flow to the publishing pipeline. The pipeline runs, logs back to the workspace, and I get notified of anything that failed. I don’t write every post. I write the ones where my voice specifically matters, and I review the rest. The ratio is maybe one in ten that I write from scratch these days.

    Evening. Five minutes of close. Anything that didn’t get done gets re-dated. Tomorrow’s priority list pre-stages. I close Notion. The overnight agents will handle the rhythms while I sleep.

    That’s the day. It is dramatically different from running a traditional agency, and dramatically more sustainable. The cognitive load is substantially lower even while the operational throughput is substantially higher. That’s the whole promise of the pattern, and it’s the part that’s real.


    What this stack actually costs (and doesn’t)

    The direct tool costs for the stack in April 2026, at the level I run it:

    • Notion Business plan with AI add-on
    • Claude subscription (Max tier for the agent budget)
    • A cloud provider account for the operational services (running pennies to small dollars per day at my volume)
    • A handful of research and analysis tool subscriptions
    • Domain, email, and the usual small-business infrastructure

    Total monthly direct tool cost is the equivalent of what a traditional agency would spend on a single junior employee’s salary for one week. The leverage ratio is extreme, and it will get more extreme.

    What it costs that isn’t money:

    • Setup time. Weeks to stand up the initial version, months to iterate it into something that runs smoothly. This is not a weekend project.
    • Ongoing attention to the stack itself. Maybe ten percent of my week is spent on the operating company rather than on client work. That ratio is load-bearing; if I let it go below that, the stack rots.
    • Discipline about not adding cleverness. Every new tool, every new agent, every new integration is a tax on the coherence of the system. Most weeks I’m resisting the urge to add something, not looking for something to add.
    • Loneliness of the role. One-human agencies are lonely. You don’t have a team meeting. You don’t have a coffee conversation with a coworker. The stack is not a substitute for colleagues. This is the part nobody writes about and it’s genuinely significant.

    What this stack is not good for

    If I’m being honest about who should not run this pattern, it includes:

    Agencies that want to scale headcount. This stack is designed to make one human capable of more. It’s not designed to coordinate ten humans. A ten-person agency on this stack would have chaos problems I haven’t solved.

    Businesses where the work is primarily relational. Sales-heavy businesses, high-touch consulting, therapy practices. The stack is strong at operational and production work. It is weak at anything where the work is fundamentally “I am present with this other person.”

    Anyone uncomfortable with AI making meaningful decisions. The stack assumes you’re willing to let AI make decisions that have real consequences — triage, synthesis, drafting under your name. If that crosses your line philosophically, don’t force it. The stack won’t be fun for you.

    People looking for a plug-and-play system. This is a living architecture. It requires ongoing maintenance. It never stops being built. If you want something that works out of the box and stays working, buy software; don’t build an operating company.

    Early-stage businesses without a clear shape yet. The stack rewards clarity about what your business is. If you’re still figuring that out, the stack will accelerate whatever direction you’re going — which is great if the direction is right and brutal if it isn’t. Figure out the direction first, then build the stack.


    Who this stack is good for

    The operators I’ve seen get the most out of this pattern share a specific profile:

    • Running businesses with high operational complexity but small team size. Multi-property content operations, advisory practices, specialist agencies. The kind of business where one capable person with leverage beats a team without it.
    • Comfortable with systems thinking. The stack rewards people who think in terms of flows, interfaces, and substrates. If that vocabulary feels alien, the stack will feel alien.
    • Honest about what they’re good at and what they aren’t. The stack amplifies the operator. If the operator is strong at strategy and weak at execution, the stack handles the execution. If the operator is strong at execution and weak at strategy, the stack does not magically produce strategy. Know which version you are.
    • Willing to maintain the architecture. The stack is a long commitment to the operating company, not a one-time setup. Operators who enjoy tending the system do well. Operators who resent tending the system should not run it.

    If you recognize yourself in the good-fit list and not the bad-fit list, this pattern is probably worth the investment. If you’re on the fence, it probably isn’t yet — come back when the decision is clearer.


    The part I want to be brave about

    Here’s the part this article is supposed to be honest about.

    This pattern works for me. It might not work for you. The vendor-shaped narrative says every business should be AI-native, every agency should be running this stack, every operator should be ten times leveraged. That narrative is wrong. It’s wrong in the boring, everyday way that industry narratives are always wrong: it oversells, it under-discloses the costs, and it creates an expectation gap that a lot of operators are going to run into eighteen months from now.

    The accurate narrative is this: for a specific kind of operator running a specific kind of business, this stack produces a kind of leverage that was not previously available. For everyone else, it’s a distraction from what they should actually be doing, which is the hard work of their specific business with the tools that fit their specific situation.

    I am describing what I run because I think honest examples are more useful than vague generalities. I am not recommending you run it. I am recommending you look at your actual business, your actual operating constraints, and your actual relationship with AI tools, and decide whether a version of this pattern — adapted, simplified, or rejected — makes sense for you.

    There’s a version of this article that promises that if you copy my stack, you’ll get my outcomes. That article is lying to you. The outcomes come from matching the stack to the business, not from the stack itself.

    If you read this and it resonates, take the pieces that apply. If you read this and it doesn’t, take what you learned about what’s possible and leave the rest. Either response is correct.


    The five things I’d tell someone thinking about building something like this

    Start with the Control Center, not the agents. The Control Center is the anchor everything else builds against. If you build agents before you have the Control Center, the agents have nothing to write to. Build the workspace shape first. The rest follows.

    Resist the urge to add complexity. The operators who succeed with this pattern run simpler versions than they could. The operators who fail run more elaborate versions than they need. Every piece of the stack should be earning its place every week.

    Write everything down as you go. The operating company is a living architecture. Six months from now you will have forgotten why you made a specific configuration choice. Document the choices in the workspace as you make them. Future-you will thank present-you.

    Don’t over-trust the AI. It’s a teammate, not an oracle. It’s wrong sometimes. It’s confident when it shouldn’t be sometimes. Build review gates. Assume failure. The stack is resilient when you don’t assume otherwise.

    Accept that you are building an operating company, not deploying software. This is a long game. It doesn’t work in the first week. It starts working in the second month. It starts compounding in the sixth month. If you’re not willing to tend it for that long, don’t start.


    A closing observation

    I’ve been running variations of this stack for long enough to have opinions that don’t match what I thought I believed when I started. The biggest surprise has been how much of the work is operational hygiene rather than AI cleverness. Building an agent was the easy part. Running an agency on the operating company pattern has mostly been a discipline problem — staying consistent about metadata, about documentation, about review gates, about when to let the AI decide and when to intervene.

    The AI is not the interesting part anymore. The interesting part is the operating model the AI makes possible. That’s the part this article has tried to describe honestly, and that’s the part worth thinking about if you’re considering something similar.

    If you do build a version of this, I’d genuinely like to hear how it turns out. The frontier here is being figured out by operators sharing what works and doesn’t, and every honest report makes the next person’s build better. This is my report. I hope it helps.


    FAQ

    Can I run this stack solo? Yes. The stack is explicitly designed for solo operators or very small teams. One-human operation is the whole point. Multi-person teams work too but introduce coordination complexity the pattern doesn’t directly solve.

    How long does it take to build? The minimum viable version — Control Center, a handful of databases, one Custom Agent, Claude wired in — is a week of part-time work. The version that actually earns its place takes two to three months of iteration. It never stops getting built; it compounds over time.

    Do I need to know how to code? For the minimum viable version, no. Notion + Claude + Notion Custom Agents gets you a long way without writing code. For the operational services layer, some technical comfort is needed or you’ll need a technical collaborator. Claude Code dramatically lowers the bar here.

    What if Notion gets replaced by a competitor? The pattern survives. The Control Center, the database spine, the metadata discipline, the workspace-as-control-plane posture — all of those port to any capable workspace tool. If something displaces Notion in 2027, the migration is real work but the operating model is durable. The durable asset is the pattern, not the specific tool.

    What if Claude gets replaced by a competitor? Also fine. The pattern assumes there’s an intelligence layer wired into the workspace; Claude is the current implementation of that layer. If another frontier model becomes more suitable, swap it. The MCP standard that connects everything is model-agnostic. This is deliberate.

    Can I use ChatGPT or another AI instead of Claude? Mostly yes. The MCP-to-Notion pattern works with any AI that supports MCP, including ChatGPT, Cursor, and others. I use Claude for the reasons described above, but the stack pattern is compatible with other frontier models. Don’t let tool preferences get in the way of the architecture.

    How much does this cost to run? The tool subscription stack costs roughly what one junior employee’s weekly salary would cost per month, total. The non-monetary costs (setup time, maintenance attention, lifestyle tradeoffs of solo operation) are more significant and worth thinking about before committing.

    Is this sustainable for a growing business? Yes, up to a point. The pattern scales smoothly to a certain operational volume per human. Beyond that, you need more humans, and coordinating multiple humans on this stack introduces problems that the solo version doesn’t have. Most operators hit the natural ceiling before they hit the growth limit.


    Sources and further reading

    Related reading from the broader ecosystem:

  • The Soda Machine Thesis: A Mental Model for Running an AI-Native Business on Notion

    The Soda Machine Thesis: A Mental Model for Running an AI-Native Business on Notion

    The hardest part of running an AI-native business on Notion in 2026 isn’t the tools. The tools are fine. The tools ship regularly and they work. The hard part is that the vocabulary hasn’t caught up with the reality, and when the vocabulary is wrong, your design choices get wrong too.

    Here’s what I mean. When I started seriously composing Workers, Agents, and Triggers in Notion, I found I was making the same kinds of mistakes over and over. Building a worker for something an agent could have handled with good instructions. Attaching five tools to an agent that only needed two. Setting up a scheduled trigger for something that should have fired on an event. After the third or fourth time, I realized the mistakes had a common source: I didn’t have a mental model for when to reach for which piece.

    Notion doesn’t give you one. The documentation is accurate but it’s a list of capabilities. Vendor-shaped — here is what Custom Agents can do, here is what Workers do, here are your trigger types. All true. All useless for the question I actually had, which was given a job I want done, which piece do I build?

    So I made a mental model. It’s imperfect and it’s mine, but it has survived a few months of real use and it has saved me from a dozen architecture mistakes I would have otherwise made. This article is the model.

    I call it the Soda Machine Thesis. It might sound silly. It works.


    The core analogy

    Workers are syrups. Agents are soda fountain machines. Triggers are how the machine dispenses.

    When someone asks for a custom soda fountain — a Custom Agent — three decisions get made, in order:

    1. Which syrups (workers and tools) load into this machine? What capabilities does it need access to? What external services does it need to reach? What deterministic operations does it need to perform?
    1. How is the machine programmed? What are its instructions? What’s its job description? How does it think about what it’s doing? (This is the part where agents diverge most — two machines with identical syrups behave completely differently based on instructions.)
    1. How does it dispense? Does it pour when someone presses a button (manual trigger)? Does it pour on a schedule (timer)? Does it pour when the environment changes — a page gets created, a status flips, a comment gets added (event sensor)?

    That’s the whole model. Three questions, in that order. If you can answer all three cleanly, you have a working agent. If you can’t answer one of them, you have an agent that is going to produce noise and frustrate you.

    I have watched this analogy clarify a dozen conversations that were going nowhere. “I want an agent that…” — and then I ask the three questions, and halfway through the answers it becomes obvious what the person actually wants is a simpler thing. Sometimes they don’t need an agent at all, they need a template with a database automation. Sometimes they need a worker, not an agent. Sometimes they need an agent with zero workers and better instructions.

    The analogy does real work. That’s the whole point of a mental model.


    Where the analogy holds

    The map is cleaner than you’d expect.

    Workers are syrups. Stateless, parameterized, reusable. The same worker — fetch-url, summarize, post-to-channel, whatever — can power a dozen agents. You build it once, you use it everywhere. A worker that sends an email works the same way whether it’s being called by a triage agent, a brief-writer, or a customer-response agent. That’s what syrup means: the ingredient doesn’t care which drink it’s going into.

    Agents are machines. They select, sequence, and orchestrate. An agent knows when to reach for which worker. An agent knows what the job is and reasons about how to do it. An agent can read a database, synthesize what it finds, reach for a tool to do a specific deterministic step, synthesize again, and return a result. An agent is a little piece of judgment on top of a set of capabilities.

    Triggers are how the machine dispenses. This is the cleanest part of the map because Notion’s own trigger types map almost 1:1 onto the analogy:

    • Button press or @mention → manual dispatch (“I’m pressing the button for a Coke”)
    • Schedule → timer (“pour me a drink at 7am every day”)
    • Database event → sensor (“someone just put a cup under the dispenser; fill it”)

    You don’t need to memorize trigger type names. You need to ask “how should this machine know it’s time to pour?” Once you know the answer, the trigger type follows.


    Where the analogy leaks (and what to do about it)

    No analogy is perfect. This one has four honest leaks that are worth knowing before you rely on the model.

    1. Agents have native hands, not just syrups

    A Custom Agent can read pages, search the workspace, write to databases, and send notifications without a single worker attached. Workers are specialty syrups for the things the base machine can’t do natively — external APIs, deterministic writes to strict database schemas, code execution, anything requiring exact outputs every time.

    This means not every agent needs workers. In fact, my highest-leverage agents often have zero workers. They use the base machine’s native capabilities, combined with strong instructions, to do the job.

    The practical consequence: don’t reach for a worker reflexively. Start by asking what the agent can do with just its native hands and good instructions. Only add workers when the agent genuinely needs capability it doesn’t have.

    2. Machine programming matters as much as syrup selection

    The instructions you give an agent — its system prompt, its job description, its operating rules — are doing as much work as the workers you attach. Two agents with identical workers will behave completely differently based on how they’re instructed.

    People tend to under-invest here. They attach five workers, write three sentences of instruction, and wonder why the agent is flaky. The fix is not more workers. The fix is writing instructions the way you’d write onboarding docs for a new employee — specific, scoped, honest about edge cases, clear about what the agent should do when it’s uncertain.

    My rule: if I’m about to attach a worker because the agent “keeps getting it wrong,” I first check whether better instructions would fix the problem. Nine times out of ten they would.

    3. Workers aren’t a single thing

    This is the leak that surprised me when I learned it. There are actually three kinds of worker, and they behave differently:

    • Tools — on-demand capabilities. The classic syrup. An agent calls them when it needs them. Example: a worker that fetches a URL and returns the text.
    • Syncs — background data pipelines that run on a schedule and write to a database. Not dispensed by an agent. These are more like an ice maker — they run on the infrastructure, filling the building up, and the machines use what the ice maker produces.
    • Automations — event handlers that fire when something happens in the workspace. Like a building’s fire suppression — nobody’s pressing a button; the environment triggers it.

    This matters because syncs and automations don’t need an agent to dispatch them. They run autonomously. If you’re building something that feeds a database on a schedule, that’s a sync, not a tool, and it doesn’t need an agent. If you’re building something that reacts to a page being updated, that’s an automation, not a tool.

    Getting this wrong is one of the most common architecture mistakes. People build an agent to dispatch a sync because they think everything has to flow through an agent. It doesn’t. Let the infrastructure do the infrastructure’s job.

    4. Determinism vs. judgment is the design axis

    The thing the soda analogy doesn’t capture well is that workers and agents are not just interchangeable building blocks. They serve fundamentally different purposes:

    • Workers shine when you want deterministic behavior. Same input, same output, every time. Schema-strict writes. External API calls where the shape of the request and response are fixed.
    • Agents shine when you want judgment, composition, and natural-language reasoning. Variable inputs. Fuzzy requirements. Synthesis across multiple sources.

    The red flag: building a worker for something an agent could do reliably with good instructions. You’re over-engineering.

    The green flag: an agent keeps being flaky at a specific operation. Harden that operation into a worker. Now the agent handles the judgment part, and the worker handles the reliable part.


    The “should this be a worker?” test

    When I’m trying to decide whether to build a worker or let an agent handle something, I run a five-point checklist. If two or more are true, build a worker. If fewer than two are true, stay manual or solve it with agent instructions.

    1. You’ve done the manual thing three or more times. The third time is the signal. First time is discovery, second time is coincidence, third time is a pattern worth capturing.
    2. The steps are stable. If you’re still figuring out how to do the thing, don’t codify it yet. You’ll codify the wrong version and have to rewrite.
    3. You need deterministic schema compliance. Writes that must fit a database schema exactly are worker territory. Agents can write to databases, but if the schema has strict requirements, a worker is more reliable.
    4. You’re calling an external service Notion can’t reach natively. This is often the clearest signal. If it’s outside Notion and needs to be reached programmatically, it’s a worker.
    5. The judgment required is minimal or already encoded in rules. If the decisions are simple enough to express as code, a worker is fine. If the decisions need real reasoning, it’s agent territory.

    This test is not a strict algorithm. It’s a gut-check that catches the most common over-engineering mistakes before they happen.


    The roles matter more than the technology

    Here’s the extension of the analogy that actually made the whole thing click for me.

    Every construction project has four roles. The Soda Machine Thesis as I originally described it has three of them. The one I hadn’t named — and the one you’re probably missing in your own workspace — is the Architect.

    Construction roleYour system
    Owner / DeveloperThe human in the chair. Commissions work, approves output, holds the keys.
    ArchitectThe AI-in-conversation. Claude, Notion Agent in chat, whatever model you’re actively designing with.
    General ContractorA Custom Agent running in production.
    SubcontractorA Worker. Called in for specialty work.

    The distinction that matters: the Architect and the General Contractor are the same technology, playing different roles. When you’re chatting with a model about how to design a system, that model is acting as Architect — designing the thing before it gets built. When a Custom Agent runs autonomously against your databases overnight, it’s acting as General Contractor — executing the design.

    Same underlying AI. Completely different role.

    Getting this distinction wrong is how operators end up either (a) over-trusting autonomous agents with design decisions they shouldn’t be making, or (b) under-using conversational AI for the system-design work it’s actually best at. Chat with the Architect. Deploy the GC. Don’t confuse them.


    Levels of automation (what you’re actually doing at each stage)

    Most operators cycle through these levels as they get deeper into the pattern. Knowing which level you’re currently at — and which level a specific problem actually needs — prevents a lot of wasted effort.

    Level 0: The Owner does it. You manually do the thing. This is fine. Everything starts here. Some things should stay here.

    Level 1: Handyman. You’ve built a template, a button, a saved view. No AI involvement. Native Notion helps you do it faster. Still you doing the work.

    Level 2: Standard Build. Notion’s native automations handle it. Database triggers fire on status changes. Templates get applied automatically. Still deterministic, still no AI.

    Level 3: Self-Performing GC. A Custom Agent does the work natively — reading and writing inside Notion, reasoning about context, no workers attached. This is where agents earn their keep for the first time.

    Level 4: GC + One Trade. An agent with one specialized worker. The agent handles judgment; the worker handles a single deterministic step. This is the most common production pattern.

    Level 5: Full Project Team. An agent orchestrating multiple workers in sequence. Real project coordination. A brief-writer agent that calls a URL-capture worker, then a summarization worker, then a publishing worker, all in order.

    Level 6: Program Management. Multiple agents coordinated by an overarching structure. One agent that dispatches to specialist agents. Portfolio-level orchestration. This is where it gets complicated and where most operators don’t need to go.

    The mistake I made early on, and watch other operators make, is jumping to Level 5 when Level 3 would have worked. More pieces means more failure points. Solve it at the lowest level that works.


    Governance: permits, inspections, and change orders

    The analogy extends further than I expected into governance — which is the unsexy part of running real agents in production, but it’s the part that separates operators who keep their agents working from operators whose agents quietly stop working without them noticing.

    • Pulling a permit = Attaching a worker to an agent. You’re granting that specialty trade permission to work on your job. This is not a nothing decision. Be deliberate.
    • Building inspection = Setting a worker tool to “Always Ask” mode. Before the work ships, the human reviews it. For any worker that does something consequential, this is the default.
    • Certificate of Occupancy = The moment a capability graduates from Building to Active status in your catalog. Before that moment, treat it as construction. After, treat it as load-bearing.
    • Change Order = Editing an agent’s instructions mid-project. The scope changed. Document it.
    • Punch List = The run report every worker should write on every execution — success and failure. No silent runs. If you can’t see what your agent did, you don’t know what it did.
    • Warranty work = Iterative fixes after a worker is deployed. v0.1 to v0.2 to v0.3. This never stops.

    The governance layer sounds boring but it’s what makes agents run for months instead of days. An agent without run reports eventually drifts, fails silently, and leaves you discovering the failure weeks later when the downstream thing it was supposed to do quietly stopped happening. The governance rituals — inspections, change orders, punch lists — are not overhead. They’re what makes the system durable.


    The revised one-sentence summary

    Putting it all together, here is the whole thesis in one sentence:

    Notion is the building. Databases are the floors. The Owner runs the project. Architects design in conversation. General Contractors (agents) execute on-site. Subcontractors (workers) run specialty trades. Syncs are maintenance contracts. Triggers are permits, sensors, and dispatch radios.

    If you can hold that sentence in your head, you can design automation in Notion without getting lost in the vocabulary. When you’re about to build something, ask: which role am I playing right now? Which role does this piece need to play? Who’s the Owner, who’s the Architect, who’s the GC, who’s the sub? If you can answer, the architecture writes itself.


    Practical takeaways

    If you made it this far, here are the five things I’d want you to walk away with:

    1. Not every agent needs workers. Start with native capabilities and strong instructions. Add workers only when the agent can’t do the thing otherwise.
    1. The third time is the signal. Don’t build infrastructure for something you’ve only done twice. You’ll build the wrong version. The third time is when the pattern has stabilized enough to capture.
    1. Syncs and automations don’t need an agent. If you’re feeding a database on a schedule, or reacting to a workspace event, let the infrastructure do it. Don’t wrap it in an agent for no reason.
    1. Separate the Architect from the GC. Use conversational AI to design the system. Use Custom Agents to run the system. Don’t let an autonomous agent make design decisions that should be made in conversation.
    1. Write run reports for everything. Silent success is worse than loud failure, because silent success is indistinguishable from silent failure until weeks later. Every agent, every worker, every run — writes a report somewhere readable.

    That’s the model. It is imperfect and it is mine. If you adopt it, make it your own. If you have a better one, I’d honestly like to hear about it.


    FAQ

    What’s the difference between a Notion Worker and a Custom Agent? A Worker is a coded capability — deterministic, reusable, typically written in TypeScript — that a Custom Agent can call. A Custom Agent is an autonomous AI teammate that lives in your workspace, has instructions, runs on triggers, and can optionally use Workers to do specialized tasks. Workers are capabilities. Agents are operators that can use those capabilities.

    Do I need Workers to use Custom Agents? No. Many Custom Agents run perfectly well with zero Workers attached, using only Notion’s native capabilities (reading pages, writing to databases, searching, sending notifications) plus well-written instructions. Workers become necessary when you need to reach external services or enforce strict deterministic behavior.

    What are the three trigger types for Custom Agents? Manual (button press, @mention, or direct invocation), scheduled (recurring on a timer), and event-based (a database page is created, updated, deleted, or commented on). Pick the one that matches how the agent should know it’s time to act.

    When should I build a Worker versus letting an Agent handle something? Build a Worker when at least two of these are true: you’ve done the manual thing three or more times, the steps are stable, you need deterministic schema compliance, you’re calling an external service Notion can’t reach, or the judgment required is minimal. If fewer than two are true, stay manual or solve it with agent instructions.

    What’s the difference between a Tool, a Sync, and an Automation? A Tool is an on-demand capability that an agent calls when needed. A Sync is a background pipeline that runs on a schedule and writes to a database — no agent required. An Automation is an event handler that fires when something changes in the workspace — also no agent required. Tools are dispatched by agents; syncs and automations run on the infrastructure.

    What’s the Architect/GC distinction? When you chat with AI to design a system, the AI is playing Architect — thinking about what should be built. When a Custom Agent runs autonomously in your workspace, it’s playing General Contractor — executing the design. Same technology, different role. Don’t confuse them: let Architects design, let GCs execute.

    Does this apply outside of Notion? The Soda Machine Thesis is written around Notion’s specific implementation of Workers, Agents, and Triggers, but the underlying pattern (deterministic capabilities + judgment layer + trigger mechanism) applies to most modern agent frameworks. The vocabulary may differ. The architecture is the same.


    Closing note

    Mental models earn their place by changing the decisions you make. If the Soda Machine Thesis changes how you decide what to build next in your Notion workspace, it has done its job. If it doesn’t, discard it and find one that does.

    The reason I wrote it down is that the vocabulary available for thinking about AI-native workspaces in 2026 is still mostly vendor vocabulary, and vendor vocabulary optimizes for describing what a product can do rather than helping operators make good choices. The operator vocabulary has to come from operators. This is mine, offered in that spirit.

    If you’re running this pattern and have refinements, they’re welcome. The thesis is a living document in my own workspace. It gets smarter every time someone pushes back.


    Sources and further reading

    This mental model builds on earlier conceptual work across multiple AI tools (Notion Agent, Claude, GPT) contributing to the same thesis over a series of architecture conversations. The framing evolved through disagreement more than consensus, which is how mental models usually get better.

  • The Notion Operating Company: How to Actually Run a Business on a Workspace in 2026

    The Notion Operating Company: How to Actually Run a Business on a Workspace in 2026

    There is a version of Notion most people use, and there is the version a small number of operators have quietly built — and in April 2026 those two versions are now so far apart that they’re barely the same product.

    The version most people use is a wiki. It is a place you put information you intend to come back to, and most of the time you don’t. Pages go stale. Databases grow faster than they get organized. The search gets worse as the content gets larger. You know this because you have seen your own Notion and felt the tug of guilt when you open it, the small calculation of whether it is worth the effort to fix any of this versus just writing the thing you need to write in a fresh page and adding it to the pile.

    The version a smaller number of people have built is an operating company. It runs on Notion. The human in the chair reads briefs written by AI, approves work, watches reports come back, adjusts priorities, and hands the next job out — and the human never leaves Notion. Everything that is expensive to move between tools does not move. The work comes to them.

    Those aren’t the same product anymore. They used to be. Notion was, for years, fundamentally a block editor with databases bolted on. What changed — what actually changed, not what the vendor said changed — is that over the last six months Notion stopped being a place you put things and started being a place you run things. Custom Agents shipped in late February. The Workers framework followed. MCP support matured. The Skills layer made repeatable workflows into commandable capabilities. What used to be a workspace is now closer to an operating system for a small business.

    Most coverage of this shift is either vendor-positive cheerleading or a product tour disguised as a guide. This is neither. This is how an actual operator runs a real, unglamorous business — dozens of properties, content production cycles, client work, all of it — out of Notion in 2026. The shape, the databases, the ritual, what goes inside the workspace and what stays outside, and where it still breaks.

    If you want a product tour you can find one on Notion’s own blog. If you want the honest operator version, keep reading.


    What “operating company” actually means

    The frame matters, so let’s be concrete about what it is.

    An operating company, in the sense I mean it, is the set of decisions, assets, people, and ongoing commitments that make a business actually go. Not the legal entity. The operating layer. In a traditional small business, that operating company lives in someone’s head, a few spreadsheets, a calendar, a CRM, an email inbox, a project tool, a file drive, a slack, a billing system, and the recurring pain of trying to hold all of it in mind at once.

    Running a business on Notion in 2026 means collapsing as much of that operating layer as possible into a single workspace that knows what it is. Not a place where you write things down. A place where the work is actually happening, where the state of the business is legible at a glance, where a decision made on Monday shows up in Thursday’s automatically-generated brief without anyone having to remember to copy it forward.

    The term I have started using is the Notion Operating Company. It captures the thing correctly: Notion is not the tool you use to run the company, it is the operating layer of the company. The humans make the calls, set the priorities, and absorb the parts that cannot be delegated. Everything else lives in the workspace and operates against the workspace.

    If that sounds like a personal productivity system scaled up, it is not. Personal productivity systems are closed loops. The Notion Operating Company is an open system that other humans, AI teammates, and external services read from and write to. The difference is legibility and composability, and in 2026 those are the qualities that separate a workspace that earns its place from a workspace that is a second pile.


    Why this suddenly works in 2026 (and didn’t in 2024)

    A few things had to be true at the same time for this pattern to become reliably available to small teams and solo operators. None of them were true two years ago.

    Custom Agents shipped. On February 24, 2026, Notion released Custom Agents as part of Notion 3.3. These are autonomous AI teammates that live inside your Notion workspace and handle recurring workflows on your behalf, 24 hours a day, 7 days a week. They do not wait for you to prompt them. You give them a job description, a trigger or schedule, and the data they need, and they run. That one change is the hinge the whole operating-company pattern swings on. Before Custom Agents, automation inside Notion was cosmetic — property updates, templated pages, simple reminders. After Custom Agents, a workspace can actually operate itself between human check-ins.

    The pricing makes it viable. Custom Agents are free to try through May 3, 2026, so teams have time to explore and see what works. Starting May 4, 2026, they use Notion Credits, available as an add-on for Business and Enterprise plans. The pricing matters because it turns out many workflows are cheap enough to run continuously, and the ones that aren’t are easy to audit once the dashboards shipped. Custom Agents are now 35–50% cheaper to run across the board, especially ones with repetitive tasks like email triage. They’re even more cost efficient when you pick new models like GPT-5.4 Mini & Nano, Haiku 4.5, and MiniMax M2.5 that use up to 10× fewer credits. The 10× model-routing move means a well-designed agent for an operator’s workspace costs real-world pennies to run daily.

    MCP connects the workspace to everything else. The Model Context Protocol, opened by Anthropic, gives the workspace a standardized way to reach external tools and services. Notion ships MCP support; most serious AI tools do. The practical consequence: a Custom Agent inside Notion can reach into a source-control system, post to a messaging tool, query a database, or trigger an external worker, without anyone writing glue code. Not every integration is seamless, but the floor has lifted.

    Skills turned workflows into commandable capabilities. Skills turn “that thing you always ask Notion Agent to do” into something it can do on command. Save your best workflows as skills like drafting weekly updates, reshaping a doc in your team’s format, or prepping briefs before a meeting. That matters because the skills layer is where institutional pattern-capture lives. The first time you solve a problem in your workspace, you solve it. The second time, you turn it into a skill. The third time, you invoke it by name. A workspace that accumulates skills gets faster over time instead of slower.

    Autofill became real. Use Autofill to keep your data fresh and up to date, now with all the power and intelligence of Custom Agents. Continuously enrich, extract, and categorize information across every row, so your database stays trustworthy without manual review. That changes what a Notion database is. Databases used to rot without manual maintenance. A self-maintaining database is a different kind of object.

    None of these individually would have tipped Notion from workspace to operating system. All of them together, shipped inside a twelve-month window, did.


    The shape of an operating company in Notion

    Let me describe the actual shape. This is not theoretical. This is the operational pattern that works, stripped of the specifics that would identify any one business.

    The Control Center

    At the root of the workspace is a single page called the Control Center. It is the first page you see when you open Notion. It is the page an AI teammate is told to read first when it is helping you with anything. It is the page a new human teammate reads on day one before they read anything else.

    The Control Center does not contain content. It contains pointers. Specifically:

    • Today — a surfaced view of whatever is actively happening today, pulled from the Tasks database, filtered to today or overdue
    • The live business state — three to five sentences updated continuously (by a Custom Agent, actually) describing where the business is, what is being worked on, what is on fire
    • The database index — a linked block for each operational database, in order of how often you touch them
    • The active projects list — rolled up from the Projects database, filtered to in-flight
    • The week — the current week’s focus, the working theme, what “winning the week” looks like
    • Open loops — the short list of unresolved decisions currently parked waiting for input

    The Control Center is roughly two screens long. It tells you what is happening and gives you the jumping-off points to go deeper. Anything that belongs on the Control Center is either updated automatically or so critical that manual maintenance is worth it.

    The database spine

    Under the Control Center live the operational databases. In a functioning operating company, these map directly to the actual entities the business deals with, not to organizational categories.

    For a service business, the spine typically includes: Clients, Projects, Tasks, Leads, Decisions, People (the humans you interact with externally), Assets, and a catch-all Inbox.

    For a content business, the spine typically includes: Properties (the things you publish on), Briefs, Drafts, Published, Distribution, Ideas, and Performance.

    For a product business, the spine looks different again: Features, Customers, Feedback, Roadmap, Releases, Incidents.

    The exact databases depend on the business. The pattern does not. Each database represents a real operational object. Each relation represents a real dependency. Each view answers a question someone actually asks regularly.

    The test for whether a database belongs on the spine is simple: can you describe, in one sentence, what decision this database helps someone make? If the answer is yes, it belongs. If the answer is “it’s where I put stuff about X,” it doesn’t.

    The agents layer

    Running on top of the database spine is the agents layer. This is the part that would not have existed in 2024.

    The operational pattern, in the workspace I actually run, has a handful of agents that each do one job and do it well.

    • The Triage Agent watches the Inbox database. Anything that lands there gets a priority, a category, and a pointer to the database it actually belongs in. It does not make big decisions. It takes the pile and turns it into a sorted pile.
    • The Morning Brief Agent runs once a day. It reads the Control Center state, the active projects, the top of the Tasks database, the calendar, and the unresolved Decisions, and writes a three-paragraph brief at the top of today’s Daily page. You wake up and the state of the business is already synthesized.
    • The Review Agent runs weekly on Fridays. It pulls what was completed, what stalled, and what slipped, and writes the weekly retro. It is not asking you to fill in a form. It is writing the retro and handing it to you to review.
    • The Enrichment Agent runs on database writes. When something new lands in a key database — a lead, a project, a decision — the agent fills in the fields that would otherwise require manual data entry. Research, links, categorization.
    • The Escalation Agent watches for states that require human attention. A project stalled for too long, a task with no owner, a decision parked past its decide-by date. It surfaces them on the Control Center.

    That’s five agents. Some workspaces I’ve seen run more. Most run fewer. The number is not the point; the pattern is: each agent has one job, one data source, one output surface, and a clear signal for when it should run.

    The constraint that keeps this from sprawling into chaos is a rule I’ve internalized: one agent, one job. The moment an agent tries to do three things, it does none of them well.

    The skills layer

    Beneath the agents, you accumulate skills over time. These are not agents; they’re invoked capabilities. “Generate a weekly client report in this format.” “Convert this meeting transcript into tasks.” “Draft a response to this inbound email in my voice.” Skills are the pattern-capture layer — the place where solved-problems become invocable capabilities.

    The skills layer grows by a specific rule: the third time you notice yourself doing the same thing manually, you turn it into a skill. Not the first time, not the second. The third time is the signal that it’s going to happen again, and the cost of capturing it is less than the cost of doing it manually from here forward.

    The source-of-truth boundary

    Here is where most Notion-as-OS writeups go silent, and it’s actually the most important thing in the whole pattern.

    Notion is not the source of truth for everything. It is the source of truth for the operational state of the business — what’s happening, what’s decided, what’s being worked on, what’s next. It is not the source of truth for code, for financial transactions, for legal documents, for anything that needs to survive an outage of Notion itself.

    Code lives in a source-control system. Money data lives in whatever financial system the business uses. Legal artifacts live in signed-document storage. Heavy compute runs outside Notion and reports back. The operating company is inside Notion; the substrate is not.

    The mental model I use: Notion is the bridge of the ship. The bridge runs the ship. The ship is not inside the bridge.

    This distinction is what prevents the whole pattern from collapsing. A workspace that tries to be the whole business eventually becomes unusable because it is bloated with content that doesn’t belong in a control plane. A workspace that is a control plane stays light, stays fast, and stays legible.


    The daily ritual (what it actually looks like)

    The pattern lives or dies in daily use. Let me describe what a normal working day looks like for an operator running on this pattern — the actual sequence, not the aspirational version.

    Open Notion. The Control Center loads. The Morning Brief Agent has already run; the top of today’s Daily page has a three-paragraph synthesis of the state of the business: what’s on fire, what’s progressing, what requires a decision today. Reading that takes ninety seconds.

    Scan the Inbox. The Triage Agent has already sorted whatever landed overnight. Each item has a category, a priority, and a pointer. You’re not doing the sort. You’re spot-checking the sort — agreeing, disagreeing, occasionally fixing, and dispatching the important items into their real databases.

    Check Escalations. The Escalation Agent has flagged the three things that need attention. You make the decisions. This is the part where being a human matters.

    Open today’s active project. Whatever you are actually working on is linked from the Control Center. You go there and do the work. Sometimes the work is writing in Notion. Sometimes the work is in an IDE, a chat window, a document, a call — Notion is where you come back to log what happened and what comes next.

    At a natural stopping point, log. The log is short. Two sentences on what just got done. Notion captures the timestamp. Over time the log becomes the actual record of how the business moves.

    Evening wrap. Five minutes. The day’s work closes out. Anything that didn’t get done gets re-dated. Tomorrow’s active page pre-stages.

    That’s the ritual. It takes under twenty minutes of overhead per day and gives you a fully legible operating record. The agents do the work that would otherwise be overhead. The human does the work that requires a human.

    The difference between an operator running this pattern and an operator running without it is not productivity on any individual task. It is the absence of the context-loss tax — the tax you pay every time you sit down and have to remember where you left off, what’s happening, what’s next. Pay that tax once a day at the beginning of the brief, and the rest of the day runs on continuous context.


    Where it still breaks (the honest part)

    This pattern is not finished. There are specific places where running a real operating company on Notion still hits walls, and pretending otherwise is the kind of dishonesty that catches up to you when the tool fails you at a bad moment.

    Heavy write workloads. Notion is not a database in the performance sense. If you are trying to push hundreds of updates per minute through the API, you are going to hit rate limits and you are going to have a bad time. The operational pattern is aware of this: heavy writes go to a real database first and are reflected into Notion in summary form.

    Reliable external integration. Custom Agents’ ability to reach external systems via MCP has improved a lot in 2026, but it is not ironclad. Agents that must succeed — send this email, charge this card, update this record — still belong in a purpose-built service, not in a Custom Agent. The rule I use: if the cost of the agent silently failing is real money or real trust, it doesn’t belong in Notion.

    Mobile agent management. Building, editing, and configuring Custom Agents requires the Notion desktop or web app. Mobile access for viewing and interacting with existing agents is supported, but agent creation and configuration is desktop/web only. This is fine but worth knowing. Operators who work primarily from phone can interact with agents but cannot build them on the go.

    Prompt injection. Custom Agents can encounter “prompt injection” attempts — when someone tries to manipulate an agent through hidden instructions in content it reads. This risk exists across connected tools, uploaded documents, and even internal communications. Notion has shipped detection, but the attack surface is real and growing. The practical operator response: don’t give agents access to anything they don’t strictly need, and review any external content an agent will read before granting access.

    The shape of the workspace matters more than it used to. A messy Notion workspace was merely annoying in 2024. A messy Notion workspace in 2026 makes your agents worse, because the agents are navigating the same structure you are. Disorganized databases produce disorganized agent outputs. The cost of workspace hygiene used to be cosmetic. It’s now functional.

    Credit economics at scale. Starting May 4, 2026, Custom Agents run on Notion Credits, a usage-based add-on available for Business and Enterprise plans. The pricing is $10 per 1,000 credits. Credits are shared across the workspace and reset monthly. Unused credits do not roll over to the following month. For a small operator, this is fine. Most workflows are cheap. For larger teams running many agents, credit consumption becomes a line item worth watching. Notion has shipped a credits dashboard to help, but budget discipline is a new muscle for Notion-native teams.

    None of these are dealbreakers. All of them are things the pattern has to work around. The honest version of this article tells you that up front.


    Notion Agent vs Custom Agents (the distinction that matters)

    One clarification because the terminology can confuse newcomers to the pattern.

    Custom Agents are team-wide AI teammates that run automatically on schedules or triggers. Notion Agent is a personal AI assistant that works on-demand when you ask. All Notion users get Notion Agent. Business and Enterprise customers get Custom Agents, priced under the Notion credit system.

    The operating-company pattern uses both. Notion Agent is the on-demand assistant — the one you invoke for “rewrite this paragraph” or “summarize this doc” or “find me every page that mentions X.” Custom Agents are the autonomous teammates that run the background rhythms.

    The mistake to avoid: trying to use Notion Agent for the background rhythms. It is not built for that. It runs when you ask. Custom Agents run when the world changes or when a schedule says so. Those are different tools for different jobs.


    Who this pattern is for

    To be clear about who gets the most out of the Notion Operating Company pattern:

    • Solo operators running real businesses. The leverage is highest here because there is no team to argue with about conventions. You decide the shape, you live in it.
    • Small teams (3–15 people) with a strong operational function. The pattern works if one person owns workspace architecture. It breaks if everyone is allowed to add databases and pages ad-hoc without a maintaining hand.
    • Agencies and consultancies running multi-property operations. Anywhere you need to coordinate lots of parallel work and keep the whole portfolio legible to one or two humans.
    • Knowledge-heavy businesses. Law firms, research shops, content operations, advisory services. The operating company pattern rewards businesses where the value is produced by synthesis across prior work.

    Where the pattern fits less well: businesses where most of the work happens outside any tool (field services, physical retail, manufacturing floors). Notion can still run the management layer, but most of the actual operational data lives elsewhere.


    How to start without building a cathedral

    The pattern I’ve described can sound like a project. It isn’t. Or rather, it can be — people build beautiful elaborate versions for a year and never actually use them. The better path is embarrassingly small steps.

    Week one: build the Control Center. Just that page. Two screens long. Link to the databases you already have, even if they’re messy. The Control Center is the anchor; everything else will build against it.

    Week two: add one Custom Agent. Pick the simplest high-frequency job you do manually. The Triage Agent is a good first choice. Let it run for a week. Watch what it gets right. Adjust.

    Week three: add the Morning Brief Agent. This is the one that changes how your days open. If it works, you will know because opening Notion will stop feeling like work and start feeling like a starting line.

    Week four: look at your databases. The ones that matter will be obvious because the agents will be using them. The ones that don’t matter will be collecting dust. Delete or archive the dead ones. Formalize the live ones.

    After that, the pattern compounds. Each thing you do manually three times becomes a skill. Each repeated workflow becomes an agent. Each messy database gets cleaned when an agent trips on it. The workspace gets smarter as a function of use, not as a function of a weekend rebuild project.

    The operators I’ve seen succeed with this pattern have a specific characteristic in common: they started small and kept going. The operators I’ve seen fail had grand plans and never got to week four.


    What “AI-native business” actually means (if we have to use the phrase)

    The term “AI-native” gets thrown around enough to lose meaning. Inside this pattern, it means something specific.

    An AI-native business is one where AI is not a tool you pick up to accomplish a task. It is a teammate that is already in the workspace, already reading the state, already surfacing what matters, already handling the rhythms. The human is not using AI. The human is working with an operating company that has AI embedded into its substrate.

    That is what the Notion Operating Company pattern produces. Not a workspace that is faster because AI is speeding things up. A workspace that operates continuously because the AI is running inside it, and the human shows up to make the calls that only a human can make.

    This is why I wrote at the beginning that the version of Notion most people use and the version a smaller number have built are barely the same product anymore. They are not. They are two different conceptions of what a workspace is for, and in April 2026, one of them is still a place you put things, and the other is a place you run things.

    The whole game is picking the second one on purpose.


    FAQ

    What’s the difference between using Notion as a wiki and running an operating company on Notion? A wiki is where information lives after you’re done with it. An operating company is where the work actually happens — briefs, decisions, run reports, active projects, agents handling recurring rhythms. The operating company pattern treats Notion as a control plane, not an archive.

    Do I need Business or Enterprise plan? For Custom Agents, yes. Custom Agents require Notion’s Business or Enterprise Plan. Notion Agent (the on-demand personal AI) is available to all Notion users. The operating-company pattern benefits substantially from Custom Agents, so most serious implementations are on Business or higher.

    How much does this cost to run? Custom Agents are free to try through May 3, 2026. Starting May 4, 2026, they use Notion Credits, available as an add-on for Business and Enterprise plans — $10 per 1,000 credits, shared across the workspace, reset monthly, no rollover. In practice, for a solo operator or small team running five or so agents, credit costs are modest. Budget discipline becomes relevant at larger scale.

    What AI models can the agents use? Currently available: Auto (Notion selects), Claude Sonnet, Claude Opus, and GPT-5. Notion regularly adds new models, so expect this list to evolve. Recent additions include cost-efficient models like Haiku 4.5 and GPT-5.4 Mini/Nano that can cut credit usage significantly.

    How secure is it? Custom Agents inherit your permissions, so they can see what you see. They offer page-level access control. Every agent run is logged with full audit trails. Notion has implemented guardrails to automatically detect potential prompt injection, and has built controls for admins and workspace owners to monitor connections and restrict what agents can access. The honest answer: reasonable security defaults, real attack surface, practical precautions apply (scope agents narrowly, audit connected sources).

    Can I run this pattern solo? Yes. Solo operators get the highest leverage from the operating-company pattern because there’s no team coordination overhead. The pattern scales down cleanly.

    What if I don’t want to use Custom Agents? Does the pattern still work? The database spine and Control Center work without agents. You’ll be doing manually what the agents would be doing — daily briefs, triage, weekly reviews. The pattern is still more legible than a traditional Notion setup; you just don’t get the “workspace operates itself between check-ins” effect.

    How long does it take to build? The honest answer is you never stop building. You never should. A workspace that stops evolving is a workspace that is about to stop working. But the minimum viable version — Control Center, one agent, a handful of databases — is a week of part-time work, not a project.


    A closing observation

    The reason this pattern is worth writing about now, in April 2026, is that the window where it is a genuine edge is probably short. Two years from now, some version of this will be the default way Notion is used, and the advantage will compress. Today, most workspaces are still wikis. The operators who make the switch to operating-company now are buying a year or two of operational leverage that becomes the baseline eventually.

    But for right now, this works, it is real, and almost nobody is doing it. That gap is the thing.

    If you are already running something like this, you know. If you are reading about it for the first time, the starting point is the Control Center and one agent. Build the Control Center this week. Add the agent next week. In a month, you’ll have a workspace that is a different kind of object than the one you started with.

    That’s what we mean by an operating company.


    Sources and further reading