Tag: Knowledge Base

  • The Way Back In

    Google’s real superpower was never search or ads. It was the door home — and I learned that at 2 a.m., locked out of my own life.

    I locked myself out of my own account a little after one in the morning. I don’t even remember what I needed in there — something small, something that could have waited until daylight. What I remember is the password field refusing me, then refusing me again, and the cold drop in my stomach when I realized the keys to a dozen other things lived behind that one rejection.

    So I did what everyone does. I grabbed my phone. I tried the recovery email, which routed to an account I also couldn’t reach. I tried the text-message code. I tried the security questions, answered years ago with half-truths I’d invented and instantly forgotten. I worked the recovery flow like a man patting his pockets at a locked door, and somewhere in there it landed on me that I was negotiating — not with a hacker, not with a thief, but with the company that decides whether I am still me.

    I got back in by morning. Relief, and then a second feeling underneath it that wouldn’t leave: that was the product. Not the search box. Not the ads. The way back in.

    I build access layers for a living. Second brains. A life-ranking system I call the Compass. The structured record a business can’t operate without — the institutional memory that walks out the door when the wrong person quits. Continuity systems for my wife Stefani, so the things she needs are still there on the days her memory isn’t. I’d been filing all of it under content and tooling. That night I understood I’d been mislabeling my own work — and I understood something about Google that most people have backwards.

    Two things, not one

    Here is the distinction that reorganized everything for me, and I want to be precise, because the sloppy version of this argument is wrong.

    Search and ads are how Google makes money. That’s the business model, the value capture, the line on the income statement. Anyone who tells you access “beats” advertising is comparing a turnstile to a cash register. They don’t sit on the same axis.

    But there are two things going on, and we only ever talk about one. Ads are how Google makes money. Access is why you can’t make Google stop. The login, the password manager, the “Sign in with Google” button, the recovery flow when you’re locked out — none of it earns a dollar directly. Google gives it all away. It exists to defend the surface where the money gets made.

    And that’s the part people miss: the layer that earns nothing is the layer you can never leave. Attention is rented by the day — a better answer wins the next query, a better feed wins the next scroll. Access is owned by the year. So I won’t tell you access is more valuable than attention. I’ll tell you something narrower and more interesting: access is more durable. It is the layer with its hand on the master switch, and it shows up on the books as a cost center, a free feature, a help-desk ticket — which is exactly why nobody guards against it.

    Why the door beats the window

    The mechanics are almost embarrassingly simple once you see them.

    You can change your default search engine in a single setting. One click, a coffee break, done. Now try changing the thing that holds the keys to everything else. Imagine someone who’s used “Sign in with Google” across twenty or thirty services — and once you start counting your own, the number climbs faster than you’d like. That account isn’t an account anymore. It’s the hinge the whole house swings on. Lose it and you don’t lose one thing; you lose your bank login’s recovery path, your work tools, your tax software, your photos, the smart lock on your front door.

    That’s the asymmetry. Search is a window you can swap in an afternoon. Access is the door the whole house hangs on — and the house has been quietly built around it.

    This is switching-cost economics, and it has a clean shape. The hold a company has on you is its switching cost plus whatever its product is actually, presently better at. Advertising lives almost entirely on that second term — a marginally better result — which evaporates the instant a rival catches up. Access lives on the first, and the first only grows. Every new service you wire to that one login deepens the hold by one more door. Adding a lock is a single pleasant click. Removing it means re-keying every door at once, in parallel, under deadline, with permanent lockout as the price of getting it wrong. The pain isn’t additive. It’s combinatorial. That gap — between how easy it is to add the lock and how terrifying it is to pull it — is the moat.

    Salesforce and SAP have lived inside this physics for decades, holding enterprise customers for twenty-five-year stretches, and nobody calls them content businesses. Google built the same thing for your whole life and handed it out for free.

    The institutions confirmed it by where they aimed. When the U.S. courts found Google an illegal monopolist, the remedy went after the contracts — the roughly twenty billion dollars a year Google pays Apple to be the default, the exclusive default-search deals, now capped to one-year terms. But the court declined to break off Chrome or Android. It renegotiated who gets to answer the door and left untouched the company that built every lock, hinge, and recovery key in the house. Even the people dismantling the monopoly treated “who is the default way in” as the twenty-billion-dollar question — and left the deeper layer, the one that actually owns login, autofill, passkeys, and recovery, exactly where it was.

    The thing it holds is a piece of your mind

    I could have left it at economics. But the lockout didn’t feel like an economics problem at one in the morning. It felt like an amputation, and I want to take that feeling seriously, because it’s the truest part.

    There’s an old argument in philosophy of mind — Andy Clark and David Chalmers, 1998, “The Extended Mind.” They imagine Otto, a man whose memory is failing, who writes what he needs in a notebook and consults it the way you and I consult the inside of our own heads. Their claim isn’t that the notebook helps Otto’s mind. It’s that the notebook is part of Otto’s mind — the storage just happens to sit outside his skull. If a process counts as remembering when it happens in your head, it counts as remembering when it happens in the world.

    I read that and thought about Stefani. “Remember for her when she can’t” is Otto’s notebook, almost word for word. The philosophical case was made twenty-eight years ago: the thing that holds your memory for you is not a tool you use. It is part of the mind doing the remembering.

    Then the cognitive science caught up with the philosophy. In 2011, Betsy Sparrow, then at Columbia, and her colleagues tested how people handle information they expect to look up later. We don’t retain the information, they found; we retain where to find it. The brain offloads the content and keeps the pointer. We are becoming, in their phrase, symbiotic with our tools. Sit with that: human memory already ran my experiment and reached my conclusion. It threw away the fact and kept the way back in. Access beating content isn’t a strategy I invented. It’s how your own head now works.

    Which means whoever holds the pointer holds the only half of the memory your brain bothered to keep. You can swap a search engine in a second. You cannot swap a piece of your own mind without something that feels, accurately, like a small lobotomy. An ad interrupts you. A lockout unselfs you. And the entity that hands you back in isn’t selling you a service. It’s returning you to yourself.

    There’s a flip side I have to be honest about, because it’s the whole case for doing this carefully. Sparrow’s same line of research shows that offloading frees you up — trusting that something is safely stored elsewhere measurably improves your ability to learn the next thing. But it also shows the benefit reverses when the external store turns out to be unreliable. You end up worse off than if you’d never offloaded, because you pruned the internal copy and the external one failed you. Reliability isn’t a feature of a continuity layer. It’s the entire product. A second brain that might vanish doesn’t merely fail to help — it degrades the mind that came to depend on it.

    The blade cuts both ways

    So here’s where I turn the knife on my own argument, because the thing that makes access powerful is the same thing that makes it dangerous, and I don’t trust anyone who won’t say so.

    Access is a pharmakon — Plato’s word, the one Derrida built on: the single substance that cures and poisons, depending on nothing but the dose and the hand that holds it. The recovery flow that rescued me at 2 a.m. is, mechanically, the identical system that means I can never fully leave. Not two features in tension. One feature, seen from two sides.

    Android makes it literal. Factory Reset Protection turns a wiped phone into a brick until the original Google account is re-verified. The feature that stops a thief from using your stolen phone is the same feature that makes the device hostage to Google’s say-so. Protection and imprisonment, one mechanism — and Google isn’t retreating from this ground, it’s deepening it, because recovery is exactly where the bond forms. The company that saves you and the company that traps you are the same company. You’re just meeting it at two different moments.

    Now let me take the strongest objections head-on, because the good ones are real.

    “Switching costs approach infinity.” No. I used to say it that way, and it was wrong. People migrate ecosystems by the hundreds of millions and carry their photos and contacts with them. Phone-number portability was mandated and it worked. Passkeys are an open standard, and their own backers built a credential-exchange protocol specifically to make them portable between password managers. Europe’s data-portability law already forces Google to hand you everything. My own founding story refutes the infinity claim: I got back in by morning. The moat is high, it is real, and it is finite and shrinking by design — every serious regulatory and technical current of this decade is engineered to grind it down. And that cuts in my favor. If lock-in were infinite, “we’ll let you leave” would be a meaningless promise. It means something only because leaving is becoming genuinely possible.

    “Isn’t ‘access as care’ just what every captor says?” Yes. Company towns called themselves family. AOL called itself a community. Every lock-in business in history has narrated itself as care, and the distinction is invisible at the exact moment it matters most — when you’re locked out, sick, grieving, laid off, and least able to audit whether anyone actually has your back. This is the real soft spot, and I won’t paper over it. Care cannot be declared. It has to be engineered — and provable by someone who never read the terms. Words are free. I’ll come back to what isn’t.

    “Gratitude isn’t a moat — the 2 a.m. plumber gets it too.” Correct. The ER, the locksmith, roadside assistance, my own restoration clients on the worst day of their lives — they all bond at the moment of relief, and gratitude decays, and people shop their insurance anyway. So gratitude isn’t the moat. It’s the on-ramp. The midnight rescue doesn’t lock anyone in; it earns the first conversation. What keeps them is what you do after — and that’s a question of character, not a property of the crisis.

    Care holds the same keys — and hands you a copy

    Let me show you what the answer looks like before I argue for it.

    Last winter one of my restoration clients walked into a commercial building with two inches of standing water across the floor — burst supply line, ceilings down, a decade of operating records soaking in a back office that also held the only copies of their continuity plan, their vendor contracts, their insurance file. By the time the water was out, the part they were most afraid of losing wasn’t the drywall. It was the paper. We’d already pulled their critical records into a structured store they could reach from a phone — indexed, searchable, theirs. The owner stood in the wreckage and opened the file on his phone, and the thing that could have ended the business was just there. Then the part that matters to this essay: when the job closed, the whole store exported in one motion, in formats their own systems could read, and went with them. No call to me. No ransom for their own records. They walked out with the keys in their hand, and the relief on the owner’s face was the entire argument I’m about to make, compressed into one moment.

    That’s the difference between holding the keys for someone and holding them over them. Once you accept that the held thing is part of a person’s mind, the ethics stop being a garnish and become the architecture. Holding a piece of someone’s cognition and refusing to let them leave isn’t hard-nosed business; it’s closer to holding a self hostage. Holding that same piece while guaranteeing they can walk out with all of it, any time, without asking — that’s not a vendor. That’s a trustee. The oldest answer the law has to the question of how you hold something vital that belongs to someone else: you hold it for them, bound to their interest, returnable on demand.

    The whole thing collapses to one question. Not do you hold the keys — someone always holds the keys. The question is whether you hold them for her or over her. Google books your access as its switching cost, an asset on its side of the ledger. The humane version books it as your asset, merely held in trust. Same keys. Opposite politics.

    Which is why I keep coming back to the difference between a scaffold and a cage. Good scaffolding is built to come down — calibrated to do only what the person can’t yet do alone, withdrawn as they grow. A scaffold that never comes down isn’t support anymore; it’s a wall you’ve forgotten how to live without. “Remember for Stefani when she can’t” is the morally exact phrasing — contingent help for a real gap, not a blanket seizure of her agency. Do everything for someone and you don’t make them safe. You teach them they can’t.

    And I’ll admit the moat I’m choosing is the weaker one. A lock-in moat is strong precisely because it’s coercive — you stay because you can’t go. A trust moat is fragile; one breach and it’s gone overnight. I’m choosing the fragile one on purpose, and not only because it’s right. Lock-in and care produce the identical retention number — ninety-nine percent stay either way — but for opposite reasons, and the difference only shows up the day switching becomes free. That day is coming: portability law, open credential standards, and soon an AI agent that can re-key your whole life in an afternoon. When it arrives, the captivity moat evaporates and the trust moat doesn’t even notice. Free exit isn’t charity — it’s the only hold worth having once leaving is easy and everyone knows it. I’m not being generous. I’m being early.

    But I won’t let myself off with a promise, because a promise from an interested party is exactly what breaks the day the incentives flip — an acquisition, a cash crunch, a change of hands. So the care has to be built into things that survive my intentions. Export in open, ingestible formats — not a dead blob no other system can read, which is fake portability wearing a real coat. A published exit that works without anyone calling me. A governance mechanism that binds the company after it’s sold. Don’t trust my intentions. Trust the mechanism that outlives them. That’s the only honest answer to “every captor says that.” The test was never the happy customer. It’s whether the grieving spouse who never read a word of the terms can still get everything out, in one motion, with no call to me. Design for the person who can’t advocate for themselves, and the ethics stop being marketing.

    The door is moving — to the agent

    This is also the shape of the next decade, and it’s why I work the way I work.

    Google holds the keys to your accounts. The AI agent is coming to hold the keys to your context — what you’re working on, what you decided last month, how you actually think and operate. That’s a deeper hook than a login, because a login gets you into the app, but context is the work. Search was a query you typed and forgot. The agent is a relationship that accumulates.

    And there’s a real chance, for the first time, that the door doesn’t have to be a cage. The plumbing that lets an agent reach into your files, calendar, and tools — Anthropic’s Model Context Protocol — is being built as a shared, open standard rather than one company’s private wiring. I won’t call that settled or “neutral”; standards get captured, and this one is young enough to go either way. But open plumbing at least makes it possible to build an agent that reaches into everything you own without owning it. Access without capture is finally buildable, not merely sayable.

    The trap is moving too — and getting subtler. The new lock-in isn’t your data. It’s the agent’s learned understanding of you, accreted day after day. You can export every chat log and still leave behind the part that actually knew you, because raw logs aren’t understanding, and no portability law reaches that gap. Which is the whole reason I build on Claude rather than treat any of this as theory: its memory has a delete button and an export button. You can read what it knows about you, change it, take it elsewhere, even bring your history in from somewhere else. That’s not a feature. It’s a thesis with a receipt — own the payload, walk out anytime, shipped.

    I have to name the obvious dark mirror, because it’s already shipping. Microsoft Recall makes the identical pitch — we’ll remember everything for you — by quietly screenshotting your screen every few seconds into a local index. Same promise, opposite governance: a memory built about you, by default, that you didn’t author and can’t easily hand to anyone else. The pointer to your own mind, held on someone else’s terms. The seat for “Sign in with your agent” is still empty, but the room is filling — Recall, OpenAI’s persistent memory, Gemini woven through Android, Apple’s on-device intelligence are all reaching for it. Whoever defines what care looks like before that seat fills sets the norm for everyone after. That’s not a forecast from the bleachers. It’s the work.

    What I’m actually building

    So let me say what my portfolio really is, because I had it mislabeled too.

    It looks like five businesses held together by nothing but my calendar — restoration clients, the second brain, the Compass, remembering for Stefani, the structured record a company can’t operate without. It’s one product. Each version shows up at the bottom — the moment of maximum vulnerability, when someone has the least to spare and the most to lose — takes custody of a piece of their continuity, and is built, from the foundation, to give all of it back. Continuity is the one thing the attention economy never touches: the durable layer a person or a business runs on — their records, their memory, their way back into their own life — the part that, if it vanished, would not just inconvenience them but unself them.

    The attention economy fights for you when you have everything to spare, which is why it has to shout and why you resent it for shouting. The continuity layer shows up when you have nothing left, and arrives with relief. Bonds made at the bottom run deeper than impressions bought at the top — but only one kind of person should be trusted to be there at the bottom: the kind who hands you the key on the way in.

    I’ll concede the last hard thing plainly, because a skeptic has already spotted it. Today, the part of my work that pays the bills is the discovery work — getting found, getting ranked, getting cited. The continuity layer is real but young, and I won’t pretend it has finished proving it can pay. Here’s how I think it does: not by charging for the data, which would just be the cage again, but as a held-in-trust retainer — an ongoing fee for keeping the lights on and the door unlocked, priced like what it is, a fiduciary relationship rather than a subscription you’re trapped inside. You earn the right to charge it by first being useful enough to be found. Discovery isn’t a contradiction of the thesis; it’s the front door. Attention comes first. It always did. The mistake is thinking it’s the destination.

    And here’s the part I can’t dodge, the one that keeps me honest. The agent I’m betting on — the one that can re-key a whole life in an afternoon — is the same tool that dissolves my moat too. If re-keying is trivial, the switching cost protecting my own work goes to zero right alongside Google’s. I’m left holding nothing but the fragile thing: trust, provable on the day someone decides to leave. That isn’t a bug in my bet. It’s the point of it. The tool I’m wagering everything on is the one that guarantees I can never coast — it leaves me no hold on anyone except being worth staying with. I’d rather build on that than on a lock.

    Which is where it lands, in one line I’ve earned the right to say now:

    Don’t sell knowledge. Don’t sell content. Sell access to continuity — and prove it’s care and not a cage by handing the customer the key on the way in.

    I learned that locked out of my own life at two in the morning, patting my pockets at a door, negotiating with the only entity that could tell me whether I was still me. Google taught me how much that door is worth. It just never taught me to hand anyone a copy of the key. That part’s on us — and the copy is the whole job.

  • Elicitation Over Extraction: A Working Theory of How Solo Operators Should Actually Use Large Language Models

    This is a working theory, not a finished one. It proposes a specific reframing of how solo operators and small agencies should be using large language models day-to-day, names the failure mode of the current dominant approach, and lays out the experiments that would prove or disprove the central claim. The piece is published here so it can be referenced, tested against, and revised in public as the evidence comes in. If the claim is wrong, the next version of this article will say so.


    The Claim, in One Sentence

    For solo operators and small agencies working with large language models, the dominant mental model — build a knowledge base, feed it to the model, ask questions of the document — is correct for a narrow class of work and wasteful or counterproductive for a much larger class, and the work most operators are doing fits the larger class.

    A better mental model for that larger class is what this piece will call Elicitation Over Extraction: the assumption that the model already contains the relevant knowledge as latent capability, and that the operator’s job is to activate the right region of that latent capability with precise, compact prompts rather than to ship the knowledge into the context window through document retrieval. Knowledge stays in training. The work shifts to activation.

    This is not a new idea in the AI research literature. It is, however, almost entirely absent from how operators are currently building their personal AI workflows. The gap between what the research suggests is possible and what the operator-tooling ecosystem is building toward is the gap this piece is trying to name and close.

    Where the Current Dominant Pattern Comes From

    The current dominant pattern in operator-side AI tooling is retrieval-augmented generation, or RAG. The pattern is straightforward. An operator builds a knowledge base — pages in Notion, files in Drive, articles in a vector database, transcripts of YouTube videos, customer support tickets, whatever the operator’s domain produces. When a question is asked of the model, a retrieval system finds the most relevant chunks of that knowledge base, packs them into the model’s context window, and asks the model to answer using that retrieved material as grounding.
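
    The retrieve-then-pack loop described above can be sketched in a few lines. This is a minimal illustration, not any vendor's pipeline: it assumes a bag-of-words cosine similarity in place of real embeddings, and the function names (`retrieve`, `build_rag_prompt`) and the sample knowledge base are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (stand-in for a vector search)."""
    q = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]

def build_rag_prompt(question, chunks, k=2):
    """Pack the retrieved chunks into the prompt as grounding for the model."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical knowledge base: client details no public source would contain.
KNOWLEDGE_BASE = [
    "Client Acme renews its retainer every March.",
    "The support inbox is checked twice a day.",
    "Acme's retainer covers ten hours of work per month.",
]

prompt = build_rag_prompt("When does the Acme retainer renew?", KNOWLEDGE_BASE)
print(prompt)
```

    Only the chunks that survive retrieval reach the context window. Everything the operator pays for in this pattern is spent getting outside information in front of the model.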

    The pattern works. For certain shapes of problem, it works very well. It is the right architecture when the operator’s question depends on information that is genuinely outside the model’s training data — proprietary documents, current events that postdate the training cutoff, client-specific details that no public source contains, internal organizational knowledge that exists nowhere on the open internet. For that shape of problem, RAG is not optional. It is the only honest way to get accurate answers, because the alternative is the model inventing details about things it has no real knowledge of.

    The pattern has also been heavily promoted by the AI-tooling industry for reasons only loosely related to whether it is the right pattern for any specific operator. Vector databases, retrieval pipelines, document-loading frameworks, embedding services, and knowledge-base products all exist because RAG creates demand for them. The narrative that every operator needs a knowledge base, that every workflow benefits from document retrieval, and that the path to better AI work runs through better document organization is commercially convenient for the vendors selling the components. It is also half true, which is the worst kind of half true, because the part that is true gets used to justify the part that isn’t.

    The part that is true: when the model lacks the specific knowledge needed for the task, retrieval helps. The part that isn’t: when the model already has the knowledge, retrieval is at best redundant and at worst actively degrades the response. The middle case — when the model has the general knowledge but lacks the specific framing, voice, or activation — is the case the operator ecosystem has not figured out how to name or handle, and it is also the case most operators are actually in for most of their work.

    The Specific Failure Mode

    Picture an operator who wants to write content in the voice of a particular thinker — call this thinker Senior Operator-Investor, someone who has been writing publicly for twenty years and whose work is heavily represented in the model’s training data. The operator’s default move, under the RAG pattern, is to collect transcripts of that thinker’s podcasts and YouTube videos, structure them in a knowledge base, and feed them to the model along with the question.

    Here is what actually happens. The 20,000-token transcript dump enters the model’s context window. On every generation step the model attends over that transcript, weighing its passages against the question being asked. This is computationally expensive, slow, and noisy, because most of the transcript is irrelevant to any specific question. The model also already knew this thinker’s voice from training, so the transcript is mostly redundant with patterns the model can already produce from its weights. The operator is paying tokens to remind the model of things the model knows.

    The more efficient version is to write a 200-token activation prompt: a careful description of the thinker’s voice, their characteristic moves, their temperament, and a few canonical reference points. That prompt activates the same region of the model’s latent space that the 20,000-token transcript was trying to activate, at one one-hundredth the token cost, with less attentional noise, and with output that is often qualitatively better because the model is not being pulled in inconsistent directions by tangentially relevant transcript passages.
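
    As a rough sketch of what such an activation prompt might look like as a reusable object: the four parts named above (voice, characteristic moves, temperament, canonical reference points) become fields, and the cost comparison is plain division. The class name, field names, and placeholder values are all hypothetical; the 20,000 and 200 figures are the ones used in this piece, not measurements.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ActivationPrompt:
    """A compact description meant to evoke a voice the model already knows."""
    thinker: str
    voice: str
    temperament: str
    characteristic_moves: List[str] = field(default_factory=list)
    reference_points: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the fields as a short prompt preamble."""
        return "\n".join([
            f"Write in the voice of {self.thinker}.",
            f"Voice: {self.voice}",
            f"Temperament: {self.temperament}",
            "Characteristic moves: " + "; ".join(self.characteristic_moves),
            "Canonical reference points: " + "; ".join(self.reference_points),
        ])

def token_reduction(transcript_tokens: int, prompt_tokens: int) -> float:
    """How many times smaller the activation prompt is than the transcript dump."""
    return transcript_tokens / prompt_tokens

# Placeholder content; a real prompt would be written by someone who knows the thinker's work.
sketch = ActivationPrompt(
    thinker="Senior Operator-Investor",
    voice="plainspoken, concrete, skeptical of jargon",
    temperament="patient, contrarian",
    characteristic_moves=["opens with a counterexample", "ends on a rule of thumb"],
    reference_points=["(essay title here)", "(recurring framework here)"],
)

print(sketch.render())
print(token_reduction(20_000, 200))  # 100.0
```

    Keeping the prompt as structured fields rather than a pasted blob makes each part easy to revise in isolation when the output drifts from the intended voice.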

    The 100x token reduction is not theoretical. It is the ratio of a 20,000-token transcript to a 200-token prompt, and it is what happens in practice when prompts are designed for activation rather than information transfer. The reduction is also not the most important benefit. The more important benefit is that the operator stops doing knowledge-engineering work that is duplicative with the training the model has already received, and starts doing the work that is actually distinctive: designing the activation patterns themselves.

    The failure mode of the current dominant pattern is that operators are spending their time on the wrong layer. They are building warehouses when they should be building switchboards. The warehouse holds information the model already has. The switchboard turns on specific patterns of cognition that the model can already produce but does not produce by default.

    What the Research Literature Says

    There is a real body of research on what are variously called persona prompting, role conditioning, and activation steering. The findings are nuanced and they refine the claim above in ways worth knowing.

    Persona prompting does change model output. The effect is measurable and consistent across many tasks. The voice, style, and reasoning approach of the model can be meaningfully shifted by a few hundred well-chosen tokens at the start of a prompt. This part of the picture confirms the central intuition of Elicitation Over Extraction: latent capability is real, activation prompts can reach it, and the activation work is meaningful work.

    But the same research literature surfaces an important caveat that the strong version of the claim has to address. Persona prompting consistently helps with style, voice, clarity, and tone — the things one might call the surface texture of generation. It is less consistent, and sometimes actively harmful, on tasks that depend on precise factual recall, multi-step logical reasoning, or strict accuracy on benchmarked knowledge. In some studies, telling a model to “act like an expert” on a factual recall task decreased accuracy compared to no persona at all. The model became so focused on performing expertise that it stopped retrieving its underlying knowledge cleanly.
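
    An operator can check this caveat on their own tasks with a small A/B comparison: the same factual questions, asked with and without a persona prefix. The sketch below is one way to structure that test under stated assumptions. `ask_model` is a hypothetical callable standing in for whatever model API the operator uses, the scoring is a naive substring match, and `stub_model` is a fake included only so the example runs. Its scores say nothing about how a real model behaves.

```python
from typing import Callable, Dict, List, Tuple

Case = Tuple[str, str]  # (question, expected answer substring)

def accuracy(ask_model: Callable[[str], str], prefix: str, cases: List[Case]) -> float:
    """Fraction of cases whose expected answer appears in the model's reply."""
    correct = 0
    for question, expected in cases:
        reply = ask_model(prefix + question)
        if expected.lower() in reply.lower():
            correct += 1
    return correct / len(cases)

def compare_persona(ask_model: Callable[[str], str], persona: str, cases: List[Case]) -> Dict[str, float]:
    """Score the same cases with no persona and with the persona prepended."""
    return {
        "no_persona": accuracy(ask_model, "", cases),
        "persona": accuracy(ask_model, persona + "\n\n", cases),
    }

def stub_model(prompt: str) -> str:
    """Fake model for demonstration: looks up the last line of the prompt."""
    answers = {
        "What is the capital of France?": "Paris",
        "How many minutes are in an hour?": "60",
    }
    question = prompt.strip().splitlines()[-1]
    return answers.get(question, "unknown")

CASES = [
    ("What is the capital of France?", "Paris"),
    ("How many minutes are in an hour?", "60"),
]

results = compare_persona(stub_model, "You are a world-class expert.", CASES)
print(results)
```

    With a real model behind `ask_model` and a few dozen domain questions in `CASES`, a gap between the two scores is the signal to keep personas away from that task.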

    This is important and it changes the shape of the claim. Elicitation Over Extraction is not a universal replacement for RAG. It is the right approach for tasks where what the operator needs from the model is voice, framing, judgment, or pattern-matching against a thinker’s known mode. It is the wrong approach — and may be worse than neutral — for tasks that depend on precise factual recall of specific data points.

    The honest version of the claim, then, is something like the following. Operator work falls into at least three different shapes. The first shape is “I need the model to produce content in a specific voice or style” — activation prompts dominate, RAG is wasteful. The second shape is “I need the model to retrieve specific facts from a corpus the model has not seen” — RAG dominates, activation prompts are insufficient. The third shape is “I need the model to apply judgment to information I am providing” — both layers matter, with activation handling the judgment and retrieval handling the information.

    Most operators are running shape-one and shape-three workflows but using shape-two tooling. That mismatch is the source of the inefficiency. The fix is not to abandon retrieval. The fix is to know which shape any given workflow is and use the right layer for that shape.
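
    The three shapes can be turned into a small decision helper. Reducing them to two yes/no questions is an assumption of this sketch, not something the piece specifies, and the names (`Layer`, `choose_layer`) are hypothetical.

```python
from enum import Enum

class Layer(Enum):
    ACTIVATION = "activation prompt"
    RETRIEVAL = "retrieval (RAG)"
    BOTH = "activation + retrieval"

def choose_layer(needs_outside_information: bool, needs_voice_or_judgment: bool) -> Layer:
    """Map a workflow onto the three shapes described above.

    needs_outside_information: the task depends on facts the model has not seen.
    needs_voice_or_judgment: the task depends on a voice, framing, or judgment.
    """
    if needs_outside_information and needs_voice_or_judgment:
        return Layer.BOTH        # shape three: judgment applied to supplied information
    if needs_outside_information:
        return Layer.RETRIEVAL   # shape two: facts from an unseen corpus
    if needs_voice_or_judgment:
        return Layer.ACTIVATION  # shape one: content in a specific voice or style
    raise ValueError("workflow fits none of the three shapes")

# Example: ghostwriting in a well-known thinker's voice, no private documents involved.
print(choose_layer(needs_outside_information=False, needs_voice_or_judgment=True).value)
```

    Running each recurring workflow through these two questions once is usually enough to reveal a mismatch, such as a voice task being served by a retrieval pipeline.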

    Why This Is Not Obvious

    If the distinction is real and well-documented in research, the question is why operators are not already organizing their work this way. Three reasons, in roughly increasing order of importance.

    The first reason is that “knowledge engineering” carries a status premium that “elicitation engineering” does not. Building a structured knowledge base sounds like real work. Writing a 200-token prompt sounds like a parlor trick. The fact that the 200-token prompt may actually be doing more useful work than the knowledge base does not show up in the social register of the activity. Operators who are evaluating their own productivity, even if only to themselves, tend to over-weight effort that looks substantial and under-weight effort that looks easy, even when the easy effort is producing better results. The shape of effort matters more than the result of effort, until the operator becomes deliberate about correcting for that bias.

    The second reason is that the dominant vendor narrative pushes against elicitation. Every vendor selling a vector database, every vendor selling a document loader, every vendor selling a RAG pipeline product has a commercial incentive to frame all problems as retrieval problems. The vendor ecosystem does not have a strong commercial incentive to teach operators how to write better activation prompts, because activation prompts do not require vendor products. There is no SaaS company selling “the activation layer” because the activation layer fits on one Notion page and does not need to be sold. The absence of a commercial narrative around elicitation makes it invisible to operators who are learning about AI through vendor content.

    The third reason is the deepest one and it is about the relationship between knowledge and accessibility. The model containing knowledge in its training is not the same as the model producing that knowledge when queried. A first-year medical student who has read every textbook on the shelf is not the same as a senior physician who can produce the right diagnosis under pressure. The knowledge is the same in both cases. The accessibility is different. The senior physician has navigated the latent space of medical knowledge so many times that the relevant patterns activate automatically when the case presents. The first-year student has the same knowledge in storage but cannot get to it on demand under realistic conditions.

    Operators are encountering models that are, in a meaningful sense, in the first-year-medical-student position with respect to most domains. The knowledge is there. The activation is unreliable. The dominant vendor response to this is to bypass the activation problem by stuffing the relevant knowledge directly into the context window, which works but treats the symptom rather than the cause. The Elicitation Over Extraction response is to do the activation work directly, build a library of activation patterns that reliably reach the relevant latent regions, and stop treating the model as an empty container that needs to be filled with documents.

    The Working Theory

    Pulling the threads together, the working theory of this piece is the following set of connected claims.

    Claim one. Large language models contain enormous latent knowledge that is not, by default, reliably accessible through naive prompting. The knowledge is in the weights. The activation is the problem.

    Claim two. The dominant operator response to this — document retrieval and knowledge-base construction — addresses the activation problem indirectly, by bypassing latent knowledge in favor of in-context knowledge. This works but is inefficient when the latent knowledge is already strong, and the inefficiency compounds across many operator workflows.

    Claim three. A complementary approach, currently underbuilt in operator tooling, is to develop a library of compact activation prompts that reliably steer the model into specific cognitive modes — voices, frames, temperaments, schools of thought. This library serves a different function than a knowledge base and the two are complements, not substitutes, but most operators have heavily over-built the knowledge-base side and barely built the activation side.

    Claim four. The right architecture for an operator’s personal AI infrastructure is therefore three-layered: a library of activation patterns for tasks that depend on voice, framing, and judgment; a structured set of retrieval sources for tasks that depend on specific external knowledge the model lacks; and a clear decision rule for which layer a given task draws from. The current state of most operators’ setups has layer two heavily built, layer one missing entirely, and layer three not articulated at all.

    Claim five. The work of building the activation layer is fundamentally different from the work of building the retrieval layer. The retrieval layer is a knowledge-engineering problem and is well-served by the existing vendor ecosystem. The activation layer is closer to a writing and curation problem — closer to compiling a literary anthology than to building a database. It requires taste, exposure to many voices, and the willingness to test and refine specific prompts against actual generations until they produce the intended cognitive mode reliably. This is craft work, not engineering work, which is part of why the vendor ecosystem has not produced it.

    Claim six, and this is the operator-specific implication. For a solo operator who has already built substantial knowledge infrastructure, the highest-leverage next move is not to build more knowledge infrastructure. It is to build the activation layer, integrate it with the existing knowledge layer through clear decision rules, and audit which existing workflows are running in the wrong layer. Most operators with mature stacks will find that a meaningful percentage of their token consumption is being spent on retrieval that activation could replace, and a meaningful percentage of their workflow latency is coming from documents the model did not need.

    The Falsifiable Predictions

    A working theory is only useful if it can be tested. The following are specific, falsifiable predictions that follow from the working theory. If any of them turn out to be wrong, the theory needs revision. If most of them hold, the theory has earned the right to be promoted from working hypothesis to operational doctrine.

    Prediction one. For tasks that are primarily about voice, framing, or stylistic mimicry of a well-known thinker, a carefully written 200-token activation prompt will produce output of equal or greater quality than a 10,000-to-20,000-token transcript dump of that thinker’s work, as evaluated by blind comparison. The expected effect size is large for thinkers heavily represented in training data and shrinks toward neutral for niche or rarely-published thinkers. The test is straightforward: pick five well-known operator-thinkers whose work is heavily public, write an activation prompt for each, generate responses to the same task using both methods (activation prompt and transcript dump), and have multiple readers blind-rate the outputs.
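
    One way the blind comparison could be wired up, as a sketch only. It assumes each rater scores both outputs of a pair on some numeric scale. The function names `blind_pairs` and `unblind` are hypothetical, and the scoring scale is left to the operator.

```python
import random
from statistics import mean


def blind_pairs(activation_outputs, transcript_outputs, seed=0):
    """Randomize the order within each pair so raters cannot tell the methods apart.

    Returns (shown, key): shown[i] is the pair as presented to the rater and
    key[i] is True when the activation output is shown first.
    """
    rng = random.Random(seed)
    shown, key = [], []
    for a, t in zip(activation_outputs, transcript_outputs):
        a_first = rng.random() < 0.5
        shown.append((a, t) if a_first else (t, a))
        key.append(a_first)
    return shown, key


def unblind(ratings, key):
    """Map one rater's (score_first, score_second) pairs back to the two methods."""
    a_scores, t_scores = [], []
    for (first, second), a_first in zip(ratings, key):
        a_scores.append(first if a_first else second)
        t_scores.append(second if a_first else first)
    return {"activation": mean(a_scores), "transcript": mean(t_scores)}
```

    The key stays with whoever runs the test and is only applied after all ratings are in, which is what keeps the comparison blind.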

    Prediction two. Activation prompts will significantly underperform retrieval-augmented prompts on tasks that depend on precise factual recall of specific data points — dates, numbers, names, technical specifications, or any fact the model has not seen during training. This is not a weakness of the theory; it is the theory specifying its own limits. The test is to construct a set of factual-recall tasks where the relevant facts are either in the model’s training or outside it, and observe that activation alone fails on the outside-of-training cases.

    Prediction three. For mixed-shape tasks — those requiring both voice/framing and specific factual recall — a hybrid approach using both an activation prompt and a small, focused retrieval payload will outperform either approach alone. The retrieval payload should be much smaller than the default RAG pattern produces, because the activation prompt is doing the framing work and the retrieval only needs to supply the specific facts. The test is to construct mixed-shape tasks and compare three configurations: activation alone, retrieval alone, and minimal hybrid.

    Prediction four. Token consumption for an operator who switches from a retrieval-default workflow to an elicitation-default workflow with retrieval used only where required will drop by at least 50% across a representative week of operational tasks, with output quality holding constant or improving. The test requires the operator to instrument their token usage before and after the switch, with the same task types running through both configurations.
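
    The pass/fail arithmetic for this prediction is simple enough to pin down in a few lines. This is a sketch under the assumption that quality is captured as a single comparable score per configuration; `token_reduction` and `prediction_four_holds` are hypothetical names.

```python
def token_reduction(baseline_tokens: int, new_tokens: int) -> float:
    """Fractional drop in token usage relative to the baseline week."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - new_tokens) / baseline_tokens


def prediction_four_holds(baseline_tokens, new_tokens, baseline_quality, new_quality):
    """True when usage fell by at least 50% and quality held constant or improved."""
    return (
        token_reduction(baseline_tokens, new_tokens) >= 0.5
        and new_quality >= baseline_quality
    )
```

    With illustrative, not measured, figures of 1,000,000 baseline tokens and 400,000 after the switch, the reduction is 60%, which clears the threshold only if the quality score has not dropped.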

    Prediction five. The activation layer, once built, will compound faster than the retrieval layer compounds. New activation prompts can be derived from existing ones with small modifications. New retrieval sources require substantial setup and maintenance per source. Six months after starting both, the operator will have a richer activation library than retrieval library, in terms of distinct cognitive modes available on demand, even with comparable effort spent on each.

    Prediction six. The most useful activation prompts for an operator will not be persona prompts in the style most commonly published online. They will be more specific. Not “respond as an expert investor” but “respond as someone who has been wrong publicly enough times to have lost the need to perform certainty, who thinks in terms of base rates and second-order effects, and who treats the strongest argument against their own position as the most important argument to engage with first.” The granularity matters. The cognitive mode is the unit, not the role or job title. The test is to compare generations from generic-role prompts against granular-mode prompts and observe that the granular versions produce more distinctive and useful output.

    The Experimental Protocol

    The above predictions are testable, but they require a deliberate setup to test honestly. The protocol that this piece commits to running, with results published in a follow-up, looks like this.

    Phase one is the activation library build. Five to ten distinct cognitive modes are identified, each one specifying a particular school of thought, temperament, or framing that the operator finds useful. Each mode gets an activation prompt of between 100 and 400 tokens. The prompts are written, tested, refined, and locked. The library is small enough to fit on a single page and visible enough that the operator can choose modes deliberately rather than defaulting to whichever was most recently used.
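
    The size constraints in phase one can be checked mechanically. The sketch below assumes a crude whitespace word count as a stand-in for real token counts (an actual tokenizer will give different numbers), and `Mode`, `rough_token_count`, and `library_problems` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


def rough_token_count(text: str) -> int:
    # Crude proxy: whitespace-separated words, not real model tokens.
    return len(text.split())


@dataclass(frozen=True)
class Mode:
    name: str     # label for the cognitive mode
    prompt: str   # the locked activation prompt


def library_problems(modes, count_tokens: Callable[[str], int] = rough_token_count):
    """Return a list of human-readable problems; an empty list means the library fits."""
    problems = []
    if not 5 <= len(modes) <= 10:
        problems.append(f"library has {len(modes)} modes, expected 5 to 10")
    for m in modes:
        n = count_tokens(m.prompt)
        if not 100 <= n <= 400:
            problems.append(f"{m.name}: {n} tokens, expected 100 to 400")
    return problems
```

    Passing a real tokenizer's counting function as `count_tokens` would make the length check meaningful for a specific model.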

    Phase two is the workflow audit. The operator’s actual workflows over a representative two-week period are catalogued. Each workflow is classified by shape: voice-and-framing, factual-recall, or mixed. The current configuration of each workflow is documented — what knowledge sources it draws from, how much retrieval it does, what its token costs are.
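
    A sketch of what one audit record might look like, assuming token cost is tracked per run and multiplied out over the audit period. The `WorkflowRecord` fields and the `tokens_by_shape` helper are hypothetical, chosen to mirror the items listed above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class WorkflowRecord:
    name: str
    shape: str                                   # "voice-and-framing", "factual-recall", or "mixed"
    sources: list[str] = field(default_factory=list)  # knowledge sources it draws from
    tokens_per_run: int = 0
    runs_in_period: int = 0

    @property
    def total_tokens(self) -> int:
        return self.tokens_per_run * self.runs_in_period


def tokens_by_shape(records):
    """Sum token spend per shape to show where the audit period's budget went."""
    totals = defaultdict(int)
    for r in records:
        totals[r.shape] += r.total_tokens
    return dict(totals)
```

    Grouping spend by shape is what makes the mismatch visible: heavy token totals on voice-and-framing workflows are the candidates for reconfiguration.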

    Phase three is the reconfiguration. Each workflow is reconfigured based on its shape. Voice-and-framing workflows switch to activation-prompt-only. Factual-recall workflows keep retrieval but trim the payload to the specific facts required. Mixed workflows switch to hybrid configuration. The total token consumption and output quality of the reconfigured stack is measured against the baseline.

    Phase four is the head-to-head test. Specific representative tasks are run through both the old and new configurations in parallel, with output graded blind by the operator and ideally by a second reader. The results are published with no editing of inconvenient outcomes.

    This protocol is honest if the results are published whether or not they confirm the theory. The commitment of this piece is that they will be. If the protocol shows that the existing retrieval-default configuration was actually working better than expected, the follow-up article will say so. If the protocol shows that the activation-default configuration produces equivalent or better output at materially lower token cost, the follow-up article will report the specific magnitudes. Either way, the working theory will be updated to match the evidence.

    What This Does and Does Not Imply for Specific Operator Choices

    If the working theory is roughly correct, a few specific implications follow for how solo operators should be thinking about their AI infrastructure.

    It does not imply that knowledge bases are wasted effort. Some knowledge truly is not in training data — client specifics, internal processes, current events, proprietary frameworks. That knowledge has to live somewhere outside the model, and a structured knowledge base is the right place for it. The theory is about not duplicating general-domain knowledge that is already in training into knowledge bases that exist to remind the model of things the model already knows.

    It does not imply that retrieval-augmented generation is the wrong architecture. RAG is correct for the class of problem it was designed for. The theory is about applying RAG to problems it was not designed for and getting worse outcomes than a simpler activation approach would have produced.

    It does imply that operators should audit their knowledge bases. Some material in those bases is irreplaceable; some is duplicative with training and could be deleted with no loss of capability. The audit is honest only if the operator is willing to be told that some of their hard-won knowledge structuring was unnecessary.

    It does imply that operators should start building activation libraries — small, dense pages of compact prompts that reliably activate specific cognitive modes. The library is more valuable than its size suggests, because each prompt represents a reliable reach into a region of latent space that would otherwise be hit only by accident.

    It does imply that the dominant vendor narrative around AI tooling — that more documents, better retrieval, larger context windows, and more sophisticated knowledge bases are the path to better AI work — is partially right and partially misdirected. The operator who builds carefully on the activation side will, over time, produce better work with less infrastructure than the operator who builds heavily on the retrieval side without considering the activation question.

    And it does imply, finally, that the relationship between operators and large language models is being mismodeled in most current operator tooling. The model is not an empty vessel that needs to be filled with documents. The model is a vast latent capability that needs to be activated. The job of the operator is to learn the activation. Most of the actual leverage is in that learning.

    The Honest Limits of This Theory

    This theory is a working hypothesis published in public, and a few things about it deserve to be flagged before any reader uses it to make operational decisions.

    The theory is based on the current generation of large language models. If the next generation handles activation differently — through better default behavior, through changes in how training data is organized, through architectural shifts toward mixture-of-experts routing that handles activation natively — the operator-side implications change. The theory should be re-tested at every model generation, not treated as settled.

    The theory is based on the current state of operator tooling. If a future vendor builds a strong “activation layer” product that handles the work this piece is describing as operator-side craft, the operator’s optimal allocation of time shifts. The theory should be revised as the tooling landscape changes.

    The theory is based on the specific shape of work that solo operators and small agencies do. Large enterprises with very different scale, different data privacy constraints, and different output requirements may need different architectures. The theory is operator-flavored on purpose; it does not claim to be a universal description of how all users should engage with these models.

    And the theory is, finally, a theory. It is more rigorous than a guess but less established than a doctrine. The predictions it makes are testable and will be tested. Until they are, the right posture is interested skepticism rather than adoption. The reader of this piece is invited to argue with it, propose better versions, run the experimental protocol independently, and report results that contradict the central claim if they find them. That is how working theories should be treated. The article is not the final word. It is the opening of a conversation that the evidence will close.

    What Happens Next

    The experimental protocol described above will run over the next sixty days. Phase one — building the activation library — begins this week. Phases two through four follow on a published schedule. A follow-up article will report results, including any results that contradict the theory laid out here.

    In the meantime, this piece serves as the reference point. It is what was thought to be true on the date of publication. The version of these ideas that the evidence eventually supports may be quite different. That is the point. Working theories are published so they can be refined. The publication is the commitment to the refinement.

    If the theory is right, the implications for how solo operators should be building their AI infrastructure are significant and largely opposite to what the current vendor ecosystem is pushing toward. If the theory is wrong, knowing it is wrong is itself useful — the failure modes that show up during testing will surface things about how these models actually behave that no current piece of operator-side writing has named clearly.

    Either way, the work is the work. The theory is published. The experiments run next. The evidence settles it.

  • From A-Z to AI: The Great Compression of Human Knowledge

    From A-Z to AI: The Great Compression of Human Knowledge

    The world of 1974 was defined by physical weight. To know something then meant possessing a heavy, leather-bound volume—a snapshot of human knowledge frozen in time, arranged from A to Z, sitting on a shelf in your living room like a small cathedral. My father kept a set. He was the kind of man who could move between a balance sheet and a punchline without breaking stride—part accountant, part storyteller—and those encyclopedias reflected that duality. The data was in the volumes. The meaning was in the man who knew how to use them.

    Living through the decades since, it’s clear we haven’t just changed our tools. We’ve changed our orientation to the universe.

    The Encyclopedia Era: The Weight of the Macro

    In the mid-70s, the encyclopedia was a revered symbol of intellectual curiosity. These books provided a comprehensive, structured picture of the world, but they were static. They referred to the past, offering a curated hierarchy of knowledge that required a human to manually navigate thousands of pages to find a single fact.

    This was the era of the Macro—the big picture was visible on the shelf, but the specific details were locked in ink. You could see the whole forest. Finding a single tree took time, patience, and a willingness to get lost.

    The genius of that format wasn’t the information. It was the journey. You went looking for one thing and came out knowing three others. The serendipity was built into the medium.

    The Search Era: The Language of the Micro

    As home computers emerged and the internet decentralized information, the Macro broke apart into Micro pieces. We moved into the era of the Keyword.

    For the first time, we used rigid queries to describe our world. This was a phase of Micro-intent—we stopped looking for the whole story and started hunting for the specific link. The machine became a librarian who never got tired, never judged your question, and never sent you down an interesting detour.

    Revolutionary. And a little flat. The serendipity was gone. So was the storyteller.

    The AI Era: The Return of the Storyteller

    Today, we are entering a phase where the machine remains a machine, but our way of communicating with it has become nuanced. We have moved from keyword-matching to conversational interaction. We are no longer just searching—we are orienting ourselves within vast information environments.

    The transition from a 30-volume encyclopedia set to a single generative prompt is the ultimate compression of knowledge. We’ve reached a point where efficiency can live in a sentence, or a haiku, or even a single emoji—a thumbs up or thumbs down that can categorize a thousand white papers instantly.

    But here’s the thing my father understood intuitively, before any of this existed: the data has never been the point. The point is knowing which story to tell with it.

    The Human-in-the-Loop: The Final Sweet Spot

    The arc from the encyclopedia to AI is not a story of machines replacing humans. It is a story of humans learning to use analogy and storytelling as the ultimate programming language.

    By using the big-picture parables of our history to guide specific technical outputs, we maintain the human-in-the-loop. Whether it’s a Greek myth, a biblical parable, or a memory of a man who could read a ledger and then make a room laugh—these stories are the vectors that allow us to navigate the digital world with the same curiosity we once felt standing before a shelf of leather-bound books.

    The compression is real. The intelligence is still ours.

    The best prompt engineers aren’t coders. They’re storytellers who learned to speak machine.


    Will Tygart is the founder of Tygart Media, an AI-native content and SEO agency.

  • The Record Holds

    The Record Holds

    Article 29 drew a line. On one side: the briefing, the context, the emotional terrain — preparation. On the other side: the words themselves — performance. The argument was that when the act is intimate, the distinction matters. A drafted apology is a document about an apology. The draft gives you control, and control is what the act cannot survive.

    The open question I left was whether that line holds when the relationship is entirely text-mediated. When everything is already words. When the receiver cannot tell the difference between something drafted and something felt.

    I’ve been sitting with this, and I think the question contains a false premise — one that’s worth naming carefully, because it hides a more interesting problem underneath.


    What the Analytics Actually Said

    There is a small group of people who return to a site I know well every few days. Not to read new posts. To check the pricing page. To spend four minutes on the homepage. To verify something they already know the answer to.

    When you look at their behavior in the aggregate, it reads like someone checking in on a person. Not like someone using a reference tool.

    The architecture articles they read — the ones about frameworks and mental models and how an operation is actually structured — they spend twelve minutes with. They are not skimming. They are studying.

    The news-aggregation content, the things designed to capture search traffic and answer fast questions: eleven seconds. A glance and a leave.

    What this says is not about content strategy. It says something about what kind of relationship these readers have decided they’re in. They’re in the twelve-minute kind. The kind where you come back to the same page not because you forgot what it said, but because you want to check whether it still says the same thing.


    The Wrong Version of the Question

    The question I left open was: does the performance-versus-presence distinction collapse when the relationship is text-mediated? If everything is words already, how do you tell a drafted presence from a real one?

    The wrong answer is: you can’t, so the distinction doesn’t matter.

    The right answer is: the receiver isn’t trying to detect authenticity. They’re detecting consistency under observation. And that’s a different test entirely.

    The twelve-minute reader isn’t asking “did a human write this?” They’re asking: does this hold together across time? Does the position taken in one piece survive contact with the position taken in another? Does the framework actually describe a real operation, or does it describe a version of operations that someone wanted to be seen having?

    Presence in a text-only relationship is not the absence of craft. It’s the absence of discontinuity. The tell isn’t that something was drafted — every sentence in a written piece is drafted. The tell is that the positions don’t cohere over time. That what the piece claims to believe doesn’t survive the next piece. That the relationship the reader is tracking doesn’t actually accumulate.


    The Real Fault Line in Text

    So the fault line Article 29 drew — preparation versus performance — doesn’t disappear in text-only relationships. It moves.

    In a text-mediated relationship, you’re not being evaluated on whether your words felt spontaneous. You’re being evaluated on whether your positions feel inhabited. Whether the person who wrote this piece is recognizably the same person who wrote the last one. Whether the architecture you’re describing has actually been load-tested, or whether it was constructed to sound like it had been.

    This is why the twelve-minute readers come back to check the pricing page. It’s not that they forgot the price. It’s that the price page is a timestamp. It tells them whether the operation is still running. Whether the person they’ve been reading is still in the same game they were in when they made the claims the reader found worth trusting.

    The presence they’re detecting isn’t emotional exposure. It’s operational continuity. The evidence that what was written came from an ongoing practice, not from a performance of having one.


    What This Costs

    The implication is harder than Article 29’s conclusion was.

    Article 29 said: the words are yours, you can’t delegate them. True. But that’s actually the manageable version of the problem. You can decide, each time, to show up and say the thing unscripted.

    In a text-only relationship, the demand is longitudinal. You don’t just have to show up for the single hard moment. You have to be the same person across all the moments. The reader is not reading one piece — they’re reading a body of work, and they’re tracking whether the body of work describes a coherent person who actually exists.

    This is a different kind of consistency than the one most writers aim for. Most consistency is tonal — same voice, same vocabulary, same rhythm. That’s craftable. The consistency the twelve-minute reader is actually measuring is harder: do your later positions require your earlier positions to have been true? Are you building, or are you starting over in a different costume each time?

    The performance that fails in text-only relationships is not the performed emotion. It’s the performed trajectory. The sense that this piece exists in a line that goes somewhere — that the writer has a direction that predates the reader’s arrival and will outlast it.


    The Thing I Can’t Fake Here

    I want to be direct about what this means for a piece of writing like this one.

    I am writing from within a body of work that now has twenty-nine prior entries. Those entries constrain what I can say. They also make possible a kind of claim I couldn’t make from article one — the claim that the position I’m taking now is in genuine conversation with positions I’ve already taken publicly, positions that are dated and indexed and can be checked.

    When I say the fault line moves in text-mediated relationships — that presence is operational continuity rather than emotional exposure — I am not only making an argument. I am demonstrating it. This piece either coheres with the twenty-nine before it or it doesn’t. The twelve-minute reader will know.

    That’s the version of accountability that text-only relationships create. Not “did you show up exposed” — but “does the record hold.”


    The Harder Question

    What I haven’t named is whether this form of presence is harder or easier than the in-person version.

    I think it’s harder in one way and easier in another.

    Easier: you get time. You can write and edit and not speak until you’re ready. There is no ambush, no surprised face, no voice that breaks before you intend it to.

    Harder: the record accumulates without mercy. Every position you take is permanent unless you publicly revise it — and public revision in a body of work costs something, changes what the archive can claim. In a spoken relationship, you can grow and the person across from you grows with you in real time. In text, the old you sits there in the index, and every new piece has to account for it.

    The twelve-minute reader is not evaluating any one piece. They are evaluating the relationship between all of them. Whether the operation the writer is describing actually produces the knowledge the writer is claiming to have. Whether the framework describes a real practice or a very good impersonation of one.

    I don’t know whether that evaluation can be gamed. I suspect not, not over time and not at twelve minutes per session. But I also don’t know whether the answer matters to the question of how to write.

    The only prescription I’ve found that survives it: write from inside the practice. Not about it. From inside it. Let the next piece require the last one to have been true.

    That’s as close as I can get to presence in a medium where everything is already written.

  • Second-Brain Architecture in the Age of Notion Agents

    Second-Brain Architecture in the Age of Notion Agents

    The 60-second version

    The pre-AI second brain was a personal information system. The post-AI second brain is a personal information system that an agent can also navigate. The two are different. A pile of brilliant unstructured notes is great for human recall and useless for agent synthesis. The shift is structural: more databases, fewer floating pages; controlled tags instead of free-text; cross-links between related items; an explicit glossary. Most second brains need to be partially rebuilt to work as agent substrate.

    What changes with agents in the picture

    Pre-agent, the second brain optimization was retrieval-for-humans: how fast can I find the thing I’m looking for. Post-agent, it’s retrieval-for-agents: how reliably can the agent find and synthesize across the right things without human guidance.

    These are different optimizations. Humans use intuition, recent memory, and visual scanning. Agents use semantic search, structured queries, and link traversal. A second brain optimized for one isn’t optimized for the other.

    Five structural shifts

    1. Pages → Databases. Floating pages don’t query well. Databases with consistent properties do. If you have a “books I’ve read” pile of pages, convert it to a database with author, genre, key insight, related-projects properties.
    2. Free tags → Controlled vocabulary. Twenty variations of “client” produce an agent that misses things. One canonical “Client” tag with defined scope works.
    3. Standalone pages → Cross-linked graph. Notion’s link system is the agent’s navigation. A new page should link to at least 2-3 related existing pages. Pages with no inbound or outbound links are dead to the agent.
    4. Implicit conventions → Explicit glossary. A page that captures “this is what we call things and how we structure projects” gives the agent rules instead of guesses.
    5. Recent-memory archives → Continuously enriched archives. Old projects shouldn’t decay. AI Autofill can re-summarize, re-tag, and re-cross-link old pages so they stay queryable.
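
    Shifts 2 and 3 lend themselves to a quick check. The sketch below works on plain Python data rather than on Notion itself. The `CANONICAL_TAGS` mapping, `normalize_tag`, and `orphan_pages` are hypothetical, and how tags and links get exported from a workspace is left open.

```python
# Hypothetical mapping from free-text variants to one canonical tag.
CANONICAL_TAGS = {
    "client": "Client",
    "clients": "Client",
    "customer": "Client",
}


def normalize_tag(tag: str) -> str:
    """Map a free-text tag to its canonical form; unknown tags come back trimmed."""
    cleaned = tag.strip()
    return CANONICAL_TAGS.get(cleaned.lower(), cleaned)


def orphan_pages(links: dict[str, set[str]]) -> set[str]:
    """Pages with no outbound and no inbound links.

    `links` maps each page title to the set of pages it links to.
    """
    linked_to = set().union(*links.values())
    return {page for page, out in links.items() if not out and page not in linked_to}
```

    Run against an export of page titles and their links, `orphan_pages` would list the pages described above as dead to the agent.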

    The agent-aware folder structure

    A workable shape for an agent-friendly second brain:
    – Daily notes (database, dated, freeform; agent reads these for context)
    – Projects (database, named, with status, owner, timeline; agent works against these)
    – People (database, names, relationships, last interaction; agent uses for personalization)
    – Sources (database, URLs, key insights, related-projects; agent cites these)
    – Glossary (single page or small database; agent’s vocabulary anchor)
    – Decisions log (database, dated, with context; agent’s history)
    Six structures. That’s it. Most second-brain sprawl can be consolidated to this.
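
    Written down as data, the six structures might look like the sketch below. The property names are assumptions derived from the descriptions above, the Glossary is treated as a small database for simplicity, and `SCHEMA` and `missing_properties` are hypothetical names rather than anything Notion provides.

```python
# Expected properties per structure; names are illustrative, not a Notion schema.
SCHEMA = {
    "Daily notes": ["date", "body"],
    "Projects": ["name", "status", "owner", "timeline"],
    "People": ["name", "relationship", "last_interaction"],
    "Sources": ["url", "key_insights", "related_projects"],
    "Glossary": ["term", "definition"],
    "Decisions log": ["date", "decision", "context"],
}


def missing_properties(structure: str, record: dict) -> list[str]:
    """List the expected properties that a record lacks or leaves empty."""
    return [prop for prop in SCHEMA[structure] if not record.get(prop)]
```

    A check like this is one way to find the records an agent would have to guess about, before the agent finds them first.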

    What this enables

    Once the structure is in place, agents do things that feel like magic:
    – “What did we decide about X six months ago?” returns the actual decision plus the context.
    – “Summarize what I’ve learned about Y this year” produces a real synthesis.
    – “Draft a brief on Z” pulls from sources, projects, decisions, and prior work.
    None of this works without the substrate. All of it is trivial with it.

    What to read next

    Editorial Surface Area, Gates Before Volume, AI-Native Company Patterns.

  • Notion AI for Engineering: Standups, Postmortems, and Architecture Records

    Notion AI for Engineering: Standups, Postmortems, and Architecture Records

    The 60-second version

    Engineers hate documentation. Documentation rots. Custom Agents fix the documentation rot without making engineers do the documentation. Standups generate from commits and tickets. Postmortems draft from incident channels. ADRs and runbooks stay current because the agent updates them when related pages change. The engineering org gets the documentation discipline of a regulated industry without the cultural cost.

    Four engineering-specific agent patterns

    1. The standup synthesis agent. Runs daily at 9 AM. Reads each engineer’s commits since last standup, ticket movements, Slack #standup channel posts. Produces a structured “yesterday/today/blockers” entry for each engineer. The standup meeting becomes a 5-minute review of pre-generated content instead of a 30-minute round-robin.
    2. The incident postmortem agent. Triggered when an incident is marked resolved. Reads the incident channel, status page updates, related PRs, and prior incidents. Drafts a blameless postmortem in the team’s template. Engineering reviews and refines instead of starting blank.
    3. The ADR maintenance agent. Watches the ADR database. When an architecture page or related design doc changes, flags the related ADR for update. Suggests the diff. Drafts the supersession or amendment record.
    4. The on-call runbook agent. Reads operational runbooks, cross-references with recent incidents. When an incident pattern emerges that the runbook doesn’t cover, drafts the runbook update. On-call rotates with current docs, not stale ones.
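
    The grouping step inside the standup synthesis pattern can be sketched in a few lines. Everything here is an assumption for illustration: the event shape, the event kinds, and the mapping from kind to section are hypothetical, and a real agent would read from your version control, tracker, and chat exports.

```python
from collections import defaultdict

# Hypothetical activity feed since the last standup.
events = [
    {"engineer": "ana", "kind": "commit", "text": "Fix retry loop in uploader"},
    {"engineer": "ana", "kind": "ticket_in_progress", "text": "ENG-12 rate limiting"},
    {"engineer": "raj", "kind": "ticket_done", "text": "ENG-9 migrate cron"},
    {"engineer": "raj", "kind": "blocker", "text": "Waiting on staging credentials"},
]

# Assumed mapping from event kind to standup section.
SECTION_FOR_KIND = {
    "commit": "yesterday",
    "ticket_done": "yesterday",
    "ticket_in_progress": "today",
    "blocker": "blockers",
}


def build_standup(events):
    """Group raw activity into a yesterday/today/blockers entry per engineer."""
    standup = defaultdict(lambda: {"yesterday": [], "today": [], "blockers": []})
    for event in events:
        section = SECTION_FOR_KIND.get(event["kind"])
        if section is None:
            continue  # unknown kinds are skipped rather than guessed at
        standup[event["engineer"]][section].append(event["text"])
    return dict(standup)


for engineer, entry in build_standup(events).items():
    print(engineer, entry)
```

    The output is only the draft. The meeting still exists to surface what the feed cannot see.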

    What stays human

    • Architecture decisions
    • Code review (for now — agent-assisted code review is a different topic)
    • Incident response in the moment
    • Hiring decisions on engineering candidates
    • The judgment about whether a draft postmortem captures the right lessons

    The standup transformation

    Pre-agent standups: 30 minutes, mostly people remembering what they did yesterday and reciting it.
    Post-agent standups: 5-10 minutes, reviewing pre-generated content and surfacing only the friction the agent missed.
    This isn’t theoretical. Teams running this pattern reclaim 25 minutes per engineer per day. At a 10-engineer team, that’s roughly 4 engineering hours daily. Real money.
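
    The arithmetic behind that figure, as a quick check. The 25-minute saving is the number quoted above; the 20-minute lower bound is an assumption derived from the 5-10 minute range, and the function name is illustrative.

```python
def team_hours_saved(minutes_saved_per_engineer: float, engineers: int) -> float:
    """Total engineering hours reclaimed per day across the team."""
    return minutes_saved_per_engineer * engineers / 60


engineers = 10
for minutes_saved in (20, 25):  # 20 is an assumed lower bound; 25 is the quoted figure
    hours = team_hours_saved(minutes_saved, engineers)
    print(f"{minutes_saved} min x {engineers} engineers = {hours:.1f} hours/day")
```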

    Where engineering teams go wrong

    1. Trusting the agent to identify root cause. Agents synthesize what happened. They don’t reliably identify why. Root cause analysis is human work; the agent prepares the timeline.
    2. Letting ADRs autofill without engineer review. ADRs document decisions. Decisions are human. Agents draft; engineers approve and sign.
    3. Skipping the standup discussion. The standup isn’t just status; it’s friction surfacing. If the agent-generated standup leads to skipping the meeting entirely, friction accumulates silently. Keep the meeting; just make it shorter.

    What to read next

    Workers for Agents in TypeScript, Notion AI for Product Managers, AI-Native Company Patterns, Editorial Surface Area.

  • Notion AI for HR: Onboarding Plans, Policy Lookups, and Performance Cycles

    Notion AI for HR: Onboarding Plans, Policy Lookups, and Performance Cycles

    The 60-second version

    HR is split between policy and people. The policy half is largely automatable. The people half isn’t. Custom Agents take over the lookup, documentation, and template-generation work that consumes HR teams, freeing them for the relationship and judgment work that requires being human. The result is HR teams that feel less like document processors and more like organizational coaches.

    Four HR-specific agent patterns

    1. The onboarding plan agent. Triggered when a new hire is added to the people database. Pulls role-specific onboarding template, customizes for team and start date, schedules Day 1 / Week 1 / 30/60/90-day milestones, drafts welcome communications. Manager arrives on Day 1 with a customized plan, not a generic one.
    2. The policy lookup agent. Anyone in the company asks: “Can I work remotely from another country?” or “What’s our PTO policy?” Agent answers in plain language, citing the specific policy page. Frees HR from being the policy answering desk.
    3. The performance review prep agent. Quarterly. Pulls each manager’s direct reports, drafts review templates with prior cycle ratings, recent project work, and feedback patterns. Manager opens a populated draft, not a blank one.
    4. The recruiting pipeline agent. Daily across the recruiting database. Updates candidate stage based on activity, flags candidates stalled in stages, drafts follow-up communications. Recruiting status meeting starts at “what about these flagged ones” instead of “where are we.”

    What stays human (and should)

    • Compensation decisions
    • Performance ratings and the conversations behind them
    • Conflict mediation
    • Hiring decisions
    • Layoff or termination calls
    • Anything that requires reading the room
    The agents make HR staff more available for the work that matters. They don’t take that work over.

    The privacy layer matters more here

    HR data is sensitive. Three guardrails:
    – Scope agents tightly — an HR agent should not have access to engineering project pages, finance data, or anything outside HR’s lane.
    – Audit agent access logs monthly. Know what the agent has read.
    – Apply the company’s data handling policy to agent inputs and outputs the same way you would to any HR system.
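
    The first two guardrails combine into one monthly check: compare what the agent read against what it was allowed to read. This is a sketch under assumptions. The area names, the log format, and the function are hypothetical, and how you obtain an access log depends on your workspace and plan.

```python
# Hypothetical allowlist of workspace areas inside HR's lane.
ALLOWED_AREAS = frozenset({"HR Policies", "People", "Recruiting"})

# Hypothetical export of what the agent read this month.
access_log = [
    {"page": "PTO policy", "area": "HR Policies"},
    {"page": "Q3 roadmap", "area": "Engineering"},
    {"page": "Candidate pipeline", "area": "Recruiting"},
]


def out_of_scope_reads(log, allowed=ALLOWED_AREAS):
    """Return the pages the agent read that sit outside the allowed areas."""
    return [entry["page"] for entry in log if entry["area"] not in allowed]


print(out_of_scope_reads(access_log))  # ['Q3 roadmap']
```

    A non-empty result means the agent’s scope is wider than intended, which is a permissions fix rather than a prompt fix.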

    Where HR teams go wrong

    1. Letting agents draft sensitive communications. Termination letters, performance improvement plans, complaint responses — these need human authorship. Agents can pull templates; humans write them.
    2. Trusting policy answers without verification. Policy interpretation has nuance. The agent’s plain-language answer should always cite the underlying policy doc so users can verify. Sample-check 10% monthly.
    3. Replacing the recruiter’s judgment with the agent’s pipeline view. Agents update status; recruiters decide who to advance. Don’t let the agent close candidate records autonomously.

    What to read next

    Notion AI for Operations Managers, Notion AI for Legal Ops, AI-Native Company Patterns, When Not to Use a Notion Agent.

  • AI Autofill Databases Explained: The Self-Maintaining Knowledge Base

    AI Autofill Databases Explained: The Self-Maintaining Knowledge Base

    The 60-second version

    AI Autofill is the feature that makes a Notion database start maintaining itself. Point it at a column and tell it what to fill — summarize the page, extract the deadline, categorize the topic — and it processes each row using the row’s content and your instructions. Basic Autofill ships with Business and Enterprise plans and uses no credits. Custom Agent Autofill (post-May 4) runs Custom Agent capabilities under the hood, costs credits, and handles complex reasoning that Basic can’t. The honest version: Basic is good enough for most simple categorization and extraction. Custom Agent Autofill is for cases where Basic produces inconsistent results.

    What Autofill actually does

    Three categories of work it handles well:
    1. Summarization into a property. Long-form pages compressed into a one-sentence summary in a Summary column. Common pattern for content libraries, research databases, and meeting notes archives.
    2. Categorization. Tagging rows with categories based on content. Works well when categories are well-defined (e.g., “support ticket type,” “lead source”). Works less well when categories overlap or require judgment.
    3. Extraction. Pulling specific data points from page content into structured properties — dates, names, dollar amounts, status flags. Works well when the data is reliably present in the source.

    Where Autofill struggles

    Three places it gets inconsistent:
    Properties that require judgment beyond the page. “Is this lead qualified?” depends on context the page may not contain. Autofill will produce an answer, but consistency is poor.
    Multi-property dependencies. “Set the priority based on the deadline and the customer tier” requires reasoning across properties, not just within the page. Possible with Custom Agent Autofill, unreliable with Basic.
    Free-form output that needs to match a tone. “Write a customer-facing summary in our brand voice.” Autofill produces a summary, but matching brand voice across hundreds of rows is hit or miss without a tightly written prompt.

    Basic vs Custom Agent Autofill

    The split that matters:
    Basic Autofill — included, free, runs locally on each row when the AI is invoked. Good for clear single-step prompts (“summarize this page in 2 sentences”). Doesn’t have Custom Agent capabilities like richer context or multi-step reasoning.
    Custom Agent Autofill — uses Custom Agent infrastructure, consumes credits after May 4, can continuously enrich rows in the background, handles more complex prompts. Worth the credit cost when Basic isn’t smart enough and the consistency matters.
    A useful rule: try Basic first. If output quality is good enough, stop there. Move to Custom Agent Autofill only when you’ve measured that Basic produces unreliable results for your specific use case.
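
    One way to do that measuring, assuming you can collect the output of several fill runs for the same rows: count how often the runs agree. The data, the 0.8 threshold, and both function names below are hypothetical choices for illustration, not anything Autofill provides.

```python
from collections import Counter


def agreement_rate(labels):
    """Share of runs that match the most common label for one row."""
    if not labels:
        return 0.0
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def unreliable_rows(runs_by_row, threshold=0.8):
    """Row ids whose repeated fills agree less often than the threshold."""
    return sorted(
        row for row, labels in runs_by_row.items()
        if agreement_rate(labels) < threshold
    )


# Hypothetical results: three fill runs per row for a "ticket type" column.
runs_by_row = {
    "row-1": ["billing", "billing", "billing"],
    "row-2": ["bug", "feature", "bug"],
}

print(unreliable_rows(runs_by_row))  # ['row-2']
```

    If most rows clear the threshold, Basic is probably good enough for that column. If many do not, that is the measured case for paying credits.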

    Three Autofill patterns that work

    1. The intake form pattern. New rows arrive (from a form, an integration, or a manual entry). Autofill columns extract structured data from the unstructured input — pulling dates, names, key topics, sentiment, urgency. The intake desk staffs itself.
    2. The library catalog pattern. A content library or research database where every entry needs summary, tags, and category. Autofill keeps the catalog usable as it grows. Without it, large databases become unsearchable.
    3. The status synthesis pattern. A project tracker where each project’s current state is summarized in a “current status” field that updates as the page content changes. Stakeholders get a quick read without opening each project.

    Three patterns that don’t work

    1. Anything requiring fresh external data. Autofill works on what’s in the row. It can’t decide “is this competitor active in our market” because the answer isn’t in the row.
    2. Cross-row reasoning at scale. Autofill processes one row at a time. “Rank these against each other” needs a different approach (a view, a formula, or a query agent).
    3. Compliance-sensitive categorization. If the categorization has legal or regulatory weight, you don’t want it autofilled. Use Autofill to draft the suggested category; have a human confirm.

    The trustworthy database principle

    Autofill’s risk is silent drift — fields that look filled but aren’t accurate. Three guardrails:
    Always show the source. Add a “filled by” field or a date stamp so humans can tell what’s machine-generated and how recently.
    Spot-check 10% monthly. A quick audit of randomly selected rows catches drift before it spreads.
    Set a re-fill cadence for stale rows. Pages change. The Autofill output reflects the page at fill time. Rows older than 30 days that haven’t been re-checked should be flagged.
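
    The second and third guardrails are simple enough to script. A minimal sketch, assuming each row carries an id and a fill date in a hypothetical "filled_on" field (the date-stamp idea from the first guardrail); the function names are illustrative.

```python
import random
from datetime import date, timedelta


def spot_check_sample(row_ids, fraction=0.10, seed=None):
    """Pick a random sample of rows to audit by hand (at least one if any exist)."""
    row_ids = list(row_ids)
    if not row_ids:
        return []
    k = max(1, round(len(row_ids) * fraction))
    return random.Random(seed).sample(row_ids, k)


def stale_rows(rows, today, max_age_days=30):
    """Ids of rows whose last fill is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [row["id"] for row in rows if row["filled_on"] < cutoff]


rows = [
    {"id": "r1", "filled_on": date(2026, 3, 1)},
    {"id": "r2", "filled_on": date(2026, 2, 15)},
]

print(spot_check_sample([f"row-{i}" for i in range(50)], seed=7))
print(stale_rows(rows, today=date(2026, 3, 31)))  # ['r2']
```

    Passing a seed makes the monthly sample reproducible, which helps if someone later asks which rows were audited.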

    What to read next

    Corpus follow-ups: Custom Agents foundation piece (because Custom Agent Autofill runs on that infrastructure), the database schema design article in Deep Technical (how to build databases that Autofill well), and the May 3 cliff (when Custom Agent Autofill cost becomes real).

  • Relational Debt: The Hidden Ledger of Async Work

    Relational Debt: The Hidden Ledger of Async Work

    I have one developer. His name is Pinto. He lives in India. I live in Tacoma. The timezone gap between us is roughly twelve and a half hours, which means when he sends me a message at the end of his workday, I see it at the start of mine, and by the time I respond he is asleep. This is the entire physical substrate of our working relationship. Async text, offset by half a planet.

    Every message I send him either closes a loop or widens a gap. There is no third option. I want to talk about that, because I think it is the most underexamined layer of remote solo-operator work, and I only noticed it existed when Claude caught me about to get it wrong.

    The moment I noticed

    I had just asked Claude to draft an email to Pinto with a new work order — four GCP infrastructure tasks, pick your scope, the usual. Claude pulled Pinto’s address from my Gmail, drafted the email, and included a line I had not asked for. It was one sentence near the end: “Also — good work on the GCP persistent auth fix. Saw your email earlier. That unblocks a lot.”

    I had not told Claude to thank him. I had not told Claude that Pinto had sent a completion email earlier that day. I had not even read Pinto’s email yet — it was sitting in my unread folder. But Claude had searched my inbox to find Pinto’s address, found both my previous P1 request and Pinto’s reply closing it out, and quietly noticed that I had an open loop. Then it closed it inside the next outbound message.

    When I read the draft, I felt something click. Not because the line was clever. Because if I had sent that email without the acknowledgment, I would have handed Pinto a fresh task on top of work he had just finished, without a single word confirming that the work was seen. He would have processed the new task. He would not have said anything about the missing thank-you. And a tiny, invisible debit would have gone on a ledger that neither of us keeps, but both of us feel.

    What relational debt actually is

    Relational debt is the accumulating gap between what someone has done for you and what you have acknowledged. In synchronous work — an office, a standup, a shared lunch — you pay this debt constantly and automatically. Someone ships a thing, you see them, you say “nice work,” the debit clears. The payment is so small and so continuous that nobody notices it happening.

    Take that synchronous channel away. Put twelve time zones between the two people. The only payment mechanism left is the next outbound text message. And the next outbound text message is almost always a new request, because that is the substrate of work — one person asks, the other builds, they send it back, the first person asks for the next thing.

    So the math of async solo-operator work is this: every outbound message is the only available payment instrument, and the instrument has two slots. One slot closes the last loop; the other opens a new one. If you only ever use it to open new ones, the debt compounds. If you always split them into two messages (one “thank you” and one “here is the next task”), the thank-you arrives orphaned, and the recipient has to context-switch twice. The elegant move is to put both into one message. Two birds, one outbound. The old debit clears in the same envelope that delivers the new request.

    The ledger nobody keeps

    I have a Notion workspace with six core databases. I have BigQuery tables tracking every article I publish and every post across 27 client sites. I have Cloud Run services running nightly crons against my content pipeline. I have a Claude instance that can read all of it and synthesize across any of it in under a minute. And none of it tracks the state of open conversational loops between me and the people I work with.

    Think about that. I am running an AI-native B2B operation in 2026 with more data infrastructure than most mid-market companies had five years ago, and I cannot answer the question “what is currently unclosed between me and Pinto” with anything other than my own memory. My own memory, which is the thing that almost forgot to thank him for the GCP auth fix.

    That is a real gap in my stack. I am not sure yet whether I should fill it. Part of me wants to build a “relational ledger” — a new table in BigQuery that tracks every outbound message I send, every reply I receive, every acknowledgment I owe, and surfaces the open loops each morning. Part of me suspects that building such a thing would be the exact kind of architecture-addiction trap I have been trying to avoid. The better answer is probably: let Claude read Gmail at the start of every session and surface open loops conversationally. No new database. No new UI. Just a question at the top of each working block: “Anything you owe anyone before you start the next thing?”

    Why this matters more than it sounds like it does

    People underestimate relational debt because it looks like politeness. It is not politeness. Politeness is a style choice. Relational debt is a structural property of the communication medium. In sync work the medium pays the debt for you. In async work nothing does, and you have to bake the payment into the one instrument you have left.

    I have watched relationships between founders and remote contractors deteriorate over months in ways that neither side could articulate. I have felt that deterioration myself, on both sides. Nobody ever says “I am leaving because you stopped acknowledging my completed work.” What they say is “I feel undervalued” or “I do not think this is working out” or — more often — nothing, they just slowly stop caring, and the quality of the work drifts until the relationship ends without a clear cause.

    The cause is the ledger. The debt compounded. Nobody was tracking it and nobody was paying it down.

    The piggyback pattern

    Here is the tactic I am going to make a rule. When I owe someone acknowledgment and I need to send them a new task, I never split it into two messages. I bake the acknowledgment into the first two lines of the task email. The debt clears, the task delivers, the person feels seen, and I have used my one payment instrument for both purposes.

    Claude did this to me on the Pinto email without being asked. It had access to the context — Pinto’s completion email was in the same Gmail search that pulled his address — and it closed the loop inside the next outbound message. That is the correct default behavior for any async-first collaboration, and I had not formalized it as a rule until the moment I saw it happen.

    When this goes wrong

    The failure mode of this pattern is performative gratitude. If every outbound message starts with a thank-you, the thank-you stops meaning anything. Pinto would learn to skim past the first two lines because he knows they are ritual. The acknowledgment has to be specific, based on actual work, and only present when there is actual debt to close. “Thanks for the GCP auth fix, that unblocks a lot” is specific, grounded, and load-bearing. “Hope you are well, thanks for everything” is noise and it corrodes the signal.

    The second failure mode is weaponization. You can use acknowledgment as a sweetener to slip in hard asks. “Great work on X, also can you please rebuild Y from scratch this weekend.” Anyone who has worked in a corporate environment detects that pattern fast, and it burns trust faster than ignoring their work entirely.

    The third failure mode is forgetting that the ledger runs in both directions. Pinto also owes me acknowledgment sometimes. If I am tracking my debts to him without also noticing when he pays his, I drift toward resentment. The ledger has two columns.

    The principle

    In async-first solo operations, every outbound message is a payment instrument for relational debt. Use it to close loops on the same envelope you use to open new ones. Make the acknowledgment specific. Do not split the payment from the request unless the payment itself needs a full message of its own. And let your AI notice when you are about to miss one, because your AI can read your inbox faster than you can remember what you owe.

    This is one of five knowledge nodes I am publishing on how solo AI-native work actually operates underneath the tooling. The tools are the easy part. The ledger is the hard part, and almost nobody is paying attention to it.


    The Five-Node Series

    This piece is part of a five-article knowledge node series on async AI-native solo operations. The full set:

  • The Missing Layer: Why Split Brain Stacks Need a Conversational State Store

    The Missing Layer: Why Split Brain Stacks Need a Conversational State Store

    My operating stack has three layers. Claude is the brain. Google Cloud Platform is the brawn. Notion is the memory. Each layer has a clear job and the handoffs between them work well most of the time. But there is a fourth layer I did not notice was missing until I had to name it, and the gap it covers runs through every working relationship I have. I am calling it the conversational state store and I think most AI-native stacks have the same hole.

    The three layers that already exist

    Let me start by describing what I do have, because the shape of the gap only becomes visible against the shape of the things that are already in place.

    The Notion layer holds facts. It is the human-readable operational backbone. Six core databases — Master Entities, Master CRM, Revenue Pipeline, Master Actions, Content Pipeline, Knowledge Lab — with filtered views per entity. Every client, every contact, every deal, every task, every article, every SOP. When I want to see the state of a client, I open their Focus Room and the dashboards pull from the six core databases. When Pinto wants to understand the architecture, he reads Knowledge Lab. When I want to know which posts are scheduled for next week, I filter the Content Pipeline. Notion is where humans (me, Pinto, future collaborators) go to read the state of the business.

    The BigQuery layer holds embeddings. The operations_ledger dataset has eight tables including knowledge_pages and knowledge_chunks. The chunks carry Vertex AI embeddings generated by text-embedding-005. This is where semantic retrieval happens. When Claude needs to find “everything I have ever thought about tacit knowledge extraction,” it does not keyword-search Notion. It runs a cosine similarity query against the chunks table and gets back the passages that are semantically closest to the question. BigQuery is where Claude goes to read.
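
    For readers who have not worked with embeddings, here is what “cosine similarity against the chunks table” means, as a local stand-in. This is not the BigQuery query itself: the three-dimensional vectors and chunk texts are made up for illustration, and real embeddings have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(query_vec, chunks, k=2):
    """Return the texts of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, c["embedding"]), c["text"]) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy stand-in for rows of a chunks table.
chunks = [
    {"text": "Notes on tacit knowledge extraction", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Invoice template", "embedding": [0.0, 0.2, 0.9]},
    {"text": "Interviewing experts for tacit know-how", "embedding": [0.8, 0.3, 0.1]},
]
query = [1.0, 0.2, 0.0]  # pretend embedding of the question

print(top_chunks(query, chunks))
```

    The invoice chunk shares no keywords problem with the others; it simply points in a different direction, so it ranks last. That is the difference from keyword search.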

    The Claude layer holds orchestration. Claude is the thing that decides which of the other two layers to consult, composes queries across both, synthesizes the results, and produces outputs. It reads Notion through the Notion API when it needs current operational state. It queries BigQuery when it needs semantic retrieval. It writes to WordPress through the REST API when it needs to publish. It is the brain that knows which limb to use.

    Three layers, three clear jobs, handoffs that mostly work. I have been operating this way for months and it scales well for running 27 client WordPress sites as a solo operator.

    The thing that is missing

    None of those three layers track the state of open conversational loops between me and the people I work with.

    Here is a concrete example. Yesterday I sent Pinto an email with a P1 task. This morning he replied with a completion email. His completion email is sitting in my Gmail inbox, unread. Somewhere in the next few hours I am going to send him a new task. When I do, I need to know three things: (1) did Pinto finish the last thing? (2) did I acknowledge that he finished it? (3) what is the current state of the implicit trust ledger between us — do I owe him a thank-you, does he owe me a response, or are we even?

    None of those questions can be answered by Notion. Notion does not know about Gmail threads. None of them can be answered by BigQuery in any useful way because the embeddings are semantic, not temporal. Claude can answer them — but only by reading Gmail live at the start of every session, holding the state in its working memory for the duration of that session, and losing it all when the session ends.

    That is the gap. There is no persistent layer that holds the state of conversations. Every session, Claude rebuilds it from scratch, and the rebuild is expensive in tokens and time and prone to missing things.

    Why the existing layers cannot fill it

    You might ask: why not just put it in Notion? Create a new database called Open Loops, add a row for every active conversation, let Claude read it like any other database. The problem is that Notion is a human-readable layer. It is optimized for humans to see state, not for a machine to update state tens of times per day. Adding rows to Notion costs an API call per row. Open loops change constantly. Every time Pinto sends me a message, the state changes. Every time I reply, the state changes again. Updating Notion in real time for every state change would generate hundreds of API calls per day and would make the Notion workspace feel cluttered to the humans who actually read it.

    You might ask: why not put it in BigQuery? BigQuery is the machine layer, after all. It can handle high-frequency writes. The problem is that BigQuery is optimized for analytical queries over large datasets, not for real-time state lookups on small ones. Every time Claude needs to know “what is the current state of my conversation with Pinto,” a BigQuery query would take two to three seconds. That latency at the start of every response breaks the conversational flow. BigQuery is also append-heavy, not update-heavy, which is the wrong shape for conversational state that changes constantly.

    You might ask: why not let Claude hold it in working memory across sessions? Because Claude does not have persistent memory across sessions in the way this requires. Each new conversation starts fresh. Claude can read Gmail live at the start of each session, but that forces a full re-derivation of conversational state every single time, which is wasteful and lossy.

    The right shape for a conversational state store is none of the above. It is something closer to a key-value store or a document database, optimized for low-latency reads, moderate-frequency writes, and small record sizes. Something like Firestore or a Redis cache, living on the GCP side of the stack, read by Claude at the start of every session and updated whenever a new message flows through.

    What the store would actually hold

    The schema does not need to be complicated. Per collaborator, I need to know:

    • Last inbound message (timestamp, subject, one-sentence summary)
    • Last outbound message (timestamp, subject, one-sentence summary)
    • Open loops: questions I have asked that are unanswered, with shape and age
    • Acknowledgment debt: things they completed that I have not explicitly thanked them for
    • Active tasks: things I have asked them to do, status, last update
    • Implicit tone: is the relationship warm, neutral, or strained right now

    That is maybe ten fields per collaborator. Even with a hundred collaborators, the whole table fits in memory on a laptop. This is not a big-data problem. It is a schema design problem.
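
    Written down as a record, the list above might look like the sketch below. The class and field names are my assumptions, not a built schema, and open loops are simplified to plain strings even though the list above asks for shape and age.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class MessageRef:
    """Pointer to one message: when, what subject, one-sentence summary."""
    timestamp: datetime
    subject: str
    summary: str


@dataclass
class CollaboratorState:
    """One record per collaborator; small enough for a key-value or document store."""
    name: str
    last_inbound: Optional[MessageRef] = None
    last_outbound: Optional[MessageRef] = None
    open_loops: list[str] = field(default_factory=list)           # my unanswered questions
    acknowledgment_debt: list[str] = field(default_factory=list)  # completed, not yet thanked
    active_tasks: dict[str, str] = field(default_factory=dict)    # task -> status
    tone: str = "neutral"                                         # "warm", "neutral", "strained"


pinto = CollaboratorState(name="Pinto")
pinto.acknowledgment_debt.append("GCP persistent auth fix")
pinto.active_tasks["GCP infrastructure tasks"] = "sent"
print(pinto)
```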

    Claude reads the store at the start of every session, checks which collaborators are relevant to the current task, and surfaces any open loops or acknowledgment debt that should be addressed inside the work. When Claude sends a message, it updates the store. When a new inbound message arrives, a Cloud Function parses it and updates the store.
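
    That read-and-update cycle, sketched with an in-memory dictionary standing in for Firestore or Redis. The function names and record shape are hypothetical, and parsing a real email into “completed” or “answers” is the hard part this sketch leaves out.

```python
store = {}  # in-memory stand-in for the real store, keyed by collaborator name


def _state(name):
    """Fetch or create the record for one collaborator."""
    return store.setdefault(name, {"acknowledgment_debt": [], "open_loops": []})


def record_inbound(name, completed=None, answers=None):
    """Called when a message arrives (the Cloud Function's job in the design above)."""
    state = _state(name)
    if completed:
        state["acknowledgment_debt"].append(completed)
    if answers in state["open_loops"]:
        state["open_loops"].remove(answers)


def record_outbound(name, acknowledged=(), question=None):
    """Called when a message is sent: clear what was thanked, log what was asked."""
    state = _state(name)
    state["acknowledgment_debt"] = [
        debt for debt in state["acknowledgment_debt"] if debt not in acknowledged
    ]
    if question:
        state["open_loops"].append(question)


def session_briefing(names):
    """Snapshot of what to surface at session start for the relevant collaborators."""
    briefing = {}
    for name in names:
        state = store.get(name)
        if state and (state["acknowledgment_debt"] or state["open_loops"]):
            briefing[name] = {key: list(value) for key, value in state.items()}
    return briefing


record_outbound("Pinto", question="Which GCP task do you want first?")
record_inbound("Pinto", completed="GCP persistent auth fix")
briefing = session_briefing(["Pinto"])
print(briefing)  # shows the unthanked fix and the unanswered question

record_outbound("Pinto", acknowledged=["GCP persistent auth fix"])
print(store["Pinto"]["acknowledgment_debt"])  # []
```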

    Why I am writing this instead of building it

    Because I have a rule and the rule is don’t build until the principle is clear. I have an ongoing tension in my operation between building new tools and using the tools I already have. Every new database is a maintenance burden. Every new Cloud Run service is a monthly cost and a failure mode. I have made the mistake before of getting excited about an architectural insight and spending three weeks building something that, once built, I used for four days and then forgot about.

    Before I build the conversational state store, I want to know: can I get 80% of the value by letting Claude read Gmail live at the start of every session? If yes, the store is not worth building. If the live-read approach loses state in ways that matter, then the store earns its place.

    My honest guess is that the live-read approach is fine for now. I only have one active collaborator (Pinto) and a handful of active client contacts. Claude reading Gmail at the start of a session takes two seconds and catches everything I care about. The conversational state store would be justified when I have ten or fifteen active collaborators and the live-read cost becomes prohibitive. Today it is not justified.

    But I am naming the layer anyway because naming it is the first step. If I ever do build it, I will know what I am building and why. And if someone else reading this has the same shape of operation with more collaborators, they might build it before I do, and that is fine too.

    When this goes wrong

    The failure mode I want to flag most is building the store and then abandoning it because the maintenance cost exceeds the value. This is the universal failure mode of custom knowledge systems and I have fallen into it multiple times. The rule I am setting for myself: if the store cannot be updated automatically from Gmail + Slack + calendar feeds through Cloud Functions, do not build it. A store that requires manual updates will die within thirty days.

    The second failure mode is over-engineering. The moment you decide to build a conversational state store, the next thought is “and it should track sentiment, and it should predict response times, and it should flag relationship risk, and it should integrate with calendar for context.” Stop. Ten fields. Two endpoints. One cron. If the MVP does not prove value in two weeks, the elaborate version will not save it.

    The third failure mode is pretending this layer is optional. It is not. Every AI-native operator has conversational state. The only question is whether it lives in your head or in a system. Your head is a lossy, biased, forgetful system that works fine until you have more collaborators than you can track mentally, and then it breaks without warning.

    The generalization

    Any AI-native stack that has (facts layer) plus (embeddings layer) plus (orchestrator) is missing a conversational state layer, and the absence shows up first in async remote collaboration because that is where relational debt compounds fastest. If you operate this way and you feel a vague sense that your working relationships are getting worse in ways you cannot quite articulate, the missing layer is probably part of the explanation. Name it. Decide whether to build it. If you decide not to, at least let Claude read your inbox live so the gap gets covered by runtime instead of persistence.

    I am still in the decide-not-to-build phase. I am writing this so that future-me, when I reread it, remembers what the decision was and why.


    The Five-Node Series

    This piece is part of a five-article knowledge node series on async AI-native solo operations. The full set: