Tag: AI Operations

  • Why Most Restoration AI Projects Fail — and What the Few That Work Have in Common

    Why Most Restoration AI Projects Fail — and What the Few That Work Have in Common

    This is the first article in the AI in Restoration Operations cluster under The Restoration Operator’s Playbook. The previous cluster, Mitigation-to-Reconstruction Intelligence, sets up why operational discipline is now the central question. This cluster goes deep on what AI actually does inside that operational discipline — and what it cannot do.

    The honest state of restoration AI in 2026

    Walk any restoration trade show floor in the second half of 2025 or the first half of 2026 and the dominant theme on every booth is some version of artificial intelligence. AI-powered estimating. AI-driven scheduling. AI-augmented documentation. AI for dispatch, for adjuster communication, for moisture analysis, for content management, for drying calculations, for customer experience. Some of it is real. Most of it is rebranding of capabilities that existed two years ago. A small portion of it represents a genuine step change.

    The owners walking the floor are presented with all of it as roughly equivalent — booth fronts and presentations make modest features look revolutionary and revolutionary capabilities look modest. What is actually happening underneath is that the industry is in the noisy middle of a real technology transition, and the noise is making it almost impossible for an operator to tell signal from sales pitch.

    The honest state of the field is this. The infrastructure layer that makes serious AI deployment possible became a managed service in early 2026. The model capabilities have crossed thresholds in the last twelve months that genuinely matter for operational work. The handful of restoration companies that started building deliberately two or three years ago are now producing visible results. The much larger group that has tried to add AI to their operations through software purchases or pilot programs has, in most cases, very little to show for the money and time spent.

    This article is about why that pattern exists. The next four articles in this cluster will be about what to do differently.

    The shape of the failure

    Restoration AI failures tend to look the same across companies. Different vendors, different use cases, different team compositions, but the pattern is consistent enough to describe.

    The company identifies a problem that AI seems likely to help with. Often it is something high-profile and visible — initial customer intake, scheduling, estimate review, document generation. The company evaluates a few vendors, picks one, signs a contract, and runs an implementation that follows the vendor’s recommended deployment plan. The first ninety days produce a flurry of activity, training sessions, configuration work, and demo wins. The next ninety days produce friction as the tool encounters edge cases, the team discovers it does not handle the company’s actual workflow as cleanly as it handled the demo, and the senior operators start working around it. By month nine, the tool is technically still in use but practically marginal — a few people use a few features, the original sponsor has stopped championing it, and the executive team has quietly moved on to the next initiative.

    The line item is still on the budget. The case study gets used in vendor marketing. The operational reality is that nothing has changed, except that the company is now slightly more cynical about AI than it was before the project started.

    This pattern is not unique to restoration. It is the dominant pattern in operational AI deployments across most industries, including ones with much larger technology budgets than restoration has. The reasons it happens are predictable, and they are not the reasons the vendor explains in the post-mortem.

    The first reason: no captured judgment to deploy

    The most common reason restoration AI projects fail is that the company has not done the upstream work that would let any AI system actually contribute. AI tools are extraordinary at applying captured judgment to new situations. They are useless at inventing judgment that was never captured.

    The companies that have failed AI deployments almost always failed at this layer. They bought a tool expecting it to encode the operational wisdom of their senior operators automatically, by exposure to data or by some species of magic. The tool, of course, did not do that. What it did was apply generic, internet-trained patterns to specific, restoration-specific situations, producing outputs that were correct in form, plausible in tone, and wrong in operational substance often enough to be unusable.

    The senior operators in the company looked at the outputs, recognized them as wrong, and stopped trusting the tool. The tool’s hit rate dropped because the operators were not engaging with it. The vendor pointed at the low engagement as the implementation problem. The implementation team tried to drive engagement through training and mandate. None of it worked, because the underlying issue — the absence of captured judgment for the tool to apply — was never addressed.

    This is the reason the prep standard discussion in the previous cluster matters so much for the AI conversation. A documented standard is captured judgment. It is the substrate that any AI system needs in order to produce outputs the senior team will trust. Companies that have invested in documenting their judgment can plug AI tools in and get force multiplication. Companies that have not done the documentation work cannot, regardless of which tool they buy or how much they spend.

    This is also why the AI projects that have worked tend to be in companies that built operational documentation discipline first, often without explicitly thinking about AI. The documentation work made the AI work possible. The AI work then made the documentation work pay off in a way the company had not initially anticipated.

    The second reason: optimizing the wrong layer

    The second most common reason restoration AI projects fail is that they target the wrong operational layer.

    The natural inclination of an operator looking at AI is to point it at the most visible, customer-facing problem. The intake conversation. The estimate. The customer email. These are the places where operators feel the pain most acutely, and they are also the places where AI demos look most impressive.

    They are also the places where AI is most likely to produce results that range from disappointing to actively damaging. The customer-facing layer is the layer where a small error in tone, judgment, or accuracy is most expensive. It is also the layer where the AI tool has the least context — it does not know the customer, the property, the history, the carrier dynamics, or any of the situational specifics that an experienced operator would bring to the conversation.

    The companies producing real results from AI are deploying it almost entirely in the operational middle layers, not the customer-facing top layer or the systems-of-record bottom layer. The middle layers are where the work of running the business happens — file review, scope analysis, scheduling logic, sub coordination, photo organization, documentation packaging, internal handoff briefings, training material generation. These are unglamorous capabilities. They are also the ones where a competent AI tool can demonstrably free up senior operator time and improve the quality of the operational substrate.

    An AI tool that drafts a clean handoff briefing from the mitigation file for the rebuild estimator to review in thirty seconds is worth more, operationally, than an AI tool that drafts a customer-facing email. The handoff briefing tool removes thirty minutes of estimator time per job, every day, on every job. The customer email tool removes a small amount of friction on a small subset of communications and introduces a meaningful risk of a tone-deaf message going out under the company’s name. The first tool compounds. The second tool gets shut off after a bad incident.

    The companies that have figured this out are not bragging about their AI deployments. They are quietly using AI as connective tissue between operational layers that already worked, and the senior team is feeling the difference in their workload without anyone outside the company necessarily noticing the change.

    The third reason: no senior operator in the loop

    The third reason restoration AI projects fail is that they are run as IT projects rather than operational projects.

    An IT-led deployment optimizes for technical correctness, integration with existing systems, user adoption metrics, and vendor relationship management. None of those are the things that determine whether the tool produces operational value. The thing that determines operational value is whether the tool is producing outputs that a senior operator would have produced, at speed, with the same judgment.

    That determination cannot be made by an IT team or by a vendor. It can only be made by the senior operator whose judgment is supposed to be the benchmark. If that operator is not in the loop on a daily or weekly basis, the tool drifts away from useful behavior and toward whatever the vendor’s defaults happen to be. By the time anyone notices, the tool is producing plausible-looking outputs that are not actually useful, and the operational team has stopped relying on them.

    The companies that have made AI work have, in every case, embedded a senior operator in the deployment as the operational owner. Not as a sponsor. As the owner. The senior operator reviews the tool’s outputs, flags drift, requests adjustments, and is accountable for whether the tool is actually doing what it was bought to do. The owner’s name is on the project. The owner’s calendar reflects the commitment. When the tool produces a wrong output, the owner is the first to know and the first to drive the correction.

    This is uncomfortable for senior operators, who already have full-time jobs running operations and who did not sign up to babysit a software tool. It is also non-negotiable. AI deployments without an embedded senior operational owner do not produce results, in restoration or in any other operational context. The companies pretending otherwise are making the same mistake every other industry made in their first wave of AI adoption.

    The fourth reason: the wrong evaluation horizon

    The fourth reason restoration AI projects fail is that they are evaluated on a horizon that does not match how AI actually delivers value.

    Most AI tools produce a small benefit in their first few weeks of use, because the novelty creates engagement and the early use cases tend to be the simple ones. The benefit then plateaus or even regresses as the team encounters edge cases and the engagement drops. If the company is evaluating the tool at month three, the assessment will look mediocre.

    The tools that compound — and AI tools either compound or fade — start to show real value around month six to nine, when the captured judgment from the team’s interaction with the tool starts to inform the tool’s behavior, when the team has built workflow habits around the tool’s strengths, and when the company has developed an internal language for what the tool is for and what it is not for. Companies that evaluate at month three see the plateau and cancel. Companies that commit to a twelve to eighteen month horizon and continue investing in the operator-tool collaboration see the compounding.

    This horizon mismatch is one of the reasons most AI line items get killed. It is also one of the reasons the companies that persist past the awkward middle period end up with a meaningful operational advantage that is hard for newer entrants to replicate quickly.

    What the few successful deployments have in common

    The restoration companies that have produced visible results from AI in 2026 share a small number of characteristics. None of the characteristics are about the specific tools they bought. They are all about how the company approached the work.

    The company had operational documentation discipline before they started the AI work. Either an existing prep standard, a structured set of training materials, a documented decision framework, or some equivalent body of captured operational wisdom that could serve as the substrate the AI tool would operate against.

    The company targeted operational middle-layer use cases first, not customer-facing top-layer ones. The early wins were in things like file packaging, handoff briefing generation, scope review acceleration, training material drafting, and sub-coordination — boring internal capabilities that compounded into significant senior-operator time recovery.

    The company embedded a senior operator as the day-to-day owner of the AI capability. That operator’s calendar reflected the commitment, and their judgment was the benchmark for whether the tool was producing value.

    The company committed to a twelve to eighteen month horizon for evaluation, with the understanding that the awkward middle period was structural rather than a sign of failure.

    The company invested in the feedback loop between operator and tool. When the tool produced a bad output, that became data that improved the next output. The loop was deliberate, not incidental.

    The company avoided the trap of trying to deploy across the whole organization at once. The successful deployments started narrow, proved value in one operational layer, and then expanded based on what was working rather than on a master rollout plan.

    None of these characteristics are about technology. They are about operational seriousness applied to technology. The companies that brought operational seriousness to the work got results. The companies that treated AI as a technology purchase did not.

    Where this cluster is going

    The remaining articles in this cluster will go deep on each of the patterns the successful deployments share. The next article will address the question every owner asks first: given limited time and budget, what should we actually build first? That question has a defensible answer in 2026, and it is not the answer most vendors are pitching.

    The article after that will go deep on what it actually means to treat the senior operator as the source code for an AI deployment — not as a metaphor, but as a literal description of where the operational substance of the tool comes from. Then an article on the economics of agent-assisted operations, which is the most underdiscussed topic in restoration AI right now and the one that will determine which companies are still profitable in 2028. And finally an article on how to evaluate AI tools without getting fooled by demos, vendor pitches, or the noise that currently dominates the conversation.

    The point of the cluster is not to recommend specific tools. Tools change every quarter. The point is to give restoration owners a durable mental model for thinking about AI deployments — one that will still be useful in 2027 and 2028, regardless of which vendors have come and gone in the meantime. Operators who internalize the model will make consistently better decisions about AI than operators who chase the current vendor cycle. The model is the asset.

    Next in this cluster: what to actually build first when you have limited time and budget — and why the obvious answer is almost always wrong.

  • Replacing the Interviewer: What the Human Distillery App Can and Cannot Do

    Replacing the Interviewer: What the Human Distillery App Can and Cannot Do

    Tygart Media Strategy
    Volume Ⅰ · Issue 04Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    The extraction protocol works. The pivot signal lexicon is learnable. The four-layer descent can be taught. The question is whether it can be deployed without a trained human interviewer in the room — and if so, how much of the value survives the translation.

    This is the duplication problem at the center of the Human Distillery business model. Will can run an extraction session. An app cannot run the same session. But an app can run a version of the session — and for a large subset of extraction use cases, the version is sufficient.

    Understanding what transfers and what doesn’t is the whole architectural question.

    What Transfers to an App

    The four-layer question structure is codifiable. A stateful conversational agent — not a chatbot, a system that maintains a running knowledge map of what’s been surfaced and what’s still needed — can execute the question sequences in order, navigate the domain-specific question libraries for a given vertical, and detect the linguistic markers of pivot signals in real time.

    “It’s hard to explain” is detectable by NLP. Hedging patterns are detectable. Energy shifts in voice are detectable by acoustic analysis. Deflection to process — “the policy says…” — is detectable. The app can recognize these signals and adjust its question path, slowing down at tacit knowledge boundaries and applying the correct follow-up from the signal response library.

    The processing pipeline from transcript to structured concentrate is fully automatable: chunking by topic boundary, entity extraction, claim isolation, confidence scoring, contradiction flagging across multiple sessions, multi-model distillation rounds. This is where AI earns its keep. A human doing this manually would take days per session. The pipeline does it in minutes.

    Domain-specific question libraries can be built from prior extractions and expanded with each new session. The more sessions the app runs in a given vertical, the richer its question library becomes. This is the compounding effect that makes the app more valuable over time.

    What Doesn’t Transfer

    Three things resist automation in ways that won’t be resolved by better models:

    Micro-hesitation reading. The half-second pause before an answer that signals the subject knows more than they’re about to say. The slight change in phrasing when someone moves from what they’re comfortable saying to what they actually think. These are real-time, embodied, relational signals. A text-based app misses them entirely. A voice app gets closer but still lacks the visual channel that carries a significant portion of this information.

    Protocol abandonment. The decision to stop following the four-layer sequence because the subject just said something unprompted that is more important than anything in the protocol. Expert interviewers make this call constantly. They recognize the thread that, if followed, goes somewhere the protocol would never reach. An app will follow the signal response library. It won’t recognize when the library should be put down.

    Trust calibration. Whether the subject is performing for the recording or actually sharing. This is not detectable from content analysis. It requires the social intelligence to know when to lower the formality, when to match the subject’s energy, when to say something self-deprecating to signal that this is a peer conversation and not an evaluation. Subjects share differently with someone they trust. The app cannot build that trust.

    The Honest Architecture

    The tiered model that emerges from this analysis:

    Tier 1 — App-led extraction. Well-mapped domains with accessible knowledge. The subject is cooperative. The question library is deep. The knowledge being sought is in Layers 1 and 2. The app handles the session. Will reviews the concentrate before delivery.

    Tier 2 — Human-led extraction with app processing. High-stakes sessions. Guarded subjects. Knowledge at the outer edge of verbalization (Layer 3 and 4). Will conducts the session. The app runs the processing pipeline. Will reviews and approves the concentrate.

    Tier 3 — Full human extraction and distillation. Strategic engagements. Subjects who will only speak candidly to a person they know. Knowledge so embedded that it requires real-time relational judgment to surface at all. Will does everything.

    The business model implication: Tier 1 is volume. Tier 3 is premium. The ratio shifts over time as the app’s question libraries deepen and its signal detection improves. What begins as mostly Tier 2 and 3 eventually becomes mostly Tier 1, with Will’s direct involvement reserved for the sessions where only a human can get the door open.

    The app is not a replacement for the protocol. It’s a multiplier for the protocol — allowing it to run at a scale that a single human operator never could, while preserving the human layer for the cases that actually require it.


  • The Human Distillery: A Methodology for Extracting Tacit Knowledge for AI Systems

    The Human Distillery: A Methodology for Extracting Tacit Knowledge for AI Systems

    Tygart Media Strategy
    Volume Ⅰ · Issue 04Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    Every organization has two kinds of knowledge. The documented kind — processes, policies, SOPs, training materials — lives in manuals and wikis. The other kind lives in people’s heads: the adjustments made without thinking, the thresholds learned from expensive mistakes, the pattern recognition that executes in a second but couldn’t survive a PowerPoint slide.

    The first kind is easy to feed into an AI system. The second kind is what makes the organization actually work. And it almost never gets captured before it walks out the door.

    This gap — between what’s written and what’s known — is where most enterprise AI implementations quietly fail. The system gets the documentation. It never gets the knowledge. The result is an AI that gives the same answer a new employee would give, while the 15-year veteran shakes their head and does it differently.

    The Human Distillery methodology exists to close that gap. It is a structured extraction protocol for converting tacit knowledge into dense, structured artifacts — books for bots — that AI systems can actually use. Not summaries. Not transcripts. Knowledge concentrates: information-rich artifacts that encode relationships, decision logic, and confidence alongside the facts themselves.

    This article is the methodology reference. It covers what tacit knowledge is and why it resists standard capture methods, the four-layer extraction protocol that surfaces it, the pivot signal lexicon that tells you when you’re close, what a knowledge concentrate looks like as a structured artifact, and where human judgment remains irreplaceable in the pipeline.


    Why Standard Methods Don’t Work

    The instinct when trying to capture organizational knowledge is to reach for one of three tools: a survey, an interview, or a documentation request. All three fail at tacit knowledge for the same reason: they ask people what they know. Tacit knowledge is knowledge people don’t know they know. It operates below the level of conscious articulation. You cannot survey it out of someone. You cannot ask them to write it down. You have to create the conditions under which it surfaces — and then recognize it when it does.

    Forms and surveys capture what people think they do. Conversations capture what they actually do and why. The difference between those two things is the entire product.

    A 20-year insurance adjuster asked “what’s your process for evaluating a water damage claim?” will give you the documented version: inspect the loss, review the policy, scope the damage, issue the estimate. This is accurate and useless. Ask them about a claim that went sideways and they will, unprompted, tell you that they always check the crawlspace first on older properties in this zip code because the contractor community there has a pattern of scope creep on foundation moisture that the initial inspection never catches. That’s the knowledge. It lives in the deviation from the process, not the process itself.


    The Four-Layer Descent

    The extraction protocol descends through four distinct layers in sequence. Each layer unlocks the next. Skipping a layer produces thin output. Rushing a layer produces performed output. The full descent, executed correctly, surfaces knowledge the subject didn’t know they were carrying.

    Phase 0: Disarmament

    Before any extraction begins, the status dynamic has to be neutralized. The subject needs to stop performing expertise for an evaluator and start explaining their world to a curious outsider. The difference in what comes out is dramatic.

    The disarmament move: position yourself as someone who genuinely doesn’t know. “I’ve never seen a job like this — walk me through it like I’m shadowing you.” This does two things. It forces explanation of steps the subject considers so obvious they wouldn’t otherwise mention — which is exactly where embedded knowledge concentrates. And it signals that there’s no correct answer being evaluated, which reduces the filtering that kills tacit knowledge capture.

    Open with failure. “Tell me about a job that went sideways” surfaces edge cases, exceptions, and judgment calls that success stories never reveal. People tell the truth in their failure stories. They’re not protecting anything.

    Layer 1: Surface Protocol

    The question: “What’s your process when X happens?”

    What it gets: The documented version. What the subject would write in an SOP. What they’d tell a new hire on day one. Accurate. Insufficient. Necessary baseline.

    Why you need it: The surface protocol establishes the frame. It’s the map. Everything that comes after is about finding where the territory diverges from the map — and those divergences are where the knowledge lives.

    Layer 2: Exception Probing

    The question: “When do you deviate from that?”

    What it gets: The adaptive layer. The judgment calls that experience produces. The cases where the checklist gets ignored because the situation demands something the checklist can’t accommodate. This is the first layer where genuine tacit knowledge begins to surface.

    The follow-up sequence: “And when does that happen?” → “How do you know it’s that situation?” → “What would you have done three years ago that you wouldn’t do now?” Each question peels back one more layer of accumulated judgment.

    Layer 3: Sensory and Somatic

    The question: “How do you know it’s that and not something else?”

    What it gets: Pattern recognition so ingrained it operates below conscious awareness. The knowledge the subject has never verbalized because no one has ever asked them to. This is the hardest layer to surface and the most valuable thing in the concentrate.

    What it sounds like: “The smell is different.” “The drywall feels wrong.” “Something about the way the insurance company rep is phrasing the emails.” These are not vague — they’re ultra-specific to a domain. The job is to slow down at these moments and press: “Describe the smell.” “What does wrong feel like compared to right?” “What in the phrasing specifically?” The subject usually thinks they can’t explain it. They can. They just haven’t been asked slowly enough.

    Layer 4: Counterfactual Pressure

    The question: “What would break if you weren’t here tomorrow?”

    What it gets: The knowledge hierarchy. What actually matters versus what’s ritual. Most organizations don’t know which is which until the person who knows leaves. This layer surfaces the load-bearing knowledge — the things that if absent would produce visible failures, not just suboptimal outcomes.

    The follow-up: “Who else knows that?” The answer is almost always “no one” or “maybe [one person].” That’s the knowledge risk. That’s also the product.


    The Pivot Signal Lexicon

    Proximity to tacit knowledge produces specific signals in conversation. Recognizing them in real time is the skill that separates a good extraction session from a great one. Miss these signals and you stay in Layer 1. Catch them and you descend.

    Signal What It Means The Move
    “It’s hard to explain…” The subject is about to verbalize something they have never articulated before. This is the most valuable signal in the lexicon. Slow everything down. “Try anyway.” Do not fill the silence. Do not offer a simpler question. Wait.
    “You just kind of know” Layer 3 boundary. The subject is pointing directly at tacit knowledge they don’t know how to surface. “Walk me through the last time you just knew. What did you notice first?”
    Hedging and qualifiers The subject is filtering. They have an answer but aren’t sure it’s acceptable to say. “Generally speaking…” “In most cases…” “It depends…” are all hedges. “Off the record — what actually happens?” Or: “What’s the version you’d tell a colleague vs. what you’d put in the manual?”
    Sudden energy or animation You’ve touched something they care about. The subject’s pace increases, their posture changes, they lean in. This is a live thread to a knowledge cluster. Follow it immediately. Drop the protocol. “Tell me more about that.” The protocol can resume. This thread may not come back.
    Deflection to process The subject is avoiding the judgment layer. When asked what they do, they tell you what the process says to do. Often accompanied by “the policy is…” or “we’re supposed to…” “But what do you do when that breaks down?” The emphasis on ‘you’ reframes the question from institutional to personal, which is where the knowledge actually lives.
    Pausing before a number The subject is calculating from experience, not retrieving from documentation. The pause is the gap between “what the spec says” and “what I know from doing this 200 times.” Ask for the number, then: “Where does that come from?” The answer to the second question is often the most valuable thing in the session.
    Unprompted stories The subject has moved from answering your questions to accessing their own knowledge map. Stories they tell without being asked are almost always pointing at something important. Let it run. If the story ends without the embedded knowledge surfacing, ask: “What made that one different from a normal job?”

    The Knowledge Concentrate: What the Output Actually Looks Like

    A transcript is raw. A summary is thinner in size but barely denser in information. A knowledge concentrate is smaller than either and more information-rich than both — because it encodes relationships, decision logic, and confidence alongside the facts themselves.

    The schema for a knowledge concentrate has five components:

    Entity graph. Every named concept, process, person-role, piece of equipment, and decision point that surfaces in the extraction, mapped as nodes with typed edges between them. Not a list — a graph. The relationships are the knowledge. The entities alone are just vocabulary.

    Decision logic. Every when-then-because statement extracted from the session. “When the moisture readings are above X in a crawlspace with Y flooring type, we always do Z because A.” Structured with confidence scores: is this firsthand knowledge, observed pattern, or secondhand information?

    Benchmarks. Every number that surfaces in extraction — thresholds, timelines, costs, rates, counts — with context, source count, and variance. A benchmark from one interview has low confidence. The same benchmark confirmed across six interviews in the same market has high confidence and is ready to be used as ground truth.

    Tacit signatures. The things that are hard to explain — captured as best as they can be verbalized, with a confidence flag that signals to the AI system consuming them: this is approximate. This is the residue of knowledge that the extraction process got close to but couldn’t fully surface. It’s still valuable. It tells the AI where human judgment is concentrated.

    Provenance. Traceable but anonymized. How many sources contributed to each claim. Whether a given piece of knowledge is individual or cross-validated. What industry and market it came from.

    An AI system consuming a knowledge concentrate in this format doesn’t just know facts — it knows which facts to trust, how to chain them into decisions, and where the knowledge is thin enough that human judgment should be called in.


    What the App Can Do and What It Can’t

    The four-layer protocol and the pivot signal lexicon can be partially codified. A stateful conversational agent — not a chatbot, a genuinely stateful system that maintains a running knowledge map of what’s been surfaced and what’s still needed — can execute the question sequences, detect linguistic pivot signals, navigate domain-specific question libraries, and run the processing pipeline from transcript to structured concentrate.

    What it cannot do is the thing that makes the difference between a good extraction and a complete one:

    It cannot read the half-second of hesitation before an answer that signals the subject knows more than they’re about to say. It cannot decide, in the middle of an unprompted story, that this tangent is the most important thing in the session and the protocol should be abandoned to follow it. It cannot calibrate trust — cannot sense whether the subject is performing for the recording or actually sharing, and adjust accordingly. It cannot distinguish a valuable tangent from genuine noise in real time.

    These are not gaps that better models will close. They are inherently relational and embodied. They require a human who is genuinely present in the conversation, not processing a transcript of it.

    The honest architecture for a distillery operation is therefore tiered. The app handles extraction volume — the sessions where the knowledge is relatively accessible, the domain is well-mapped, and the question library is sufficient. The human handles the sessions where the stakes are highest, the subject is guarded, or the knowledge being sought is at the outer edge of what can be verbalized. And the human is always the quality gate on the final concentrate, regardless of which path produced it.


    Why This Works in Any Industry

    Tacit knowledge is not a property of any particular field. It is a property of human expertise at depth. Wherever humans have been doing something long enough to develop judgment that exceeds documentation — which is everywhere — the distillery protocol applies.

    The domain changes the question library. The pivot signals are universal. The four-layer structure works in restoration, in legal practice, in medicine, in financial services, in manufacturing, in competitive sports coaching, in culinary production. Any field where experience produces something that training cannot replicate is a field where a knowledge concentrate has value.

    The buyers are the organizations trying to make that knowledge portable. The AI system that needs to give the same answer a 20-year veteran would give. The consultant whose insights live only in their head. The franchise trying to replicate the judgment of its best operators across 400 locations. The company that just lost its most important employee and is only now discovering what they actually knew.

    The product is not content. It is not a report. It is a structured knowledge artifact that makes someone else’s irreplaceable expertise replicable — at least partially, at least for the cases the documentation currently handles worst.

    That’s the distillery. Extract. Distill. Deploy.


    Frequently Asked Questions

    How long does a single extraction session take?

    A full four-layer descent with one subject takes 60–90 minutes. Rushing below 45 minutes consistently produces shallow output — the session ends before Layer 3 is reached. Three to five sessions with different subjects in the same domain produces a concentrate with enough cross-validation to have meaningful confidence scores on the decision logic and benchmarks.

    What industries is this most applicable to?

    Any industry where experience produces judgment that documentation can’t replicate. The highest-value applications are in fields with expensive mistakes (medical, legal, engineering), fields with long apprenticeship periods (skilled trades, finance, consulting), and fields where the knowledge is currently locked in one or two people (most small and mid-size businesses).

    How is this different from a McKinsey-style knowledge management engagement?

    Traditional knowledge management captures process documentation — what should happen. The distillery protocol captures judgment documentation — what actually happens, and why, and when the standard answer is wrong. The output is structured for AI consumption, not human reading. The concentrate is designed to be queried, not read.

    What happens to the concentrate after it’s produced?

    The concentrate is delivered to the client for ingestion into their AI infrastructure — as a RAG knowledge base, as fine-tuning data, as a reference layer for their AI assistant, or as structured context for their customer-facing AI systems. The format is designed to be immediately usable without further transformation. The provenance metadata ensures the client knows which claims to trust at what confidence level.

    Can the extraction protocol be deployed without a trained human interviewer?

    Partially. A well-built stateful conversational agent can execute the question sequences, detect linguistic pivot signals, and run the processing pipeline. What it cannot do is the real-time relational judgment that surfaces the deepest knowledge — the hesitation reading, the trust calibration, the decision to abandon the protocol and follow an unexpected thread. For accessible knowledge in well-mapped domains, the app is sufficient. For the knowledge closest to the surface of human expertise, the human remains in the loop.


  • Four-Layer Data Architecture: Building Around Behaviors, Not Tools

    Four-Layer Data Architecture: Building Around Behaviors, Not Tools

    Tygart Media Strategy
    Volume Ⅰ · Issue 04Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    The instinct, when building a complex operation, is to find one tool that can hold everything. One source of truth. One dashboard. One system of record for all data types.

    This instinct is wrong, and it produces exactly the kind of system it’s trying to avoid: a single tool that does everything poorly, a migration project that costs more than the original implementation, and a team that has learned to distrust the data because the tool was never designed for the behaviors it was forced to support.

    The behavior-first alternative for data architecture doesn’t start with “what tool can hold everything.” It starts with: what are the distinct behaviors this data needs to support, and which tool is genuinely best suited for each one?

    The Four Data Behaviors

    In a multi-site AI-native content operation, four distinct data behaviors emerge:

    Machine-generated operational data needs to be written and read by automated systems at high speed. Batch job results, embedding vectors, image processing logs, Cloud Run execution histories. No human looks at this data directly. It needs to be fast, cheap, and structured for programmatic access. GCP serves this behavior — Firestore for structured operational state, Cloud Storage for large artifacts, BigQuery for analytical queries across the full dataset.

    Human-actionable signals need to be displayed clearly enough that a person can take action without wading through noise. Site health alerts, content gaps, client status changes, task assignments. This data needs to be readable, filterable, and connected to the people who need to act on it. Notion serves this behavior — not because it’s the most powerful database, but because it’s the most human-readable one, with views that can surface exactly the signal each role needs.

    Published content needs to be delivered to web visitors and search engines at performance standards those audiences require. WordPress serves this behavior. It was designed for it. The mistake is asking WordPress to also serve as the storage layer for unpublished content, the analytics layer for content performance, or the task management layer for content production. It wasn’t designed for those behaviors and it’s not good at them.

    Files and documents need to be stored, versioned, and shared across tools and collaborators. Google Drive serves this behavior. Skills, SOPs, brand guidelines, exported data — anything that exists as a file rather than as structured data belongs in Drive, not in a database trying to handle file attachments as a secondary feature.

    Why Separation Produces Better Systems

    A four-layer architecture feels like more complexity than a single-tool approach. In practice it produces less complexity, because each tool is operating within its design constraints instead of being stretched beyond them.

    The signal-to-noise problem in most dashboards comes from forcing machine-generated data and human-actionable signals into the same view. The machine data overwhelms the human signals. The solution is usually “better filtering” — which is the wrong answer. The right answer is storing machine data where machines can read it and surfacing human signals where humans can act on them.

    The performance problem in most content operations comes from asking WordPress to be a content management system when it’s a content delivery system. The content that belongs in a CMS — drafts, revisions, briefs, research notes — should be in Notion. The content that belongs in a CDS — published articles, page templates, media files — should be in WordPress. When you separate these, both tools perform their actual function better.

    The data loss problem in most operations comes from treating the most convenient tool as the system of record. When content lives only in WordPress, a site failure is a data failure. When operational state lives only in a Cloud Run service, a deployment change is a state failure. The four-layer architecture ensures that each data type has a permanent home in the tool designed to hold it — and that the tools interact through APIs rather than through manual migration.


  • ADHD and AI-Native Operations: Designing Around the Behavior, Not Against It

    ADHD and AI-Native Operations: Designing Around the Behavior, Not Against It

    Tygart Media Strategy
    Volume Ⅰ · Issue 04Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    The conventional wisdom about ADHD and work is built around a simple premise: the ADHD brain is deficient in the behaviors that work requires, and management strategies exist to compensate for those deficiencies. More structure. Better schedules. Accountability systems. Tools designed to impose the consistency the brain doesn’t generate naturally.

    This is tool-first thinking applied to a human brain. And like most tool-first thinking, it produces systems that fight the behavior instead of serving it.

    The behavior-first alternative asks a different question: what does the ADHD brain actually do, at its best, and what system design would allow it to do more of that?

    What the ADHD Brain Actually Does

    Three behaviors characterize high-functioning ADHD cognition when the environment supports them:

    Hyperfocus. Sustained, intense concentration that arrives unbidden and runs at extraordinary depth for an unpredictable duration. Not concentration on demand — concentration that seizes the operator when a problem activates the interest system. The output of a hyperfocus session is disproportionate to the time invested, and the quality often exceeds what deliberate, scheduled work produces.

    Interest-based attention routing. The ADHD attention system allocates based on interest, novelty, urgency, or challenge — not importance. High-interest work gets exceptional focus. Low-interest work gets almost none. This is not a failure of will. It’s a feature of a different attentional architecture.

    Cross-domain pattern recognition. Rapid context-switching, which looks like distractibility in sequential-task environments, produces something valuable in environments that reward synthesis: the ability to connect observations across unrelated domains and identify patterns that single-domain experts miss.

    The System That Serves These Behaviors

    An AI-native operation designed around these behaviors looks different from a conventional productivity system:

    For hyperfocus: The system captures whatever the hyperfocus session produces — immediately, in full, without requiring the operator to organize it mid-session. The Second Brain stores the output. The cockpit session for the next day picks up the thread. The non-linearity of hyperfocus (jumping between connected insights, building in spirals) becomes productive because the AI can hold the full context of the spiral across sessions.

    For interest-based attention: Low-interest, deterministic work routes to automated pipelines. Haiku runs taxonomy fixes at scale. Cloud Run handles scheduled publishing. Batch jobs process a hundred posts while the operator is doing something that has activated their interest system. The attention that would have been coerced onto low-interest work is freed for the high-interest work where ADHD attention genuinely excels.

    For pattern recognition: The cross-domain synthesis that ADHD cognition produces naturally — connecting a restoration industry CRM insight to an AI architecture principle to a neurodiversity research finding — is exactly what generates the novel frameworks that constitute a knowledge operation’s core asset. This isn’t compensated for. It’s the product.

    The Architecture Principle

    The systems that emerged from designing around ADHD constraints are not ADHD-specific. They are better systems. External working memory (the Second Brain) outperforms internal working memory for complex multi-client operations regardless of neurology. Routing low-value-attention work to automation is better for any operator. Pre-staged context reduces friction for everyone.

    The ADHD constraints forced designs that a neurotypical operator would also benefit from — because the constraints that neurodivergence makes extreme are present in milder form in everyone. The behavior-first design process, applied to an ADHD brain, produced infrastructure. The same process, applied to any operation, produces the same result: systems that serve the actual behavior, compound over time, and don’t require the operator to fight their own cognition to function.


  • Separating Intelligence from Execution: The AI Work Order Architecture

    Separating Intelligence from Execution: The AI Work Order Architecture

    Tygart Media Strategy
    Volume Ⅰ · Issue 04Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    AI systems are good at identifying problems. Automated systems are good at fixing them. The failure mode that kills most AI automation projects is building them as one thing instead of two.

    When you couple intelligence and execution in a single system, you get something that can do everything slowly and nothing reliably. The intelligence layer needs to be conversational, contextual, and judgment-driven. The execution layer needs to be deterministic, fast, and parallelizable. These are fundamentally different behaviors, and they require different tools.

    The Work Order as the Bridge

    The behavior-first design for AI automation has three distinct stages: identify (Claude analyzes a system and surfaces what needs to be done), deposit (Claude writes a structured work order to a persistent queue), and execute (a Cloud Run worker reads the work order and runs the fix).

    The work order is the key artifact. It’s the contract between the intelligence layer and the execution layer. A well-formed work order contains everything the execution layer needs to run without asking Claude any follow-up questions: the target (site, post ID, endpoint), the operation (what to do), the parameters (how to do it), and the success criteria (how to know it worked).

    When the work order is well-formed, the execution layer is a dumb runner. It doesn’t need to understand context, history, or judgment. It reads the work order, executes the operation, and writes the result back. The intelligence that produced the work order stays in the intelligence layer — which is exactly where it belongs.

    What This Looks Like in Practice

    In a multi-site content operation, Claude might analyze a WordPress site and identify 47 posts with missing FAQ schema. The tool-first approach runs Claude in a loop, generating and publishing schema for each post sequentially. This is slow, context-dependent, and fragile — if Claude loses context mid-run, the job is incomplete and the state is unclear.

    The behavior-first approach: Claude generates 47 structured work orders, one per post, and deposits them in a Notion database with status “Queued.” A Cloud Run service reads the queue and processes each work order independently, in parallel, writing results back to each row. Claude is done in minutes. The Cloud Run service finishes the execution while Claude is doing something else entirely.

    The behaviors are clean. The tools serve them. The system scales horizontally without requiring Claude to be in the loop for execution.

    The Two Lanes of AI Automation

    Not everything belongs in the work order queue. Some operations require judgment that the execution layer can’t replicate: content quality assessment, strategy decisions, anything where “it depends” is the correct first answer. These belong in a different lane — one where Claude stays in the loop through completion.

    A mature AI automation architecture has both lanes clearly defined. Deterministic operations (taxonomy fixes, schema injection, meta rewrites, image uploads, internal link additions) go to the work order queue and run without Claude. Judgment-dependent operations (content strategy, quality review, client recommendations) stay in the conversational layer where Claude’s judgment can be applied continuously.

    The discipline is in knowing which lane each operation belongs in — and resisting the temptation to put judgment-dependent work in the queue just because it would be faster. Faster execution of the wrong thing is not an improvement.


  • Tacit Knowledge Extraction: Why the Behavior Comes Before the AI System

    Tacit Knowledge Extraction: Why the Behavior Comes Before the AI System

    Tygart Media Strategy
    Volume Ⅰ · Issue 04Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    Every organization has two kinds of knowledge. The first kind is documented: processes, policies, training materials, SOPs. The second kind is tacit: the adjustments people make without thinking, the thresholds they’ve learned from experience, the judgment calls they can execute in seconds but couldn’t explain in a meeting.

    The documented knowledge is easy to feed into an AI system. The tacit knowledge is what makes the organization actually work — and it’s almost never in a format that AI can use.

    The gap between these two knowledge types is where most enterprise AI implementations fail. Companies feed their AI the documentation and wonder why it can’t give the same answers a 10-year veteran would give. The answer is that the 10-year veteran isn’t running on the documentation. They’re running on the tacit layer — and nobody captured it.

    What Tacit Knowledge Extraction Actually Requires

    You cannot extract tacit knowledge through forms, surveys, or documentation requests. Tacit knowledge by definition is knowledge that the holder cannot fully articulate without a skilled interviewer pulling it out. The behavior that surfaces it is specific: a conversational sequence that descends through four distinct layers.

    Layer 1 — Surface protocol: “What’s your process when X happens?” This gets the documented version — what people think they do, what they’d write in an SOP. Necessary baseline but not the target.

    Layer 2 — Exception probing: “When do you deviate from that?” This surfaces the adaptive layer — the judgment calls that experience produces. The deviations are where tacit knowledge lives.

    Layer 3 — Sensory and somatic: “How do you know it’s that specific problem and not something else?” This is the hardest layer to surface and the most valuable. It captures knowledge that the holder has never verbalized — pattern recognition so ingrained it operates below conscious awareness.

    Layer 4 — Counterfactual pressure: “What would break if you weren’t here tomorrow?” This surfaces the knowledge hierarchy — what actually matters versus what’s ritual. Most organizations don’t know which is which until the person with the knowledge leaves.

    The Behavior Determines the Tool Stack

    Once this extraction behavior is understood, the tool selection for the AI system becomes clear. You need: a way to capture the conversation at high fidelity, a way to convert the transcript into structured knowledge artifacts, a storage layer that preserves the knowledge in a format AI systems can query, and an embedding layer that makes the knowledge semantically searchable.

    These are four distinct behaviors served by four distinct tools. The extraction conversation is a human behavior — no tool replaces it. The structuring is where AI earns its keep: running the transcript through multiple models with different attack angles, identifying the tacit signatures embedded in the language, organizing the output into the knowledge concentrate schema. The storage is a database decision. The embedding layer is a vector store.

    None of these tool choices could have been made intelligently without first understanding the extraction behavior. The behavior is the constraint that makes the tool selection tractable.

    The Minimum Viable Experiment

    For any organization that wants to capture its tacit knowledge layer before it walks out the door: four extraction conversations, transcribed and run through a three-model distillation round, produce a knowledge artifact dense enough to answer questions that the documentation cannot. The experiment takes a week and costs almost nothing. The cost of not doing it shows up when the person who holds the knowledge leaves and the organization discovers, for the first time, how much was never written down.


  • Build the System Around the Behavior, Not the Tool

    Build the System Around the Behavior, Not the Tool

    Tygart Media Strategy
    Volume Ⅰ · Issue 04Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    There is a mistake that kills more technology projects than bad code, bad vendors, or bad timing combined. It happens before a single line is written, before a single subscription is purchased, before anyone even knows there’s a problem.

    The mistake is this: choosing the tool before understanding the behavior.

    It looks like a reasonable decision. You need to manage customer relationships, so you buy a CRM. You need to publish content, so you build around WordPress. You need to organize knowledge, so you set up Notion. The tool selection feels like the hard part — the research, the demos, the pricing comparisons. By the time you’ve chosen, you feel like the work is half done.

    It isn’t. You’ve just committed to building a system shaped like a tool instead of shaped like a behavior. And when the behavior and the tool don’t match, the system fails quietly — not in a crash, but in a slow drift toward abandonment, workarounds, and the quiet understanding that “we don’t really use that anymore.”

    The alternative is building the system around the behavior first. It sounds obvious. Almost nobody does it.


    What “Behavior-First” Actually Means

    A behavior is what actually happens — or needs to happen — in your operation. It’s not a goal, not a feature request, not a capability. It’s the specific sequence of actions, decisions, and handoffs that produce a result.

    Most system design starts with tools and works backward to behaviors. Behavior-first design starts with the behavior and works forward to the minimum set of tools that can serve it.

    The difference sounds subtle. The outcomes are not.

    When you start with the tool, you spend the first six months learning the tool’s shape and then trying to reshape your operation to fit it. When you start with the behavior, you spend the first six months building a system that serves the operation — and then choosing the simplest tool that delivers what the behavior requires.

    The tool-first approach produces complexity. The behavior-first approach produces leverage.


    Six Behaviors That Built This Operation

    The following examples are drawn from a single AI-native operation built over three years. None of them started with a tool selection. All of them started with the question: what actually needs to happen here?

    1. Write → Store → Distribute (The Content Pipeline)

    Most content operations are built around WordPress. The platform is the system. Articles go into WordPress, WordPress manages drafts, WordPress publishes, WordPress is the source of truth. This is tool-first design.

    The behavior is different. The behavior is: write a piece of content, preserve it permanently, distribute it to wherever it needs to go.

    When you build around that behavior, WordPress becomes one destination among several — not the system. Notion becomes the storage layer. WordPress becomes the distribution layer. The article exists independently of where it’s published. If WordPress goes down, if the WAF blocks you, if the site moves hosts — the content is not at risk. The behavior (write → store → distribute) is served by a stack of tools, none of which is the irreplaceable center.

    The practical result: every article written in this operation goes to Notion first, WordPress second. Not because Notion is a better publishing platform — it isn’t. Because the behavior requires permanent, accessible storage before distribution, and WordPress was never designed to be that.

    2. Identify → Deposit → Execute (The Work Order Architecture)

    The problem: an AI system can identify what’s wrong with a WordPress site in seconds — thin content, missing schema, broken taxonomy, orphan pages — but the identification and the fix are handled by completely different systems. The identification lives in a conversation. The fix lives in a deployment. There’s no bridge.

    The behavior is: Claude identifies a problem, deposits a structured work order, a Cloud Run worker executes it. The intelligence and the execution are decoupled. Neither layer needs to know how the other works.

    Built around that behavior, the tool choices become obvious. Notion holds the work order queue — not because Notion is a task management tool (though it is), but because Claude can write to it via API and a Cloud Run service can read from it. The tools serve the behavior. The behavior doesn’t contort to serve the tools.

    3. Extract → Distill → Deploy (The Human Distillery)

    The behavior here is one of the rarest in any knowledge-intensive industry: taking tacit knowledge — the unwritten, unspoken operational intelligence that lives in people’s heads — and converting it into structured artifacts that AI systems can immediately use.

    Tacit knowledge doesn’t fit into forms, surveys, or databases. It surfaces through conversation. The extraction behavior is a specific sequence: disarm the subject, descend through four layers of questioning (documented protocol → exception cases → sensory knowledge → counterfactual pressure), capture what surfaces, and distill it into a dense artifact.

    That behavior existed long before any tool was selected to support it. The tool choices — which models to run distillation through, how to structure the output schema, where to store the resulting knowledge concentrates — all came after the behavior was understood. The behavior is irreplaceable. The tools are interchangeable.

    4. Observe → Route → Produce (Task Routing for Variable Attention)

    Most productivity systems are built around the assumption that the operator applies consistent, scheduled attention to work. Tasks sit in queues. Work happens in order. Focus is managed through priority.

    That behavior doesn’t match how an ADHD-wired operator actually works. The actual behavior is: attention arrives unbidden, attaches to whatever has activated the interest system, runs at extraordinary intensity, and then ends — also unbidden. The work happens in spirals, not lines.

    An AI-native operation designed around this actual behavior routes tasks differently. High-interest, high-judgment work goes to the operator when the operator’s attention is activated. Low-interest, deterministic work gets routed to automated pipelines that run on schedule regardless of operator state. The behavior — variable, interest-driven, high-intensity — shapes the system. The system doesn’t demand behavior the operator can’t deliver.

    The result is not a workaround. It’s an architecture. And the architecture works better for a neurotypical operator too — because the constraints that neurodivergence makes extreme are present in milder form in everyone.

    5. Touch → Remind → Refer (The CRM Community Framework)

    The restoration industry spends $150–$500 per lead acquiring customers and then never contacts them again. Not because they don’t want to. Because the tool they have — a job management system built around transactions — doesn’t support the behavior they need.

    The behavior is: make consistent, relevant, human contact with warm relationships at regular intervals, using legitimate business moments as the reason. That’s it. The behavior is simple. The tool selection is almost irrelevant — a spreadsheet and a Mailchimp free account can execute it. What matters is that the system is built around the behavior (stay present in warm relationships) rather than around the tool (send marketing emails).

    When you build around the tool, you get a marketing email campaign. When you build around the behavior, you get a community — a network of people who feel a genuine two-way relationship with your company and who refer you business because you’re the company that actually stayed in touch.

    The technical implementation of this — segmentation from ServiceTitan and Jobber, email automation in Mailchimp or Brevo, relationship intelligence in a Notion Second Brain — is documented in full in the CRM Community Framework series. Every tool choice in that series is downstream of the behavior. None of it works if you start with the tool.

    6. Signal → Display → Act (The Four-Layer Data Architecture)

    A complex multi-site operation generates data from dozens of sources simultaneously — WordPress post metrics, GCP Cloud Run logs, Notion task statuses, client pipeline movements, content performance signals. The instinct is to find one tool that can hold all of it. The tool becomes the system.

    The behavior is different for each data type. Machine-generated operational data (image processing logs, batch job results, embedding vectors) needs to be written and read by automated systems at high speed. Human-actionable signals (site health alerts, content gaps, client status changes) need to be displayed in a way a person can act on without noise. Content in progress needs to be stored independently of where it will ultimately be published.

    Four behaviors. Four tool layers. WordPress for published content, GCP for machine data, Notion for human signals, Google Drive for files. No single tool tries to do all four. Each tool is chosen because it’s the best fit for one specific behavior — not because it can technically handle the others.


    How to Apply This in Your Operation

    The behavior-first design process has three steps, and none of them involve opening a browser tab to research tools.

    Step 1: Write down what actually needs to happen. Not what you want to accomplish. Not what you wish the system could do. The specific sequence of actions that produces the result you need. Subject → verb → object, repeated until the behavior is fully described. “Someone writes an article. The article needs to be findable in six months. The article needs to be published to a website.” That’s a behavior. “We need better content management” is not.

    Step 2: Identify where the behavior breaks down today. Every system has the places where it works and the places where it silently fails. A CRM that nobody updates after the job closes. An email platform that has contacts from three years ago and no segmentation. A content process that lives in someone’s head. These are the behavior gaps — the places where the actual behavior doesn’t match the intended behavior.

    Step 3: Choose the simplest tool that serves the behavior. Not the most powerful. Not the most popular. Not the one with the best demo. The one that makes the behavior easiest to execute consistently. A $13/month Mailchimp account and a Google Sheet will outperform a $400/month marketing platform if the behavior is four emails per year to a warm local database — because the complexity of the expensive tool introduces friction that kills the behavior entirely.


    The AI-Native Operation Is Behavior-First by Definition

    The reason AI-native operations tend to outperform tool-native operations has nothing to do with AI being smarter. It has to do with design philosophy.

    AI tools, at their best, are infinitely flexible. They don’t impose a shape on your operation. They serve whatever behavior you describe. The operator who builds an AI-native operation is forced — by the nature of the tools — to understand their own behaviors first. You cannot prompt your way to a useful output without knowing what useful looks like. You cannot build a pipeline without understanding the sequence it’s meant to automate.

    This is why the AI-native operator has a structural advantage over the SaaS-native operator. Not because their tools are better. Because the process of building with AI forces behavior-first thinking, and behavior-first thinking produces systems that compound over time instead of decaying into expensive shelf-ware.

    The tool will change. The behavior won’t. Build the system around the behavior.


    Frequently Asked Questions

    How do you identify the behavior if you’ve always built around tools?

    Start with the breakdowns. Wherever your current system has workarounds, manual steps, or things people do “outside the system,” those are the places where the tool’s shape and the behavior don’t match. The workarounds are the behavior. Build the new system to serve them directly.

    Doesn’t this make tool selection harder and slower?

    It makes it faster. When you know the behavior precisely, you have a clear evaluation criterion: does this tool make the behavior easier to execute consistently, or does it add complexity? Most tool evaluations fail because the criteria are vague. Behavior-first evaluation is fast because the test is concrete.

    What if the behavior changes over time?

    Behaviors evolve. Systems built around behaviors can evolve with them — you swap the tool layer without disrupting the behavior layer. Systems built around tools can’t evolve without a full rebuild, because the tool is the system. Behavior-first architecture is inherently more resilient to change.

    Is this just another way of saying “process before technology”?

    It’s related but more specific. “Process before technology” is usually interpreted as documentation before implementation — write the SOPs, then build the tools to support them. Behavior-first design is about understanding the actual behavior of the operation, which often differs significantly from the documented process. You’re designing around what people and systems actually do, not what they’re supposed to do.

    How does this apply to AI tool selection specifically?

    AI tools are especially susceptible to tool-first thinking because they’re impressive in demos. The demo shows capability; the behavior question asks whether that capability serves a specific sequence in your operation. Most AI tool adoptions fail not because the tools are bad but because they were selected based on capabilities rather than behaviors. The question is never “what can this tool do?” It’s “which of my behaviors does this tool serve, and does it serve them better than what I have now?”


  • How Claude Managed Agents Handles Idle Time (And Why It Matters for Your Bill)

    How Claude Managed Agents Handles Idle Time (And Why It Matters for Your Bill)

    Tygart Media Strategy
    Volume Ⅰ · Issue 04Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    The most counterintuitive thing about Claude Managed Agents pricing is what you don’t pay for. Most people, when they hear “$0.08 per session-hour,” mentally model a virtual machine running continuously. That’s the wrong mental model. Here’s the right one, and why it matters for your bill.

    The Core Distinction: Active vs. Idle

    Managed Agents session runtime only accrues while your session’s status is running. The session can exist — open, initialized, capable of continuing — without accumulating runtime charges when it’s not actively executing.

    The specific states that do not count toward your $0.08/hr charge:

    • Time spent waiting for your next message
    • Time waiting for a tool confirmation
    • Time waiting on an external API response your tool is calling
    • Rescheduling delays
    • Terminated session time

    This is a meaningful architectural decision by Anthropic. They’re billing on what actually taxes their compute — active execution — not on session existence or wall-clock time.

    Why This Is Different From How You Might Expect Billing to Work

    Compare three billing models:

    Virtual machine billing (what this is not): You pay for every hour the instance exists, whether it’s idle or saturated. A VM running 24/7 with 10% actual utilization still costs 24 hours/day.

    Lambda/function billing (closer analogy): AWS Lambda bills on execution duration and invocation count — you pay when code actually runs, not when a function is “available.” Idle Lambda functions cost nothing.

    Managed Agents billing (what this actually is): Closer to Lambda than VM. You pay $0.08 per hour of active execution. A session that runs for 2 hours of wall-clock time but has 90 minutes of waiting costs $0.08 × 1.5 hours = $0.12, not $0.08 × 2 hours = $0.16.

    A Real Scenario: The Human-in-the-Loop Agent

    Consider an agent that processes your inbox for action items and waits for your approval before sending replies. Wall-clock time: 4 hours open during your workday. Actual active execution: 20 minutes of processing across that 4-hour window, with the rest spent waiting for your review decisions.

    • VM billing equivalent: 4 hours × rate = significant charge
    • Managed Agents billing: 20 minutes × $0.08/hr = $0.027

    The difference is real. For interaction-heavy agents where the agent frequently waits for human decisions, the idle-time exclusion significantly reduces costs versus a naive per-hour model.

    A Real Scenario: The Autonomous Batch Agent

    Now consider an agent running a fully autonomous content pipeline — no human checkpoints, just continuous execution through a queue. Wall-clock time and active execution time are nearly identical because the agent never waits.

    • A 2-hour autonomous batch: 2 hours × $0.08 = $0.16

    Here, the idle-time model provides no benefit — the agent has no idle time. The billing is effectively equivalent to per-hour pricing because execution is continuous.

    Code Execution Containers Are Included

    One more billing nuance worth knowing: when your agent runs code, the execution happens in sandboxed Linux containers. These containers are not separately billed on top of session runtime. The $0.08/hr covers both the session runtime and the container execution. This is explicitly documented by Anthropic and represents meaningful savings if your agent is doing significant code execution work — you’re not paying twice.

    What This Means for Workload Design

    If you’re designing agent workflows and have the choice between architectures, the billing model creates a useful signal:

    • Agents that wait on humans: Metered billing is favorable — you only pay for the actual reasoning and execution time, not the human decision time
    • Fully autonomous agents: Billing approaches equivalent to per-hour rates — optimize these on token efficiency, not idle reduction
    • Scheduled batch agents: Natural fit — run when needed, terminate when done, no idle accumulation

    The 24/7 Agent Math

    For anyone doing the 24/7 always-on calculation: the maximum theoretical runtime exposure is 24 hrs × $0.08 × 30 days = $57.60/month in session fees. But a 24/7 agent with zero idle time is rare in practice. Agents that sleep between triggers, wait on external data, or hold for human decisions have meaningful idle windows that reduce the actual charge below the theoretical ceiling.

    Full monthly cost analysis: The Real Monthly Cost of Running Claude Managed Agents 24/7. Pricing reference: Complete Pricing Guide. All questions: FAQ Hub.

  • Claude Managed Agents Rate Limits — What 60 Requests Per Minute Means in Practice

    Claude Managed Agents Rate Limits — What 60 Requests Per Minute Means in Practice

    The Lab · Tygart Media
    Experiment Nº 561 · Methodology Notes
    METHODS · OBSERVATIONS · RESULTS

    You’re planning to run Claude Managed Agents at scale. You’ve modeled the token costs, the session-hour charge, the workload cadence. Then you hit the actual constraint: rate limits. Here’s what 60 requests per minute actually means in practice, and whether it’s going to be your ceiling.

    The Two Limits You Need to Know

    Managed Agents has two endpoint-specific rate limits, separate from your standard Claude API limits:

    • Create endpoints: 60 requests per minute
    • Read endpoints: 600 requests per minute

    Your organization-level API limits apply on top of these. If your org is on a tier with a lower requests-per-minute ceiling, that’s the actual binding constraint.

    What “60 Create Requests Per Minute” Actually Means

    A create request, in Managed Agents context, is typically a session creation call — starting a new agent session. 60/minute means you can start 60 sessions per minute maximum. For almost all real workloads, this is not the binding constraint. Here’s why:

    Think about what generates create requests. If you’re running a batch pipeline that starts one new agent session per content item, processing 60 items per minute would saturate the limit. But a 60-item-per-minute content pipeline is running 3,600 items per hour — a genuinely high-volume operation. Most production agent workloads don’t look like this. They look like one session that runs for minutes or hours, processes multiple tasks within that session, and terminates when done.

    The create limit matters most for architectures where you’re spinning up a new session per task rather than running tasks within a persistent session. If that’s your pattern, 60/minute is a hard ceiling you’ll need to design around.

    What “600 Read Requests Per Minute” Actually Means

    Read requests include polling session status, reading agent output, checking checkpoints, and retrieving session state. 600/minute is a relatively generous limit — that’s 10 reads per second. For a monitoring dashboard polling 10 active sessions every second, you’d hit this. For most production monitoring patterns (checking status every 5-30 seconds per session), you’re well under the ceiling.

    The read limit becomes relevant in high-concurrency architectures where many sessions are running in parallel and all being polled aggressively. If you’re running 50 concurrent agents and checking each one every 2 seconds, that’s 25 reads/second — still within the 10 reads/second limit per second, but compressing toward it.

    The Limit That’s More Likely to Actually Stop You

    For most agent workloads, token throughput limits hit before request rate limits do. The reasoning: a long-running agent session processing significant context generates a lot of tokens. If you’re running many such sessions in parallel, you’ll hit your organization’s token-per-minute limit before you hit 60 sessions created per minute.

    Token limits depend on your API tier. Higher tiers have higher token throughput limits. Rate limit increases and custom limits for high-volume enterprise customers are negotiated with Anthropic’s sales team.

    Designing Around the 60 Create Limit

    If your architecture genuinely needs more than 60 new sessions per minute, the primary design pattern is batching more work within each session rather than creating more sessions. A single Managed Agents session can handle sequential tasks — you don’t need a new session per task if your tasks can be queued and processed within one session’s lifecycle.

    The tradeoff: longer-running sessions accumulate more runtime charge ($0.08/hr active). For most workloads, the efficiency gains from batching outweigh the marginal runtime cost.

    The Agent Teams Implication

    Agent Teams — Managed Agents’ multi-agent coordination feature — coordinate multiple Claude instances with independent contexts. Each instance in an Agent Team is a separate entity from a context standpoint. How Agent Team member sessions count against the create rate limit is worth verifying against current documentation if you’re architecting a high-concurrency Agent Teams deployment.

    For Enterprise Workloads

    If you’re evaluating Managed Agents for enterprise-scale deployment and the published limits don’t fit your volume requirements, contact Anthropic’s enterprise sales team. Rate limit increases for high-volume applications are a documented option — they’re negotiated, not self-serve.

    Contact: [email protected] or through the Claude Console.

    Frequently Asked Questions

    Does the 60 requests/minute limit apply to all API calls or just session creation?

    The 60/minute limit applies to create endpoints — session creation being the primary one. Read operations have a separate 600/minute limit. Standard Messages API calls are governed by your organization’s standard tier limits, not these Managed Agents-specific limits.

    Do subagents count against the create rate limit separately from the parent session?

    Subagents operate within the parent session’s context and report results upward — they’re architecturally different from new sessions. Verify current documentation for precise billing treatment of subagent creation calls vs. Agent Team session creation.

    What happens when I hit the rate limit?

    Standard API rate limit behavior applies — requests over the limit receive a 429 response. Implement exponential backoff in your session creation logic for any high-volume pattern that approaches the 60/minute ceiling.

    How does this compare to OpenAI’s Agents API limits?

    Rate limit structures differ by product and tier. Direct comparison requires checking both providers’ current documentation for your specific tier. The full comparison: Claude Managed Agents vs. OpenAI Agents API.

    Full pricing context including rate limits: Claude Managed Agents Complete Pricing Reference. All questions: Claude Managed Agents FAQ.