Tag: Tech Stack

  • Claude Code Server-Managed Settings: The Admin Console Push That Replaces Your MDM Pipeline

    Last week I argued that if you have more than a handful of engineers on Claude Code, repo-level .claude/settings.json is not enough — you need managed-settings.json deployed through MDM. That is still true. What changed in 2026 is that you no longer need an MDM team to roll it out.

    Claude Code now supports server-managed settings: a remote configuration tier pushed from the Claude.ai admin console, with no file on disk and no MDM involvement. If you are on the Team plan running Claude Code 2.1.38+ or the Enterprise plan running 2.1.30+, this is available to you today, and most platform teams I talk to are still treating MDM-deployed managed-settings.json as the only option.

    It is not. And the precedence rules matter.

    The New Top of the Settings Hierarchy

    Claude Code’s settings stack already had a clear order: local project settings override shared project settings, which override user settings, with managed settings sitting on top of all of them as the unoverridable tier. Server-managed settings now sit at the same top tier alongside MDM and the on-disk managed-settings.json file. Within that managed tier, the documented precedence is:

    1. Server-managed settings (admin console push)
    2. MDM / OS-level policies (Jamf, Kandji, Group Policy, Intune)
    3. managed-settings.json on disk (the file we deployed last week)
    4. HKCU registry (Windows)

    Server-managed wins. If you push a policy from the admin console that conflicts with a fleet managed-settings.json deployed by MDM, the server policy applies. That is the entire point.
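
    The managed-tier ordering above can be read as a first-match lookup. The sketch below is an illustrative model only: the source labels and the `resolve_conflict` helper are hypothetical, not Claude Code identifiers, and it only answers "who wins when two sources set the same key". Whether keys from lower-precedence sources are still merged in when a higher source is present is not modeled here and should be checked against the server-managed settings docs.

```python
from typing import Any, Optional

# Managed-tier sources, highest precedence first (order taken from the list above).
# The labels are hypothetical names for this sketch, not Claude Code identifiers.
MANAGED_PRECEDENCE = [
    "server_managed",
    "mdm_os_policy",
    "managed_settings_file",
    "hkcu_registry",
]


def resolve_conflict(key: str, sources: dict[str, dict[str, Any]]) -> Optional[tuple[str, Any]]:
    """Return (source, value) from the highest-precedence source that sets `key`."""
    for name in MANAGED_PRECEDENCE:
        settings = sources.get(name, {})
        if key in settings:
            return name, settings[key]
    return None  # no managed source sets this key


# Example: the admin console and an MDM-deployed file disagree on the default model.
sources = {
    "managed_settings_file": {"model": "claude-sonnet-4-6"},
    "server_managed": {"model": "claude-opus-4-7"},
}
print(resolve_conflict("model", sources))
```

    With both sources present, the lookup returns the server-managed value, which matches the "server-managed wins" rule described above.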

    What This Actually Replaces

    For organizations without a mature endpoint management pipeline — which is most companies smaller than a couple hundred engineers — the old path looked like this: get IT to package a JSON file, push it through Jamf or Group Policy, verify on a pilot machine, then deploy fleet-wide. Two-week ticket minimum.

    Server-managed settings collapse that to: log into the admin console, write the policy in the UI, save. Claude Code clients fetch the new policy at startup and re-poll hourly during active sessions. No reboot. No reinstall. No ticket.

    This is a real change in posture. The friction that kept smaller teams from deploying any managed policy at all just dropped to near zero.

    The Approval Gate Most Teams Will Hit

    Server-managed settings have one behavior MDM-deployed settings do not: certain categories require explicit user approval before they apply on a given machine. The current list per the docs:

    • Shell command settings (custom commands surfaced to the model)
    • Custom environment variables (anything injected into the model’s process env)
    • Hook configurations (pre/post-tool-use hooks)

    These three need the user to click through an approval prompt the first time the new policy hits their client. Deny rules in permissions.deny, the audit log path, telemetry settings, default model — those apply silently.

    The reasoning here is sound: a malicious admin (or a compromised admin account) could otherwise inject a hook that exfiltrates every prompt or a shell command that pipes diffs to an external endpoint. Approval gating those three categories means a developer at least sees the change before it takes effect. It also means your “push the new hook policy fleet-wide” plan has a manual confirmation step you cannot skip.

    If you need silent enforcement of hooks or shell commands, MDM-deployed managed-settings.json still does that without the prompt. Use the right tool for the right setting.
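
    Before pushing a policy, it can help to sort the changes into "applies silently" and "will prompt the developer". The sketch below assumes you tag each changed setting with a category yourself; the category labels and the `split_policy` helper are hypothetical and are not part of any Claude Code API. Only the three gated categories come from the list above.

```python
# Categories that trigger the approval prompt, per the list above.
# The label strings are made up for this sketch.
APPROVAL_REQUIRED = {"shell_command", "custom_env", "hook"}


def split_policy(changes: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    """Split (setting, category) pairs into (silent, needs_approval) setting names."""
    silent: list[str] = []
    needs_approval: list[str] = []
    for setting, category in changes:
        if category in APPROVAL_REQUIRED:
            needs_approval.append(setting)
        else:
            silent.append(setting)
    return silent, needs_approval


# Example: a push that mixes deny rules, a model pin, a hook, and an env var.
changes = [
    ("permissions.deny", "permission"),
    ("hooks", "hook"),
    ("env", "custom_env"),
    ("model", "model"),
]
silent, gated = split_policy(changes)
print("applies silently:", silent)
print("prompts the user:", gated)
```

    Anything that lands in the second list is a candidate for the MDM-deployed file if the prompt is not acceptable.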

    What Belongs on the Server, What Belongs in MDM

    After running both for two weeks across a small fleet, the split that has held up:

    Push from the admin console:

    • permissions.deny rules that should be hot-updatable when a new exfil vector is discovered
    • Default model pinning (when you want to change it without re-deploying)
    • Telemetry and audit log endpoints
    • Anything you want to A/B across user groups (more on this in a second)

    Keep in MDM managed-settings.json:

    • Hook configurations you need to enforce silently
    • Shell command allowlists that must apply before first launch
    • Anything that needs to survive the user being signed out of their org account

    The reason for the second list is that server-managed settings only apply once the user authenticates with org credentials. A fresh laptop with a developer running claude before signing in gets no server policy. MDM-deployed settings apply from the first invocation.
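
    The split above reduces to two questions per setting: does it have to apply before sign-in, and does it fall in an approval-gated category that you need enforced without a prompt? A minimal sketch of that decision, with a hypothetical `SettingNeeds` type and channel names invented for illustration:

```python
from dataclasses import dataclass


@dataclass
class SettingNeeds:
    approval_gated: bool        # hooks, shell command settings, custom env vars
    must_apply_silently: bool   # no approval prompt is acceptable
    must_apply_pre_auth: bool   # has to hold before the user signs in to the org


def recommended_channel(needs: SettingNeeds) -> str:
    """Return "mdm" or "admin_console" following the split described above."""
    if needs.must_apply_pre_auth:
        return "mdm"  # server policy only applies after org authentication
    if needs.approval_gated and needs.must_apply_silently:
        return "mdm"  # the admin console push would trigger an approval prompt
    return "admin_console"  # hot-updatable, no MDM ticket


# A hot-updatable deny rule versus a hook that must be enforced without a prompt.
print(recommended_channel(SettingNeeds(False, True, False)))
print(recommended_channel(SettingNeeds(True, True, False)))
```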

    Group-Targeted Policies Are the Sleeper Feature

    Anthropic added user groups to the admin console earlier in 2026. Groups can be created manually or synced from an IdP via SCIM, and each group can be assigned a custom role plus its own spend limit. The piece most teams have not connected yet: server-managed settings respect group membership.

    This means you can push one permissions.deny policy to the “Security” group and a different one to the “Platform” group without writing two separate managed-settings.json files and pushing them through MDM with different scoping. Write two policies in the console, assign to groups, done. Group membership changes via SCIM propagate within the hour-long polling window.

    For a 200-engineer org that previously needed Jamf smart groups + MDM JSON variants to do the same thing, this is significant.

    Verification Workflow

    The same verification workflow from the MDM-deployed setup still applies, with one addition:

    1. Push the policy in the admin console
    2. On a test machine, run claude config list — server-managed settings should appear flagged as such
    3. Attempt a denied action, confirm immediate block
    4. If hooks or shell commands are in the policy, walk through the approval prompt
    5. Sign the test user out, sign back in, confirm policy reapplies

    The sign-out test matters because that is where server-managed differs most from on-disk managed settings — the policy is bound to the org-authenticated session, not the machine.

    Model Versions for Org-Wide Pinning

    If you pin a default model via server-managed settings, the current strings are: claude-opus-4-7 (flagship), claude-sonnet-4-6 (workhorse), and claude-haiku-4-5-20251001 (fast). Verify against the live model list at docs.anthropic.com/en/docs/about-claude/models before deploying — model strings change frequently and pinning to a deprecated one will silently break agent runs.

    Where Server-Managed Settings Lose

    Three real limitations:

    1. No silent hook/shell-command enforcement. User approval is mandatory for those three categories.
    2. No effect before org auth. Pre-auth sessions ignore server policy entirely.
    3. No staged rollout. Console changes reach every client in the targeted groups within the hour. There is no built-in canary, no staged rollout percentage, no “apply to 10% of fleet for 24 hours” toggle. If you push a bad deny rule, every active session picks it up at next poll.

    Mitigate the third one by maintaining a single non-production test group that you deploy to first, wait 90 minutes, then promote the policy to broader groups. It is a manual canary, but it is the canary you have.
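
    The 90-minute wait is the hourly poll plus some slack. As a rough sketch of that arithmetic (the 30-minute margin is my own decomposition of the 90 minutes, not a documented value, and `earliest_promotion` is a hypothetical helper):

```python
from datetime import datetime, timedelta

POLL_INTERVAL = timedelta(minutes=60)  # hourly re-poll during active sessions
SAFETY_MARGIN = timedelta(minutes=30)  # assumption: slack for clients that poll late


def earliest_promotion(pushed_at: datetime) -> datetime:
    """Earliest time to promote a policy from the test group to broader groups."""
    return pushed_at + POLL_INTERVAL + SAFETY_MARGIN


# A policy pushed to the test group at 09:00 can be promoted from 10:30.
print(earliest_promotion(datetime(2026, 6, 1, 9, 0)))
```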

    The 20-Minute Rollout for a Team Plan Org Already on Claude Code 2.1.38+

    1. Open the admin console at claude.ai → Settings → Claude Code policies
    2. Write a minimum-viable policy: deny curl, wget, rm -rf /, .env reads, credential files
    3. Assign to a single test group (one user)
    4. On that user’s machine, run claude config list — confirm the server policy appears
    5. Try three denied actions, confirm all blocked
    6. Expand assignment to one team
    7. Wait 24 hours, watch for tickets
    8. Roll org-wide

    The whole sequence spans more than a day on the calendar because of the wait windows, not because of the work. The actual work is twenty minutes.
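
    For step 2, here is a rough sketch of what the minimum-viable policy could look like as JSON, built in Python so it can be sanity-checked before pasting. The rules use the `Tool(specifier)` shape of Claude Code permission rules, but the specific patterns below (especially the credential paths) are illustrative assumptions. Verify them against the permissions docs and test each one on the pilot machine before trusting it.

```python
import json
import re

# Loose shape check for Tool(specifier) rules; it does not validate the patterns themselves.
RULE_SHAPE = re.compile(r"^[A-Za-z]+\(.+\)$")


def build_minimum_policy() -> dict:
    """Assemble a deny-only policy and reject obviously malformed rule strings."""
    deny = [
        "Bash(curl:*)",
        "Bash(wget:*)",
        "Bash(rm -rf /)",            # assumption: exact-command match
        "Read(./.env)",
        "Read(./.env.*)",
        "Read(~/.aws/credentials)",  # assumption: example credential path
        "Read(~/.ssh/**)",           # assumption: example credential path
    ]
    malformed = [rule for rule in deny if not RULE_SHAPE.match(rule)]
    if malformed:
        raise ValueError(f"malformed deny rules: {malformed}")
    return {"permissions": {"deny": deny}}


print(json.dumps(build_minimum_policy(), indent=2))
```

    Keeping the first policy deny-only means nothing in it falls into the approval-gated categories, so the pilot user should not see a prompt.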

    Why This Article Exists

    The MDM-deployed managed-settings.json approach from last week is still the right answer for orgs that need silent, pre-auth policy enforcement. For everyone else, which is most teams adopting Claude Code in 2026, server-managed settings are the easier path, and most platform teams I talk to do not know they exist yet. Admin console push, no on-disk file, no MDM dependency, group-scoped via SCIM. If you are on a recent Team or Enterprise plan, this is the deployment posture you actually want.

    Sources

    • docs.anthropic.com/en/docs/about-claude/models (model version strings)
    • code.claude.com/docs/en/server-managed-settings (server-managed settings docs)
    • code.claude.com/docs/en/admin-setup (admin setup reference)
    • support.claude.com/en/articles/11845131-use-claude-code-with-your-team-or-enterprise-plan (Team/Enterprise Claude Code usage)
    • support.claude.com/en/articles/13799932-manage-groups-and-group-spend-limits-on-enterprise-plans (group management + spend limits)
    • support.claude.com/en/articles/13133195-set-up-jit-or-scim-provisioning (SCIM provisioning)
    • claude.com/product/claude-code/enterprise (Enterprise plan overview)
    • anthropic.com/news/claude-code-on-team-and-enterprise (admin controls launch)

  • What Actually Drives Claude Code Adoption: Inside a 30-Engineer Rollout That Held 35% at Month Four

    If you want to understand why some Claude Code rollouts compound and others quietly stall, stop looking at license telemetry and start looking at one artifact: the skill library. Every public 2026 case study with sustained productivity gains has the same shape — a committed skill kit, tight CLAUDE.md files, a handful of hooks, and a Friday retro cadence the team actually keeps. Teams that buy seats and skip the artifacts get install-only adoption and a dashboard that reads flat for a quarter.

    The 30-engineer case that landed at 35% productivity lift

    The cleanest recent case study comes from a Digital Applied write-up published May 15, 2026 — an anonymized composite tracking a Series-B SaaS shop with thirty engineers across six squads on a Node/TypeScript monorepo. The team had Claude Code seats for the better part of a year before the engagement started. Roughly half the engineers used the CLI weekly. Zero shared skills, no committed project settings, no hooks, two squads with no project memory at all.

    The day-zero audit on a 50-point scorecard came in at 19/50. Ninety days later it hit 41/50 — a 22-point shift from late Stage 1 to mid-Stage 3. The headline number reported to leadership: a sustained 35% productivity lift, engagement-weighted, that held flat into month four.

    The shipped artifacts behind that number:

    • 22 shared skills, with authorship spread across 9 engineers
    • 11 wired hooks across three archetypes (notification, audit, gate)
    • 3 custom subagents — code-reviewer, ticket-triager, release-notes-writer
    • CLAUDE.md files pruned and held under 400 lines per repo

    The most-invoked skill was commit, accounting for roughly a third of all invocations by month four. That kind of skew is normal in a mature library and tells you which workflow is actually being changed by the rollout.

    Why CLAUDE.md hygiene predicts depth

    The single most actionable lesson from the case study is mechanical: cap CLAUDE.md at 400 lines and enforce it in PR review. Two squads in the engagement drifted past 800 lines in sprint two. Their skill-invocation rate ran roughly 40% lower than the four squads that held the line.

    The hypothesized mechanism, validated in two follow-up retros: bloated memory causes the model to skim the file rather than internalize it, which produces more generic responses, which makes engineers reach for the tool less often, which drops invocation rates further. The cycle is self-reinforcing in either direction. When the team ran a month-four prune that cut the average CLAUDE.md from 520 to 340 lines, skill-invocation rate rose 12% across the team in the following two weeks.

    The discipline: long-form content moves to .claude/docs/ as sub-docs with one-line summaries and links in the main file. The main file stays orientation-shaped — who the team is, what the repo does, where to look for the rest.
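
    The 400-line cap is easy to enforce mechanically rather than by reviewer memory. A minimal sketch of a check that could run in CI, assuming memory files are named CLAUDE.md anywhere under the repo root (the function names are hypothetical, and the case study does not describe how the team actually enforced the cap):

```python
from pathlib import Path

MAX_LINES = 400  # the cap from the case study


def oversized_claude_md(root: Path, max_lines: int = MAX_LINES) -> list[tuple[Path, int]]:
    """Return (path, line_count) for every CLAUDE.md under `root` that exceeds the cap."""
    offenders = []
    for path in sorted(root.rglob("CLAUDE.md")):
        with path.open(encoding="utf-8") as fh:
            count = sum(1 for _ in fh)
        if count > max_lines:
            offenders.append((path, count))
    return offenders


def check(root: Path) -> int:
    """Print offenders and return a process exit code (0 = pass, 1 = fail)."""
    offenders = oversized_claude_md(root)
    for path, count in offenders:
        print(f"{path}: {count} lines (cap {MAX_LINES})")
    return 1 if offenders else 0
```

    Wiring it in is one line in a CI step, for example `sys.exit(check(Path(".")))`. Note that `rglob` also walks vendored directories, so a real setup would likely exclude paths such as node_modules.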

    The productivity panel mistake every team makes first

    Version one of this team’s productivity panel was wrong, and that wrongness taught the rollout more than any single milestone after it. The first panel tracked the metrics license telemetry already covered: total sessions opened per week, total tokens, average session length. It read flat for six weeks while the underlying capability of the team was visibly shifting in retros and PRs.

    Version two, rebuilt in week eight, weighted around engagement signals:

    • Skill invocations split by skill
    • Subagent runs per week
    • Time-to-first-meaningful-output for new contributors
    • Audit-score deltas from the quarterly 50-point scorecard
    • PR-to-merge time on Claude-Code-assisted PRs versus baseline

    By month four the panel showed roughly 410 skill invocations per week, 85 subagent runs per week, new-hire time-to-first-meaningful-output at -45% versus baseline, and PR-to-merge time -18% versus baseline. The 35% headline was an engagement-weighted composite of those signals, not a single measurement — and the team was careful never to frame it as “engineers ship 35% more code,” because that framing invites a debate the panel cannot win.

    How this case lines up with the rest of the 2026 cohort

    The Digital Applied 30-dev case is not an outlier. A companion case study from the same firm, dated May 13, 2026, covers a 100-developer engineering organization that sustained a 28% productivity lift with a 32-entry skill library over six months. That team ran Claude Code and Cursor side-by-side: Claude Code as the terminal/CLI surface for refactors, multi-file edits, codebase navigation, and review automation; Cursor as the in-editor surface for line-level completion and inline review.

    The pattern that replicates across both engagements is the cadence, not the contents: three sprints across the first ninety days (install, leverage, governance), plus an explicit sustain phase that starts at day 90 with the same owner and the same Friday retro cadence as the active sprints. Treating days 91+ as a vague quarterly review is the most common reason adoption drifts back to install-only inside two quarters.

    What to actually do on Monday

    If you have Claude Code seats and want a rollout that compounds instead of stalls, the operational order matters more than the contents of your skill library:

    1. Run the day-zero audit and write down the score. The 50-point rubric Digital Applied published is a defensible starting point; any scorecard that distinguishes install from artifacts from governance will do. The number is what makes the case for the engagement internally.
    2. Name the rollout lead and carve 20-30% of their week. Less than that and the calendar slips. The role shape is enough seniority to enforce milestone discipline, enough engineering depth to write skills and hooks rather than just steward them, and enough calendar discipline to keep the cadence intact when product pushes back.
    3. Calendar the four phase-end retros and the month-four review before sprint one opens. Friday retros are thirty minutes per squad per week: the cheapest part of the rollout and the most often skipped. Skip one, and the friction it would have caught in week three compounds silently for the rest of the sprint.
    4. Build the productivity panel deliberately badly in sprint two and rebuild it in sprint three. The version-two rebuild is structural, not incremental. Trying to ship the right panel on the first try usually delays the cadence rather than improving the signals.
    5. Cap CLAUDE.md at 400 lines and enforce it in PR. This is the single highest-ROI hygiene rule in the engagement and the one teams skip most often because completeness feels safer than discipline.

    The honest framing: a single-quarter Claude Code rollout takes you from Stage 1 to mid-Stage 3 on a defensible scorecard. Stage 4 — the optimized end-state with deeper subagent governance, a security cadence that catches drift, and a productivity panel that has been iterated against a full quarter of data — is a second-quarter project. The teams that get there are the ones whose sustain phase looks identical to the sprints that preceded it. The teams that drift are the ones whose Friday retro disappeared sometime around month two.

    Model versions referenced throughout this piece reflect Anthropic’s current lineup as of May 2026: claude-opus-4-7 (flagship), claude-sonnet-4-6 (workhorse), and claude-haiku-4-5-20251001 (fast). If you are reading this six weeks from now, check the model docs before you copy any string into a config.

  • Build on Alpha SDKs — and the case for waiting until GA

    A Second Take on a working decision: whether a solo operator should build production-grade infrastructure on alpha SDKs, or wait for general availability. This is not a hypothetical. Yesterday a fleet of ten Notion Workers shipped in three hours on an alpha SDK — eight of them working end-to-end, two of them gated behind capabilities that have not been enabled. Today the question is whether that was leverage or whether that was a detour. Both cases get made here.


    The Thesis from the First Take

    The argument for building on alpha software is older than software itself. It is the argument every operator who ever shipped early made to themselves: the people who get to the new surface first do not just get there first. They shape what arrives. They become the reference customer. Their friction becomes the roadmap. The ones who wait until everything is polished are buying the polish someone else paid for — and giving up the position that polish makes invisible.

    In the specific case of Notion Workers, the argument is even stronger. The SDK is free until August 11, 2026. The fleet built in one session validated four full capability shapes — tool, sync, sync-with-external-HTTP, and webhook with HMAC. The friction points discovered were specific enough to compile into a Slack-ready writeup to Notion’s product-ops team. The auth gotcha that cost four OAuth attempts at the start of the session is now a documented doctrine that any future operator on Windows-WSL will inherit for free. That is the trade you make on alpha. You pay in friction. You earn in surface knowledge and the right to be a voice in what gets built next.

    There is a deeper version of this argument that matters more than the tactical one. Production infrastructure is not built by people who watch other people build production infrastructure. It is built by people who put their hands on the actual surface, find the actual edges, and develop the kind of tacit understanding that no documentation, however good, can transfer. Reading about how a Worker handles a webhook signature is different from having one fail at 11 PM because the secret was not pushed. That second experience is what gets called intuition later. It cannot be downloaded. It has to be earned.

    The first take, then, is not really about Notion Workers at all. It is about the deeper claim that the people who learn the new surfaces first are the people who define what those surfaces are for. Everyone else inherits a category that was already decided.

    And the Case for Waiting

    Now the counter.

    The same fleet of ten Workers that proved four capability shapes also revealed something that the celebration glosses over. Two of the ten — the automation Worker and the AI connector Worker — could not be tested at all. They deployed clean. The code is fine. The bundles are sitting in the Notion infrastructure. They do not run because the user account does not have alpha access to those specific capabilities. The fix is not a code change. The fix is a permission grant that has to come from inside Notion. Until that happens, two of the ten Workers are not Workers. They are receipts for work done that cannot ship.

    That is the first hidden cost of alpha. The capability gates are not announced. They become visible only at the moment of attempted use, which is the most expensive moment to discover them. A solo operator’s time is the binding constraint of the entire operation. Spending it on bundles that cannot run because of an upstream permission is a worse trade than it looks on the surface.

    The second hidden cost is the dispatch gap. The Workers SDK in its current state assumes a developer running commands from a laptop. The `--local` execution mode requires a WSL Ubuntu environment with the right environment variables exported, the right token loaded into the right config file, and a human being to type the command. There is no remote trigger surface available through the Notion MCP server. There is no scheduled execution that an external system can verify. There is no way for an AI assistant working from a mobile session to invoke a Worker, even one already deployed and working. The Workers exist. They can be triggered. But only from one specific laptop, by one specific human, sitting in front of it.

    That gap turns out to matter more than any individual capability. The reason for building Workers in the first place was to remove the operator from the critical path of routine operations. If the operator still has to be physically present to start the Worker, the Worker has not removed the operator from the critical path. It has just changed the operator’s job from doing the work to invoking the thing that does the work. The leverage is real but smaller than advertised.

    The third hidden cost is the one nobody talks about. It is the cost of being early on a surface that may never become widely adopted. Every hour spent learning the idiosyncrasies of an alpha SDK is an hour not spent on a surface with broader applicability. If Notion Workers become the standard automation pattern for the platform, the early learning compounds for years. If Notion deprioritizes the SDK, retires it quietly, or pivots to a different model — none of which are unlikely for an alpha product — that learning has a shelf life measured in months. The operator who waited for GA still has all of the time they did not spend on the deprecated surface. The early adopter has bills receivable in a currency that no longer trades.

    The case for waiting, then, is not a case for timidity. It is a case for opportunity cost. Every alpha SDK is competing with every other thing that operator could have built in the same window. The question is not “is the alpha SDK valuable” — it usually is, in some narrow technical sense. The question is “is the alpha SDK more valuable than the next-best use of the same hours.” For a solo operator, that comparison is often unflattering to the alpha.

    What the First Take Gets Right

    The first take is correct that surface knowledge cannot be downloaded. The team that put hands on the alpha now knows things about how Notion Workers authenticate, how the schema module differs from the builder module, how the webhook HMAC pattern resolves, and how the capability registration phase fails in five different ways. None of this is in any document anyone has written. All of it will be implicit in every future architectural decision the operator makes about Notion as a platform. That is not nothing. That is a kind of capital.

    The first take is also correct that the price of alpha is paid once, while the position earned can compound. The four OAuth attempts that cost an hour of frustration on Worker number two cost zero hours on Worker number three. The capability shape that took thirty minutes to validate the first time took twelve minutes the second time and would take five minutes the next time it appears. Learning curves are nonlinear in the operator’s favor. The cost is front-loaded. The return, if the surface survives, is durable.

    And the first take is correct about something the counter-argument tends to miss: there is no neutral position. The operator who waits for GA is not pausing. They are doing something else with that time. If the something else is also valuable, the wait is rational. If the something else is consuming content about other people’s builds, the wait is just deferral dressed up as discipline.

    What the Second Take Gets Right

    The second take is correct that capability gates are real, that dispatch gaps are real, and that the operator’s time is the binding constraint on everything. None of those are abstract concerns. The two gated Workers from yesterday’s session are sitting in the infrastructure right now, doing exactly nothing, because a permission grant has not arrived. The eight working Workers cannot be triggered from anywhere except one specific laptop. The operator who wanted to invoke a Worker from a mobile session this morning could not.

    The second take is also correct that the deeper question is opportunity cost. If the same three hours had gone to building a Cloud Run service that wrapped the same logic, the result would be a working dispatch surface that any system could invoke — Slack, Notion automations once they’re enabled, scheduled cron, a webhook, an AI assistant on a phone. That service would not have been blocked on alpha permissions. It would not have required a specific WSL environment to invoke. It would have been ready for use the moment it deployed. The Workers fleet is more capable per line of code than the equivalent Cloud Run service would be, but it is less invokable. For an operator whose problem is “I want this to run when I am not there,” the less-invokable solution is the worse solution, even if it is more elegant.

    And the second take is correct that the rhetoric of “shaping the product” tends to flatter the early adopter beyond what the evidence supports. Most early adopters do not shape products. They use products that other early adopters shaped before them, and they generate friction reports that get triaged into a backlog that may or may not produce changes before the product changes direction. The reference customers who actually get heard tend to be the ones with the largest accounts, the most followers, or the deepest relationships with the product team. A solo operator is rarely any of those things. The Slack message to Notion’s product-ops team yesterday was a good message. Whether it produces changes in the SDK is a question whose answer is mostly out of the operator’s hands.

    The Test That Decides It

    Both takes are partially right, which is what makes the decision interesting rather than obvious. The test that decides between them, for any specific operator on any specific alpha SDK, is not whether the SDK is interesting or whether the friction is tolerable. It is a simpler test, and it is the only test that matters:

    Does the alpha SDK shorten the path to a result the operator already wanted, or does it create a new path to a result the operator did not previously care about?

    If the SDK shortens an existing path, alpha is leverage. The operator was going to solve the problem anyway. The alpha tool reduces the time and cost of solving it. The friction is just the friction of any new tool, and the early-mover advantage is real because the operator’s underlying intent was real.

    If the SDK creates a new path to a new problem, alpha is a detour. The operator is now solving a problem the SDK suggested rather than a problem the business required. The friction is no longer in service of any pre-existing goal. The early-mover advantage is hypothetical because there is no business outcome the alpha is actually serving — only an interesting tool that happens to exist.

    The Notion Workers case fails this test on the strict reading. The operator did not have an existing need to schedule recurring Notion automations. The Workers SDK suggested that need. The fleet was built to validate the SDK, not to solve a pre-existing operational problem. By the strict test, this is a detour.

    But the strict test misses something. The operator did have an existing need — to remove themselves from the critical path of routine operations. That need pre-dated the SDK by years and survives the SDK if it gets retired. The Workers SDK was one possible tool to serve that need. Cloud Run was another. Notion’s own automations product was a third. The fleet built yesterday tested whether Workers was the right tool for the existing need. The answer, on the evidence, is: partially. Workers are excellent at the work itself. They are not yet good at the dispatch problem. That is useful information, and it was acquired in three hours at zero dollar cost.

    By the strict test, the build was a detour. By the deeper test, it was a calibration run on a candidate tool for a real need. Both readings are defensible. The operator will know which is correct when the next decision arrives: whether to invest in the dispatch gap that would make Workers fully production-ready, or whether to redirect that investment toward a Cloud Run service that solves the dispatch problem natively. That decision is the verdict. Until it is made, the build is neither leverage nor detour. It is a question still open.

    The Verdict

    The verdict, for this specific case, leans toward continuation but with a different framing.

    Notion Workers are not a production automation platform yet. They are a research investment in what a production automation platform on the Notion surface might look like. The eight working Workers are not deliverables. They are experimental rigs that produced specific knowledge about a specific surface. That knowledge is valuable independent of whether Workers ever become the standard pattern. It is also valuable independent of whether the operator continues to use Workers at all.

    The right next move is not to abandon the Workers fleet. It is also not to keep building Workers as if the dispatch problem will solve itself. The right next move is to add a Cloud Run dispatcher — a small service that accepts authenticated POST requests and, internally, triggers the appropriate Worker. That dispatcher would close the dispatch gap immediately, would work for any future Worker without further integration, and would also work for any non-Worker job the operator wants to invoke from anywhere. It would cost less to build than the original Workers fleet because it would inherit all the lessons.
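
    To make the dispatcher concrete, here is a minimal sketch using only the Python standard library. It is one possible shape, not the design actually built: the job registry, the bearer-token check, and `trigger_worker_stub` are all hypothetical, and how a deployed Worker is really triggered from outside is left as a placeholder because nothing above specifies it. Cloud Run supplies the listening port through the `PORT` environment variable.

```python
import hmac
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable

# Hypothetical shared secret; a real deployment would load it from a secret manager.
DISPATCH_TOKEN = os.environ.get("DISPATCH_TOKEN", "replace-me")


def trigger_worker_stub(payload: dict) -> dict:
    """Placeholder: the real call that starts a Worker is not specified here."""
    return {"status": "accepted", "echo": payload}


# Job name -> callable. Non-Worker jobs can be registered the same way.
JOBS: dict[str, Callable[[dict], dict]] = {"daily-sync": trigger_worker_stub}


def dispatch(auth_header: str, body: bytes, token: str = DISPATCH_TOKEN) -> tuple[int, dict]:
    """Authenticate the caller, look up the job, run it; return (http_status, result)."""
    expected = f"Bearer {token}"
    # Constant-time comparison avoids leaking token contents through timing.
    if not hmac.compare_digest(auth_header.encode(), expected.encode()):
        return 401, {"error": "unauthorized"}
    try:
        request = json.loads(body)
        job = JOBS[request["job"]]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "unknown or malformed job request"}
    return 200, job(request.get("payload", {}))


class DispatchHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, result = dispatch(self.headers.get("Authorization", ""), self.rfile.read(length))
        data = json.dumps(result).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve() -> None:
    """Start the HTTP server on the port Cloud Run provides (8080 by default)."""
    HTTPServer(("", int(os.environ.get("PORT", "8080"))), DispatchHandler).serve_forever()
```

    Keeping `dispatch` as a plain function separate from the HTTP handler means the auth and routing logic can be tested without starting a server, and any caller that can send an authenticated POST (Slack, cron, a phone) reaches the same entry point.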

    That move makes both takes correct. The first take wins on the claim that the alpha investment paid for itself in surface knowledge and capability shape validation. The second take wins on the claim that the dispatch gap is the binding constraint and that the path through Cloud Run is the better answer for that specific gap. Neither take is wrong. Both takes describe a real part of the trade.

    The deeper lesson, if there is one, is that the question “should an operator build on alpha SDKs” is the wrong question. It is too general to answer. The right question is “does this specific alpha SDK shorten a path the operator already cares about, and what is the operator’s plan for the parts of the path the SDK does not yet cover.” If both halves of that question have answers, the alpha investment is rational. If either half is missing, the alpha investment is a detour wearing the costume of leverage.

    For Notion Workers, the first half has an answer. The second half got its answer today. The Cloud Run dispatcher is the missing half. Once it is built, the fleet that looked like a possible waste yesterday becomes the foundation of something usable. That is the way alpha investments usually work, on the cases where they work. They look like a detour right up until the moment the missing piece arrives. Then they look like infrastructure.

    And that, finally, is the second take. Not “wait for GA.” Not “always ship on alpha.” Something more specific: build on alpha when the SDK shortens a path you already care about, and when you have a plan for the parts of the path the SDK does not yet cover. If both conditions hold, alpha is leverage. If either fails, alpha is a detour. The Workers fleet is not yet a finished case. It is a case in progress, and the progress depends on what happens next, not what happened yesterday.

    The original take ran here yesterday, in a different form, when a fleet of ten Workers was treated as proof that alpha investments pay off. This take argues that the proof is still pending — and names the move that converts the pending proof into a finished one.

  • DASH vs Albi vs PSA vs Xcelerate: The Honest 2026 Restoration Software Comparison

    If you run a restoration company doing between $1M and $10M, the software question is no longer “do we need a system?” It’s “which one do we commit to for the next five years, because the switching cost is going to hurt either way.” This is the honest comparison nobody selling you a demo will give you.

    The restoration software market in 2026 has consolidated into roughly four serious purpose-built platforms — DASH, Albi, PSA, and Xcelerate — plus a tier of adjacent tools (Encircle, CompanyCam, JobNimbus, ServiceTitan) that solve part of the problem but force you to stitch the rest together. Below is what each one actually is, who it fits, and where it breaks.

    The short answer for impatient owners

    • DASH (CoreLogic / Next Gear): Deepest integration with the insurance ecosystem. The default if TPA volume is more than 30% of your book.
    • Albi (Albiware): Most customizable. Built by restorers who hated being forced into someone else’s workflow. No native Xactimate integration yet — that is the catch.
    • PSA (Canam Systems): The value play for larger teams. Flat pricing instead of per-user makes it dramatically cheaper once you cross 10–15 users.
    • Xcelerate: Best if you want process discipline baked in. Built by a former restoration GM. Strong native integrations, limited customization.
    • ServiceTitan: Only makes sense above roughly $5M revenue with 20+ technicians and multi-location complexity. Below that, you are buying enterprise overhead.
    • JobNimbus, CompanyCam, Encircle: Component tools, not full systems. Useful inside a stack, dangerous as the stack.

    The four serious platforms, in detail

    DASH

    DASH is owned by CoreLogic and connects natively to Xactimate, XactAnalysis, Symbility, Encircle, Matterport, and DocuSketch. If you are pulling jobs from Contractor Connection, Code Blue, or any TPA that lives inside the CoreLogic ecosystem, DASH is the path of least resistance. Pricing typically starts around $299/month for core plans and scales into custom enterprise quotes. For TPA-heavy operators it is the default answer.

    Where it breaks: Customization is limited. You operate inside DASH’s idea of a restoration workflow, not yours. Owners who pride themselves on “we do it differently” tend to fight the software.

    Albi (Albiware)

    Albi was built by restoration contractors who got tired of being forced into preset workflows. The platform’s calling card is customization — fields, stages, reports, and metrics bend to your operation rather than the other way around. Open API connects to QuickBooks Online, Zapier, CompanyCam, Encircle, Kahi, and others.

    Where it breaks: Per public information, Albi does not have a native Xactimate integration. For a cash-job, retail-heavy shop this is fine. For an insurance-heavy contractor whose entire estimating life lives in Xactimate, it is a real friction point you should walk through with your estimator before signing.

    PSA (Canam Systems)

    PSA’s pricing model is the differentiator. Where competitors charge per user — which punishes you for growing — PSA quotes flat team-based pricing. Public reporting puts a 10-person team at roughly $350/month against $600–$1,000 for per-user alternatives. The savings compound brutally at 20+ users. Integrations cover Xactware and Matterport, among others.

    Where it breaks: The UI is less polished than DASH or Xcelerate. Implementation is more involved. If you have a tech-light operations manager, expect a real ramp.

    Xcelerate

    Xcelerate was founded by a former restoration general manager, and it shows. The platform bakes operational discipline — profitability tracking, stage gates, team accountability — into the default workflow. Native integrations to Xactimate, XactAnalysis, QuickBooks, Matterport, and Zapier are solid.

    Where it breaks: Customization is minimal. The bet Xcelerate is making is that the average restoration company should adopt best practices rather than enshrine its quirks in software. Owners who want the platform to bend to them will be frustrated.

    The adjacent tools: useful, but not the whole system

    ServiceTitan brings enterprise-grade dispatch, reporting, and marketing attribution, plus restoration-specific modules covering moisture tracking and drying logs. Per-user pricing escalates fast. Unless you are running a multi-location restoration franchise at $5M+ with 20+ technicians, this is too much platform for the problem.

    JobNimbus starts around $40/user/month and excels at visual job boards and photo documentation. It lacks restoration-specific guts: no moisture mapping, no equipment tracking, no IICRC S500 compliance prompts. Workable as a starter system under roughly $750K revenue. Above that, you outgrow it.

    CompanyCam is a documentation tool, not a CRM. It is excellent at what it does and pairs cleanly with all four major platforms. Do not buy it as your system of record.

    Encircle is the field documentation specialist — moisture mapping, photo organization, and report generation are best-in-class. Pricing starts around $149/user/month. Many restoration shops run Encircle alongside DASH or Albi rather than as a standalone.

    The decision framework

    Forget feature checklists. Three questions decide this for you.

    1. What percentage of your revenue comes from TPA and direct insurance work? If it’s above 30%, DASH gets the first look because the CoreLogic ecosystem is where your jobs live. If it’s below 30% and you are mostly retail, you have real options.
    2. How many users will be in the system 24 months from now? Above 15 users, PSA’s flat pricing pays for itself within a year. Below 10 users, the per-user platforms are competitive on cost.
    3. Are you the kind of owner who wants the software to enforce your process, or one who wants the software to mirror your process? Xcelerate enforces. Albi mirrors. DASH and PSA sit between.
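    The three questions can be written down as a small function. This is a sketch, not a scoring model: the `Shop` fields and the `shortlist` name are made up for illustration, the thresholds are the ones stated above (30% TPA share, 15 users), and the choice to return every platform a question surfaces, in question order, is an assumption about how to combine the answers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shop:
    tpa_revenue_share: float       # fraction of revenue from TPA / insurance work, 0.0 to 1.0
    users_in_24_months: int        # expected users two years out
    wants_enforced_process: bool   # True = software enforces, False = software mirrors


def shortlist(shop: Shop) -> List[str]:
    """Return the platforms to demo first, in the order the questions surface them."""
    picks: List[str] = []
    # Question 1: above 30% TPA work, DASH gets the first look.
    if shop.tpa_revenue_share > 0.30:
        picks.append("DASH")
    # Question 2: above 15 users, flat pricing becomes a deciding factor.
    if shop.users_in_24_months > 15:
        picks.append("PSA")
    # Question 3: Xcelerate enforces a process, Albi mirrors yours.
    picks.append("Xcelerate" if shop.wants_enforced_process else "Albi")
    return picks


# A TPA-heavy shop with a small team that wants discipline baked in.
print(shortlist(Shop(tpa_revenue_share=0.5, users_in_24_months=8, wants_enforced_process=True)))
```

    The article gives no rule for teams between 10 and 15 users, so the sketch leaves that band to question 3 alone.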

    What this costs you if you get it wrong

    A restoration company doing $3M with eight users on the wrong platform will typically lose somewhere between 40 and 120 hours of estimator and admin time per month to friction — workarounds, double entry, missing supplements, late invoicing. At a fully loaded $50/hr that is $2,000–$6,000 per month of pure overhead, before you count the supplements that fall through the cracks. Software is not the place to optimize for the cheapest sticker price. It is the place to optimize for the workflow your team will actually use without resentment.
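    The arithmetic behind that range, as a quick check. The function name is illustrative. The hours and the $50 rate come from the paragraph above, and the annual figure is simply the monthly figure multiplied by twelve.

```python
def monthly_friction_cost(hours_lost: float, loaded_rate: float = 50.0) -> float:
    """Cost of lost estimator and admin time for one month, in dollars."""
    return hours_lost * loaded_rate


low = monthly_friction_cost(40)    # 40 h x $50/h
high = monthly_friction_cost(120)  # 120 h x $50/h
print(f"${low:,.0f} to ${high:,.0f} per month")
print(f"${low * 12:,.0f} to ${high * 12:,.0f} per year")
# prints:
# $2,000 to $6,000 per month
# $24,000 to $72,000 per year
```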

    The bottom line

    If you are TPA-heavy, start with DASH. If you are retail-heavy with strong process opinions, start with Albi. If you are 15+ users and price-sensitive, force PSA into the demo cycle. If you want the software to make your team better operators by default, look at Xcelerate. Anything else — ServiceTitan, JobNimbus, standalone CompanyCam, standalone Encircle — is either too much platform or too little. Pick one of the four, commit, and stop shopping. The compounding ROI of a fully adopted system always beats the theoretical 12% feature edge of the platform you would have switched to.

    Frequently Asked Questions

    What is the best restoration company software in 2026?

    There is no single best — DASH wins for TPA-heavy operators, Albi for customization-heavy retail shops, PSA for teams above 15 users on flat pricing, and Xcelerate for operators who want process discipline baked in.

    Does Albi integrate with Xactimate?

    Per publicly available information, Albi does not have a native Xactimate integration as of 2026. It does offer an open API and integrates with QuickBooks, CompanyCam, Encircle, Kahi, Zapier, and others.

    How much does restoration CRM software cost?

    DASH starts around $299/month for core plans. PSA flat pricing for a 10-person team runs roughly $350/month. Per-user platforms typically run $99–$199 per user per month. Encircle starts around $149/user/month. JobNimbus starts around $40/user/month. All pricing is approximate and subject to vendor quote.

    Is ServiceTitan good for restoration companies?

    ServiceTitan makes sense for restoration companies above roughly $5M in revenue with 20+ technicians and multi-location complexity. Below that, the cost and implementation burden outweigh the benefit versus a purpose-built restoration platform.

    Can I run my restoration company on JobNimbus or CompanyCam alone?

    JobNimbus works as a starter system below roughly $750K in revenue but lacks restoration-specific tools like moisture mapping and equipment tracking. CompanyCam is a documentation tool, not a CRM, and should be paired with a full platform.

  • The Three-Legged Stack: Why I Stopped Shopping for New Tools

    Last refreshed: May 15, 2026

    Companion piece: This article describes how the three-legged stack came together over fourteen months. For the full operating doctrine — why three legs specifically, what each leg’s job is, and how they hold each other up — see The Three-Legged Stack: Why I Run Everything on Notion, Claude, and Google Cloud. The two pieces complement each other; this one is the journey, that one is the doctrine.

    I almost got excited about Google’s Googlebook last week. Then I caught myself. I have a stack that’s starting to feel like a broken-in baseball glove — pocket exactly where I want it, leather oiled, laces holding. The last thing I need is a new glove.

    This is the operating philosophy I’ve landed on after fourteen months of building Tygart Media as an AI-native content operation. It’s not a tech-stack post. It’s a posture. The stack I use (Claude as the intelligence layer, Notion as the control plane, GCP as the compute plane) happens to be the visual the rest of this piece is built around, but the real point is what holding still does to leverage.

    Walnut stool with copper, porcelain, and steel legs representing the Tygart Media AI operating stack of Claude, Notion, and GCP
    The Stack. Three legs is the minimum for stability. Add a fourth and you’ve added wobble, not strength.

    The temptation in any AI-adjacent business right now is to chase. Every week there is a new model, a new IDE, a new agent framework, a new laptop category. Googlebook arrives this fall promising Gemini at the kernel and an AI-powered cursor. OpenRouter sits there offering me every model in the world through one API. Six months ago I would have been wiring both of them in before the announcements cooled.

    I’m not doing that anymore. Here’s why, in seven images.

    The Three-Legged Stool

    Three legs is the minimum number for stability. Add a fourth and you haven’t added strength — you’ve added wobble. A three-legged stool sits flat on any surface, no matter how uneven, because three points define a plane. A four-legged stool needs the floor to be perfect, and if it isn’t, one leg is always lifting.

    My stack has three legs. Claude is the intelligence layer — every reasoning step, every draft, every architectural decision passes through it. Notion is the control plane — every project, client, task, ledger, and standard operating procedure lives there. Google Cloud Platform is the compute plane — Cloud Run services, BigQuery ledgers, Workload Identity Federation, the publisher infrastructure that moves content to 27 client sites without a single stored API key.

    People keep asking me when I’ll add a fourth leg. Will I move to OpenRouter for model diversity? Will I switch to Linear for project management? Will I migrate compute to AWS for the better startup credits? The honest answer is that adding a fourth leg right now would not make me more stable. It would make me less. I haven’t mastered the three I have.

    The Anvil and the Glove

    Walnut anvil on three legs with a worn baseball glove on top, sitting in a sunlit workshop
    Roots. Operations is operations. The discipline learned in restoration carries straight into AI-native content work.

    Before Tygart Media, I spent years in property damage restoration operations — Munters, Polygon, the kind of work where a phone call at 2 AM means a water line burst at a hotel and a crew needs to be on-site in forty-five minutes with the right equipment and the right paperwork. That world taught me everything I now use to run an AI-native content business. It taught me to batch. It taught me to absorb scope rather than push it back on the client. It taught me that subcontracting is a form of collaboration, not a failure mode. It taught me that operations is operations — the substrate changes, the discipline doesn’t.

    The baseball glove on top of the anvil is the metaphor I keep returning to. A new glove is stiff. It catches awkwardly. The webbing is too tight, the leather hasn’t formed to your hand yet, and every ball that comes in feels foreign. A broken-in glove is the opposite. It closes around the ball before you’ve consciously decided to squeeze. You don’t think about catching. You just catch.

    That’s what fourteen months on the same stack has done. I don’t think about how to publish to WordPress anymore. I don’t think about how to route a model decision between Haiku, Sonnet, and Opus. I don’t think about whether a new automation belongs in Cloud Run or as a Notion Worker. The catching is automatic. Every hour spent in the same three tools is another stitch in the glove.

    The Surveyor’s Tripod

    Surveyor's tripod with copper, porcelain, and steel legs planted on rocky ground at sunrise above the clouds
    Precision. The stack as a measurement instrument. Three legs, one truth.

    A tripod is a stool that measures. It’s the same three-legged geometry, but you put a theodolite on top, or a transit, or a telescope, and suddenly the stability isn’t ornamental; it’s the whole point. If the legs aren’t planted, the measurement is wrong. If the measurement is wrong, you build in the wrong place.

    The three-legged stack as a measurement instrument is how I now think about content operations. Claude measures what to say. Notion measures what’s been said, what’s been promised, what’s been promoted, what’s been demoted. GCP measures what’s been deployed and what’s been logged. Together they make a single coherent reading of where the business actually is — not where I imagine it to be, not where I hope it is, but where it actually stands at 3 AM on a Tuesday.

    That reading is what lets me trust the work. The Promotion Ledger inside Notion tracks every autonomous behavior the system runs — content publishes, schema injections, taxonomy fixes, image optimizations — by tier and by clean-day count. Seven clean days on a tier means a candidate for promotion. A failure resets the clock. The instrument doesn’t lie. It either reads green or it doesn’t.

    The Trefoil

    Carved walnut trefoil with three interlocking loops of copper, porcelain, and steel meeting at a gold TM monogram
    Synthesis. Three loops meeting at the center. The synthesis point is where knowledge becomes a distillery.

    The trefoil is an ancient symbol — three interlocking loops meeting at a single point in the center. Heraldic shields use it. Cathedral architecture uses it. The Celtic version goes back to the Iron Age. It shows up everywhere because it answers a question every human system eventually asks: how do you get three independent things to produce a fourth thing that none of them could produce alone?

    Synthesis is the answer. Where the loops meet, the fourth thing happens. Claude alone is a smart conversation. Notion alone is a well-organized library. GCP alone is a pile of compute. None of those by themselves is a business. But the place where the three loops overlap is where a client brief becomes a draft, then an optimized article, then a scheduled publish, then a tracked outcome. That center point is where the work actually lives.

    I think of Tygart Media as a Human Knowledge Distillery. The raw material is messy human knowledge — a client’s twenty years of trade experience, my own restoration background, a comedian’s stage instincts, a recovery contractor’s job-site stories. The distillery boils that down into something that can travel: an article, a schema block, a social post, a referral asset. The three legs aren’t doing the distilling. The synthesis at the center is.

    The Pocket Watch

    Open antique pocket watch on navy velvet with three mechanical bridges in copper, porcelain, and steel, TM monogram on the dial
    Mastery. Mechanism over magic. The watch doesn’t get better because a new watch came out.

    Independent horology — the world of small, fiercely independent watchmakers who build their movements by hand — is one of my private obsessions, and it has shaped how I think about AI tooling more than I expected. The watchmakers I admire most don’t release a new caliber every year. They spend a decade on one movement. They refine the escapement, balance the wheel, polish the bridges, and over time the watch gets better not because the parts are new but because the maker understands the parts better.

    This is the opposite of how most of the AI industry operates. The cadence is: ship a new model, ship a new agent, ship a new IDE, ship a new laptop. The implicit promise is that the latest thing will do more than the previous thing, and the implicit demand is that you keep up. Mastery is impossible in that mode. By the time you’ve learned the mechanism, the mechanism has been replaced.

    Holding still is a competitive advantage exactly because most people can’t. While everyone else is unboxing their Googlebook in October and figuring out where Gemini’s Magic Pointer fits into their workflow, my workflow won’t have changed — because the workflow doesn’t live on the laptop. It lives in the stack. The laptop is just a window into the stack. A new laptop is a new window. The view is the same.

    The Lighthouse

    Three-section lighthouse model with copper base, porcelain middle, and steel top projecting a warm beam through workshop fog
    Signal. Authority compounds when you stay put and keep the light on.

    Lighthouses don’t move. That’s the whole point of them. A lighthouse that wandered around the coastline trying to find the best vantage would not be useful to anyone — ships wouldn’t know where it was, the beam would never settle, and the entire purpose of having a fixed reference point in a foggy world would collapse.

    Content authority works the same way. The sites that get cited by AI models — that show up in Google’s AI Overviews, in Perplexity’s citations, in Claude’s own retrieval — are not the sites that pivoted the most. They are the sites that have been on the same beam for years, publishing the same kind of work, building the same kind of entity recognition, and giving language models a stable reference point to anchor to.

    This is true at the stack level too. The reason my content operations get more efficient month over month is not because I’m using new tools — it’s because Claude, Notion, and GCP have learned each other inside my workspace. The skill files in Claude know exactly which Notion databases to write to. The Notion routers know exactly which GCP services to dispatch. The GCP services know exactly which WordPress sites to publish to and how each one wants its content shaped. The beam is on. It keeps being on. Authority compounds in the version of you that didn’t move.

    The Hourglass

    Antique hourglass with three pillars of copper rope, porcelain grid, and brushed steel, golden sand falling onto polished gemstones
    Compounding. Time spent doesn’t drain. It crystallizes into something more valuable.

    This is the image that closes the piece, and it’s the one that took me the longest to understand. An hourglass usually represents time running out. Sand falls. The bulb empties. Eventually you’re done. The version I commissioned reframes it: golden sand falls into a bed of polished gemstones. Time doesn’t disappear into nothing. It compounds into something more valuable.

    That is the entire thesis of the broken-in glove. Time spent on the same stack does not drain. It crystallizes. Every additional week with Claude, Notion, and GCP makes the next week more leveraged, because the pattern library is bigger, the muscle memory is deeper, and the surface area I can act on without re-learning is wider. The opposite path — switching stacks, chasing the new thing, restarting the muscle memory — is the path where time actually drains. The bulb empties and there is no gemstone bed underneath.

    So when Googlebook launches in fall 2026 and people ask me whether I’m getting one, the answer is: maybe, eventually, as a window into the stack I already have. But not as a replacement for anything. The stool is the stool. The legs are the legs. And the glove is finally starting to feel like mine.

    Frequently Asked Questions

    What is the three-legged stack at Tygart Media?

    The three-legged stack is the operating system Tygart Media uses to run an AI-native content and SEO agency across 27+ client sites. The three legs are Claude as the intelligence layer, Notion as the control plane, and Google Cloud Platform as the compute plane. The architecture follows an Integration Spine: GitHub stores the source of truth, GitHub Actions plus Workload Identity Federation move work to Cloud Run with no stored credentials, and Cloud Run reports back to Notion.

    Why three tools instead of more?

    Three is the minimum number of points required to define a plane, which makes a three-legged structure inherently stable on any surface. Adding a fourth tool before mastering the first three adds switching cost and surface area without adding capability. Depth in three tools produces more leverage than breadth across six.

    How does the stack handle a 27-site content operation?

    Claude generates and optimizes content via skills that encode the standards for SEO, AEO, and GEO. Notion stores the editorial calendar, client briefs, Promotion Ledger, and the operating manual. GCP runs the Cloud Run publisher services that push optimized articles into WordPress sites via REST API, with all publishing actions logged back to Notion for audit. The stack is designed so that any single article passes through all three legs before going live.

    Is Tygart Media planning to adopt Googlebook when it launches?

    Not as a replacement for any part of the current stack. Googlebook will likely become useful as a thicker client surface over the same backend, but the actual operating system — Claude, Notion, GCP, and the Integration Spine — does not live on the laptop. The laptop is just a window into the stack. Switching laptops doesn’t change the view.

    What does “broken-in advantage” mean in an AI context?

    Broken-in advantage is the compounding effect that comes from sustained mastery of a single toolchain. Skills, automations, and muscle memory build on each other when the underlying tools stay constant. Operators who switch stacks frequently never reach the inflection point where the system becomes leveraged. Operators who hold still long enough to master the same three tools build a moat that’s harder to copy than any individual feature.

    Where does the restoration industry background fit in?

    Years of property damage restoration operations at Munters and Polygon taught the discipline that the AI-native content stack now runs on — batching, scope absorption, subcontracting as collaboration, and tiered trust systems. The thesis is that operations is operations. The substrate (restoration crews then, AI agents now) changes. The operating discipline doesn’t.

    How does the Promotion Ledger fit into the stack?

    The Promotion Ledger is a Notion database under a top-level page called The Bridge. Every autonomous behavior the system runs is tracked there by tier — A for proposed, B for human-flown, C for autonomous — with a clean-day counter and a failure log. Seven clean days on a tier qualifies a behavior for promotion. A failure resets the clock and demotes the behavior one tier. The Ledger is how the stack proves to itself that it can be trusted.
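    The ledger rules described above can be sketched as a small state machine. This is an illustration in Python, not the actual Notion implementation. The class and method names are invented, and two details are assumptions the article does not state: a failure at tier A stays at tier A, and the clean-day counter restarts after a promotion.

```python
from dataclasses import dataclass, field
from typing import List

TIERS = ["A", "B", "C"]      # A = proposed, B = human-flown, C = autonomous
CLEAN_DAYS_TO_QUALIFY = 7


@dataclass
class Behavior:
    name: str
    tier: str = "A"
    clean_days: int = 0
    failure_log: List[str] = field(default_factory=list)

    def record_clean_day(self) -> None:
        self.clean_days += 1

    def record_failure(self, note: str) -> None:
        """A failure resets the clock and demotes the behavior one tier."""
        self.failure_log.append(note)
        self.clean_days = 0
        # Assumption: tier A is the floor, so a failure there does not demote further.
        self.tier = TIERS[max(TIERS.index(self.tier) - 1, 0)]

    def qualifies_for_promotion(self) -> bool:
        """Seven clean days makes a candidate; the top tier has nowhere to go."""
        return self.clean_days >= CLEAN_DAYS_TO_QUALIFY and self.tier != TIERS[-1]

    def promote(self) -> None:
        """Promotion is a separate, explicit step taken on a qualifying behavior."""
        if not self.qualifies_for_promotion():
            raise ValueError(f"{self.name} does not qualify for promotion")
        self.tier = TIERS[TIERS.index(self.tier) + 1]
        self.clean_days = 0  # Assumption: the counter restarts on the new tier.
```

    Keeping `qualifies_for_promotion` separate from `promote` mirrors the wording above: seven clean days qualifies a behavior, and someone still has to decide to promote it.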

  • How to Evaluate Restoration AI Tools Without Getting Fooled: The Buyer Framework for a Difficult Vendor Environment

    This is the fifth and final article in the AI in Restoration Operations cluster under The Restoration Operator’s Playbook. It builds on the four previous articles in this cluster: why most projects fail, what to build first, the source code frame, and the economics of agent-assisted operations.

    The buying environment in 2026 is genuinely difficult

    A restoration owner trying to evaluate AI tools in 2026 is operating in one of the most adversarial buying environments any business owner has faced in a generation. Vendor sales motions have been refined over two years of selling AI capabilities to operators who do not have the technical background to evaluate the claims. Demos have been engineered to showcase the strongest moments of the tool’s capability under controlled conditions. Reference customers have been carefully selected and coached. Pricing structures have been designed to obscure the real long-term cost. Capability descriptions blend the model’s general competence with the vendor’s specific implementation in ways that make it hard to tell what the buyer is actually getting.

    None of this is unusual for an emerging technology category. All of it is expensive for the buyer who does not have a framework for cutting through it.

    This article is the framework. It is not a list of vendors to consider or avoid. Vendors change every quarter and any list would be out of date by the time it is read. The framework is designed to be durable across vendor cycles, so that an owner using it in 2027 or 2028 will still be making good decisions even as the specific products and providers shift.

    The first question: what work, exactly, is the tool doing?

    The most useful first question to ask any AI vendor in restoration is also the question that most often does not get asked clearly. The question is: describe, in operational terms, the specific work this tool will do that a human is currently doing in my company.

    Vendors are usually prepared to answer this question in capability terms — the tool has natural language understanding, the tool integrates with our existing systems, the tool produces outputs in the formats we already use. None of those answers identifies the actual work being done. The follow-up has to be specific. Is the tool reading inbound communications and producing summaries that a senior operator would otherwise produce? Is it generating draft scopes that an estimator would otherwise write? Is it organizing photo files that a technician would otherwise organize? Is it drafting customer communications that a customer service lead would otherwise draft?

    If the vendor cannot answer this question in concrete operational terms, the deployment will fail. The vendor either does not understand the operational reality of the work the tool is supposed to support, or they do understand and are obscuring it because the operational impact is smaller than their marketing suggests. Either way, the answer is to keep evaluating other options.

    If the vendor can answer this question clearly, the next question is: show me an example of the tool doing that work on a file that resembles the kind of file my company actually handles, with operational detail similar to ours, not on a curated demo file. The willingness to do this is itself diagnostic. Vendors who can show this without retreating to the controlled demo are operating from a position of confidence in their tool. Vendors who cannot are signaling that the tool only performs reliably under conditions the buyer will not actually replicate.

    The second question: where is the captured judgment coming from?

    The second high-leverage question is about the source of the operational judgment the tool will be applying. As established in the source code piece, AI tools render the patterns they have been given access to. The buyer needs to know what those patterns are.

    The right question is: where does the operational judgment in this tool’s outputs come from? Is it the model’s general training? Is it your company’s internal patterns from working with other restoration customers? Is it patterns from my own company’s documentation that I would provide as part of the deployment? Is it some combination?

    Vendors offering tools whose operational judgment comes primarily from the model’s general training are offering generic AI with a restoration interface. The outputs will be plausible and superficially competent, but they will not reflect the operational specificity that makes outputs actually useful. These tools fail in the way described in the failure piece: the senior operators see the outputs, recognize them as wrong, and stop trusting the tool.

    Vendors offering tools that draw on patterns from other restoration customers are offering something more specific, but with a complication the buyer needs to understand. Those patterns reflect the operational standards of the other customers, which may or may not match the buyer’s standards. If the buyer’s company has a deliberate operational discipline that differs from the industry average, the tool’s outputs will pull toward the industry average rather than reflecting the buyer’s specific standards. This is sometimes acceptable and sometimes a serious problem, depending on whether the buyer wants their tool to reinforce their differentiation or dilute it.

    Vendors offering tools that explicitly draw on the buyer’s own documentation, standards, and captured judgment are offering the only configuration that produces reliably useful outputs at the operational level. These are also the deployments that require the most upfront work from the buyer, because the captured judgment has to actually exist before the tool can use it. There is no shortcut. If the buyer has not done the documentation work, no vendor can fix that.

    The third question: what does the success metric look like?

    The third question is about how the deployment will be evaluated, which determines whether the company will ever know if the tool is working.

    The right question is: what specific operational metric will tell us whether this tool is creating value, and how will that metric be measured?

    Vendors who answer this question with usage metrics — engagement, login frequency, feature adoption — are offering something that is easy to measure and irrelevant to whether the tool is actually working. Usage metrics measure whether people are interacting with the tool. They do not measure whether the interaction is producing operational value.

    Vendors who answer this question with operational metrics (senior operator hours saved per week, files processed per estimator per week, scope accuracy improvement, documentation quality scores) are offering something that is harder to measure and meaningful. The buyer’s job is to make sure the operational metric is concrete, measurable, and tied to a number that already exists in the business. A claimed metric that requires inventing new measurement infrastructure is a metric that will not actually be tracked, which means the deployment cannot actually be evaluated.

    The answer the buyer is looking for is something like: before the deployment, your senior estimators handle thirty files per week each. After the deployment, with the tool’s review acceleration, the same estimators should handle sixty to seventy files per week with comparable accuracy. We will record baseline files-per-estimator-per-week at deployment and track it weekly through the first six months. This is a defensible commitment. Vendors who will not make this kind of commitment do not believe their own claims.
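
    A commitment like this reduces to arithmetic on a number the business already tracks. The sketch below is one way to check it, assuming weekly files-per-estimator counts are available as plain lists. The function name uplift_ratio and the sample numbers are illustrative, not taken from any vendor's reporting.

```python
from statistics import mean

def uplift_ratio(baseline_weeks, post_weeks):
    """Return post-deployment throughput as a multiple of the baseline average."""
    return mean(post_weeks) / mean(baseline_weeks)

# Hypothetical data: files handled per estimator per week.
baseline = [30, 29, 31, 30]   # four weeks before deployment
post = [58, 62, 66, 64]       # four weeks after deployment

ratio = uplift_ratio(baseline, post)

# The commitment in the text: thirty files per week rising to sixty or seventy.
target_low, target_high = 60 / 30, 70 / 30

print(f"uplift: {ratio:.2f}x (target {target_low:.2f}x to {target_high:.2f}x)")
print("commitment met" if ratio >= target_low else "commitment missed")
```

    The point of the sketch is that nothing in it requires new measurement infrastructure. If the weekly counts do not already exist somewhere, that gap is the first thing to fix.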

    The fourth question: what happens when the tool is wrong?

    The fourth question is about the tool’s behavior under failure. AI tools are wrong sometimes. The question is what happens when they are.

    The right question is: walk me through what happens when this tool produces an incorrect output. How does the user discover the error? How does the system learn from the error? How does the company avoid acting on the error?

    Vendors who have not designed for failure will answer this question vaguely: the tool is very accurate, the model is constantly improving, the outputs are reviewed by users before being used. None of these answers describes a failure-handling architecture. They describe a hope that failures will be rare.

    Vendors who have designed for failure will describe a specific architecture. The tool flags its own confidence level on outputs. The user has a defined workflow for marking an output as incorrect. The marked errors flow into a feedback queue that is reviewed and acted on. The tool’s behavior changes in response to the corrections. The architecture is concrete enough that the buyer can imagine the workflow operating in their company.
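
    To make that architecture easier to picture, here is a minimal sketch of its first three pieces: a confidence value on every output, a way for the user to mark an output as incorrect, and a queue that holds those marks for review. Every name here (ToolOutput, FeedbackQueue, REVIEW_THRESHOLD) and the 0.8 cutoff are assumptions for illustration, not any vendor's design.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class ToolOutput:
    output_id: str
    text: str
    confidence: float   # tool's self-reported confidence, 0.0 to 1.0

@dataclass
class FeedbackQueue:
    """Holds user-marked errors until someone reviews and acts on them."""
    pending: deque = field(default_factory=deque)

    def mark_incorrect(self, output: ToolOutput, reason: str) -> None:
        self.pending.append({"output_id": output.output_id, "reason": reason})

    def next_for_review(self):
        return self.pending.popleft() if self.pending else None

REVIEW_THRESHOLD = 0.8   # hypothetical cutoff; below this, require human review

def needs_review(output: ToolOutput) -> bool:
    return output.confidence < REVIEW_THRESHOLD

queue = FeedbackQueue()
estimate = ToolOutput("est-001", "Replace 40 sq ft of drywall", confidence=0.62)

if needs_review(estimate):
    print(f"{estimate.output_id}: low confidence, route to senior estimator")

# The user marks the output as wrong; the correction waits in the queue.
queue.mark_incorrect(estimate, "scope missed the adjacent wall cavity")
print(queue.next_for_review())
```

    The fourth piece, the tool changing its behavior in response to corrections, is the part a sketch cannot show and the part worth pressing the vendor on.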

    This question is one of the highest-signal questions in any AI vendor evaluation. Vendors who have built serious tools have thought hard about failure handling, because the failure handling is what determines whether the tool maintains credibility with users over time. Vendors who have not thought about failure handling are offering tools that will lose user trust within the first three months of deployment.

    The fifth question: what are the long-term costs?

    The fifth question is about the real economics of the deployment, which is rarely what the initial pricing conversation suggests.

    The right question is: walk me through the total cost of running this tool in my company at full deployment scale, twenty-four months from now, including model usage, runtime, integration maintenance, internal personnel time for review and configuration, and any growth in vendor pricing.

    Vendors who price AI tools as fixed monthly subscriptions are absorbing the variable cost of model usage and runtime into their margin. This works for them as long as average usage stays below their pricing assumption. As the buyer’s deployment matures and usage grows, the vendor either absorbs the loss, raises prices significantly, or imposes usage caps that constrain the buyer’s ability to scale the capability. The buyer needs to understand which of these will happen and plan for it.

    Vendors who price AI tools as usage-based often present a low headline cost based on initial usage assumptions. As the deployment matures and usage grows, the cost grows proportionally. The headline number is misleading. The buyer needs to model usage at full deployment scale, not initial scale.

    Vendors who are honest about the cost structure will walk through both the model and runtime costs and the personnel cost of maintaining the deployment internally. The personnel cost is the largest component for any meaningful AI deployment, as discussed in the economics piece, and it is the cost most often left out of vendor pricing discussions because it does not flow through the vendor’s invoice. The buyer who does not account for it has not understood the real cost.
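
    The twenty-four-month question can be modeled in a few lines. The sketch below assumes three cost streams (vendor fees, usage that compounds monthly, and internal personnel time) and uses made-up inputs throughout. The function total_cost and every number passed to it are hypothetical, so substitute your own.

```python
def total_cost(months, seats, price_per_seat, usage_cost_month0,
               usage_growth, personnel_hours_per_month, hourly_rate):
    """Sum vendor fees, usage charges, and internal personnel time over `months`."""
    vendor = months * seats * price_per_seat
    # Usage cost compounds monthly as the deployment matures.
    usage = sum(usage_cost_month0 * (1 + usage_growth) ** m for m in range(months))
    personnel = months * personnel_hours_per_month * hourly_rate
    return {
        "vendor": vendor,
        "usage": round(usage, 2),
        "personnel": personnel,
        "total": round(vendor + usage + personnel, 2),
    }

# Hypothetical inputs for a 24-month horizon.
costs = total_cost(months=24, seats=10, price_per_seat=100,
                   usage_cost_month0=500, usage_growth=0.05,
                   personnel_hours_per_month=40, hourly_rate=75)

for name, amount in costs.items():
    print(f"{name:>10}: ${amount:,.2f}")
```

    With these particular inputs the personnel line is the largest of the three, which is the pattern described above. Change the inputs and the ranking can change, which is exactly why the model should be run at full deployment scale rather than initial scale.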

    The sixth question: what is the exit?

    The sixth question is about what happens if the relationship does not work out.

    The right question is: if I decide in eighteen months that I want to stop using this tool, what do I take with me, what do I leave behind, and how disruptive is the transition?

    Vendors who have built tools designed for buyer power will describe an exit that allows the buyer to keep their captured operational standards, their training data, and their workflow configurations in transferable form. The buyer can move to a different runtime if they need to.

    Vendors who have built tools designed for vendor power will describe an exit that leaves the buyer with very little. The captured operational substrate is locked into the vendor’s proprietary format. The configuration work cannot be replicated elsewhere. The buyer has to start over if they leave.

    The question is diagnostic regardless of whether the buyer ever actually exits. A vendor who has designed a tool the buyer can leave is a vendor who is confident enough in the tool’s value to compete on quality rather than lock-in. A vendor who has designed lock-in into the architecture is a vendor who is preparing to extract more value from the relationship than they would otherwise be able to. The buyer should know which kind of vendor they are dealing with before signing.

    What the framework excludes

    This framework intentionally does not include several questions that are commonly asked in AI vendor evaluations and that are usually less informative than they seem.

    It does not include questions about the underlying model. Which AI model the vendor is using matters less than how they are deploying it. The same model can be configured to produce excellent outputs or terrible outputs depending on the deployment architecture. Asking which model is the foundation tells the buyer almost nothing about what they are buying.

    It does not include questions about technical certifications, security badges, or compliance frameworks. These matter for procurement, but they do not predict whether the tool will produce operational value. Many tools with extensive security documentation are operationally useless. Many tools that produce real operational value have less impressive security documentation. The two dimensions need to be evaluated independently.

    It does not include questions about the vendor’s funding, growth rate, or customer count. These matter for vendor risk assessment but do not predict tool quality. Some of the best operational AI tools in restoration come from small focused vendors. Some of the worst come from well-funded category leaders. The buyer should care about whether the tool works, not whether the vendor will exist in five years — the latter being a question that is difficult to answer reliably regardless of how it is researched.

    The cluster ends here, and what to do with it

    The five articles in this cluster describe a complete mental model for thinking about AI in restoration operations in 2026. The model has five components. Most projects fail for predictable reasons. The right place to start is the operational middle layer, with documentation acceleration. The senior operator is the source code, and protecting that operator is the central strategic question. The economics of agent-assisted operations are the underdiscussed factor that will determine who is profitable in 2028. The buyer’s framework above is the practical instrument for cutting through vendor noise.

    Owners who internalize this model will make consistently better decisions about AI than owners who chase vendor cycles, follow industry trends, or try to evaluate each tool on its own marketing. The model is the asset. The specific tools the model leads to are interchangeable.

    The cluster on AI in Restoration Operations is closed. The next clusters in The Restoration Operator’s Playbook will go deep on senior talent, on financial operations discipline, on carrier and TPA strategy, on crew and subcontractor systems, and on end-in-mind decision frameworks. Each cluster compounds with the others. The full body of work, when it is complete, will give restoration operators a durable mental architecture for navigating an industry that is changing faster than at any time in its history.

    Operators who read it and act on it will know what to do. Operators who do not will find out later what their competitors knew earlier.

  • Four-Layer Data Architecture: Building Around Behaviors, Not Tools


    Tygart Media Strategy
    Volume Ⅰ · Issue 04 · Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    The instinct, when building a complex operation, is to find one tool that can hold everything. One source of truth. One dashboard. One system of record for all data types.

    This instinct is wrong, and it produces exactly the kind of system it’s trying to avoid: a single tool that does everything poorly, a migration project that costs more than the original implementation, and a team that has learned to distrust the data because the tool was never designed for the behaviors it was forced to support.

    The behavior-first alternative for data architecture doesn’t start with “what tool can hold everything.” It starts with: what are the distinct behaviors this data needs to support, and which tool is genuinely best suited for each one?

    The Four Data Behaviors

    In a multi-site AI-native content operation, four distinct data behaviors emerge:

    Machine-generated operational data needs to be written and read by automated systems at high speed. Batch job results, embedding vectors, image processing logs, Cloud Run execution histories. No human looks at this data directly. It needs to be fast, cheap, and structured for programmatic access. GCP serves this behavior — Firestore for structured operational state, Cloud Storage for large artifacts, BigQuery for analytical queries across the full dataset.

    Human-actionable signals need to be displayed clearly enough that a person can take action without wading through noise. Site health alerts, content gaps, client status changes, task assignments. This data needs to be readable, filterable, and connected to the people who need to act on it. Notion serves this behavior — not because it’s the most powerful database, but because it’s the most human-readable one, with views that can surface exactly the signal each role needs.

    Published content needs to be delivered to web visitors and search engines at performance standards those audiences require. WordPress serves this behavior. It was designed for it. The mistake is asking WordPress to also serve as the storage layer for unpublished content, the analytics layer for content performance, or the task management layer for content production. It wasn’t designed for those behaviors and it’s not good at them.

    Files and documents need to be stored, versioned, and shared across tools and collaborators. Google Drive serves this behavior. Skills, SOPs, brand guidelines, exported data — anything that exists as a file rather than as structured data belongs in Drive, not in a database trying to handle file attachments as a secondary feature.
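
    The four behaviors above amount to a routing table. A minimal sketch, assuming each record arrives already tagged with the behavior it serves. The behavior keys and the route function are hypothetical names, and nothing here calls a real GCP, Notion, WordPress, or Drive API.

```python
from enum import Enum

class Layer(Enum):
    GCP = "GCP (Firestore / Cloud Storage / BigQuery)"
    NOTION = "Notion"
    WORDPRESS = "WordPress"
    DRIVE = "Google Drive"

# Behavior -> layer, following the four behaviors described above.
BEHAVIOR_TO_LAYER = {
    "machine_operational": Layer.GCP,
    "human_signal": Layer.NOTION,
    "published_content": Layer.WORDPRESS,
    "file_or_document": Layer.DRIVE,
}

def route(record: dict) -> Layer:
    """Pick a storage layer from the record's declared behavior."""
    behavior = record["behavior"]
    if behavior not in BEHAVIOR_TO_LAYER:
        # Refuse to guess: an unclassified record is a design question, not a default.
        raise ValueError(f"unclassified behavior: {behavior!r}")
    return BEHAVIOR_TO_LAYER[behavior]

print(route({"behavior": "human_signal", "payload": "site health alert"}).value)
print(route({"behavior": "machine_operational", "payload": "batch job result"}).value)
```

    Raising on an unknown behavior is a deliberate choice in this sketch. A silent default is how one tool ends up holding everything again.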

    Why Separation Produces Better Systems

    A four-layer architecture feels like more complexity than a single-tool approach. In practice it produces less complexity, because each tool is operating within its design constraints instead of being stretched beyond them.

    The signal-to-noise problem in most dashboards comes from forcing machine-generated data and human-actionable signals into the same view. The machine data overwhelms the human signals. The solution is usually “better filtering” — which is the wrong answer. The right answer is storing machine data where machines can read it and surfacing human signals where humans can act on them.

    The performance problem in most content operations comes from asking WordPress to be a content management system when it’s a content delivery system. The content that belongs in a CMS — drafts, revisions, briefs, research notes — should be in Notion. The content that belongs in a CDS — published articles, page templates, media files — should be in WordPress. When you separate these, both tools perform their actual function better.

    The data loss problem in most operations comes from treating the most convenient tool as the system of record. When content lives only in WordPress, a site failure is a data failure. When operational state lives only in a Cloud Run service, a deployment change is a state failure. The four-layer architecture ensures that each data type has a permanent home in the tool designed to hold it — and that the tools interact through APIs rather than through manual migration.


  • Build the System Around the Behavior, Not the Tool


    Tygart Media Strategy
    Volume Ⅰ · Issue 04 · Quarterly Position
    By Will Tygart
    Long-form Position
    Practitioner-grade

    There is a mistake that kills more technology projects than bad code, bad vendors, or bad timing combined. It happens before a single line is written, before a single subscription is purchased, before anyone even knows there’s a problem.

    The mistake is this: choosing the tool before understanding the behavior.

    It looks like a reasonable decision. You need to manage customer relationships, so you buy a CRM. You need to publish content, so you build around WordPress. You need to organize knowledge, so you set up Notion. The tool selection feels like the hard part — the research, the demos, the pricing comparisons. By the time you’ve chosen, you feel like the work is half done.

    It isn’t. You’ve just committed to building a system shaped like a tool instead of shaped like a behavior. And when the behavior and the tool don’t match, the system fails quietly — not in a crash, but in a slow drift toward abandonment, workarounds, and the quiet understanding that “we don’t really use that anymore.”

    The alternative is building the system around the behavior first. It sounds obvious. Almost nobody does it.


    What “Behavior-First” Actually Means

    A behavior is what actually happens — or needs to happen — in your operation. It’s not a goal, not a feature request, not a capability. It’s the specific sequence of actions, decisions, and handoffs that produce a result.

    Most system design starts with tools and works backward to behaviors. Behavior-first design starts with the behavior and works forward to the minimum set of tools that can serve it.

    The difference sounds subtle. The outcomes are not.

    When you start with the tool, you spend the first six months learning the tool’s shape and then trying to reshape your operation to fit it. When you start with the behavior, you spend the first six months building a system that serves the operation — and then choosing the simplest tool that delivers what the behavior requires.

    The tool-first approach produces complexity. The behavior-first approach produces leverage.


    Six Behaviors That Built This Operation

    The following examples are drawn from a single AI-native operation built over three years. None of them started with a tool selection. All of them started with the question: what actually needs to happen here?

    1. Write → Store → Distribute (The Content Pipeline)

    Most content operations are built around WordPress. The platform is the system. Articles go into WordPress, WordPress manages drafts, WordPress publishes, WordPress is the source of truth. This is tool-first design.

    The behavior is different. The behavior is: write a piece of content, preserve it permanently, distribute it to wherever it needs to go.

    When you build around that behavior, WordPress becomes one destination among several — not the system. Notion becomes the storage layer. WordPress becomes the distribution layer. The article exists independently of where it’s published. If WordPress goes down, if the WAF blocks you, if the site moves hosts — the content is not at risk. The behavior (write → store → distribute) is served by a stack of tools, none of which is the irreplaceable center.

    The practical result: every article written in this operation goes to Notion first, WordPress second. Not because Notion is a better publishing platform — it isn’t. Because the behavior requires permanent, accessible storage before distribution, and WordPress was never designed to be that.
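
    The ordering is the whole design, so it is worth seeing in code. The sketch below assumes an in-memory dict as a stand-in for the storage layer and plain callables as stand-ins for destinations. ContentPipeline, the simulated WordPress failure, and the "newsletter" destination are hypothetical, and no real Notion or WordPress API is called.

```python
from dataclasses import dataclass, field

@dataclass
class Article:
    slug: str
    body: str
    published_to: list = field(default_factory=list)

class ContentPipeline:
    def __init__(self, store: dict, destinations: dict):
        self.store = store                 # stand-in for the permanent storage layer
        self.destinations = destinations   # name -> callable that publishes an article

    def publish(self, article: Article) -> Article:
        # Store first, so a failed distribution never puts the content at risk.
        self.store[article.slug] = article
        for name, send in self.destinations.items():
            try:
                send(article)
                article.published_to.append(name)
            except ConnectionError as exc:
                print(f"{name} unavailable ({exc}); content remains in storage")
        return article

def wordpress_down(article):
    # Simulated outage for the example.
    raise ConnectionError("WAF blocked the request")

store = {}
pipeline = ContentPipeline(store, {"wordpress": wordpress_down,
                                   "newsletter": lambda a: None})
result = pipeline.publish(Article("behavior-first", "Build around the behavior."))
print(result.published_to)
```

    Even with the WordPress destination failing, the article is in storage and the other destination still receives it. No single destination is the irreplaceable center.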

    2. Identify → Deposit → Execute (The Work Order Architecture)

    The problem: an AI system can identify what’s wrong with a WordPress site in seconds — thin content, missing schema, broken taxonomy, orphan pages — but the identification and the fix are handled by completely different systems. The identification lives in a conversation. The fix lives in a deployment. There’s no bridge.

    The behavior is: Claude identifies a problem, deposits a structured work order, a Cloud Run worker executes it. The intelligence and the execution are decoupled. Neither layer needs to know how the other works.

    Built around that behavior, the tool choices become obvious. Notion holds the work order queue — not because Notion is a task management tool (though it is), but because Claude can write to it via API and a Cloud Run service can read from it. The tools serve the behavior. The behavior doesn’t contort to serve the tools.
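
    The decoupling described above depends on one shared thing: the shape of the work order. A minimal sketch, assuming a Python list as a stand-in for the queue and JSON as the interchange format. The WorkOrder fields, the problem names, and the handlers are all hypothetical, and the real operation's schema may look nothing like this.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class WorkOrder:
    order_id: str
    site: str
    problem: str     # e.g. "missing_schema", "thin_content"
    target: str      # post or page the fix applies to
    status: str = "queued"

# Hypothetical handlers; a real worker would call the site's API here.
HANDLERS = {
    "missing_schema": lambda wo: f"added schema to {wo.target}",
    "thin_content": lambda wo: f"flagged {wo.target} for expansion",
}

def deposit(queue: list, order: WorkOrder) -> None:
    """Identification side: serialize the order so any worker can read it."""
    queue.append(json.dumps(asdict(order)))

def execute_next(queue: list) -> dict:
    """Execution side: knows the schema, not how the problem was found."""
    order = WorkOrder(**json.loads(queue.pop(0)))
    handler = HANDLERS.get(order.problem)
    order.status = "done" if handler else "unsupported"
    return {"order_id": order.order_id, "status": order.status,
            "result": handler(order) if handler else None}

queue = []
deposit(queue, WorkOrder("wo-1", "example.com", "missing_schema", "/services/"))
print(execute_next(queue))
```

    Neither function imports anything from the other side. Swap the list for a Notion database or any other queue and both halves keep working, as long as the schema holds.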

    3. Extract → Distill → Deploy (The Human Distillery)

    The behavior here is one of the rarest in any knowledge-intensive industry: taking tacit knowledge — the unwritten, unspoken operational intelligence that lives in people’s heads — and converting it into structured artifacts that AI systems can immediately use.

    Tacit knowledge doesn’t fit into forms, surveys, or databases. It surfaces through conversation. The extraction behavior is a specific sequence: disarm the subject, descend through four layers of questioning (documented protocol → exception cases → sensory knowledge → counterfactual pressure), capture what surfaces, and distill it into a dense artifact.

    That behavior existed long before any tool was selected to support it. The tool choices — which models to run distillation through, how to structure the output schema, where to store the resulting knowledge concentrates — all came after the behavior was understood. The behavior is irreplaceable. The tools are interchangeable.

    4. Observe → Route → Produce (Task Routing for Variable Attention)

    Most productivity systems are built around the assumption that the operator applies consistent, scheduled attention to work. Tasks sit in queues. Work happens in order. Focus is managed through priority.

    That assumption doesn’t match how an ADHD-wired operator actually works. The actual behavior is: attention arrives unbidden, attaches to whatever has activated the interest system, runs at extraordinary intensity, and then ends, also unbidden. The work happens in spirals, not lines.

    An AI-native operation designed around this actual behavior routes tasks differently. High-interest, high-judgment work goes to the operator when the operator’s attention is activated. Low-interest, deterministic work gets routed to automated pipelines that run on schedule regardless of operator state. The behavior — variable, interest-driven, high-intensity — shapes the system. The system doesn’t demand behavior the operator can’t deliver.
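
    As a rough sketch, the routing rule fits in one function. The Task fields, the three route names, and the decision to hold judgment work until the operator is engaged are assumptions made for illustration. The text specifies only the two ends of the spectrum, so the handling of everything in between is a guess.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    interest: str         # "high" or "low"
    deterministic: bool   # True if the steps can run without judgment

def route_task(task: Task, operator_engaged: bool) -> str:
    if task.deterministic and task.interest == "low":
        return "scheduled_pipeline"    # runs regardless of operator state
    if operator_engaged:
        return "operator_now"          # attention is active: hand over judgment work
    return "hold_for_operator"         # wait rather than demand scheduled attention

tasks = [
    Task("regenerate image alt text", interest="low", deterministic=True),
    Task("reposition a client's content strategy", interest="high", deterministic=False),
]

for engaged in (True, False):
    for task in tasks:
        print(f"engaged={engaged}: {task.name} -> {route_task(task, engaged)}")
```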

    The result is not a workaround. It’s an architecture. And the architecture works better for a neurotypical operator too — because the constraints that neurodivergence makes extreme are present in milder form in everyone.

    5. Touch → Remind → Refer (The CRM Community Framework)

    Restoration companies spend $150–$500 per lead acquiring customers and then never contact them again. Not because they don’t want to. Because the tool they have (a job management system built around transactions) doesn’t support the behavior they need.

    The behavior is: make consistent, relevant, human contact with warm relationships at regular intervals, using legitimate business moments as the reason. That’s it. The behavior is simple. The tool selection is almost irrelevant — a spreadsheet and a Mailchimp free account can execute it. What matters is that the system is built around the behavior (stay present in warm relationships) rather than around the tool (send marketing emails).

    When you build around the tool, you get a marketing email campaign. When you build around the behavior, you get a community — a network of people who feel a genuine two-way relationship with your company and who refer you business because you’re the company that actually stayed in touch.

    The technical implementation of this — segmentation from ServiceTitan and Jobber, email automation in Mailchimp or Brevo, relationship intelligence in a Notion Second Brain — is documented in full in the CRM Community Framework series. Every tool choice in that series is downstream of the behavior. None of it works if you start with the tool.

    6. Signal → Display → Act (The Four-Layer Data Architecture)

    A complex multi-site operation generates data from dozens of sources simultaneously — WordPress post metrics, GCP Cloud Run logs, Notion task statuses, client pipeline movements, content performance signals. The instinct is to find one tool that can hold all of it. The tool becomes the system.

    The behavior is different for each data type. Machine-generated operational data (image processing logs, batch job results, embedding vectors) needs to be written and read by automated systems at high speed. Human-actionable signals (site health alerts, content gaps, client status changes) need to be displayed in a way a person can act on without noise. Content in progress needs to be stored independently of where it will ultimately be published.

    Four behaviors. Four tool layers. WordPress for published content, GCP for machine data, Notion for human signals, Google Drive for files. No single tool tries to do all four. Each tool is chosen because it’s the best fit for one specific behavior — not because it can technically handle the others.


    How to Apply This in Your Operation

    The behavior-first design process has three steps, and none of them involve opening a browser tab to research tools.

    Step 1: Write down what actually needs to happen. Not what you want to accomplish. Not what you wish the system could do. The specific sequence of actions that produces the result you need. Subject → verb → object, repeated until the behavior is fully described. “Someone writes an article. The article needs to be findable in six months. The article needs to be published to a website.” That’s a behavior. “We need better content management” is not.

    Step 2: Identify where the behavior breaks down today. Every system has the places where it works and the places where it silently fails. A CRM that nobody updates after the job closes. An email platform that has contacts from three years ago and no segmentation. A content process that lives in someone’s head. These are the behavior gaps — the places where the actual behavior doesn’t match the intended behavior.

    Step 3: Choose the simplest tool that serves the behavior. Not the most powerful. Not the most popular. Not the one with the best demo. The one that makes the behavior easiest to execute consistently. A $13/month Mailchimp account and a Google Sheet will outperform a $400/month marketing platform if the behavior is four emails per year to a warm local database — because the complexity of the expensive tool introduces friction that kills the behavior entirely.


    The AI-Native Operation Is Behavior-First by Definition

    The reason AI-native operations tend to outperform tool-native operations has nothing to do with AI being smarter. It has to do with design philosophy.

    AI tools, at their best, are infinitely flexible. They don’t impose a shape on your operation. They serve whatever behavior you describe. The operator who builds an AI-native operation is forced — by the nature of the tools — to understand their own behaviors first. You cannot prompt your way to a useful output without knowing what useful looks like. You cannot build a pipeline without understanding the sequence it’s meant to automate.

    This is why the AI-native operator has a structural advantage over the SaaS-native operator. Not because their tools are better. Because the process of building with AI forces behavior-first thinking, and behavior-first thinking produces systems that compound over time instead of decaying into expensive shelf-ware.

    The tool will change. The behavior won’t. Build the system around the behavior.


    Frequently Asked Questions

    How do you identify the behavior if you’ve always built around tools?

    Start with the breakdowns. Wherever your current system has workarounds, manual steps, or things people do “outside the system,” those are the places where the tool’s shape and the behavior don’t match. The workarounds are the behavior. Build the new system to serve them directly.

    Doesn’t this make tool selection harder and slower?

    It makes it faster. When you know the behavior precisely, you have a clear evaluation criterion: does this tool make the behavior easier to execute consistently, or does it add complexity? Most tool evaluations fail because the criteria are vague. Behavior-first evaluation is fast because the test is concrete.

    What if the behavior changes over time?

    Behaviors evolve. Systems built around behaviors can evolve with them — you swap the tool layer without disrupting the behavior layer. Systems built around tools can’t evolve without a full rebuild, because the tool is the system. Behavior-first architecture is inherently more resilient to change.

    Is this just another way of saying “process before technology”?

    It’s related but more specific. “Process before technology” is usually interpreted as documentation before implementation — write the SOPs, then build the tools to support them. Behavior-first design is about understanding the actual behavior of the operation, which often differs significantly from the documented process. You’re designing around what people and systems actually do, not what they’re supposed to do.

    How does this apply to AI tool selection specifically?

    AI tools are especially susceptible to tool-first thinking because they’re impressive in demos. The demo shows capability; the behavior question asks whether that capability serves a specific sequence in your operation. Most AI tool adoptions fail not because the tools are bad but because they were selected based on capabilities rather than behaviors. The question is never “what can this tool do?” It’s “which of my behaviors does this tool serve, and does it serve them better than what I have now?”


  • MCP Servers Explained: Model Context Protocol Tutorial


    Last refreshed: May 15, 2026

    Claude AI · Fitted Claude

    Model Context Protocol (MCP) is the most important infrastructure development in Claude’s ecosystem in 2026. It’s an open standard for connecting AI models to external tools, data sources, and services — replacing fragmented one-off integrations with a universal interface. This guide explains what MCP is and how to set up your first server.

    What Is MCP?

    MCP defines a universal interface: any tool that implements the MCP server specification can connect to any AI application implementing the MCP client specification. Build once, connect anywhere. Before MCP, connecting Claude to external systems required custom code for every integration, and none of it worked across different AI tools.

    MCP Architecture

    • MCP Host: The AI application (Claude desktop, Claude Code, your custom app)
    • MCP Client: Built into the host; manages connections to servers
    • MCP Server: Lightweight program exposing tools, resources, or prompts

    Setting Up MCP in Claude Desktop

    Go to Settings → Developer → Edit Config. Add your server configuration:

    {
      "mcpServers": {
        "filesystem": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/directory"]
        }
      }
    }

    Restart Claude Desktop. Claude can now read, write, and manage files in your specified directory.
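
    If you manage more than one machine, editing the file by hand gets old quickly. Below is a minimal sketch of adding a server entry programmatically, using only Python's standard library. The helper add_mcp_server is a hypothetical name, and the example works on an in-memory dict rather than touching your real config file. Reading and writing the actual file is left out on purpose.

```python
import json

def add_mcp_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of `config` with one more entry under "mcpServers"."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    return {**config, "mcpServers": servers}

# Stand-in for the parsed contents of the config file.
existing = {"mcpServers": {}}

updated = add_mcp_server(
    existing,
    "filesystem",
    "npx",
    ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/directory"],
)

# Produces the same structure as the hand-written config shown above.
print(json.dumps(updated, indent=2))
```

    The function returns a new dict instead of mutating the input, so a bad merge never corrupts the configuration you started with.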

    Popular MCP Servers

    Server         What It Does
    Filesystem     Read/write local files
    GitHub         Manage repos, issues, PRs
    PostgreSQL     Query databases
    Slack          Read/send messages
    Brave Search   Real-time web search
    Zapier         Connect to 8,000+ apps

    Frequently Asked Questions

    Is MCP open source?

    Yes. Anthropic open-sourced the MCP specification and official server implementations.

    Do I need to code to use MCP?

    To install existing servers: basic command-line comfort is enough. To build custom servers: TypeScript or Python knowledge required.



  • Will’s Second Brain as an API: Should You Productize Your Context Stack?


    Tygart Media / Content Strategy
    The Practitioner Journal · Field Notes
    By Will Tygart
    · Practitioner-grade
    · From the workbench

    Origin note: This started as a half-formed thought — “what if my second brain is what makes my Claude work so well, and what if I could let other people rent it?” The article below is the honest answer to that question, including the parts that argue against doing it.

    The Observation That Started It

    If you spend enough time building an operational stack on top of Claude — skills, Notion databases, retrieval pipelines, project knowledge, accumulated SOPs — you start to notice something strange. Your Claude does not just answer better than a fresh Claude. It moves better. It picks the right tool the first time. It remembers patterns from work you did six months ago on a different client. It improvises in ways that look almost like learning, even though the underlying model has not changed at all.

    The model is the same. The context is doing the work.

    That observation leads to an obvious question: if a curated context layer is what separates a useful AI from a frustrating one, could you sell access to your context layer? Not the model, not the prompts, not the chat interface — just the accumulated patterns, conventions, and operational wisdom, exposed as an API that any other AI workflow could pull from. Call it “Will’s Second Brain” or anything else. The pitch is: connect this to whatever you are building, and somehow it just works better. You will not always know why. That is part of the value.

    This article walks through whether that is actually a good idea, what it would cost, what the conversion math looks like, what the legal exposure is, and where the real moat would have to come from.

    The Category Already Exists (And That Is Mostly Good News)

    The “memory layer for AI agents” category is real and growing fast. Mem0, which is probably the most visible player, raised a $24M Series A in October 2025 and reports more than 47,000 GitHub stars on its open-source SDK. Their pitch is essentially the one above: instead of stuffing the entire conversation history into every LLM call, route through a memory layer that retrieves only the relevant context. They claim around 90% lower token usage and 91% faster responses compared to full-context approaches. Their pricing tiers run from a free hobby plan (10K memories, 1K retrieval calls per month) to $19/month Starter to $249/month Pro to custom enterprise pricing.
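
    The core idea, retrieving only the relevant context instead of sending everything, can be shown with a toy. The sketch below scores stored memories by simple word overlap with the query. That is a deliberate simplification and not how Mem0 or any product named here works, since real memory layers typically rely on embeddings or knowledge graphs. The function names and sample memories are made up.

```python
def tokenize(text: str) -> set:
    return set(text.lower().split())

def retrieve(memories: list, query: str, k: int = 2) -> list:
    """Return up to k memories sharing the most words with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(m)), m) for m in memories]
    scored = [pair for pair in scored if pair[0] > 0]      # drop irrelevant memories
    scored.sort(key=lambda pair: pair[0], reverse=True)    # stable: ties keep input order
    return [m for _, m in scored[:k]]

memories = [
    "client prefers hub-and-spoke linking for service pages",
    "weekly report goes out on Friday",
    "schema markup is required on every service page",
    "invoice terms are net 30",
]

context = retrieve(memories, "add linking and schema to service pages")

full_size = len(" ".join(memories))
sent_size = len(" ".join(context))
print(context)
print(f"characters sent: {sent_size} of {full_size}")
```

    Only the two memories that touch the query reach the model. The savings any real product claims come from doing this same selection well at scale.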

    Letta, formerly MemGPT, takes a different approach — it is a full agent runtime built around tiered memory (core, recall, archival) that mirrors how operating systems manage RAM and disk. Zep and its Graphiti engine focus on temporal knowledge graphs. SuperMemory bundles memory and RAG with a generous free tier. Hindsight publishes benchmark results claiming 91.4% on LongMemEval versus Mem0’s 49.0%, and offers all four retrieval strategies on its free tier. LangMem ships with LangGraph for teams already on that stack. AWS has Bedrock AgentCore Memory as the managed equivalent.

    The good news in all of that: the category is validated. Buyers exist. Pricing precedents exist. The bad news: you are not going to win on infrastructure. You are not going to out-engineer a YC-backed team with $24M in funding and 47K stars. If you enter this space, you have to enter on a different axis entirely.

    Where The Real Moat Would Be

    The moat is not the storage. The moat is what is in the storage.

    Mem0, Letta, and the rest sell empty memory layers. You bring the data. The promise is: if you put your facts in here, retrieval will be fast and cheap. That is a real value proposition, but it is a tooling pitch, not a knowledge pitch. The customer still has to build the knowledge themselves.

    A second-brain-as-a-service offering would sell a pre-loaded memory layer. Not “here is a fast retrieval system,” but “here is a retrieval system that already knows how an AI-native content agency thinks about WordPress, SEO, GEO, AEO, taxonomy architecture, content refresh strategy, hub-and-spoke linking, Notion command center design, GCP publishing pipelines, and the operational lessons from running 27 client sites.” That is not a tooling product. That is consulting wisdom packaged as middleware.

    The closest analogies are not Mem0 or Letta. They are things like:

    • Cursor’s index of best practices baked into its autocomplete — the tool ships with an opinion about what good code looks like, and that opinion is the product.
    • Linear’s opinionated workflows — the value is not the database, it is the prescribed way of working that the database enforces.
    • 37signals’ Shape Up methodology being sold as a book — accumulated operational wisdom packaged as a product separate from the consulting practice.

    The “second brain as an API” pitch is closer to Shape Up than to Mem0. The technical layer is just the delivery mechanism.

    The Economics: Cheaper Than You Think, Harder Than You Think

    Per-query costs for serving a RAG API are genuinely low. A typical retrieval call against a vector store costs anywhere from a fraction of a cent to a few cents, depending on the embedding model, the vector store, and how many chunks you return. If you self-host on GCP using Cloud Run, BigQuery, and Vertex AI embeddings, marginal serving cost per query is negligible at small scale and only becomes meaningful at thousands of queries per minute.

    The cost problems are not the queries. They are:

    • Free trial abuse. Developer-facing API products with free trials get hammered. Bots, scrapers, people running benchmarks against you for blog posts, competitors testing your retrieval quality. If you offer any free tier without a credit card on file, expect a meaningful percentage of total traffic to be abuse. Hard rate limits and required payment methods from day one are not optional.
    • Support load. Even a “just connect this and it works” product generates support tickets. Integration questions, schema confusion, “why did it return X when I asked Y,” “how do I cite this in my own product.” For a single operator, support load is the actual scaling constraint, not infrastructure.
    • Conversion math. Free-trial-to-paid conversion for self-serve developer tools typically runs in the 2% to 5% range, with some outliers higher and many lower. A trial that converts at 2% needs roughly 50 trial signups per paying customer. If your trial is generous and your conversion is on the low end, you can spend more on serving free users than you earn from paid ones, especially in early months when paying user count is small.
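
    The conversion arithmetic above is easy to check in a few lines. This is a back-of-envelope sketch, not a financial model: the $19 price is an assumption borrowed from the Starter tier cited earlier, each month's cohort is treated in isolation, and the serving cost of paying users is ignored.

```python
def signups_per_paying_customer(conversion_rate: float) -> float:
    """Trial signups needed, on average, to land one paying customer."""
    return 1.0 / conversion_rate

def break_even_free_user_cost(conversion_rate: float, monthly_price: float) -> float:
    """Monthly serving cost per non-converting user at which free usage
    consumes all paid revenue (paid users' own serving cost is ignored)."""
    return conversion_rate * monthly_price / (1.0 - conversion_rate)

MONTHLY_PRICE = 19.0  # assumption: borrowed from the Starter price cited above

for rate in (0.02, 0.05):
    needed = signups_per_paying_customer(rate)
    ceiling = break_even_free_user_cost(rate, MONTHLY_PRICE)
    print(
        f"{rate:.0%} conversion: {needed:.0f} signups per customer, "
        f"break-even at ${ceiling:.2f} per free user per month"
    )
```

    Under these assumptions, a 2% conversion rate leaves roughly $0.39 per month to spend on each non-paying user before the free tier costs more than the paid tier earns. A generous trial can exceed that quickly.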

    None of this kills the idea. It just means the business case has to be built on top of realistic assumptions, not aspirational ones.

    The Scrubbing Problem (This Is The Scariest Part)

    An accumulated operational knowledge base built from real client work is, by definition, contaminated with information that cannot leave the building. Client names. Service URLs. App passwords. Internal strategy documents. Competitor analysis. Personal references. Names of contractors and partners. Slack-style observations about which clients are easy to work with and which are not. Pricing conversations. Things a client said in a meeting.

    “I will scrub the data before I expose it” is a sentence that gets people sued. The problem is that scrubbing, done as a filter on top of live data, always misses things. You build a regex for client names, but you forget a client was referenced obliquely in a footnote. You strip URLs, but a screenshot or a code example contains a domain. You remove credentials, but an old version of a SOP still has an example token in it. Filters are 95% solutions to a problem that needs a 100% solution, because the failure mode of the missing 5% is “client finds their internal information being served to a stranger via your API.”
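
    A small sketch shows how this fails in practice. Everything here is made up for illustration: the client names, the notes, the `.example` domain, and the placeholder token. The filter does exactly what it was written to do and still lets two of four notes through.

```python
import re

# Hypothetical client list and notes, invented for this example.
CLIENT_NAMES = ["Acme Roofing", "Northwind Dental"]

client_re = re.compile("|".join(re.escape(n) for n in CLIENT_NAMES), re.IGNORECASE)
url_re = re.compile(r"https?://\S+")

def scrub(text: str) -> str:
    """Replace exact client names and scheme-prefixed URLs with placeholders."""
    text = client_re.sub("[CLIENT]", text)
    text = url_re.sub("[URL]", text)
    return text

notes = [
    # Caught: exact client name.
    "Acme Roofing wants the refresh done by Friday.",
    # Caught: URL with a scheme.
    "Staging lives at https://staging.acme-roofing.example/wp-admin",
    # Missed: oblique reference, no name to match.
    "The roofing client churned right after the pricing call.",
    # Missed: bare domain without a scheme, plus an example token.
    "curl -u editor:abcd-EXAMPLE-token acme-roofing.example/wp-json/wp/v2/posts",
]

scrubbed = [scrub(n) for n in notes]

# Anything still mentioning the client's trade or domain is a leak.
leaks = [s for s in scrubbed if "roofing" in s.lower()]

for line in scrubbed:
    print(line)
print(f"{len(leaks)} of {len(notes)} notes still identify the client")
```

    You can patch both misses with more patterns, and the next batch of notes will find a new gap. The filter only knows the leaks you already thought of.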

    The right architecture is not a filter. It is a clean room.

    That means a separate knowledge base, built from scratch, that contains only the patterns, conventions, and methodology — never the source material it was extracted from. You read your accumulated work, you write generalized lessons by hand or with heavy review, and those generalized lessons become the product. The production knowledge base never touches the serving knowledge base. There is an air gap, not a pipeline.

    This is more work than the “scrub and ship” approach. It is also the only version that does not end in a lawsuit.

    Liability Exposure

    The moment “Will’s Second Brain” is connected to someone else’s workflow, three new liability vectors open up:

    1. Bad output causes a bad decision. Customer uses your API to generate strategy, follows the strategy, loses money, blames you. Mitigated by ToS, liability caps, and clear disclaimers that the service is informational and not professional advice.
    2. Hallucinated facts get cited as authoritative. Your knowledge base says something confident, customer publishes it, the something is wrong, customer’s audience holds them responsible. Mitigated by disclaimers and by being conservative about what gets included in the seed data.
    3. Your contaminated data ends up in front of the wrong eyes. See previous section. Mitigated by the clean-room architecture, not by promises.

    The minimum legal infrastructure to launch is: an LLC, a Terms of Service with clear liability caps, a Privacy Policy, errors and omissions insurance, and ideally a separate entity that owns the product so the consulting business is shielded if the product business gets sued. None of these are expensive individually. All of them are necessary together.

    The Loss Leader Question

    One framing of the idea is: do not try to make money from it directly. Give it away. Let it serve as the most aggressive top-of-funnel content marketing asset Tygart Media has ever shipped. Every developer who connects “Will’s Second Brain” to their workflow becomes aware of Tygart Media. Some fraction of them will eventually need the consulting practice that the second brain was extracted from.

    This is a much more defensible version of the idea, for three reasons:

    • It removes the trial conversion math from the critical path. You are not optimizing for paid signups. You are optimizing for awareness and mindshare.
    • It removes most of the support burden. Free tools have lower customer expectations. “It is free, here is the docs page” is a complete answer in a way that “you are paying $19 a month, please help me debug my integration” is not.
    • It changes the liability story. Operators of free tools used at the user’s own risk generally have an easier time enforcing liability caps than operators of paid services do.

    The cost side of a free version is real but manageable. It comes down to hard rate limits, required signup with a real email address (for the funnel, not the billing), aggressive abuse detection, and serving costs absorbed as a marketing line item rather than a COGS line item. A few hundred dollars a month of GCP spend is cheaper than most paid ad campaigns and probably reaches more qualified people.
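
    As a rough picture of what a hard rate limit means, here is a minimal fixed-window quota keyed by API key. The class name, the limit, and the in-memory storage are all assumptions for the sketch. A real deployment on Cloud Run would need a shared store, because instances do not share memory.

```python
import time
from collections import defaultdict

class DailyQuota:
    """Fixed-window quota: at most `limit` calls per key per window.

    Counts live in process memory and old windows are never evicted,
    so this is a sketch of the logic, not a production limiter.
    """

    def __init__(self, limit: int, window_seconds: float = 86_400, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the demo can control time
        self._counts = defaultdict(int)  # (api_key, window_index) -> calls used

    def allow(self, api_key: str) -> bool:
        """Record one call and return True, or return False if over quota."""
        window_index = int(self.clock() // self.window)
        bucket = (api_key, window_index)
        if self._counts[bucket] >= self.limit:
            return False
        self._counts[bucket] += 1
        return True

# Demo with a fake clock so the window rollover is visible.
now = [0.0]
quota = DailyQuota(limit=3, clock=lambda: now[0])

results = [quota.allow("key-a") for _ in range(4)]
other_key = quota.allow("key-b")   # separate key, separate quota

now[0] += 86_400                   # advance one full window
after_reset = quota.allow("key-a")

print(results, other_key, after_reset)
```

    The limit is "hard" in the sense that the fourth call is refused outright rather than queued or billed. That is the behavior you want when the serving cost is a marketing expense with a ceiling.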

    Verdict

    The idea is good. The business is hard. The two are not the same thing.

    The version that probably works is the loss-leader version: a free, rate-limited, clean-room knowledge API marketed as a top-of-funnel asset for the consulting practice, built from a hand-curated knowledge base that never touches client data, wrapped in a basic legal entity with a real ToS and E&O insurance. The version that probably does not work is the standalone subscription business with a free trial, because the trial economics, the support load, and the liability surface area are all more hostile than they look from the outside.

    The thing worth building first is not the API. It is the clean-room knowledge base. If you can hand-write 100 generalized operational patterns from the existing stack, in a way that contains zero client-specific information and reads as standalone wisdom, you have proven the product is possible. If you cannot — if every pattern keeps wanting to reference a specific client situation to make sense — then the wisdom is not yet abstract enough to package, and the right move is to keep accumulating and revisit in six months.

    Either way, the question that started this is the right question. Context is doing more work in modern AI than most people realize, and someone is going to figure out how to sell curated context as a product. It might as well be the operator who already has the most interesting context to sell.


    Reference Data and Knowledge Node Notes

    This section exists to make this article more useful as a knowledge node when scanned later. It contains the underlying market data, pricing references, and structural notes that informed the analysis above.

    Memory Layer Market Snapshot (2026)

    • Mem0: $24M Series A October 2025 (Peak XV, Basis Set Ventures). 47K+ GitHub stars. Apache 2.0 open source. Pricing: free Hobby (10K memories, 1K retrieval calls/month), $19 Starter (50K memories), $249 Pro (unlimited, graph memory, analytics), custom Enterprise. Claims 90% token reduction, 91% faster, +26% accuracy on LOCOMO benchmark vs OpenAI Memory. SOC 2, HIPAA available. LongMemEval: 49.0% as reported in Hindsight’s published benchmark comparison (a competitor’s figure).
    • Letta (formerly MemGPT): Full agent runtime, not just memory layer. Three-tier OS-inspired architecture (core, recall, archival). Self-editing memory where agents decide what to store. Apache 2.0, ~21K GitHub stars. Python-only SDK. Best for new agent builds, not for adding memory to existing stacks.
    • Zep / Graphiti: Temporal knowledge graphs. Strongest option for queries that need to reason about how facts changed over time. Reportedly scores 15 points higher than Mem0 on LongMemEval temporal subtasks.
    • Hindsight: MIT licensed. Claims 91.4% on LongMemEval. All retrieval strategies (graph, temporal, keyword, semantic) available on free tier including self-hosted.
    • SuperMemory: Bundled memory + RAG. Closed source. Generous free tier. Small API surface.
    • LangMem: Memory tooling for LangGraph. Three memory types: episodic, semantic, procedural (agents updating their own instructions). Free, open source. Requires LangGraph.
    • Bedrock AgentCore Memory: AWS managed equivalent. Out-of-the-box short-term and long-term memory.

    Conversion Rate Reference Numbers

    • Self-serve developer tool free trial → paid conversion: typically 2-5%. B2B SaaS averages across all categories run around 14-25%, but developer tools tend to be lower because the audience is more skeptical and self-sufficient.
    • Freemium to paid conversion (no trial, just free tier): typically 1-4%.
    • Required credit card on free trial: roughly 2x conversion rate vs no card required, but 50-75% lower trial signup rate. Net result is usually higher quality but lower quantity.
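
    The card-required trade-off is worth working through, because the two effects roughly cancel. In this sketch the baseline of 1,000 signups and a 3% conversion rate are assumptions picked from inside the ranges above. The 2x multiplier and the 50-75% signup drop come straight from the bullet.

```python
def paying_customers(signups: float, conversion_rate: float) -> float:
    """Expected paying customers from a cohort of trial signups."""
    return signups * conversion_rate

BASELINE_SIGNUPS = 1000   # assumption: illustrative cohort size
BASELINE_RATE = 0.03      # assumption: inside the 2-5% range above
CARD_RATE_MULTIPLIER = 2  # "roughly 2x conversion rate"

no_card = paying_customers(BASELINE_SIGNUPS, BASELINE_RATE)

# Signup volume falls by 50% (mild case) to 75% (severe case) with a card wall.
card_mild = paying_customers(
    BASELINE_SIGNUPS * (1 - 0.50), BASELINE_RATE * CARD_RATE_MULTIPLIER
)
card_severe = paying_customers(
    BASELINE_SIGNUPS * (1 - 0.75), BASELINE_RATE * CARD_RATE_MULTIPLIER
)

print(f"no card:         {no_card:.0f} paying from {BASELINE_SIGNUPS} trials")
print(f"card, -50% top:  {card_mild:.0f} paying from {BASELINE_SIGNUPS * 0.50:.0f} trials")
print(f"card, -75% top:  {card_severe:.0f} paying from {BASELINE_SIGNUPS * 0.25:.0f} trials")
```

    On these figures a card wall yields between half as many and about the same number of paying customers, while you serve only a quarter to half as many trial users. Whether that is a win depends on how much each free trial costs you to serve and support.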

    Cost Reference Numbers (GCP, 2026)

    • Vertex AI text embedding (gecko-003 or similar): roughly $0.000025 per 1K characters. A typical 500-word document chunk costs less than $0.0001 to embed.
    • BigQuery vector search: storage is cheap, and on-demand query cost scales with the amount of data scanned rather than with the number of results returned. A retrieval against 100K vectors returning top-10 typically costs well under a cent.
    • Cloud Run serving costs: minimum-instance-zero deployments cost nothing at idle. Per-request cost for a typical retrieval API is a fraction of a cent including CPU time and egress.
    • Realistic monthly serving cost for a free, rate-limited “second brain” API at modest usage (say, 100 active users averaging 50 queries per day): probably $50-200/month total infrastructure.
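
    The numbers in this list can be cross-checked against each other. The sketch below uses only figures stated above plus two labeled assumptions: a 30-day month and about six characters per English word including the space.

```python
# Usage scenario from the bullet above.
USERS = 100
QUERIES_PER_USER_PER_DAY = 50
DAYS_PER_MONTH = 30  # assumption

monthly_queries = USERS * QUERIES_PER_USER_PER_DAY * DAYS_PER_MONTH

# Stated infrastructure budget range, in dollars per month.
BUDGET_LOW, BUDGET_HIGH = 50.0, 200.0
cost_per_query_low = BUDGET_LOW / monthly_queries
cost_per_query_high = BUDGET_HIGH / monthly_queries

# Embedding cost for one 500-word chunk at the stated per-character price.
EMBED_PRICE_PER_1K_CHARS = 0.000025
CHARS_PER_WORD = 6  # assumption: average word plus a trailing space
chunk_chars = 500 * CHARS_PER_WORD
chunk_embed_cost = chunk_chars / 1000 * EMBED_PRICE_PER_1K_CHARS

print(f"monthly queries: {monthly_queries:,}")
print(f"implied all-in cost per query: ${cost_per_query_low:.5f} to ${cost_per_query_high:.5f}")
print(f"embedding one 500-word chunk: ${chunk_embed_cost:.6f}")
```

    The scenario works out to 150,000 queries a month, so the $50-200 budget implies roughly $0.0003 to $0.0013 per query all-in. That is consistent with the "fraction of a cent" figure, and the chunk embedding cost of about $0.000075 sits under the stated $0.0001 ceiling.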

    The Clean Room Architecture (Recommended Approach)

    Two completely separate knowledge bases, never connected:

    1. Production knowledge base: The existing accumulated stack. Notion command center, Claude skills library, client SOPs, BigQuery operations ledger, everything tagged to specific clients and projects. This is the source of truth for the consulting practice. It never touches the public-facing system.
    2. Clean room knowledge base: Hand-written or heavily-reviewed generalized patterns. Contains zero client-specific information, zero credentials, zero internal strategy, zero personal references. Each entry is a standalone generalized lesson that could have been written by anyone with similar experience. This is what gets exposed via the API.

    The transfer between the two is manual or heavily reviewed, never automated. A regex filter is not a clean room. A human reading each entry and rewriting it is.
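
    One way to picture the air gap in code is a store whose only write path requires two named humans. This is a sketch under assumptions: the `Pattern` and `CleanRoom` names and fields are hypothetical. The code cannot verify that anyone actually read the entry. It only makes the sign-off mandatory and keeps any import-from-production function out of the interface.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Pattern:
    """A generalized lesson, written from scratch rather than copied."""
    title: str
    body: str
    rewritten_by: Optional[str] = None  # human who wrote the generalized text
    reviewed_by: Optional[str] = None   # second human who approved it

class CleanRoom:
    """Serving knowledge base. Deliberately has no bulk-import method."""

    def __init__(self) -> None:
        self._published: list = []

    def publish(self, pattern: Pattern) -> None:
        """Accept an entry only if both sign-offs are recorded."""
        if not pattern.rewritten_by or not pattern.reviewed_by:
            raise PermissionError("entry needs a named author and a named reviewer")
        self._published.append(pattern)

    def search(self, term: str) -> list:
        """Naive substring search over published entries."""
        term = term.lower()
        return [p for p in self._published if term in p.body.lower()]

room = CleanRoom()

draft = Pattern(
    title="Refresh before you write",
    body="Update aging posts before commissioning new ones.",
)
try:
    room.publish(draft)
    rejected = False
except PermissionError:
    rejected = True

signed = Pattern(draft.title, draft.body, rewritten_by="author", reviewed_by="reviewer")
room.publish(signed)

print(rejected, [p.title for p in room.search("aging posts")])
```

    The design choice that matters is the one you cannot see: there is no function that takes a production record as input. If one ever appears, the clean room has become a pipeline.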

    Minimum Viable Legal Stack

    • Separate LLC for the product (shields the consulting practice)
    • Terms of Service with explicit liability cap (typically capped at fees paid in last 12 months, or for free service, capped at $0 plus minimal statutory damages)
    • Privacy policy covering what gets logged and retained
    • Errors and omissions insurance ($1M coverage typical, runs $500-1500/year for a small operation)
    • Clear “informational, not professional advice” disclaimers on every API response
    • Logged consent that the user understands the service is generative and may produce incorrect output
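
    The last two items are the only ones on this list that live in code rather than in paperwork. A minimal sketch of what they could look like follows, assuming a JSON API. The field names, the disclaimer wording, and the `consent_logged` flag are hypothetical and are no substitute for language a lawyer has reviewed.

```python
import json

# Hypothetical wording; real text should come from counsel.
DISCLAIMER = "Informational only; not professional advice. Output may be incorrect."

def build_response(results: list, consent_logged: bool) -> str:
    """Wrap retrieval results so the disclaimer travels with every payload.

    Refuses to answer if the caller has no recorded acknowledgement that
    the service is generative and may be wrong.
    """
    if not consent_logged:
        raise PermissionError("no logged consent for this API key")
    return json.dumps({"disclaimer": DISCLAIMER, "results": results})

payload = build_response(
    [{"title": "Refresh before you write", "score": 0.82}],
    consent_logged=True,
)
decoded = json.loads(payload)
print(decoded["disclaimer"])
```

    Putting the disclaimer in the response body, rather than only in the docs, means it is still attached when a customer pipes the output somewhere you never see.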

    Adjacent Concepts Worth Tracking

    • “Context as a service” as an emerging category — distinct from memory layers. Memory layers store what the user told them. Context services ship with knowledge already loaded.
    • The methodology-as-product pattern — Shape Up, Getting Things Done, the 4-Hour Workweek. These are all examples of operational wisdom productized into something that can be sold separate from the consulting practice that generated it.
    • Loss leaders as PR for consulting practices — 37signals’ Basecamp, Stripe’s documentation, Vercel’s open source projects. The free or cheap thing is the marketing for the expensive thing.
    • The “API for vibes” risk — products that promise “it just works better” without explaining why are hard to differentiate, hard to defend in court, and hard to upsell. The product needs at least one concrete claim that can be measured.

    Last updated: April 2026. Knowledge node tags: AI memory layers, productization, second brain, RAG, context engineering, loss leader strategy, clean room architecture, Mem0, Letta, Zep, agency productization, AI tooling business models.