Claude Code Insider - Tygart Media

Category: Claude Code Insider

Practitioner-level guides, comparisons, workflows, and real-world patterns for Claude Code — the agentic AI coding tool from Anthropic.

  • Claude Code’s Rate Limit Doubling: What May 2026 Changed and How to Pick a Plan Now

    If you bought a Claude Code subscription in March or April and felt like you were hitting the 5-hour wall every single afternoon, you weren’t imagining it. Anthropic spent six months tightening Claude Code’s quotas — and then, over two weeks in May 2026, gave most of them back. The rate-limit math that drove plan-selection advice on the internet through April is now obsolete. Here’s what actually changed, what the numbers look like today, and how to think about Pro versus Max if you’re picking a plan this week.

    What Anthropic actually did

    On May 6, 2026, Anthropic doubled the 5-hour rate limits on Claude Code across every paid plan — Pro, Max 5x, Max 20x, Team Premium, and seat-based Enterprise. In the same announcement, they removed the peak-hour throttle that had been quietly halving available quota for Pro and Max users during weekday business hours. They also lifted API-side rate limits on the Opus tier.

    One week later, on May 13, 2026, they followed up with a 50% increase to the weekly cap across the same plans. Unlike the 5-hour change, that weekly bump carries an expiration date: July 13, 2026, unless extended. Treat it as a temporary boost, not a permanent feature.

    The trigger Anthropic pointed to is a deal that brings the full capacity of the Colossus 1 data center in Memphis online — over 300 megawatts and roughly 220,000 NVIDIA GPUs. That detail matters less than the practical one: capacity-driven throttling that had been the dominant constraint since late 2025 has loosened.

    The new numbers, by plan

    The shape of the plan ladder hasn’t changed — Pro at $20, Max 5x at $100, Max 20x at $200, Team Premium at $100/seat with a 5-seat minimum. What changed is what each tier actually delivers per window.

    • Pro ($20/mo): Roughly 90 prompts per 5-hour window now (up from a number that, in practice, was hovering around 45 once the peak-hour throttle kicked in). No peak penalty. Weekly cap is 50% higher through July 13.
    • Max 5x ($100/mo): Same doubled 5-hour window. Weekly Opus 4.7 budget moved from approximately 50 hours to approximately 75.
    • Max 20x ($200/mo): Doubled 5-hour window. Weekly Opus 4.7 budget moved from approximately 200 hours to approximately 300.
    • Team Premium ($100/seat/mo, annual; $125 monthly): Mirrors Max 5x quotas at the seat level. 5-seat minimum still applies.

    Two things haven’t changed: the API pay-as-you-go pricing for the underlying models (claude-sonnet-4-6 at roughly $3 per million input tokens and $15 per million output; claude-opus-4-7 at roughly $5 in and $25 out), and the existence of the weekly cap itself. The weekly cap is still the thing that kills Max users mid-Friday.

    What this changes about plan selection

    Most of the “which plan should I buy” guides written before May 6 over-recommend Max 5x because they were sizing it against artificially compressed Pro limits. With a doubled 5-hour cap and no peak throttle, Pro at $20 is now genuinely enough for a developer doing focused coding sessions a few hours a day — something that wasn’t reliably true a month ago.

    The Max 5x case still holds, but it’s narrower now. The honest test: if you regularly burn through your Pro 5-hour window before lunch, or if you run two or three concurrent Claude Code sessions on different repos, $100 still pays for itself. If you don’t, Pro will hold.

    Max 20x is increasingly a workflow choice rather than a quota choice. The doubled limits made Max 5x sufficient for almost every solo workflow I can describe. Where 20x still earns its price is multi-agent workflows, where a coordinator-and-workers pattern can burn three to seven times the tokens of a single-agent session because every teammate maintains its own context window.

    The hidden costs that didn’t change

    The rate-limit relief is real, but several gotchas that drove “Claude Code costs me more than I expected” complaints in Q1 are still live:

    • Set ANTHROPIC_API_KEY in your shell and Claude Code bills at API rates — your subscription is silently ignored. Unset it before launching the CLI if you’re on a plan.
    • Weekly caps count active processing time only. Idle browsing is free. Long-running tool calls and extended-thinking budgets aren’t.
    • Extended thinking is billed as output tokens. On Opus 4.7 that’s roughly $25 per million. Default thinking budgets of tens of thousands of tokens per request stack up fast on API.
    • MCP server output sits in context for the rest of the session. A “list the last 20 PRs” call can dump 8,000 tokens of metadata that you’ll re-pay for on every subsequent turn until the conversation rolls over.
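
    To put rough numbers on the last two bullets, here is a back-of-envelope sketch. It uses the approximate Opus 4.7 API rates quoted above; the 30,000-token thinking budget, the 40 requests, and the 25 follow-up turns are illustrative assumptions of mine, not measurements, and the sketch ignores prompt caching, which would lower the re-read cost.

```python
# Back-of-envelope API cost sketch. Rates are the approximate Opus 4.7
# figures quoted in this article; every workload number is an assumption.
OPUS_INPUT_PER_MTOK = 5.00    # USD per million input tokens (approximate)
OPUS_OUTPUT_PER_MTOK = 25.00  # USD per million output tokens (approximate)


def thinking_cost(thinking_tokens: int, requests: int) -> float:
    """Extended thinking is billed as output tokens."""
    return thinking_tokens * requests * OPUS_OUTPUT_PER_MTOK / 1_000_000


def context_carry_cost(payload_tokens: int, later_turns: int) -> float:
    """Tool output left in context is re-sent as input on each later turn."""
    return payload_tokens * later_turns * OPUS_INPUT_PER_MTOK / 1_000_000


# Assumed workload: 40 requests with a 30,000-token thinking budget each.
print(f"thinking: ${thinking_cost(30_000, 40):.2f}")
# Assumed workload: one 8,000-token MCP result carried for 25 more turns.
print(f"mcp carry: ${context_carry_cost(8_000, 25):.2f}")
```

    Under those assumptions the thinking budget is the bigger line item by a wide margin, which is why it is worth checking before blaming the plan.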

    If you were running into the 5-hour wall and assumed it was a usage problem, check whether one of those four is actually the cause before you upgrade.
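
    The first of those four is the easiest to check mechanically. A minimal sketch, assuming only what the bullet above says (a set ANTHROPIC_API_KEY means API billing); the function name and message wording are mine:

```python
import os
from typing import Mapping, Optional


def billing_warning(env: Mapping[str, str]) -> Optional[str]:
    """Return a warning when ANTHROPIC_API_KEY is set to a non-empty value."""
    if env.get("ANTHROPIC_API_KEY"):
        return ("ANTHROPIC_API_KEY is set: Claude Code will bill at API rates. "
                "Unset it before launching the CLI if you are on a plan.")
    return None


if __name__ == "__main__":
    # Check the environment this script was launched from.
    print(billing_warning(os.environ) or "ANTHROPIC_API_KEY is not set.")
```

    Run it from the same shell you launch Claude Code from, since that is the environment the CLI inherits.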

    What to do this week

    If you’re on Pro and were considering Max 5x, wait two weeks. The new Pro ceiling is high enough that the upgrade decision now needs different evidence than it did in April.

    If you’re already on Max 5x and felt squeezed, the May 13 weekly bump should give you breathing room — but mark July 13 on your calendar. If the temporary 50% increase isn’t extended, the squeeze comes back.

    If you’re picking a plan from scratch today: start on Pro. The doubled limits are real, the peak-hour penalty is gone, and the upgrade path to Max stays open with no friction. Buy quota when you’ve measured that you need it, not before.

    The model versions to use

    For anyone writing the API string into a script this week: flagship is claude-opus-4-7, workhorse is claude-sonnet-4-6, fast tier is claude-haiku-4-5-20251001. Pull from docs.anthropic.com/en/docs/about-claude/models before shipping anything — the version strings have moved twice already this year and they’ll move again.

  • MCP Scopes in Claude Code: Why --scope Is the Flag That Saves Your Team

    Everyone teaches you how to add an MCP server to Claude Code. Almost nobody teaches you where to add it — and that one decision, the scope flag, is the difference between a clean team setup and three engineers debugging why the same server works on one machine and not another. I’ve watched it happen. The fix is always the same: someone added a server at the wrong scope.

    If you run claude mcp add without thinking about scope, Claude Code makes the choice for you. It defaults to local. That’s fine for a throwaway experiment and wrong for almost everything else.

    The three scopes, and what each one actually controls

    Claude Code stores MCP server configurations in three places, and the --scope flag decides which one you’re writing to.

    Local scope (the default) writes the server config into your personal settings, keyed to the current project path, inside ~/.claude.json. Nobody else sees it. It doesn’t get committed. Open the same repo on your laptop at home and the server isn’t there. This is the scope you want for a one-off — a database you’re poking at this afternoon, a server you’re still deciding whether to keep.

    Project scope writes to a .mcp.json file at the root of the repository. You commit that file to git. Everyone who clones the repo gets the same servers, configured the same way. This is the scope that makes MCP a team decision instead of a personal one — and it’s the one most people skip because the default never points them at it.

    User scope writes to your global config so the server is available in every project you open, regardless of which repo you’re in. This is for the handful of servers you genuinely use everywhere — a documentation search server, a personal notes tool — not for anything project-specific.

    The mental model I use: local is “me, here, now.” Project is “anyone on this repo.” User is “me, everywhere.” If you can articulate which of those three sentences describes the server, you know the flag.

    The command, written three ways

    Same server, three scopes. The only thing that changes is the flag.

    # Local — default, personal, not committed
    claude mcp add --transport stdio my-db -- npx -y @some/db-mcp-server
    
    # Project — shared via .mcp.json, commit to git
    claude mcp add --scope project --transport stdio my-db -- npx -y @some/db-mcp-server
    
    # User — available in every project you open
    claude mcp add --scope user --transport stdio my-db -- npx -y @some/db-mcp-server

    Verify what’s connected and where it came from with claude mcp list. If a teammate reports a server “isn’t working” and yours is fine, this is the first command to run on both machines — the discrepancy is almost always a scope mismatch, not a broken server.

    The .mcp.json pattern that actually pays off

    Here’s the workflow that turns this from trivia into leverage. When you onboard a repo that the whole team uses, you decide once which MCP servers belong to that codebase — the Postgres server pointed at the dev database, the issue tracker, whatever the repo’s daily work requires — and you add them all at project scope. The resulting .mcp.json looks like this:

    {
      "mcpServers": {
        "postgres": {
          "command": "npx",
          "args": ["-y", "@some/postgres-mcp-server", "postgresql://localhost/devdb"]
        },
        "linear": {
          "type": "http",
          "url": "https://mcp.linear.app/mcp"
        }
      }
    }

    Commit it. Now a new hire clones the repo, opens Claude Code, and the agent already knows how to query the dev database and read tickets — no setup doc, no Slack thread asking “wait, how do I connect the database again.” The repo carries its own integration surface.

    One safety detail worth knowing: when Claude Code encounters project-scoped servers from a .mcp.json it didn’t write, it asks you to approve them before they run. That prompt exists because a committed config file is, technically, code other people can put on your machine. Read what you’re approving — the same way you’d read a package.json script before running it.

    Where this bites people

    Three failure modes I see repeatedly. First: adding a server at local scope, then wondering why it vanished on a different machine — local is path-and-machine specific, that’s the design. Second: putting a secret directly into .mcp.json and committing it to a public repo. Don’t. Reference an environment variable in the config and keep the actual token out of git. Third: piling everything into user scope so every project loads servers it doesn’t need, which bloats the context the agent has to reason over and slows routing when you have many tools connected.
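
    The second failure mode is the one worth automating a check for. Below is a rough pre-commit sketch that flags literal credentials in a .mcp.json. It assumes the ${VAR} and ${VAR:-default} reference forms, which is my understanding of what Claude Code expands in .mcp.json (confirm against the current docs). The key-name heuristic, the "tracker" server, its URL, and TRACKER_TOKEN are all hypothetical.

```python
import json
import re
from typing import List

# A ${VAR} or ${VAR:-default} reference anywhere in the value (assumed syntax).
ENV_REF = re.compile(r"\$\{[A-Za-z_][A-Za-z0-9_]*(:-[^}]*)?\}")
# Hypothetical heuristic: key names that usually hold credentials.
SECRET_KEY_HINT = re.compile(r"(token|secret|password|api[_-]?key|authorization)", re.I)


def inline_secrets(mcp_config: dict) -> List[str]:
    """List 'server.section.key' paths whose secret-looking values are literals."""
    findings = []
    for name, server in mcp_config.get("mcpServers", {}).items():
        for section in ("env", "headers"):
            for key, value in server.get(section, {}).items():
                if SECRET_KEY_HINT.search(key) and not ENV_REF.search(str(value)):
                    findings.append(f"{name}.{section}.{key}")
    return findings


example = json.loads("""
{
  "mcpServers": {
    "tracker": {
      "type": "http",
      "url": "https://mcp.example.com/mcp",
      "headers": {"Authorization": "Bearer ${TRACKER_TOKEN}"}
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@some/postgres-mcp-server"],
      "env": {"PGPASSWORD": "hunter2"}
    }
  }
}
""")
print(inline_secrets(example))  # ['postgres.env.PGPASSWORD']
```

    It only looks at env and headers, so a password embedded in a connection string inside args would slip through. Treat it as a tripwire, not a scanner.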

    The cost angle, since it’s a fair question: scoping itself costs nothing. But every connected MCP server adds its tool definitions to the model’s context on each turn. With Sonnet 4.6 as the workhorse model, a lean per-project tool set is faster and cheaper than a kitchen-sink user-scope config you never pruned. Scope discipline is, indirectly, token discipline.

    The rule that replaces all of this

    Before you run claude mcp add, finish this sentence: “This server should be available to ___.” If the answer is “just me, just here” — local. If it’s “anyone working in this repo” — project, commit the file. If it’s “me, in everything I do” — user. The flag follows from the sentence. Get that habit, and the entire class of “works on my machine” MCP bugs disappears from your team’s life.
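
    If it helps to see the rule as a lookup, here is a small sketch that maps the finished sentence to a flag and prints the matching command. The server name and package are the placeholders from the examples above; the function itself is just an illustration, not part of the CLI.

```python
from typing import List

# Maps the "who should have this server?" answer to a --scope value.
SCOPE_FOR_AUDIENCE = {
    "just me, just here": "local",
    "anyone working in this repo": "project",
    "me, in everything I do": "user",
}


def mcp_add_command(audience: str, name: str, server_cmd: List[str]) -> str:
    """Build a `claude mcp add` line for a stdio server at the matching scope."""
    scope = SCOPE_FOR_AUDIENCE[audience]  # KeyError means: finish the sentence first
    parts = ["claude", "mcp", "add", "--scope", scope,
             "--transport", "stdio", name, "--", *server_cmd]
    return " ".join(parts)


print(mcp_add_command("anyone working in this repo", "my-db",
                      ["npx", "-y", "@some/db-mcp-server"]))
# claude mcp add --scope project --transport stdio my-db -- npx -y @some/db-mcp-server
```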

  • Claude Code vs Cursor in 2026: Token Efficiency, Agent Teams, and What I Actually Run

    I’ve been running both Claude Code and Cursor on the same codebases for the last eight months. Not as a reviewer, but as someone who has to ship features in both tools and watch the credit meter tick. Here is what the comparison looks like in May 2026, after Cursor’s credit overhaul, after Claude Opus 4.7, and after Claude Code’s agent teams shipped.

    The Real Pricing Picture

    The headline subscription numbers are nearly identical: Claude Pro at $20/month, Cursor Pro at $20/month. That’s where the similarity ends.

    Cursor’s Pro tier in 2026 ships with unlimited “Auto” mode requests plus a $20 credit pool for premium models. Pro+ is $60/month with roughly 3x credits and background agents. Ultra is $200/month at 20x usage. Hobby is still free with limited requests. Teams is $40/user/month.

    Claude Code on the Pro plan gets you Sonnet-tier usage with quota limits. Max at $100/month unlocks Opus access and 5x the usage envelope. The team plan for Claude Code is where the real spread shows: Anthropic’s team pricing on Claude Code lands materially higher than Cursor Teams for a comparable seat count. If you’re a 10-person team buying the most generous tier of each, you’re looking at roughly 3x more for Claude Code.

    For solo developers, the cost is a wash at the entry tier. The decision is not about money — it’s about how each tool burns tokens.

    Token Efficiency Is the Hidden Variable

    This is the number I wish I had known a year ago: independent benchmarking through 2026 has Claude Code using roughly 5.5x fewer tokens than Cursor on identical tasks. Not 5.5% — five and a half times fewer.

    The why matters. Cursor’s agent loop tends to re-read files, re-include context, and verify intermediate steps by stuffing prior turns back into the prompt. Claude Code’s CLI architecture leans on a tighter context budget by default, and on Opus 4.7 the model itself is doing more work per token. When you’re paying by credit (Cursor) and your power-user-hours start adding up, that ratio is the difference between a $60 month and a $200 month.

    The honest counterpoint: Cursor’s median completion time on simple, single-file edits is roughly 12% faster than Claude Code. If you live in the find-and-fix-a-typo loop, Cursor’s IDE integration genuinely wins.

    Where Claude Code Wins

    The 1M token context window is now generally available on Claude Opus 4.6, Opus 4.7, and Sonnet 4.6, at standard per-token pricing with no long-context surcharge. A 900,000-token request costs the same per-token rate as a 9,000-token one. For codebases that need to be understood holistically — monorepos, large migrations, anything where “ctrl-F across 200 files” is part of the problem — this is the single most consequential capability difference in 2026.

    Agent teams arrived in 2026 with Claude Code v2.1.32, opt-in behind the CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 flag. The team-lead pattern, in which one Claude session coordinates teammates who can message each other, share a task list with dependencies, and lock files, is a genuinely different primitive than Cursor’s background agents. The cost is real: agent teams use approximately 7x the tokens of a single session in plan mode. The benefit is also real: the work that previously needed a human program manager now runs unattended.

    On full-feature implementation tasks — the kind where a benchmark measures end-to-end PR shipment, not single edits — Claude Code was roughly 18% faster on median wall-clock time. Opus 4.7 specifically lifted resolution on a 93-task coding benchmark by 13% over Opus 4.6, including four tasks that neither Opus 4.6 nor Sonnet 4.6 could solve.

    Where Cursor Wins

    The editor. This is not a small thing. Cursor is still a VS Code fork that evolved into an agent workbench. The integrated diff view, the multi-file edit preview, the in-line ghost text completions, the model picker in the corner — none of that exists in Claude Code, which lives in a terminal pane. If you have a strong opinion about your IDE and you want AI features welded inside it, Cursor is the answer.

    Cloud agents on Cursor Pro and above run AI tasks in isolated cloud VMs with no access to your local machine. The use case — fire off a refactor and walk away from your laptop — is well-served. The catch: background agents always use MAX mode, which adds a 20% surcharge on credit cost, and a single agent run on a 50,000-line codebase can consume around 22.5% of a Pro plan’s monthly credits. One bad day of agent runs eats your month.
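
    The "one bad day" claim is simple arithmetic. A quick sketch using the figures from this article, and assuming (my assumption) that the 22.5% is measured against the $20 Pro credit pool:

```python
import math

CREDIT_POOL_USD = 20.00  # Cursor Pro credit pool quoted above
SHARE_PER_RUN = 0.225    # the ~22.5% per-run figure quoted above


def cost_per_run(pool_usd: float, share: float) -> float:
    """Dollar value of one agent run, given its share of the pool."""
    return pool_usd * share


def full_runs_that_fit(share: float) -> int:
    """How many runs complete before the pool would be overdrawn."""
    return math.floor(1.0 / share)


print(f"${cost_per_run(CREDIT_POOL_USD, SHARE_PER_RUN):.2f} per run")
print(f"{full_runs_that_fit(SHARE_PER_RUN)} runs fit; the next one overdraws the pool")
```

    Four runs of that size fit in a month of Pro credits, which is an ordinary afternoon of refactoring.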

    Model variety is also a Cursor advantage. You can route a task to a non-Anthropic model when the situation calls for it. Claude Code is Claude all the way down.

    What I Actually Run

    Both. For $40/month at the Pro tier on each, I get the most powerful AI coding setup available in 2026. Claude Code handles the long-context architectural work, the cross-cutting refactors, the agent-team orchestration where one Claude is doing program management and three teammates are touching different services. Cursor handles the IDE work — the small-bore edits, the in-line completions, the moments where I want to see a diff hover above the line I just changed.

    If forced to pick one, the answer depends on the work. Heavy backend, large codebases, multi-agent workflows: Claude Code. UI-heavy, single-file iteration, “I just want my editor to be smarter”: Cursor.

    The Honest Limitation

    Claude Code on a team plan is genuinely expensive at scale. A 10-person team running Claude Code at the team-equivalent tier is roughly 3x the Cursor Teams equivalent. If you’re cost-sensitive at headcount, that math may decide the question regardless of capability. The token-efficiency advantage helps Claude Code claw back some of that on per-task economics, but the subscription line item is the line item.

    The other honest limitation: model versions move fast. As of May 26, 2026, the current Anthropic lineup is Claude Opus 4.7 (flagship), Claude Sonnet 4.6 (workhorse), and Claude Haiku 4.5. Any comparison written more than a quarter ago is already partially wrong on the model column. Read pricing pages, not blog posts, when you’re committing budget.

    The Bottom Line

    Cursor wins on editor experience, model variety, and team subscription cost. Claude Code wins on token efficiency, context window economics, agent-team primitives, and Opus 4.7’s raw coding capability on hard tasks. If you’re optimizing for one tool, pick the one that matches the bulk of your work. If you can afford $40/month, run both — and pay attention to which one you actually open first in the morning. That’s your real answer.

  • The Worktree Workflow: How Many Parallel Claude Code Sessions Are Actually Worth Running

    There is a moment, usually around your third Claude Code session, when the fantasy of parallel agents collides with the reality of a single human attention span. You opened a second terminal because Claude was thinking and you had nothing to do. You opened a third because the second one was thinking too. Now there are three streams of edits happening across three branches, and the question is no longer “can I run parallel sessions” — it is “did I just spend the next forty minutes reviewing diffs I could have written myself in twenty.”

    Git worktrees plus claude --worktree make multi-session work physically safe. They do not make it economical. The pattern below is the one I have settled on after running everything from two-up to a foolish eight-up. The short version: two is the default, four is the ceiling, and the ceiling is set by review bandwidth, not by Claude.

    What worktrees actually fix

    A git worktree is a second working directory attached to the same repository — its own branch, its own files on disk, sharing history and remote with the main checkout. Without them, running two Claude Code sessions against one clone is asking for a merge conflict you will not see coming. One session edits src/auth.ts while the other is mid-refactor of the same file; the second write wins; your tests pass; the first session’s logic is gone.

    The claude --worktree my-feature flag creates the worktree for you under .claude/worktrees/, checks out a new branch, and scopes the entire session to that directory. Edits in one --worktree (-w) session cannot touch files in another. That is the entire safety guarantee, and it is the only one you need to run multiple sessions in good conscience.

    The catch nobody mentions in the launch posts: a worktree is a fresh checkout. Your .env, your .env.local, your node_modules, your virtualenv — none of it carries over. The first session in a new worktree spends ten minutes failing in confusing ways because the environment is empty.

    The fix is a .worktreeinclude file at the project root, written in gitignore syntax. Files that match a pattern and are gitignored are copied into the new worktree at creation. Tracked files are never duplicated. A one-line .worktreeinclude containing .env* (which already matches .env.local) is usually enough to get past the first failure mode. Dependencies are a separate problem: most teams either symlink node_modules or run a fresh install per worktree, depending on how strict their lockfile discipline is.
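
    The selection rule is easier to trust once you see it spelled out. This is a sketch of the rule as described above, not Claude Code's implementation: it approximates gitignore matching with fnmatch on the file name, and the file lists are made up.

```python
from fnmatch import fnmatch
from typing import List, Set


def files_to_copy(candidates: List[str], include_patterns: List[str],
                  gitignored: Set[str], tracked: Set[str]) -> List[str]:
    """Pick files that match an include pattern, are gitignored, and are not tracked."""
    selected = []
    for path in candidates:
        basename = path.rsplit("/", 1)[-1]
        matches = any(fnmatch(basename, pattern) for pattern in include_patterns)
        if matches and path in gitignored and path not in tracked:
            selected.append(path)
    return selected


print(files_to_copy(
    candidates=[".env", ".env.local", ".env.example", "src/app.ts"],
    include_patterns=[".env*"],
    gitignored={".env", ".env.local"},
    tracked={".env.example", "src/app.ts"},
))
# ['.env', '.env.local']
```

    Note that .env.example matches the pattern but is skipped, because it is tracked and arrives with the checkout anyway.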

    The two-session pattern

    This is the default and the one that pays its setup cost on the first week. The shape:

    • Foreground session. What you are actively working on. You read every diff. You answer questions. You /plan the multi-file changes.
    • Background session, separate worktree. A bounded task you can describe in one paragraph and verify in one diff. Documentation update, refactor of a single module, dependency bump and test run, generated-API-client rebuild. You start it, switch back to the foreground, and check on it when it finishes.

    The economics are clean. The background task would have cost you context-switching time anyway — opening it, loading the problem into your head, doing the work, putting it down. Instead you describe it once, let Claude run, and review one diff at the end. Two sessions, one human, real time saved.

    The discipline that makes this work: the background task is always something you would be comfortable letting a junior engineer commit without a meeting. If you would not, it is not a background task — it belongs in the foreground.

    The four-session ceiling

    Above two, the gains compress fast. Three is fine if the third session is something near-trivial — a script, a one-off data migration, a README pass. Four is the practical ceiling and only on days where the work decomposes that cleanly.

    The reason is review, not Claude. Each session produces a diff. Diffs are not free to read. A senior engineer in flow reads code at maybe two hundred lines per minute with comprehension; a five-hundred-line diff from a Claude session costs at least two minutes of focused attention, often five if the change is subtle. Four sessions producing four diffs in the same ten-minute window means ten to twenty minutes of review queued up, and you cannot review them in parallel.
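
    The arithmetic behind that ceiling, as a sketch. The reading rate and diff size are the rough figures from the paragraph above; the slowdown factor for subtle changes is my own stand-in for "often five minutes".

```python
def review_minutes(diff_lines: int, lines_per_minute: float = 200.0,
                   slowdown: float = 1.0) -> float:
    """Minutes of focused reading for one diff; slowdown > 1 models a subtle change."""
    return diff_lines / lines_per_minute * slowdown


sessions = 4
best_case = sessions * review_minutes(500)                # 2.5 minutes per diff
subtle_case = sessions * review_minutes(500, slowdown=2)  # 5 minutes per diff
print(f"{best_case:.0f} to {subtle_case:.0f} minutes of serial review")
```

    Plug in your own reading rate and typical diff size. The number of sessions at which the review queue outlasts the time the sessions take to run is your ceiling.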

    The pattern that breaks first is the one where you stop reading the diffs carefully because there are too many of them. That is the failure mode worth naming. The point of running parallel sessions is to compress wall-clock time. The point is not to compress review time, because review time is the part that actually catches the bug.

    If you find yourself merging diffs you have only skimmed, you are running too many sessions. Drop back to two.

    When the pattern earns its keep

    Two scenarios where multi-session worktrees clearly beat the single-session default:

    The refactor-and-feature split. You are in the middle of building a feature. Halfway through you notice the underlying module needs a refactor before the feature can be finished cleanly. In the single-session model you stop the feature, do the refactor, and restart the feature with the refactored module in your head. In the worktree model you fork the refactor into its own worktree, keep the feature work going in the main worktree against the unrefactored code, and rebase the feature onto the refactor once it lands. You do not lose your place on the feature work.

    The long-tail cleanup pass. A list of twelve small chores nobody wants to do in series: dependency updates, doc fixes, lint cleanups, deprecated-API migrations. Worktree per chore, three at a time, Stop hooks running the test suite, you reviewing as each finishes. The single-session alternative is a forty-five-minute slog. The parallel version is fifteen minutes of dispatch and review.

    The scenario where it does not earn its keep: novel design work where the right answer requires you to hold the whole problem in your head. Splitting attention across two unfamiliar design problems means doing both of them worse than you would have done either of them alone.

    The setup that makes it usable

    If you have not done this before, this is the order:

    First, write a .worktreeinclude with at minimum .env* and any other untracked config your project needs. Test it by running claude --worktree test-include and verifying your env loads. Delete the worktree with git worktree remove .claude/worktrees/test-include once verified.

    Second, add a WorktreeCreate hook if your project has any setup beyond env files — installing dependencies, running migrations, building a generated client. The hook fires when claude --worktree is invoked, before the session starts, so any setup you script runs against a clean checkout every time. The hook output prints the worktree path on stdout and Claude opens there.

    Third, establish a worktree naming convention before you have ten of them sitting around. feature/auth-rewrite, chore/dep-bump-react, fix/oauth-callback — anything that tells you in a year what the worktree was for. The default .claude/worktrees/ directory fills up faster than you expect.

    Fourth, set a personal ceiling. Mine is four. Yours might be three. The ceiling is whatever number of diffs you can review carefully in the time the sessions take to run. Write that number down somewhere you will see it when you are about to open a fifth terminal.

    What this pattern is not

    It is not parallelization in the engineering sense. The sessions are not coordinating. They are independent. Tasks with real dependencies — “refactor module A, then build feature B against it” — belong in one session, sequenced, with a /plan step in the middle. Splitting them across worktrees just means you spend extra time rebasing.

    It is also not a productivity hack in the autonomous-agent sense. Claude Code’s subagents and skills are the right tools when you want the same session to delegate work to context-isolated children. Worktrees are for when you want to run independent sessions in parallel because the tasks are independent and you can review the outputs separately. Different layer, different problem.

    The worktree pattern works because git worktrees were already the right primitive for parallel feature work before Claude Code existed. claude --worktree is the convenience flag that makes the primitive cheap. The discipline — two by default, four at the ceiling, never more diffs than you will actually read — is the part that turns convenience into useful workflow.

  • Claude Code Server-Managed Settings: The Admin Console Push That Replaces Your MDM Pipeline

    Last week I argued that if you have more than a handful of engineers on Claude Code, repo-level .claude/settings.json is not enough — you need managed-settings.json deployed through MDM. That is still true. What changed in 2026 is that you no longer need an MDM team to roll it out.

    Claude Code now supports server-managed settings: a remote configuration tier pushed from the Claude.ai admin console, with no file on disk and no MDM involvement. If you are on the Team plan running Claude Code 2.1.38+ or the Enterprise plan running 2.1.30+, this is available to you today, and most platform teams I talk to are still treating MDM-deployed managed-settings.json as the only option.

    It is not. And the precedence rules matter.

    The New Top of the Settings Hierarchy

    Claude Code’s settings stack already had a clear order (local project settings override shared project settings, which override user settings), with managed settings sitting on top of all of them as the unoverridable tier. Server-managed settings now sit at the same top tier alongside MDM and the on-disk managed-settings.json file. Within that managed tier, the documented precedence is:

    1. Server-managed settings (admin console push)
    2. MDM / OS-level policies (Jamf, Kandji, Group Policy, Intune)
    3. managed-settings.json on disk (the file we deployed last week)
    4. HKCU registry (Windows)

    Server-managed wins. If you push a policy from the admin console that conflicts with a fleet managed-settings.json deployed by MDM, the server policy applies. That is the entire point.
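
    A toy model of that ordering, for anyone who thinks better in code. The source labels are hypothetical names for the four tiers listed above, and the per-key lookup is a simplification of mine: the real client may select a whole source rather than resolve key by key, so check the docs before relying on merge behavior.

```python
from typing import Any, Dict, Optional, Tuple

# Highest precedence first, following the list above (labels are hypothetical).
MANAGED_PRECEDENCE = ["server", "mdm", "file", "hkcu"]


def resolve(key: str, sources: Dict[str, Dict[str, Any]]) -> Optional[Tuple[str, Any]]:
    """Return (source, value) from the highest-precedence source that defines key."""
    for source in MANAGED_PRECEDENCE:
        settings = sources.get(source, {})
        if key in settings:
            return source, settings[key]
    return None


fleet = {
    "file": {"model": "claude-sonnet-4-6"},   # on-disk managed-settings.json
    "server": {"model": "claude-opus-4-7"},   # admin console push
}
print(resolve("model", fleet))  # ('server', 'claude-opus-4-7')
```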

    What This Actually Replaces

    For organizations without a mature endpoint management pipeline — which is most companies smaller than a couple hundred engineers — the old path looked like this: get IT to package a JSON file, push it through Jamf or Group Policy, verify on a pilot machine, then deploy fleet-wide. Two-week ticket minimum.

    Server-managed settings collapse that to: log into the admin console, write the policy in the UI, save. Claude Code clients fetch the new policy at startup and re-poll hourly during active sessions. No reboot. No reinstall. No ticket.

    This is a real change in posture. The friction that kept smaller teams from deploying any managed policy at all just dropped to near zero.

    The Approval Gate Most Teams Will Hit

    Server-managed settings have one behavior MDM-deployed settings do not: certain categories require explicit user approval before they apply on a given machine. The current list per the docs:

    • Shell command settings (custom commands surfaced to the model)
    • Custom environment variables (anything injected into the model’s process env)
    • Hook configurations (pre/post-tool-use hooks)

    These three need the user to click through an approval prompt the first time the new policy hits their client. Deny rules in permissions.deny, the audit log path, telemetry settings, default model — those apply silently.

    The reasoning here is sound: a malicious admin (or a compromised admin account) could otherwise inject a hook that exfiltrates every prompt or a shell command that pipes diffs to an external endpoint. Approval gating those three categories means a developer at least sees the change before it takes effect. It also means your “push the new hook policy fleet-wide” plan has a manual confirmation step you cannot skip.

    If you need silent enforcement of hooks or shell commands, MDM-deployed managed-settings.json still does that without the prompt. Use the right tool for the right setting.

    What Belongs on the Server, What Belongs in MDM

    After running both for two weeks across a small fleet, the split that has held up:

    Push from the admin console:

    • permissions.deny rules that should be hot-updatable when a new exfil vector is discovered
    • Default model pinning (when you want to change it without re-deploying)
    • Telemetry and audit log endpoints
    • Anything you want to A/B across user groups (more on this in a second)

    Keep in MDM managed-settings.json:

    • Hook configurations you need to enforce silently
    • Shell command allowlists that must apply before first launch
    • Anything that needs to survive the user being signed out of their org account

    The reason for the second list is that server-managed settings only apply once the user authenticates with org credentials. A fresh laptop with a developer running claude before signing in gets no server policy. MDM-deployed settings apply from the first invocation.
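
    The split above reduces to two questions per setting: does it have to hold before sign-in, and does it have to apply without a prompt? A rough sketch of that decision, where the Setting fields and the example entries are assumptions made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    name: str
    approval_gated: bool         # hooks, shell commands, custom env vars
    must_be_silent: bool         # an approval prompt is not acceptable
    needed_before_sign_in: bool  # must hold on first launch, before org auth


def choose_channel(s: Setting) -> str:
    """Return "mdm" or "console" using the split described above."""
    if s.needed_before_sign_in:
        return "mdm"      # server policy does not apply before org auth
    if s.approval_gated and s.must_be_silent:
        return "mdm"      # the console path always prompts for these
    return "console"      # hot-updatable without a redeploy


examples = [
    Setting("permissions.deny", False, True, False),
    Setting("audit hook", True, True, False),
    Setting("shell allowlist", True, False, True),
]
for s in examples:
    print(f"{s.name}: {choose_channel(s)}")
```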

    Group-Targeted Policies Are the Sleeper Feature

    Anthropic added user groups to the admin console earlier in 2026. Groups can be created manually or synced from an IdP via SCIM, and each group can be assigned a custom role plus its own spend limit. The piece most teams have not connected yet: server-managed settings respect group membership.

    This means you can push one permissions.deny policy to the “Security” group and a different one to the “Platform” group without writing two separate managed-settings.json files and pushing them through MDM with different scoping. Write two policies in the console, assign to groups, done. Group membership changes via SCIM propagate within the hour-long polling window.

    For a 200-engineer org that previously needed Jamf smart groups + MDM JSON variants to do the same thing, this is significant.

    Verification Workflow

    The same verification workflow from the MDM-deployed setup still applies, with one addition:

    1. Push the policy in the admin console
    2. On a test machine, run claude config list — server-managed settings should appear flagged as such
    3. Attempt a denied action, confirm immediate block
    4. If hooks or shell commands are in the policy, walk through the approval prompt
    5. Sign the test user out, sign back in, confirm policy reapplies

    The sign-out test matters because that is where server-managed differs most from on-disk managed settings — the policy is bound to the org-authenticated session, not the machine.

    Model Versions for Org-Wide Pinning

    If you pin a default model via server-managed settings, the current strings are: claude-opus-4-7 (flagship), claude-sonnet-4-6 (workhorse), and claude-haiku-4-5-20251001 (fast). Verify against the live model list at docs.anthropic.com/en/docs/about-claude/models before deploying — model strings change frequently and pinning to a deprecated one will silently break agent runs.

    Where Server-Managed Settings Lose

    Three real limitations:

    1. No silent hook/shell-command enforcement. User approval is mandatory for shell command settings, custom environment variables, and hook configurations.
    2. No effect before org auth. Pre-auth sessions ignore server policy entirely.
    3. No staged rollout. A console change reaches every session in its assigned groups within the hour. There is no built-in canary group, no staged rollout percentage, no “apply to 10% of fleet for 24 hours” toggle. If you push a bad deny rule, every active session in scope picks it up at the next poll.

    Mitigate the third one by maintaining a single non-production test group that you deploy to first, wait 90 minutes, then promote the policy to broader groups. It is a manual canary, but it is the canary you have.
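
    The manual canary is easy to get wrong under time pressure, so it can help to write the promotion rule down. A small sketch, assuming you record the push time yourself and track reported problems as a simple count (both are assumptions, the console does not provide them):

```python
from datetime import datetime, timedelta

SOAK = timedelta(minutes=90)  # the wait suggested above; longer than one hourly poll


def can_promote(pushed_at: datetime, now: datetime, open_issues: int) -> bool:
    """True once the test group has soaked for SOAK with nothing reported."""
    return now - pushed_at >= SOAK and open_issues == 0


pushed = datetime(2026, 5, 20, 9, 0)
print(can_promote(pushed, datetime(2026, 5, 20, 10, 15), 0))
print(can_promote(pushed, datetime(2026, 5, 20, 10, 30), 0))
print(can_promote(pushed, datetime(2026, 5, 20, 11, 0), 1))
```

    The 90-minute window is deliberately longer than a single poll interval, so every test-group session has had a chance to pick the policy up before you promote it.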

    The 20-Minute Rollout for a Team Plan Org on Claude Code v2.1.38+

    1. Open the admin console at claude.ai → Settings → Claude Code policies
    2. Write a minimum-viable policy: deny curl, wget, rm -rf /, .env reads, credential files
    3. Assign to a single test group (one user)
    4. On that user’s machine, run claude config list — confirm the server policy appears
    5. Try three denied actions, confirm all blocked
    6. Expand assignment to one team
    7. Wait 24 hours, watch for tickets
    8. Roll org-wide

    The whole sequence spans more than a day on the calendar because of the wait windows, not because of the work. The hands-on work is about twenty minutes.
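
    For step 2, a sketch of what a minimum-viable deny list might look like as a settings fragment. The rule strings follow the Tool(pattern) shape used by Claude Code permission rules, but the specific patterns below are illustrative assumptions, not a vetted policy. Check the pattern syntax against the permissions docs before pushing anything.

```python
import json
import re

# Rule strings use the Tool(pattern) shape of Claude Code permission rules.
# The specific patterns are illustrative assumptions, not a vetted policy.
DENY_RULES = [
    "Bash(curl:*)",
    "Bash(wget:*)",
    "Bash(rm -rf /:*)",
    "Read(./.env)",
    "Read(./.env.*)",
    "Read(~/.aws/credentials)",
    "Read(~/.ssh/**)",
]

RULE_SHAPE = re.compile(r"^[A-Za-z]+\(.+\)$")


def build_policy(rules):
    """Return a settings fragment, rejecting anything not shaped like Tool(pattern)."""
    malformed = [r for r in rules if not RULE_SHAPE.match(r)]
    if malformed:
        raise ValueError(f"malformed rules: {malformed}")
    return {"permissions": {"deny": list(rules)}}


print(json.dumps(build_policy(DENY_RULES), indent=2))
```

    The shape check only catches typos such as a missing parenthesis. Whether a rule actually blocks the action is what steps 4 and 5 are for.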

    Why This Article Exists

    The MDM-deployed managed-settings.json approach from last week is still the right answer for orgs that need silent, pre-auth policy enforcement. For everyone else, which is most teams adopting Claude Code in 2026, server-managed settings are the easier path, and most platform teams I talk to do not know they exist yet. Admin console push, no on-disk file, no MDM dependency, group-scoped via SCIM. If you are on a recent Team or Enterprise plan, this is the deployment posture you actually want.

    Sources

    • docs.anthropic.com/en/docs/about-claude/models (model version strings)
    • code.claude.com/docs/en/server-managed-settings (server-managed settings docs)
    • code.claude.com/docs/en/admin-setup (admin setup reference)
    • support.claude.com/en/articles/11845131-use-claude-code-with-your-team-or-enterprise-plan (Team/Enterprise Claude Code usage)
    • support.claude.com/en/articles/13799932-manage-groups-and-group-spend-limits-on-enterprise-plans (group management + spend limits)
    • support.claude.com/en/articles/13133195-set-up-jit-or-scim-provisioning (SCIM provisioning)
    • claude.com/product/claude-code/enterprise (Enterprise plan overview)
    • anthropic.com/news/claude-code-on-team-and-enterprise (admin controls launch)

  • What Actually Drives Claude Code Adoption: Inside a 30-Engineer Rollout That Held 35% at Month Four


    If you want to understand why some Claude Code rollouts compound and others quietly stall, stop looking at license telemetry and start looking at one artifact: the skill library. Every public 2026 case study with sustained productivity gains has the same shape — a committed skill kit, tight CLAUDE.md files, a handful of hooks, and a Friday retro cadence the team actually keeps. Teams that buy seats and skip the artifacts get install-only adoption and a dashboard that reads flat for a quarter.

    The 30-engineer case that landed at 35% productivity lift

    The cleanest recent case study comes from a Digital Applied write-up published May 15, 2026 — an anonymized composite tracking a Series-B SaaS shop with thirty engineers across six squads on a Node/TypeScript monorepo. The team had Claude Code seats for the better part of a year before the engagement started. Roughly half the engineers used the CLI weekly. Zero shared skills, no committed project settings, no hooks, two squads with no project memory at all.

    The day-zero audit on a 50-point scorecard came in at 19/50. Ninety days later it hit 41/50 — a 22-point shift from late Stage 1 to mid-Stage 3. The headline number reported to leadership: a sustained 35% productivity lift, engagement-weighted, that held flat into month four.

    The shipped artifacts behind that number:

    • 22 shared skills, with authorship spread across 9 engineers
    • 11 wired hooks across three archetypes (notification, audit, gate)
    • 3 custom subagents — code-reviewer, ticket-triager, release-notes-writer
    • CLAUDE.md files pruned and held under 400 lines per repo

    The most-invoked skill was commit, accounting for roughly a third of all invocations by month four. That kind of skew is normal in a mature library and tells you which workflow is actually being changed by the rollout.

    Why CLAUDE.md hygiene predicts depth

    The single most actionable lesson from the case study is mechanical: cap CLAUDE.md at 400 lines and enforce it in PR review. Two squads in the engagement drifted past 800 lines in sprint two. Their skill-invocation rate ran roughly 40% lower than the four squads that held the line.

    The hypothesized mechanism, validated in two follow-up retros: bloated memory causes the model to skim the file rather than internalize it, which produces more generic responses, which makes engineers reach for the tool less often, which drops invocation rates further. The cycle is self-reinforcing in either direction. When the team ran a month-four prune that cut the average CLAUDE.md from 520 to 340 lines, skill-invocation rate rose 12% across the team in the following two weeks.

    The discipline: long-form content moves to .claude/docs/ as sub-docs with one-line summaries and links in the main file. The main file stays orientation-shaped — who the team is, what the repo does, where to look for the rest.
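
    A cap only holds if something checks it. A minimal sketch of a line-count check that could run in CI or a pre-merge hook. The function names are made up for illustration, and the case study does not say how the team enforced the cap beyond PR review.

```python
from pathlib import Path

MAX_LINES = 400  # the cap from the case study


def oversized_memory_files(repo_root, max_lines=MAX_LINES):
    """Return (path, line_count) for each CLAUDE.md under repo_root over the cap."""
    offenders = []
    for path in sorted(Path(repo_root).rglob("CLAUDE.md")):
        count = len(path.read_text(encoding="utf-8").splitlines())
        if count > max_lines:
            offenders.append((path, count))
    return offenders


def check(repo_root=".") -> int:
    """Print offenders and return a CI-style exit code (0 = pass, 1 = fail)."""
    offenders = oversized_memory_files(repo_root)
    for path, count in offenders:
        print(f"{path}: {count} lines (cap {MAX_LINES})")
    return 1 if offenders else 0
```

    In a CI job, finishing the script with raise SystemExit(check()) fails the build when any file drifts over the cap.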

    The productivity panel mistake every team makes first

    Version one of this team’s productivity panel was wrong, and that wrongness taught the rollout more than any single milestone after it. The first panel tracked the metrics license telemetry already covered: total sessions opened per week, total tokens, average session length. It read flat for six weeks while the underlying capability of the team was visibly shifting in retros and PRs.

    Version two, rebuilt in week eight, weighted around engagement signals:

    • Skill invocations split by skill
    • Subagent runs per week
    • Time-to-first-meaningful-output for new contributors
    • Audit-score deltas from the quarterly 50-point scorecard
    • PR-to-merge time on Claude-Code-assisted PRs versus baseline

    By month four the panel showed roughly 410 skill invocations per week, 85 subagent runs per week, new-hire time-to-first-meaningful-output at -45% versus baseline, and PR-to-merge time -18% versus baseline. The 35% headline was an engagement-weighted composite of those signals, not a single measurement — and the team was careful never to frame it as “engineers ship 35% more code,” because that framing invites a debate the panel cannot win.
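
    Two of the version-two signals are simple enough to sketch: invocations split by skill, and a signed change versus baseline. The event format and the sample numbers below are hypothetical, not data from the case study, and the composite weighting is left out because the write-up does not publish it.

```python
from collections import Counter


def invocations_by_skill(events):
    """Tally skill invocations from (engineer, skill) records."""
    return Counter(skill for _engineer, skill in events)


def pct_change(current, baseline):
    """Signed percent change versus baseline; negative means a reduction."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) * 100.0 / baseline


# Hypothetical sample data, not figures from the case study.
events = [("ana", "commit"), ("raj", "commit"), ("ana", "review"), ("li", "commit")]
tally = invocations_by_skill(events)
print(tally.most_common())
print(f"PR-to-merge change: {pct_change(41.0, 50.0):.0f}%")
```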

    How this case lines up with the rest of the 2026 cohort

    The Digital Applied 30-dev case is not an outlier. A companion case study from the same firm, dated May 13, 2026, covers a 100-developer engineering organization that sustained a 28% productivity lift with a 32-entry skill library over six months. That team ran Claude Code and Cursor side-by-side: Claude Code as the terminal/CLI surface for refactors, multi-file edits, codebase navigation, and review automation; Cursor as the in-editor surface for line-level completion and inline review.

    The pattern that replicates across both engagements is the cadence, not the contents. Three sprints inside the first ninety days (install, leverage, governance), plus an explicit sustain phase that starts at day 90 with the same owner and the same Friday retro cadence as the active sprints. Treating days 91+ as a vague quarterly review is the most common reason adoption drifts back to install-only inside two quarters.

    What to actually do on Monday

    If you have Claude Code seats and want a rollout that compounds instead of stalls, the operational order matters more than the contents of your skill library:

    1. Run the day-zero audit and write down the score. The 50-point rubric Digital Applied published is a defensible starting point; any scorecard that distinguishes install from artifacts from governance will do. The number is what makes the case for the engagement internally.
    2. Name the rollout lead and carve out 20-30% of their week. Less than that and the calendar slips. The role needs enough seniority to enforce milestone discipline, enough engineering depth to write skills and hooks rather than just steward them, and enough calendar discipline to keep the cadence intact when product pushes back.
    3. Calendar the four phase-end retros and the month-four review before sprint one opens. Friday retros are thirty minutes per squad per week, the cheapest part of the rollout and the most often skipped. The friction they catch in week three compounds silently for the rest of the sprint if you skip them.
    4. Build the productivity panel deliberately badly in sprint two and rebuild it in sprint three. The version-two rebuild is structural, not incremental. Trying to ship the right panel on the first try usually delays the cadence rather than improving the signals.
    5. Cap CLAUDE.md at 400 lines and enforce it in PR. This is the single highest-ROI hygiene rule in the engagement and the one teams skip most often because completeness feels safer than discipline.

    The honest framing: a single-quarter Claude Code rollout takes you from Stage 1 to mid-Stage 3 on a defensible scorecard. Stage 4 — the optimized end-state with deeper subagent governance, a security cadence that catches drift, and a productivity panel that has been iterated against a full quarter of data — is a second-quarter project. The teams that get there are the ones whose sustain phase looks identical to the sprints that preceded it. The teams that drift are the ones whose Friday retro disappeared sometime around month two.

    Model versions referenced throughout this piece reflect Anthropic’s current lineup as of May 2026: claude-opus-4-7 (flagship), claude-sonnet-4-6 (workhorse), and claude-haiku-4-5-20251001 (fast). If you are reading this six weeks from now, check the model docs before you copy any string into a config.

  • Installing Claude Code on Windows in 2026: The Native Installer Walkthrough That Actually Works


    If you have spent any time in the Claude Code subreddit or the GitHub issues tracker in the last six months, you have seen the same Windows install problem cycle through every week. Someone runs the install command, the installer prints “successfully installed,” and then claude --version returns “is not recognized as the name of a cmdlet.” Then come the suggestions: switch to Git Bash, switch to WSL2, reinstall Node, blow away npm. Half of them are wrong for the current installer. This guide is the one I wish existed when I set up Claude Code on a fresh Windows 11 machine this month.

    What changed in 2026: the native installer is now the default

    Anthropic shipped a native installer in 2025 that removed the Node.js dependency entirely. As of May 2026 it is the recommended path on every platform. Installing @anthropic-ai/claude-code through npm is still supported, but it is no longer the primary method Anthropic tests and updates. The native installer downloads a single binary, drops it in ~/.local/bin, registers it on your PATH, and auto-updates in the background.

    What this means in practice on Windows: you do not need Node, you do not need npm, and you do not need WSL2 unless you specifically want a Linux toolchain. PowerShell on Windows 10 or 11 (64-bit) is enough.

    The two commands that actually work

    Open Windows PowerShell — not the x86 version, not Git Bash, not Command Prompt. The x86 entry runs as a 32-bit process and will fail on a 64-bit machine. Git Bash does not support the TTY features Claude Code’s interactive CLI needs, so you will hit the “Raw mode is not supported” error before you finish authenticating.

    Then run:

    irm https://claude.ai/install.ps1 | iex

    That is the entire install. irm is Invoke-RestMethod, iex is Invoke-Expression, and the script handles the binary download, PATH update, and shell hooks. When it finishes, close the terminal and open a new PowerShell window. This is the step everyone skips. The PATH change applies to new shells only — your current session still has the old PATH and will not find the binary.

    In the new window:

    claude --version

    You should see a version string. Then run claude with no arguments from any project directory. The CLI opens your default browser, asks you to sign in to your Anthropic account, and authorizes the local install. Setup, end to end, is under five minutes on a clean machine.

    You need a paid account — the free tier does not include Claude Code

    This catches new users every week. The free Claude.ai plan gets you chat on web, iOS, Android, and desktop. It does not get you Claude Code. To use the terminal CLI you need one of:

    • A Pro subscription at $20 per month (or $17 per month billed annually)
    • A Max 5x subscription at $100 per month
    • A Max 20x subscription at $200 per month
    • A Team Premium seat at $100 per seat per month billed annually or $125 billed monthly, minimum five seats
    • API credits: new API accounts get a small free credit pool to test with, but you are billed per token from there

    Pro and Max draw from the same token budget as your regular Claude chat usage. The Pro window is roughly 44,000 tokens per five-hour rolling window, which third-party tracking puts at 10 to 40 prompts depending on codebase complexity. Max 5x and 20x scale that linearly. If you are evaluating whether to upgrade, the Pro window will tell you within a week — you either hit the cap during real work or you do not.
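
    The linear-scaling arithmetic, worked through as a quick sketch. It takes the roughly 44,000-token Pro figure above at face value and assumes the Max multipliers apply exactly, which real quotas may not honor.

```python
PRO_WINDOW_TOKENS = 44_000  # approximate figure quoted above
MULTIPLIER = {"Pro": 1, "Max 5x": 5, "Max 20x": 20}


def window_tokens(plan: str) -> int:
    """Tokens per five-hour window, assuming strictly linear scaling from Pro."""
    return PRO_WINDOW_TOKENS * MULTIPLIER[plan]


def tokens_per_prompt(plan: str, prompts_per_window: int) -> float:
    """Average token budget per prompt at a given prompt count."""
    return window_tokens(plan) / prompts_per_window


for plan in MULTIPLIER:
    print(f"{plan}: {window_tokens(plan):,} tokens per window")
print(f"Pro, 10 to 40 prompts: {tokens_per_prompt('Pro', 40):,.0f} to "
      f"{tokens_per_prompt('Pro', 10):,.0f} tokens per prompt")
```

    On those assumptions Max 5x works out to 220,000 tokens per window and Max 20x to 880,000, and the 10-to-40-prompt range implies an average of roughly 1,100 to 4,400 tokens per prompt on Pro.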

    The five errors you will hit, and what fixes them

    “claude is not recognized as the name of a cmdlet.” Your PATH was not updated, or you did not open a new terminal. First, close PowerShell and reopen. If the error persists, the install location exists but your user PATH does not reference it. Run this in PowerShell:

    # Append the installer's bin directory to the user-level PATH (persists for new sessions)
    $currentPath = [Environment]::GetEnvironmentVariable('PATH', 'User')
    [Environment]::SetEnvironmentVariable('PATH', "$currentPath;$env:USERPROFILE\.local\bin", 'User')

    Close the terminal again, open a new one, and claude --version should work.
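
    The PowerShell fix appends unconditionally, so running it twice leaves a duplicate PATH entry. If you want to check first, here is a small sketch of the comparison. It is written against a PATH string rather than the live environment, and the example user name is made up.

```python
import ntpath  # Windows path rules, importable on any OS


def path_has_dir(path_value: str, directory: str) -> bool:
    """True if directory already appears in a ';'-separated Windows PATH string."""
    def norm(p: str) -> str:
        # Ignore surrounding quotes, letter case, and trailing separators.
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))

    target = norm(directory)
    return any(norm(entry) == target for entry in path_value.split(";") if entry.strip())


user_path = "C:\\Windows\\System32;C:\\Users\\Dev\\.local\\bin\\"
print(path_has_dir(user_path, "c:\\users\\dev\\.local\\bin"))
print(path_has_dir("C:\\Windows\\System32", "c:\\users\\dev\\.local\\bin"))
```

    The comparison ignores case and trailing backslashes because Windows does too. A plain substring check would miss both.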

    “Raw mode is not supported.” You are running Claude Code inside Git Bash. Git Bash does not provide the TTY interface the CLI needs. Switch to Windows PowerShell. Everything you would do in Git Bash you can do in PowerShell; you just need to use Windows path syntax inside the prompt.

    Microsoft Store popup interrupts installation. A popup saying “Get an app to open this ‘claude’ link” sometimes appears during the install on Windows 11. This is a known issue tracked in Anthropic’s GitHub. Dismiss the popup, then re-run the install command. If it persists, install Git for Windows first — the installer registers a couple of URL handlers that resolve the popup.

    Duplicate npm and native installs. If you previously installed via npm and later ran the native installer, you have two binaries on PATH. The native one wins on some shells and the npm one wins on others, which produces confusing version mismatches. Remove the npm install:

    npm uninstall -g @anthropic-ai/claude-code

    Then verify with where.exe claude in PowerShell. Only one path should come back.
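
    If you would rather script that check, the sketch below does roughly what where.exe does: walk PATH in order and list every match. The extension list is an approximation of Windows PATHEXT chosen for illustration, not what the shell actually consults.

```python
import os
from pathlib import Path

# Extensions to try; a rough stand-in for Windows PATHEXT plus the bare name.
CANDIDATE_EXTS = ("", ".exe", ".cmd", ".bat", ".ps1")


def find_all_on_path(name: str, path_value: str):
    """List every file called name (with a candidate extension) on path_value."""
    hits = []
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue
        for ext in CANDIDATE_EXTS:
            candidate = Path(entry) / f"{name}{ext}"
            if candidate.is_file() and candidate not in hits:
                hits.append(candidate)
    return hits


matches = find_all_on_path("claude", os.environ.get("PATH", ""))
for m in matches:
    print(m)
if len(matches) > 1:
    print("More than one claude on PATH; remove the one you do not want.")
```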

    “Invalid code” during OAuth. The browser-based login generates a one-time code that you paste back into the terminal. The code expires fast and is sensitive to copy-paste truncation. Press Enter to retry, complete the browser flow, and paste the code immediately — do not let it sit in your clipboard while you check email.

    What to do in the first session

    Once claude --version returns and the OAuth flow completes, run claude from inside a real project directory, not a fresh empty folder. Claude Code reads context from the surrounding repo, and the first thing it does in a useful session is explore the files and pick up the project’s CLAUDE.md if one exists. If you start in an empty directory the first interaction feels useless because there is nothing to ground the model on.

    If you want to lock to a specific model rather than the default, the current strings as of May 2026 are claude-opus-4-7 for the flagship, claude-sonnet-4-6 for the workhorse, and claude-haiku-4-5-20251001 for the fast tier. Sonnet 4.6 is what you want for almost all coding work — it is 30 to 50 percent faster than Sonnet 4.5 and ships with a 1M context window. Reserve Opus 4.7 for the hardest agentic refactors; it eats tokens noticeably faster.

    The setup is not the hard part

    Most of the Windows pain in the Claude Code ecosystem comes from people following install guides written for the npm-era CLI, then layering troubleshooting from the WSL2-era guides on top of that, then asking why nothing works. The current path is one PowerShell command, a new terminal, and a browser login. If you hit one of the five errors above, the fix is short. If you hit something else, the troubleshooting docs at code.claude.com cover it — most novel issues turn out to be PATH or shell-choice problems in a slightly different costume.

    The next thing to figure out is not installation. It is whether your Pro window survives a real week of work, and whether your team needs Premium seats. That math is what determines the actual cost of Claude Code on Windows — not whether the binary runs.

  • When I Stopped Being the Bottleneck


    For a long time, everything ran through me.

    Every decision, every deliverable, every edge case that didn’t fit the template. I was the person who knew where everything was and why it worked the way it did. Clients called me. Problems waited for me. The operation was fast when I was available and stuck when I wasn’t.

    I told myself this was just what running a lean operation looked like. That being indispensable was the same thing as being valuable. That the bottleneck was evidence of how much I mattered.

    It took me longer than I’d like to admit to understand that those aren’t the same thing at all.


    The shift didn’t happen because I hired more people or built a more sophisticated system. It happened because I started writing things down differently.

    Not the what — I’d always documented the what. What the process was. What the deliverable looked like. What the client expected.

    The change was writing down the why.

    Why is this built this way. Why did I make this trade-off. Why does this rule exist and what would have to be true for it to change. The reasoning that lives in my head during a decision but never makes it into the documentation because by the time the decision is made, the reasoning feels obvious and I’ve already moved on.

    That reasoning — the why, the context, the judgment — is exactly what’s missing when someone else tries to run something you built. They can follow the steps. They can’t follow the thinking. And the thinking is most of what they actually need.


    I had a client engagement once where the real work wasn’t the content or the SEO or any of the visible deliverables. The real work was extraction — pulling out everything the founder knew about his industry and making it queryable.

    He had thirty years of pattern recognition in his head. He knew, from a thirty-second conversation, whether a prospective client was going to be a nightmare. He knew which product lines had margin left to squeeze and which ones were already at ceiling. He knew the right answer to questions his team asked him forty times a week.

    But none of it was written down. It lived in him, and because it lived in him, every decision that touched that knowledge had to touch him first. He was the bottleneck in his own business, not because he was bad at delegating, but because there was nothing to delegate to. The judgment wasn’t portable.

    We spent three months making it portable.


    I’ve been doing the same thing for myself.

    The Notion workspace I run on isn’t just a project management tool. It’s an attempt to externalize the reasoning that would otherwise die with the session — the doctrine pages that explain why the operation is structured the way it is, the decision logs that capture what I considered before choosing, the second brain that holds the context I’d otherwise have to rebuild from scratch every time.

    It’s slow work. It runs against the instinct to just move. Documentation always feels like it’s competing with execution, and execution is what pays the bills today.

    But the compound effect is real and I’ve felt it. Questions I would have had to think through from scratch six months ago have written answers now. New automations start from an existing base of explained decisions rather than a blank page. When something breaks, the fix is findable because the original thinking is findable.

    More than that: I’ve noticed that the act of writing down why I’m doing something makes me smarter about whether I should be doing it. A decision you can’t explain clearly enough to document is often a decision you haven’t thought through clearly enough to make well.


    The version of me from three years ago would be confused by how I work now.

    Then, I was the point of contact for everything. Clients called when there was a problem. I held the answers in my head and dispensed them on demand. The business ran because I ran it, continuously, in real time.

    Now, most of what the operation does, it does without me. Workers run on schedules I set. Content moves through pipelines I designed. Decisions I’ve already made a hundred times get made automatically against rubrics I wrote once.

    I show up for the things that genuinely need me — strategy, relationships, the judgment calls that don’t fit any pattern I’ve encountered before. Everything else runs.

    The thing I had to let go of to get here was the idea that being needed for everything was the same as being important. It isn’t. Being needed for everything is exhausting and fragile and it doesn’t scale. Being needed for the right things — the hard things, the high-leverage things, the things only you can actually do — that’s something different.


    I don’t think of myself as having solved this. The work of making a one-person operation less dependent on one person is ongoing and probably never finished.

    But there’s a version of it that’s better than the version where everything runs through you and breaks when you’re not there.

    The path to that version isn’t more people or fancier tools. It’s the slow, unglamorous work of writing down why. Making the thinking portable. Building a system that holds the reasoning, not just the steps.

    The bottleneck doesn’t go away. It just stops being you.

  • The Trust Gap


    Here’s the moment I’m talking about.

    The agent finishes. The output is sitting there. It looks right — it usually looks right — and now you have to decide whether you’re going to use it or check it first.

    That moment, that pause, is the trust gap. And if you’re running AI at any real volume, it’s the thing that’s quietly eating your time, your confidence, and sometimes your credibility.


    Most people handle it badly. I did too, for a while.

    The two failure modes are mirror images of each other. The first is reviewing everything — reading every output, checking every claim, treating the agent like an intern you don’t trust yet. This works. It catches errors. It also means the agent isn’t actually saving you time. You’ve moved the work from doing to checking, which is a trade-off that only makes sense at low volume or when the stakes are very high.

    The second failure mode is trusting everything — shipping what the agent produces without a meaningful review layer, because you’re busy and it usually looks right and you can fix things later. This also works, until it doesn’t. Bad output compounds quietly. A wrong fact in an article becomes a wrong fact that got cited. A misformatted record becomes a database full of exceptions you have to clean manually. By the time you notice, the problem is bigger than the original task.

    The thing both failure modes have in common is that they’re reactions to the trust gap rather than designs for closing it.


    The design question is different from the reaction question.

    The reaction question is: how much should I check this particular output right now?

    The design question is: what is the system that makes agent output trustworthy enough that I can scale it?

    I spent a long time asking the wrong question.


    What changed for me was thinking about trust as something that gets earned over time, not assessed in the moment.

    The system I ended up with has a name, the Promotion Ledger, and it tracks every autonomous behavior by tier. Tier A behaviors are things I always approve before they ship. Tier B behaviors are things the agent prepares but I decide on. Tier C behaviors run on their own without me touching them.

    Nothing starts at Tier C. Everything earns its way there through seven consecutive clean days — seven days where the behavior ran, I sampled the output, and found no gate failures. If something fails a gate, it drops a tier and the clock resets.

    The clock is the key part. Trust isn’t a feeling I have about an agent in a given moment. It’s a count of consecutive clean runs. When I look at the Ledger and see that a behavior has been running cleanly for 23 days, I don’t need to review that output today. The track record is the review.
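
    A minimal sketch of the promotion rule as described, not the actual Ledger. It assumes new behaviors start at Tier A, that each step up needs its own seven clean days, and that a failure at Tier A simply resets the clock. The class, field names, and example behavior are made up for illustration.

```python
from dataclasses import dataclass

TIERS = ("A", "B", "C")   # A: approved by hand ... C: runs unattended
CLEAN_DAYS_TO_PROMOTE = 7


@dataclass
class Behavior:
    name: str
    tier: str = "A"
    clean_streak: int = 0    # consecutive clean days since the last gate failure
    days_in_tier: int = 0    # clean days counted toward the next promotion

    def record_day(self, gate_failed: bool) -> None:
        idx = TIERS.index(self.tier)
        if gate_failed:
            # Drop one tier (never below A) and reset the clock.
            self.tier = TIERS[max(idx - 1, 0)]
            self.clean_streak = 0
            self.days_in_tier = 0
            return
        self.clean_streak += 1
        self.days_in_tier += 1
        if self.days_in_tier >= CLEAN_DAYS_TO_PROMOTE and idx < len(TIERS) - 1:
            self.tier = TIERS[idx + 1]
            self.days_in_tier = 0


b = Behavior("publish-weekly-digest")
for _ in range(14):
    b.record_day(gate_failed=False)
print(b.tier, b.clean_streak)
b.record_day(gate_failed=True)
print(b.tier, b.clean_streak)
```

    Keeping the streak separate from the promotion counter is what lets the Ledger show a number like 23 clean days for a behavior that reached its tier long ago.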


    There are three things that made this work where other approaches didn’t.

    The first is that sampled review is different from universal review. I don’t read every output. I read a percentage of outputs, randomly selected, with a defined rubric for what “good” looks like. If the sample is clean, the population is trusted. If failures cluster around a pattern, I fix the prompt and restart the clock. This scales in a way that reading everything doesn’t.

    The second is source attribution. Every agent output that contains a factual claim has to show where the claim came from. Not because I’m going to verify every citation — I’m not. But because the presence of a citation converts “is this right?” from a research task into a spot check. A trust gap you can close in five seconds is functionally not a gap.

    The third is the rubric. I have a written definition of what “good enough” looks like for each type of output — what voice match means, what coherence means, what the acceptable error rate is. Without the rubric, every review is a fresh judgment call. With it, review is comparison. Comparison is faster, more consistent, and easier to delegate.
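
    Sampled review is the easiest of the three to mechanize. A rough sketch, assuming each output has some identifier you can list: the 20% default mirrors the starting rate suggested at the end of this piece, and the IDs are hypothetical.

```python
import random


def sample_for_review(output_ids, rate=0.20, seed=None):
    """Pick a random subset of outputs to read against the rubric."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    ids = list(output_ids)
    if not ids:
        return []
    k = max(1, round(len(ids) * rate))   # always read at least one
    return random.Random(seed).sample(ids, k)


todays_outputs = [f"post-{n:03d}" for n in range(50)]   # hypothetical IDs
picked = sample_for_review(todays_outputs, seed=7)
print(len(picked), "of", len(todays_outputs), "selected for review")
```

    Passing a seed makes a given day's sample reproducible, which helps if you later need to show which outputs were actually reviewed.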


    The thing I kept getting wrong before I had this system was trying to close the trust gap with better prompts.

    More detailed instructions. More explicit warnings. Be careful. Double-check your facts. Don’t make up numbers.

    This doesn’t work. The agent already believes it’s being careful. Adding adjectives to a prompt doesn’t change behavior — it changes the agent’s self-description of its behavior, which is not the same thing. The agent that was going to hallucinate a statistic will still hallucinate it, but now it’ll do so with more confidence because you told it to be careful and it thinks it was.

    Structural changes work. Rubrics, sampling rates, attribution requirements, tiered trust with observable clean-day counts. These change what the system produces, not just how it describes what it’s producing.


    I want to be clear that this took a while to build and I’m still refining it.

    There are behaviors on my Ledger that have been running at Tier C for months without a gate failure. There are others that keep dropping back to Tier B because they’re inconsistent in ways I haven’t fully diagnosed yet. The system doesn’t make trust automatic — it makes trust measurable.

    That’s the shift. Not “I trust this agent” as a feeling, but “this behavior has 31 clean days and a gate failure rate of zero” as a fact. You can act on a fact in a way you can’t always act on a feeling.

    The trust gap doesn’t close all at once. It closes by accumulation — one clean run at a time, tracked, until the track record speaks for itself.


    If you’re running agents at any volume and you feel like you’re either checking too much or not checking enough, you’re in the gap. The way out isn’t a better prompt. It’s a system that makes trustworthiness visible over time.

    Start with one agent. Define what “good” looks like. Sample 20% of its output for four weeks. Log what you find.

    By week four you’ll know whether you have a trust problem, a prompt problem, or a rubric problem. Those have different fixes. But you can’t see which one you have until you start measuring.

  • The Bus Factor Problem


    There’s a question I’ve been avoiding for about two years.

    What happens to all of this if something happens to me?

    Not in a morbid way. Just practically. I run 27 client sites. I have an AI stack with dozens of moving parts — Cloud Run services, scheduled jobs, Notion databases, Workers that fire on their own while I sleep. I’ve built systems that work exactly the way I want them to work, in exactly the ways I understand, documented in exactly the language that makes sense to me.

    The bus factor for this entire operation is one. It’s me. If I’m not here, none of it survives in any meaningful way.

    I’ve been sitting with that long enough that I think it’s time to say it out loud.


    The bus factor is an old software engineering concept. It asks: how many people would need to get hit by a bus before this project fails? One is the worst possible number. It means everything lives in a single person’s head — their habits, their passwords, their way of naming things, their unwritten rules about how the system works.
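
    As a rough illustration, the number can be computed from a map of who understands what. This Python sketch assumes the project fails as soon as any single component has nobody left who understands it; under that assumption the bus factor is the size of the smallest group covering any one component. The component and person names are hypothetical.

```python
def bus_factor(knowledge: dict[str, set[str]]) -> int:
    """Fewest people whose absence leaves some component with no one who knows it."""
    if not knowledge:
        raise ValueError("need at least one component")
    return min(len(people) for people in knowledge.values())


def single_points_of_failure(knowledge: dict[str, set[str]]) -> list[str]:
    """Components understood by one person or by nobody."""
    return sorted(name for name, people in knowledge.items() if len(people) <= 1)


# Hypothetical inventory of who can operate each part of the stack.
knowledge = {
    "cloud-run-services": {"operator"},
    "scheduled-jobs": {"operator"},
    "client-billing": {"operator", "partner"},
}

print(bus_factor(knowledge))                # 1
print(single_points_of_failure(knowledge))  # ['cloud-run-services', 'scheduled-jobs']
```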

    Most solo operators have a bus factor of one. They know this and they don’t talk about it because it sounds like a personal failing. Like you should have hired more people, or documented better, or not let yourself become the single point of failure for something people depend on.

    But I think the honest version is more complicated than that. A lot of what makes a solo operation valuable is exactly the thing that makes it fragile: it’s shaped entirely around one person’s judgment. The system works because I know when to break the rules I wrote. I know what the edge cases are before they happen. I know which automations to trust and which ones to watch. That’s not something you write in a runbook.

    So the question isn’t just “how do I document this better.” It’s “how do I make the judgment portable without turning it into something that loses the judgment in the process.”


    I’ve been building toward an answer, in pieces, over the last several months.

    The first piece was Notion as the control plane. Everything that matters about how this operation runs lives in Notion — specs, work orders, site credentials, content pipelines, system standards, the doctrine documents that explain why things are built the way they are. If I disappeared tomorrow, someone with the right access could open that workspace and read their way into understanding the shape of the operation, even if they couldn’t run it yet.

    The second piece was the two-plane architecture — Notion for thinking and storage, GCP for compute. Every Cloud Run service, every scheduled job, every Worker is defined somewhere in Notion before it runs somewhere on GCP. The compute is durable. The logic is documented. Those are two different things, and keeping them separate means neither one is a black box.
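
    One way to check that the two planes still agree is a plain set comparison. The Python sketch below assumes you can get two lists of names by whatever means you have: one of documented specs, one of services actually running. It calls no Notion or GCP API, and every name in it is hypothetical.

```python
def audit_planes(documented: set[str], deployed: set[str]) -> dict[str, list[str]]:
    """Report where the documentation plane and the compute plane disagree."""
    return {
        # Running in compute with no spec behind it: a black box.
        "running_without_spec": sorted(deployed - documented),
        # Specified but not running: either planned or quietly dead.
        "spec_without_service": sorted(documented - deployed),
    }


# Hypothetical names, gathered however you export them.
documented = {"content-pipeline", "nightly-audit", "site-health-check"}
deployed = {"content-pipeline", "nightly-audit", "image-resizer"}

report = audit_planes(documented, deployed)
print(report["running_without_spec"])  # ['image-resizer']
print(report["spec_without_service"])  # ['site-health-check']
```

    An empty report on both sides is the condition the architecture is aiming for: nothing runs that isn’t written down.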

    The third piece is the hardest, and the one I’ve made the least progress on: making the judgment readable.

    I write doctrine pages. Long ones, sometimes. They explain not just what the system does but why it works that way — what the original problem was, what I tried that didn’t work, what the rule is now and what would have to be true for the rule to change. I write them mostly for myself, because I forget things. But they’re also written for the hypothetical person who has to pick this up without me.

    That hypothetical person might be a future employee. It might be a contractor. It might be an AI agent working from a context window that needs to understand the operation well enough to continue it.

    It might be my partner, trying to figure out what to do with the business side of things if I’m not around.

    That’s the version that focuses the mind.


    I don’t have this solved. I want to be clear about that.

    What I have is a direction. The direction is: every decision should live somewhere outside my head. Every system should be explainable by someone who didn’t build it. Every credential should be in the registry, every automation should have a spec, every rule should have a reason written next to it.

    It’s slow work. It runs against the instinct to just build the thing and move on. There’s always something more urgent than documentation, and “I’ll remember how this works” is almost always true right up until it isn’t.

    But I’ve started treating the documentation as part of the product. Not the boring part — the part that makes the product real. A system that only works because I’m here isn’t really a system. It’s a performance.

    The goal is to build something that could survive me. Not because I’m planning to leave, but because the work of making it survivable is also the work of making it understandable, and a system I can’t fully explain is a system I don’t fully own.


    If you’re running something like this — solo or nearly so, more complexity than your headcount would suggest — I’d ask you the same question I’ve been sitting with.

    If something happened to you tomorrow, what would survive?

    Not what you hope would survive. What actually would.

    That gap is the work.