The Bus Factor Problem

About Will

I run a multi-site content operation on Claude and Notion with autonomous agents — and I write about what we do, including what breaks.

Connect on LinkedIn →

There’s a question I’ve been avoiding for about two years.

What happens to all of this if something happens to me?

Not in a morbid way. Just practically. I run 27 client sites. I have an AI stack with dozens of moving parts — Cloud Run services, scheduled jobs, Notion databases, Workers that fire on their own while I sleep. I’ve built systems that work exactly the way I want them to work, in exactly the ways I understand, documented in exactly the language that makes sense to me.

The bus factor for this entire operation is one. It’s me. If I’m not here, none of it survives in any meaningful way.

I’ve been sitting with that long enough that I think it’s time to say it out loud.


The bus factor is an old software engineering concept. It asks: how many people would need to get hit by a bus before this project fails? One is the worst possible number. It means everything lives in a single person’s head — their habits, their passwords, their way of naming things, their unwritten rules about how the system works.

Most solo operators are a bus factor of one. They know this and they don’t talk about it because it sounds like a personal failing. Like you should have hired more people, or documented better, or not let yourself become the single point of failure for something people depend on.

But I think the honest version is more complicated than that. A lot of what makes a solo operation valuable is exactly the thing that makes it fragile: it’s shaped entirely around one person’s judgment. The reason the system works is because I know when to break the rules I wrote. I know what the edge cases are before they happen. I know which automations to trust and which ones to watch. That’s not something you write in a runbook.

So the question isn’t just “how do I document this better.” It’s “how do I make the judgment portable without turning it into something that loses the judgment in the process.”


I’ve been building toward an answer, in pieces, over the last several months.

The first piece was Notion as the control plane. Everything that matters about how this operation runs lives in Notion — specs, work orders, site credentials, content pipelines, system standards, the doctrine documents that explain why things are built the way they are. If I disappeared tomorrow, someone with the right access could open that workspace and read their way into understanding the shape of the operation, even if they couldn’t run it yet.

The second piece was the two-plane architecture — Notion for thinking and storage, GCP for compute. Every Cloud Run service, every scheduled job, every Worker is defined somewhere in Notion before it runs somewhere on GCP. The compute is durable. The logic is documented. Those are two different things, and keeping them separate means neither one is a black box.

The third piece is the hardest and I’m the least done with it: making the judgment readable.

I write doctrine pages. Long ones, sometimes. They explain not just what the system does but why it works that way — what the original problem was, what I tried that didn’t work, what the rule is now and what would have to be true for the rule to change. I write them mostly for myself, because I forget things. But they’re also written for the hypothetical person who has to pick this up without me.

That hypothetical person might be a future employee. It might be a contractor. It might be an AI agent working from a context window that needs to understand the operation well enough to continue it.

It might be my partner, trying to figure out what to do with the business side of things if I’m not around.

That’s the version that focuses the mind.


I don’t have this solved. I want to be clear about that.

What I have is a direction. The direction is: every decision should live somewhere outside my head. Every system should be explainable by someone who didn’t build it. Every credential should be in the registry, every automation should have a spec, every rule should have a reason written next to it.

It’s slow work. It runs against the instinct to just build the thing and move on. There’s always something more urgent than documentation, and “I’ll remember how this works” is almost always true right up until it isn’t.

But I’ve started treating the documentation as part of the product. Not the boring part — the part that makes the product real. A system that only works because I’m here isn’t really a system. It’s a performance.

The goal is to build something that could survive me. Not because I’m planning to leave, but because the work of making it survivable is also the work of making it understandable, and a system I can’t fully explain is a system I don’t fully own.


If you’re running something like this — solo or nearly so, more complexity than your headcount would suggest — I’d ask you the same question I’ve been sitting with.

If something happened to you tomorrow, what would survive?

Not what you hope would survive. What actually would.

That gap is the work.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *