Books for Bots: What a Knowledge Concentrate Actually Is and How It’s Built

Tygart Media Strategy
Volume Ⅰ · Issue 04 · Quarterly Position
By Will Tygart
Long-form · Practitioner-grade

A transcript is not a knowledge artifact. Neither is a summary. Both are containers for words. Neither is optimized for the thing that needs to consume them.

When you capture an expert’s knowledge and then feed the transcript to an AI system, the AI gets the words. It does not get the structure. It does not know which claims are firsthand vs. secondhand. It cannot distinguish a confident assertion from a hedged one. It has no way to chain the decision logic — the “when X, do Y because Z” sequences that constitute the operational core of what the expert knows. It just has a long document full of things that may or may not be true, with no metadata to tell it which is which.

This is why most knowledge capture projects fail to deliver on their promise. The content is there. The structure that makes it usable isn’t.

A knowledge concentrate is the alternative. It is the distilled, structured artifact produced by the Human Distillery extraction protocol — smaller than a transcript, denser than any summary, and specifically formatted for the AI systems that will consume it.

The Five Components of a Knowledge Concentrate

1. The Entity Graph

Every named concept, process, role, piece of equipment, regulation, and decision point that surfaces in extraction gets represented as a node. The edges between nodes are typed: causal, conditional, hierarchical, associative. The graph is not a list — it’s a map of relationships, and the relationships are the knowledge.

An AI system with a list of entities knows vocabulary. An AI system with an entity graph knows how the domain works — how a change in one thing propagates to another, which concepts are upstream of which decisions, which relationships are conditional and which are structural.

For a water damage restoration operation: the graph connects moisture readings to drying equipment selection to drying time estimates to invoice amounts to adjuster response patterns. None of those connections are in the documentation. All of them are in the head of a senior project manager who has run 400 jobs.
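That restoration example can be sketched as a typed graph. This is a minimal illustration, not the protocol's actual schema: the `Edge`/`EntityGraph` names and the four edge types are taken from the description above, and the restoration nodes are the ones named in the paragraph.

```python
from dataclasses import dataclass, field

# The four edge types named in the text.
EDGE_TYPES = {"causal", "conditional", "hierarchical", "associative"}

@dataclass
class Edge:
    source: str
    target: str
    kind: str  # must be one of EDGE_TYPES

    def __post_init__(self):
        if self.kind not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {self.kind}")

@dataclass
class EntityGraph:
    edges: list = field(default_factory=list)

    def connect(self, source, target, kind):
        self.edges.append(Edge(source, target, kind))

    def downstream(self, node):
        # Entities directly affected by `node`: follow outgoing edges.
        # This is what a list of entities cannot answer but a graph can.
        return [e.target for e in self.edges if e.source == node]

# The water damage restoration chain from the text:
g = EntityGraph()
g.connect("moisture reading", "equipment selection", "causal")
g.connect("equipment selection", "drying time estimate", "causal")
g.connect("drying time estimate", "invoice amount", "causal")
g.connect("invoice amount", "adjuster response", "conditional")

print(g.downstream("moisture reading"))  # ['equipment selection']
```

The point of the structure is the `downstream` query: given a change in one node, the graph answers what it propagates to, which is exactly the "how the domain works" knowledge a flat entity list cannot represent.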

2. Decision Logic

The most directly usable component of the concentrate. Every when-then-because statement extracted from the session, structured as:

  • Condition: When this situation is present
  • Action: This is what we do
  • Because: This is why (the reasoning, not just the rule)
  • Exceptions: The cases where this breaks down
  • Confidence score: 0.0–1.0, based on how many independent sources confirmed it

The “because” is what makes this different from a policy. A policy says do Y. A knowledge concentrate says do Y because Z, which means an AI system can recognize when Z is absent and adjust accordingly — rather than applying the rule in cases where the underlying condition that made the rule sensible doesn’t apply.

The exceptions are equally important. Expert judgment is largely the accumulation of exceptions — the cases where the standard answer is wrong. Capturing those is the whole point of Layer 2 extraction.
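The five fields above map directly onto a record. A minimal Python sketch, assuming nothing beyond the bullet list; the sample rule's restoration content is hypothetical, invented for illustration, not real guidance:

```python
from dataclasses import dataclass, field

@dataclass
class DecisionRule:
    condition: str                    # When this situation is present
    action: str                       # This is what we do
    because: str                      # the reasoning, not just the rule
    exceptions: list = field(default_factory=list)
    confidence: float = 0.0           # 0.0-1.0, from independent-source count

    def applies(self, condition_present: bool, reason_holds: bool) -> bool:
        # Because the "because" is explicit, a consumer can decline to
        # apply the rule when the underlying reason Z is absent,
        # instead of firing on the condition alone like a bare policy.
        return condition_present and reason_holds

# Hypothetical rule, for illustration only:
rule = DecisionRule(
    condition="elevated moisture reading inside a wall cavity",
    action="open the cavity before placing drying equipment",
    because="trapped moisture behind drywall will not dry from the room side",
    exceptions=["cavity is accessible from an unfinished side"],
    confidence=0.8,
)

print(rule.applies(condition_present=True, reason_holds=False))  # False
```

The `applies` method is the operational difference between a policy and a concentrate: a policy checks only the condition; the concentrate also checks whether Z still holds.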

3. Benchmarks

Every number that surfaces in extraction: thresholds, timelines, costs, rates, ratios, counts. Stored with context, source count, and variance.

A benchmark from a single extraction session has low confidence. The same benchmark confirmed by six independent subjects in the same domain and market has high confidence and is ready to be used as ground truth in an AI system’s reasoning. The concentrate tracks the difference.

This is the component that makes the concentrate valuable as a competitive intelligence product. The numbers in an industry that everyone knows but nobody has published — the real margin thresholds, the actual response time expectations, the price per square foot that experienced operators actually charge vs. what appears in public pricing guides — these exist only in people’s heads. The concentrate captures them with provenance.
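The single-source-versus-six-sources distinction can be made concrete with a scoring function. This is a toy formula, not the protocol's actual scoring: the six-source threshold comes from the text, but combining source coverage with relative spread is an assumption of this sketch.

```python
from statistics import mean, pstdev

def benchmark_confidence(values, sources_for_full=6):
    """Toy confidence score for a benchmark number.

    `values` holds the number as reported by each independent source.
    More sources -> higher coverage; tighter agreement -> higher score.
    The 6-source saturation point and the spread penalty are illustrative.
    """
    n = len(values)
    if n == 0:
        return 0.0
    coverage = min(n / sources_for_full, 1.0)
    # Agreement: penalize high relative spread across sources.
    m = mean(values)
    spread = pstdev(values) / abs(m) if m else 1.0
    agreement = max(0.0, 1.0 - spread)
    return round(coverage * agreement, 2)

# One source: low confidence, regardless of the value itself.
print(benchmark_confidence([2.50]))                           # 0.17
# Six sources in tight agreement: ready to serve as ground truth.
print(benchmark_confidence([2.4, 2.5, 2.5, 2.6, 2.5, 2.5]))   # 0.98
```

The shape of the function matters more than its constants: a benchmark cannot reach high confidence through one emphatic source, only through independent confirmation, which is exactly the difference the concentrate tracks.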

4. Tacit Signatures

The things that are hard to explain. Captured as well as they can be verbalized, with a confidence flag.

A tacit signature sounds like: “The drywall feels wrong before the moisture meter confirms it.” Or: “You can tell within the first five minutes of a call whether the adjuster is going to be cooperative or difficult, and it’s not anything specific they say.” These are not mysticism. They are pattern recognition operating below the level of conscious articulation — real knowledge that has never been verbalized because no one asked slowly enough.

The confidence flag on tacit signatures signals to the consuming AI: this is approximate. This is the residue of knowledge the extraction process got close to but couldn’t fully surface. Don’t treat it as ground truth. Treat it as a signal that this is where human judgment is concentrated, and flag it for human review when it’s relevant.
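One way a consuming system can honor that flag is a routing rule: tacit signatures never become ground truth, they become review triggers. A minimal sketch; the `tacit` and `confidence` field names and the 0.7 threshold are assumptions of this example, not part of the protocol:

```python
def route_claim(claim: dict) -> str:
    """Decide how a consuming AI system should treat a claim.

    A tacit signature is routed to human review unconditionally --
    even a high confidence score does not promote it to ground truth,
    because the score marks "close to surfaced," not "verified."
    """
    if claim.get("tacit"):
        return "flag_for_human_review"
    if claim.get("confidence", 0.0) >= 0.7:  # illustrative threshold
        return "use_as_ground_truth"
    return "flag_for_human_review"

print(route_claim({"tacit": True, "confidence": 0.9}))    # flag_for_human_review
print(route_claim({"tacit": False, "confidence": 0.85}))  # use_as_ground_truth
```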

5. Provenance

Traceable but anonymized. For every claim in the concentrate: how many independent sources confirmed it, what their roles were, what domain and market the data came from, and whether the claim is individual knowledge or cross-validated pattern.

Provenance is what makes the concentrate auditable. An AI system that gives an answer based on a knowledge concentrate should be able to say: this answer comes from claim X, which was confirmed by three independent subjects with 10+ years of experience in this domain. That’s a very different epistemic standing than “I was trained on this.”
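The provenance fields listed above fit in one record per claim, and the audit answer the paragraph describes is just a rendering of that record. A sketch with illustrative field names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Provenance:
    source_count: int        # independent subjects confirming the claim
    roles: tuple             # anonymized roles, e.g. ("project manager",)
    domain: str
    market: str
    cross_validated: bool    # cross-validated pattern vs. individual knowledge

    def citation(self) -> str:
        # The auditable answer an AI system can attach to its output.
        kind = "cross-validated pattern" if self.cross_validated else "individual claim"
        return (f"{kind}: confirmed by {self.source_count} independent "
                f"source(s) ({', '.join(self.roles)}) in {self.domain}, {self.market}")

p = Provenance(
    source_count=3,
    roles=("project manager", "estimator"),
    domain="water damage restoration",
    market="US Midwest",
    cross_validated=True,
)
print(p.citation())
```

The record is frozen because provenance should be immutable once attached: a claim's sourcing is part of its identity, not something a downstream consumer edits.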

The Density Test

A useful heuristic for evaluating whether you have a transcript, a summary, or a true knowledge concentrate:

A transcript contains everything that was said. It’s large, raw, and unstructured. An AI can search it but cannot reason from it efficiently.

A summary contains the main points. It’s smaller. It has lost specificity, exceptions, confidence information, and relationships. It’s optimized for human reading, not AI consumption.

A knowledge concentrate is smaller than the summary in tokens but larger in information. It contains relationships the summary dropped. It contains confidence scores the summary didn’t capture. It contains decision logic the summary flattened into assertions. An AI system can reason from it, not just retrieve from it.

If what you have could be produced by someone reading a transcript and taking notes, it’s a summary. A knowledge concentrate requires the extraction protocol — it can only be produced from a session where the tacit layer was deliberately surfaced.

