I was trying to write a short post for LinkedIn. The post was fine. The post was also missing the actual insight that made the topic worth writing about. I asked one of the larger AI models to query my Notion workspace and bring back any material I had already written that touched on the topic. It returned a clean, organized summary. Useful. But I had a quiet hunch that the summary was less complete than it looked.
So I asked six other AI models the same question. Different companies, different training data, different objective functions. Same workspace. Same prompt. Then I pasted all the responses back into one synthesizer model and asked it to compare them.
What I found was not subtle. Each model walked into the same room and saw a different room. The agreement zone — what three or more models independently surfaced — turned out to be my actual canon. The divergence zone — the unique pulls only one model found — turned out to contain the most interesting material in the whole set.
This is the write-up of that process: what worked, what did not, and why I think it is a genuinely new way to do research on your own corpus.
The setup
I have a Notion workspace that holds about three years of structured thinking, framework drafts, content strategy notes, and operational documentation. It is the operating brain of a content agency. Roughly 500 pages, a few thousand chunks of indexed text. The kind of corpus that is too big to re-read but too valuable to ignore.
The standard way to get value out of a corpus this size is to use a single AI assistant — Notion AI, ChatGPT with workspace access, Claude with MCP, whatever — and ask it to summarize, search, or extract. This works. It is also limited in a specific way: you only get one model’s reading of your material. One editorial sensibility. One set of training-data biases shaping what gets surfaced and what gets walked past.
The experiment was simple. Run the same comprehensive prompt across seven models in parallel. Paste each response into a single conversation with a synthesizer model. Compare.
The prompt
The prompt asked each model to sweep the workspace for any content related to a specific cluster of themes — personal branding, skill development, niche authority, content strategy, and learning systems. It instructed each model to skip generic logs and surface only specific frameworks, named concepts, distinctive sentences, and concrete examples already in the user’s voice. It explicitly asked them to ignore noise and return concentrated signal.
The same prompt went to every model. No customization. No second pass. Just one query each, then their raw responses pasted into a synthesis conversation.
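If you would rather script the fan-out than paste the prompt into each chat by hand, a minimal sketch looks like the following. The retrieval prompt is a paraphrase of the one described above; `query_model` is a placeholder for whatever provider SDK or workspace connector you actually use, and the model labels are just labels, not real API identifiers.

```python
# Minimal fan-out sketch: one prompt, several models, raw responses collected.
# query_model() is a placeholder; wire it to whatever SDKs or connectors you use.

RETRIEVAL_PROMPT = """Sweep my workspace for anything related to personal branding,
skill development, niche authority, content strategy, and learning systems.
Skip generic logs, meeting notes, and task lists. Return only specific frameworks,
named concepts, distinctive sentences, and concrete examples in my own voice.
I want concentrated signal, not a summary."""

MODELS = ["model-a", "model-b", "model-c"]  # placeholder labels for your cross-lab lineup


def query_model(model: str, prompt: str) -> str:
    """Placeholder: send the prompt to the named model (with workspace access) and return its raw text."""
    raise NotImplementedError


# Same prompt, no per-model customization: that is the whole point of the comparison.
responses = {model: query_model(model, RETRIEVAL_PROMPT) for model in MODELS}
```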
The seven models
- Claude Opus 4.7
- Claude Opus 4.6
- Claude Sonnet 4.6
- Google Gemini 3.1 Pro
- OpenAI GPT 5.4
- OpenAI GPT 5.2
- Moonshot Kimi 2.6
One additional model — Gemini 2.5 Flash — was queried but declined. It honestly reported that it could not access the workspace from chat mode. That non-result turned out to be useful information of its own kind, which I will come back to.
What happened
The agreement zone is the canon
A small set of concepts showed up in three or more model responses. Same source pages. Same quotes. Same framing. When seven independently trained AI models — different companies, different architectures, different objective functions — converge on the same handful of ideas pulled from your own writing, that convergence is not coincidence. It is signal that those ideas are structurally important in your corpus.
For my own workspace, the agreement zone surfaced about a dozen high-conviction concepts that had been scattered across hundreds of pages. I had written all of them. I had not realized which ones were structurally load-bearing in my own thinking. The triangulation made it obvious.
This is the first practical use case: multi-model concentration tells you what your canon actually is. Not what you think it is. Not what you wish it was. What the corpus, read by neutral readers, demonstrably contains.
The divergence zone is the edge
The more interesting half of the experiment was where the models disagreed. Each model surfaced unique material the others walked past. Not because the others missed it accidentally. Because each model has a different training signature that shapes what it values when it reads.
- One Claude model went structural. It proposed a spine for the article and called out gaps in the corpus where I would need to do net-new research.
- A different Claude version went concept-cartographer. It found named framework clusters that the other models had left scattered across multiple sections.
- A Sonnet model surfaced operational mechanics — the actual step-by-step inside frameworks the others mentioned at headline level.
- Gemini found pragmatic material no one else touched, including specific productivity numbers from the corpus.
- One GPT version played hidden-gem hunter, surfacing single sentences with article-grade force that other models read past.
- The other GPT version restructured everything into a finished reference document — designed as something publishable, not just retrievable.
- Kimi went deep-system archaeologist, finding named frameworks in corners of the workspace others did not reach.
Reading the seven outputs in sequence felt like getting feedback from seven editors. None of them were wrong. None of them were complete. The full picture only emerged when I treated all seven as inputs to a synthesis layer.
The negative result mattered
Gemini Flash’s honest “I cannot access this workspace from chat mode” was, in a quiet way, the most useful single response. It told me that workspace access is not equally distributed across the models I have available. Future runs of this methodology need to verify connectivity first — otherwise I am not comparing models, I am comparing connection states.
It also reminded me that an AI that says “I cannot” is, on average, more trustworthy with deeper work than one that hallucinates a confident-sounding pull from a workspace it could not see. Worth weighting that into model selection going forward.
The complication: recursive consensus
Partway through the experiment I noticed something I had not predicted. Three of the models cited previous AI synthesis pages already living in my workspace. Pages titled things like “Cross-Model Second Brain Analysis Round 1” or “Round 3: Embedding-Fed Generative Pass.” These were artifacts of earlier concentration sessions I had run weeks ago and saved into Notion as canonical pages.
Which means: when models queried my workspace, they were sometimes finding pages where previous models had already done this exact exercise and reached conclusions. Those pages were then read back as “discovered” insight by the current round of models.
This matters. It means the agreement zone is partially inflated. When four models all surface the same concept as “an undervalued piece of intellectual property,” some of that consensus might be coming from a Notion page that already says exactly that — written by a prior AI synthesis based on a still-earlier round of consensus.
That is a feedback loop. Earlier AI conclusions become canonical workspace content that later AI reads back as independently discovered insight. It is not bad — in some sense it is exactly how a knowledge system should compound over time — but it should be named, because if you do not name it, you mistake echo for verification.
The two types of signal
Once you know about the recursive consensus problem, you can sort the agreement zone into two cleaner buckets:
Primary-source canon. Concepts that surface across multiple models because the models independently found them on pages you originally wrote. These are the cleanest possible signal. Multiple neutral readers, reading your original material, all flagged the same idea as structurally important.
Recursive AI consensus. Concepts that surface across multiple models because the models found them on pages that were themselves AI syntheses of earlier AI rounds. These are not worthless — the original AI rounds were also synthesizing real material — but they should be weighted lower than primary-source canon.
Practically, this means tagging synthesis pages clearly in your knowledge base. Something like a metadata field on each Notion page declaring whether it is primary-source thinking or AI-derived synthesis. Future model runs can then be instructed to weight primary higher than synthesis, or to exclude synthesis entirely on a given pull.
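As a sketch of what that tagging could look like in practice, here is a minimal example using the official notion-client Python SDK. The select property named "Source type" and its options are illustrative assumptions; nothing like them exists in your workspace until you add them to the database schema.

```python
# A sketch of tagging a synthesis page as AI-derived, using the notion-client SDK.
# Assumes the pages' database has a select property named "Source type" with
# options such as "Primary" and "AI synthesis" (illustrative names, add your own).
from notion_client import Client

notion = Client(auth="YOUR_NOTION_INTEGRATION_TOKEN")


def tag_as_synthesis(page_id: str) -> None:
    """Mark a page so later runs can down-weight or exclude it."""
    notion.pages.update(
        page_id=page_id,
        properties={"Source type": {"select": {"name": "AI synthesis"}}},
    )
```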
Why this is a real methodology, not just a curiosity
I want to be careful not to overclaim. This is not magic. It is a specific application of well-understood ensemble principles — the same logic that says combining multiple weak classifiers usually beats a single strong one — applied to retrieval and synthesis over a personal corpus.
What makes it useful in practice is that the cost is near zero, the inputs are already sitting in your workspace, and the output is a brief that is grounded in your own material rather than confabulated by a single model. For anyone who writes long-form, builds frameworks, or runs a knowledge-driven business, this is a genuine upgrade over single-model summarization.
The four properties that make it work
- Different training signatures. The models must come from different labs with different training data. Two Claude models from the same family produce more correlated readings than a Claude and a Gemini. The diversity of the readers is the entire point.
- Same prompt, no customization. The comparison only works if every model sees the identical query. Optimizing the prompt for each model defeats the purpose.
- Same workspace access. All models must have read access to the same corpus. Otherwise the divergence is a function of who could see what, not a function of editorial sensibility.
- A synthesizer that compares, not summarizes. The final layer is not “give me a summary of all seven outputs.” It is “tell me where they agree, where they diverge, and what each model uniquely contributed.” That second framing is what makes the canon and the edge visible.
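For concreteness, here is one way to phrase that final instruction. The wording is a sketch of my framing, not a fixed template; adapt the categories to whatever you care about.

```python
# The synthesis framing: compare, do not summarize.
SYNTHESIS_PROMPT = """You will receive responses from several AI models, each given the
same retrieval prompt over the same workspace. Do not summarize them. Compare them:
1. Agreement zone: concepts surfaced by three or more models, with their source pages.
2. Divergence zone: material only one model surfaced, and why it might matter.
3. Negative results: any model that could not actually access the workspace.
4. Caveats: flag anywhere the consensus may be inflated by prior AI synthesis pages."""
```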
What you actually do with the output
The synthesizer’s comparison is the deliverable, not the source pulls. The pulls are raw material. The synthesis tells you:
- What is undeniably canonical in your corpus (3+ model agreement)
- What is structurally important but only one model spotted (the article-grade gems)
- What is missing from your corpus entirely and would require external research (the gap analysis)
- Which models are best at which types of retrieval (so you can pick better next time)
That output is the brief. Whatever you build next — an article, a pitch, a framework, a new product — starts from there.
The methodology in five steps
1. Decide what you want to extract. Pick a thematic cluster. Not “summarize my workspace” — too broad. Something like “everything related to my personal branding, skill development, and authority-building thinking.” Specific enough to focus the readers, broad enough to invite real coverage.
2. Write one prompt. The prompt should ask for specifics — frameworks, distinctive phrases, named concepts, examples in your voice — and explicitly tell each model to filter out generic notes, meeting logs, and task lists. Tell it you want concentrated signal, not summary.
3. Run the same prompt across as many cross-lab models as you have access to. Three is the minimum useful sample. Five to seven gives a much clearer picture. Pull in Anthropic, OpenAI, Google, and at least one frontier model from outside the big three.
4. Paste every response into a single synthesis conversation. Tell the synthesizer to compare, identify the agreement zone, identify the divergence zone, flag any negative results (models that could not access the corpus), and call out where the consensus might be inflated by recursive AI synthesis pages. A sketch of this assembly step follows this list.
5. Use the synthesis as your brief. Whatever you build next starts from this output, not from a blank page or a single model’s summary.
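The sketch below shows what step 4 looks like if you script it: label each raw response with its model, concatenate, and append the comparison instruction. `responses` and `SYNTHESIS_PROMPT` are the placeholders defined in the earlier sketches.

```python
# Step 4, scripted: one synthesis input containing every labeled response
# plus the comparison instruction. Paste it into (or send it to) the synthesizer.


def build_synthesis_input(responses: dict[str, str], synthesis_prompt: str) -> str:
    blocks = [f"=== Response from {model} ===\n{text}" for model, text in responses.items()]
    return "\n\n".join(blocks) + "\n\n" + synthesis_prompt
```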
The honest caveats
Three things to keep in mind before you try this.
It only works on a corpus worth triangulating. If your knowledge base is small, generic, or mostly meeting notes, the multi-model approach will not surface anything more useful than a single model would. The methodology assumes you have done the work of building a substantive corpus first.
Connectivity is not uniform. Not every model has the same access to your workspace. Some will refuse the query honestly. Some may try to answer without true workspace access and confabulate. Verify what each model actually had access to before you compare outputs.
The recursive consensus is real. If your workspace contains prior AI syntheses, future syntheses will be partially echoing past ones. This is not a fatal flaw — it is how a knowledge system compounds — but you should know it is happening so you do not over-weight findings that are bouncing around inside your own AI history.
Why this matters beyond writing one article
The bigger frame is this: most of the value in any modern knowledge worker’s life lives inside a corpus they have written themselves but cannot fully see. Notes, drafts, frameworks, half-finished documents, scattered insights. The brain that produced all of it cannot reread all of it.
Single-model retrieval lets you query that corpus through one editorial lens. Useful. Limited.
Multi-model concentration lets you query that corpus through several editorial lenses simultaneously, then triangulate. The agreement zone reveals what is structurally important in your own thinking. The divergence zone reveals the high-value material that only some kinds of readers will catch. The negative results reveal capability gaps you should know about. The whole thing produces a much higher-resolution map of your own intellectual material than any one model can produce alone.
It cost almost nothing to run. It took maybe two hours from first prompt to final synthesis. The output was substantively better than anything I have produced from a single-model query. And the meta-insight — that AI consensus over your own corpus is partially recursive and needs to be tagged accordingly — is itself the kind of finding I would not have noticed without running multiple models in parallel.
This is a methodology, not a one-off trick. I will keep using it. If you have a corpus worth concentrating, you should try it too.
Frequently asked questions
How many models do I need?
Three is the minimum. Five to seven is the sweet spot. Past about ten you hit diminishing returns and start spending more time managing the inputs than reading the synthesis.
Do the models need to come from different companies?
Yes. Two Claude models will produce more correlated readings than a Claude and a Gemini. The diversity of training data is what makes the triangulation work. Mix Anthropic, OpenAI, Google, and at least one frontier model from outside the three big labs.
What if my models cannot access my workspace?
Then the methodology does not run. Connectivity is the prerequisite. Verify each model’s access before you start. A model that confabulates a confident-sounding pull from a workspace it cannot see is worse than a model that honestly declines.
How do I handle the recursive consensus problem?
Tag synthesis pages in your workspace with a metadata field declaring them as AI-derived. Then either instruct future model runs to weight primary-source pages higher, or run two passes: one with all sources, one with synthesis pages excluded. The delta between the two passes shows you what is genuine new signal versus what is echo.
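A sketch of the second, synthesis-excluded pass, reusing the client and the hypothetical "Source type" property from the tagging sketch above: query the database with a filter so the run only sees primary-source pages, then point the models at those pages.

```python
# Second pass: exclude pages tagged as AI synthesis so only primary-source
# material is in scope. Reuses the notion client and "Source type" property
# assumed in the tagging sketch.
primary_only = notion.databases.query(
    database_id="YOUR_DATABASE_ID",
    filter={
        "property": "Source type",
        "select": {"does_not_equal": "AI synthesis"},
    },
)
```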
What is the synthesizer model supposed to do differently than the source models?
The synthesizer is not summarizing your corpus. It is comparing the seven readings of your corpus. Its job is to identify agreement, divergence, and gaps across the inputs, and to flag the methodological caveats. That is a different task than retrieval. Pick a model with strong reasoning over long context for the synthesis layer.
Can I use this for things other than writing articles?
Yes. Anywhere you need to extract a brief from a substantial corpus — pitch decks, framework design, product positioning, board prep, strategic planning — multi-model concentration gives you a higher-resolution starting point than single-model retrieval. The article use case is just where I noticed it. The methodology generalizes.
The bottom line
One AI reading of your knowledge base is one editor’s opinion. Seven AI readings, compared properly, is a triangulation. The agreement zone is your actual canon. The divergence zone contains the highest-value unique material. The negative results tell you about capability gaps. The recursive consensus problem tells you which conclusions to trust and which to weight lower.
The whole thing is cheap, fast, and produces material no single model can produce alone. If you have a corpus worth thinking about, you have a corpus worth concentrating across multiple models. Start with three. Compare what they bring back. The methodology gets sharper from there.