Tag: SEO

  • llms-full.txt vs llms.txt: Why AI Agents Crawl It More (2026)

    Most conversations about AI crawlability focus on one file: llms.txt. But if you look at what Anthropic, Vercel, and LangGraph actually ship – and what GEO crawler research found AI agents fetching most – the file that matters more is its companion: llms-full.txt.

    Here’s the practical reality: llms.txt is the map. llms-full.txt is the territory. And in 2026, the agents that matter for citation traffic are fetching the territory.

    The Full File Family You Probably Don’t Know About

    The original llms.txt proposal – published by Jeremy Howard in September 2024 – defined one file. Implementers built the rest. The complete family as of mid-2026 is four files, but most sites only need two:

    File | What’s in it | When to use
    /llms.txt | Curated index – H1, summary, link sections | Always. The orientation layer.
    /llms-full.txt | Full content of every linked page, concatenated as Markdown | When you want a model to deep-ingest your docs in a single fetch
    /llms-ctx.txt | Pre-expanded context without URLs | FastHTML-style implementations
    /llms-ctx-full.txt | Pre-expanded context with URLs preserved | Same, but URL-aware

    The pattern that works – and the one Anthropic, Vercel, and LangGraph all run – is the index + export pair: llms.txt for orientation, llms-full.txt for deep ingestion.

    Why llms-full.txt Gets Crawled More

    GEO researchers analyzing AI crawler behavior – including work cited by Profound – have noted that agents from Microsoft, OpenAI, and others tend to fetch llms-full.txt more frequently than llms.txt when both are present. The working explanation is structural: when a file contains the full content, it removes one retrieval step. An agent that fetches llms-full.txt gets everything it needs in a single HTTP request instead of fetching the index, parsing the links, then fetching each linked page individually. This is consistent with how developer documentation platforms like Mintlify describe the behavior of IDE agents operating under tight latency budgets.

    For IDE agents (Cursor, Continue, Cline) and MCP integrations, this is even more pronounced. These tools are operating under tight context windows and latency budgets. A single fetch that returns a clean Markdown blob of your entire docs is structurally preferable to a multi-step crawl.

    The implication: if you’ve shipped llms.txt but not llms-full.txt, you’ve done half the job.

    How to Build llms-full.txt

    The construction logic is simple: take every URL in your llms.txt, fetch each page, strip HTML to Markdown, and concatenate. In practice, most sites do this in their build pipeline.

    Here’s the minimal Node.js pattern:

    const fs = require('fs');
    const fetch = require('node-fetch'); // needs node-fetch v2; v3 is ESM-only (Node 18+ also ships a global fetch)
    const TurndownService = require('turndown');
    const turndown = new TurndownService();
    
    async function buildLlmsFullTxt(llmsIndexPath, outputPath) {
      const index = fs.readFileSync(llmsIndexPath, 'utf8');
      const urlRegex = /\[.*?\]\((https?:\/\/[^\)]+)\)/g;
      const urls = [...index.matchAll(urlRegex)].map(m => m[1]);
    
      let output = '';
      for (const url of urls) {
        const res = await fetch(url);
        const html = await res.text();
        const markdown = turndown.turndown(html);
        // Separate pages with a rule and label each one with its source URL
        output += `\n\n---\n# Source: ${url}\n\n${markdown}`;
      }
    
      fs.writeFileSync(outputPath, output);
      console.log(`Built llms-full.txt: ${urls.length} pages, ${output.length} chars`);
    }
    
    buildLlmsFullTxt('./public/llms.txt', './public/llms-full.txt');

    One constraint to manage: keep llms-full.txt under roughly 200,000 tokens (about 150K words, around 700KB). That’s the threshold where most models can ingest the file in a single context window. If your docs are larger, segment by product or language the way Supabase does – llms-full-api.txt, llms-full-guides.txt – and list the segmented files in your main llms.txt.
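    A rough size guard for the build step could look like the sketch below. It assumes the figures quoted above (about 700KB for about 200,000 tokens, so roughly 3.5 bytes per token); that ratio is a heuristic, not a tokenizer, and the names estimateTokens, checkBudget, and TOKEN_BUDGET are hypothetical.

```javascript
// Heuristic derived from the figures above: ~700KB for ~200,000 tokens,
// i.e. about 3.5 bytes per token. Real tokenizers will give different counts.
const BYTES_PER_TOKEN = 3.5;
const TOKEN_BUDGET = 200000;

// Estimate the token count of a string from its UTF-8 byte length.
function estimateTokens(text) {
  return Math.ceil(Buffer.byteLength(text, 'utf8') / BYTES_PER_TOKEN);
}

// Report the estimate and whether it fits inside the budget.
function checkBudget(text, budget = TOKEN_BUDGET) {
  const tokens = estimateTokens(text);
  return { tokens, withinBudget: tokens <= budget };
}
```

    Calling checkBudget(output) just before the file is written gives the build a cheap signal that it is time to segment.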

    The 2026 robots.txt Stack That Completes the Picture

    Shipping llms.txt and llms-full.txt is the visibility layer. The access-control layer is robots.txt – and it changed significantly in Q2 2026.

    The key development: Anthropic split its crawler into two separate user-agents. ClaudeBot is the training scraper (high bandwidth, no citation value – block it). Claude-Web is the live-retrieval agent that fetches pages to answer Claude.ai user queries in real time (allow it, because it drives citation traffic). Brands that blanket-block “all Anthropic crawlers” lose Claude citations entirely.

    Meta also shipped two active training scrapers in March 2026 – FacebookBot and Meta-ExternalAgent – at GPTBot-level crawl volume. Most sites have no rules for them yet.

    Here’s the 2026 template:

    # BLOCK: Training scrapers - high bandwidth, zero referral value
    User-agent: GPTBot
    Disallow: /
    
    User-agent: CCBot
    Disallow: /
    
    User-agent: ClaudeBot
    Disallow: /
    
    User-agent: FacebookBot
    Disallow: /
    
    User-agent: Meta-ExternalAgent
    Disallow: /
    
    # OPT OUT: Google Gemini training (keeps Search indexing intact)
    User-agent: Google-Extended
    Disallow: /
    
    # ALLOW: Live-retrieval agents - drive citation traffic
    User-agent: OAI-SearchBot
    Allow: /
    
    User-agent: ChatGPT-User
    Allow: /
    
    User-agent: Claude-Web
    Allow: /
    
    User-agent: anthropic-ai
    Allow: /
    
    User-agent: PerplexityBot
    Allow: /

    One important caveat on robots.txt enforcement: aggressive training scrapers often ignore the file or spoof their user-agents. The robots.txt rules signal intent and work for compliant bots; a WAF rule at the edge is the only deterministic block for non-compliant crawlers.

    The Honest State of the Technology

    The SE Ranking study of 300,000 domains (November 2025) found no measurable correlation between having llms.txt and being cited by ChatGPT, Claude, Gemini, or Perplexity. Google’s John Mueller compared the file to the deprecated keywords meta tag – a self-declared signal that search systems set aside in favor of what they derive from the content itself.

    None of that means you shouldn’t ship both files. The cost is low, the optionality is real, and the IDE-agent ecosystem (Cursor, Continue, Cline) does actively use llms.txt. But the robots.txt work is the lever that moves outcomes today. The llms.txt + llms-full.txt pair is infrastructure investment – you want the files in place and correct when major LLM providers start honoring them, and setting up the build pipeline now costs far less than retrofitting it later.

    The practical sequence for a site that hasn’t done this yet:

    1. Update robots.txt first. Add the Q2 2026 user-agent rules above. This takes twenty minutes and immediately affects how training scrapers treat your content.
    2. Ship llms.txt. Curated index, 20-50 priority pages, one-sentence description per link, sections in priority order.
    3. Build llms-full.txt. Concatenated Markdown of every linked page, under 200K tokens. Run it in your build pipeline so it stays current.
    4. Verify both files are served correctly. curl -I https://yoursite.com/llms.txt should return 200 with Content-Type: text/plain. A 404 on either file is the most common implementation error.
    5. Add an access-log check. Once per month, grep your logs for requests to /llms.txt and /llms-full.txt by user-agent. You want to see live-retrieval agents (Claude-Web, OAI-SearchBot, PerplexityBot) in the results – not just training scrapers.
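    The monthly log check in step 5 can be sketched as a small script. This assumes access logs in the common "combined" format (request line in quotes, user-agent as the last quoted field); the agent list is taken from this article, and countLlmsFetches is a hypothetical helper name.

```javascript
// Agents named in this article; extend the list as needed.
const AGENTS = ['Claude-Web', 'OAI-SearchBot', 'ChatGPT-User', 'PerplexityBot',
  'GPTBot', 'ClaudeBot', 'CCBot'];

// Captures the requested path (/llms.txt or /llms-full.txt) and the final
// quoted field of the line, which is the user-agent in combined log format.
const LINE_RE = /"(?:GET|HEAD) (\/llms(?:-full)?\.txt)(?:\?[^" ]*)? [^"]*" \d{3} .*"([^"]*)"$/;

// Count requests per "path agent" pair; unknown agents are grouped as "other".
function countLlmsFetches(lines) {
  const counts = {};
  for (const line of lines) {
    const m = LINE_RE.exec(line.trim());
    if (!m) continue; // not a request for either file
    const [, path, ua] = m;
    const agent = AGENTS.find(a => ua.toLowerCase().includes(a.toLowerCase())) || 'other';
    const key = `${path} ${agent}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

    Feed it the lines of a log file (for example fs.readFileSync(logPath, 'utf8').split('\n')) and look for live-retrieval agents among the keys. User-agent strings can be spoofed, so treat the counts as a signal rather than proof.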

    The goal isn’t to optimize for a standard that isn’t fully adopted yet. It’s to build the infrastructure correctly now, while the field is still forming, so that adoption changes work in your favor rather than requiring catch-up.

    Frequently Asked Questions

    What is the difference between llms.txt and llms-full.txt?

    llms.txt is a curated index — an H1, a summary, and link sections that orient an AI agent to your site. llms-full.txt is the full content of every linked page concatenated as Markdown, so an agent can deep-ingest your documentation in a single fetch. The index is the map; the full file is the territory.

    Why do AI agents crawl llms-full.txt more often than llms.txt?

    Fetching llms-full.txt removes a retrieval step: the agent gets everything in one HTTP request instead of fetching the index, parsing links, and fetching each page individually. For IDE agents like Cursor, Continue, and Cline operating under tight latency and context budgets, a single clean Markdown blob is structurally preferable to a multi-step crawl.

    How big should llms-full.txt be?

    Keep it under roughly 200,000 tokens (about 150K words, around 700KB) so most models can ingest it in a single context window. If your docs are larger, segment by product or language — for example llms-full-api.txt and llms-full-guides.txt — and list the segmented files in your main llms.txt.

    Does having llms.txt actually improve AI citations?

    Not measurably on its own. A November 2025 SE Ranking study of 300,000 domains found no correlation between having llms.txt and being cited by ChatGPT, Claude, Gemini, or Perplexity, and Google’s John Mueller compared it to the deprecated keywords meta tag. The lever that moves outcomes today is robots.txt configuration; llms.txt and llms-full.txt are low-cost infrastructure for when adoption grows.

    Which AI crawlers should I allow in robots.txt in 2026?

    Allow live-retrieval agents that drive citation traffic — Claude-Web, OAI-SearchBot, ChatGPT-User, anthropic-ai, and PerplexityBot. Block high-bandwidth training scrapers with no referral value such as GPTBot, CCBot, ClaudeBot, FacebookBot, and Meta-ExternalAgent, and opt out of Google-Extended to skip Gemini training while keeping Search indexing intact.

  • How AI Engines Actually Cite Your Content: Grounding and GEO Guide

    Last verified: June 2026.

    Most “GEO” advice is recycled SEO with the word “AI” pasted on top. This guide is different. It describes what actually happens when Microsoft Copilot, Bing’s AI answers, and Google’s AI Overviews build a response and decide whose page to cite — based on running content sites that get cited tens of thousands of times a month. The short version: AI engines do not cite the page that ranks #1 for a head term. They cite the page that most directly answers the specific sub-question the model is grounding on. That distinction changes everything about what you should write.

    How grounding actually works (the part nobody explains)

    When you ask Copilot or Bing’s AI a question, the model does not answer from memory. It runs a retrieval step called grounding: it rewrites your question into one or more search queries, fetches a handful of live web results, reads them, and composes an answer with inline citations pointing back at the pages it used. Google’s AI Overviews work the same way with a technique it calls “query fan-out” — one user question becomes many narrower synthetic queries.

    Two things follow directly from this mechanism:

    • The model is not searching for your keyword. It is searching for the answer to a decomposed sub-question. A user who asks “what’s the best way to instantly index a new page” triggers grounding queries like “IndexNow API endpoint”, “submit URL to Bing programmatically”, and “IndexNow key file location”. The page that wins is the one that answers those narrow strings, not the one optimized for “indexing tips”.
    • Citations are extracted at the passage level, not the page level. The model lifts the specific sentence or table that answers the sub-question. If your answer is buried under 600 words of preamble, it loses to a page that states the fact in the first line under a matching heading.

    This is why a niche, specific page routinely out-cites a high-authority generalist. The generalist ranks; the specialist gets quoted.

    Why operational and comparison pages win over head terms

    Across real citation data, the pages that get pulled into AI answers cluster into three shapes. None of them are “ultimate guide to X”.

    1. Operational pages with real commands, configs, and error messages

    When someone asks an AI assistant “how do I fix [specific error]” or “what’s the exact command to do X”, the model needs a page that contains the literal command, the literal config, or the literal error string. Generic advice cannot be cited because there is nothing concrete to quote. A page that says:

    curl "https://www.bing.com/indexnow?url=https://example.com/new-page/&key=YOUR_KEY"
    # 200 = received (not "indexed"), 422 = URL/key mismatch, 429 = too many submits

    …is citation gold, because the model can extract that block verbatim and the user can act on it. The error-code annotations matter: questions about failures (“IndexNow 422”, “why am I getting 429”) are high-intent and low-competition, and a page that names the exact codes owns them.

    2. Comparison pages (“X vs Y”)

    “Which is better, X or Y” is one of the most common shapes of AI query, and comparison content is structurally easy to cite because it maps cleanly to a decision. If you maintain honest, current head-to-head pages, you become the default source the model reaches for when a user is choosing between tools. This is exactly why we keep dedicated comparison pages like Claude Code vs Cursor and Claude Code vs Codex — they answer a decision the model is constantly being asked to make, and a table of differences is trivially quotable.

    3. Fresh, dated pages on fast-moving topics

    For anything that changes — pricing, model versions, API limits, feature availability — grounding strongly favors recency. The model would rather cite a page dated this month than an “authoritative” page from two years ago that might be wrong. A visible “Last verified” date and a real publish/update timestamp are not decoration; they are a relevance signal the retrieval layer reads.

    The losing move is chasing broad head terms. “Best AI coding assistant” is saturated, generic, and rarely the literal grounding query. The winning move is to own the long, specific, operational and comparison strings that the fan-out actually generates.

    IndexNow: how to get cited the same day you publish

    Grounding can only cite pages the engine knows about. The bottleneck for new content is crawl latency — and IndexNow collapses it. IndexNow is an open protocol (backed by Microsoft Bing and Yandex) that lets you push a URL to the index the instant you publish, instead of waiting for a crawler to wander by.

    Setup is two steps:

    1. Host a key file. Generate a key of 8-128 hex characters and place it at your site root as a UTF-8 text file named {key}.txt containing exactly that key. Example: https://example.com/daa44a2c....txt. This proves you own the host.
    2. Ping on publish. Single URL via GET:
      curl "https://api.indexnow.org/indexnow?url=https://example.com/new-page/&key=YOUR_KEY"

      Or batch up to 10,000 URLs in one POST:

      curl -X POST "https://api.indexnow.org/indexnow" \
        -H "Content-Type: application/json" \
        -d '{"host":"example.com","key":"YOUR_KEY","urlList":["https://example.com/a/","https://example.com/b/"]}'

    A 200 means the endpoint received your URL (not that it is indexed yet). Submitting to api.indexnow.org shares the ping with all participating engines, so you do not need to hit Bing and Yandex separately. Most WordPress SEO plugins (Rank Math, Yoast, SEOPress) have IndexNow built in — turn it on and it fires automatically on every publish and update. The practical payoff: pages can enter Bing’s crawl queue within hours, which means they are eligible to be grounded and cited the same day, not next week.
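    For sites that publish from their own pipeline rather than a plugin, the batch POST can be sketched in Node.js. This is a minimal sketch under the rules as described above (a key of 8-128 hex characters, at most 10,000 URLs per POST); isValidKey, buildIndexNowPayloads, and submit are hypothetical helper names, and submit assumes the global fetch available in Node 18+.

```javascript
const MAX_URLS_PER_POST = 10000; // batch limit stated above

// Key format as described above: 8-128 hex characters.
function isValidKey(key) {
  return /^[0-9a-fA-F]{8,128}$/.test(key);
}

// Split a URL list into POST bodies of at most 10,000 URLs each.
function buildIndexNowPayloads(host, key, urls) {
  if (!isValidKey(key)) throw new Error('IndexNow key must be 8-128 hex characters');
  const payloads = [];
  for (let i = 0; i < urls.length; i += MAX_URLS_PER_POST) {
    payloads.push({ host, key, urlList: urls.slice(i, i + MAX_URLS_PER_POST) });
  }
  return payloads;
}

// Send one payload; resolves to the HTTP status code.
async function submit(payload) {
  const res = await fetch('https://api.indexnow.org/indexnow', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  return res.status; // 200 = received, not necessarily indexed
}
```

    Keeping payload construction separate from the network call makes the batching logic easy to test without pinging the live endpoint.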

    One caveat worth stating plainly: IndexNow accelerates indexing, which is a precondition for citation. It does not force a citation. You still need the page to be the best answer to the sub-question. But for fresh, time-sensitive content, same-day indexing is often the difference between getting cited while the topic is hot and showing up after the conversation has moved on.

    How to actually measure your AI citations

    For a long time AI citations were invisible — you could see referral clicks in analytics but not the citations themselves (most AI answers are zero-click). That changed. As of February 2026, Bing Webmaster Tools ships an AI Performance report (public preview) that shows when your pages are cited across Microsoft Copilot, Bing’s AI answers, and partner surfaces. It is the first direct, free window into AI citation behavior, and you should be reading it weekly.

    The four metrics that matter:

    • Total citations — how many times your site was cited as a source in AI answers over the period.
    • Average cited pages — the daily average count of unique URLs from your site that got referenced. This tells you whether citations are concentrated on one page or spread across the site.
    • Grounding queries — sample query phrases the AI used to retrieve and cite you. This is the single most actionable field in the report. It is a literal list of the sub-questions you are winning, which tells you exactly which operational/comparison angles to expand next.
    • Page-level citation activity — citations by URL, so you can see which pages are doing the work.

    Two limitations to keep in mind so you read the data honestly: the report does not show click data (you see citations, not visits from them), and it aggregates Copilot with Bing summaries, so you cannot isolate one surface from the other. For Google’s AI Overviews there is still no equivalent citation dashboard — the closest proxy is watching impressions and referral patterns in GA4 and Search Console, plus spot-checking your target queries by hand.

    The workflow that works: pull the grounding-queries list, find the patterns, and feed them straight back into your content plan. If you are getting cited for “claude mcp setup” variants, that is a signal to deepen pages like the Claude MCP setup guide and adjacent operational walkthroughs, not to chase a new head term.

    A repeatable checklist for citation-optimized pages

    Everything above reduces to a build pattern. For any page you want AI engines to cite:

    • Lead with the answer. Put a short, factual, quotable answer in the first 1-2 sentences under each heading. Assume the model reads only that passage.
    • Use question-shaped headings. H2s and H3s that mirror real queries (“How does IndexNow work?”, “How do I measure AI citations?”) match the grounding query and give the extractor a clean anchor.
    • Be specific and operational. Real commands, real config, real numbers, real error codes and fixes. Concrete text is extractable; vague advice is not.
    • Add a visible FAQ near the end. Plain question/answer pairs are the single most citation-friendly format, because each pair is a self-contained answer to a discrete sub-question. You do not need JSON-LD schema for this to work — visible Q&A text is what the model reads.
    • Date it and keep it current. A “Last verified” line plus genuine updates on fast-moving topics buys you the recency edge in grounding.
    • Push it with IndexNow so it is indexable the same day, then watch the AI Performance report to see which sub-questions it wins.

    If you want the larger system this fits into — the full toolchain for operating as an AI-first publisher, from MCP servers to publishing pipelines — start with the AI operator’s stack.

    FAQ

    Do AI engines cite the page that ranks #1 on Google?

    Not reliably. AI engines run their own grounding retrieval and cite the page that most directly answers the specific decomposed sub-question, which is often a niche, operational page rather than the head-term winner. Ranking helps your page be discoverable, but the citation goes to whichever passage best answers the exact grounding query.

    What is grounding in AI search?

    Grounding is the retrieval step where an AI assistant rewrites your question into search queries, fetches live web pages, reads them, and builds an answer with inline citations to those pages. It is why current, specific pages can get cited even by a model whose training data predates them.

    Does IndexNow guarantee my page will be cited by AI?

    No. IndexNow guarantees fast indexing, which is a precondition for being cited. The page still has to be the best, most specific answer to the sub-question the model is grounding on. Think of IndexNow as removing the crawl-latency excuse, not as buying a citation.

    How do I measure how often AI cites my site?

    Use the AI Performance report in Bing Webmaster Tools (public preview since February 2026). It shows total citations, average cited pages per day, sample grounding queries, and citation counts by URL across Microsoft Copilot and Bing AI answers. It does not yet show click-through from those citations, and there is no equivalent dashboard for Google AI Overviews.

    Do I need JSON-LD or schema markup to get cited?

    No. Citation extraction works on visible, well-structured text — question-shaped headings, short factual answers, and a plain visible FAQ. Schema can help search features generally, but it is not required for AI grounding to read and quote your page.

    What kind of pages get cited most?

    Three shapes dominate: operational pages with real commands, configs, and error fixes; comparison pages that resolve a “X vs Y” decision; and fresh, dated pages on fast-moving topics like pricing and model versions. Broad head-term content tends to get skipped because it rarely matches the literal grounding query and offers nothing concrete to quote.

  • AI Loves This Site. Humans Don’t Stick Around. The Retention Leak, in Public.

    Part 3 of 3. Part 1 was the flex — AI assistants cite us and Claude.ai is our #4 traffic source. Part 2 was the playbook — each model cites completely different kinds of pages. Part 3 is the honest one. When I ran the same Claude-powered browser agent against our behavior and event data, the story flipped. The acquisition side of tygartmedia.com is working beautifully. The retention side barely exists. AI assistants like this site more than humans stick around for, and the data makes that painfully clear.

    I am publishing the whole leak in public because the fix is the interesting part.

    99.86% of our readers are brand new

    In 29 days, GA4 fired 1,405 first_visit events against 1,407 active users. That is a returning-visitor rate of roughly 0.14%. A healthy media site runs at 25–40%. We are running at effectively zero. Put another way: every one of our ~1,400 monthly readers has to be re-acquired next month because there is no returning audience to compound on.

    That number is the single most important finding in this whole three-part series. Every story about our AI-referral win in Parts 1 and 2 sits on top of it. If Claude stopped citing us tomorrow, traffic would roughly halve inside 60 days — there is no cushion.

    Only 8.6% of visitors scroll to the bottom

    GA4 fires a scroll event at 90% page depth by default. Over 29 days, 121 users out of 1,407 fired one. That is 8.6%. The publishing benchmark sits at 25–35%. We are at a quarter to a third of that.

    There are two explanations and both are true at once. Some share of the traffic is crawlers and scrapers that do not scroll. And some share of real humans are landing on articles that are either too long for the intent they arrived with, or do not give them a reason to keep going past the first answer.

    Four form submissions. In 29 days. Across 1,400 readers.

    Event           | Count | Users | Events / User
    page_view       | 2,007 | 1,406 | 1.43
    session_start   | 1,652 | 1,406 | 1.18
    first_visit     | 1,405 | 1,405 | 1.00
    user_engagement |   999 |   675 | 1.54
    scroll          |   192 |   121 | 1.59
    click           |    34 |    30 | 1.13
    form_start      |    15 |     5 | 3.00
    form_submit     |     4 |     4 | 1.00

    Four form submissions across 1,655 sessions. 0.24% conversion. The form was started fifteen times (by five users) and submitted only four times, so eleven starts went nowhere: a 73% abandonment rate, counted by event, on whatever form we have running. There is also no newsletter_signup event, no cta_click event, no outbound_click event, no video_play event, no file_download event. We are running a publication with effectively zero instrumentation of reader behavior beyond “did the page load.” That is the measurement vacuum, and it is on us to fix.
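    The headline rates so far can be reproduced from the raw counts. The sketch below copies its numbers from the text and the event table; pct is a hypothetical helper that rounds to two decimals. It also shows that form abandonment looks very different depending on whether the Count column or the Users column is used.

```javascript
// Counts copied from the text and the event table above.
const activeUsers = 1407;
const firstVisits = 1405;
const scrollUsers = 121;
const sessions = 1655;
const formStart = { count: 15, users: 5 };
const formSubmit = { count: 4, users: 4 };

// Percentage of n over d, rounded to two decimals.
const pct = (n, d) => Math.round((n / d) * 10000) / 100;

const returningRate = pct(activeUsers - firstVisits, activeUsers);                  // 0.14
const scrollRate = pct(scrollUsers, activeUsers);                                   // 8.6
const conversionPerSession = pct(formSubmit.count, sessions);                       // 0.24
const abandonmentByEvent = pct(formStart.count - formSubmit.count, formStart.count); // 73.33
const abandonmentByUser = pct(formStart.users - formSubmit.users, formStart.users);  // 20

console.log({ returningRate, scrollRate, conversionPerSession, abandonmentByEvent, abandonmentByUser });
```

    Counted by event, 73% of form starts are abandoned; counted by user, four of the five people who started a form finished it.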

    Pages per session: 1.21

    1,655 sessions produced 2,007 page views. That works out to 1.21 pages per session. Healthy media sites run 1.8–3.0. Wikipedia runs 4+. We are effectively a single-page-entry site. Readers arrive for one article, read it or do not, and leave. Nobody is browsing our categories. Nobody is clicking a related-posts rail, because we do not really have one. The internal link graph between our Claude desk, our restoration B2B content, our Mason County hyperlocal, and our general-interest pieces is not moving anybody between them, and the data proves it.

    There is one exception worth sitting with. Homepage visitors (/) hit an average of 1.59 views per user, meaningfully higher than the site average. The homepage is doing its job. The article templates are not.

    Retention is essentially zero

    The GA4 retention cohort chart peaks at about 5% Day-1 retention and drops to effectively zero by Day 7. Out of every 100 readers today, 5 come back tomorrow and 0 come back next week. Healthy publications run 15–25% on Day 1 and 5–10% on Day 7. We are at roughly a quarter of the healthy Day-1 figure and at nothing by Day 7.

    The fix here is not content. It is a capture mechanism. Right now we have no durable way to turn a claude.ai referral into a known email address. Every AI-cited reader is a one-night stand with the site. Four form submissions in a month is not a newsletter strategy, it is a rounding error.

    Real human audience: ~675, not 1,407

    GA4 fires user_engagement roughly every 10 seconds of active foreground time. In 29 days only 675 users out of 1,407 ever fired one. That means 52% of our “users” never stuck around long enough for GA4 to confirm they were actually looking at the page. That bucket is some mix of near-instant bounces, back-button users, and crawlers that do not fire the event.

    Flipping it the other direction: 48% of reported users is probably the cleanest “real human reader” estimate in the whole account. Call it ~675 real humans per month. That is the number to plan around, not the 1,407 that shows on the dashboard.

    The 404 problem is real, and worse for AI referrals

    Page not found – Tygart Media is our #7 most-viewed page title in 29 days at 37 pageviews. Some of that is the expected noise of a site that has been through at least one URL restructure — the -2 and -3 suffixed slugs in the data (/anthropic-founders-2, /anthropic-ipo-2, /history-of-anthropic-2) suggest a prior rewrite. But some of it is almost certainly AI assistants citing URLs that no longer resolve.

    That is the single worst trust loop to leave open. The LLM does not know the URL is broken. It will keep citing it. Every 404 from an AI referral is a reader who was told by Claude that we had the answer, clicked through, and got a broken page. Fixing the 37 should be the highest-ROI hour of SEO work on our calendar this week.
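    A first draft of the redirect map can be generated rather than hand-built. The sketch below rests on an assumption, not something the data confirms: that a broken path and its live counterpart differ only by a numeric suffix such as -2 or -3. baseSlug and suggestRedirects are hypothetical names, and every suggestion should be reviewed by hand before a 301 ships.

```javascript
// Strip a trailing "-2", "-3", ... and any trailing slash.
// Caveat: this also strips legitimate numeric endings (e.g. "/top-10").
const baseSlug = (path) => path.replace(/-\d+\/?$/, '').replace(/\/$/, '');

// For each broken path, suggest the live path that shares its base slug.
function suggestRedirects(brokenPaths, livePaths) {
  const liveByBase = new Map(livePaths.map(p => [baseSlug(p), p]));
  const redirects = {};
  for (const broken of brokenPaths) {
    const target = liveByBase.get(baseSlug(broken));
    if (target && target !== broken) redirects[broken] = target;
  }
  return redirects;
}
```

    Paths with no match are left out of the result, which gives a short list of 404s that still need a manual decision.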

    Concentration risk: one page is carrying the site

    /claude-student-discount accounted for 84 of our 2,007 total pageviews in 29 days — roughly 4% of all views on a single URL, and almost 12% when you include everyone who landed on it through any source. It is also the single page cited by all three major LLMs (27 combined sessions from Claude, ChatGPT, and Perplexity). It is both our crown jewel and our single point of failure.

    If Anthropic changes their student policy, or a competitor sherlocks the page with a better answer, we lose a material share of total traffic overnight. The response is not to panic, it is to diversify. The structural template that makes that page cite-worthy — narrow topic, answer-first, scannable facts — is repeatable. We need three to five more pages shaped exactly like it.

    A real-time snapshot that says everything

    While the agent was running the reports, it pulled the real-time view. Two active users were on the site. One was reading /claude-code-vs-aider, a comparison piece. One was bouncing between /selling-into-general-contractors and /selling-into-property-managers, two B2B restoration pages. One landed on a 404. Three verticals, three intents, one broken link — our whole site compressed into thirty minutes.

    The short version

    We have built a site that AI models like more than humans stick around for. The acquisition side is working. The retention side barely exists. The AI-citation layer is the most interesting asset we have, and it is sitting on top of a reader experience that converts at approximately zero. Close that gap and this turns into a real publication. Leave it open and we are running a very sophisticated funnel that leaks at the bottom. Publishing this publicly is the accountability move — we will update these numbers in 60 days.

    The fix, as a list

    • Instrument the site properly. Add GA4 events for newsletter_signup, cta_click, outbound_click, and scroll depth at 25 / 50 / 75 / 100%. Mark at least one as a key event. Right now we are flying blind past page-load.
    • Redirect the 404s. Pull the 37 broken-page pageviews, map each to the closest live URL, and push 301s. This is the single highest-ROI hour of SEO work available this week, and it specifically repairs the AI-citation trust loop.
    • Install a visible capture mechanism on every article. Sticky footer subscribe, mid-article inline form, or both. Pick one default format and ship it across every Claude-desk post first. Without a capture, every AI referral stays a stranger forever.
    • Add a “Related Claude posts” rail to every Claude article. Pages-per-session of 1.21 means the rest of the content library might as well not exist to any given reader. The homepage is the only page on the site that moves people inward. Rebuild article templates to behave the same way.
    • Treat /claude-student-discount and /anthropic-console like crown jewels. Keep them ruthlessly updated. Add FAQ schema. Add explicit Q&A blocks. Keep them in the LLM answer set.
    • Diversify the AI-citation base. Ship three to five new pages in the exact structural template of /claude-student-discount. Narrow, answer-first, scannable. Kill the concentration risk.
    • Consolidate the Cowork cluster. Fifteen pages, near-zero engagement, near-zero AI citations. Collapse to two or three flagships and redirect the rest.
    • Audit the Managed Agents pricing title mismatch. 68 path views, 39 title views. Something is rendering or logging inconsistently and it is worth a ten-minute investigation.
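    The scroll-depth instrumentation from the first item above could be wired up roughly as follows. This is a sketch, not the site's implementation: it assumes gtag.js is already loaded on the page, and the event name scroll_depth and its percent parameter are made-up custom names. newlyCrossed is a hypothetical helper kept pure so it can be tested outside the browser.

```javascript
const THRESHOLDS = [25, 50, 75, 100];

// Thresholds newly crossed when the deepest scroll moves from prevPct to currentPct.
function newlyCrossed(prevPct, currentPct) {
  return THRESHOLDS.filter(t => prevPct < t && currentPct >= t);
}

// Browser wiring; skipped when there is no window or gtag (e.g. under Node).
if (typeof window !== 'undefined' && typeof window.gtag === 'function') {
  let maxPct = 0;
  window.addEventListener('scroll', () => {
    const doc = document.documentElement;
    const scrollable = doc.scrollHeight - window.innerHeight;
    // Pages shorter than the viewport count as fully read.
    const pct = scrollable > 0
      ? Math.min(100, Math.round((window.scrollY / scrollable) * 100))
      : 100;
    for (const t of newlyCrossed(maxPct, pct)) {
      window.gtag('event', 'scroll_depth', { percent: t }); // custom event name (assumption)
    }
    maxPct = Math.max(maxPct, pct);
  }, { passive: true });
}
```

    Tracking only the deepest point reached means each threshold fires at most once per page view, so scrolling back up and down does not inflate the counts.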

    Frequently asked questions

    What is a healthy returning-visitor rate for a media site?

    Most established publications see 25–40% returning visitors. tygartmedia.com currently runs at roughly 0.14%, which is essentially zero. The gap is not content quality — it is the absence of a capture mechanism to turn first-time readers into known subscribers.

    What percentage of page views should scroll to the bottom?

    The GA4 default scroll event fires at 90% page depth. Healthy content sites see 25–35% of users reach that threshold. tygartmedia.com is at 8.6%, which means either the pages are too long for the intent readers arrive with, or a significant share of the traffic is non-human.

    How do you separate real readers from bots in GA4?

    The cleanest in-account signal is the user_engagement event. GA4 only fires it after roughly ten seconds of focused foreground time on the page. Dividing engaged users by total users gives you a rough “real human reader” estimate. On tygartmedia.com that ratio is 48%, so the real monthly audience is closer to ~675 readers than the reported 1,407.
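
    As a quick check on that arithmetic, the estimate is simply reported users multiplied by the engaged share. A minimal Python sketch using the two figures above; the function name is illustrative and not part of any GA4 API:

```python
def estimated_human_readers(total_users: int, engaged_ratio: float) -> int:
    """Scale reported users by the share that triggered user_engagement."""
    return round(total_users * engaged_ratio)


# Figures quoted above: 1,407 reported users, 48% engaged.
print(estimated_human_readers(1407, 0.48))
```

    Swap in your own totals to get the same rough estimate for any GA4 property.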

    Why do 404 pages matter more when AI assistants are citing you?

    Because the LLM cannot tell when a URL goes dead. Once Claude, ChatGPT, or Perplexity has indexed a citation URL, it will keep recommending that URL to readers even after the page is moved or deleted. Every 404 from an AI referral is a permanently broken trust loop until the URL is restored or redirected.

    Why does a single crown-jewel page create concentration risk?

    When one URL is responsible for a double-digit share of total traffic and is the only page cited across multiple AI models, any change in the underlying topic — a policy shift by the product being covered, a competitor publishing a better page — can erase that traffic in a single week. The mitigation is to build multiple pages in the same structural template so citation volume is spread across several URLs rather than concentrated in one.

    What comes next

    The browser agent that dug all of this out is the same one we are turning into a repeatable audit any publisher can run against their own GA4. Parts 1, 2, and 3 together are the first real case study of what that audit looks like. The acquisition playbook is now documented. The retention fix is the next sixty days of work. We will publish the follow-up numbers when the fixes have had a chance to work — or not.

    If you want the catch-up: Part 1 — the AI-referral loop and Part 2 — the per-model citation playbook.

  • SEO is Dead, Long Live ‘Source-Worthy’ Content (SGE Reality Check)

    SEO is Dead, Long Live ‘Source-Worthy’ Content (SGE Reality Check)

    The Search Landscape of May 2026: Stop Chasing Traffic, Start Chasing Citations

    The transition is complete. As of this month, Google’s AI Overviews (formerly SGE) appear for over 52% of all search queries. If you are looking at your Search Console and seeing a 30% drop in informational traffic compared to last year, you aren’t alone. You’re simply seeing the result of the “Zero-Click” era reaching its final form. For digital agency owners and systems architects, the old SEO playbook is a liability. If you are still optimizing for clicks on “What is…” or “How to…” keywords, you are effectively donating your intellectual property to train a model that will replace your visit.

    The currency of search has shifted. We have moved from the era of link equity to the era of Source-Worthy Content. In this new reality, the goal isn’t to get the user to click through to read a basic definition; it is to ensure that your data, your unique perspective, or your proprietary methodology is the primary source cited by the Retrieval-Augmented Generation (RAG) systems powering Google, Perplexity, and OpenAI.

    The Numbers Don’t Lie: The Death of the Click

    By mid-2026, the data across our portfolio is clear. Informational query traffic—the top-of-funnel “educational” content that used to drive massive awareness—has cratered by 20-40% across most B2B and technical sectors. Users are getting their answers directly in the search interface. They don’t need to visit your site to learn “how to configure a headless CMS” if Gemini can pull the five essential steps from your documentation and present them in a neat bulleted list.

    However, while traffic is down, the value of a single citation within an AI Overview has skyrocketed. We’ve found that being the primary citation in a RAG-driven answer drives higher-intent leads than the old-school organic #1 spot ever did. The users who do click through from an AI Overview have already been pre-qualified by the AI. They aren’t looking for a definition; they are looking for the operator who provided the insight. Optimizing for AI overviews is no longer a side project; it is the core of technical SEO.

    Understanding RAG: How Google Picks Its Sources

    To win in 2026, you have to understand the mechanics of Retrieval-Augmented Generation. Google’s AI isn’t just “hallucinating” answers based on its training data; it is actively searching the live web, retrieving specific “chunks” of information, and then synthesizing those chunks into a response. Shaping your content for that retrieve-then-synthesize pipeline is RAG optimization.

    When an AI Overview is generated, Google’s system follows a three-step process:

    1. Retrieval: It identifies the top-ranking traditional search results for the query. (This is why maintaining traditional page-one rankings is still a prerequisite for being a source).
    2. Selection: It selects specific paragraphs, data tables, or unique insights from those top results that best satisfy the user’s intent.
    3. Generation: It rewrites those insights into a cohesive answer, adding citations to the sources it used.

    If your content is generic—if it says exactly what every other site says—the AI will synthesize the answer without citing you specifically, or it will cite a larger authority (like Wikipedia or a massive news outlet) that says the same thing. To be cited, your content must be source-worthy. It must provide something the AI cannot find elsewhere or synthesize from common knowledge.

    Why Generic Content is Erased by AI

    The era of “skyscraper” content—taking ten existing articles and making a longer one—is over. AI is better at that than you are. In fact, most of that generic content is now being flagged by LLMs as “low information gain.”

    When we audit a site using the Gemini CLI, we look for “Information Gain” scores. If a paragraph doesn’t offer a new data point, a specific case study result, or a unique operator’s perspective, it’s invisible to the RAG process. Generic advice like “SEO requires good keywords” is discarded. Specific advice like “We saw a 12% lift in RAG citations by moving from 1,000-word articles to 400-word modular content blocks” is source-worthy.

    The LLM wants to cite the originator. If you are just a curator, you are a middleman that the AI has successfully bypassed.

    The ‘Source-Worthy’ SEO Framework

    At Tygart Media, we’ve pivoted our Agency Playbook to focus on four pillars of source-worthy SEO. This is how we ensure our clients remain the “source of truth” in an AI-dominated search engine.

    1. Proprietary Data and “Proof of Work”

    The AI cannot hallucinate your internal data (yet). Original surveys, technical benchmarks, and project post-mortems are the most cited pieces of content in 2026. If you run a test on a new deployment pipeline and publish the raw numbers, Google’s AI Overview will cite your specific numbers. We’ve moved away from “opinion pieces” and toward “experiment logs.” Every article should contain at least one table or chart of data that didn’t exist on the internet before you published it.

    2. The Operator’s Perspective (E-E-A-T)

    Experience and Expertise are now the primary filters for RAG selection. Google is prioritizing content that shows “Proof of Effort.” Use first-person accounts. Instead of writing “How to use Claude Code,” write “What we learned after 500 hours using Claude Code to refactor a legacy Python monolith.” The specific failures and technical hurdles you describe are unique identifiers that the AI recognizes as authoritative.

    3. Modular Content Architecture

    Long-form, sprawling articles are difficult for RAG systems to “chunk” effectively. We are now building content in modular blocks. Each section of an article is designed to stand alone as a complete answer to a sub-query. We use <section> tags and specific ID attributes to make it easy for the crawler to identify and retrieve the exact block it needs. This is optimizing for AI overviews by making your content “consumable” for machines, not just humans.

    4. Structured Data for RAG

    Schema.org hasn’t gone away; it has become the metadata for AI. We use Dataset, HowTo, and Review schema more aggressively than ever. But more importantly, we are using Gemini CLI to auto-generate JSON-LD that specifically maps out the “Claims” made in our articles. By explicitly stating “Our claim: Informational traffic is down 30%,” we make it easier for the AI to attribute that fact to us.
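
    For illustration, here is roughly what a generated Dataset block could look like. This is a plain-Python sketch, not the Gemini CLI workflow described above; the dataset name, description, and publisher values are placeholders, and only standard schema.org Dataset properties are used:

```python
import json


def dataset_jsonld(name: str, description: str, publisher: str) -> str:
    """Return a <script> tag holding schema.org Dataset markup as JSON-LD."""
    payload = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "creator": {"@type": "Organization", "name": publisher},
    }
    body = json.dumps(payload, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values for illustration only.
print(dataset_jsonld(
    name="Informational traffic change, year over year",
    description="Example description of how the figures were collected.",
    publisher="Example Agency",
))
```

    The returned string can be pasted into the page head or emitted by a template; validate it with a structured-data testing tool before shipping.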

    Technical Execution: Modular E-E-A-T and Gemini CLI

    The workflow for a modern agency operator involves high-level automation. We don’t manually audit 500 pages for “source-worthiness.” We use tools like Claude Code and Gemini CLI to process our content libraries.

    Our current stack for RAG optimization looks like this:

    • Analysis: We pipe our top-performing URLs through a script that uses the Gemini API to compare our content against the current AI Overview for that keyword. The script identifies “content gaps”—information the AI is providing that isn’t on our page, or information we have that the AI is ignoring.
    • Refactoring: If a page is losing traffic but has high “Source Worthiness,” we use Claude Code to refactor the HTML into a more modular structure, adding Dataset schema to any tables.
    • Validation: We use Antigravity to simulate how a RAG system would “chunk” the page. If the chunks are incoherent, we rewrite the headers to be more explicit.
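
    The chunking check in the last step can be approximated without special tooling. The sketch below is an assumption about what such a simulation might look like, not a description of how Antigravity or any production RAG system chunks pages: it treats each <section id="..."> as one chunk and returns its text so each block can be read in isolation. Nested sections are not handled.

```python
from html.parser import HTMLParser


class SectionChunker(HTMLParser):
    """Collect the visible text of each <section id="..."> as one chunk."""

    def __init__(self):
        super().__init__()
        self.chunks = {}       # section id -> accumulated text
        self._current = None   # id of the section currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self._current = dict(attrs).get("id")
            if self._current is not None:
                self.chunks[self._current] = ""

    def handle_endtag(self, tag):
        if tag == "section":
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.chunks[self._current] += data + " "


def chunk_sections(html: str) -> dict:
    """Return {section id: whitespace-normalized text} for a page."""
    parser = SectionChunker()
    parser.feed(html)
    parser.close()
    return {key: " ".join(text.split()) for key, text in parser.chunks.items()}
```

    If a chunk only makes sense after reading the section before it, that is the signal to rewrite its header or opening sentence.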

    One failure we saw early in 2026 was attempting to “game” the AI by over-optimizing for specific keywords. The AI sees through keyword density. It is looking for semantic weight. When we tried to force-feed keywords, our RAG citation rate dropped. When we focused on “operator-restrained” technical clarity, the citations returned.

    Case Study: The 40% Traffic Drop and the 15% Lead Increase

    We recently worked with a systems architecture firm that saw their organic traffic from “cloud migration tips” fall by 40% after the May 2026 Google AI Overviews (SGE) rollout. Initially, there was panic. However, upon closer inspection, their “Request a Consultation” conversions were actually up by 15%.

    What happened? Their generic “tips” were being swallowed by the AI Overview. But the AI Overview was citing their specific “Cloud Migration Cost Calculator” and their “2025 Migration Failure Report.” The traffic they lost was the “looky-loos” who just wanted a quick tip. The traffic they gained (via the AI citations) was from CTOs who saw their specific data cited as the authority and clicked through to hire them. This is the shift from “volume” to “value.”

    Action Plan: What You’d Do Tomorrow

    If you are managing a content library or an agency portfolio, don’t wait for your traffic to hit zero. Start the pivot to source-worthy SEO immediately. Here is the operator’s checklist for tomorrow morning:

    1. Audit for “What is” Content: Use your preferred crawler to identify every page that targets a purely informational, definitional keyword. These are your “donor” pages. Decide whether to delete them, consolidate them, or upgrade them with proprietary data.
    2. Inject Original Data: Find three pieces of internal data—even if they are small—and add them to your top 10 most important pages. Use tables. Add a “Methodology” section.
    3. Modularize Your Headers: Ensure every H3 in your articles can stand alone as a question and every following paragraph as a direct, concise answer. Remove the “fluff” and the “introductory transitions.” The AI doesn’t need an “In this section, we will explore…” lead-in. It needs the facts.
    4. Verify Citations: Perform a manual search for your primary keywords. Look at the AI Overview. If you are ranking #1-3 in organic but aren’t cited in the AI response, your content isn’t “Source-Worthy.” It’s too generic. Rewrite the top-ranking paragraph to offer a unique, data-backed perspective that the AI is currently missing.
    5. Update Your Schema: Move beyond basic Article schema. Implement Speakable, Dataset, and ClaimReview schema where applicable. Use a tool like Gemini CLI to automate the generation of these blocks based on your existing text.

    SEO isn’t dead; the middleman is dead. The search engine of 2026 doesn’t want to send users to a website; it wants to provide an answer. Your job is to be the only source that the answer cannot exist without. Build for the machine, provide for the human, and protect your intellectual property by making it too specific to be ignored.

  • How to Get Cited in ChatGPT Search in 2026: The Bing Index, OAI-SearchBot, and the 15% Citation Cliff

    How to Get Cited in ChatGPT Search in 2026: The Bing Index, OAI-SearchBot, and the 15% Citation Cliff

    ChatGPT Search cites 15% of the pages it retrieves. The other 85% get pulled into the model’s context window, evaluated, and silently discarded — no visibility, no referral, no trace. If you are doing GEO work and your pages keep getting retrieved but never quoted, you are losing at the second filter, not the first.

    This is the 2026 implementation guide for surviving both filters: getting retrieved by ChatGPT Search, then getting cited once you are there.

    How ChatGPT Search Actually Builds an Answer

    ChatGPT Search runs a three-stage pipeline. Each stage kills most candidates.

    1. Retrieval — ChatGPT Search is powered by Bing’s index for real-time web retrieval. Seer Interactive’s analysis found 87% of SearchGPT citations match Bing’s top results, with the bulk in positions one through ten and a long tail in positions eleven through twenty. AirOps research separately put ChatGPT-to-Bing overlap at 73%. If you are not in Bing’s top 20 for a query, you almost certainly are not in ChatGPT’s candidate set.
    2. Crawlability check — OpenAI’s OAI-SearchBot is the user agent that builds the index used for ChatGPT’s search features. It is separate from GPTBot (training) and ChatGPT-User (browsing). Block OAI-SearchBot in robots.txt and you remove yourself from ChatGPT Search entirely, even if Bing has you ranked.
    3. Citation selection — Of the pages retrieved, AirOps found ChatGPT cites only 15%. The model picks what to quote based on structure, freshness, authority signals, and whether the page directly answers the query.

    Step 1: Verify You Are Indexed by Bing

    Most sites optimized for Google have never logged into Bing Webmaster Tools. Fix that first. Three checks before anything else:

    • site:yourdomain.com in Bing — confirms basic indexing.
    • Bing Webmaster Tools → URL Inspection — confirms the specific pages you want cited are indexed and have no crawl errors.
    • Bing rankings for your target queries — if you are not in the top 20 in Bing, ChatGPT will not see you.

    If pages are missing, submit a sitemap via Bing Webmaster Tools and request URL inspection on any priority page. Bing typically reflects changes within 24–72 hours, faster than Google.

    Step 2: Allow OAI-SearchBot in robots.txt

    The single most-skipped step in GEO work. Add this block to your robots.txt:

    # Allow ChatGPT Search to retrieve and cite this site
    User-agent: OAI-SearchBot
    Allow: /
    
    # Optional: allow on-demand browsing for ChatGPT users
    User-agent: ChatGPT-User
    Allow: /
    
    # Optional: block training crawler if you want retrieval without training
    User-agent: GPTBot
    Disallow: /

    OpenAI publishes these three user agents and treats each independently. You can allow OAI-SearchBot for ChatGPT Search visibility and still disallow GPTBot from using your content for model training. The settings do not conflict. OpenAI’s systems typically recognize robots.txt changes within 24 hours.

    Step 3: Structure Pages for the Citation Filter

    Retrieval is necessary but not sufficient. Once your page is in the candidate set, the model decides whether to quote it. Pages that get quoted share a structural pattern.

    Direct answers in the first 100 words

    ChatGPT cites sources that answer the question fully. Partial answers lose to complete ones. Lead each page with a clean direct-answer paragraph: question implied or stated, answer in the next sentence, supporting detail after. This is the same pattern that wins featured snippets, which is not a coincidence — answer engines and snippet engines reward the same structure.

    JSON-LD schema

    An AirOps study of 548,534 pages found pages with JSON-LD markup posted a 38.5% citation rate versus 32.0% without it. Article, FAQPage, and HowTo schema are the highest-leverage types. Add them.

    Word count: 500–2,000

    Pages between 500 and 2,000 words performed best in the same AirOps study. Pages longer than 5,000 words were cited less often than pages under 500. The likely mechanism is simple: long pages overflow the retrieval context window, and the model defaults to shorter, denser sources it can quote in full.

    Freshness

    Content updated within 30 days received 3.2x more citations than older material. The fix is not faked freshness — it is genuine updates: a new stat, a new case, a corrected claim. Update the date when you update the content, not before.

    Step 4: Build the Authority Layer

    Structure gets you cited once. Authority gets you cited repeatedly. AirOps found sites with over 32,000 referring domains are 3.5x more likely to be cited by ChatGPT than sites with fewer than 200. You do not need 32,000 — you need to be in the upper band of your topical neighborhood.

    ChatGPT’s citation pattern leans heavily on Wikipedia (roughly 48% of top citations in multiple studies) and large news/media properties. The practitioner read on that: ChatGPT favors sources with multi-source third-party validation. Build the kind of citations on the open web that Wikipedia editors accept — peer-reviewed studies, primary sources, named author attribution, transparent methodology.

    Step 5: Track Your Citation Footprint

    You cannot manage what you do not measure. The minimum tracking stack for 2026:

    • Server log monitoring for OAI-SearchBot user agent — confirms OpenAI is actually crawling. If you allowed the bot in robots.txt three weeks ago and there are zero OAI-SearchBot hits in your logs, something is wrong (CDN block, IP firewall, misconfigured allow rule).
    • Manual citation audits — pick 10 priority queries, run them in ChatGPT with the Search toggle on, log which domains get cited. Repeat weekly. A spreadsheet beats no tracking.
    • Bing position tracking — because ChatGPT pulls from the Bing index, Bing rankings are a leading indicator. If your Bing position drops, ChatGPT visibility drops behind it.
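
    The server-log check in the first item can be scripted in a few lines. The sketch below assumes combined-format access logs; the function name and the log path in the usage comment are illustrative, so adjust both to your setup:

```python
import re
from collections import Counter

# Tail of a combined-format log line: request, status, size, referer, user agent.
TAIL = re.compile(r'"\S+ \S+[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"')


def status_counts_for_bot(lines, bot="OAI-SearchBot"):
    """Count responses by HTTP status for requests whose user agent names the bot."""
    counts = Counter()
    for line in lines:
        match = TAIL.search(line)
        if match and bot.lower() in match.group("agent").lower():
            counts[match.group("status")] += 1
    return counts


# Hypothetical usage against an Nginx access log (the path is an assumption):
# with open("/var/log/nginx/access.log") as handle:
#     print(status_counts_for_bot(handle))
```

    An empty result means no OAI-SearchBot hits in that log. A result dominated by 403 or 404 statuses points to a block or a broken path rather than a crawler that never arrived.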

    The Practitioner Summary

    Ranking in ChatGPT in 2026 is not mysterious. It is a four-gate funnel: Bing index → OAI-SearchBot crawl access → retrieval into the candidate set → citation selection. Most sites fail at gate one (not indexed in Bing) or gate two (OAI-SearchBot blocked or not addressed). Sites that clear those two gates and write pages that answer the question fully, with schema and a 500–2,000-word range, will land in the 15% that get quoted.

    Treat ChatGPT Search like a separate search engine that happens to share an index with Bing. Optimize for the index. Allow the crawler. Write the page. The rest follows.

  • Is Anything Actually Fetching Your llms.txt? A Server-Log Verification Method

    Is Anything Actually Fetching Your llms.txt? A Server-Log Verification Method

    You shipped an llms.txt file. You curated the links, you paired it with robots.txt, you validated the format. Now answer the only question that matters: is anything actually requesting it? Most site owners never check — and the data from 2026 suggests the honest answer, for most domains, is “almost nothing.” This is the verification step that turns llms.txt from an act of faith into a measurable signal. Here is how to read your own server logs and find out exactly what is fetching the file you published.

    Why verification matters more than the file itself

    The uncomfortable finding of the last year is that publishing llms.txt and benefiting from llms.txt are two different things. In OtterlyAI’s 90-day crawler study, only 0.1% of AI crawler requests touched /llms.txt at all — 84 requests out of 62,100 total AI bot visits — and the file received far fewer visits than the average content page (OtterlyAI GEO study). As of Q1 2026, no major AI company — OpenAI, Google, Anthropic, Meta, or Mistral — has publicly committed to reading or acting on llms.txt in production systems, though GPTBot does fetch the file occasionally (AEO Engine).

    That does not make the file worthless. It makes measurement the whole game. If you cannot tell whether a crawler ever requested the file, you cannot tell whether your time was wasted, whether a platform quietly started honoring it, or whether your file is returning a silent 404. Verification is the difference between strategy and superstition.

    The five-minute server-log check

    Every fetch of your llms.txt file leaves a row in your access log. The job is to isolate requests to that path, then filter by the user-agents that belong to AI systems. On any server with standard combined-format Apache or Nginx logs, this one-liner does the first pass:

    grep -E "/llms(-full)?\.txt" /var/log/nginx/access.log | \
      grep -E -i "GPTBot|OAI-SearchBot|ChatGPT-User|ClaudeBot|Claude-User|Claude-SearchBot|PerplexityBot|Perplexity-User|Google-Extended|Google-CloudVertexBot|Amazonbot|CCBot|Applebot|meta-externalagent|MistralAI-User|bingbot"

    The first grep narrows to requests for llms.txt or llms-full.txt. The second filters to the known AI crawler user-agent strings documented across 2026 reference work (No Hacks AI User-Agent Landscape 2026; Momentic crawler list). Each surviving line tells you three things: which bot, what time, and the HTTP status code it received.

    That status code is the part people skip. A 200 means the bot got your file. A 404 means you have been congratulating yourself over a file the crawler never actually reached — a misconfigured path, a redirect loop, or a build step that drops the file on deploy. A 301 or 302 means it is being redirected, and not every crawler follows redirects for this path. Read the status column before you read anything else.

    Turn the raw hits into a monthly cadence table

    One grep tells you whether the file is reachable. To know whether anything is changing, you need the same query run on a schedule and counted by bot. Extend the pipeline to a count:

    grep -E "/llms(-full)?\.txt" /var/log/nginx/access.log* | \
      grep -E -i -o "GPTBot|ClaudeBot|PerplexityBot|Google-Extended|bingbot|Amazonbot|CCBot|Applebot" | \
      sort | uniq -c | sort -rn

    This produces a leaderboard of which AI user-agents requested your llms.txt across all retained logs. If your rotated logs are gzip-compressed, swap the first grep for zgrep, since plain grep cannot read them. Capture that number on the first of each month and you have a cadence series. The signal you are watching for is not the absolute count (it will be small) but the direction: a bot that appears for the first time, a bot whose hit count jumps, or a bot that goes silent. Those inflection points are the leading indicators that a platform has changed how it treats the file.

    What you see in the log | What it means | Action
    No requests to /llms.txt at all | File may be unreachable, or simply not yet fetched (both are common) | Request the URL yourself; confirm a clean 200 before assuming neglect
    200 from GPTBot, low frequency | Consistent with reported behavior: GPTBot fetches occasionally | Log the cadence; treat as baseline, not a ranking signal
    404 or 301 on the path | Crawler is not getting the file you think you published | Fix the path/redirect today; this is a silent failure
    A new bot appears month-over-month | A platform may have started fetching the file | Note the date; correlate with any citation or referral changes
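
    If you would rather keep the monthly series in a script than in shell history, the same count can be bucketed by month. A minimal sketch, assuming combined-format logs and root-level /llms.txt and /llms-full.txt paths; the bot list mirrors the count command above and the function name is illustrative:

```python
import re
from collections import Counter
from datetime import datetime

# Assumes combined log format; adjust the pattern if your format differs.
LINE = re.compile(
    r'\[(?P<date>\d{2}/\w{3}/\d{4}):[^\]]*\] '
    r'"\S+ (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
LLMS_PATH = re.compile(r"^/llms(-full)?\.txt")
BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended",
        "bingbot", "Amazonbot", "CCBot", "Applebot"]


def monthly_cadence(lines):
    """Count llms.txt and llms-full.txt fetches per (YYYY-MM, bot)."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if not match or not LLMS_PATH.match(match.group("path")):
            continue
        month = datetime.strptime(match.group("date"), "%d/%b/%Y").strftime("%Y-%m")
        agent = match.group("agent").lower()
        for bot in BOTS:
            if bot.lower() in agent:
                counts[(month, bot)] += 1
                break
    return counts
```

    Sorting the keys gives a month-by-bot series, so a first appearance or a silent month shows up as a new or missing row.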

    Cross-check against your content fetches

    The llms.txt hit count means little in isolation. Compare it against how often the same bots fetch your actual content pages. If GPTBot pulls forty content URLs a day and never touches llms.txt, the file is not part of how that crawler discovers you — your content’s own structure and internal linking are doing the work. The practical monitoring approach documented for 2026 is exactly this: a server-log dashboard built against the major user-agents, watching cadence and path-preference shifts month over month (Digital Applied 30-day log study). The same study notes distinct personalities worth knowing — GPTBot crawls more aggressively than most assume, ClaudeBot is more patient than its volume suggests, and PerplexityBot is quieter than its share-of-voice would predict.

    What to do with the answer

    If your logs show the file is reachable and occasionally fetched, you are in the normal range for 2026 — keep the file current and keep measuring. If they show a 404, you found a real bug that no amount of curation would have fixed. And if they show a brand-new bot starting to request the path, you have spotted a platform behavior change before the blog posts catch up to it. That last case is the entire payoff: the practitioners who read their own logs will know the standard started mattering weeks before the ones who only read about it. Verification is not the boring final step of an llms.txt rollout. On a standard that nobody has formally committed to honoring yet, it is the only step that produces evidence instead of hope.

  • LSAs vs Google Ads vs SEO for Restoration Companies in 2026: The Channel Comparison Vendors Won’t Show You

    LSAs vs Google Ads vs SEO for Restoration Companies in 2026: The Channel Comparison Vendors Won’t Show You

    If you own a restoration company in 2026, your marketing budget is being eaten alive by three channels fighting for the same lead: Google Local Services Ads, Google Search Ads, and SEO. The owners I talk to are spending six figures a year and still can’t tell me, with a straight face, which channel is actually paying them. So let’s settle this with the numbers vendors don’t put in their pitch decks.

    The water damage CPC is the most expensive in home services

    Industry sources report top-of-page bids for the most competitive water damage restoration keywords reaching $200–$250 per click in competitive metros, for terms like “water damage restoration [city].” Average emergency restoration keywords more commonly land in the $40–$100 CPC range depending on geography and time of day. That is not a typo. A single click (not a lead, not a job) can cost more than most contractors charge for a furnace tune-up.

    The reason owners keep paying it is simple. A water mitigation job typically prices in the $3,000–$15,000+ range depending on category and scope. At those ticket sizes, a $300 cost-per-lead and a 25% close rate still pencils out. But “pencils out” is doing a lot of heavy lifting in that sentence — and that’s where most owners stop running the math.

    The three channels, ranked by what they actually do

    Google Local Services Ads (LSA): the most consistent ROI lever right now

    LSA cost-per-lead in restoration is widely reported in the $80–$180 range for water damage, with mold remediation reported between roughly $60 and $250 depending on market. Conversion rates from lead to booked job tend to be reported around the 10–15% range — higher than standard Google Search Ads — because Google charges per qualified phone call or message, not per click.

    The bottom line on LSAs: if you do not have Google Guaranteed status set up and your service area dialed in, this is the first thing you fix this quarter. The catch nobody mentions: Google ended the credit policy for “job type not serviced” and “geo not serviced” disputes in 2025, meaning junk leads now come out of your pocket with no refund pathway. You have to dispute aggressively on the categories Google still credits, or your effective CPL drifts 15–25% higher than the platform number says it is.
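
    To see how uncredited junk leads inflate the real number, divide the platform CPL by the share of leads that are actually valid. A minimal sketch; the function name, the $120 CPL, and the junk shares are illustrative assumptions, not figures from Google:

```python
def effective_cpl(platform_cpl: float, uncredited_junk_share: float) -> float:
    """CPL per valid lead when a share of paid leads is junk and not refunded."""
    if not 0 <= uncredited_junk_share < 1:
        raise ValueError("uncredited_junk_share must be in [0, 1)")
    return platform_cpl / (1 - uncredited_junk_share)


# A hypothetical $120 platform CPL with 13% and 20% uncredited junk leads.
for share in (0.13, 0.20):
    print(f"{share:.0%} junk -> ${effective_cpl(120, share):.2f} per valid lead")
```

    Under this model, an uncredited junk share of roughly 13% to 20% is enough to produce a 15–25% drift above the platform number.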

    Google Search Ads (PPC): the channel you run when you have no other choice

    Average reported cost-per-lead for Google Search Ads in restoration falls in the $150–$400+ range, with the high end concentrated in metros with two or more national franchise advertisers bidding against you. Conversion from click to lead in well-managed accounts typically lands in the 5–10% range. That is roughly half the rate quoted for LSAs, though the LSA figure measures lead-to-job conversion, so the two are not a like-for-like comparison.

    PPC has one thing LSAs don’t: control. You set the keywords, you set the geo, you set the ad copy, you decide whether you want commercial water damage leads or residential mold leads or fire restoration leads. If you are running a multi-location shop or chasing commercial work specifically, you cannot live on LSAs alone — the lead types are too restricted. But if you are a single-location residential operator, every dollar in PPC should be earning its keep against the LSA dollar, and most of the time it isn’t.

    SEO: the long-term asset everyone wants to own and almost nobody finishes building

    Cost-per-lead from established organic rankings is commonly reported in the $75–$150 range, roughly half the reported cost of Google Search Ads and slightly under LSAs. The trade-off is time. Restoration SEO in competitive metros typically takes 12–18 months of consistent investment before it produces meaningful lead flow, with initial signal in 3–6 months for low-competition local terms.

    The honest read: most restoration owners start SEO, get impatient at month four when paid channels are still doing all the work, and either fire the agency or stop publishing content. Then they restart 18 months later with a different vendor and the same outcome. SEO works. It works exactly the way the calendar says it will work. The companies that win with it are the ones who treat it like a 24-month commitment, not a 90-day experiment.

    What the channel mix should actually look like

    For a residential-focused restoration company doing $1M–$5M in revenue, a defensible channel mix in 2026 looks something like this:

    • LSA: 35–45% of paid budget. Highest reported ROI of any paid channel in restoration. Cap is the daily lead volume Google will give you, not the budget.
    • Google Search Ads: 25–35% of paid budget. Covers the lead types LSAs cannot serve — commercial work, specific service lines, and overflow when LSAs hit daily caps. Required for any multi-location shop.
    • SEO and content: 20–30% of total marketing budget. Treat as 18–24 month asset build. Tracked separately from paid CPL because the unit economics only stabilize at month 12+.
    • Referrals and direct outreach: ongoing, no fixed budget. Reported industry-wide as the lowest-CAC channel and the one with the shortest break-even window. Build a plumber/agent/property manager referral program before you spend another dollar on paid ads.

    The split that gets restoration owners in trouble is putting 80% into paid and 20% into “we’ll get to it” SEO. Two years later they are completely dependent on Google’s auction prices, and the auction prices have gone up every year of the last five.

    The metric that actually matters

    Cost-per-lead is the metric every vendor reports. It is the wrong number to optimize for. The number that matters is fully-loaded cost-per-acquired-job, which is CPL divided by your channel-specific close rate, plus the labor cost of the CSR who fielded the call, plus the credit card processing fees on whatever portion of the job is paid out-of-pocket, plus any franchise or TPA fee that applies.
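
    The calculation is simple enough to keep in a script or a spreadsheet. A minimal sketch under stated assumptions: the parameter names are hypothetical, every load is expressed per booked job, and franchise or TPA fees are treated as an added cost. The first call reuses the $300 CPL and 25% close rate from earlier; the per-job loads in the second call are made-up numbers.

```python
def cost_per_acquired_job(
    cpl: float,
    close_rate: float,
    csr_labor_per_job: float = 0.0,
    card_processing_per_job: float = 0.0,
    program_fees_per_job: float = 0.0,
) -> float:
    """Fully loaded acquisition cost for one booked job on a single channel."""
    if not 0 < close_rate <= 1:
        raise ValueError("close_rate must be in (0, 1]")
    lead_cost = cpl / close_rate  # lead spend needed to win one job
    return lead_cost + csr_labor_per_job + card_processing_per_job + program_fees_per_job


# $300 CPL at a 25% close rate, before any loading:
print(cost_per_acquired_job(300, 0.25))
# Same channel with hypothetical per-job loads added:
print(cost_per_acquired_job(300, 0.25, csr_labor_per_job=40,
                            card_processing_per_job=90, program_fees_per_job=150))
```

    Run it once per channel with that channel's own CPL and close rate; the comparison across channels is the number worth putting in front of an owner.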

    Most restoration owners do not have this number for any of their channels. They have CPL from the platform dashboards, they have revenue from the job management software, and the two systems have never talked to each other. Fix that before you change a single bid. The owner who knows their fully-loaded acquired-job cost by channel makes better decisions in five minutes than the owner who doesn’t makes in a quarter.

    The bottom line

    LSAs are the highest-ROI paid channel in restoration in 2026 and should be the first lever you optimize. Google Search Ads are required for any operator chasing commercial work or running multiple locations, but they should never be your largest line item. SEO is the long-term insurance policy against rising auction prices, and the only restoration owners who get the payoff are the ones who treat it like a 24-month commitment and refuse to flinch at month six.

    If you are spending more than $5,000 a month on Google Search Ads and you do not yet have LSAs set up, you are leaving the most profitable channel in restoration on the table. Start there.

    Frequently Asked Questions

    What is the average cost per lead for water damage restoration in 2026?

    Reported cost-per-lead for water damage restoration in 2026 runs roughly $80–$180 on Google Local Services Ads, $150–$400+ on Google Search Ads, and $75–$150 from mature organic SEO. Actual costs vary significantly by metro, competition, and lead-type mix.

    Are Google Local Services Ads better than Google Ads for restoration?

    For most residential restoration operators, LSAs deliver a lower cost-per-lead and a higher reported lead-to-job conversion rate than standard Google Search Ads. LSAs charge per qualified call rather than per click, which is why the ROI tends to be more consistent. Multi-location shops and commercial-focused operators still need Google Search Ads to cover lead types LSAs do not serve.

    How long does SEO take to work for a restoration company?

    Restoration SEO in competitive metros typically takes 12–18 months of consistent investment before it produces meaningful lead flow. Initial ranking signal often appears in 3–6 months for low-competition local terms, but the cost-per-lead advantage versus paid channels only stabilizes after month 12.

    What percentage of a restoration marketing budget should go to paid ads?

    A common defensible split for a residential restoration company in 2026 is roughly 60–70% of total marketing budget on paid channels (LSA + Google Search Ads) and 20–30% on SEO and content, with referral programs running in parallel at minimal incremental cost. Going above 80% paid concentrates risk in the Google auction.

  • GEO Case Studies Teardown: What 5 Published Wins Reveal About Generative Engine Optimization in 2026

    If you want to know whether generative engine optimization actually moves the needle, stop reading think pieces and look at what shipped. The case-study record from 2025 and early 2026 is now thick enough to draw practitioner conclusions: which interventions correlate with citation lift, how fast the curve bends, and what the conversion side of the funnel does once AI traffic shows up. This is a working teardown of the published case studies — what was done, what changed, and what the implementation pattern looks like underneath.

    Case 1: B2B SaaS — 575 to 3,500 AI-referred trials in roughly seven weeks

    A $30M+ ARR B2B SaaS company documented in Digital Agency Network’s GEO case study roundup moved from 575 AI-referred free trials per period to over 3,500 in about seven weeks. The intervention sequence was content restructuring for citability — clear one-sentence definitions at the top of each section, statistics and comparisons rendered as tables rather than buried in prose, and step-by-step frameworks that LLMs can extract verbatim. The first 40–60 words under every H2 carried the answer to that H2’s implicit question.

    The implementation pattern under this win is what matters: the company did not write new articles. It rebuilt existing articles to surface the answer first. That is the cheapest possible GEO intervention — restructure, do not republish.

    Case 2: B2B SaaS — citation rate from 8% to 12% in four weeks

    Discovered Labs documented a B2B SaaS case where ChatGPT citation rate on tracked queries moved from 8% to 12% by week four of an engagement, with the company’s VP of Marketing noting they had been “invisible for 18 months despite solid SEO work.” The 50% relative lift came from the same restructuring pattern plus aggressive entity-binding — explicit company name, product name, and category definition repeated in citation-friendly positions throughout each asset.

    The data point worth carrying: traditional SEO authority does not automatically translate to LLM citation. The two systems read pages differently, and the page-level rewrite is what closes the gap.

    Case 3: CloudEagle — 33 pages optimized, 33% increase in AI citations

    CloudEagle’s published GEO result, cited across multiple 2026 case study summaries including AlphaP’s real-world GEO examples, is one of the simplest volume-to-lift data points in the public record: 33 pages optimized, 33% increase in AI citations. The matching numbers are almost certainly a coincidence, and a single data point is not a dose-response curve, but it points the practitioner in the right direction: GEO is a per-page intervention, and aggregate lift tends to grow with the number of pages you treat. There is no site-wide tag you can flip. Each asset gets its own restructure.

    Case 4: HubSpot — template rebuild, not content rebuild

    HubSpot’s internal AEO case study, published in HubSpot’s own writeup, is the cleanest illustration of the structural fix. HubSpot already ranked for thousands of marketing queries, so the volume was there. The barrier was that answers were buried multiple paragraphs deep in traditional long-form writing. The fix was a template rebuild: every article restructured so the first 40–60 words under each H2 or H3 directly answered the implicit question of that heading.

    This is the playbook to copy. If your site has any existing traffic, restructuring beats writing new content. The audit question is: under every H2 on every page, do the first three sentences answer the question that H2 raises?

    Case 5: Netpeak USA — 120% revenue lift, 693% AI traffic growth

    Stackmatix’s AEO case study compilation documents Netpeak USA’s conversational ecommerce GEO campaign producing +120% revenue and +693% AI traffic growth. The mechanism: product and category pages restructured around buyer questions (“what is the best X for Y?”, “X vs Y comparison”, “how do I choose X?”) with direct, hedged answers up top and detailed reasoning below. The pattern works because AI search engines synthesize buying decisions from extractable answer fragments, and ecommerce pages historically bury the answer under marketing copy.

    The structural pattern under every win

    Read the five cases together and one implementation discipline emerges. Each of these published wins traces back to some combination of the same physical changes to the page:

    1. Answer first. The first 40–60 words under every H2 directly answer the question that heading raises. No setup, no transition paragraph, no scene-setting.
    2. Tables over prose for comparison data. Articles with 15+ data points receive measurably more AI citations than those with fewer than five, per the research synthesized in Marketing LTB’s 2026 GEO statistics roundup. Tables make those data points extractable.
    3. Entity binding. Company name, product name, and category definition explicitly stated in citation-friendly positions, not just implied through context.
    4. Stepwise frameworks. Procedural content rendered as numbered steps that LLMs can extract verbatim into responses.
    5. Citable sources inline. Authoritative external citations placed adjacent to claims, not banished to a references section at the bottom.
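
    The answer-first check in step 1 can be partly mechanized. The sketch below uses only Python’s standard library to pull the first 60 words under each H2 so a human can judge whether they answer the heading. The class and function names are hypothetical, and it assumes reasonably clean HTML; it does not skip script or style text and makes no attempt to score the answer itself.

```python
from html.parser import HTMLParser


class H2Openings(HTMLParser):
    """Collect the first `limit` words of text that follow each <h2>."""

    def __init__(self, limit: int = 60):
        super().__init__()
        self.limit = limit
        self.sections = []   # each entry: [heading_text, list_of_opening_words]
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.sections.append(["", []])

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if not self.sections:
            return           # text before the first H2 is ignored
        heading, words = self.sections[-1]
        if self._in_h2:
            self.sections[-1][0] = heading + data
        elif len(words) < self.limit:
            # Take only as many words as are still needed for this section.
            words.extend(data.split()[: self.limit - len(words)])


def audit(html: str, limit: int = 60):
    """Return (heading, opening_text) pairs for every H2 in the page."""
    parser = H2Openings(limit)
    parser.feed(html)
    parser.close()
    return [(h.strip(), " ".join(w)) for h, w in parser.sections]


sample = """
<h2>What is entity binding?</h2>
<p>Entity binding means stating the company, product, and category by name.</p>
<h2>How long does it take?</h2>
<p>Published cases show movement in about four weeks.</p>
"""
for heading, opening in audit(sample):
    print(f"{heading}\n  {len(opening.split())} words: {opening}\n")
```

    Reading the heading and its opening side by side makes buried answers obvious: if the opening is setup or scene-setting, that section is a rewrite candidate.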

    What the cases do not prove

    The published record has selection bias the size of a building. Every case study you can read is a published win. The agencies and SaaS companies that ran a GEO campaign and got nothing are not writing blog posts about it. Read the cases for the structural patterns, not the percentage lifts — the lifts are a function of starting baseline, vertical, and how invisible the brand was before the intervention.

    Two other limits worth naming. First, conversion-rate claims about AI-referred traffic (“converts at a higher rate than organic” appears in over half of marketer surveys per the 2026 HubSpot State of Marketing report) come from self-reporting, not third-party measurement. The directional point is probably right — qualified intent behind an LLM query — but the magnitude is unverifiable. Second, AI citation rates are measured against the agencies’ own tracked query sets. Those sets are chosen for relevance to the client, which means baseline visibility is artificially low. The 8% → 12% case is real; whether it generalizes to a random query set is unknown.

    What to do tomorrow if you are starting from zero

    Pick ten pages on your site that already rank in positions 4–15 for queries with commercial intent. Open each one. Under every H2, rewrite the first 40–60 words so they directly answer the question that heading raises. Convert any prose comparison into a table. State your company name, product category, and the specific problem you solve in the opening paragraph. Add a sources list with authoritative citations.

    That is the intervention every published GEO case study reduces to. Ten pages, one week of writing work. The case study record suggests you will see citation movement in three to six weeks if the queries you care about already have AI Overview or LLM citation surface area at all. If they do not, the intervention is still right — you are positioning for when they do.

    FAQ

    How long until GEO interventions show measurable lift?

    Published cases show citation movement at the four-week mark (the 8% → 12% B2B SaaS case) and traffic movement at six to eight weeks (the 575 → 3,500 trials case at roughly seven weeks). Three months is the standard window quoted in agency case studies for material citation rate change.

    Does traditional SEO authority help GEO?

    Partially. Pages that already hold featured snippets are disproportionately pulled into Google AI Overviews, per multiple 2026 AEO summaries. But the B2B SaaS case where the company was “invisible for 18 months despite solid SEO work” shows that authority alone does not produce citations — page-level structural changes are the missing ingredient.

    How many pages do I need to optimize before I see results?

    CloudEagle’s case (33 pages → 33% citation lift) is a single data point, so it cannot establish a dose-response curve, but it is consistent with lift growing as more pages are treated. Most published case studies show meaningful aggregate movement starting around 10–30 pages restructured. Below that, you are testing the methodology rather than expecting measurable lift.

    Is the citation rate lift actually translating to revenue?

    The published evidence says yes for ecommerce (Netpeak USA’s +120% revenue) and trial-driven SaaS (the 575 → 3,500 trials case). For brand and consideration-stage content the answer is murkier — AI citations probably influence brand recall and assisted conversion, but the attribution chain to revenue is harder to draw cleanly and the case study record is thin on this slice.

    What is the cheapest GEO intervention with the highest published return?

    Restructuring existing pages that already rank. The HubSpot template rebuild and the 575 → 3,500 trials case both used this approach. No new content, no new authority work, no link building — just rewriting the first 40–60 words under every H2 and converting prose comparisons into tables.

  • How to Measure LLM Visibility in 2026: The GA4 + Response-Side Stack

    Traditional analytics platforms can’t see the most important impression you’re making in 2026. When a user asks ChatGPT, Perplexity, Gemini, or Claude about your category, your brand either shows up in the answer or it doesn’t — and your GA4 dashboard has no idea either way. This is the measurement blind spot at the center of generative engine optimization. If you can’t measure LLM visibility, you can’t optimize for it.

    This guide walks through the measurement stack that actually works in 2026: the GA4 channel grouping that catches AI referral traffic, the manual verification protocol that costs nothing, and the dedicated LLM visibility platforms that automate prompt monitoring at scale. By the end, you’ll have a measurement framework you can run starting today.

    Why GA4 alone is not enough

    Standard web analytics measures what happens after the click. LLM visibility is what happens before the click — or instead of one. According to widely cited industry reporting, a large share of AI search sessions end without the user ever clicking through to a source, which means the brand impression inside the AI response is often the only impression you get. GA4 cannot see that impression. It cannot see when ChatGPT recommends you in a comparison. It cannot see when Perplexity cites your article as a source for an answer.

    You still need GA4 — AI referral traffic is real, growing, and converts well — but you need it as one layer of a two-layer stack. Layer one is referral-side measurement, which captures the users who actually click through from AI platforms. Layer two is response-side measurement, which monitors what AI platforms are saying about you whether anyone clicks or not.

    Layer one: catching AI referrals in GA4

    GA4 does not have a built-in “AI” channel. By default, traffic from ChatGPT, Perplexity, Claude, and Gemini gets bucketed into the generic Referral channel, where it disappears next to social and partner sites. The fix is a custom channel group that uses a referrer regex to peel AI traffic out into its own bucket.

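    In GA4, go to Admin → Data display → Channel groups (older versions of the interface listed this under Data Settings), create a custom channel group, and add a new rule above the default Referral rule. Set the condition to Source matches regex. GA4 treats “matches regex” as a full-string match, so switch to “partially matches regex” if your sources include subdomains such as chat.openai.com. Use a pattern like this: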

    chatgpt\.com|openai\.com|perplexity\.ai|claude\.ai|anthropic\.com|gemini\.google\.com|copilot\.microsoft\.com|deepseek\.com|you\.com|meta\.ai|poe\.com

    The order matters. Your AI Traffic rule must sit above the Referral rule in the priority list, or AI traffic will be captured by Referral first and never reach your custom channel. Once the rule is live, you can build Explorations that segment AI traffic by source, page, conversion rate, and engagement time — and compare that segment against organic, direct, and social.
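
    Before pasting the pattern into GA4, it can help to test it against sample referrers. The sketch below does that with Python’s `re` module, which is not the engine GA4 uses, though this pattern relies only on syntax common to both. The `STRICT` variant and the helper names are assumptions added for illustration, not part of the original pattern.

```python
import re
from urllib.parse import urlparse

# Pattern copied from the text above.
AI_PATTERN = (
    r"chatgpt\.com|openai\.com|perplexity\.ai|claude\.ai|anthropic\.com|"
    r"gemini\.google\.com|copilot\.microsoft\.com|deepseek\.com|you\.com|"
    r"meta\.ai|poe\.com"
)

# Partial match: the pattern may appear anywhere in the hostname.
PARTIAL = re.compile(AI_PATTERN)
# Stricter variant (an assumption): the domain must be the whole hostname
# or a dot-separated suffix of it.
STRICT = re.compile(r"(?:^|\.)(?:" + AI_PATTERN + r")$")


def hostname(referrer_url: str) -> str:
    """Extract the hostname from a referrer URL, or '' if there is none."""
    return urlparse(referrer_url).hostname or ""


def is_ai_referrer(referrer_url: str, strict: bool = True) -> bool:
    """True when the referrer hostname matches one of the listed AI domains."""
    pattern = STRICT if strict else PARTIAL
    return pattern.search(hostname(referrer_url)) is not None


for url in ["https://chatgpt.com/", "https://www.perplexity.ai/search",
            "https://www.google.com/", "https://thankyou.com/offer"]:
    print(url, is_ai_referrer(url, strict=False), is_ai_referrer(url))
```

    The last sample shows why anchoring matters under partial matching: an unanchored `you\.com` also matches any hostname that merely ends in those characters, which would inflate the AI channel.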

    The referrer attribution gap

    One caveat: not every AI click passes a referrer. ChatGPT’s free tier in particular has been reported to strip referrer headers in many configurations, meaning a meaningful share of ChatGPT traffic shows up as Direct in GA4 rather than as a chatgpt.com referral. This is a known limitation across the industry. Treat your AI referral numbers as a floor, not a ceiling, and use response-side monitoring to fill in the gap.

    Layer two: response-side monitoring

    This is the measurement that traditional SEO never needed. You’re no longer just asking “did anyone visit?” — you’re asking “what is the AI saying about me?” There are two ways to answer that question.

    The manual verification protocol

    The free, no-tool approach is a structured query log. Build a list of 15 to 25 prompts that a buyer in your category would realistically type into an AI assistant. Be specific. “Best CRM for small B2B teams” is a prompt. “What is a CRM” is not — that’s a research query, not a buyer query.

    Once a week, run every prompt through each AI platform you care about — typically ChatGPT, Perplexity, Gemini, and Claude — and record three things per query: whether your brand was mentioned, whether your domain was cited as a source, and what position your brand appeared in if it was named alongside competitors. A simple spreadsheet with prompt, date, platform, mention (yes/no), citation (yes/no), and position is enough to start. Week-over-week deltas on this sheet will tell you whether your GEO and AEO work is moving the needle.
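
    If a spreadsheet feels too loose, the same log can be kept as a CSV with a fixed schema. The sketch below is one possible layout using the six columns described above; the class and function names are hypothetical, the date column is called `run_date` only to avoid clashing with Python’s `date` type, and the sample rows are invented.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class Observation:
    prompt: str
    run_date: date
    platform: str
    mention: bool             # brand named anywhere in the response
    citation: bool            # domain cited as a source
    position: Optional[int]   # 1 = first-named brand, None if not named


def write_log(rows, out) -> None:
    """Write observations as CSV with a header row."""
    names = [f.name for f in fields(Observation)]
    writer = csv.DictWriter(out, fieldnames=names)
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))


# Invented sample rows for one weekly run.
rows = [
    Observation("best CRM for small B2B teams", date(2026, 1, 9),
                "ChatGPT", True, False, 3),
    Observation("best CRM for small B2B teams", date(2026, 1, 9),
                "Perplexity", False, False, None),
]
buffer = io.StringIO()   # swap for open("llm_log.csv", "a", newline="") on disk
write_log(rows, buffer)
print(buffer.getvalue())
```

    Keeping one row per prompt, platform, and date is what makes the week-over-week deltas easy to compute later.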

    This is slow and manual, but it’s the only method that gives you ground truth. The dedicated platforms below essentially automate this protocol, running the same kind of prompt log against the same AI engines on a daily schedule. If you’re under $1,000/month in marketing spend, run it manually. If you’re past that, automate it.

    Dedicated LLM visibility platforms

    A new category of tools emerged in 2025 and matured in 2026 specifically to monitor LLM responses. They all do roughly the same thing — run your target prompts daily across multiple AI engines, score visibility, track which sources the AIs cite, and surface competitor gaps — but they segment by price point.

    At the budget end, Otterly.AI offers monitoring plans starting around $29/month, with a Share of AI Voice metric and time-to-first-data of under ten minutes after signup. It’s the simplest entry point for teams that just want a citation-frequency dashboard. In the mid-market, Peec AI starts around €89/month and emphasizes multilingual coverage and actionable recommendations — it doesn’t just tell you you’re invisible, it suggests what to change. At the enterprise tier, Profound starts around $499/month and adds Prompt Volumes, which estimates real AI search demand by topic with demographic breakdowns. SOC 2 compliance and dedicated onboarding generally start at the $1,000+ enterprise tiers across this category.

    Other platforms in active use this year include Semrush’s AI Toolkit, SE Ranking’s SE Visible, Goodie AI, Rankscale, Nightwatch, AirOps, and Searchable. The category is moving fast — pricing and features change quarterly — so verify the current state of any platform before committing.

    The six KPIs to track

    Whatever measurement stack you use, the same handful of metrics will tell you whether GEO is working. Organize them into leading and lagging indicators:

    Leading indicators (response-side, change first):

    • Mention Rate — the percentage of monitored prompts where AI responses mention your brand name. This is the broadest signal.
    • Citation Rate — the percentage of monitored prompts where your domain is cited as a source, not just named. Citation is stronger than mention because it implies the AI is treating your content as authoritative.
    • Position — when your brand is named alongside competitors, where in the list does it appear. First-named brands get disproportionate attention.

    Lagging indicators (referral and revenue-side, change later):

    • AI Referral Sessions — total sessions from your AI Traffic channel group in GA4.
    • AI Referral Engagement — engagement rate and average engagement time for the AI segment, compared to organic. Strong AI referral traffic typically engages longer because the user arrived with intent already framed by the AI.
    • AI-Influenced Conversions — conversions where AI was part of the attribution path, even if not the last touch.

    Leading indicators move first because content changes affect what AIs say within days to weeks. Lagging indicators trail because they require enough traffic to be statistically meaningful, which can take a quarter or more to develop.
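
    The three response-side metrics are simple ratios over one run of the prompt set. The sketch below shows one way to compute them, plus the relative change between two runs. The names and sample data are hypothetical, and averaging position only over prompts where the brand was named is a design assumption rather than a standard definition.

```python
from statistics import mean
from typing import NamedTuple, Optional


class Result(NamedTuple):
    prompt: str
    mention: bool             # brand named in the response
    citation: bool            # domain cited as a source
    position: Optional[int]   # rank among named brands, None if absent


def leading_indicators(results):
    """Return mention rate, citation rate, and mean position for one run."""
    total = len(results)
    if total == 0:
        raise ValueError("no results to score")
    positions = [r.position for r in results if r.position is not None]
    return {
        "mention_rate": sum(r.mention for r in results) / total,
        "citation_rate": sum(r.citation for r in results) / total,
        "mean_position": mean(positions) if positions else None,
    }


def relative_lift(before: float, after: float) -> float:
    """Relative change between two rates, as a fraction of the earlier rate."""
    if before == 0:
        raise ValueError("relative lift is undefined for a zero baseline")
    return (after - before) / before


# Invented results for a four-prompt run.
run = [
    Result("prompt a", True, True, 1),
    Result("prompt b", True, False, 3),
    Result("prompt c", False, False, None),
    Result("prompt d", False, False, None),
]
print(leading_indicators(run))
print(round(relative_lift(0.08, 0.12), 2))
```

    Reporting relative lift alongside the raw rates keeps small baselines honest: a move from 8% to 12% is a large relative change but still means most monitored prompts do not cite you.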

    The minimum viable setup

    If you do nothing else this week, do these three things:

    1. Add the AI Traffic channel group to GA4 using the regex above and move it above Referral in priority.
    2. Build a 15-prompt spreadsheet of buyer-intent queries for your category and run them once across ChatGPT, Perplexity, Gemini, and Claude. Record mention, citation, and position.
    3. Set a calendar reminder to repeat step two every Friday for four weeks. After four weeks you’ll have a real trendline.

    That setup costs nothing and produces the measurement layer that lets you tell whether your GEO, AEO, and llms.txt work is actually compounding or whether you’re guessing. Once the trendline is stable, evaluate whether automating with Otterly, Peec, or Profound is worth the spend. For most operators, the manual protocol gets you 80% of the insight at 0% of the budget.

    Frequently Asked Questions

    What is LLM visibility?

    LLM visibility is the measurement of how often, and how prominently, a brand or website appears in responses generated by large language models like ChatGPT, Perplexity, Gemini, and Claude. It is the response-side counterpart to traditional search ranking — instead of measuring where you appear in a results page, you’re measuring whether AI assistants mention or cite you when answering questions in your category.

    Can GA4 track AI traffic from ChatGPT and Perplexity?

    GA4 can track AI referral clicks if you create a custom channel group with a referrer regex matching AI domains and place it above the default Referral rule. It cannot track impressions inside AI responses where the user doesn’t click through, and ChatGPT’s free tier often strips referrers entirely, so a portion of AI traffic still lands in Direct. Treat GA4 numbers as a floor.

    What is the difference between mention rate and citation rate?

    Mention rate measures the percentage of monitored AI prompts where your brand name appears anywhere in the response. Citation rate measures the percentage where your specific domain or URL is referenced as a source. Citation is a stronger signal because it indicates the AI is treating your content as authoritative, not just naming you in passing.

    Which LLM visibility tool should I use in 2026?

    For budget-conscious teams, Otterly.AI starts around $29/month and gets you to first data in minutes. For mid-market needs with multilingual coverage and recommendations, Peec AI starts around €89/month. For enterprise teams that need prompt-volume demand data and SOC 2 compliance, Profound starts around $499/month. Verify current pricing before purchasing — the category moves quickly.

    How often should I check my LLM visibility?

    For manual tracking, weekly is the right cadence — frequent enough to catch movement, infrequent enough to avoid noise. Dedicated platforms typically run automated checks daily and let you review weekly. Don’t expect day-to-day stability; AI responses have inherent variance, so look at week-over-week and month-over-month trends rather than single data points.