    How to Measure LLM Visibility: The Complete Tracking Stack for 2026

    Most SEO teams know they need to care about AI search. Almost none of them have a measurement system in place for it. That’s the gap this article closes.

    Ranking in ChatGPT, Perplexity, Google AI Overviews, or Claude isn’t a vanity metric anymore — it’s a traffic channel. But unlike Google, AI systems don’t serve a results page you can screenshot. They weave citations into prose. Your brand either shows up in that prose or it doesn’t, and if you’re only watching GA4’s built-in channel reports, you’re flying mostly blind.

    This is a practitioner’s setup guide: the exact metrics, GA4 configuration, and tool stack needed to track LLM visibility systematically.

    The Five Metrics That Define LLM Visibility

    Traditional SEO tracks ranking position, impressions, and clicks. None of those exist in AI search. You need a new metric set:

    Citation frequency — How often your domain or brand is mentioned in AI-generated answers for your target query set. LLMs typically cite 2–7 sources per response. Capturing one of those slots consistently is the entire game.

    Prompt coverage — Out of your tracked prompt library, what percentage of prompts return your brand at all? Calculate it as: (prompts where you appear ÷ total tracked prompts) × 100. A brand actively optimizing for AI search should be above 40% coverage on tier-1 prompts within 90 days of focused content work.

    Share of voice — For a given topic cluster, how often do AI answers cite you versus competitors? If you appear in 12 of 30 tested prompts (40% coverage) and a competitor appears in 20 of 30 (67%), they hold the larger share of voice on that topic. That ratio is more strategically meaningful than any single citation count.

    AI referral sessions — The sessions in GA4 that actually arrived from an AI platform with a usable referrer header. This is the only metric that ties visibility to business outcomes. Setup is covered in the next section.

    Conversion quality from AI traffic — AI-referred visitors behave differently from organic search visitors. They arrive with higher intent (they asked a specific question and your site was the answer). Track engagement rate, pages per session, and goal completions for AI referral sessions separately. If this cohort converts at 2–3× the rate of your organic traffic — which early data from practitioners suggests — it changes how you think about GEO investment.
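
    The first three metrics reduce to simple arithmetic over whatever log you keep. Here is a minimal sketch of the coverage and share-of-voice math in Python, assuming a per-prompt record of which brands each AI answer cited (the brand names and record format are illustrative, not from any tool):

    # Each entry: the set of brands cited in the AI answer for one tracked prompt.
    results = [
        {"yourbrand.com", "rival.com"},
        {"rival.com"},
        {"yourbrand.com"},
        set(),  # answer cited no relevant source
    ]

    def prompt_coverage(results, brand):
        """Percent of tracked prompts where the brand appears at all."""
        hits = sum(1 for cited in results if brand in cited)
        return 100 * hits / len(results)

    print(prompt_coverage(results, "yourbrand.com"))  # 50.0 -> you appear in 2 of 4
    print(prompt_coverage(results, "rival.com"))      # 50.0 -> competitor's coverage, for share of voice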

    Setting Up GA4 to Capture AI Traffic: The Regex You Need

    Out of the box, GA4 misclassifies most AI referral traffic. ChatGPT sessions land in “Referral.” Perplexity sessions land in “Referral.” Claude.ai sessions may land in “Direct.” Without a custom channel group, you have no way to isolate or trend this traffic.

    In GA4: Admin → Data Display → Channel Groups → Create New Channel Group

    Name it “AI Search” and configure the rule:

    • Condition type: Session source
    • Match type: matches regex
    • Pattern (copy exactly):
    ^(chatgpt\.com|openai\.com|chat\.openai\.com|perplexity\.ai|claude\.ai|anthropic\.com|gemini\.google\.com|bard\.google\.com|copilot\.microsoft\.com|deepseek\.com|grok\.com|you\.com|poe\.com|meta\.ai)$

    Note: GA4's session source dimension carries only the referring domain, never a URL path, so a path-based entry such as bing.com/chat can never match an anchored pattern. Copilot referrals arrive as copilot.microsoft.com, which the pattern already covers.

    Critical step: Place the “AI Search” channel above “Referral” in your channel list. GA4 processes channel rules top-to-bottom — if Referral appears first, every AI referral will match Referral before ever reaching your AI channel definition. This is the single most common setup mistake.
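
    Before relying on the channel, it is worth sanity-checking the pattern against the source strings you expect to see. A quick throwaway check in Python (the sample values are illustrative):

    import re

    AI_SOURCES = re.compile(
        r"^(chatgpt\.com|openai\.com|chat\.openai\.com|perplexity\.ai|claude\.ai"
        r"|anthropic\.com|gemini\.google\.com|bard\.google\.com|copilot\.microsoft\.com"
        r"|deepseek\.com|grok\.com|you\.com|poe\.com|meta\.ai)$"
    )

    # Sample session source values as GA4 would report them.
    for source in ["chatgpt.com", "perplexity.ai", "copilot.microsoft.com", "google.com"]:
        print(source, "->", bool(AI_SOURCES.match(source)))
    # google.com -> False: ordinary search referrals stay out of the AI channel.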

    One important caveat on scope: approximately 70% of AI-originated visits arrive without a referrer header. OpenAI’s iOS app, private browsing mode, and in-app browsers all strip referrer data before the request reaches your server. This means your “AI Search” channel in GA4 is capturing the visible minority — the sessions where the referrer was preserved. Don’t benchmark by absolute volume. Benchmark by growth rate. If your AI Search channel is growing month-over-month while overall Direct traffic is stable, your citation presence is expanding.

    To supplement GA4 attribution, add a self-reported source question to high-intent forms: “How did you find us?” Include “ChatGPT / AI assistant” as an option. This provides ground truth that session data alone cannot.

    The Tool Tier: Free to Enterprise

    The LLM visibility tool market matured significantly through 2025 and into 2026. Three tiers have emerged, and most independent publishers and agencies should start at the first tier before paying for anything.

    Free / DIY layer — start here

    Run 20 representative prompts manually across ChatGPT, Perplexity, Claude, and Google AI Overviews each month. Record mentions in a spreadsheet: cited (yes/no), cited with link (yes/no), competitor named instead. This gives you baseline prompt coverage and share of voice data with zero budget. Do this for at least one month before paying for any tool — you’ll understand your own citation patterns much better and know exactly what problem you’re trying to solve with a paid platform.
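
    One way to structure that spreadsheet so it stays machine-readable later (the columns are a suggestion, and the prompt and domain values are placeholders):

    date,platform,prompt,cited,cited_with_link,competitor_cited
    2026-01-05,ChatGPT,"best email tool for small agencies",yes,yes,
    2026-01-05,Perplexity,"best email tool for small agencies",no,no,rival.com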

    Mid-market tools ($100–$500/month)

    Otterly.ai provides automated monitoring across Google AI Overviews, ChatGPT, Perplexity, Gemini, and Microsoft Copilot. It runs scheduled prompt sets on your behalf and tracks brand mention frequency and citation links over time. The value is removing the manual labor of the 20-prompt audit while expanding coverage to more platforms and prompts than you’d realistically run by hand.

    LLMrefs takes a different approach: you input your existing SEO keywords rather than writing prompts, and the platform generates prompt fan-outs automatically, tracking the results in a dashboard that mirrors a traditional rank tracker. The learning curve is lower for teams coming from keyword-centric SEO workflows.

    Enterprise layer ($1,000+/month)

    Profound is built around its proprietary Prompt Volumes dataset — a search-volume equivalent for AI queries. It estimates how often specific questions are actually being asked across LLMs, which lets you prioritize content topics based on demand rather than intuition. This is genuinely useful at scale, but it’s overkill for most independent publishers. It becomes relevant when you’re deciding between 20 possible content angles and need volume data to make the call.

    The 20-Prompt Audit: Your Monthly Baseline Protocol

    Whether you use a paid tool or not, run this protocol monthly:

    1. Build a prompt library of 20 questions your target buyer would ask an AI system. These should be the questions your content is designed to answer — not keyword-formatted phrases, but actual conversational queries.
    2. Run each prompt across ChatGPT, Perplexity, and Google AI Overviews (3 platforms × 20 prompts = 60 data points per month).
    3. For each result, record: was your brand cited in text, was your domain linked, and which competitor was cited if you were not.
    4. Calculate prompt coverage per platform (what % of the 20 prompts returned your brand) and total share of voice versus your top 3 competitors.

    Log results in a spreadsheet with a date column. Three months of monthly data reveals directional trends — whether your GEO and AEO work is moving the needle. No tool gives you this longitudinal view without ongoing, consistent execution.
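
    If you keep the log in the CSV shape sketched earlier, the monthly rollup is a few lines of Python (the llm_audit.csv filename and column names are assumptions carried over from that template):

    import csv
    from collections import defaultdict

    totals = defaultdict(lambda: {"prompts": 0, "hits": 0})
    competitors = defaultdict(int)

    with open("llm_audit.csv", newline="") as f:
        for row in csv.DictReader(f):
            t = totals[row["platform"]]
            t["prompts"] += 1
            if row["cited"].strip().lower() == "yes":
                t["hits"] += 1
            elif row["competitor_cited"]:
                competitors[row["competitor_cited"]] += 1

    for platform, t in totals.items():
        print(f"{platform}: {100 * t['hits'] / t['prompts']:.0f}% coverage ({t['hits']}/{t['prompts']})")
    print("Cited instead of you:", dict(competitors))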

    Diagnosing a Citation Drop

    If your monthly audit shows prompt coverage declining from the previous period, run through this checklist before assuming a platform algorithm change:

    Did you remove or restructure a previously cited page? AI systems build representations of your content over time. Pages that disappear or are significantly restructured lose citation weight. Check your changelog against the prompt set that declined.

    Did a competitor publish stronger content on the topic? AI citation is zero-sum within the 2–7 source window. If a competitor published a more authoritative, well-structured page, it may have displaced yours. Review their recent publishing calendar.

    Check robots.txt and your llms.txt file. A misconfigured robots.txt Disallow directive, or a blanket block on AI crawlers, will cut citation access at the source. Note that llms.txt is advisory rather than an access control; the crawl-blocking risk lives in robots.txt. Verify both: robots.txt should not disallow the pages you expect to be cited, and llms.txt should still point to them.
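
    For reference, a deliberately permissive robots.txt stanza for the major AI crawlers looks like this (these user-agent tokens are the publicly documented ones; match the list to your own policy):

    # Allow the major AI crawlers; other rules in the file still apply.
    User-agent: GPTBot
    Allow: /

    User-agent: OAI-SearchBot
    Allow: /

    User-agent: ClaudeBot
    Allow: /

    User-agent: PerplexityBot
    Allow: /

    User-agent: Google-Extended
    Allow: /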

    Check for a model update on the platform. Major model releases can reset citation patterns. GPT-5, Gemini 2.0, and similar releases changed which sources each platform weighted. Check the platform’s public changelog for the period in question.

    If none of these apply, run a structured data audit on the pages that lost citations. Schema markup, FAQ blocks, clear heading hierarchy, and factual density all affect how AI systems extract and attribute content. A page that lost its FAQ section in a redesign may have simultaneously lost its AI citation utility.
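
    For reference, the FAQ markup in question is typically JSON-LD shaped like this (the question and answer text here are placeholders):

    <script type="application/ld+json">
    {
      "@context": "https://schema.org",
      "@type": "FAQPage",
      "mainEntity": [{
        "@type": "Question",
        "name": "How do I track AI referral traffic in GA4?",
        "acceptedAnswer": {
          "@type": "Answer",
          "text": "Create a custom channel group whose session source regex matches AI platform domains such as chatgpt.com and perplexity.ai."
        }
      }]
    }
    </script>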

    The Bottom Line

    LLM visibility measurement is not a solved problem, but the measurement primitives exist today: GA4 custom channel groups for traffic attribution, manual prompt audits for citation coverage, and mid-market tools for automated monitoring at scale. The sites building this infrastructure now will have 12–18 months of baseline data by the time the rest of the market treats it as standard practice.

    Build the 20-prompt library this week. Set up the GA4 channel group today. Everything else layers on top of those two data streams.