Verify llms.txt: How to Check Server Logs for AI Crawlers

You shipped an llms.txt file. You curated the links, you paired it with robots.txt, you validated the format. Now answer the only question that matters: is anything actually requesting it? Most site owners never check — and the data from 2026 suggests the honest answer, for most domains, is “almost nothing.” This is the verification step that turns llms.txt from an act of faith into a measurable signal. Here is how to read your own server logs and find out exactly what is fetching the file you published.

Why verification matters more than the file itself

The uncomfortable finding of the last year is that publishing llms.txt and benefiting from llms.txt are two different things. In OtterlyAI’s 90-day crawler study, only 0.1% of AI crawler requests touched /llms.txt at all — 84 requests out of 62,100 total AI bot visits — and the file received far fewer visits than the average content page (OtterlyAI GEO study). As of Q1 2026, no major AI company — OpenAI, Google, Anthropic, Meta, or Mistral — has publicly committed to reading or acting on llms.txt in production systems, though GPTBot does fetch the file occasionally (AEO Engine).

That does not make the file worthless. It makes measurement the whole game. If you cannot tell whether a crawler ever requested the file, you cannot tell whether your time was wasted, whether a platform quietly started honoring it, or whether your file is returning a silent 404. Verification is the difference between strategy and superstition.

The five-minute server-log check

Every fetch of your llms.txt file leaves a row in your access log. The job is to isolate requests to that path, then filter by the user-agents that belong to AI systems. On any server with standard combined-format Apache or Nginx logs, this one-liner does the first pass:

grep -E "/llms(-full)?\.txt" /var/log/nginx/access.log | \
  grep -E -i "GPTBot|OAI-SearchBot|ChatGPT-User|ClaudeBot|Claude-User|Claude-SearchBot|PerplexityBot|Perplexity-User|Google-Extended|Google-CloudVertexBot|Amazonbot|CCBot|Applebot|meta-externalagent|MistralAI-User|bingbot"

The first grep narrows to requests for llms.txt or llms-full.txt. The second filters to the known AI crawler user-agent strings documented across 2026 reference work (No Hacks AI User-Agent Landscape 2026; Momentic crawler list). Each surviving line tells you three things: which bot, what time, and the HTTP status code it received.

That status code is the part people skip. A 200 means the bot got your file. A 404 means you have been congratulating yourself over a file the crawler never actually reached — a misconfigured path, a redirect loop, or a build step that drops the file on deploy. A 301 or 302 means it is being redirected, and not every crawler follows redirects for this path. Read the status column before you read anything else.

Turn the raw hits into a monthly cadence table

One grep tells you whether the file is reachable. To know whether anything is changing, you need the same query run on a schedule and counted by bot. Extend the pipeline to a count:

grep -E "/llms(-full)?\.txt" /var/log/nginx/access.log* | \
  grep -E -i -o "GPTBot|ClaudeBot|PerplexityBot|Google-Extended|bingbot|Amazonbot|CCBot|Applebot" | \
  sort | uniq -c | sort -rn

This produces a leaderboard of which AI user-agents requested your llms.txt across all retained logs. Capture that number on the first of each month and you have a cadence series. The signal you are watching for is not the absolute count — it will be small — but the direction: a bot that appears for the first time, a bot whose hit count jumps, or a bot that goes silent. Those inflection points are the leading indicators that a platform has changed how it treats the file.

What you see in the log	What it means	Action
No requests to `/llms.txt` at all	File may be unreachable, or simply not yet fetched — both are common	Request the URL yourself; confirm a clean 200 before assuming neglect
`200` from GPTBot, low frequency	Consistent with reported behavior — GPTBot fetches occasionally	Log the cadence; treat as baseline, not a ranking signal
`404` or `301` on the path	Crawler is not getting the file you think you published	Fix the path/redirect today — this is a silent failure
A new bot appears month-over-month	A platform may have started fetching the file	Note the date; correlate with any citation or referral changes

Cross-check against your content fetches

The llms.txt hit count means little in isolation. Compare it against how often the same bots fetch your actual content pages. If GPTBot pulls forty content URLs a day and never touches llms.txt, the file is not part of how that crawler discovers you — your content’s own structure and internal linking are doing the work. The practical monitoring approach documented for 2026 is exactly this: a server-log dashboard built against the major user-agents, watching cadence and path-preference shifts month over month (Digital Applied 30-day log study). The same study notes distinct personalities worth knowing — GPTBot crawls more aggressively than most assume, ClaudeBot is more patient than its volume suggests, and PerplexityBot is quieter than its share-of-voice would predict.

What to do with the answer

If your logs show the file is reachable and occasionally fetched, you are in the normal range for 2026 — keep the file current and keep measuring. If they show a 404, you found a real bug that no amount of curation would have fixed. And if they show a brand-new bot starting to request the path, you have spotted a platform behavior change before the blog posts catch up to it. That last case is the entire payoff: the practitioners who read their own logs will know the standard started mattering weeks before the ones who only read about it. Verification is not the boring final step of an llms.txt rollout. On a standard that nobody has formally committed to honoring yet, it is the only step that produces evidence instead of hope.

What to explore next

AEO & AI Search

AEO, GEO, SEO Is the New Social Media

Same room

AEO & AI Search

WordPress Taxonomy Rebuild — Categories, Tags, and Slug Normalization at Scale

Same room

Agency Playbook

How Restoration Work Shows Up in GRESB and CDP Disclosures

You may also explore

Deep dive

Everett Food & Drink

Cracken Coffee Roasters: South Everett Specialty Coffee

Deep dive

Track the AI tools you actually use

Live, vendor-neutral prices & limits for ChatGPT, Claude, Gemini, Perplexity and more — and we’ll email you the moment your tools change price or limits. Free, no hype.

See the live AI tracker →or set up your alerts

Verify llms.txt: How to Check Server Logs for AI Crawlers

Why verification matters more than the file itself

The five-minute server-log check

Turn the raw hits into a monthly cadence table

Cross-check against your content fetches

What to do with the answer

Comments

Leave a Reply Cancel reply

More posts

AI Agents Are Learning to Check Instead of Guess: The GitHub Context Problem

Logic Apps vs Cloud Workflows: No-Code Automation Across Two Clouds

Azure Static Web Apps vs Firebase Hosting: A Dashboard on Each

Cosmos DB vs Firestore: A Free-Tier Operations Ledger on Both Clouds