97% of llms.txt Files Got No AI Requests. Here's the Full Story.

Summary

Ahrefs found 97% of llms.txt files across 137,000 domains got zero AI bot requests. The stat is real — but the scoreboard it uses misses coding agents, the file's primary audience.

The short answer: most llms.txt sees no fetches, and that's not the whole story
What the Ahrefs number actually measured — and its limits
Why "AI traffic to the file" is the wrong scoreboard
The real audience: coding agents are already reading
Instrument it — how to tell if YOUR file is in the working 3%
When to skip llms.txt entirely — the honest "don't bother" cases

Key facts

The Ahrefs number is real.
The study measured HTTP requests to the /llms.txt URL path from known AI bot User-Agents.
If you publish developer documentation, an API reference, a technical blog, or anything that ends up in a software project context, the honest picture is that your llms.txt has an active audience right now.
The difference between the 97% and the 3% is not that the 3% got lucky.
Not every site needs this file today.

The headline from Search Engine Journal's coverage of Ahrefs data is striking: 97% of llms.txt files across 137,000 domains received zero AI bot requests. If you shipped an llms.txt and nothing is reading it, that number feels like a verdict.

It is not. But the reason it isn't is worth understanding carefully, because the 3% that does get traffic has a specific profile — and without bot detection you have no idea which side of that line you are on.

The short answer: most llms.txt sees no fetches, and that's not the whole story

The Ahrefs number is real. Across 137,000 domains that published an llms.txt file, the vast majority saw zero recorded requests from AI bots to that URL. ChatGPT and Perplexity together account for roughly 1% of the requests Ahrefs measured. That is not nothing, but it is not the "AI is reading every site's llms.txt" story that the file's early promoters implied.

What the study also found: around 12% of all measured requests came from GEO and AEO audit tools. That is the industry studying itself. Marketers running llms.txt audits on their clients' sites, developers checking whether their file is valid, agencies benchmarking competitors. A meaningful chunk of the "AI traffic" to llms.txt is not AI products at all — it is humans with audit software.

And there is a stranger data point buried in the dataset. At least one crawler was identified as studying llms.txt files specifically as a prompt-injection vector. The file is a direct feed into AI context windows, so a badly constructed one — especially one auto-generated from CMS content with unchecked user input — can carry malicious instructions into whatever agent reads it. If your llms.txt is generated automatically and you have not reviewed its contents lately, that warrants a look.
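A minimal sketch of what such a review could look like for an auto-generated file. The patterns below are illustrative assumptions, not a vetted ruleset, and `flag_suspicious_lines` is a hypothetical helper name. A match only means a human should read that line; a clean result does not prove the file is safe.

```python
import re

# Illustrative patterns only (an assumption, not a complete or vetted ruleset).
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"disregard (the |your )?(system|previous) (prompt|instructions)", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
    re.compile(r"<!--"),  # HTML comments can hide text from human reviewers
]


def flag_suspicious_lines(text):
    """Return (line_number, stripped_line) pairs matching any pattern, 1-indexed."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPICIOUS_PATTERNS):
            flagged.append((number, line.strip()))
    return flagged
```

Running this against the generated file on every publish, and failing the build when anything is flagged, is one way to keep unchecked user input from reaching the file unnoticed.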

This is the companion post to our piece on Google's llms.txt guidance. That one covers the "it's legitimate" side of the thesis. This one covers the "it's oversold" side. Both are true simultaneously.

What the Ahrefs number actually measured — and its limits

The study measured HTTP requests to the /llms.txt URL path from known AI bot User-Agents. That is a reasonable proxy for AI search bot behavior, but it systematically undercounts one major category: coding agents.

Tools like Cursor, Windsurf, Continue, and Claude Code do not crawl llms.txt the way a search bot does. They pull it through IDE integrations, package metadata, and developer tooling pipelines. When a developer opens a project in Cursor and the IDE fetches llms-full.txt to understand the repo's structure, that request may not show up in the crawl data Ahrefs was analyzing. It often comes from the developer's machine rather than from a registered bot hitting the public URL.

The llms-full.txt file matters here too. In practice, sites usually publish two files: the short index (llms.txt) that the proposal defines, and a widely adopted extended companion (llms-full.txt) that contains the actual content. Coding agents typically want the full version. Studies focused on requests to the index file miss a significant share of real usage.

None of this means the 97% stat is wrong. It means the 97% stat is measuring AI search bots, and llms.txt has two audiences with very different access patterns.

Why "AI traffic to the file" is the wrong scoreboard

Traffic to /llms.txt as a URL is not the metric the file's creators were optimizing for. Jeremy Howard's original spec framed llms.txt as context-layer infrastructure: a way for AI systems to understand your site's structure and access its content more cleanly than scraping raw HTML.

Measuring its success by counting HTTP GET requests to the URL is like measuring a robots.txt file's effectiveness by counting crawlers that fetched it. The fetch is not the outcome. The outcome is whether the file shaped what those crawlers did next.

For AI search bots, the bet is longer-term. ChatGPT and Perplexity are building the habit, but the spec is still early. For coding agents, the bet has already paid off for developer-facing sites — documentation, APIs, open source projects, and technical blogs see real IDE-driven fetches of their llms-full.txt today. The question is which category your site falls into.

The real audience: coding agents are already reading

If you publish developer documentation, an API reference, a technical blog, or anything that ends up in a software project context, the honest picture is that your llms.txt has an active audience right now. Claude Code reads the file when it encounters your domain in a project. Cursor uses it to help developers navigate your SDK. Windsurf pulls it when an agent is building something that touches your infrastructure.

This audience does not show up as a wave of bot traffic in a dataset counting crawl requests. It shows up as better context retrieval, more accurate answers, and fewer hallucinations when developers ask their coding agents about your product. Those outcomes are hard to count but real.

The flip side: if your site is a consumer blog, a local business site, or a pure-media property with no developer audience, the honest answer is that your llms.txt is probably idle today. AI search bots will mature and that will change, but shipping the file for the wrong audience and then checking HTTP stats is how you end up citing the 97% number as a reason to delete it.

Instrument it — how to tell if YOUR file is in the working 3%

The difference between the 97% and the 3% is not that the 3% got lucky. It is that they have a profile the bots want and, in some cases, that they can actually see which bots are fetching them.

Standard analytics will not help here. Google Analytics filters most bot traffic by design. The User-Agent strings that identify AI crawlers never make it into your dashboard. If a Perplexity bot fetched your llms.txt yesterday, your GA4 account does not know it happened.

Server-side bot detection is the only reliable method. You need a tool that captures raw traffic, identifies crawlers by User-Agent, and maps those requests to specific URL paths, including /llms.txt and /llms-full.txt. That gives you three things: confirmation that your file is being fetched, identification of which bots are doing the fetching, and a baseline to detect changes over time.
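If you have raw access logs, a first pass at this can be sketched in a few lines. The example assumes logs in the common "combined" format. The bot tokens are an illustrative subset of published crawler User-Agent names, so check each vendor's documentation for the current list. `count_llms_fetches` is a hypothetical helper name. User-Agent strings can be spoofed, so treat the counts as a signal, not proof.

```python
import re
from collections import Counter

# Substrings seen in some AI crawler User-Agent strings.
# Illustrative subset (an assumption); vendors add and rename bots over time.
AI_BOT_TOKENS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot"]
TRACKED_PATHS = {"/llms.txt", "/llms-full.txt"}

# Combined Log Format:
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_llms_fetches(lines):
    """Return a Counter keyed by (bot_token, path) for requests to tracked paths."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path not in TRACKED_PATHS:
            continue
        user_agent = match.group("ua").lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in user_agent:
                counts[(token, path)] += 1
                break  # count each request once
    return counts
```

Passing an open log file to `count_llms_fetches` (a file object iterates line by line) gives per-bot, per-path totals. Saving those totals on a schedule provides the baseline for spotting changes over time.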

Crawlytics does exactly this — AI bot traffic analytics that show you which crawlers hit which pages. If your llms.txt is in the working 3%, you will see it. If it is idle, you will know whether the issue is bot behavior, file structure, or content profile. The crawl-coverage metric is the auditable starting point: how many of your pages are actually being fetched by AI bots, and what is the crawl frequency?

Cross-reference with how to track AI citations for the full measurement picture, keeping in mind that retrieval (what the logs show) and citation (what the AI answer displays) are two separate events.

When to skip llms.txt entirely — the honest "don't bother" cases

Not every site needs this file today. The honest cases where you can deprioritize it:

Pure-local service businesses. A plumber or a dentist's office is not getting fetched by coding agents, and AI search bot maturity for local queries is still thin. Spend that hour on schema markup or a Google Business Profile update instead.
Sites under 10 pages. If your entire site fits in a single HTML file, the benefit of an additional navigation layer for AI bots is minimal. The bots can figure out your structure without help.
Sites with CMS auto-generation and no security review. A CMS that generates llms.txt from user-submitted content without sanitization is a prompt-injection risk. Better to ship nothing than to ship a file that carries attacker instructions into AI context windows. Fix the generation pipeline first.

For everyone else, the cost of shipping is low, and the cost of lacking the file once bot behavior matures cannot be recovered quickly. Ship the file, pair it with clean HTML routing for AI bots so the bots that do arrive get well-structured content, and then instrument it so you are measuring reality rather than guessing.

The 97% number is the right reason to be skeptical of hype. It is not a reason to skip the file. It is a reason to ship it with clear eyes, watch the actual bot traffic, and let the data tell you which category you are in.

Written by Crawlytics Team. Crawlytics tracks AI bots, generates llms.txt, and powers WebMCP commerce, all from one snippet on any stack.

Cite this page

Title: 97% of llms.txt Files Got No AI Requests. Here's the Full Story.
Author: Crawlytics Team
Publisher: Crawlytics
Published: 2026-06-18
Updated: 2026-06-18
URL: https://crawlytics.app/blog/llms-txt-no-traffic-data?utm_source=claude&utm_medium=ai_referral&utm_campaign=crawlytics
