Ahrefs found 97% of llms.txt files across 137,000 domains got zero AI bot requests. The stat is real — but the scoreboard it uses misses coding agents, the file's primary audience.
The headline from Search Engine Journal's coverage of Ahrefs data is striking: 97% of llms.txt files across 137,000 domains received zero AI bot requests. If you shipped an llms.txt and nothing is reading it, that number feels like a verdict.
It is not. But the reason it isn't is worth understanding carefully, because the 3% that does get traffic has a specific profile — and without bot detection you have no idea which side of that line you are on.
The Ahrefs number is real. Across 137,000 domains that published an llms.txt file, the vast majority saw zero recorded requests from AI bots to that URL. ChatGPT and Perplexity together account for roughly 1% of the requests Ahrefs measured. That is not nothing, but it is not the "AI is reading every site's llms.txt" story that the file's early promoters implied.
What the study also found: around 12% of all measured requests came from GEO and AEO audit tools. That is the industry studying itself. Marketers running llms.txt audits on their clients' sites, developers checking whether their file is valid, agencies benchmarking competitors. A meaningful chunk of the "AI traffic" to llms.txt is not AI products at all — it is humans with audit software.
And there is a stranger data point buried in the dataset. At least one crawler was identified as studying llms.txt files specifically as a prompt-injection vector. The file is a direct feed into AI context windows, so a badly constructed one — especially one auto-generated from CMS content with unchecked user input — can carry malicious instructions into whatever agent reads it. If your llms.txt is generated automatically and you have not reviewed its contents lately, that warrants a look.
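One way to act on that review is a quick heuristic pass over the generated file before it ships. The sketch below is a minimal illustration, not a real defense: the pattern list and the `flag_suspicious_lines` helper are hypothetical names introduced here, and a short regex list will miss most real injection attempts. It only shows the shape of a pre-publish check that surfaces lines for a human to read.

```python
import re

# Hypothetical, deliberately small pattern list. Real injection attempts vary
# widely, so treat a clean result as "nothing obvious", not "safe".
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"disregard (the |your )?(system|previous) (prompt|instructions)", re.IGNORECASE),
    re.compile(r"you are now\b", re.IGNORECASE),
    re.compile(r"<!--"),  # HTML comments can hide text from human reviewers
]


def flag_suspicious_lines(text):
    """Return (line_number, line) pairs that match any suspicious pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in SUSPICIOUS_PATTERNS):
            findings.append((number, line.strip()))
    return findings


if __name__ == "__main__":
    # Assumes the generated file sits in the current directory.
    with open("llms.txt", encoding="utf-8") as handle:
        for number, line in flag_suspicious_lines(handle.read()):
            print(f"line {number}: {line}")
```

Running a check like this in the same pipeline that generates the file means user-supplied CMS content gets at least one look before it reaches an agent's context window.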
This is the companion post to our piece on Google's llms.txt guidance. That one covers the "it's legitimate" side of the thesis. This one covers the "it's oversold" side. Both are true simultaneously.
The study measured HTTP requests to the /llms.txt URL path from known AI bot User-Agents. That is a reasonable proxy for AI search bot behavior, but it systematically undercounts one major category: coding agents.
Tools like Cursor, Windsurf, Continue, and Claude Code do not crawl llms.txt the way a search bot does. They pull it through IDE integrations, package metadata, and developer tooling pipelines. When a developer opens a project in Cursor and the IDE fetches llms-full.txt to understand the repo's structure, that request may not show up in the crawl data Ahrefs was analyzing. It often comes from the developer's machine rather than from a registered bot hitting the public URL.
The llms-full.txt file matters here too. In practice there are two files: the short index (llms.txt), which the original proposal defines, and the extended version (llms-full.txt), a widely adopted companion convention that contains the actual content. Coding agents typically want the full version. Studies focused on requests to the index file miss a significant share of real usage.
None of this means the 97% stat is wrong. It means the 97% stat is measuring AI search bots, and llms.txt has two audiences with very different access patterns.
Traffic to /llms.txt as a URL is not the metric the file's creators were optimizing for. Jeremy Howard's original spec framed llms.txt as context-layer infrastructure: a way for AI systems to understand your site's structure and access its content more cleanly than scraping raw HTML.
Measuring its success by counting HTTP GET requests to the URL is like measuring a robots.txt file's effectiveness by counting crawlers that fetched it. The fetch is not the outcome. The outcome is whether the file shaped what those crawlers did next.
For AI search bots, the bet is longer-term. ChatGPT and Perplexity are building the habit, but the spec is still early. For coding agents, the bet has already paid off for developer-facing sites — documentation, APIs, open source projects, and technical blogs see real IDE-driven fetches of their llms-full.txt today. The question is which category your site falls into.
If you publish developer documentation, an API reference, a technical blog, or anything that ends up in a software project context, the honest picture is that your llms.txt has an active audience right now. Claude Code reads the file when it encounters your domain in a project. Cursor uses it to help developers navigate your SDK. Windsurf pulls it when an agent is building something that touches your infrastructure.
This audience does not show up as a wave of bot traffic in a dataset counting crawl requests. It shows up as better context retrieval, more accurate answers, and fewer hallucinations when developers ask their coding agents about your product. Those outcomes are hard to count but real.
The flip side: if your site is a consumer blog, a local business site, or a pure-media property with no developer audience, the honest answer is that your llms.txt is probably idle today. AI search bots will mature and that will change, but shipping the file for the wrong audience and then checking HTTP stats is how you end up citing the 97% number as a reason to delete it.
The difference between the 97% and the 3% is not that the 3% got lucky. It is that they have a profile the bots want and, in some cases, that they can actually see which bots are fetching them.
Standard analytics will not help here. Google Analytics filters most bot traffic by design, and it depends on a JavaScript tag that most crawlers never execute. The User-Agent strings that identify AI crawlers never make it into your dashboard. If a Perplexity bot fetched your llms.txt yesterday, your GA4 account does not know it happened.
Server-side bot detection is the only reliable method. You need a tool that captures raw traffic, identifies crawlers by User-Agent, and maps those requests to specific URL paths, including /llms.txt and /llms-full.txt. That gives you three things: confirmation that your file is being fetched, identification of which bots are doing the fetching, and a baseline to detect changes over time.
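If you want to see what that looks like before adopting a tool, a rough version can be run against your own access logs. The sketch below assumes logs in the common "combined" format used by nginx and Apache. The bot token list is illustrative and far from complete, and `count_llms_fetches` is a name made up for this example. User-Agent strings can also be spoofed, so treat the counts as a first approximation.

```python
import re
from collections import Counter

# Illustrative, non-exhaustive substrings of publicly documented AI crawler
# User-Agents. Extend this list for your own traffic.
AI_BOT_TOKENS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "PerplexityBot", "ClaudeBot")
TRACKED_PATHS = ("/llms.txt", "/llms-full.txt")

# Combined log format:
# ip ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def identify_bot(user_agent):
    """Return the first matching bot token, or None for unrecognized agents."""
    lowered = user_agent.lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def count_llms_fetches(lines):
    """Count requests per (bot, path) for the tracked llms.txt paths."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in combined log format
        path = match.group("path").split("?", 1)[0]  # drop any query string
        if path not in TRACKED_PATHS:
            continue
        bot = identify_bot(match.group("ua"))
        if bot is not None:
            counts[(bot, path)] += 1
    return counts


if __name__ == "__main__":
    # Assumes an access log in the current directory; adjust the path as needed.
    with open("access.log", encoding="utf-8", errors="replace") as handle:
        for (bot, path), total in sorted(count_llms_fetches(handle).items()):
            print(f"{bot}\t{path}\t{total}")
```

Run on a schedule and stored, the same counts give you the baseline: a bot that appears or disappears between runs is the change worth investigating.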
Crawlytics does exactly this — AI bot traffic analytics that show you which crawlers hit which pages. If your llms.txt is in the working 3%, you will see it. If it is idle, you will know whether the issue is bot behavior, file structure, or content profile. The crawl-coverage metric is the auditable starting point: how many of your pages are actually being fetched by AI bots, and what is the crawl frequency?
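To make those two questions concrete, here is one plausible way to compute them from a list of bot hits. This is an assumed definition for illustration, not necessarily how Crawlytics calculates its metric: coverage is taken as the share of known pages fetched at least once by any AI bot, and frequency as average fetches per day for one path over an inclusive date window. The function names and the `(path, day)` hit format are hypothetical.

```python
from datetime import date


def crawl_coverage(site_paths, bot_hits):
    """Share of known pages fetched at least once by any AI bot.

    site_paths: iterable of URL paths you consider part of the site.
    bot_hits: iterable of (path, day) tuples, one per AI bot request.
    """
    known = set(site_paths)
    if not known:
        return 0.0
    fetched = {path for path, _day in bot_hits if path in known}
    return len(fetched) / len(known)


def crawl_frequency(bot_hits, path, start, end):
    """Average AI bot fetches per day for one path over [start, end] inclusive."""
    window_days = (end - start).days + 1
    if window_days <= 0:
        raise ValueError("end must not be earlier than start")
    hits = sum(1 for p, day in bot_hits if p == path and start <= day <= end)
    return hits / window_days


if __name__ == "__main__":
    pages = ["/", "/docs", "/pricing", "/llms.txt"]
    hits = [
        ("/llms.txt", date(2025, 3, 1)),
        ("/llms.txt", date(2025, 3, 2)),
        ("/docs", date(2025, 3, 2)),
    ]
    print(crawl_coverage(pages, hits))
    print(crawl_frequency(hits, "/llms.txt", date(2025, 3, 1), date(2025, 3, 7)))
```

Hits on paths outside your page list are ignored by the coverage figure, so keeping that list current (for example, from your sitemap) matters as much as the log data.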
Cross-reference with how to track AI citations for the full measurement picture, keeping in mind that retrieval (what the logs show) and citation (what the AI answer displays) are two separate events.
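Not every site needs this file today. The honest cases where you can deprioritize it are the ones already described: consumer blogs, local business sites, and pure-media properties with no developer audience.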
For everyone else, the cost of shipping is low and the cost of not having it when bot behavior matures is not recoverable quickly. Ship the file, pair it with clean HTML routing for AI bots so the bots that do arrive get well-structured content, and then instrument it so you are measuring reality rather than guessing.
The 97% number is the right reason to be skeptical of hype. It is not a reason to skip the file. It is a reason to ship it with clear eyes, watch the actual bot traffic, and let the data tell you which category you are in.
Written by Crawlytics Team. Crawlytics tracks AI bots, generates llms.txt, and powers WebMCP commerce, all from one snippet on any stack.