llms.txt for AI Agents: Navigation, Not Discovery (2026)

Summary

John Mueller is right: llms.txt can't get you discovered by AI. That's not a flaw — it was never meant to. Here's what the file actually does and why it still belongs on your site.

Google's John Mueller said something accurate about llms.txt on Search Off the Record: the file "can't help LLMs differentiate sites" during the discovery process. He's right. And that framing is actually the clearest explanation of what llms.txt is — and isn't — that has come from a major search authority yet.

The problem is that a lot of people interpreted "can't differentiate" as "doesn't matter." Those are not the same thing. Mueller was describing a specific limitation of a file that was never designed to solve a discovery problem. The file solves a different problem: what happens after an agent already arrives.

The short answer: llms.txt navigates, it doesn't rank or get you found

llms.txt is a Markdown-formatted file that lives at the root of your domain. It lists your key pages with short descriptions, grouped to reflect your site's structure: a clean map written for AI readers rather than human browsers. When an agent lands on your domain and fetches /llms.txt, it can orient itself in seconds instead of crawling through multiple pages.
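To make the file layout concrete, here is a minimal Python sketch that renders an llms.txt body in the commonly used shape: an H1 title, a blockquote summary, and H2 sections of links with descriptions. The site name, URLs, and the `Page` and `render_llms_txt` names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    description: str


def render_llms_txt(site_name: str, summary: str, sections: dict[str, list[Page]]) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            lines.append(f"- [{page.title}]({page.url}): {page.description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site and pages, for illustration only.
example = render_llms_txt(
    "Example Store",
    "Hypothetical shop used to illustrate the file layout.",
    {
        "Docs": [Page("API reference", "https://example.com/docs/api", "Endpoints and auth")],
        "Product": [Page("Pricing", "https://example.com/pricing", "Plans and free tier")],
    },
)
print(example)
```

The output would be served as plain text at /llms.txt. Nothing in it announces the site to anyone; it only describes the site to a reader that has already arrived.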

What it cannot do is introduce your site to an AI system that hasn't already found you. There's no mechanism for that. The file doesn't broadcast your existence. It doesn't submit your pages to a model's training corpus. It doesn't inject your domain into an LLM's awareness. Mueller's point is simply that a file sitting quietly at your URL root has no way to influence whether an AI retrieves you in the first place.

He compared it to a store directory — useful for someone who already walked in, invisible to someone who hasn't. That's exactly right.

What Mueller actually said (and why "can't differentiate" is correct)

The comment came from Search Off the Record, where Mueller discussed the emerging stack of context files that sites are publishing for AI systems. His specific framing: llms.txt can't help LLMs differentiate between sites in the way that traditional ranking signals do, because the file is only readable by an agent that has already reached your domain.

He also name-dropped WebMCP as a format under active discussion alongside llms.txt. That mention matters, and we'll come back to it.

The "can't differentiate" point is technically precise. AI systems that are deciding which sources to retrieve or synthesize don't do that by reading your llms.txt first. They use their training data, retrieval indexes, and link graph, which are largely the same kinds of signals search has always relied on. A file at your URL root can't intervene in that process.

What Mueller did not say is that the file is pointless. That's the reading people are projecting onto the statement.

Discovery vs navigation: the two jobs people conflate

The confusion here is real and understandable. For most of SEO's history, "being visible" meant "ranking" — and ranking meant influencing a centralized system's decision to surface you. The habits built around that model push people to evaluate every new signal by asking: "does this help me rank?" For llms.txt, the answer is no. But that's because ranking is not the question it's answering.

Break the problem into two separate jobs:

Job 1 — Discovery: An AI system decides whether to include your content in an answer. This depends on whether your pages have been indexed, whether you have enough link authority, whether your content appears relevant to the query. These inputs look a lot like traditional SEO. llms.txt plays no role here.

Job 2 — Navigation: An AI agent that has already reached your domain — either because it was directed there, or because it retrieved one of your pages — needs to find the right content quickly and complete a task. This is where llms.txt is directly useful. An agent with a clean map of your content can get to your pricing page, your documentation, your product listing, or your booking flow in one lookup instead of three.
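The "one lookup" in Job 2 can be sketched in Python as follows. This is an illustration under assumptions, not how any particular agent works: the sample file, the `parse_links` and `find_page` helpers, and the simple keyword match are all hypothetical.

```python
import re
from typing import Optional

# Matches list entries of the form "- [Title](url): description".
LINK_RE = re.compile(r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$")

# Hypothetical llms.txt content, for illustration only.
SAMPLE_LLMS_TXT = """\
# Example Store

> Hypothetical shop used to illustrate the lookup.

## Product

- [Pricing](https://example.com/pricing): Plans and free tier
- [Booking](https://example.com/book): Schedule an appointment

## Docs

- [API reference](https://example.com/docs/api): Endpoints and auth
"""


def parse_links(text: str) -> list[dict[str, str]]:
    """Return every link entry together with the section heading it sits under."""
    entries = []
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            section = line[3:].strip()
            continue
        match = LINK_RE.match(line)
        if match:
            entries.append({
                "section": section,
                "title": match["title"],
                "url": match["url"],
                "description": match["desc"] or "",
            })
    return entries


def find_page(entries: list[dict[str, str]], keyword: str) -> Optional[str]:
    """Return the URL of the first entry whose title or description mentions keyword."""
    needle = keyword.lower()
    for entry in entries:
        if needle in entry["title"].lower() or needle in entry["description"].lower():
            return entry["url"]
    return None


entries = parse_links(SAMPLE_LLMS_TXT)
print(find_page(entries, "pricing"))
```

A real agent would likely match on meaning rather than substrings, but the shape is the same: one fetch, one parse, one target URL.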

In most llms.txt debates, the two sides argue past each other because one is talking about Job 1 and the other about Job 2.

What still does discovery (links, sitemaps, live fetches)

If you're trying to get AI systems to actually find and retrieve your content, llms.txt is the wrong lever. The right levers are familiar: links from high-authority sources (AI retrieval systems still weight these heavily), a clean sitemap so crawlers can index your pages efficiently, server-side rendering so AI bots don't get a blank JavaScript shell, and structured data that helps parsers understand what your content is about.

Entity authority is also growing in importance. If your site is the recognized source for a topic — the place that established the concept, published the original research, or has been cited by others who covered it — AI retrieval systems are more likely to pull you. That's content and links, not a file.

Being visible to GPTBot, ClaudeBot, PerplexityBot, and the like is a prerequisite. You can verify that in your bot logs — Crawlytics shows which AI crawlers have hit your pages and how recently. If the bots aren't reaching you, no amount of llms.txt refinement changes that. Fix the crawl access first.
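If you would rather check raw logs yourself, a rough Python sketch looks like this. It assumes your server writes the combined log format and that the crawlers identify themselves with the user-agent tokens named above. The sample lines and the `summarize_ai_bot_hits` helper are hypothetical, and this is not a description of how Crawlytics does it.

```python
from collections import defaultdict
from datetime import datetime

# User-agent substrings for the crawlers named above; extend as needed.
AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Hypothetical access-log lines in combined log format, with simplified user agents.
SAMPLE_LOG = [
    '203.0.113.5 - - [02/Jan/2026:10:15:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [02/Jan/2026:10:16:00 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.9 - - [02/Jan/2026:11:00:00 +0000] "GET /docs/api HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.5 - - [03/Jan/2026:09:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 256 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]


def summarize_ai_bot_hits(lines):
    """Count hits and record the most recent request time per AI crawler token."""
    summary = defaultdict(lambda: {"hits": 0, "last_seen": None})
    for line in lines:
        parts = line.rsplit('"', 2)  # the user agent is the last quoted field
        if len(parts) != 3 or "[" not in line or "]" not in line:
            continue
        user_agent = parts[1]
        stamp = line[line.index("[") + 1 : line.index("]")]
        try:
            seen_at = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # skip lines whose timestamp does not parse
        for token in AI_BOT_TOKENS:
            if token in user_agent:
                record = summary[token]
                record["hits"] += 1
                if record["last_seen"] is None or seen_at > record["last_seen"]:
                    record["last_seen"] = seen_at
    return dict(summary)


report = summarize_ai_bot_hits(SAMPLE_LOG)
for bot, record in report.items():
    print(bot, record["hits"], record["last_seen"].isoformat())
```

User-agent strings can be spoofed, so treat a match as a hint that the crawler is reaching you rather than proof. A bot that never shows up in the summary is the signal to go fix crawl access.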

From "read" to "do": where WebMCP picks up

Mueller's mention of WebMCP alongside llms.txt is worth unpacking, because it maps directly to how the agent-interface stack is forming.

llms.txt is the read layer. A WebMCP-aware agent reads it to understand what your site contains and where things live. Then it acts using WebMCP, a draft browser API that lets your site register tools an agent can invoke: searchProducts(), addToCart(), bookAppointment(), requestQuote(). The agent reads first, then acts, and it can only act on a site that has made itself actionable.

Consider a concrete example: a user asks their agent to find a project management tool with a free tier and an API. The agent retrieves several candidates from its index. It lands on your site. If you have llms.txt, the agent can navigate directly to your pricing page and your API docs without crawling through your navigation. If you also have WebMCP tools registered, the agent can invoke a structured getPricingPlans() call and return clean, structured data about your tiers directly to the user — no scraping, no guessing, no broken parse.

That's the sequence: discovery (links and crawl) brings the agent to your domain; llms.txt routes it to the right page; WebMCP lets it complete the task.
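The split between the read layer and the do layer can be modeled with a toy Python sketch. To be clear about the assumptions: this is not the WebMCP API, which is a draft browser interface. The `SITE_INDEX` map, the `TOOLS` registry, the `register_tool` decorator, and the plan data are all hypothetical stand-ins meant only to show the control flow.

```python
from typing import Callable

# Read layer: a hypothetical, already-parsed llms.txt index (label -> URL).
SITE_INDEX = {
    "pricing": "https://example.com/pricing",
    "api docs": "https://example.com/docs/api",
}

# Do layer: a toy registry standing in for tools a site might expose to agents.
TOOLS: dict[str, Callable[[], list[dict]]] = {}


def register_tool(name: str):
    """Decorator that records a callable under a tool name."""
    def decorator(func):
        TOOLS[name] = func
        return func
    return decorator


@register_tool("getPricingPlans")
def get_pricing_plans() -> list[dict]:
    # Hypothetical plan data, for illustration only.
    return [{"name": "Free", "price_usd": 0}, {"name": "Team", "price_usd": 20}]


def answer_pricing_question() -> dict:
    """Read first (find the page), then act (call the tool if one is registered)."""
    page = SITE_INDEX.get("pricing")
    tool = TOOLS.get("getPricingPlans")
    if tool is not None:
        return {"source": "tool", "page": page, "plans": tool()}
    # No tool registered: the agent falls back to reading the page it was routed to.
    return {"source": "page", "page": page, "plans": None}


result = answer_pricing_question()
print(result)
```

The fallback branch is the point: a site with only the read layer still routes the agent to the right page, while a site that also registers tools hands back structured data without scraping.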

The full WebMCP explainer covers the spec, which agents invoke it today, and the safety model. The short version: adoption is still early, but the snippet is a 90-second integration and a no-op on browsers without support. Mueller naming it validates the direction.

So should you still ship llms.txt? (yes — here's the honest why)

Mueller's comment is sometimes read as a reason to skip llms.txt entirely. That's not the right takeaway. His point is about what the file can't do, not a verdict on whether it's worth deploying.

There are several concrete reasons to ship it anyway:

Coding agents are the biggest current audience. Cursor, Windsurf, and Continue pull llms-full.txt to understand a codebase or documentation set before generating code. For developer tools, SaaS, and API docs, this is a real and active use case today, not a future one. The agents that are actually reading llms.txt at scale right now are coding assistants, not consumer chatbots.

Navigation efficiency compounds. An agent that can find your pricing page in one lookup instead of three is more likely to complete the task on your site rather than abandoning it partway. This is especially true when the agent is operating with a limited context window or is balancing multiple sources.

Structured identity matters to retrieval. While llms.txt doesn't rank you, it does declare your content structure in a machine-readable form that reinforces what your site is about. That can help the parsers that do index you interpret your pages, even though it has no effect on whether they find them.

It's a 15-minute task. The cost is low enough that "it doesn't do the one thing it was never designed to do" is not a good reason to skip it. The setup guide walks through the whole process. Crawlytics generates it automatically from your sitemap.

Mueller is right about discovery. That doesn't change the navigation case, the coding-agent case, or the forward investment in the read-then-do stack that WebMCP is building on top of.
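For a sense of why the setup is cheap, here is a rough Python sketch of deriving an llms.txt skeleton from a sitemap. It is an illustration only and not a description of how Crawlytics generates the file. The sample sitemap, the `sitemap_to_llms_txt` helper, and the rule of grouping by first path segment are assumptions, and a real file would still need hand-written descriptions.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap, for illustration only.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/pricing</loc></url>
  <url><loc>https://example.com/docs/api</loc></url>
  <url><loc>https://example.com/docs/quickstart</loc></url>
</urlset>
"""


def sitemap_to_llms_txt(sitemap_xml: str, site_name: str, summary: str) -> str:
    """Build an llms.txt skeleton, grouping nested URLs by their first path segment."""
    root = ET.fromstring(sitemap_xml)
    sections: dict[str, list[str]] = {}
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if not url:
            continue
        segments = [s for s in urlparse(url).path.split("/") if s]
        # Nested URLs go under their first segment; top-level URLs go under "Pages".
        heading = segments[0].capitalize() if len(segments) > 1 else "Pages"
        title = segments[-1].replace("-", " ").capitalize() if segments else "Home"
        sections.setdefault(heading, []).append(f"- [{title}]({url})")
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", "", *links, ""]
    return "\n".join(lines).rstrip() + "\n"


skeleton = sitemap_to_llms_txt(SAMPLE_SITEMAP, "Example Store", "Hypothetical shop.")
print(skeleton)
```

Titles derived from URL slugs are crude, so the generated skeleton is a starting point to edit rather than a finished file.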

The positioning that follows from this

The honest framing for llms.txt in 2026 is exactly what Mueller described, applied more precisely: it's agent-navigation infrastructure, not a ranking signal. Your site gets discovered the same way it always has. Once an agent arrives, llms.txt is the structured handshake that tells it where things are and what they do.

The AEO/GEO pillar covers where AI visibility fits into the broader search picture. The AI share of voice post covers what's actually measurable with server-side bot logs versus what's synthetic and directional. Those two together give the full context for where llms.txt sits in the stack.

Mueller's framing isn't a warning about llms.txt. It's a clarification of its job. The file does what it does well. Discovery is a different problem, and it has different tools.

Written by Crawlytics Team. Crawlytics tracks AI bots, generates llms.txt, and powers WebMCP commerce, all from one snippet on any stack.


This page is part of Crawlytics.app. View all pages: llms.txt · llms-full.txt
