A step-by-step llms.txt setup for 2026: generate the markdown index, add the right sections, host it at /llms.txt, and confirm AI crawlers actually read it.
If you've heard the phrase llms.txt in the past six months, it was probably from a Vercel changelog, an Anthropic doc page, or someone on r/SEO insisting it's the new robots.txt. None of those tell you the full story. This guide does — what the file actually is, who reads it, how to ship one on any host, and whether the time investment pays off.
The short version: llms.txt is a markdown file at the root of your site that gives AI assistants a clean, structured index of what's worth reading. It's not a ranking signal. It's a delivery format. And as more AI clients start fetching it by default, the cost of not having one is going up.
llms.txt is a plain-text markdown file served at the root of your domain — https://yoursite.com/llms.txt. Inside it, you list the pages on your site that matter most to a reader who's trying to understand what you do, organized into sections with descriptions.
Here's a stripped-down example:
# Acme
> Acme builds open-source CLI utilities for inspecting Docker images.
## Docs
- [Getting Started](https://acme.dev/docs/start): install in 30 seconds, scan your first image
- [API Reference](https://acme.dev/docs/api): every command, every flag, every exit code
## Blog
- [Why we rewrote our scanner in Rust](https://acme.dev/blog/rust-rewrite): 11x faster, 90% less memory
- [The case for SBOMs in 2026](https://acme.dev/blog/sbom-2026)
That's the whole thing. A heading with your site name, a one-sentence summary, then markdown lists grouped by section. AI systems fetch it, parse it, and use it to decide what to read next.
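To make "fetch it, parse it" concrete, here is a minimal sketch of how a client might read that structure. It is not how any particular AI client works. The `LlmsIndex` class, the `parse_llms_txt` function, and the regex are illustrative names and choices, and the sample is a shortened copy of the Acme example with the site-name heading written out as `# Acme`.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](https://url): optional description"
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsIndex:
    """Hypothetical container for a parsed llms.txt file."""
    title: str = ""
    summary: str = ""
    # section name -> list of (title, url, description) tuples
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsIndex:
    """Read the site name, the summary, and the links grouped by section."""
    index = LlmsIndex()
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not index.title:
            index.title = line[2:].strip()
        elif line.startswith("> ") and not index.summary:
            index.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            index.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                index.sections[current].append(
                    (match["title"], match["url"], (match["desc"] or "").strip())
                )
    return index


SAMPLE = """# Acme
> Acme builds open-source CLI utilities for inspecting Docker images.
## Docs
- [Getting Started](https://acme.dev/docs/start): install in 30 seconds, scan your first image
## Blog
- [The case for SBOMs in 2026](https://acme.dev/blog/sbom-2026)
"""

index = parse_llms_txt(SAMPLE)
for section, links in index.sections.items():
    print(section, [url for _, url, _ in links])
```

Links without a description parse with an empty string in the third slot, which is the case an agent has the least to go on.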
What it isn't:
- A replacement for robots.txt. Robots.txt is exclusion: telling crawlers what not to fetch. llms.txt is inclusion: telling them what's worth fetching first.
- A Google ranking signal. As of mid-2026, AI Overviews still pull from web search, not from llms.txt directly.

The proposal came from Jeremy Howard (Answer.AI, fast.ai) in September 2024. The pitch was simple: LLM context windows are expensive, HTML is noisy, and there should be a way for a site owner to hand a clean markdown index to any AI client that wants it. He set up llmstxt.org with the spec and a directory.
The adoption curve looked like most open conventions: a few months of "is this a real thing?", followed by enough notable sites shipping it that the question became "why don't you have one yet?"
As of mid-2026, you'll find llms.txt live on Anthropic's docs, Vercel, Cursor, Stripe, Supabase, Vue.js, Astro, dbt Labs, and thousands of independent sites. The directory at llmstxt.org tracks public adopters. Cloudflare's Markdown for Agents and OpenAI's developer docs both reference the convention.
What the AI clients actually do with it varies. ChatGPT and Claude fetch llms.txt opportunistically when you give them a URL or ask about a site by name. Perplexity prefers llms-full.txt when available. Custom GPTs and Claude Projects use it as a seed index. Codegen tools (Cursor, Windsurf, Continue) pull it to pre-warm their context when you point them at a library.
The official spec is short — under 50 lines of normative text — but a few conventions have hardened in practice that aren't on llmstxt.org. Here's what works:
- `- [Page title](https://full-url): short description`. The description is what tips an agent toward fetching that URL.
- … llms.txt wastes agent fetches.

robots.txt, llms.txt, and llms-full.txt do different jobs, and AI clients fetch them at different times. The table makes the distinction concrete:
| File | What it contains | Who reads it | When |
|---|---|---|---|
| /robots.txt | Crawl rules (allow / disallow / sitemap pointer) | All crawlers and most AI bots | Before fetching anything else |
| /llms.txt | Curated index of URLs with descriptions | AI assistants and code agents | When deciding what to fetch from your site |
| /llms-full.txt | Full concatenated markdown of your site | AI clients that want everything in one shot | For one-fetch ingestion (often by code agents) |
You don't have to choose. The right move for most sites is to ship all three: robots.txt for crawl control, llms.txt as the curated index, llms-full.txt as the bulk download option. Crawlytics generates the latter two automatically.
For a small site — under 30 URLs — open a text editor, write the file by hand, drop it at the root. Total time: 15 minutes for a focused site, an hour for one with a lot of categories.
Pick this if you have a stable site that doesn't change weekly, or if you want full editorial control over which pages the AI sees first. Documentation sites with a clean structure (10-20 top pages plus an API reference) often do this and never touch the file again.
Downside: you have to remember to update it. A stale llms.txt sends agents to dead URLs and old content.
Crawlytics crawls your sitemap nightly, scores each URL on six signals (depth, recency, word count, sitemap priority, meta description, category), groups by section, and writes llms.txt and llms-full.txt to stable URLs that work on any host — Vercel, Netlify, WordPress, raw nginx, anything. You add a snippet, the file regenerates daily, you get a dashboard showing which AI bots are fetching it and from where.
Pick this if you have a fast-moving content site (blog, docs that ship often, ecommerce catalog), if you want analytics on bot fetches, or if you don't want to maintain the file by hand. The Visibility tier is $29.99/mo and includes the generator plus per-bot analytics.
If you run a static site (Astro, Next.js, Hugo, Eleventy), you can write a build-time script that walks your content collections, formats markdown, and writes /public/llms.txt before the build finishes. Vercel publishes a reference script. Astro and Next plugins exist.
Pick this if you're already comfortable with custom build steps and you don't want a hosted dependency. Downside: no per-bot analytics, no fetch logging, no UTM injection for attribution. The file just exists.
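A build-time script of this kind can be quite small. The sketch below is not Vercel's reference script or any plugin's implementation. It assumes a hypothetical layout where markdown lives under a `content/` folder, the top-level folder name doubles as the section name, and each file's path maps directly to its URL. All function names and parameters are illustrative.

```python
from __future__ import annotations

from pathlib import Path


def title_and_blurb(markdown: str, fallback: str) -> tuple[str, str]:
    """Use the first '# ' heading as the title and the first body line as the description."""
    title = None
    blurb = ""
    for line in (raw.strip() for raw in markdown.splitlines()):
        if title is None and line.startswith("# "):
            title = line[2:].strip()
        elif not blurb and line and not line.startswith("#"):
            blurb = line
    return title or fallback, blurb


def build_llms_files(content_dir: Path, out_dir: Path, site_name: str,
                     summary: str, base_url: str) -> None:
    """Write llms.txt (curated index) and llms-full.txt (concatenated markdown)."""
    sections: dict[str, list[str]] = {}
    full_parts: list[str] = []
    for path in sorted(content_dir.rglob("*.md")):
        rel = path.relative_to(content_dir)
        text = path.read_text(encoding="utf-8")
        title, blurb = title_and_blurb(text, fallback=path.stem)
        # Assumption: files in a subfolder belong to a section named after it.
        section = rel.parts[0].capitalize() if len(rel.parts) > 1 else "Pages"
        # Assumption: content/docs/start.md is served at <base_url>/docs/start.
        url = f"{base_url.rstrip('/')}/{rel.with_suffix('').as_posix()}"
        link = f"- [{title}]({url})"
        sections.setdefault(section, []).append(f"{link}: {blurb}" if blurb else link)
        full_parts.append(text.strip())

    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for section, links in sections.items():
        lines += [f"## {section}", *links, ""]

    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "llms.txt").write_text("\n".join(lines), encoding="utf-8")
    (out_dir / "llms-full.txt").write_text("\n\n".join(full_parts) + "\n", encoding="utf-8")
```

Calling `build_llms_files(Path("content"), Path("public"), "Acme", "Acme builds open-source CLI utilities for inspecting Docker images.", "https://acme.dev")` before the site build would leave both files in `public/`. Taking the first body line as the description is a shortcut. A hand-written description or a front-matter field would usually read better.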
Direct answer: not in the classic Google-ranking sense. Google has not confirmed that Googlebot reads llms.txt as a ranking signal, and AI Overviews still pull from the regular web index, not from llms.txt directly.
What it does help is AI search — the layer of ChatGPT, Claude, Perplexity, Gemini, and the dozens of vertical assistants that fetch sites directly when answering questions. In that channel:
- Sites with llms.txt get cited more often because the agent doesn't have to decide what to read: you already told it.
- Code agents use llms-full.txt when building features against your API, because they can load the whole reference in one fetch.

The way to think about it: llms.txt isn't an SEO tactic, it's an AEO (Answer Engine Optimization) primitive. If you care about being cited in AI answers, ship it. If you only care about Google's blue-link rankings, it's neutral: you won't be penalized for having one, and you won't be rewarded.
For the broader playbook on AI search, our AEO framework covers the four layers: technical accessibility, content structure, signal generation, and attribution recovery.
Before pushing llms.txt live, run through this list. The failure modes are all silent — your file will exist, agents will fetch it, and you won't know it's broken unless you check:
- The file is live at https://yoursite.com/llms.txt. Fetch it with curl. Confirm 200 status. No redirects, no auth challenge.
- The Content-Type is text/plain or text/markdown. Some hosts default to application/octet-stream for unknown extensions, which causes downloads instead of inline display. Set the MIME type explicitly.
- llms.txt is listed in robots.txt as allowed. If you have a blanket Disallow: /, allow /llms.txt explicitly so AI bots can still reach it.

If you want a one-click check, the free Agent-Ready Grader runs through its eight checks in 10 seconds and gives you a score plus the broken items.
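The reachability, Content-Type, and robots.txt checks are easy to script if you would rather not eyeball curl output. The sketch below uses only the Python standard library. The function names are illustrative, and `GPTBot` in the usage note is just one example user agent. Note one assumption baked in: Python's `urllib.robotparser` applies rules in file order, so an `Allow: /llms.txt` line only wins here if it appears before a blanket `Disallow: /`. Crawlers that follow the longest-match rule may behave differently.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from urllib import robotparser

ALLOWED_TYPES = {"text/plain", "text/markdown"}


def check_response(status: int, final_url: str, expected_url: str,
                   content_type: str) -> list[str]:
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if status != 200:
        problems.append(f"expected status 200, got {status}")
    if final_url != expected_url:
        problems.append(f"redirected to {final_url}")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ALLOWED_TYPES:
        problems.append(f"unexpected Content-Type: {content_type or 'missing'}")
    return problems


def robots_allows(robots_txt: str, url: str, user_agent: str) -> bool:
    """Check whether the given robots.txt text lets user_agent fetch url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch_and_check(url: str) -> list[str]:
    """Fetch url over the network and run the response checks on it."""
    try:
        # urlopen follows redirects; geturl() reports where we ended up.
        with urllib.request.urlopen(url, timeout=10) as resp:
            return check_response(
                resp.status, resp.geturl(), url,
                resp.headers.get("Content-Type", ""),
            )
    except urllib.error.HTTPError as err:
        return [f"expected status 200, got {err.code}"]
    except urllib.error.URLError as err:
        return [f"could not reach {url}: {err.reason}"]
```

Running `fetch_and_check("https://yoursite.com/llms.txt")` returns an empty list when the file is served cleanly. `robots_allows(robots_text, "https://yoursite.com/llms.txt", "GPTBot")` takes the robots.txt body as a string, so you can test a draft before deploying it.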
llms.txt is a small file with a long tail of impact. It costs you 15 minutes to hand-write or a one-line snippet to automate. The downside is zero. The upside is being readable to every AI client that asks — which, on the current trajectory, is most of them by the end of 2026.
Don't overthink the format. Ship it, point it at your best pages, and update it when you add new ones. The agents that matter are already looking for it.
Written by Crawlytics Team. Crawlytics tracks AI bots, generates llms.txt, and powers WebMCP commerce, all from one snippet on any stack.