llms.txt is the markdown index at /llms.txt that tells AI systems what your site contains. The format, every field, real examples, and a free generator.
llms.txt is a plain-text file you put at the root of your website (https://example.com/llms.txt) that gives AI systems a curated, structured summary of your content in clean markdown. It's the AI-era cousin of robots.txt and sitemap.xml — instead of telling crawlers what they can read, it tells them what they should read and in what order.
Modern websites are HTML-heavy. They have nav bars, footers, cookie banners, JavaScript that loads content after the page renders, ads, popups, and a thousand other elements that mean nothing to a language model. When an LLM tries to read your site, it spends most of its context window on noise and a fraction on actual content.
The result: ChatGPT cites your competitor instead of you, Perplexity summarizes the wrong section, and Claude can't answer questions about your product. Not because your content is bad, but because it is buried in a delivery format optimized for human browsers.
llms.txt fixes this by giving AI systems a pre-curated, structured, markdown version of what matters on your site, at a stable URL they can fetch in one request.
The format was proposed by Jeremy Howard (Answer.AI, co-founder of fast.ai) in September 2024. The spec lives at llmstxt.org and has been adopted by Anthropic, Cloudflare, Vercel, and a growing number of developer tools.
The spec is intentionally simple. A valid llms.txt file looks like this:
# Site name
> One-line summary of what the site is and who it's for.
Optional context paragraph that explains the site's mission, target audience, or important context. Keep this short — 2-3 sentences.
## Section name
- [Link title](https://example.com/page): One-line description of what the page covers.
- [Another link](https://example.com/other-page): Another description.
## Another section
- [Yet another link](https://example.com/third-page): Description.
That's the whole spec. The hard parts are:
- Section headings (## Tools, ## Pricing, etc.) serve as a navigation skeleton. Reasonable groupings make the difference between an LLM understanding your site and treating it as a wall of links.

The standard defines two related files:
| File | What it contains | Purpose |
|---|---|---|
| /llms.txt | Curated index — H1, summary, sections of links with one-line descriptions | The "table of contents" — AI fetches it first to understand the shape of your site |
| /llms-full.txt | Full markdown of your top-scored pages concatenated into one document | The "single-fetch bundle" — AI can ingest all your key content in one request, no follow-up fetches needed |
Most sites should have both. The index is what AI discovers first; the bundle is what AI loads when it needs depth.
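As a rough illustration of how the two files relate, here is a minimal Python sketch that renders both from the same page list. The `Page` and `Site` classes, the function names, and the layout of the bundle (one H2 per page plus a `Source:` line) are assumptions made for this example; the index layout follows the format shown earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    url: str
    description: str
    markdown: str = ""  # full page body, used only for llms-full.txt


@dataclass
class Site:
    name: str
    summary: str
    sections: dict[str, list[Page]] = field(default_factory=dict)


def render_index(site: Site) -> str:
    """Render the curated index: H1, blockquote summary, H2 sections of links."""
    lines = [f"# {site.name}", "", f"> {site.summary}", ""]
    for section, pages in site.sections.items():
        lines.append(f"## {section}")
        lines.append("")
        for page in pages:
            lines.append(f"- [{page.title}]({page.url}): {page.description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def render_bundle(site: Site) -> str:
    """Concatenate every page's markdown into one document.

    The per-page layout here is an assumption, not something the spec dictates.
    """
    parts = [f"# {site.name}", "", f"> {site.summary}"]
    for pages in site.sections.values():
        for page in pages:
            parts += ["", f"## {page.title}", "", f"Source: {page.url}", "", page.markdown.strip()]
    return "\n".join(parts) + "\n"
```

Writing the output of `render_index` to `/llms.txt` and `render_bundle` to `/llms-full.txt` keeps the two files in sync, because both come from the same curated list.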
When an AI assistant gets a question about your domain — "what's the pricing for example.com?" — modern systems will:
- Check whether https://example.com/llms.txt exists. If yes, fetch it.
- Fetch llms-full.txt for a bundled read, or fetch the specific page markdown (e.g. /md/pricing).

Without llms.txt, the same AI has to (a) crawl your sitemap if you have one, (b) fetch each page's HTML, (c) try to extract main content from styled HTML, and (d) guess at which pages are most important. That's expensive in tokens and often fails. With llms.txt, the AI does one fetch and gets the structure for free.
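To make the client side concrete, the sketch below parses an already-fetched llms.txt and picks the links that look relevant to a question. It is an illustration only: real AI systems do not publish their retrieval logic, and the `parse_llms_txt` and `find_links` helpers, along with the simple keyword match, are assumptions for this example.

```python
import re

# One link line: "- [Title](https://example.com/page): Optional description."
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Parse an llms.txt document into its title, summary, and sections of links."""
    doc = {"title": None, "summary": None, "sections": {}}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc["sections"][current].append(match.groupdict())
    return doc


def find_links(doc: dict, keyword: str) -> list[str]:
    """Return URLs whose section name, title, or description mentions the keyword."""
    needle = keyword.lower()
    hits = []
    for section, links in doc["sections"].items():
        for link in links:
            haystack = " ".join([section, link["title"], link["desc"] or ""]).lower()
            if needle in haystack:
                hits.append(link["url"])
    return hits
```

For the pricing question above, a client could call `find_links(doc, "pricing")` and fetch only the URLs it returns, instead of crawling the whole site.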
llms.txt does not directly help traditional SEO. Googlebot reads HTML, not llms.txt, and Google has stated that its search ranking pipeline ignores the file. Adding it won't move your rankings up or down on traditional search.
What it does affect is AI search visibility. ChatGPT, Perplexity, Claude, and Google's own AI Overviews (a different pipeline from Google search) all increasingly fetch llms.txt. So if you want to rank in AI assistant answers, which some now call "AEO" (Answer Engine Optimization) or "GEO" (Generative Engine Optimization), llms.txt is becoming table stakes.
Option 1 is hand-writing the file, which is fine for small sites (under ~20 pages). You write the markdown yourself, commit it to your repo or upload it to your host, and you're done. Maintenance burden: every time you add or remove a page, you update the file. Most teams forget after the first few weeks, and the file drifts out of sync with the site.
Best for: personal portfolios, small marketing sites, documentation sites where the structure rarely changes.
Option 2 is a one-time generator script (there are several open-source options on GitHub) that crawls your sitemap and outputs an llms.txt file. Upload it. Move on.
Same drift problem as option 1: the file ages immediately. Better than nothing, worse than option 3.
Best for: sites that change content rarely (annual brochures, archival projects).
Option 3 is continuous generation. A service crawls your sitemap on a schedule (daily is common), extracts each page as clean markdown, scores and categorizes the pages, then serves /llms.txt, /llms-full.txt, and per-page /md/<path> URLs dynamically. The file stays current without you touching it.
This is what Crawlytics does. Cloudflare's Markdown for Agents does something related (HTML→markdown on demand via the Accept header), but doesn't generate the pre-built llms.txt file most AI clients look for.
Best for: any site that publishes new content regularly — blogs, e-commerce, SaaS, documentation, news.
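Both the one-time script and the scheduled service start from the same step: reading the sitemap. The sketch below shows that step with Python's standard library. The `parse_sitemap` helper is a hypothetical name, and it assumes a plain `<urlset>` sitemap (not a sitemap index) using the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list[dict]:
    """Extract loc, lastmod, and priority from each <url> entry of a sitemap."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        if not loc:
            continue  # skip malformed entries that have no URL
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        priority = url.findtext("sm:priority", default=None, namespaces=SITEMAP_NS)
        entries.append({
            "loc": loc,
            "lastmod": lastmod.strip() if lastmod else None,
            "priority": float(priority) if priority else None,
        })
    return entries
```

A scheduled job could run this daily, fetch each `loc`, and regenerate the files; fetching and HTML-to-markdown extraction are left out here because they depend on your stack.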
Convention says the root of your domain: https://example.com/llms.txt. AI crawlers look there first. If you have subdomains (docs.example.com, blog.example.com), each one should have its own llms.txt — they're treated as separate sites by AI systems.
Content-Type should be text/plain or text/markdown. Either works; markdown clients prefer the latter when available.
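If your host does not already serve the file with a suitable header, a small handler can set it explicitly. The sketch below is a minimal WSGI app using only the standard library; the `LLMS_TXT` contents, the `app` and `serve` names, and port 8000 are placeholders for this example, not part of any spec.

```python
from wsgiref.simple_server import make_server

# Hypothetical file contents; in practice you would read the real file from disk.
LLMS_TXT = "# Example\n\n> Placeholder summary.\n"


def app(environ, start_response):
    """Serve /llms.txt as UTF-8 markdown; every other path returns 404."""
    if environ.get("PATH_INFO") == "/llms.txt":
        body = LLMS_TXT.encode("utf-8")
        status = "200 OK"
        content_type = "text/markdown; charset=utf-8"
    else:
        body = b"Not found\n"
        status = "404 Not Found"
        content_type = "text/plain; charset=utf-8"
    start_response(status, [
        ("Content-Type", content_type),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port: int = 8000) -> None:
    """Run a local test server; the port is an arbitrary choice."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling `serve()` and then requesting `http://localhost:8000/llms.txt` lets you confirm the header locally. Swapping `text/markdown` for `text/plain` is a one-line change if a client mishandles the former.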
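Include: pages that answer real questions, such as your homepage, pricing, product pages, key docs, top blog posts, calculators, and tools.
Exclude: tag pages, paginated archives, and thin stub pages.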
When your llms.txt has 50+ pages, the order matters: AI assistants weight the top of the file more heavily. A reasonable scoring approach uses six signals, including:
- Sitemap priority (the priority element in sitemap.xml, if present)
- Freshness (the lastmod element in sitemap.xml): fresher pages are slightly preferred

Crawlytics applies this scoring automatically. If you're hand-writing your file, you can apply the same logic mentally, but expect to re-order it every couple of months.
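For a sense of how such scoring might look, here is a sketch that blends just the two sitemap-derived signals, priority and lastmod. The weights (0.7 and 0.3), the one-year linear decay, and the function names are all assumptions chosen for illustration; this is not Crawlytics' actual formula, and a fuller scorer would add more signals.

```python
from datetime import date


def score_page(priority, lastmod, today, w_priority=0.7, w_fresh=0.3, horizon_days=365):
    """Blend sitemap priority and freshness into a 0..1 score.

    The weights and the decay horizon are illustrative assumptions.
    `lastmod` is expected to start with a full YYYY-MM-DD date.
    """
    # The sitemap protocol treats a missing priority as 0.5.
    p = 0.5 if priority is None else max(0.0, min(1.0, priority))
    if lastmod is None:
        freshness = 0.0
    else:
        age_days = max(0, (today - date.fromisoformat(lastmod[:10])).days)
        # Linear decay: 1.0 for a page modified today, 0.0 at or beyond the horizon.
        freshness = max(0.0, 1.0 - age_days / horizon_days)
    return w_priority * p + w_fresh * freshness


def rank_pages(pages, today):
    """Sort pages (dicts with loc, priority, lastmod) from highest to lowest score."""
    return sorted(
        pages,
        key=lambda pg: score_page(pg.get("priority"), pg.get("lastmod"), today),
        reverse=True,
    )
```

Keeping freshness at a lower weight than priority matches the "slightly preferred" wording: a recent edit nudges a page up without letting a trivial update outrank a core page.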
- A 2,000-line llms.txt is worse than a 50-line one. AI assistants stop reading partway. Curate ruthlessly.
- An llms.txt linking to deleted pages or wrong prices is worse than no file at all. Set a re-generation cadence (daily is ideal).
- An llms.txt without /md/<path> URLs means AI assistants have to fetch your HTML for depth, which is the original problem.

Three quick checks once you've shipped:
- Open https://yoursite.com/llms.txt in a browser. Does it load? Is it readable?
- llms.txt is being read.
- Check your server logs for requests to /llms.txt from User-Agents like GPTBot, ClaudeBot, PerplexityBot. If you see them, the AI ecosystem has discovered your file.

If you have Crawlytics installed, the dashboard surfaces these fetches automatically, so you can see which AI providers are pulling your llms.txt and how often.
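The log check can be scripted. The sketch below counts requests for /llms.txt per AI crawler. It assumes your server writes the common "combined" access-log format (request, status, size, referer, User-Agent in that order); the `count_ai_fetches` helper and the regular expression are illustrative and may need adjusting for your log layout.

```python
import re
from collections import Counter

# User-Agent substrings named above; extend the tuple as needed.
AI_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Matches: "METHOD /path PROTO" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_ai_fetches(log_lines, path="/llms.txt"):
    """Count requests for `path` per AI crawler, matched by User-Agent substring."""
    counts = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        # Ignore unparseable lines and requests for other paths (query string stripped).
        if not match or match.group("path").split("?")[0] != path:
            continue
        agent_header = match.group("agent").lower()
        for agent in AI_AGENTS:
            if agent.lower() in agent_header:
                counts[agent] += 1
    return counts
```

Passing an open log file works directly, for example `count_ai_fetches(open("access.log"))`, where the file name is a placeholder. Keep in mind that User-Agent strings can be spoofed, so treat the counts as a signal, not proof.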
llms.txt is a plain-text file you put at the root of your website (https://example.com/llms.txt) that gives AI systems a curated summary of your content in clean markdown. The format is an H1 with your site name, a one-line summary, then H2 sections of grouped links with descriptions. It is the AI-era counterpart to robots.txt and sitemap.xml.
llms.txt does not help with traditional Google search. Googlebot does not read llms.txt, and Google has stated that the search ranking pipeline ignores the file. It does help AI search visibility: ChatGPT, Claude, Perplexity, and Google AI Overviews increasingly fetch llms.txt to understand a site, so adding the file can improve your chances of being cited in AI answers without affecting Google rankings either way.
Three options: (1) hand-write the file once if your site is small (under 20 pages) and stable; (2) generate it once with an open-source script; (3) auto-generate it from your sitemap on a daily schedule so it stays current. Option 3 is the only one that survives content drift past the first month. Crawlytics handles option 3 automatically.
At the root of your domain: https://yoursite.com/llms.txt. AI crawlers look there first. Subdomains (docs.yoursite.com, blog.yoursite.com) each need their own file. Serve it as text/plain or text/markdown. Either Content-Type works; markdown-aware clients prefer the latter.
llms.txt is the curated index: short and grouped by category. AI fetches it first to understand your site's shape. llms-full.txt is the full bundle: your top-scored pages concatenated into one document that an AI can ingest in a single fetch when it needs depth. Most sites should publish both.
Curate aggressively. A 50-line llms.txt outperforms a 2,000-line one because AI assistants weight the top of the file heavily and stop reading partway through long files. Include pages that answer real questions: homepage, pricing, product, key docs, top blog posts, calculators, tools. Skip tag pages, paginated archives, thin stub pages.