Summary

llms.txt is the markdown index at /llms.txt that tells AI systems what your site contains. This page covers the format, every field, real examples, and a free generator.

The one-paragraph version

llms.txt is a plain-text file you put at the root of your website (https://example.com/llms.txt) that gives AI systems a curated, structured summary of your content in clean markdown. It's the AI-era cousin of robots.txt and sitemap.xml — instead of telling crawlers what they can read, it tells them what they should read and in what order.

Why this exists

Modern websites are HTML-heavy. They have nav bars, footers, cookie banners, JavaScript that loads content after the page renders, ads, popups, and a thousand other elements that mean nothing to a language model. When an LLM tries to read your site, it spends most of its context window on noise and a fraction on actual content.

The result: ChatGPT cites your competitor instead of you, Perplexity summarizes the wrong section, Claude can't answer questions about your product. Not because your content is bad, but because it is buried in a delivery format optimized for human browsers.

llms.txt fixes this by giving AI systems a pre-curated, structured, markdown version of what matters on your site, at a stable URL they can fetch in one request.

Who proposed the standard

The format was proposed by Jeremy Howard (Answer.AI, co-founder of fast.ai) in September 2024. The spec lives at llmstxt.org and has been adopted by Anthropic, Cloudflare, Vercel, and a growing number of developer tools.

The file format

The spec is intentionally simple. A valid llms.txt file looks like this:

# Site name

> One-line summary of what the site is and who it's for.

Optional context paragraph that explains the site's mission, target audience, or important context. Keep this short — 2-3 sentences.

## Section name
- [Link title](https://example.com/page): One-line description of what the page covers.
- [Another link](https://example.com/other-page): Another description.

## Another section
- [Yet another link](https://example.com/third-page): Description.
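
Because the layout is this regular, a small parser is enough to read it. The following is a minimal sketch in Python, assuming the file follows the H1, blockquote, H2-plus-links shape shown above; the `LlmsTxt` class and `parse_llms_txt` function are hypothetical names used for illustration, not part of any official tooling.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](https://url): optional description"
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsTxt:
    title: str = ""
    summary: str = ""
    # Section name -> list of {"title", "url", "description"} dicts
    sections: dict = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsTxt:
    """Parse the H1, blockquote summary, and H2 link sections of an llms.txt file."""
    doc = LlmsTxt()
    current = None  # name of the H2 section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith(">") and not doc.summary:
            doc.summary = line[1:].strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append({
                    "title": match["title"],
                    "url": match["url"],
                    "description": match["desc"] or "",
                })
    return doc
```

Free-form paragraphs between the summary and the first H2 are ignored here; a fuller parser would probably keep them as context.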

That's the whole spec. The hard parts are:

llms.txt vs llms-full.txt

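Two related files are in common use. The llmstxt.org proposal specifies /llms.txt; /llms-full.txt is a companion convention that grew up alongside it: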

| File | What it contains | Purpose |
| --- | --- | --- |
| /llms.txt | Curated index: H1, summary, sections of links with one-line descriptions | The "table of contents": AI fetches it first to understand the shape of your site |
| /llms-full.txt | Full markdown of your top-scored pages concatenated into one document | The "single-fetch bundle": AI can ingest all your key content in one request, no follow-up fetches needed |

Most sites should have both. The index is what AI discovers first; the bundle is what AI loads when it needs depth.
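
As a rough illustration of what "concatenated into one document" can mean, here is a minimal Python sketch. The page shape (`title`, `url`, `markdown`), the `Source:` line, and the `---` separator are assumptions made for this example; no fixed layout for llms-full.txt is implied.

```python
def build_llms_full(site_name: str, pages: list) -> str:
    """Concatenate page markdown into one document, one H2 block per page.

    Each page is assumed to be a dict with "title", "url", and "markdown" keys,
    already sorted with the most important page first.
    """
    parts = [f"# {site_name}"]
    for page in pages:
        body = page["markdown"].strip()
        # Keep the source URL next to the content so a reader can cite it.
        parts.append(f"## {page['title']}\n\nSource: {page['url']}\n\n{body}")
    # A horizontal rule between pages keeps their boundaries visible.
    return "\n\n---\n\n".join(parts) + "\n"
```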

How AI systems use it

When an AI assistant gets a question about your domain, such as "what's the pricing for example.com?", a system that supports llms.txt will typically:

  1. Check if https://example.com/llms.txt exists. If yes, fetch it.
  2. Parse the H2 sections to understand the site structure.
  3. Find the most relevant section (e.g. "Pricing") and identify the linked pages.
  4. Either fetch llms-full.txt for a bundled read, or fetch the specific page markdown (e.g. /md/pricing).
  5. Answer using that content as primary source, citing the page URL.

Without llms.txt, the same AI has to (a) crawl your sitemap if you have one, (b) fetch each page's HTML, (c) try to extract main content from styled HTML, (d) guess at which pages are most important. That's expensive in tokens and often fails. With llms.txt, the AI does one fetch and gets the structure for free.
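
Step 3 above, finding the most relevant section, can be sketched with plain keyword overlap. This is a deliberately naive Python illustration, not how any particular assistant works; real systems more likely use embeddings or the model itself. The `pick_section` name and the section shape (section name mapped to a list of link dicts) are assumptions.

```python
import re
from typing import Optional


def _words(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_section(sections: dict, question: str) -> Optional[str]:
    """Return the H2 section whose name and link text overlap most with the question.

    `sections` maps a section name to a list of dicts with "title" and
    "description" keys. Returns None when nothing overlaps at all.
    """
    query = _words(question)
    best_name, best_score = None, 0
    for name, links in sections.items():
        link_text = " ".join(f"{link['title']} {link['description']}" for link in links)
        # Section-name matches count twice: the heading is the strongest hint.
        score = len(query & _words(name + " " + link_text)) + len(query & _words(name))
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```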

Does llms.txt affect SEO?

No, not directly. Googlebot reads HTML, not llms.txt — Google has stated their search ranking pipeline ignores llms.txt. Adding the file won't move your rankings up or down on traditional search.

What it does affect is AI search visibility. ChatGPT, Perplexity, Claude, and Google's own AI Overviews (a different pipeline from Google search) all increasingly fetch llms.txt. So if you want to rank in AI assistant answers, which some now call "AEO" (Answer Engine Optimization) or "GEO" (Generative Engine Optimization), llms.txt is becoming table stakes.

Three ways to generate llms.txt

Option 1: Hand-write it

Fine for small sites (under ~20 pages). You write the markdown yourself, commit it to your repo or upload to your host, done. Maintenance burden: every time you add or remove a page, you update the file. Most teams forget after the first few weeks and the file drifts out of sync with the site.
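
One way to catch that drift is to compare the URLs linked from llms.txt against the URLs in the sitemap. The Python sketch below assumes you already have both lists; `find_drift` is a hypothetical helper name, and treating a trailing slash as insignificant is an assumption that may not hold for every site.

```python
def _normalize(url: str) -> str:
    # Treat "https://example.com/docs/" and "https://example.com/docs" as the same page.
    return url.strip().rstrip("/")


def find_drift(llms_urls, sitemap_urls) -> dict:
    """Compare the URLs linked from llms.txt with the URLs listed in the sitemap."""
    linked = {_normalize(u) for u in llms_urls}
    live = {_normalize(u) for u in sitemap_urls}
    return {
        # Linked from llms.txt but no longer in the sitemap: probably removed pages.
        "stale": sorted(linked - live),
        # In the sitemap but not linked: new pages, or pages left out on purpose.
        "missing": sorted(live - linked),
    }
```

Since llms.txt is meant to be curated, a non-empty "missing" list is not automatically a problem; "stale" entries usually are.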

Best for: personal portfolios, small marketing sites, documentation sites where the structure rarely changes.

Option 2: Generate once, host static

Use a one-time generator script (there are several open-source options on GitHub) to crawl your sitemap and output an llms.txt file. Upload it. Move on.
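
The first half of such a script, reading the sitemap, might look like the Python sketch below. It uses only the standard library and assumes a plain `<urlset>` sitemap in the sitemaps.org namespace (not a sitemap index); `parse_sitemap` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list:
    """Extract loc, lastmod, and priority from each <url> entry of a sitemap."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        if not loc:
            continue  # skip malformed entries without a location
        entries.append({
            "loc": loc,
            # lastmod and priority are optional; missing values become "".
            "lastmod": url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip(),
            "priority": url.findtext("sm:priority", default="", namespaces=SITEMAP_NS).strip(),
        })
    return entries
```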

Same drift problem as option 1: the file starts aging the moment you upload it. Better than nothing, worse than option 3.

Best for: sites that change content rarely (annual brochures, archival projects).

Option 3: Auto-generate and host dynamically

A service crawls your sitemap on a schedule (daily is common), extracts each page as clean markdown, scores and categorizes the pages, then serves /llms.txt, /llms-full.txt, and per-page /md/<path> URLs dynamically. The file stays current without you touching it.
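
The final rendering step of such a pipeline can be sketched in a few lines of Python. This is an illustration under assumptions, not a description of how any specific service is implemented: the page dicts (`title`, `url`, `description`, `category`, `score`) and the `render_llms_txt` name are hypothetical.

```python
from collections import defaultdict


def render_llms_txt(site_name: str, summary: str, pages: list) -> str:
    """Render an llms.txt index: H1, blockquote summary, then one H2 per category."""
    by_category = defaultdict(list)
    for page in pages:
        by_category[page["category"]].append(page)

    lines = [f"# {site_name}", "", f"> {summary}"]
    # Categories appear in the order they are first seen in `pages`.
    for category, items in by_category.items():
        lines += ["", f"## {category}"]
        # Highest-scoring pages first within each section.
        for page in sorted(items, key=lambda p: p["score"], reverse=True):
            lines.append(f"- [{page['title']}]({page['url']}): {page['description']}")
    return "\n".join(lines) + "\n"
```

Running this on a schedule and writing the result to the path your server exposes as /llms.txt is what keeps the file in step with the site.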

This is what Crawlytics does. Cloudflare's Markdown for Agents does something related (HTML→markdown on demand via the Accept header), but doesn't generate the pre-built llms.txt file most AI clients look for.

Best for: any site that publishes new content regularly — blogs, e-commerce, SaaS, documentation, news.

Where to put llms.txt

Convention says the root of your domain: https://example.com/llms.txt. AI crawlers look there first. If you have subdomains (docs.example.com, blog.example.com), each one should have its own llms.txt — they're treated as separate sites by AI systems.
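
The root-per-host convention means the llms.txt location can be derived from any page URL. A minimal Python sketch, with `llms_txt_url` as a hypothetical helper name:

```python
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(page_url: str) -> str:
    """Return the root-level llms.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (including any subdomain); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))
```

For a page on docs.example.com this yields the docs subdomain's own file, not the one on example.com, which matches the separate-sites treatment described above.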

Content-Type should be text/plain or text/markdown. Either works; markdown-aware clients prefer the latter when available.
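
If your host does not set the header for you, serving the file yourself is straightforward. Below is a minimal WSGI sketch in Python, assuming the file content is already loaded into a string; `make_app` is a hypothetical name, and the explicit UTF-8 charset is a choice made for this example rather than something the text above requires.

```python
def make_app(llms_text: str):
    """Build a tiny WSGI app that serves /llms.txt as markdown and 404s otherwise."""
    llms_body = llms_text.encode("utf-8")

    def app(environ, start_response):
        if environ.get("PATH_INFO") == "/llms.txt":
            status, body = "200 OK", llms_body
            content_type = "text/markdown; charset=utf-8"
        else:
            status, body = "404 Not Found", b"Not found\n"
            content_type = "text/plain; charset=utf-8"
        start_response(status, [
            ("Content-Type", content_type),
            ("Content-Length", str(len(body))),
        ])
        return [body]

    return app


# To try it locally (not executed here):
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, make_app(open("llms.txt", encoding="utf-8").read())).serve_forever()
```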

What goes in (and what doesn't)

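Include: pages that answer real questions, such as the homepage, pricing, product pages, key docs, top blog posts, calculators, and tools.

Exclude: tag pages, paginated archives, and thin stub pages.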

Scoring: which pages belong at the top?

When your llms.txt has 50+ pages, the order matters — AI assistants weight the top of the file more heavily. A reasonable scoring approach uses six signals:

  1. Sitemap priority (the <priority> element in sitemap.xml, if present)
  2. URL depth — shallower pages (one path segment) usually matter more than deep nested pages
  3. Category — homepage / pricing / product pages outrank blog posts in most cases
  4. Word count — pages with substance score higher than 200-word stub pages
  5. Recency (the <lastmod> element in sitemap.xml): fresher pages are slightly preferred
  6. Has meta description — proxy for "the author cared enough to write a description"

Crawlytics applies this scoring automatically. If you're hand-writing your file, you can apply the same logic mentally but expect to re-order it every couple of months.
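
To make the six signals concrete, here is one way they could be combined into a single number. This Python sketch is illustrative only: the weights, the category table, the 1,000-word and 365-day cutoffs, and the page field names are all assumptions, and nothing here reflects how Crawlytics actually weights them.

```python
from datetime import date

# Assumed category weights; unknown categories fall back to 0.4.
CATEGORY_WEIGHT = {"homepage": 1.0, "pricing": 0.9, "product": 0.8, "docs": 0.6, "blog": 0.3}


def score_page(page: dict, today: date) -> float:
    """Combine the six signals into a score between 0 and 1 (illustrative weights)."""
    priority = float(page.get("priority") or 0.5)            # sitemap default is 0.5
    depth = len([s for s in page["path"].split("/") if s])   # "/" -> 0, "/a/b" -> 2
    depth_score = 1.0 / (1 + depth)                          # shallower is better
    category = CATEGORY_WEIGHT.get(page.get("category", ""), 0.4)
    words = min(page.get("word_count", 0) / 1000, 1.0)       # saturates at 1,000 words
    # Pages without a lastmod date are treated as a year old.
    age_days = (today - page["lastmod"]).days if page.get("lastmod") else 365
    recency = min(1.0, max(0.0, 1.0 - age_days / 365))
    has_desc = 1.0 if page.get("meta_description") else 0.0
    return round(
        0.20 * priority + 0.20 * depth_score + 0.25 * category
        + 0.15 * words + 0.10 * recency + 0.10 * has_desc,
        4,
    )
```

Sorting pages by this score in descending order gives the ordering for the file; the exact weights matter less than applying them consistently.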

Common mistakes

Testing your llms.txt

Three quick checks once you've shipped:

  1. Open https://yoursite.com/llms.txt in a browser. Does it load? Is it readable?
  2. Paste a question about your site into ChatGPT (with browsing enabled, not the base model) and see if it cites you. Accurate, specific citations suggest your llms.txt is being read, though the log check in step 3 is the stronger evidence.
  3. Check your server logs for fetches to /llms.txt from User-Agents like GPTBot, ClaudeBot, PerplexityBot. If you see them, the AI ecosystem has discovered your file.
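
The log check in step 3 can be scripted. The Python sketch below assumes access-log lines in the common/combined format, where the request appears as a quoted "GET /path HTTP/x" string and the User-Agent is somewhere on the same line; `count_llms_fetches` is a hypothetical name, and matching bots by substring is a simplification.

```python
import re
from collections import Counter

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot")
# Request part of an access-log line, e.g. "GET /llms.txt HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (?P<path>\S+) HTTP/[\d.]+"')


def count_llms_fetches(log_lines) -> Counter:
    """Count requests for /llms.txt per AI crawler, matched by User-Agent substring."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        # Ignore query strings when comparing the path.
        if not match or match["path"].split("?")[0] != "/llms.txt":
            continue
        for bot in AI_BOTS:
            if bot in line:
                counts[bot] += 1
                break
    return counts
```

Passing an open log file works directly, for example `count_llms_fetches(open("access.log"))`, since the function only iterates over lines.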

If you have Crawlytics installed, the dashboard surfaces these fetches automatically, so you can see which AI providers are pulling your llms.txt and how often.

Frequently Asked Questions

What is llms.txt?

llms.txt is a plain-text file you put at the root of your website (https://example.com/llms.txt) that gives AI systems a curated summary of your content in clean markdown. The format is an H1 with your site name, a one-line summary, then H2 sections of grouped links with descriptions. It is the AI-era counterpart to robots.txt and sitemap.xml.

Does llms.txt help SEO?

Not for traditional Google search. Googlebot does not read llms.txt and Google has stated the search ranking pipeline ignores the file. It does help AI search visibility: ChatGPT, Claude, Perplexity, and Google AI Overviews increasingly fetch llms.txt to understand a site, so adding the file lifts citation eligibility in AI answers without affecting Google rankings either way.

How do I create an llms.txt file?

Three options: (1) hand-write the file once if your site is small (under 20 pages) and stable; (2) generate it once with an open-source script; (3) auto-generate it from your sitemap on a daily schedule so it stays current. Option 3 is the only one that survives content drift past the first month. Crawlytics handles option 3 automatically.

Where do I host llms.txt?

At the root of your domain: https://yoursite.com/llms.txt. AI crawlers look there first. Subdomains (docs.yoursite.com, blog.yoursite.com) each need their own file. Serve it as text/plain or text/markdown. Either Content-Type works; markdown-aware clients prefer the latter.

What is the difference between llms.txt and llms-full.txt?

llms.txt is the curated index: short and grouped by category. AI fetches it first to understand the shape of your site. llms-full.txt is the full bundle: your top-scored pages concatenated into one document that an AI can ingest in a single fetch when it needs depth. Most sites should publish both.

How many pages should llms.txt include?

Curate aggressively. A 50-line llms.txt tends to outperform a 2,000-line one because AI assistants weight the top of the file heavily and may stop reading partway through long files. Include pages that answer real questions: homepage, pricing, product, key docs, top blog posts, calculators, tools. Skip tag pages, paginated archives, and thin stub pages.

This page is part of Crawlytics.app. View all pages: llms.txt · llms-full.txt