Acme Tools

Summary

A step-by-step llms.txt setup for 2026: generate the markdown index, add the right sections, host it at /llms.txt, and confirm AI crawlers actually read it.


If you've heard the phrase llms.txt in the past six months, it was probably from a Vercel changelog, an Anthropic doc page, or someone on r/SEO insisting it's the new robots.txt. None of those tell you the full story. This guide does — what the file actually is, who reads it, how to ship one on any host, and whether the time investment pays off.

The short version: llms.txt is a markdown file at the root of your site that gives AI assistants a clean, structured index of what's worth reading. It's not a ranking signal. It's a delivery format. And as more AI clients start fetching it by default, the cost of not having one is going up.

What llms.txt actually is (and isn't)

llms.txt is a plain-text markdown file served at the root of your domain — https://yoursite.com/llms.txt. Inside it, you list the pages on your site that matter most to a reader who's trying to understand what you do, organized into sections with descriptions.

Here's a stripped-down example:

# Acme Tools

> Acme builds open-source CLI utilities for inspecting Docker images.

## Docs
- [Getting Started](https://acme.dev/docs/start): install in 30 seconds, scan your first image
- [API Reference](https://acme.dev/docs/api): every command, every flag, every exit code

## Blog
- [Why we rewrote our scanner in Rust](https://acme.dev/blog/rust-rewrite): 11x faster, 90% less memory
- [The case for SBOMs in 2026](https://acme.dev/blog/sbom-2026)

That's the whole thing. A heading with your site name, a one-sentence summary, then markdown lists grouped by section. AI systems fetch it, parse it, and use it to decide what to read next.
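To make the fetch-and-parse step concrete, here is a minimal Python sketch of how a client might read a file like the one above. The function name `parse_llms_txt`, the regular expression, and the shape of the returned dictionary are assumptions made for illustration; they are not part of the spec or of any particular assistant's implementation.

```python
import re

# "- [Title](url)" with an optional ": description" tail.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)

SAMPLE = """\
# Acme Tools

> Acme builds open-source CLI utilities for inspecting Docker images.

## Docs
- [Getting Started](https://acme.dev/docs/start): install in 30 seconds, scan your first image
- [API Reference](https://acme.dev/docs/api): every command, every flag, every exit code

## Blog
- [Why we rewrote our scanner in Rust](https://acme.dev/blog/rust-rewrite): 11x faster, 90% less memory
- [The case for SBOMs in 2026](https://acme.dev/blog/sbom-2026)
"""


def parse_llms_txt(text):
    """Parse llms.txt markdown into a title, a summary, and sections of links."""
    doc = {"title": None, "summary": None, "sections": {}}
    current = None  # name of the H2 section we are inside, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc["sections"][current].append({
                    "title": match["title"],
                    "url": match["url"],
                    "description": match["desc"] or "",
                })
    return doc


doc = parse_llms_txt(SAMPLE)
print(doc["title"], list(doc["sections"]))
```

The link descriptions end up next to their URLs, which is the information a client would need when deciding which page to fetch next.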

What it isn't:

Why the format exists, and who's actually adopting it

The proposal came from Jeremy Howard (Answer.AI, fast.ai) in September 2024. The pitch was simple: LLM context windows are expensive, HTML is noisy, and there should be a way for a site owner to hand a clean markdown index to any AI client that wants it. He set up llmstxt.org with the spec and a directory.

The adoption curve looked like most open conventions: a few months of "is this a real thing?", followed by enough notable sites shipping it that the question became "why don't you have one yet?"

As of mid-2026, you'll find llms.txt live on Anthropic's docs, Vercel, Cursor, Stripe, Supabase, Vue.js, Astro, dbt Labs, and thousands of independent sites. The directory at llmstxt.org tracks public adopters. Cloudflare's Markdown for Agents and OpenAI's developer docs both reference the convention.

What the AI clients actually do with it varies. ChatGPT and Claude fetch llms.txt opportunistically when you give them a URL or ask about a site by name. Perplexity prefers llms-full.txt when available. Custom GPTs and Claude Projects use it as a seed index. Codegen tools (Cursor, Windsurf, Continue) pull it to pre-warm their context when you point them at a library.

The format: structure, sections, and the rules nobody documents

The official spec is short — under 50 lines of normative text — but a few conventions have hardened in practice that aren't on llmstxt.org. Here's what works:

Required structure

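  1. H1 with the site name. One line. Just the brand. This is the only element the llmstxt.org spec strictly requires; the rest of this list is optional on paper but expected in practice.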
  2. Blockquote with a one-sentence description. What the site is, in plain English. AI assistants quote this verbatim when summarizing.
  3. Optional explanatory paragraphs. Anything that helps an agent understand context — what you sell, who you serve, what's out of scope.
  4. H2 sections. One per topic area. Common headings: Docs, Guides, API, Blog, Examples, About.
  5. Markdown list of links under each H2. Format: - [Page title](https://full-url): short description. The description is what tips an agent toward fetching that URL.
  6. Optional H2 named "Optional". Pages that are nice-to-have but not core. Agents on a token budget can skip this section.
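The six-part layout above is regular enough to check mechanically. The following is a rough lint sketch in Python, assuming the file has already been read into a string; `lint_llms_txt` and its messages are hypothetical, and the checks mirror this list rather than any official validator.

```python
import re

# A list item must be "- [title](absolute http(s) URL)" with an optional ": description".
LINK_LINE = re.compile(r"^- \[[^\]]+\]\((https?://[^)\s]+)\)(: .+)?$")


def lint_llms_txt(text):
    """Return a list of human-readable problems; an empty list means the file passed."""
    problems = []
    lines = [ln.rstrip() for ln in text.splitlines()]
    content = [ln for ln in lines if ln.strip()]

    if not content or not content[0].startswith("# "):
        problems.append("first non-empty line should be an H1 with the site name")
    if not any(ln.startswith("> ") for ln in content):
        problems.append("no blockquote summary found")
    if not any(ln.startswith("## ") for ln in content):
        problems.append("no H2 sections found")

    in_section = False
    for number, ln in enumerate(lines, start=1):
        if ln.startswith("## "):
            in_section = True
        elif in_section and ln.startswith("- ") and not LINK_LINE.match(ln):
            problems.append(
                f"line {number}: list item is not '- [title](absolute-url): description'"
            )
    return problems
```

Running it in CI before each deploy is one way to catch a relative URL or a missing summary before an agent does.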

Hard rules

Soft conventions that emerged from real adopters

llms.txt vs llms-full.txt vs robots.txt

The three files do different jobs and AI clients fetch them at different times. The table makes the distinction concrete:

| File | What it contains | Who reads it | When |
| --- | --- | --- | --- |
| /robots.txt | Crawl rules (allow / disallow / sitemap pointer) | All crawlers and most AI bots | Before fetching anything else |
| /llms.txt | Curated index of URLs with descriptions | AI assistants and code agents | When deciding what to fetch from your site |
| /llms-full.txt | Full concatenated markdown of your site | AI clients that want everything in one shot | For one-fetch ingestion (often by code agents) |

You don't have to choose. The right move for most sites is to ship all three: robots.txt for crawl control, llms.txt as the curated index, llms-full.txt as the bulk download option. Crawlytics generates the latter two automatically, which is a different approach from Cloudflare's edge-based one.
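For a sense of what "full concatenated markdown" can mean for llms-full.txt, here is a small Python sketch. There is no single agreed layout for that file, so the structure below (one H2 per page followed by a `Source:` line) and the `build_llms_full` helper are assumptions for illustration only, not how Crawlytics or any other generator formats its output.

```python
def build_llms_full(site_name, summary, pages):
    """Concatenate page markdown into one document.

    `pages` is a list of (title, url, markdown_body) tuples; that shape is an
    assumption made for this example.
    """
    parts = [f"# {site_name}", "", f"> {summary}", ""]
    for title, url, body in pages:
        # One H2 per page, then where it came from, then the page content.
        parts += [f"## {title}", "", f"Source: {url}", "", body.strip(), ""]
    return "\n".join(parts).rstrip() + "\n"


pages = [
    ("Getting Started", "https://acme.dev/docs/start", "Install with one command.\n"),
    ("API Reference", "https://acme.dev/docs/api", "Every command and flag."),
]
print(build_llms_full("Acme Tools", "Acme builds CLI utilities.", pages))
```

If page bodies contain their own H1 or H2 headings, a real generator would likely need to demote them so the per-page boundaries stay unambiguous.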

Three ways to generate llms.txt (and when to pick each)

Path 1 — Hand-write it

For a small site — under 30 URLs — open a text editor, write the file by hand, drop it at the root. Total time: 15 minutes for a focused site, an hour for one with a lot of categories.

Pick this if you have a stable site that doesn't change weekly, or if you want full editorial control over which pages the AI sees first. Documentation sites with a clean structure (10-20 top pages plus an API reference) often do this and never touch the file again.

Downside: you have to remember to update it. A stale llms.txt sends agents to dead URLs and old content.

Path 2 — Let Crawlytics generate and host it

Crawlytics crawls your sitemap nightly, scores each URL on six signals (depth, recency, word count, sitemap priority, meta description, category), groups by section, and writes llms.txt and llms-full.txt to stable URLs that work on any host — Vercel, Netlify, WordPress, raw nginx, anything. You add a snippet, the file regenerates daily, you get a dashboard showing which AI bots are fetching it and from where.

Pick this if you have a fast-moving content site (blog, docs that ship often, ecommerce catalog), if you want analytics on bot fetches, or if you don't want to maintain the file by hand. The Visibility tier is $29.99/mo and includes the generator plus per-bot analytics.

Path 3 — Generate it at build time

If you run a static site (Astro, Next.js, Hugo, Eleventy), you can write a build-time script that walks your content collections, formats markdown, and writes /public/llms.txt before the build finishes. Vercel publishes a reference script. Astro and Next plugins exist.
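A build-time generator can be quite small. The Python sketch below is not Vercel's reference script or any existing plugin; it assumes a hypothetical layout where each page is a markdown file under a content directory, the first H1 is the page title, the first line of body text serves as the description, and the top-level folder name becomes the section heading.

```python
from pathlib import Path


def first_heading_and_paragraph(markdown):
    """Return (title, description) from the first H1 and the first body line after it."""
    title, description = None, ""
    for line in markdown.splitlines():
        line = line.strip()
        if title is None and line.startswith("# "):
            title = line[2:].strip()
        elif title is not None and line and not line.startswith("#"):
            description = line
            break
    return title, description


def render_llms_txt(site_name, summary, base_url, content_dir):
    """Walk content_dir for .md files and render an index grouped by top-level folder."""
    sections = {}
    root = Path(content_dir)
    for path in sorted(root.rglob("*.md")):
        title, description = first_heading_and_paragraph(path.read_text(encoding="utf-8"))
        if title is None:
            continue  # no H1, so there is nothing sensible to link
        rel = path.relative_to(root)
        section = rel.parts[0].capitalize() if len(rel.parts) > 1 else "Pages"
        url = f"{base_url.rstrip('/')}/{rel.with_suffix('').as_posix()}"
        entry = f"- [{title}]({url})" + (f": {description}" if description else "")
        sections.setdefault(section, []).append(entry)

    out = [f"# {site_name}", "", f"> {summary}", ""]
    for section, entries in sections.items():
        out += [f"## {section}", *entries, ""]
    return "\n".join(out).rstrip() + "\n"


def write_llms_txt(public_dir, text):
    """Write the rendered index to <public_dir>/llms.txt and return the path."""
    target = Path(public_dir) / "llms.txt"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

Calling `write_llms_txt("public", render_llms_txt(...))` from a pre-build hook would regenerate the file on every deploy. How file paths map to live URLs varies by framework, so the URL construction here is the part most likely to need adjusting.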

Pick this if you're already comfortable with custom build steps and you don't want a hosted dependency. Downside: no per-bot analytics, no fetch logging, no UTM injection for attribution. The file just exists.

Does llms.txt help your SEO?

Direct answer: not in the classic Google-ranking sense. Google has not confirmed that it uses llms.txt as a ranking signal, and AI Overviews still pull from the regular web index, not from llms.txt directly.

What it does help is AI search — the layer of ChatGPT, Claude, Perplexity, Gemini, and the dozens of vertical assistants that fetch sites directly when answering questions. In that channel:

The way to think about it: llms.txt isn't an SEO tactic, it's an AEO (Answer Engine Optimization) primitive. If you care about being cited in AI answers, ship it. If you only care about Google's blue-link rankings, it's neutral: you won't be penalized for having one, and you won't be rewarded either.

For the broader playbook on AI search, our AEO framework covers the four layers: technical accessibility, content structure, signal generation, and attribution recovery.

Pre-flight checklist before you ship

Before pushing llms.txt live, run through this list. The failure modes are all silent — your file will exist, agents will fetch it, and you won't know it's broken unless you check:

  1. File loads at https://yoursite.com/llms.txt. Fetch it with curl. Confirm 200 status. No redirects, no auth challenge.
  2. Content-Type is text/plain or text/markdown. Some hosts default to application/octet-stream for unknown extensions, which causes downloads instead of inline display. Set the MIME type explicitly.
  3. All URLs are absolute. Relative URLs break for any agent that doesn't know your origin.
  4. No 404s in the link list. Stale links cost agent fetches and degrade your citation quality.
  5. Total size under 100KB. Above that, agents start truncating from the bottom, so your less-important sections get cut first. Past 200KB, useful content gets dropped too.
  6. Descriptions are sentences, not keyword salad. The description teaches the model what the page is about. Write it like a librarian, not a meta description.
  7. robots.txt allows the file. If you have a blanket Disallow: /, add an explicit Allow: /llms.txt so AI bots can still reach it.
  8. You have a re-generation plan. Whether it's a cron, a build hook, or a hosted generator, the file needs to stay current. Stale beats nothing, but fresh beats stale.
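Several of these checks (status, Content-Type, size, and the robots.txt rule) can be scripted. The Python sketch below works on values you have already fetched, for example with curl -i or any HTTP client; the helper names and the byte limit constant are assumptions that simply mirror items 1, 2, 5, and 7 above.

```python
from urllib import robotparser

MAX_BYTES = 100 * 1024  # the 100KB ceiling from item 5
OK_TYPES = {"text/plain", "text/markdown"}


def check_response(status, content_type, body):
    """Check items 1, 2, and 5 against an already-fetched response."""
    problems = []
    if status != 200:
        problems.append(f"expected status 200, got {status}")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in OK_TYPES:
        problems.append(f"unexpected Content-Type: {media_type or 'missing'}")
    if len(body) > MAX_BYTES:
        problems.append(f"file is {len(body)} bytes, over the {MAX_BYTES}-byte budget")
    return problems


def robots_allows(robots_txt, url, user_agent="*"):
    """Check item 7: does this robots.txt let user_agent fetch url?"""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ROBOTS = """\
User-agent: *
Allow: /llms.txt
Disallow: /
"""

print(check_response(200, "text/plain; charset=utf-8", b"# Acme Tools\n"))
print(robots_allows(ROBOTS, "https://yoursite.com/llms.txt"))
```

One caveat on the robots check: Python's standard-library parser applies the first matching rule in file order, which is why the sample lists the Allow line before the blanket Disallow. Crawlers that follow longest-match precedence should accept either order, but putting the Allow line first is the safer layout.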

If you want a one-click check on all eight, the free Agent-Ready Grader runs through them in 10 seconds and gives you a score plus the broken items.

The bottom line

llms.txt is a small file with a long tail of impact. It costs you 15 minutes to hand-write or a one-line snippet to automate. The only real risk is letting it go stale. The upside is being readable to every AI client that asks, which, on the current trajectory, is most of them by the end of 2026.

Don't overthink the format. Ship it, point it at your best pages, and update it when you add new ones. The agents that matter are already looking for it.

Written by Crawlytics Team. Crawlytics tracks AI bots, generates llms.txt, and powers WebMCP commerce, all from one snippet on any stack.
