► Serve · llms.txt generator
Crawlytics fetches every URL in your sitemap, extracts the main content as clean markdown, scores and categorizes the pages, and exposes llms.txt, llms-full.txt, and per-page .md at stable URLs. Built for AI bots that don't run JavaScript.
Three URLs per site, ready to hand to an AI assistant.
Curated index
/api/sites/
Top-scored pages grouped by category. One-line descriptions, links to the live HTML.
Full bundle
/api/sites/
Top 50 pages' full markdown concatenated into a single document an AI can ingest in one fetch.
Per-page markdown
/api/sites/
Every crawled page addressable as clean markdown. Drop /api/sites/3/md/pricing into ChatGPT, get the page back as markdown.
Crawl status
/api/sites/
Discovered / processed / failed counts, latest job state, top-scored pages. Auth required.
Six steps from sitemap URL to public output.
Because Crawlytics generates your llms.txt and reads your bot logs from the same database, it can answer a question a standalone generator can't: which pages does the file declare that no AI bot has actually fetched? That list is the llms.txt Coverage Gap on your dashboard — declared but undiscovered.
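The gap described above is a set difference: declared pages minus pages an AI bot has fetched. Here is a minimal sketch of that idea, not Crawlytics' actual query. The function names, the `(user_agent, url)` log shape, and the path normalization are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# User-agent substrings treated as AI bots (an illustrative list, not exhaustive).
AI_BOT_MARKERS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def normalize(url: str) -> str:
    """Reduce a URL to its path so declared links and log entries compare equal."""
    path = urlsplit(url).path or "/"
    return path.rstrip("/") or "/"


def coverage_gap(declared_urls, bot_hits):
    """Return declared paths that no AI bot has fetched, sorted.

    bot_hits is an iterable of (user_agent, url) pairs taken from request logs.
    """
    fetched = {
        normalize(url)
        for user_agent, url in bot_hits
        if any(marker in user_agent for marker in AI_BOT_MARKERS)
    }
    declared = {normalize(url) for url in declared_urls}
    return sorted(declared - fetched)


declared = [
    "https://example.com/",
    "https://example.com/pricing",
    "https://example.com/docs/",
]
hits = [
    ("Mozilla/5.0 (compatible; GPTBot/1.0)", "/pricing"),
    ("Mozilla/5.0 (Windows NT 10.0)", "/docs"),  # human visitor, not counted
]
gap = coverage_gap(declared, hits)
print(gap)
```

In this example only /pricing was fetched by a bot, so the homepage and /docs remain in the gap.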
A generator that only writes the file never finds out whether bots used it. The closed loop is the point.
// the serve layer
llms.txt is an open standard (llmstxt.org): a plain-text markdown file at /llms.txt that tells AI systems what a website contains. The format is an H1 with your site name, a one-line blockquote summary, then H2 sections of grouped links with descriptions. AI bots fetch this instead of trying to parse JavaScript-heavy HTML. llms-full.txt is the companion file that concatenates the full markdown of your top-scored pages into one fetchable document.
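To make the format concrete, here is a small sketch that renders that layout from structured data. `render_llms_txt` and the sample site are hypothetical names for illustration; this is not the Crawlytics generator.

```python
def render_llms_txt(site_name, summary, sections):
    """Render the llms.txt layout: H1, blockquote summary, then H2 link sections.

    sections maps a category name to a list of (title, url, description) tuples.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for category, links in sections.items():
        lines.append(f"## {category}")
        lines.append("")
        for title, url, description in links:
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


example = render_llms_txt(
    "Example Co",
    "Invoicing software for freelancers.",
    {
        "Pricing": [
            ("Plans", "https://example.com/pricing", "Monthly and annual plans."),
        ],
    },
)
print(example)
```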
Three options: (1) hand-write it — fine for small sites, becomes a maintenance burden past ~20 pages; (2) generate it once and host the static file — drifts the moment you add a page; (3) auto-generate it from your sitemap so it stays current automatically. Crawlytics does option 3 — paste a snippet, point it at your sitemap, and we serve /llms.txt, /llms-full.txt, and per-page /md/
llms.txt is the curated index, and it is short. It lists your top pages grouped by category (Product, Pricing, Blog, etc.) with one-line descriptions. An AI bot fetches it to understand how your site is organized. llms-full.txt is the full bundle: the actual markdown of your top-scored pages concatenated into a single document an AI can ingest in one fetch.
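The bundle side can be sketched as a sort-and-concatenate step. The page dict shape, the source comment, and the `---` separator below are assumptions made for illustration; the real llms-full.txt layout may differ. The default limit of 50 follows the "Top 50 pages" figure given for the full bundle.

```python
def render_llms_full(pages, limit=50):
    """Concatenate the markdown of the highest-scored pages into one document.

    pages is a list of dicts with "url", "score", and "markdown" keys.
    """
    top = sorted(pages, key=lambda p: p["score"], reverse=True)[:limit]
    parts = [
        f"<!-- source: {p['url']} -->\n\n{p['markdown'].strip()}"
        for p in top
    ]
    # Hypothetical separator: a markdown horizontal rule between pages.
    return "\n\n---\n\n".join(parts) + "\n"


sample_pages = [
    {"url": "https://example.com/blog/post", "score": 12, "markdown": "# Post\n\nBody."},
    {"url": "https://example.com/pricing", "score": 41, "markdown": "# Pricing\n\nPlans."},
]
bundle = render_llms_full(sample_pages)
print(bundle)
```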
No. llms.txt is a static text file at /llms.txt that Google's crawlers ignore entirely (Googlebot reads HTML, not llms.txt), so it affects Google rankings neither positively nor negatively. Its purpose is to help AI bots (GPTBot, ClaudeBot, PerplexityBot) understand and cite your content, which is increasingly relevant for AI search traffic.
Each page gets a 0–50 score from six signals: sitemap priority, URL depth (shallower wins), category (homepage / about / pricing / product / tools / docs / blog), word count, recency (from sitemap lastmod), and whether it has a meta description. The top-scored pages flow to the top of llms.txt and into llms-full.txt.
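As a rough illustration of how six signals can add up to a 0 to 50 score, here is a sketch. Every weight, threshold, and name in it is invented for the example; the text above names the signals but not how Crawlytics weights them.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical category weights; the real values are not published here.
CATEGORY_POINTS = {
    "homepage": 10, "about": 8, "pricing": 8, "product": 8,
    "tools": 6, "docs": 6, "blog": 4,
}


@dataclass
class Page:
    url: str
    sitemap_priority: float      # 0.0 to 1.0, from the sitemap
    category: str
    word_count: int
    lastmod: Optional[date]      # from sitemap lastmod, if present
    has_meta_description: bool


def score_page(page: Page, today: date) -> int:
    """Combine six signals into a 0 to 50 score (assumed weights)."""
    depth = len([seg for seg in urlsplit(page.url).path.split("/") if seg])
    priority_pts = 10 * max(0.0, min(page.sitemap_priority, 1.0))
    depth_pts = max(0, 10 - 2 * depth)            # shallower wins
    category_pts = CATEGORY_POINTS.get(page.category, 0)
    words_pts = min(page.word_count / 100, 10)
    if page.lastmod is None:
        recency_pts = 0
    else:
        age_days = (today - page.lastmod).days
        recency_pts = 5 if age_days <= 30 else 3 if age_days <= 365 else 0
    meta_pts = 5 if page.has_meta_description else 0
    total = (priority_pts + depth_pts + category_pts
             + words_pts + recency_pts + meta_pts)
    return round(max(0, min(total, 50)))


today = date(2024, 6, 15)
home = Page("https://example.com/", 1.0, "homepage", 1200, date(2024, 6, 1), True)
old_post = Page("https://example.com/blog/2019/old-post", 0.3, "blog", 300, None, False)
print(score_page(home, today), score_page(old_post, today))
```

With these assumed weights, a fresh, shallow homepage reaches the maximum, while a deep, undated blog post with no meta description lands near the bottom.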
Each crawl invocation currently processes up to 100 pages (bounded by the serverless request budget). Larger sites are handled by re-clicking Start crawl: the worker skips pages already in the database and processes the next 100. A 480-page site finishes in 5 clicks. Auto-continue is on the roadmap.
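The skip-and-continue behavior can be sketched in a few lines. Here an in-memory set stands in for the database, and `next_batch` and `clicks_needed` are hypothetical helper names, not part of the product.

```python
import math

BATCH_SIZE = 100  # pages per crawl invocation, per the limit described above


def next_batch(sitemap_urls, processed, batch_size=BATCH_SIZE):
    """Return the next unprocessed URLs, preserving sitemap order."""
    pending = [url for url in sitemap_urls if url not in processed]
    return pending[:batch_size]


def clicks_needed(total_pages, batch_size=BATCH_SIZE):
    """Number of Start crawl clicks needed to cover every page."""
    return math.ceil(total_pages / batch_size)


urls = [f"https://example.com/page-{i}" for i in range(480)]
done = set()   # stands in for the pages already stored in the database
clicks = 0
while batch := next_batch(urls, done):
    done.update(batch)   # simulate one crawl invocation
    clicks += 1

print(clicks, clicks_needed(480))
```

The loop runs four full batches of 100 and a final batch of 80, which matches the 5 clicks quoted for a 480-page site.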
Crawlytics fetches the server-rendered HTML response. For SPAs that require JS to render, ensure server-side rendering is enabled or use a prerender service in front of your origin.
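One way to check whether a page depends on JavaScript is to count the words visible in the raw HTML response, ignoring scripts and styles. The sketch below uses Python's standard `html.parser`. The helper names are hypothetical, and this is a rough heuristic, not the extraction Crawlytics performs.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = ("script", "style", "noscript")


class _TextCounter(HTMLParser):
    """Counts words in text nodes outside script, style, and noscript tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words += len(data.split())


def visible_word_count(html: str) -> int:
    parser = _TextCounter()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.words


spa_shell = (
    '<html><body><div id="root"></div>'
    '<script>render("lots of words here")</script></body></html>'
)
server_rendered = "<html><body><h1>Pricing</h1><p>Three plans for teams.</p></body></html>"
print(visible_word_count(spa_shell), visible_word_count(server_rendered))
```

A count near zero for a page that looks full in the browser suggests the content only appears after JavaScript runs.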
This page is part of Crawlytics.app. View all pages: llms.txt · llms-full.txt