ai-catalog.json: Put Your Site on the AI Agent Discovery Map


Key takeaways

ai-catalog.json is a machine-readable file you host on your own domain that lists the tools, MCP servers, APIs, and agents your site exposes, so AI agents can discover your capabilities at runtime instead of being wired up by hand.
It comes from Agentic Resource Discovery (ARD), a v0.9 draft spec backed by Google, Microsoft, GitHub, Hugging Face, NVIDIA, Salesforce, Cisco, Databricks, ServiceNow, Snowflake, and GoDaddy, built on the Linux Foundation's AI Catalog data model and released under Apache 2.0.
Where llms.txt tells AI crawlers what your site is about, ai-catalog.json tells AI agents what your site can do. They sit at different layers and do not compete.
Unlike most day-one specs, ARD launched with real consumers: GitHub's agent finder for Copilot, Hugging Face's Discover Tool, and Cisco's AGNTCY Agent Directory were all reading ARD catalogs at launch. Google's Agent Registry support is announced but not yet live.
If you do not expose tools or APIs to agents, there is nothing to catalog yet, so watch the spec. If you run an MCP server or public API, it is worth understanding now while the format is young.

A new file is taking shape in the layer between your site and the AI systems that act on it. First came llms.txt, which tells AI crawlers what your site contains. Then Google's Open Knowledge Format proposed a way to package structured knowledge for AI consumers. Now a consortium of large technology companies has introduced ai-catalog.json, a file that tells AI agents not what your site says, but what it can do.

If you have already published an llms.txt file or thought about letting agents transact on your site, this is the next file worth understanding. It targets a different job, and it arrived with more real adoption than these specs usually get on day one.

What ai-catalog.json actually is

According to Search Engine Journal's coverage, an organization publishes an ai-catalog.json file at a well-known path on its own domain. The file lists the tools, MCP servers, agents, or APIs it makes available. An AI agent can then read that catalog at runtime and learn what it is allowed to call, without a developer manually integrating each capability ahead of time.
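To make the idea concrete, here is a minimal Python sketch of what producing such a file could look like on the publisher side. The field names (`publisher`, `entries`, `type`, `name`, `url`), the type labels, and the example.com endpoints are illustrative assumptions, not the ARD schema; the v0.9 draft defines the real structure and the well-known path.

```python
import json

# Hypothetical capability types, mirroring the four kinds the article names.
ALLOWED_TYPES = {"tool", "mcp-server", "agent", "api"}


def build_catalog(publisher: str, entries: list[dict]) -> str:
    """Serialize a hypothetical capability catalog to JSON.

    The field names are placeholders for illustration, NOT the ARD schema.
    """
    for entry in entries:
        if entry["type"] not in ALLOWED_TYPES:
            raise ValueError(f"unknown capability type: {entry['type']}")
    return json.dumps({"publisher": publisher, "entries": entries}, indent=2)


catalog_json = build_catalog(
    "example.com",
    [
        {"type": "mcp-server", "name": "orders", "url": "https://example.com/mcp"},
        {"type": "api", "name": "inventory", "url": "https://example.com/api/v1"},
    ],
)
print(catalog_json)
```

The point of the sketch is the shape of the job: one document, hosted by the publisher, that enumerates callable capabilities by kind and location. A real catalog should be written against the draft itself.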

This is the part that makes it new. Today, an agent can only use your MCP endpoint or API if someone configured it to know that endpoint exists. ARD removes that manual step: the agent discovers your capabilities the moment it needs them. It is the same dynamic that made the open web work for search crawlers, applied to actions instead of pages. The spec is a v0.9 draft, licensed under Apache 2.0, built on the AI Catalog data model maintained by a working group under the Linux Foundation.
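On the agent side, runtime discovery boils down to reading the catalog and picking out the capabilities relevant to the task at hand. The sketch below assumes a hypothetical catalog layout (`entries`, `type`, `name`, `url`) and uses an in-memory string in place of the HTTPS fetch; neither the layout nor the helper name `find_capabilities` comes from the ARD draft.

```python
import json

# Stand-in for the body an agent would fetch from the publisher's domain.
# The structure is a hypothetical placeholder, not the ARD schema.
SAMPLE_CATALOG = """
{
  "publisher": "example.com",
  "entries": [
    {"type": "mcp-server", "name": "orders", "url": "https://example.com/mcp"},
    {"type": "api", "name": "inventory", "url": "https://example.com/api/v1"}
  ]
}
"""


def find_capabilities(catalog_text: str, wanted_type: str) -> list[dict]:
    """Return the catalog entries of one type, skipping malformed entries."""
    catalog = json.loads(catalog_text)
    return [
        entry
        for entry in catalog.get("entries", [])
        if isinstance(entry, dict) and entry.get("type") == wanted_type
    ]


for server in find_capabilities(SAMPLE_CATALOG, "mcp-server"):
    print(server["name"], server["url"])
```

Nothing in the agent is hard-coded to example.com. The endpoint is learned from the document at the moment it is needed, which is the step that previously required manual configuration.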

How it differs from llms.txt

The cleanest way to hold the two files in your head: llms.txt is about content, ai-catalog.json is about capability.

llms.txt sits at your domain root and acts as a navigation index. It points AI systems to the pages that matter and, with llms-full.txt, can surface extended content. Its job ends once the agent knows what you have published. We cover that role in depth in the llms.txt setup guide. John Mueller's point, covered in Mueller on llms.txt, is also worth keeping in mind: the file is for on-site navigation rather than discovery, so these files help an agent that has already arrived.

ai-catalog.json picks up where content ends. It does not describe what you have written; it describes what an agent can invoke. If llms.txt is the table of contents, ai-catalog.json is the control panel. That connects it directly to the WebMCP world, which is where the tools an agent actually calls live. ARD is the discovery layer that points agents to those tools in the first place.

A trust model you already understand

The reason this should feel familiar is the trust model. Because each catalog sits on the publisher's own domain, ARD uses domain ownership to verify who published it, exactly the pattern behind robots.txt and llms.txt. You control your domain, so you control your catalog. For production use, publishers can attach trust metadata so an agent or registry can confirm the publisher's cryptographic identity, which matters once agents are spending money or provisioning resources based on what the catalog claims.
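The domain-ownership idea can be sketched as a simple check: a catalog only counts as the publisher's if it was served over HTTPS from the publisher's own host. This is an illustration of the principle, not ARD's verification procedure. The function name, the exact-host-match policy, and the example URL path are assumptions, and the cryptographic trust metadata mentioned above is out of scope here.

```python
from urllib.parse import urlsplit


def catalog_matches_publisher(catalog_url: str, claimed_domain: str) -> bool:
    """Check that a catalog URL is HTTPS and hosted on the claimed domain.

    Assumption: an exact host match is required. Whether subdomains should
    also count is a policy choice this sketch does not try to answer.
    """
    parts = urlsplit(catalog_url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    # urlsplit lowercases the hostname; normalize the claim the same way.
    return parts.hostname == claimed_domain.lower().rstrip(".")


# The path below is a placeholder, not the well-known path from the draft.
print(catalog_matches_publisher("https://example.com/placeholder/ai-catalog.json", "example.com"))
print(catalog_matches_publisher("https://example.com.evil.test/ai-catalog.json", "example.com"))
```

Comparing parsed hostnames rather than searching the URL string matters: a lookalike host such as example.com.evil.test contains the claimed domain as a substring but is controlled by someone else.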

That reuse of a known pattern is a deliberate choice. Site owners already grasp domain-rooted files. ARD is betting that a capability catalog will spread faster if it asks for no new mental model, just a new file in a place you already trust.

Who it's for now, and who should just watch

Be honest with yourself about which group you are in.

Act now if you expose callable capabilities. If you run an MCP server, a public API, or registered agent tools, ARD is the standardized way to make them discoverable. The earlier you understand the format, the less likely you are to be retrofitting under pressure when a registry your buyers use starts expecting a catalog. Early-mover advantage was real for llms.txt adopters, and the same window is open here.

Watch if your site is content only. If you do not offer anything an agent can invoke, ai-catalog.json has nothing to list. Publishing an empty or speculative catalog buys you nothing. The right move is to keep your content legible to agents, ship and maintain llms.txt, and revisit ARD when you actually have a tool or API to expose. The agentic web is arriving in layers, and not every layer applies to every site on day one.

Spec-truth versus market-truth

One reliable mistake with anything in this space is treating a published spec as proof of deployed capability. A draft can sit in a repository for a year before anything consumes it. The useful questions are always: which systems read this format today, where is the tooling, and how much adoption exists outside the announcing organizations.

ARD scores better here than most v0.9 drafts. It did not launch as a paper standard. GitHub's agent finder for Copilot, Hugging Face's Discover Tool, and Cisco's AGNTCY Agent Directory were reading ARD catalogs at launch, which means a publisher's catalog had real consumers on the first day. Google's Agent Registry, part of Gemini Enterprise, has announced native support, but that one is not live yet, so do not plan around it as if it were. The accurate read is "early but real," not "shipped everywhere" and not "vaporware." Holding that distinction is how you avoid both over-investing in a draft and dismissing something that already has traction.

What to do today, and what to defer

The practical split mirrors the advice we gave for Google's Open Knowledge Format (OKF), because the underlying logic is the same.

Ship and maintain llms.txt now. It is the file with a mature spec, broad adoption, and crawlers actively fetching it. It is also the cheapest agent-readiness move you can make. If you have not published one yet, the setup guide covers it in fifteen minutes.

Understand ARD now; build against it when your offering is stable. Read the draft, watch which registries gain traction, and if you already run an MCP server or API, map out what your catalog would list. The format will change as it moves past v0.9, so writing a production catalog today means accepting some churn.

Underneath both files is the same question every site owner should be able to answer: are AI agents actually reaching me, and what are they doing when they arrive? A discovery file only helps if agents are looking. Crawlytics surfaces that real traffic in your bot dashboard, breaking down which AI crawlers and agents hit which pages and how often. If you want a quick read before going deeper, the free Agent-Ready Grader scans your site across five categories in about a minute, no account needed. Build for the agents you can confirm are showing up, and let the file formats follow the traffic.
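For a rough first answer without any tooling, your own access logs already hold the data. The sketch below counts hits per AI user-agent token and path. It assumes the common "combined" log format and a short, non-exhaustive token list; the sample lines are made up for illustration, and user-agent strings can be spoofed, so treat the counts as a signal rather than proof.

```python
from collections import Counter

# A few publicly documented AI crawler tokens; not an exhaustive list.
AI_AGENT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def count_ai_hits(log_lines):
    """Count requests per (agent token, path) in combined-format log lines."""
    hits = Counter()
    for line in log_lines:
        # In the combined format the request is the 1st quoted field
        # and the user agent is the 3rd, i.e. split indexes 1 and 5.
        fields = line.split('"')
        if len(fields) < 6:
            continue
        request_parts = fields[1].split()
        if len(request_parts) < 2:
            continue
        path, user_agent = request_parts[1], fields[5]
        for token in AI_AGENT_TOKENS:
            if token in user_agent:
                hits[(token, path)] += 1
                break
    return hits


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:02 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET /llms.txt HTTP/1.1" 200 128 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '192.0.2.9 - - [01/Jan/2025:00:00:04 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

for (token, path), count in sorted(count_ai_hits(SAMPLE_LOG).items()):
    print(token, path, count)
```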


Written by Crawlytics Team. Crawlytics tracks AI bots, generates llms.txt, and powers WebMCP commerce, all from one snippet on any stack.