Google's Open Knowledge Format: The Next llms.txt?

Summary

Google's OKF is a markdown+YAML spec for packaging AI-readable knowledge. This piece covers how it relates to llms.txt, AGENTS.md, and the context-layer stack forming around AI agents.


A quiet pattern is forming in how AI systems receive context about the web. First came llms.txt, the spec proposed by Jeremy Howard that tells AI crawlers what a site contains. Then developer teams started dropping AGENTS.md and CLAUDE.md into their repos to give coding assistants project-specific orientation. Now Google Cloud has published the Open Knowledge Format, a spec for packaging structured knowledge so AI producers and consumers can exchange it without writing custom integration code on both sides.

None of these are the same thing. But they are clearly circling the same problem: the web's existing surface — HTML pages, sitemaps, robots.txt — was never designed to tell an AI system what something means, just where it is. A context layer is forming to fill that gap. OKF is the newest piece of that layer, and it's worth understanding on its own terms before deciding what to do about it.

What OKF actually is

The Google Cloud announcement describes OKF as an open format for representing structured knowledge that AI systems can produce and consume. The format is markdown with YAML frontmatter — the same combination most static-site generators, documentation systems, and llms.txt files already use. The YAML carries typed metadata (entity types, relationships, provenance); the markdown carries the human-readable content.

The producer/consumer framing is the key idea. OKF imagines a world where a database, a knowledge graph, or an enterprise data system can emit a document and any compatible AI agent or retrieval system can ingest it without needing bespoke connectors. Think of it as a lingua franca for knowledge handoff: the producer doesn't need to know which agent will consume the document, and the consumer doesn't need to reverse-engineer the producer's schema.
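To make the handoff concrete, here is a minimal sketch of a producer emitting a markdown document with a frontmatter header and a consumer reading it back without knowing anything about the producer. The field names (`type`, `id`, `source`) and the helper names are hypothetical and are not taken from the OKF v0.1 spec. The parser handles only flat `key: value` pairs, so anything nested would need a real YAML library.

```python
# Sketch of a producer/consumer handoff using markdown with a frontmatter header.
# The metadata keys used here are hypothetical, not fields defined by OKF v0.1.

def emit_document(metadata: dict[str, str], body: str) -> str:
    """Producer side: write flat metadata as frontmatter above a markdown body."""
    header = "\n".join(f"{key}: {value}" for key, value in metadata.items())
    return f"---\n{header}\n---\n{body.strip()}\n"


def read_document(text: str) -> tuple[dict[str, str], str]:
    """Consumer side: split frontmatter from body with no producer-specific logic."""
    if not text.startswith("---\n"):
        return {}, text  # no frontmatter: treat everything as body
    header, separator, body = text[4:].partition("\n---\n")
    if not separator:
        return {}, text  # unterminated frontmatter: fall back to plain body
    metadata: dict[str, str] = {}
    for line in header.splitlines():
        key, colon, value = line.partition(":")
        if colon:
            metadata[key.strip()] = value.strip()
    return metadata, body.strip()


doc = emit_document(
    {"type": "product", "id": "sku-123", "source": "inventory-db"},
    "# Example Widget\n\nA widget used here purely as sample content.",
)
metadata, body = read_document(doc)
```

The point of the sketch is the decoupling: `read_document` never imports anything from the producer, and it degrades to "plain markdown" when the header is missing.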

As of mid-2026, OKF is at version 0.1. The spec is public (Search Engine Journal's announcement piece has the overview) and Google Cloud is the primary backer. Tooling is sparse; agent-side consumption is not yet widespread.

The context-layer stack that's forming

OKF doesn't exist in isolation. Put it next to the other context-layer files that have appeared over the last two years (llms.txt for site-level navigation, AGENTS.md and CLAUDE.md for repo-level orientation) and a pattern becomes visible.

These files serve different scopes and different audiences, but the underlying logic is identical: don't make the AI reconstruct meaning from raw, unstructured content. Pre-package the context it needs.

That convergence is not coincidence. AI agents are bad at inferring meaning from sources designed for humans. A pricing page built for a person to scan contains everything an agent needs, but wrapped in layout markup, navigation chrome, JavaScript-rendered conditionals, and implicit assumptions that a human fills in from context. Every one of these context-layer formats is a bet that providing structure upstream is cheaper than scraping meaning downstream.

How OKF differs from llms.txt

The practical differences matter when you're deciding what to build against.

llms.txt is a navigation index. It sits at /llms.txt on your domain, lists the pages that matter (with optional brief descriptions), and points to an extended llms-full.txt for sites that want to surface more content. Agents fetch it once to understand the site's structure, then use that map to decide which pages to retrieve. It's closer to a table of contents than a data format.
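As a rough illustration of the "table of contents" role, the sketch below reads an llms.txt file and groups its links by section heading, which is roughly the map an agent would build before deciding what to retrieve. The sample file, the example.com URLs, and the function name are invented for illustration. The layout it assumes (an H1 title, a blockquote summary, then H2 sections containing markdown link lists) follows the public llms.txt proposal, but real files vary and a production parser should be more forgiving.

```python
import re

# Matches markdown list links such as "- [Title](https://example.com/page): note"
LINK_PATTERN = re.compile(r"^\s*[-*]\s*\[([^\]]+)\]\(([^)\s]+)\)(?::\s*(.*))?$")


def parse_llms_txt(text: str) -> dict[str, list[dict[str, str]]]:
    """Group the links in an llms.txt file under their H2 section headings."""
    sections: dict[str, list[dict[str, str]]] = {}
    current = "(top)"  # bucket for links that appear before any H2 heading
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections.setdefault(current, [])
            continue
        match = LINK_PATTERN.match(line)
        if match:
            title, url, note = match.groups()
            sections.setdefault(current, []).append(
                {"title": title, "url": url, "note": note or ""}
            )
    return sections


# Invented sample content, not a real site's file.
SAMPLE = """# Example Site

> A made-up site used only to illustrate the file layout.

## Docs

- [Quick start](https://example.com/docs/start.md): Install and first run
- [API reference](https://example.com/docs/api.md)

## Optional

- [Changelog](https://example.com/changelog.md): Release history
"""

site_map = parse_llms_txt(SAMPLE)
```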

OKF is a packaging format for individual knowledge documents. It's not a site-level index; it describes a single piece of structured knowledge (an entity, a concept, a dataset entry) with typed metadata in the YAML header. The relationship between them is more complementary than competitive: llms.txt could theoretically point an agent to the right OKF document, and OKF could carry the structured payload that agent then processes.

The other significant difference is maturity. llms.txt is still formally a proposal, but it has a published spec, an established community (the llmstxt.org site), and documented fetching behavior from real crawlers including Anthropic's. You can verify your file is being fetched today by checking your server logs. OKF is a v0.1 proposal from a single vendor. There's no equivalent adoption evidence yet, and the tooling to produce or consume OKF documents at scale does not yet exist.

Spec-truth vs market-truth

One reliable mistake in this space is treating a published specification as evidence of deployed capability. The web platform is full of specifications that took years to move from proposal to browser default, and AI agent standards are no different. When evaluating a new format like OKF, the useful questions are: which agents are consuming it today, where is the tooling, and what does adoption look like outside the announcing organization?

For OKF right now, those answers are thin. The spec exists. Google Cloud is the primary backer. But there's no equivalent of the llms.txt fetching evidence, no catalog of AI agents that treat OKF as a first-class input, and no community tooling comparable to what's grown up around the Howard spec. That can change quickly, and it likely will. But "v0.1 with enterprise-first focus" is a different reality than "actively fetched by ChatGPT-User, Claude, and Perplexity."

This doesn't mean OKF isn't worth understanding. Early-mover advantage is real in this space, and understanding the spec before you need to act on it is the right posture. The distinction is between watching (which costs little) and building (which costs time, migration risk, and opportunity cost).

What to do today, and what to defer

The practical split is straightforward.

Ship llms.txt now. The what-is-llms-txt-guide has the full setup walkthrough, but the short version is: a plain-text markdown file at your domain root, listing your most important pages, with a pointer to llms-full.txt if you want to surface extended content. It takes 15 minutes to do manually on most platforms, or you can generate and maintain it automatically. Multiple crawlers are actively fetching it. The payoff-to-investment ratio is high and the downside is as close to zero as any web infrastructure decision gets.
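If you would rather generate the file than hand-write it, a sketch along these lines is enough to start from. The function name, the page list, and the example.com URLs are placeholders, and putting the llms-full.txt pointer in its own "Full content" section is an assumption made for this example rather than a requirement of the proposal.

```python
def build_llms_txt(site_name, summary, pages, full_url=None):
    """Render an llms.txt body from (title, url, description) tuples."""
    lines = [f"# {site_name}", "", f"> {summary}", "", "## Pages", ""]
    for title, url, description in pages:
        entry = f"- [{title}]({url})"
        if description:
            entry += f": {description}"  # the description is optional
        lines.append(entry)
    if full_url:
        # Assumed placement for the pointer to the extended file.
        lines += ["", "## Full content", "", f"- [Full content]({full_url})"]
    return "\n".join(lines) + "\n"


# Placeholder pages for illustration only.
llms_txt = build_llms_txt(
    "Example Site",
    "A made-up site used to show the output shape.",
    [
        ("Pricing", "https://example.com/pricing", "Plans and limits"),
        ("Docs", "https://example.com/docs", ""),
    ],
    full_url="https://example.com/llms-full.txt",
)
```

Writing `llms_txt` to the file served at `/llms.txt` is the only remaining step, and regenerating it on each deploy keeps it in sync with the page list.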

After you've shipped the file, the more valuable step is finding out whether it's actually being fetched. Most sites that publish llms.txt never check whether any AI system reads it. Your server logs will tell you. Crawlytics surfaces this in your bot-traffic dashboard, breaking down which AI crawlers hit which files and how often.
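For a do-it-yourself check, something like the following sketch counts AI crawler requests for the two files in an access log. It assumes the common "combined" log format used by default in many Apache and nginx setups. The user-agent substrings are an illustrative and incomplete list, and the sample log lines are invented, so adjust both to match what your own logs contain.

```python
import re
from collections import Counter

# Substrings to look for in the user-agent field; illustrative and incomplete.
AI_AGENTS = ("GPTBot", "ChatGPT-User", "ClaudeBot", "PerplexityBot")

# Captures the request path and the user agent from a combined-format log line.
LOG_PATTERN = re.compile(r'"(?:GET|HEAD) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')


def count_ai_fetches(lines, paths=("/llms.txt", "/llms-full.txt")):
    """Count requests per (agent, path) for the given paths."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        path, user_agent = match.groups()
        path = path.split("?", 1)[0]  # ignore any query string
        if path not in paths:
            continue
        for agent in AI_AGENTS:
            if agent in user_agent:
                counts[(agent, path)] += 1
                break
    return counts


# Invented sample lines; real user-agent strings are longer than these.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jun/2026:10:00:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "ClaudeBot/1.0"',
    '203.0.113.9 - - [01/Jun/2026:10:05:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "GPTBot/1.0"',
    '198.51.100.7 - - [01/Jun/2026:10:06:00 +0000] "GET /pricing HTTP/1.1" 200 2048 "-" "GPTBot/1.0"',
    '198.51.100.8 - - [01/Jun/2026:10:07:00 +0000] "GET /llms.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

fetches = count_ai_fetches(SAMPLE_LOG)
```

An empty result after a few weeks is itself useful information: it suggests the file is published but not yet being read by the agents on your list.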

Watch OKF, but don't build against v0.1 yet. Subscribe to the spec updates, read the Google Cloud docs, understand the producer/consumer model. When agents start announcing OKF as a consumption target, that's the signal to move. Building structured knowledge packaging pipelines before that adoption evidence exists means absorbing the cost of an early draft spec with no clear payoff date.

The broader pattern is worth internalizing regardless of where OKF lands: a context layer is forming between the public web and the AI systems that operate on it. llms.txt is the clearest example of that layer today. OKF may become another piece of it. The sites that treat context-layer files as infrastructure rather than optional experiments will have a structural advantage as agent traffic continues to grow. Whether that layer grows through llms.txt alone, or eventually through a stack that includes OKF and formats not yet announced, the underlying work is the same: help AI systems understand what you have and what matters.

For a wider look at how AI agents navigate and use the content they find, what-is-the-agentic-web covers the full picture, and blended-retrieval explains how systems like Gemini are already fusing public and private context in a single retrieval pass.

Written by Crawlytics Team. Crawlytics tracks AI bots, generates llms.txt, and powers WebMCP commerce, all from one snippet on any stack.


