Most schema is noise to LLMs. Four types still earn their keep: Article, FAQPage, Organization, BreadcrumbList. What to ship, what to skip, and why.
The most honest sentence I can write about schema markup in 2026 is this: most of what you've been told to ship doesn't matter for AI search. ChatGPT does not parse your HowTo schema. Claude does not care about your Speakable markup. The detailed Product blocks SEO consultants spent the late 2010s telling you to add are doing nothing for your citation rate in AI answers.
That doesn't mean schema is dead. A short list still moves the needle — for both Google rich results and the small piece of AI retrieval that does pick it up. This guide separates the schema worth shipping from the schema you can quietly remove, and explains why the line falls where it does.
Here's what's actually happening under the hood. When ChatGPT, Claude, or Perplexity fetches your page, the retrieval pipeline does one of two things: it renders the page and extracts the visible text, or it fetches a clean markdown version (via llms.txt, content negotiation, or a built-in HTML-to-markdown converter). In neither path does the standard JSON-LD blob get parsed as Schema.org structured data.
The retrieval scorers look at headings, paragraphs, lists, tables, and entity mentions. They don't query @type: HowTo to decide whether to cite you. That's a Google rich-result behavior, not an LLM behavior. When people insist schema is "critical for AI search," they're usually conflating Google AI Overviews (which still leans on the classic Google index) with the standalone AI assistants (which mostly don't).
There's one nuance. Some schema content — names, descriptions, dates — does end up parsed because it's also present in visible page text or in meta tags the LLM reads. The signal survives; the schema container does not. That's why the four schema types below still matter: not because LLMs parse the JSON-LD directly, but because the data they contain ends up where LLMs can read it, and because Google's rich-result coverage compounds the benefit.
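To make the "signal survives, container does not" point concrete, here is a minimal sketch of a naive visible-text extractor. It is an illustration only: the actual pipelines used by ChatGPT, Claude, or Perplexity are not public, and the `VisibleText` class, `visible_text` helper, and sample page are all hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the date and author appear both in JSON-LD and in visible text.
PAGE = """<html><head>
<script type="application/ld+json">{"@type": "Article", "dateModified": "2026-06-05"}</script>
</head><body><h1>How to Get Cited by ChatGPT</h1>
<p>Updated 2026-06-05 by Jane Smith.</p></body></html>"""

text = visible_text(PAGE)
print(text)
```

The extracted text keeps the date and author because they are also in the rendered page, while the `@type` wrapper is dropped along with the script element.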
Article schema is the highest-leverage type to ship on every blog post and editorial page. Google uses it for the "Top Stories" carousel, article cards in AI Overviews, and date-stamping in search results. The single most important property is dateModified — it's the signal that tells both Google and downstream LLM retrieval that the content is fresh.
A minimum-viable Article block:
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to Get Cited by ChatGPT",
  "author": { "@type": "Person", "name": "Jane Smith" },
  "datePublished": "2026-06-05",
  "dateModified": "2026-06-05",
  "publisher": {
    "@type": "Organization",
    "name": "Crawlytics",
    "logo": { "@type": "ImageObject", "url": "https://crawlytics.app/logo.png" }
  }
}
</script>
Update dateModified every time you retrofit the post. Google sometimes shows this date in SERPs, and fresher dates correlate with higher click-through. ChatGPT and Perplexity also pick up date freshness from the visible page header — but the schema acts as a fallback when the visible date isn't crawlable.
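If the Article block is generated by a template, bumping the date can be a one-line helper. The sketch below assumes the schema lives in a Python dict before it is serialized; `touch_article` and the 2026-09-01 retrofit date are hypothetical names and values chosen for illustration.

```python
import json
from datetime import date


def touch_article(article: dict, modified: date) -> dict:
    """Return a copy of the Article dict with dateModified set to an ISO 8601 date."""
    published = date.fromisoformat(article["datePublished"])
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    return {**article, "dateModified": modified.isoformat()}


article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How to Get Cited by ChatGPT",
    "datePublished": "2026-06-05",
    "dateModified": "2026-06-05",
}

# Hypothetical retrofit date, for illustration only.
updated = touch_article(article, date(2026, 9, 1))
print(json.dumps(updated, indent=2))
```

Returning a copy rather than mutating in place keeps the original record available if the CMS needs to diff the two.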
FAQPage schema is the second-highest-leverage type. Google still expands FAQ snippets in some verticals (since August 2023, FAQ rich results have been mostly limited to well-known government and health sites), and the Q/A structure happens to mirror exactly what AI retrieval scorers love: discrete, chunkable answers to discrete questions. Even though the LLM doesn't read the JSON-LD, the visible Q/A section it duplicates is the single most-cited part of most posts.
The trick is to mirror your visible FAQ section exactly. Don't ship FAQPage schema with questions that aren't visible on the page: Google's structured data guidelines require FAQ content to be visible to users, and markup for hidden content may be flagged as deceptive. The schema's job is to reinforce what's already in the rendered DOM.
If your post has a "Common questions" H2 with 3-5 H3 questions and paragraph answers, ship the matching FAQPage block. That's all the lift this type can give you.
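One way to keep the markup in lockstep with the page is to build it from the same question/answer pairs the template renders. A minimal sketch, assuming those pairs are available as a list; `build_faq_page` and the sample questions are hypothetical.

```python
import json


def build_faq_page(pairs: list[tuple[str, str]]) -> dict:
    """Build a FAQPage object from (question, answer) pairs shown on the page."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Placeholder Q/A pairs; in practice these come from the visible FAQ section.
visible_faq = [
    ("Placeholder question one?", "Placeholder answer one."),
    ("Placeholder question two?", "Placeholder answer two."),
]

faq = build_faq_page(visible_faq)
print(json.dumps(faq, indent=2))
```

Because the schema is derived from the rendered content, it cannot drift into listing questions the page does not show.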
Organization schema is the entity layer. It tells search engines and any LLM retrieval pipeline that scrapes it (Bing's certainly does, Google's does, OpenAI's index appears to ingest it inconsistently) who you are, what you're called, and which other web properties you own.
The single highest-value property in Organization is sameAs, the array of canonical URLs for your brand on other platforms — your LinkedIn, X, GitHub, Crunchbase, Wikipedia (if applicable), and any other authoritative profile. This is the entity-disambiguation signal that helps an AI engine resolve "Crawlytics" to a specific company rather than a generic term.
Ship Organization schema once, at the site root (typically in your homepage layout or a shared BaseLayout). Get sameAs right and you get cumulative benefit across every page that inherits the schema.
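A shared layout could emit the Organization block with a helper along these lines. This is a sketch under assumptions: `organization_jsonld` is a hypothetical function, and the profile URLs are placeholders, not real Crawlytics profiles.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Render an Organization block as a JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # Drop duplicate profile URLs while preserving their order.
        "sameAs": list(dict.fromkeys(same_as)),
    }
    # Escape "</" so a stray "</script>" in a value cannot close the tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder profile URLs, for illustration only.
tag = organization_jsonld(
    "Crawlytics",
    "https://crawlytics.app",
    [
        "https://www.linkedin.com/company/your-brand",
        "https://github.com/your-brand",
        "https://www.linkedin.com/company/your-brand",
    ],
)
print(tag)
```

Keeping the profile list in one place means every page that inherits the layout picks up a corrected `sameAs` array at once.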
BreadcrumbList is the least sexy of the four but it punches above its weight. It tells search engines (and any LLM that parses it) how a page fits in your site's hierarchy — useful for context, useful for showing breadcrumb trails in Google SERPs, useful for the rare AI engine that uses page position as a relevance signal.
It also costs almost nothing to ship. If your site has any nested structure (blog/post-slug, features/feature-slug, resources/resource-slug), generate BreadcrumbList per page from the URL path. Ten lines of template code. One-time setup, perpetual benefit.
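Generating the trail from the URL path might look like the sketch below. It assumes each path segment maps to a real page and that a title-cased slug is an acceptable display name; `breadcrumb_list` and the example URL are hypothetical.

```python
import json
from urllib.parse import urlsplit


def breadcrumb_list(page_url: str) -> dict:
    """Derive a BreadcrumbList from the path segments of a page URL."""
    parts = urlsplit(page_url)
    origin = f"{parts.scheme}://{parts.netloc}"
    segments = [s for s in parts.path.split("/") if s]

    items = [{"@type": "ListItem", "position": 1, "name": "Home", "item": origin + "/"}]
    path = ""
    for position, segment in enumerate(segments, start=2):
        path += "/" + segment
        items.append({
            "@type": "ListItem",
            "position": position,
            # Assumption: slugs are hyphen-separated words.
            "name": segment.replace("-", " ").title(),
            "item": origin + path,
        })

    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": items,
    }


# Hypothetical URL, for illustration only.
crumbs = breadcrumb_list("https://crawlytics.app/blog/schema-for-ai-search")
print(json.dumps(crumbs, indent=2))
```

If your CMS stores real page titles, substituting them for the title-cased slug gives cleaner names in the breadcrumb trail.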
A short tour of the schema types you can safely skip if AI search is your priority. None of these will hurt you; most are just a waste of template-author time.
The pattern across all of these: the schema type was designed for a specific Google rich-result feature. If you care about that feature, ship the schema. If you only care about AI citations, the schema is a no-op.
Here's the substitution. The signal that classic schema was supposed to provide — "this page is structured, here's what it contains" — is now provided in AI search by clean markdown delivery. llms.txt tells the AI client what your site contains and where to find it. llms-full.txt bundles the actual content. Content-negotiated markdown rendering (or a .md companion route) lets the AI fetch a clean version of any specific page in one request instead of scraping HTML.
These are not schema in the Schema.org sense. They're a parallel content-delivery layer that gives AI clients the same context that schema was supposed to give Google. The llms.txt setup guide covers the mechanics. The decision rule is the same as for the surviving schema types: ship it if you care about AI citations, skip it if you don't.
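For orientation, an llms.txt file is plain markdown: an H1 title, a blockquote summary, and H2 sections containing link lists. The sketch below generates a minimal one; `build_llms_txt` is a hypothetical helper and the linked URL is a placeholder.

```python
def build_llms_txt(title: str, summary: str, sections: dict) -> str:
    """Render a minimal llms.txt: H1 title, blockquote summary, H2 link lists."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


llms_txt = build_llms_txt(
    "Crawlytics",
    "Tracks AI bots, generates llms.txt, and powers WebMCP commerce.",
    {
        "Blog": [
            # Placeholder URL for a hypothetical .md companion route.
            ("Schema for AI search",
             "https://crawlytics.app/blog/schema-for-ai-search.md",
             "Which schema types to ship and which to skip"),
        ],
    },
)
print(llms_txt)
```

Pointing the links at markdown companion routes lets a client fetch the clean version of a page directly.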
If you're sequencing the work: ship llms.txt first (it's the bigger AI lift), then ship the four schema types listed above (small lift but easy, and you get Google rich results as a side benefit), then audit and remove the deprecated schema types from your templates.
You don't need a full schema audit tool to know what you're shipping. The quickest path:
1. View the page source and search for application/ld+json. Count the blocks. Each one is a schema object.
2. Check each block's @type. Anything in the "keep" list (Article, FAQPage, Organization, BreadcrumbList) stays. Anything else gets a question mark.
Five minutes per page. Most sites end up removing 1-2 schema types and tightening 1-2 keepers. The result is a smaller, more focused schema footprint that actually helps where it can.
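The same inspection pass can be scripted when there are many pages to check. A minimal sketch using only the standard library; `JsonLdBlocks`, `audit`, and the sample page are hypothetical, and the "keep" set simply mirrors the four types named in this guide.

```python
import json
from html.parser import HTMLParser

KEEP = {"Article", "FAQPage", "Organization", "BreadcrumbList"}


class JsonLdBlocks(HTMLParser):
    """Collects the raw contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_jsonld:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self.in_jsonld:
            self.in_jsonld = False
            self.blocks.append("".join(self.buffer))


def audit(html: str) -> list:
    """Return (type, verdict) pairs for each schema object found in the page."""
    parser = JsonLdBlocks()
    parser.feed(html)
    parser.close()
    report = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            report.append(("<invalid JSON>", "fix"))
            continue
        if isinstance(data, dict):
            nodes = data.get("@graph", [data])
        else:
            nodes = data if isinstance(data, list) else []
        for node in nodes:
            if not isinstance(node, dict):
                continue
            types = node.get("@type", "<missing>")
            for t in [types] if isinstance(types, str) else types:
                report.append((t, "keep" if t in KEEP else "review"))
    return report


# Hypothetical page with one keeper and one candidate for removal.
SAMPLE = """<html><head>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Article"}</script>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "HowTo"}</script>
</head><body></body></html>"""

report = audit(SAMPLE)
for schema_type, verdict in report:
    print(f"{schema_type}: {verdict}")
```

A "review" verdict is only a prompt to decide whether the matching Google rich result still matters to you; it is not a recommendation to delete automatically.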
A few practical traps that come up repeatedly:
- Put schema in a JSON-LD <script> block. It's easier to maintain, easier to validate, and Google has been pushing it for years. Microdata inline in HTML still works but is harder to keep in sync with content changes.
- Use ISO 8601 dates (2026-06-05 or 2026-06-05T14:30:00Z). Anything else is a coin-flip on whether parsers accept it.
A little, indirectly. The four types listed above provide entity and freshness signals that AI retrieval pipelines can pick up, sometimes through direct parsing (Bing, Perplexity), sometimes through Google's index (AI Overviews, ChatGPT's search index when it relies on Bing). Expect a 5-10% lift in citation odds from full schema coverage, not a 50% one. The bigger AI signals are structural cleanliness and clean markdown delivery, which the schema cannot substitute for.
FAQPage on content-heavy sites, Organization (with sameAs) on brand sites. FAQPage because the structure aligns perfectly with how LLM retrieval chunks content. Organization because it does the heavy lifting on entity disambiguation, which matters more as more AI assistants try to resolve brand mentions to specific companies.
No. Removing schema can only cost you the corresponding rich-result feature; it cannot cause a ranking penalty. If you're not using the rich result (or the feature has been deprecated), the schema is dead weight. Removing it slightly improves page weight and tightens the template — net win.
Yes, if your CMS makes it easy. Article schema with an updated dateModified is a free win when you're already retrofitting a post — see the retrofit checklist. If your CMS auto-injects Article schema from templates, you may already have this covered. FAQPage schema should match the visible FAQ section you added in the retrofit pass.
You need both, and they do different jobs. llms.txt tells AI clients what your site contains and how to navigate it, which is useful for AI assistants and code agents. Schema tells search engines what each page contains in a structured format, which is useful for Google rich results and the part of AI retrieval that still relies on Google's index. The two barely overlap. Ship both. Together they cover the surface area; either one alone leaves gaps.
If you're starting from zero, the order is: ship llms.txt first (biggest AI lift), add Article + Organization + BreadcrumbList globally next (one-time template work), then add FAQPage to any post that has a visible FAQ section. Total engineering time: half a day to one full day depending on stack. Maintenance afterward: near zero, since schema updates are mostly driven by content changes the CMS already handles.
Everything else is optional. If you're not chasing a specific Google rich-result feature, don't ship it. The schema graveyard is full of types that someone insisted were "essential" three years ago and have since quietly stopped doing anything. Focus on the four that still earn their keep, and put the freed-up time into the structural and llms.txt work that actually moves AI citations.
Written by Crawlytics Team. Crawlytics tracks AI bots, generates llms.txt, and powers WebMCP commerce, all from one snippet on any stack.