What Schema Markup Still Matters in the AI Search Era

Summary

Most schema is noise to LLMs. Four types still earn their keep: Article, FAQPage, Organization, BreadcrumbList. What to ship, what to skip, and why.

The most honest sentence I can write about schema markup in 2026 is this: most of what you've been told to ship doesn't matter for AI search. ChatGPT does not parse your HowTo schema. Claude does not care about your Speakable markup. The detailed Product blocks SEO consultants spent the late 2010s telling you to add are doing nothing for your citation rate in AI answers.

That doesn't mean schema is dead. A short list still moves the needle — for both Google rich results and the small piece of AI retrieval that does pick it up. This guide separates the schema worth shipping from the schema you can quietly remove, and explains why the line falls where it does.

The honest answer — most schema is noise to LLMs

Here's what's actually happening under the hood. When ChatGPT, Claude, or Perplexity fetches your page, the retrieval pipeline typically does one of two things: it renders the page and extracts the visible text, or it fetches a clean markdown version (through a link listed in llms.txt, content negotiation, or a built-in HTML-to-markdown converter). On neither path is the JSON-LD blob usually parsed as Schema.org structured data.

The retrieval scorers look at headings, paragraphs, lists, tables, and entity mentions. They don't query @type: HowTo to decide whether to cite you. That's a Google rich-result behavior, not an LLM behavior. When people insist schema is "critical for AI search," they're usually conflating Google AI Overviews (which still leans on the classic Google index) with the standalone AI assistants (which mostly don't).

There's one nuance. Some schema content — names, descriptions, dates — does end up parsed because it's also present in visible page text or in meta tags the LLM reads. The signal survives; the schema container does not. That's why the four schema types below still matter: not because LLMs parse the JSON-LD directly, but because the data they contain ends up where LLMs can read it, and because Google's rich-result coverage compounds the benefit.
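To make "the signal survives; the schema container does not" concrete, here is a minimal sketch of a visible-text extraction step. It is not any assistant's actual pipeline: the class name, the list of skipped tags, and the sample page are all assumptions made up for illustration.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of non-visible elements."""

    # Assumed skip list; a real extractor would likely be more thorough.
    SKIPPED_TAGS = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical page: the same headline and date appear in JSON-LD and in the body.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "How to Get Cited by ChatGPT", "dateModified": "2026-06-05"}
</script>
</head><body>
<h1>How to Get Cited by ChatGPT</h1>
<p>Updated 2026-06-05 by Jane Smith</p>
</body></html>
"""

text = visible_text(SAMPLE_PAGE)
print(text)
```

The JSON-LD block is dropped with the rest of the script content, but the headline and date still reach the extracted text because they are also in the visible body.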

The four schema types that still matter

Article (with dateModified)

Article schema is the highest-leverage type to ship on every blog post and editorial page. Google can use it for the "Top Stories" carousel, article cards in AI Overviews, and date-stamping in search results. The single most important property is dateModified: it is the freshness signal for Google, and for any downstream LLM retrieval that leans on Google's index.

A minimum-viable Article block:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "How to Get Cited by ChatGPT",
  "author": { "@type": "Person", "name": "Jane Smith" },
  "datePublished": "2026-06-05",
  "dateModified": "2026-06-05",
  "publisher": {
    "@type": "Organization",
    "name": "Crawlytics",
    "logo": { "@type": "ImageObject", "url": "https://crawlytics.app/logo.png" }
  }
}
</script>

Update dateModified every time you retrofit the post. Google sometimes shows this date in SERPs, and fresher dates correlate with higher click-through. ChatGPT and Perplexity also pick up date freshness from the visible page header — but the schema acts as a fallback when the visible date isn't crawlable.
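If your templates build the block from post metadata, bumping dateModified becomes a one-field change. The sketch below assumes a Python build step; the function names and the 2026-07-01 retrofit date are hypothetical, while the other values are taken from the example block above.

```python
import json
from datetime import date


def article_jsonld(headline, author, published, modified, publisher, logo_url):
    """Build the minimum-viable Article block from post metadata."""
    if modified < published:
        raise ValueError("dateModified cannot be earlier than datePublished")
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
        "publisher": {
            "@type": "Organization",
            "name": publisher,
            "logo": {"@type": "ImageObject", "url": logo_url},
        },
    }


def as_script_tag(data):
    """Serialize a JSON-LD dict into a script element for the page head."""
    # Escape "</" so a string value can never close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


block = article_jsonld(
    headline="How to Get Cited by ChatGPT",
    author="Jane Smith",
    published=date(2026, 6, 5),
    modified=date(2026, 7, 1),  # hypothetical retrofit date
    publisher="Crawlytics",
    logo_url="https://crawlytics.app/logo.png",
)
tag = as_script_tag(block)
print(tag)
```

Passing real date objects rather than strings lets the builder reject a dateModified that predates datePublished, which is an easy mistake to make when backfilling old posts.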

FAQPage

FAQPage schema is the second-highest-leverage type. Google now shows FAQ rich results only for a narrow set of sites (since August 2023, mainly well-known, authoritative government and health sites), but the Q/A structure happens to mirror exactly what AI retrieval scorers love: discrete, chunkable answers to discrete questions. Even though the LLM doesn't read the JSON-LD, the visible Q/A section it duplicates is the single most-cited part of most posts.

The trick is to mirror your visible FAQ section exactly. Don't ship FAQPage schema with questions that aren't visible on the page: Google's structured data guidelines require marked-up content to be visible to readers, and hidden Q/A pairs can be treated as deceptive markup. The schema's job is to reinforce what's already in the rendered DOM.

If your post has a "Common questions" H2 with 3-5 H3 questions and paragraph answers, ship the matching FAQPage block. That's all the lift this type can give you.
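One way to keep the schema and the visible section in lockstep is to generate both from the same list of question/answer pairs. Here is a minimal sketch under that assumption; the function names are mine, and the mismatch check is a plain substring test, not a reproduction of how Google compares markup to page content.

```python
import json


def faq_jsonld(qa_pairs):
    """Build FAQPage JSON-LD from the same (question, answer) pairs the page renders."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def questions_missing_from_page(schema, visible_page_text):
    """Return schema questions that do not appear verbatim in the visible text."""
    return [
        item["name"]
        for item in schema["mainEntity"]
        if item["name"] not in visible_page_text
    ]


# Sample pairs borrowed from this post's own "Common questions" section.
visible_faq = [
    ("Does adding schema improve AI search visibility?", "A little, indirectly."),
    ("Will Google penalize me if I remove unused schema?", "No."),
]
schema = faq_jsonld(visible_faq)
print(json.dumps(schema, indent=2))
```

Running questions_missing_from_page against the rendered page text in CI is a cheap guard: an empty list means every marked-up question is also on the page.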

Organization (especially sameAs)

Organization schema is the entity layer. It tells search engines and any LLM retrieval pipeline that scrapes it (Bing's certainly does, Google's does, OpenAI's index appears to ingest it inconsistently) who you are, what you're called, and which other web properties you own.

The single highest-value property in Organization is sameAs, the array of canonical URLs for your brand on other platforms — your LinkedIn, X, GitHub, Crunchbase, Wikipedia (if applicable), and any other authoritative profile. This is the entity-disambiguation signal that helps an AI engine resolve "Crawlytics" to a specific company rather than a generic term.

Ship Organization schema once, at the site root (typically in your homepage layout or a shared BaseLayout). Get sameAs right and you get cumulative benefit across every page that inherits the schema.
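A sketch of what that shared block might look like when built in code. The profile URLs are placeholders, not real Crawlytics profiles, and the validation rule (absolute http/https URLs, duplicates dropped) is my assumption about what "getting sameAs right" means in practice.

```python
import json
from urllib.parse import urlparse


def organization_jsonld(name, url, logo_url, same_as):
    """Build an Organization block with a validated, de-duplicated sameAs list."""
    profiles = []
    for profile in same_as:
        parsed = urlparse(profile)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"sameAs entries must be absolute URLs: {profile!r}")
        if profile not in profiles:
            profiles.append(profile)
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "logo": logo_url,
        "sameAs": profiles,
    }


org = organization_jsonld(
    name="Crawlytics",
    url="https://crawlytics.app/",
    logo_url="https://crawlytics.app/logo.png",
    same_as=[
        # Placeholder profile URLs; swap in your own canonical profiles.
        "https://www.linkedin.com/company/your-company",
        "https://github.com/your-org",
        "https://github.com/your-org",  # duplicate, dropped by the builder
    ],
)
print(json.dumps(org, indent=2))
```

Rejecting relative or scheme-less entries up front matters because a sameAs value that is not a resolvable URL gives an entity resolver nothing to match against.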

BreadcrumbList

BreadcrumbList is the least sexy of the four but it punches above its weight. It tells search engines (and any LLM that parses it) how a page fits in your site's hierarchy — useful for context, useful for showing breadcrumb trails in Google SERPs, useful for the rare AI engine that uses page position as a relevance signal.

It also costs almost nothing to ship. If your site has any nested structure (blog/post-slug, features/feature-slug, resources/resource-slug), generate BreadcrumbList per page from the URL path. Ten lines of template code. One-time setup, perpetual benefit.
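Here is roughly what those ten lines could look like. This is a sketch, not a drop-in template: it assumes every intermediate path segment (such as /blog) resolves to a real page, and it derives labels from slugs, which you would probably replace with real page titles.

```python
import json
from urllib.parse import urlparse


def breadcrumb_jsonld(page_url, home_name="Home"):
    """Derive a BreadcrumbList from the path segments of a page URL."""
    parsed = urlparse(page_url)
    origin = f"{parsed.scheme}://{parsed.netloc}"
    segments = [s for s in parsed.path.split("/") if s]

    crumbs = [(home_name, origin + "/")]
    path = ""
    for segment in segments:
        path += "/" + segment
        # Slug-derived label, e.g. "post-slug" becomes "Post slug".
        label = segment.replace("-", " ").capitalize()
        crumbs.append((label, origin + path))

    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(crumbs, start=1)
        ],
    }


crumbs = breadcrumb_jsonld("https://crawlytics.app/blog/post-slug")
print(json.dumps(crumbs, indent=2))
```

For the sample URL this yields three items: Home, Blog, and Post slug, with positions starting at 1.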

Schema types that don't matter for AI (and why people still ship them)

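A short tour of the schema types you can safely skip if AI search is your priority: HowTo, Speakable, and detailed Product blocks, the types called out at the top of this guide, are the usual suspects. None of these will hurt you; most are just a waste of template-author time.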

The pattern across all of these: the schema type was designed for a specific Google rich-result feature. If you care about that feature, ship the schema. If you only care about AI citations, the schema is a no-op.

The new thing LLMs DO use that isn't classic schema

Here's the substitution. The signal that classic schema was supposed to provide — "this page is structured, here's what it contains" — is now provided in AI search by clean markdown delivery. llms.txt tells the AI client what your site contains and where to find it. llms-full.txt bundles the actual content. Content-negotiated markdown rendering (or a .md companion route) lets the AI fetch a clean version of any specific page in one request instead of scraping HTML.

These are not schema in the Schema.org sense. They're a parallel content-delivery layer that gives AI clients the same context that schema was supposed to give Google. The llms.txt setup guide covers the mechanics. The decision rule is the same as for the surviving schema types: ship it if you care about AI citations, skip it if you don't.
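For a sense of how small the file is, here is a sketch that renders an llms.txt from a list of pages. It follows the layout in the public llms.txt proposal as I understand it (an H1 title, a blockquote summary, then H2 sections of markdown links); the function name, the summary text, and the .md URLs are hypothetical.

```python
def render_llms_txt(site_name, summary, sections):
    """Render llms.txt content: H1 title, blockquote summary, H2 link sections.

    sections maps a heading to a list of (title, url, note) tuples;
    an empty note is simply omitted from the entry.
    """
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            entry = f"- [{title}]({url})"
            if note:
                entry += f": {note}"
            lines.append(entry)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


llms_txt = render_llms_txt(
    site_name="Crawlytics",
    summary="Tracks AI bots and generates llms.txt.",  # hypothetical summary
    sections={
        "Blog": [
            # Hypothetical .md companion routes for two posts.
            ("How to Get Cited by ChatGPT",
             "https://crawlytics.app/blog/post-slug.md",
             "citation guide"),
            ("Schema in the AI search era",
             "https://crawlytics.app/blog/schema.md",
             ""),
        ],
    },
)
print(llms_txt)
```

Pointing the links at markdown companion routes rather than HTML pages is the design choice that matters here: the client that reads the index can then fetch each page in clean form with one more request.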

If you're sequencing the work: ship llms.txt first (it's the bigger AI lift), then ship the four schema types listed above (small lift but easy, and you get Google rich results as a side benefit), then audit and remove the deprecated schema types from your templates.

How to audit your existing schema in 5 minutes

You don't need a full schema audit tool to know what you're shipping. The quickest path:

  1. View source on a top page. Search for application/ld+json. Count the blocks. Each one is a schema object.
  2. For each block, check the @type. Anything in the "keep" list (Article, FAQPage, Organization, BreadcrumbList) stays. Anything else gets a question mark.
  3. Validate the keepers with Google's Rich Results Test. Paste the URL, see whether the schema parses without errors. Fix errors first: an item with errors is not eligible for the rich result at all, while warnings only flag missing recommended properties.
  4. Confirm visible content mirrors the schema. If your FAQPage block has a question that isn't in the visible Q/A section, either add the question to the page or remove it from the schema. Mismatches are penalty risks.
  5. For each deprecated type, decide. Remove if you're not using the corresponding Google feature; keep if you are. Don't keep them "just in case" — they bloat the template and add maintenance surface.

Five minutes per page. Most sites end up removing 1-2 schema types and tightening 1-2 keepers. The result is a smaller, more focused schema footprint that actually helps where it can.
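Steps 1 and 2 of the audit are easy to script if you have more than a handful of pages. The sketch below assumes you already have the page HTML as a string; the class and function names are mine, and it only looks at top-level types (including nodes inside an @graph), so nested types such as Person or ListItem are ignored.

```python
import json
from html.parser import HTMLParser

KEEP_TYPES = {"Article", "FAQPage", "Organization", "BreadcrumbList"}


class JsonLdCollector(HTMLParser):
    """Collects the raw contents of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def top_level_types(block):
    """Return the @type values of the top-level node(s) in one parsed block."""
    if isinstance(block, list):
        nodes = block
    elif isinstance(block, dict):
        nodes = block.get("@graph", [block])
    else:
        return []
    found = []
    for node in nodes:
        declared = node.get("@type", []) if isinstance(node, dict) else []
        found.extend([declared] if isinstance(declared, str) else declared)
    return found


def audit(html):
    """Sort every schema type on the page into keep / review, counting bad JSON."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    report = {"keep": [], "review": [], "invalid_blocks": 0}
    for raw in collector.blocks:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            report["invalid_blocks"] += 1
            continue
        for schema_type in top_level_types(parsed):
            bucket = "keep" if schema_type in KEEP_TYPES else "review"
            report[bucket].append(schema_type)
    return report


# Hypothetical page: one keeper, one question mark, one block with a trailing comma.
PAGE = """
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "Article", "headline": "Example"}</script>
<script type="application/ld+json">{"@context": "https://schema.org", "@type": "HowTo", "name": "Example"}</script>
<script type="application/ld+json">{"@type": "FAQPage",}</script>
<script>console.log("not schema")</script>
"""

report = audit(PAGE)
print(report)
```

Note that Article subtypes such as BlogPosting or NewsArticle would land in the "review" bucket with this exact-match set; add them to KEEP_TYPES if your CMS emits them. The script does not replace step 3, since only the Rich Results Test tells you whether Google accepts the block.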

Implementation gotchas

A few practical traps that come up repeatedly:

Common questions

Does adding schema improve AI search visibility?

A little, indirectly. The four types listed above provide entity and freshness signals that AI retrieval pipelines can pick up, sometimes through direct parsing (Bing, Perplexity) and sometimes through a search engine's index (Google's for AI Overviews, Bing's for ChatGPT search when it relies on Bing). Expect a 5-10% lift in citation odds from full schema coverage, not a 50% one. The bigger AI signals are structural cleanliness and clean markdown delivery, which the schema cannot substitute for.

Which schema type has the biggest impact in 2026?

FAQPage on content-heavy sites, Organization (with sameAs) on brand sites. FAQPage because the structure aligns perfectly with how LLM retrieval chunks content. Organization because it does the heavy lifting on entity disambiguation, which matters more as more AI assistants try to resolve brand mentions to specific companies.

Will Google penalize me if I remove unused schema?

No. Removing schema can only cost you the corresponding rich-result feature; it cannot cause a ranking penalty. If you're not using the rich result (or the feature has been deprecated), the schema is dead weight. Removing it slightly reduces page weight and tightens the template, which is a net win.

Should I add schema to old posts as part of a retrofit?

Yes, if your CMS makes it easy. Article schema with an updated dateModified is a free win when you're already retrofitting a post — see the retrofit checklist. If your CMS auto-injects Article schema from templates, you may already have this covered. FAQPage schema should match the visible FAQ section you added in the retrofit pass.

Do I need schema if I have llms.txt?

You need both, and they do different jobs. llms.txt tells AI clients what your site contains and how to navigate it — useful for AI assistants and code agents. Schema tells search engines what each page contains in a structured format — useful for Google rich results and the part of AI retrieval that still relies on Google's index. They overlap roughly 0%. Ship both. Together they cover the surface area; either one alone leaves gaps.

The short list and where to put your time

If you're starting from zero, the order is: ship llms.txt first (biggest AI lift), add Article + Organization + BreadcrumbList globally next (one-time template work), then add FAQPage to any post that has a visible FAQ section. Total engineering time: half a day to one full day depending on stack. Maintenance afterward: near zero, since schema updates are mostly driven by content changes the CMS already handles.

Everything else is optional. If you're not chasing a specific Google rich-result feature, don't ship it. The schema graveyard is full of types that someone insisted were "essential" three years ago and have since quietly stopped doing anything. Focus on the four that still earn their keep, and put the freed-up time into the structural and llms.txt work that actually moves AI citations.

Written by Crawlytics Team. Crawlytics tracks AI bots, generates llms.txt, and powers WebMCP commerce, all from one snippet on any stack.

This page is part of Crawlytics.app.