
Cloudflare's New AI Bot Rules Go Live September 15 — What Site Owners Must Do Now

Cloudflare's new AI bot rules go live September 15, 2026, splitting crawlers into Search, Agent, and Training. Audit your AI bot traffic control today.

Crawlytics Team · 9 min read

Quick answer

On September 15, 2026, Cloudflare rolls out a new AI bot taxonomy that sorts crawlers into three categories: Search, Agent, and Training. The new default blocks Agent and Training crawlers on ad-bearing pages for new domains, while existing domains keep their current rules until an owner changes them. That gap is the risk: a blanket block can shut out ChatGPT-User and other Agent fetchers that browse product pages and complete purchases, the visitors most likely to convert. Audit which category each bot on your site falls into before you touch the toggle, then allow Search, transact with Agents, and gate or monetize Training.

Mark September 15, 2026 on your calendar. That is the day Cloudflare's new AI bot rules go live, and the default for new domains flips from "let most crawlers through" to "block Agent and Training crawlers on ad-bearing pages." If you run a site, the question is no longer whether AI bots read your content. It is which category each bot falls into, and whether the default that ships that day helps you or quietly costs you traffic.

Cloudflare laid this out in its Content Independence Day 2.0 announcement. The core change is a taxonomy: every AI crawler now sorts into one of three buckets, Search, Agent, or Training. Each bucket has a different value profile, and the new default treats them differently. Get the mapping wrong and you can block a buyer while feeding a free-rider.

What changes on September 15, 2026

For new domains onboarded to Cloudflare, the default setting will block Agent and Training crawlers on ad-bearing pages. Search crawlers still pass through. Existing domains are not switched automatically; they keep whatever rules they already have.
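
Read as a decision rule, the default described above could be sketched as follows. This illustrates the rule as this post describes it, not Cloudflare's implementation: the function name, the category strings, and the `current_rule` parameter are hypothetical, and allowing Agent and Training crawlers on pages without ads is an assumption, since the rule as summarized here only addresses ad-bearing pages.

```python
def september_default(category: str, *, new_domain: bool, ad_bearing: bool,
                      current_rule: str = "allow") -> str:
    """Return "allow" or "block" under the default described in this post.

    category is one of "search", "agent", "training" (hypothetical labels).
    current_rule is whatever an existing domain already has configured.
    """
    if not new_domain:
        # Existing domains are not switched automatically.
        return current_rule
    if category in {"agent", "training"} and ad_bearing:
        # New domains: Agent and Training are blocked on ad-bearing pages.
        return "block"
    # Search passes through; non-ad pages are assumed to be allowed.
    return "allow"


print(september_default("agent", new_domain=True, ad_bearing=True))
print(september_default("search", new_domain=True, ad_bearing=True))
print(september_default("training", new_domain=False, ad_bearing=True))
```

The third call shows the point of the paragraph above: an existing domain simply gets back whatever rule it already had.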

That sounds reassuring for established sites. It is also the trap. A default that ships "blocked" for every new domain shows where the whole industry is drifting, and inheriting that posture by accident is a worse outcome than choosing it on purpose.

The three categories, in plain English

The taxonomy is the useful part of this release, so it is worth getting the definitions exactly right.

Search. A Search crawler indexes your pages so it can send people back to you later. Googlebot is the canonical example. Someone searches, clicks a result, and you get a visit. This is the old bargain, and it still mostly works. Under the new rules, Search is the category Cloudflare leaves on by default.

Agent. An Agent crawler fetches a page live, right now, because a specific person asked for it. ChatGPT-User is the one to remember. It fires the moment someone pastes your link into ChatGPT or asks the assistant to check something on your site. The person is real, they are usually in a research or buying mindset, and the agent is acting on their behalf. Agent traffic is the closest thing to a warm lead in the AI stack.

Training. A Training crawler collects your content to train a model. GPTBot on its training pass is the example. There is no click coming back. The value flows one way, out of your site and into a model that may later answer questions about your topic without ever naming you.

One bot can wear two hats. GPTBot can count as Training or as Search, depending on what OpenAI is doing with a given fetch. That is why a single allow-or-block decision per user-agent was always too blunt, and why the category model is a real improvement even if the September 15 default leans aggressive.

| Category | What it is | Example bots | New default (Sept 15) | Recommended treatment |
|---|---|---|---|---|
| Search | Indexes pages to send referral traffic | Googlebot, GPTBot (search side) | Allowed | Allow. Blocking Search is almost never the right move. |
| Agent | Fetches live on a specific person's behalf | ChatGPT-User | Blocked on ad-bearing pages (new domains) | Allow and make transactable. This is your warmest AI traffic. |
| Training | Extracts content to train a model | GPTBot (training pass) | Blocked on ad-bearing pages (new domains) | Gate or monetize. Judge per bot, not with a blanket wall. |

The table is the whole strategy in one view. GPTBot appears twice on purpose, because it spans two categories, and any policy that ignores that will over-block or over-share.
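
One way to keep the "one bot, two categories" point honest in your own tooling is to map each bot to a set of categories rather than a single label. The sketch below is a minimal illustration: the mapping copies the example bots named in this post and is not an official Cloudflare list, and `categories_for` is a hypothetical helper that matches on User-Agent substrings only.

```python
from enum import Enum


class Category(Enum):
    SEARCH = "search"
    AGENT = "agent"
    TRAINING = "training"


# Illustrative mapping based on the examples in this post, not an official list.
BOT_CATEGORIES = {
    "Googlebot": {Category.SEARCH},
    "ChatGPT-User": {Category.AGENT},
    "GPTBot": {Category.SEARCH, Category.TRAINING},  # spans two categories
}


def categories_for(user_agent: str) -> set[Category]:
    """Return every category whose bot token appears in the User-Agent string."""
    ua = user_agent.lower()
    found: set[Category] = set()
    for token, categories in BOT_CATEGORIES.items():
        if token.lower() in ua:
            found |= categories
    return found


print(categories_for("Mozilla/5.0 (compatible; GPTBot/1.1)"))
print(categories_for("Mozilla/5.0 (X11; Linux x86_64) Firefox/130.0"))
```

A set makes the over-block or over-share risk visible: any rule keyed on "GPTBot" alone has to decide what to do with both members. User-Agent strings can be spoofed, so treat substring matching as a first pass rather than verification.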

Why existing domains have to audit now

The reassurance that existing domains keep their settings is exactly why you cannot relax. There are two reasons.

First, the default is where the platform is heading. New domains ship blocked. Over time, plugins, dashboards, and one-click presets nudge everyone toward the same posture. If you do not decide deliberately, the platform decides for you.

Second, most site owners have no idea which category their bots fall into, because they have never measured it. You might be blocking ChatGPT-User already through an old robots.txt line or a WAF rule you copied from a blog post two years ago. You might be handing a Training crawler your entire catalog for free while blocking the Agent that would have sent a buyer. Auditing before September 15 means you make the call with data instead of inheriting a default you never chose. Our guide to tracking which AI bots crawl your site walks through how to see it, and our take on the default-deny posture covers when a stricter stance actually pays off.
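
A quick first check for a forgotten robots.txt line is to parse the file and ask which bots it turns away. The sketch below uses Python's standard `urllib.robotparser`. The `blocked_bots` helper and the sample robots.txt are made up for illustration, and the check covers robots.txt only; it will not see WAF rules or edge settings.

```python
from urllib.robotparser import RobotFileParser


def blocked_bots(robots_txt: str, bots: list[str], path: str = "/") -> list[str]:
    """Return the bots that this robots.txt text disallows from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in bots if not parser.can_fetch(bot, path)]


# Hypothetical robots.txt with an old rule that shuts out an Agent fetcher.
sample = """\
User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_bots(sample, ["Googlebot", "ChatGPT-User", "GPTBot"]))
# → ['ChatGPT-User']
```

To run it against a live site, read the real file's text (for example with `urllib.request`) and pass it in place of `sample`.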

Why blanket blocking backfires

The instinct, when a big infrastructure provider says "AI bots are a problem," is to block them all. That instinct is wrong, and the category model shows why.

Block every AI bot and you also block Agent traffic. When a shopper asks ChatGPT "does this product ship to Canada" and pastes your URL, ChatGPT-User fetches the page live. Block it, and the assistant answers "I could not access that site," then recommends a competitor whose pages it could read. You did not protect anything. You handed away a sale.

The same logic applies to assistants completing a purchase. An Agent that can reach your product page, read the price, and call a checkout tool is a customer standing at the register. Block the category and you have unplugged the newest storefront on the internet.

Training is where blocking makes sense, and even there it is a judgment call rather than a reflex. A Training crawler that hits you thousands of times and never sends a visit is extractive, and gating or charging it is reasonable. We work through that math in our piece on the crawl-to-referral ratio, and the short version is that the decision should be per category, not a wall around your whole site. The point of the three-bucket model is that "block AI bots" is now a sentence too vague to act on. Which bots, doing what, and worth how much to you?
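
The arithmetic behind that judgment is simple enough to sketch. Here the crawl-to-referral ratio is taken to mean crawler fetches divided by referred visits over the same period. That definition, the function names, the sample numbers, and the threshold of 1,000 are all assumptions for illustration, not figures from Cloudflare or from your logs.

```python
import math


def crawl_to_referral_ratio(crawls: int, referrals: int) -> float:
    """Crawler fetches per referred visit; infinite when nothing comes back."""
    if crawls < 0 or referrals < 0:
        raise ValueError("counts must be non-negative")
    if referrals == 0:
        return math.inf if crawls > 0 else 0.0
    return crawls / referrals


def looks_extractive(crawls: int, referrals: int, threshold: float = 1000.0) -> bool:
    """Flag a bot whose ratio meets or exceeds a threshold you choose yourself."""
    return crawl_to_referral_ratio(crawls, referrals) >= threshold


# Made-up numbers: 12,000 fetches and 3 referred visits in a month.
print(crawl_to_referral_ratio(12000, 3))  # → 4000.0
print(looks_extractive(12000, 3))         # → True
print(looks_extractive(400, 20))          # → False
```

The threshold is the judgment call. The ratio only tells you how lopsided the exchange is for a given bot.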

What Cloudflare's toggle doesn't give you

Cloudflare's control is real and worth using. It enforces allow or block at the edge, fast, before a request ever touches your origin. That is genuinely valuable, and Crawlytics does not replace it. But a toggle answers only one question, "did I let this bot in?", and leaves three unanswered.

Volume per category. The toggle blocks or allows. It does not tell you how much of your bot traffic each category actually represents, which pages each one hits, or which ones send visitors back. Without that, you are tuning a policy blind.
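
If you have raw access logs, a rough per-category tally is a few lines of scripting. The sketch below assumes logs in the common "combined" format (request line, status, size, referrer, then User-Agent in quotes) and reuses the example bots from this post. The regex, the `tally` function, and the sample lines are illustrative. Because a log line alone cannot tell a GPTBot search fetch from a training fetch, GPTBot hits are counted under both categories here.

```python
import re
from collections import Counter

# Captures the request path and the User-Agent from a combined-format log line.
LINE = re.compile(r'"[A-Z]+ (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')

# Illustrative mapping based on this post's examples, not an official list.
BOT_CATEGORIES = {
    "googlebot": ("Search",),
    "chatgpt-user": ("Agent",),
    "gptbot": ("Search", "Training"),
}


def tally(log_lines):
    """Count hits per category and per (category, path) pair."""
    by_category = Counter()
    by_page = Counter()
    for line in log_lines:
        match = LINE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path, user_agent = match.group(1), match.group(2).lower()
        for token, categories in BOT_CATEGORIES.items():
            if token in user_agent:
                for category in categories:
                    by_category[category] += 1
                    by_page[(category, path)] += 1
    return by_category, by_page


sample_lines = [
    '203.0.113.5 - - [10/Sep/2026:12:00:00 +0000] "GET /catalog HTTP/1.1" 200 512 "-" "GPTBot/1.1"',
    '203.0.113.6 - - [10/Sep/2026:12:00:01 +0000] "GET /products/widget HTTP/1.1" 200 900 "-" "Mozilla/5.0; compatible; ChatGPT-User/1.0"',
    '203.0.113.7 - - [10/Sep/2026:12:00:02 +0000] "GET /catalog HTTP/1.1" 200 512 "-" "Mozilla/5.0 Firefox/130.0"',
]

by_category, by_page = tally(sample_lines)
print(dict(by_category))
print(dict(by_page))
```

This counts requests only. Tying each category back to referred visits needs referrer or analytics data on top.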

Tailored content for agents. Blocking is binary. It cannot hand an Agent a clean, chrome-free version of your page that is easy to read and act on. ChatGPT-User reads HTML and discards markdown, so serving it the right format decides whether it can use your content at all. A yes-or-no gate has no opinion about format.

Monetization. A toggle can say no. It cannot say "yes, for a price." If you want to charge a Training crawler or let an Agent transact on your behalf, you need a layer above the edge that understands who is asking and what they are worth.

This is where Crawlytics sits, next to Cloudflare rather than against it. Cloudflare enforces at the edge; Crawlytics gives you the per-site visibility to know what to enforce and the tools to do more than block. DETECT shows which category each bot on your site falls into, how often it hits, and which pages. SERVE generates an llms.txt that signals your policy and serves AI-Optimized HTML so Agent fetchers get content they can actually read. SELL drops a WebMCP snippet so Agent traffic can search, book, or check out instead of bouncing. The Visibility tier is $29.99/mo, Commerce is $49.99/mo, and both run on any stack, not only Cloudflare-proxied sites.

What to do before September 15

Here is the clean checklist. Work top to bottom.

  1. Measure first. Before you touch any rule, find out which bots hit your site and which category each falls into. You cannot set a smart policy on data you do not have.
  2. Allow Search. Googlebot and the search side of GPTBot send referrals. Leave them on. Blocking Search is almost never the right move.
  3. Make Agents transactable, not blocked. ChatGPT-User and other Agent fetchers carry live buyers. Serve them readable content, and if you sell online, give them a WebMCP tool so they can act instead of bouncing.
  4. Gate or monetize Training. This is the category the September 15 default is pointed at, and where it mostly makes sense. Decide per bot: allow the ones that cite you, gate or charge the ones that only extract.
  5. Do not inherit the default blind. If you spin up a new domain, it ships blocked. Decide whether that fits your goals rather than discovering it after your ChatGPT referrals vanish.
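
The checklist above can be condensed into a small decision function. This is a sketch of the post's recommendations, not a Cloudflare or Crawlytics API: the `recommend` function and its category strings are hypothetical, and using "has this bot sent any referred visits" as a stand-in for "cites you" is an assumption you may want to replace with your own signal.

```python
def recommend(category: str, referrals: int = 0) -> str:
    """Map a bot category to the treatment suggested in the checklist.

    referrals is the number of visits the bot has sent back, used here
    as a rough proxy for whether a Training crawler cites you.
    """
    if category == "search":
        return "allow"
    if category == "agent":
        return "allow and make transactable"
    if category == "training":
        return "allow" if referrals > 0 else "gate or monetize"
    raise ValueError(f"unknown category: {category!r}")


print(recommend("search"))
print(recommend("agent"))
print(recommend("training", referrals=0))
print(recommend("training", referrals=42))
```

Step 1 still comes first: the category and the referral count are inputs you only have after measuring.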

The one-line version: allow Search, transact with Agents, gate or monetize Training, and measure before you decide any of it. September 15 is a deadline for a decision, not a reason to panic-block. The sites that win the next year are the ones that treat AI bot traffic as three different audiences, because that is what it is.

Frequently asked questions

When do Cloudflare's new AI bot rules take effect?

September 15, 2026. On that date Cloudflare's new default blocks Agent and Training crawlers on ad-bearing pages for newly onboarded domains. Existing domains keep their current settings until an owner changes them, so the practical deadline for auditing your rules is before that date, not after.

What are the three AI bot categories?

Search, Agent, and Training. Search crawlers index pages to send referral traffic, and Googlebot is the example. Agent crawlers fetch a page live on a person's behalf, and ChatGPT-User is the example. Training crawlers collect content to train models, and GPTBot on its training pass is the example. One bot can fall into more than one category: GPTBot counts as both Training and Search.

Should I just block all AI bots to be safe?

No. Blocking every AI bot also blocks Agent fetchers like ChatGPT-User, which browse your product pages when a user asks an assistant about you and can complete a purchase on their behalf. Blanket blocking removes the visitors most likely to convert. Measure which bots actually visit and what each one does before you block anything.

What does Cloudflare's toggle not tell me?

It enforces allow-or-block at the edge, but it does not report how much of your bot traffic each category represents, it cannot serve tailored content to agents, and it cannot let you charge for access. Crawlytics adds that per-site visibility, generates an llms.txt to signal your policy, and drops a WebMCP snippet so Agent traffic can transact.

How is Crawlytics different from Cloudflare's bot controls?

Crawlytics complements them rather than competing. Cloudflare enforces allow or block at the edge, before the request reaches your origin. Crawlytics shows which category each bot on your site falls into and how often each hits which pages, generates llms.txt and AI-Optimized HTML to guide them, and adds WebMCP commerce so Agent traffic can convert. You can run both.
