AI Attribution
Most analytics tools log clicks from ChatGPT, Claude, and Perplexity as direct traffic because mobile and in-app browsers strip the Referer header. Crawlytics injects per-LLM UTM tags into the AI-Optimized HTML bots fetch, so citations can carry attribution that survives — when the assistant preserves the tagged URL.
Experimental · best-effort
This is how AI attribution is designed to work — it's the intent, not a guarantee. Whether an assistant preserves the UTM-tagged URL in a citation (and a user taps it with the params intact) is outside our control: it varies by assistant and model version, changes over time, and some strip query parameters entirely. Treat AI-referral counts as a strong directional signal, not exact or complete attribution.
The problem
Without Crawlytics
User taps a citation in ChatGPT iOS:
GET https://yoursite.com/pricing
Referer:
GA: "Direct / None"
Mixpanel: "(direct)"
Plausible: "Direct"
You: "...where did that come from?"
With Crawlytics
Same user, same tap:
GET https://yoursite.com/pricing?utm_source=chatgpt&utm_medium=ai_referral&utm_campaign=crawlytics
Crawlytics: ChatGPT → /pricing → 1 visit
Your GA: chatgpt / ai_referral
Your Mixpanel: source=chatgpt
You: "Pricing is getting cited."
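The difference between the two requests above comes down to what an analytics tool can read from the landing URL and the Referer. The sketch below is a deliberately simplified illustration of that precedence (UTMs first, then Referer, otherwise direct). The function name `classify_visit` and the label strings are assumptions for this example; real tools such as GA, Mixpanel, and Plausible apply their own, more detailed rules.

```python
from urllib.parse import urlsplit, parse_qs


def classify_visit(url: str, referer: str = "") -> str:
    """Return a 'source / medium' label for one page view (simplified)."""
    params = parse_qs(urlsplit(url).query)
    # 1. UTM parameters on the landing URL take precedence.
    if "utm_source" in params:
        source = params["utm_source"][0]
        medium = params.get("utm_medium", ["(not set)"])[0]
        return f"{source} / {medium}"
    # 2. Otherwise fall back to the Referer host, if one was sent.
    host = urlsplit(referer).hostname
    if host:
        return f"{host} / referral"
    # 3. No UTMs and no Referer: the visit looks direct.
    return "direct / none"


# Untagged tap from an in-app browser that sent no Referer.
print(classify_visit("https://yoursite.com/pricing"))
# Same tap, but the cited URL kept its UTM tags.
print(classify_visit(
    "https://yoursite.com/pricing"
    "?utm_source=chatgpt&utm_medium=ai_referral&utm_campaign=crawlytics"
))
```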
How it works
01
Your routing snippet (Cloudflare Worker, nginx, Vercel, or WordPress) serves AI bots such as GPTBot the AI-Optimized HTML from /api/sites/:id/md/ instead of your full page. Crawlytics identifies the bot's company from its User-Agent.
02
Same-origin links in the markdown body are rewritten to append utm_source=chatgpt&utm_medium=ai_referral&utm_campaign=crawlytics. External links, anchors, and mailto: are left alone. Existing UTMs are preserved.
03
When ChatGPT shows your link in a citation, it usually quotes the URL with the UTMs included — but some assistants strip query params or rewrite links, so this step is best-effort, not guaranteed. When it does carry through and a user taps the citation, your analytics records utm_source=chatgpt even though the Referer is empty, and the Crawlytics AI Referrals panel rolls it up by assistant.
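The link-rewriting rules in step 02 can be sketched as follows. This is a minimal illustration under stated assumptions, not the Crawlytics implementation: the function names `tag_url` and `tag_markdown`, the regex for inline markdown links, and the `site_host` parameter are all hypothetical. It tags same-origin links, leaves external links, in-page anchors, and mailto: alone, and never overwrites a UTM that is already present.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Inline markdown links: [text](url). Images (![alt](src)), link titles,
# and reference-style links are not handled in this sketch.
MD_LINK = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)\s]+)\)")

AI_UTMS = (("utm_medium", "ai_referral"), ("utm_campaign", "crawlytics"))


def tag_url(url: str, site_host: str, source: str) -> str:
    """Append AI-referral UTMs to a same-origin URL; return others unchanged."""
    parts = urlsplit(url)
    # Leave mailto:, tel:, and other non-web schemes alone.
    if parts.scheme not in ("", "http", "https"):
        return url
    # Leave pure in-page anchors such as "#faq" alone.
    if not parts.netloc and not parts.path and not parts.query:
        return url
    # Leave external links alone; relative URLs count as same-origin.
    if parts.netloc and parts.netloc.lower() != site_host.lower():
        return url
    query = parse_qsl(parts.query, keep_blank_values=True)
    present = {key for key, _ in query}
    for key, value in (("utm_source", source),) + AI_UTMS:
        if key not in present:  # existing UTMs are preserved
            query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


def tag_markdown(markdown: str, site_host: str, source: str) -> str:
    """Rewrite every inline markdown link in the text."""
    return MD_LINK.sub(
        lambda m: f"[{m.group(1)}]({tag_url(m.group(2), site_host, source)})",
        markdown,
    )


print(tag_markdown("See [pricing](/pricing) or [mail us](mailto:hi@yoursite.com).",
                   "yoursite.com", "chatgpt"))
```

One caveat of this approach: round-tripping the query string through `parse_qsl` and `urlencode` can normalize the encoding of existing parameters (for example `%20` becomes `+`), which a production rewriter may want to avoid.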
Per-LLM mapping
Crawlytics detects the requesting bot's company and uses the canonical source name below. Same content everywhere — different attribution stamp per LLM.
| Bot User-Agent matches | utm_source | Shown as |
|---|---|---|
| GPTBot · ChatGPT-User · OAI-SearchBot | chatgpt | ChatGPT · OpenAI |
| ClaudeBot · claude-web · anthropic-ai | claude | Claude · Anthropic |
| PerplexityBot · Perplexity-User | perplexity | Perplexity · Perplexity |
| Google-Extended | gemini | Gemini · Google |
| Copilot bots | copilot | Copilot · Microsoft |
| Bytespider | doubao | Doubao · ByteDance |
| Meta-ExternalAgent | meta_ai | Meta AI · Meta |
| YouBot | you | You.com · You.com |
| MistralAI-User | lechat | Le Chat · Mistral |
| cohere-ai | cohere | Cohere · Cohere |
| GrokBot | grok | Grok · xAI |
| Applebot-Extended | apple_intelligence | Apple Intelligence · Apple |
All utm_medium values are ai_referral. utm_campaign is always crawlytics. Filter on either to slice AI traffic from the rest of your acquisition mix.
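As an illustration of the mapping above, the lookup can be sketched as a list of User-Agent tokens checked in order. The case-insensitive substring match and the name `utm_source_for` are assumptions for this example; the table does not say how Crawlytics actually matches User-Agents, and "Copilot bots" is left out because no specific tokens are listed for it.

```python
from typing import Optional

# (User-Agent token, utm_source) pairs copied from the table above.
BOT_SOURCES = [
    ("GPTBot", "chatgpt"),
    ("ChatGPT-User", "chatgpt"),
    ("OAI-SearchBot", "chatgpt"),
    ("ClaudeBot", "claude"),
    ("claude-web", "claude"),
    ("anthropic-ai", "claude"),
    ("PerplexityBot", "perplexity"),
    ("Perplexity-User", "perplexity"),
    ("Google-Extended", "gemini"),
    ("Bytespider", "doubao"),
    ("Meta-ExternalAgent", "meta_ai"),
    ("YouBot", "you"),
    ("MistralAI-User", "lechat"),
    ("cohere-ai", "cohere"),
    ("GrokBot", "grok"),
    ("Applebot-Extended", "apple_intelligence"),
]


def utm_source_for(user_agent: str) -> Optional[str]:
    """Return the utm_source for a known AI bot, or None for everything else."""
    ua = user_agent.lower()
    for token, source in BOT_SOURCES:
        if token.lower() in ua:  # assumed: case-insensitive substring match
            return source
    return None


print(utm_source_for("Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"))
print(utm_source_for("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
```

Returning None for unmatched agents mirrors the behavior described on this page: browsers and search crawlers get the normal page with untagged links.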
What this is and isn't
What you get
What it won't fix
ChatGPT's mobile and in-app browsers strip the Referer header on outbound clicks. When a user taps a citation in ChatGPT on iOS or Android, your server sees no Referer and Google Analytics has nothing to attribute the visit to, so it falls into "Direct / None." The same thing happens with Claude on iOS and Perplexity's in-app browser. Crawlytics addresses this by injecting utm_source=chatgpt (or claude, perplexity, etc.) into the AI-Optimized HTML that bots fetch, so when they cite your URL the UTMs can ride along and survive Referer stripping, provided the assistant preserves the tagged link.
There are two approaches. (1) Check your raw access logs for Referer values such as chat.openai.com, chatgpt.com, perplexity.ai, and claude.ai; you'll miss the majority of visits, because mobile in-app browsers strip the Referer. (2) Use UTM injection (what Crawlytics does), so attribution is baked into the URL the LLM cites and does not depend on the Referer at all, as long as the assistant keeps the tagged URL. Approach 2 also gives you per-LLM source separation in GA, Mixpanel, Plausible, and Fathom.
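For approach 1, a rough sketch of scanning access logs for AI Referer hosts might look like the following. It assumes the common "combined" log format used by nginx and Apache (request, status, bytes, Referer, User-Agent in that order); the names `AI_REFERER_HOSTS` and `count_ai_referers` are hypothetical, and the host list covers only the assistants named above, with chatgpt.com included as ChatGPT's current domain.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Referer hosts for the assistants named above.
AI_REFERER_HOSTS = {
    "chat.openai.com": "chatgpt",
    "chatgpt.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "claude.ai": "claude",
}

# Tail of a combined-format line: "REQUEST" STATUS BYTES "REFERER" "USER-AGENT"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"$')


def count_ai_referers(lines):
    """Count visits per assistant based on the Referer host, when present."""
    counts = Counter()
    for line in lines:
        match = LOG_TAIL.search(line.strip())
        if not match:
            continue  # not a combined-format line
        host = (urlsplit(match.group("referer")).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        source = AI_REFERER_HOSTS.get(host)
        if source:
            counts[source] += 1
    return counts


sample = [
    '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 5120 "https://chatgpt.com/" "Mozilla/5.0"',
    '203.0.113.6 - - [10/Oct/2024:13:56:01 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]
print(count_ai_referers(sample))
```

The second sample line shows the limitation of this approach: a visit with a stripped Referer ("-") cannot be attributed from the log alone.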
Tagging is not retroactive: it only covers citations built from pages that LLMs crawl from now on. It applies when an LLM fetches markdown via the Crawlytics routing endpoint; older citations keep the URL the model already saw. Most models re-crawl popular pages every few days, so coverage ramps up quickly.
The UTM parameters are visible in the landing URL, the same as any UTM tag from a paid channel. Most marketers consider that acceptable; if you don't, you can strip the params client-side after recording the visit (one line in your analytics layer).
Google's ranking pipeline (Googlebot, Googlebot-News) is not in the bot list and is never served tagged markdown; only AI-training and AI-search bots are. Search engines see your normal HTML and your normal links.
Defaults are designed to be readable in GA / Plausible / Fathom side-by-side with your other channels: chatgpt, claude, perplexity, gemini, copilot, doubao, meta_ai, you, lechat, cohere, grok, apple_intelligence. Custom mappings are a workspace setting — contact support if you need the names tuned to match an existing UTM taxonomy.
We never overwrite an existing utm_source. If your CMS already adds a campaign tag to internal links, that tag wins and our utm_medium / utm_campaign fields are still appended so AI attribution still works alongside your existing campaign tracking.
This page is part of Crawlytics.app.