Quick answer
Exa just raised a $250M Series C at a $2.2B valuation, led by a16z, to build "the search engine for AIs," and its announcement projects that AI agents will run more searches than humans this year, growing to roughly 1,000x today's Google volume within a few years. Every one of those searches ends in a fetch against a real website, so a new class of high-volume crawlers is already hitting your origin, mostly invisible in standard analytics. The practical response is to make your site easy for agents to read and to measure who is actually reading it: serve clean crawlable content, publish a current llms.txt, and watch your bot traffic. See what is hitting your site at /demo, then publish an llms.txt so agents have a predictable index to fetch.
Exa, a startup building what it calls "the search engine for AIs," just raised a $250 million Series C at a $2.2 billion valuation, led by Andreessen Horowitz. Buried in the announcement is the number every site owner should sit with: Exa expects AI agents to run more searches than humans this year, and within a few years it projects LLM-driven search could reach roughly 1,000 times today's Google query volume.
This is not a pitch-deck projection with no product behind it. Per Exa's Series C announcement, the company already powers search for coding agents like Cursor and Cognition, for HubSpot, and for more than 400,000 developers. Its own crawler is tracking 500 billion URLs. The funding is a signal, and the signal points straight at your servers.
Here is why that matters for anyone who runs a website, and what to actually do about it.
What Exa is, and why a search layer for agents matters
When a person searches, they open a browser, type a query, scan a page of blue links, and click. When an agent searches, none of that happens. A language model or an autonomous agent calls a search API, gets back structured results, and reads the underlying pages itself to extract an answer or take an action. Exa is building the index and API for that second path. Its pitch, in its own words, is to be "the search engine for AIs" rather than for people.
An agent-search layer matters because agents have different needs than human searchers. A human wants a ranked list to choose from. An agent wants the content itself, cleanly and cheaply, so it can reason over it. That is a real infrastructure gap, and $250 million led by a16z is a bet that filling it is one of the larger opportunities in software right now. Exa's traction (Cursor, Cognition, HubSpot, 400,000+ developers) says the demand is already here, not hypothetical.
The part that lands on you: every agent search resolves to a fetch. When Exa's index, or any agent built on it, decides your page is relevant, something requests that page from your origin. Multiply that by the volume Exa is describing and you get a new, fast-growing category of traffic that has nothing to do with human visitors.
What the fundraise signals about the volume and nature of bot traffic
Read the announcement as a traffic forecast and it says something specific. If agents run more searches than humans this year, and if that volume trends toward 1,000x today's Google in a few years, then the composition of who reads your site is changing under you. Human pageviews may stay flat while machine reads climb.
Two things about this new traffic are worth naming plainly.
First, it is high volume. A crawler tracking 500 billion URLs is not a niche bot politely sampling a few pages. Agent search fans out: one user question can trigger several searches, each fetching multiple pages. The request count per useful outcome is much higher than a single human session.
Second, it is low visibility. Standard analytics were built for people. GA4 counts sessions that execute JavaScript and fire a tag, and most agent fetches do neither. When an AI assistant does send a human back to your site, its in-app browser usually strips the Referer header, so the visit shows up as "(direct)" with no hint an AI sent it. The result is a widening gap between the traffic you can see and the traffic you actually get. We wrote about that blind spot in the silent AI-agent funnel, and Exa's numbers make it wider.
Here is the shift in one table.
| Dimension | Human search era | AI agent search era |
|---|---|---|
| Who runs the query | A person in a browser | An LLM or agent, often via an Exa-style search API |
| Volume trajectory | Flat to declining | More searches than humans this year; ~1,000x Google in a few years (per Exa) |
| How your page is read | Rendered HTML, JavaScript executed | Raw fetch, often no JavaScript; clean text and structure win |
| What it wants | A ranked list of links to choose from | Direct, extractable answers and a machine-readable index |
| How it shows in GA4 | Organic search | Often "(direct)" or an unfamiliar crawler, or nothing at all |
Only the volume row is a projection, and it is Exa's own. The rest of the right-hand column is the logical consequence of the volume Exa just raised $250 million to serve.
Why AI-agent discovery is now a competitive necessity
For twenty years the game was ranking in a list of links a human would scan. That game does not disappear, but a second game now runs alongside it: being readable, and being read, by the agents doing the searching. If agents out-search humans this year, the site an agent can parse and cite has an edge over the one that ranks well but fetches poorly.
Three things decide whether you win that second game.
Clean, crawlable content. Most agents do not execute JavaScript. If your key content only appears after a client-side render, an agent fetching your URL may see an empty shell. Server-rendered HTML with real text, headings, and structured data is what an agent can actually extract. This is the same discipline that helped human SEO, now with higher stakes because the reader is a machine with no patience for your framework.
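One rough way to test this on your own pages is to look at the raw HTML the way a non-JavaScript fetcher would. The sketch below is a minimal check, not a full renderer comparison: it strips `script`, `style`, and `template` content from an HTML string and reports which key phrases are missing from what remains. The helper names (`visible_text`, `missing_phrases`) and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text a non-JavaScript fetcher could read from raw HTML."""

    # Content inside these tags is code or inert markup, not readable text.
    SKIP = {"script", "style", "template"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped tags.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the readable text of an HTML document, joined by spaces."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def missing_phrases(html: str, phrases: list[str]) -> list[str]:
    """Return the phrases that do not appear in the raw, unrendered HTML."""
    text = visible_text(html).lower()
    return [p for p in phrases if p.lower() not in text]


# Hypothetical pages: one server-rendered, one that only renders client-side.
SERVER_RENDERED = (
    "<html><body><h1>Pricing</h1><p>Plans start at $29.99/mo.</p>"
    "<script>var x = 'hidden';</script></body></html>"
)
CLIENT_RENDERED = (
    '<html><body><div id="root"></div>'
    '<script>render("Plans start at $29.99/mo.")</script></body></html>'
)
```

Calling `missing_phrases(CLIENT_RENDERED, ["Plans start at"])` returns the phrase as missing, because it exists only inside a script. In practice you would feed the function the body of a plain HTTP GET for each page you want cited.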
A predictable index. The llms.txt convention exists so AI systems have one stable place to learn what your site contains and which pages matter, instead of reconstructing that from raw HTML on every visit. As agent search scales toward the volumes Exa describes, lowering the cost for a crawler to understand you is a real advantage. Our full llms.txt guide walks through the format.
Bot detection. You cannot make good decisions about agent traffic you cannot see. Which crawlers reach you, which pages they favor, whether they honor your llms.txt in practice, whether volume is trending up: none of that is in a normal analytics dashboard. Measuring it is the prerequisite for everything else, including the increasingly common question of whether to block, allow, or monetize a given bot.
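If you have access to server logs, a first pass at this measurement can be sketched in a few lines. The example below assumes logs in the common "combined" format and matches a small, incomplete list of User-Agent tokens that some AI crawlers publicly identify with. The token list, the function names, and the sample data are illustrative assumptions, not an exhaustive registry and not how any particular product does it.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Illustrative, incomplete list of User-Agent substrings (an assumption).
# Extend it from what you actually see in your own logs.
AI_BOT_TOKENS = [
    "GPTBot",
    "ChatGPT-User",
    "OAI-SearchBot",
    "ClaudeBot",
    "PerplexityBot",
    "CCBot",
]

# Combined Log Format:
# ip ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def classify_bot(user_agent: str) -> Optional[str]:
    """Return the first known bot token found in the User-Agent, else None."""
    ua = user_agent.lower()
    for token in AI_BOT_TOKENS:
        if token.lower() in ua:
            return token
    return None


def count_ai_hits(lines: Iterable[str]) -> Counter:
    """Count requests per (bot, path) for lines whose User-Agent matches."""
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # Skip lines that are not in combined format.
        bot = classify_bot(match.group("ua"))
        if bot:
            hits[(bot, match.group("path"))] += 1
    return hits
```

Feed it an open log file, for example `count_ai_hits(open("access.log"))`, and sort the result with `most_common()` to see which pages each bot favors. User-Agent strings can be spoofed, so treat the counts as a starting baseline rather than verified identity.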
How Crawlytics helps you see and respond
Crawlytics is one snippet (a Cloudflare Worker, Vercel middleware, WordPress plugin, Express middleware, or an nginx/Apache log shipper) built around three layers that map cleanly onto the problem Exa's announcement describes.
Detect. The dashboard shows which AI bots hit which pages, when, with per-bot, per-page, and per-day time series and a 14-day projection. It recovers ChatGPT, Claude, and Perplexity referral attribution from the "(direct)" bucket in Google Analytics by injecting per-LLM UTM tags, so the human traffic agents send you stops hiding. This is how you turn the invisible agent-search traffic into something you can watch move. Start with the live bot-traffic dashboard at /demo.
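The general idea behind per-LLM UTM tagging can be sketched briefly. The article does not describe how Crawlytics implements it, so treat the following as an assumption: links in content served to a given assistant's crawler carry a `utm_source` naming that assistant, so a later human click still arrives labeled even when the Referer header is stripped. The function name `add_utm` and the parameter values (`"chatgpt"`, `"ai-referral"`) are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_utm(url: str, source: str, medium: str = "ai-referral") -> str:
    """Return url with utm_source and utm_medium set, keeping other params."""
    parts = urlsplit(url)
    # Drop any existing utm_source/utm_medium so the new values replace them.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ("utm_source", "utm_medium")
    ]
    query += [("utm_source", source), ("utm_medium", medium)]
    # Rebuild the URL; path and fragment are left untouched.
    return urlunsplit(parts._replace(query=urlencode(query)))


tagged = add_utm("https://example.com/pricing?plan=pro#faq", "chatgpt")
```

Here `tagged` is `https://example.com/pricing?plan=pro&utm_source=chatgpt&utm_medium=ai-referral#faq`, which an analytics tool can attribute by source instead of filing it under "(direct)".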
Serve. Crawlytics generates /llms.txt and /llms-full.txt at stable URLs and keeps them current with a daily re-crawl, and it serves AI-Optimized HTML (clean, chrome-free, with JSON-LD) to AI bots by routing on the User-Agent. Instead of hoping an agent parses your JavaScript-heavy page, you hand it content built to be read. If you only do one thing after reading this, make it this one: spin up an index with the llms.txt generator.
Sell. For sites that want agents to transact, not just read, the WebMCP commerce snippet registers agent-callable tools (search, checkout, book, lead-capture) and attributes conversions back to the agent that drove them. That is the far end of the same trend: once agents are searching at scale, some of them will want to act.
Crawlytics is host-agnostic and complements edge tools rather than replacing them. It runs on any stack, and it fills the per-site visibility gap that aggregate industry dashboards cannot. The Visibility tier is $29.99/mo and the Commerce tier is $49.99/mo, with a free scan at /demo.
What to do this quarter
The short version: measure first, then make yourself readable, then decide policy.
- Measure. Run a free scan at /demo and confirm which AI crawlers already reach your site and which pages they read. If Exa's forecast is even directionally right, this number is going up, and you want a baseline now.
- Make yourself readable. Serve real server-rendered HTML on the pages you want cited, and publish an llms.txt so agents have a stable index instead of guessing.
- Decide policy deliberately. Do not blanket-block agent crawlers by reflex. Blocking the fetchers that power AI answers cuts you out of the exact search surface Exa is scaling. Decide per bot, based on data, using a framework like our guide to tracking AI bots.
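For the policy step, per-bot rules are commonly expressed in robots.txt, and it is worth checking that the file says what you think it says before you publish it. The sketch below uses Python's standard `urllib.robotparser` to report what a robots.txt allows for each bot and path. The rules, bot names, and paths are hypothetical examples, not recommendations.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical per-bot policy: one bot limited, one blocked, the rest allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def policy_report(robots_txt: str, bots: list[str], paths: list[str]) -> dict:
    """Map each (bot, path) pair to True if robots_txt allows the fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        (bot, path): parser.can_fetch(bot, path)
        for bot in bots
        for path in paths
    }


report = policy_report(
    ROBOTS_TXT,
    bots=["GPTBot", "CCBot", "ClaudeBot"],
    paths=["/pricing", "/private/report"],
)
```

In this example `GPTBot` may read `/pricing` but not `/private/report`, `CCBot` is blocked everywhere, and `ClaudeBot` falls through to the wildcard rule. robots.txt is advisory, so pair it with the log measurement above to confirm what each crawler actually does.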
The through-line of Exa's raise is simple. A very well-funded company just told the market that machines will soon do most of the searching, and it is spending $250 million to serve them. The sites that treat agent discovery as infrastructure, not an afterthought, are the ones those searches will find, read, and cite.
Frequently asked questions
What is Exa and why does it matter for site owners?
Exa is a search company building what it calls "the search engine for AIs," an index and API that language models and agents query instead of a human typing into Google. It just raised a $250M Series C at a $2.2B valuation, led by a16z. It matters to site owners because every agent search ends in a fetch against real websites, including yours. Exa already powers search for Cursor, Cognition, HubSpot, and more than 400,000 developers, and its own crawler is tracking 500 billion URLs, so the volume of machine reads hitting your pages is climbing fast.
What does "AI agents will search more than humans" mean for my traffic?
Exa's announcement projects that AI agents will run more searches than humans this year, and that within a few years LLM-driven search could reach roughly 1,000 times today's Google query volume. In practice that means a growing share of the requests hitting your server are agents fetching pages to answer or act on someone's behalf, not people browsing. Human pageviews may stay flat or fall while machine reads rise, so the traffic you can see in a normal analytics dashboard tells you less and less about who is actually consuming your content.
Will Exa's crawler and other agent traffic show up in my analytics?
Usually not in the way you would expect. Standard analytics like GA4 are built around human sessions with JavaScript, so many agent fetches never fire the tracking tag and never appear. When an AI assistant does send a referral click, in-app browsers often strip the Referer header, so the visit lands under "(direct)." To see agent and crawler traffic you need to read it from server logs or a User-Agent-aware tool. Crawlytics does this and shows which bots hit which pages, per bot and per day.
What should I do to get my site ready for AI agent search?
Start by measuring: confirm which AI crawlers reach your site and which pages they read, because you cannot fix what you cannot see. Then make your content easy to fetch by serving clean, server-rendered HTML rather than JavaScript-only pages, since most agents do not execute JavaScript. Publish an llms.txt file that indexes your important pages so agents have a predictable place to look. Finally, avoid blanket-blocking agent crawlers, which cuts you out of AI answers. Crawlytics covers the measurement and readability steps with a bot dashboard at /demo, an llms.txt generator, and per-page readiness scoring.
Does an llms.txt file help with AI agent search?
Yes. llms.txt is a Markdown file served as plain text at a stable URL. It lists your most important pages and describes what your site contains, giving AI systems a predictable index instead of forcing them to guess your structure from raw HTML. As agent-driven search volume grows, a current llms.txt lowers the cost for a crawler to understand and cite you. Crawlytics generates llms.txt and llms-full.txt from your sitemap and keeps them fresh with a daily re-crawl, so you do not maintain the file by hand.
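To make the format concrete, here is a minimal sketch of building such a file by hand. It follows the layout described in the public llms.txt proposal (an H1 title, a blockquote summary, then H2 sections of Markdown links) and is not Crawlytics's generator. The `Page` type, the `build_llms_txt` function, and the example site are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    note: str = ""  # Optional one-line description of the page.


def build_llms_txt(site_name: str, summary: str, sections: dict) -> str:
    """Render an llms.txt document: H1, blockquote summary, H2 link lists."""
    lines = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            entry = f"- [{page.title}]({page.url})"
            if page.note:
                entry += f": {page.note}"
            lines.append(entry)
        lines.append("")
    # End the file with exactly one trailing newline.
    return "\n".join(lines).rstrip() + "\n"


example = build_llms_txt(
    "Example Co",
    "Docs and pricing for Example Co.",
    {
        "Docs": [
            Page("Quickstart", "https://example.com/docs/quickstart",
                 "Install and first request"),
            Page("API", "https://example.com/docs/api"),
        ]
    },
)
```

Writing `example` to `/llms.txt` at your site root gives agents one stable place to start. Regenerating it whenever your sitemap changes keeps the index current.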
Written by Crawlytics Team. Crawlytics tracks AI bots, generates llms.txt, and powers WebMCP commerce, all from one snippet on any stack.