Understanding and controlling AI crawler activity on your website

As AI crawlers transform the business of content creation, how will your organization respond?

Partner Content

Generative AI has upended a foundational internet economic model, and many digital businesses haven’t caught up.

Historically, web content creators had little problem being repeatedly indexed and crawled by search engines and other digital platforms. Doing so meant more traffic, which content creators monetized via advertising.

This model was never perfect, with social media companies attracting ire for hosting third-party content on their own platforms rather than sending traffic to the content’s creator. But generative AI significantly exacerbates the problem. According to Cloudflare network data, many leading AI crawlers scrape content hundreds or thousands of times for every referral they send in return:

Crawler       Crawl-to-refer ratio
Google        5.4:1
Perplexity    181:1
OpenAI        1100:1
Anthropic     42000:1
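To make the metric concrete, a crawl-to-refer ratio like those above can be derived from two traffic counts: crawl requests received from a given bot, and referral visits that bot's service sends back. The sketch below uses hypothetical counts, not Cloudflare data:

```python
# Sketch: deriving a crawl-to-refer ratio from traffic logs.
# The counts in the example are hypothetical, not Cloudflare data.

def crawl_to_refer_ratio(crawls: int, referrals: int) -> str:
    """Return a ratio string such as '5.4:1' (crawl requests per referral visit)."""
    if referrals == 0:
        return f"{crawls}:0"  # the crawler sends no traffic back at all
    ratio = crawls / referrals
    # One decimal place for small ratios, whole numbers otherwise
    return f"{ratio:.1f}:1" if ratio < 10 else f"{round(ratio)}:1"

# Hypothetical example: 54,000 crawl requests vs. 10,000 referral visits
print(crawl_to_refer_ratio(54_000, 10_000))  # -> 5.4:1
```

The higher the ratio, the more content a crawler consumes per unit of traffic it returns.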

How AI crawling affects different business models

How do AI crawlers affect content-creating businesses like media, publishing, and online communities? Google Search, a major traffic generator, offers an example. In mid-2024, Google began introducing generative AI into search results. In the year after that launch, search clickthrough rates dropped by an estimated 30%, according to BrightEdge research.

That finding does not even factor in the estimated 27% of users who now use AI tools like ChatGPT and Claude in lieu of search engines. Together, these losses in traffic and ad views could consume much of the average media organization’s 32% profit margin.

What’s more, AI crawling’s effects stretch beyond media and publishing. Organizations across many sectors need better visibility into and control over crawler activity:

  • Research and consulting: These businesses may depend on subscription paywalls to fund proprietary research, and one study found that roughly 50% of crawls by certain generative AI services were able to access paywalled content.
  • Retail and travel/hospitality: Large-scale AI crawling can hurt website performance and skew marketing analytics.
  • Financial services: These businesses may not want AI crawlers to produce misinformation as a result of scraping time-sensitive data or information that’s subject to regulatory control.
  • B2B: Low referral rates from AI crawlers can hurt SEO- and content-driven awareness generation and lead acquisition.
  • Public sector: Governmental organizations that act as a primary source of official, factual information may want to prevent public content from being misrepresented in generative AI summaries. In addition, some may also want to keep crawlers from accessing sensitive information.

Establishing an AI crawler strategy

The range of impacts AI crawlers can have on different businesses means no single response will work in every case. That said, organizations that want to better manage AI crawling on their websites should consider the following high-level steps:

  • Get visibility into crawler activity. While this may sound obvious, many organizations are surprised at the degree to which certain AI services are crawling their sites. In some cases, crawling may come from an AI service the organization is unfamiliar with. In others, a crawler may simply ignore site access guidelines like the robots.txt file.
  • Determine crawler access preferences across different content types. The more specificity, the better. For example, an organization might decide to block AI crawlers from pages where original content is monetized through ads or lead capture forms, while allowing those crawlers to access technical documentation for developers.
  • Apply block/allow rules with an application security service you trust. Basic blocking and allowing is a good start, but how confident are you in the service’s ability to detect a range of AI crawlers? And how easy is it to create new rules at scale, and to adapt them as the AI landscape changes?
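The second step above — granting different crawlers different access per content type — is often first expressed in robots.txt. As a hypothetical sketch (the paths and the policy below are invented for illustration; GPTBot and ClaudeBot are the published user-agent tokens for OpenAI's and Anthropic's crawlers), the example uses Python's standard `urllib.robotparser` to check which URLs a policy permits:

```python
# Sketch: a per-crawler, per-path access policy expressed as robots.txt,
# checked with Python's standard urllib.robotparser.
# The paths and policy are hypothetical; GPTBot (OpenAI) and ClaudeBot
# (Anthropic) are published crawler user-agent tokens.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /articles/
Allow: /docs/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Ad-monetized articles are off limits; developer docs remain crawlable
print(parser.can_fetch("GPTBot", "/articles/some-story"))  # False
print(parser.can_fetch("GPTBot", "/docs/api-reference"))   # True
```

Keep in mind that robots.txt is advisory: as noted above, some crawlers simply ignore it, so preferences like these ultimately need to be enforced by an application security service that can identify and block the crawlers themselves.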

Cloudflare’s AI Audit provides better control over AI crawling

Cloudflare’s AI Audit gives organizations the visibility they need to make informed decisions on AI crawling, and the granular controls to enforce those decisions. It shows how often crawlers from various AI companies are accessing specific webpages, and lets you block or allow them as you prefer.

AI Audit is a feature of Cloudflare’s connectivity cloud, a unified platform of security, connectivity, and developer services that sits in front of 20% of all web properties, including 80% of the top generative AI companies. This intelligence helps AI Audit detect crawlers that hide or don’t advertise their purpose — and also helps our other security services detect and block other malicious bots. Learn more about AI Audit here, and request a live demo here.

Contributed by Cloudflare.
