All Cloudflare customers get a one-click AI scraping blocker

  • Cloudflare has launched a new feature that blocks AI crawlers and scrapers from accessing a website’s content.
  • The feature is available to all Cloudflare customers, including those on the free tier.
  • The company says it will update this feature over time.

Despite what Microsoft AI’s CEO may believe, publishing something to the internet doesn’t give AI firms license to hoover it up for training purposes.

As the fight between AI companies and creators rages on, Cloudflare has come to the aid of customers who’d rather stop AI bots from scraping their websites altogether.

“The popularity of generative AI has made the demand for content used to train models or run inference on skyrocket, and, although some AI companies clearly identify their web scraping bots, not all AI companies are being transparent,” says Cloudflare, citing incidents such as the tussle between OpenAI and Scarlett Johansson as examples.

Now Cloudflare customers, including those on the free tier, can block AI scrapers and crawlers from accessing their content. Simply head to Security, then click Bots, and toggle the AI Scrapers and Crawlers switch to on.

“This feature will automatically be updated over time as we see new fingerprints of offending bots we identify as widely scraping the web for model training. To ensure we have a comprehensive understanding of all AI crawler activity, we surveyed traffic across our network,” Cloudflare notes in a blog post.

The company says it has trained its machine learning model to identify scrapers even when they spoof their identity to look like a real user. Each “user” is assigned a score based on its activity, and the lower the score, the more likely it is to be a bot.
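
To make the scoring idea concrete, here is a minimal Python sketch of how a site might act on such a score. It is an illustration only, not Cloudflare's implementation: the `Request` class, the `bot_score` field, the 1 to 99 scale and the cutoff of 30 are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical cutoff: scores below this are treated as likely automated.
LIKELY_BOT_THRESHOLD = 30


@dataclass
class Request:
    path: str
    # Assumed scale: 1 (almost certainly a bot) to 99 (almost certainly human).
    bot_score: int


def is_likely_bot(request: Request, threshold: int = LIKELY_BOT_THRESHOLD) -> bool:
    """Return True when the request's score falls below the cutoff."""
    return request.bot_score < threshold


# Two illustrative requests to the same page with very different scores.
requests = [Request("/article", 5), Request("/article", 87)]
blocked = [r for r in requests if is_likely_bot(r)]
```

In this sketch only the first request lands in `blocked`. Where the cutoff should sit is a judgment call: set it too high and real readers get turned away, too low and more scrapers slip through.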

And bots are scraping websites for content, so don’t fool yourself into thinking yours is safe. While bots do tend to target larger sites, there is data across the entirety of the internet, and it’s a gold rush for AI companies.

“Among the top AI bots that we see, Bytespider not only leads in terms of number of requests but also in both the extent of its Internet property crawling and the frequency with which it is blocked. Following closely is GPTBot, which ranks second in both crawling and being blocked. GPTBot, managed by OpenAI, collects training data for its LLMs, which underpin AI-driven products such as ChatGPT,” Cloudflare shares.
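
For bots that do identify themselves, spotting them can be as simple as checking the User-Agent header. The Python sketch below assumes the two crawlers named above announce themselves with the tokens "Bytespider" and "GPTBot"; the function name and the short token list are illustrative, and a check like this does nothing against scrapers that spoof a browser User-Agent.

```python
# Crawler names taken from the article; a real deny list would be longer.
AI_CRAWLER_TOKENS = ("bytespider", "gptbot")


def is_declared_ai_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a known AI crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in AI_CRAWLER_TOKENS)


# A self-identifying crawler versus an ordinary browser string.
crawler_hit = is_declared_ai_crawler("Mozilla/5.0 (compatible; GPTBot/1.0)")
browser_hit = is_declared_ai_crawler("Mozilla/5.0 (Windows NT 10.0) Firefox/126.0")
```

Here `crawler_hit` is true and `browser_hit` is false, which is exactly why a User-Agent check alone falls short once a scraper starts pretending to be a regular visitor.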

The company has also set up a reporting page where users can report suspected scrapers and crawlers that are accessing their content without permission.

We highly recommend taking advantage of this feature if you happen to be a Cloudflare customer, especially if you’re publishing original content to the web.

