
Publishers can stop Bard from scraping their sites

Robin-Leigh Chetty

29th September 2023

Google has announced a new control for its Bard and Vertex AI generative AIs.
The control is set in a site's robots.txt file, allowing publishers to stop the AIs from trawling or scraping their sites.
The control arrives as publishers have been raising concerns with Google about what has access to their content.

We are living in the era of generative AI, and the race to iterate, improve, enhance, and upgrade is on among the many technology companies that have some sort of AI platform.

For publishers, this presents a potential issue, as the content they publish online is ripe for trawling and scraping by AIs with virtually unfettered access. That is why Google recently confirmed that more controls would be made available to publishers.

In a Google blog post, the company's VP for Trust, Danielle Romain, unpacked the new control, which is called Google-Extended and is made available via the robots.txt file.

Romain explains that Google-Extended gives publishers “a new control that web publishers can use to manage whether their sites help improve Bard and Vertex AI generative APIs, including future generations of models that power those products. By using Google-Extended to control access to content on a site, a website administrator can choose whether to help these AI models become more accurate and capable over time.”

“As AI applications expand, web publishers will face the increasing complexity of managing different uses at scale. That’s why we’re committed to engaging with the web and AI communities to explore additional machine-readable approaches to choice and control for web publishers,” Romain added.
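In practice, a control like this is expressed as a rule group in a site's robots.txt file that names Google-Extended as the user agent. The sketch below is illustrative only: the robots.txt contents, the example.com URL and the is_allowed helper are assumptions made for this example, not taken from Google's post. It uses Python's standard urllib.robotparser module to show how such a rule would block Google-Extended while leaving other crawlers unaffected. Python's matching logic approximates, but may not exactly mirror, how Google interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt the whole site out for Google-Extended,
# while leaving every other crawler free to fetch pages.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/news/story"  # placeholder URL
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, "allowed:", is_allowed(ROBOTS_TXT, agent, page))
```

Because the rule only names Google-Extended, a publisher could adopt it without changing how the rest of the file treats other crawlers.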

With generative AIs becoming increasingly important to the way people access information online, publishers may be forced to acquiesce and give them permission to scrape their sites.

As such, AI training and its importance in serving up search results may prove the determining factor in whether publishers choose to opt out of scraping and the like.

