Perplexity accused of scraping websites that explicitly blocked AI scraping

AI startup Perplexity is crawling and scraping content from websites that have explicitly indicated they do not want to be scraped, according to internet infrastructure provider Cloudflare.

On Monday, Cloudflare published research saying that it observed the AI startup ignoring blocks and hiding its crawling and scraping activities. The network infrastructure giant accused Perplexity of obscuring its identity when trying to scrape web pages “in an attempt to bypass the site’s preferences,” Cloudflare’s researchers wrote.

AI products such as those offered by Perplexity rely on large quantities of data from the internet, and AI startups have long scraped text, images, and videos from the web, often without permission, to make their products work. Recently, websites have tried to fight back by using the web-standard robots.txt file, which tells search engines and AI companies which pages they may crawl and which they should not. These efforts have seen mixed results so far.
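As a rough illustration of how a robots.txt rule is meant to work, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The rules, the `example.com` URLs, and the `SomeOtherBot` name are hypothetical, and `PerplexityBot` is used as the crawler token only as an assumption for the example. The file is advisory: it has an effect only if the crawler chooses to consult it and obey the answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler everywhere,
# and keep every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved crawler asks before fetching each URL.
blocked = parser.can_fetch("PerplexityBot", "https://example.com/articles/1")
allowed = parser.can_fetch("SomeOtherBot", "https://example.com/articles/1")
private = parser.can_fetch("SomeOtherBot", "https://example.com/private/x")

print(blocked, allowed, private)
```

Nothing in this check is enforced by the server. A crawler that skips the `can_fetch` step, or asks under a different name, is not stopped by the file itself.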

Perplexity appears to be willingly circumventing these blocks by changing its bots’ “user agent,” a signal that identifies a website visitor by their device and version type, as well as by changing its autonomous system networks, or ASNs, essentially numbers that identify large networks on the internet.
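The weakness of blocking by user agent alone can be sketched in a few lines of Python. This is an illustrative toy, not Cloudflare's detection method: the `Request` class, the blocklist tokens, both user-agent strings, and the ASN values (taken from the range reserved for documentation) are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical blocklist of declared crawler tokens (lowercase).
BLOCKED_AGENT_TOKENS = ("perplexitybot", "perplexity-user")


@dataclass(frozen=True)
class Request:
    user_agent: str  # self-reported by the client, so trivially changed
    asn: int         # network the request came from; also changeable by the sender


def is_blocked_by_user_agent(request: Request) -> bool:
    """Naive rule: block only when the client announces a listed crawler name."""
    ua = request.user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


# Illustrative requests: one declares itself, one presents a generic
# Chrome-on-macOS string from a different (documentation-range) ASN.
declared = Request("Mozilla/5.0 (compatible; PerplexityBot/1.0)", asn=64500)
disguised = Request(
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    asn=64501,
)

print(is_blocked_by_user_agent(declared))   # the declared crawler is caught
print(is_blocked_by_user_agent(disguised))  # the disguised request passes
```

Because both the user agent and the originating network can be changed by the sender, a rule like this catches only crawlers that identify themselves honestly.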

“This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of machine learning and network signals,” read Cloudflare’s post.

Perplexity spokesperson Jesse Dwyer dismissed Cloudflare’s blog post as a “sales pitch,” adding in an email to TechCrunch that the screenshots in the post “show that no content was accessed.” In a follow-up email, Dwyer claimed the bot named in the Cloudflare blog “is not even ours.”

Cloudflare said it first noticed the behavior after its customers complained that Perplexity was crawling and scraping their sites, even after they had added rules to their robots.txt file and specifically blocked Perplexity’s known bots. Cloudflare said it then ran tests and confirmed that Perplexity was circumventing these blocks.

“We noticed that Perplexity uses not only its declared user-agent, but also a generic browser intended to impersonate Google Chrome on macOS when its declared crawler was blocked,” according to Cloudflare.

The company also said it has removed Perplexity’s bots from its verified list and added new techniques to block them.

Cloudflare has recently taken a public stance against AI crawlers. Last month, Cloudflare announced the launch of a marketplace that allows website owners and publishers to charge AI scrapers that visit their sites. Cloudflare CEO Matthew Prince sounded the alarm at the time, saying that AI is breaking the business model of the internet, particularly that of publishers. Last year, Cloudflare also launched a free tool to prevent bots from scraping websites to train AI.

This is not the first time Perplexity has been accused of scraping without permission.

Last year, news outlets such as Wired alleged that Perplexity was plagiarizing their content. Weeks later, Perplexity CEO Aravind Srinivas was unable to answer immediately when asked to provide the company’s definition of plagiarism during an interview with TechCrunch’s Devin Coldewey at the Disrupt 2024 conference.
