The Wikimedia Foundation, the umbrella organization of Wikipedia and roughly a dozen other crowdsourced knowledge projects, said on Wednesday that bandwidth consumption for multimedia downloads from Wikimedia Commons has increased by 50% since January 2024.
The reason, the organization wrote in a blog post on Tuesday, is not growing demand from knowledge-thirsty humans, but automated, data-hungry scrapers looking to train AI models.
“Our infrastructure is built to sustain sudden traffic spikes from humans during high-interest events, but the amount of traffic generated by scraper bots is unprecedented and presents growing risks and costs,” the post said.
Wikimedia Commons is a freely accessible repository of images, videos, and audio files that are available under open licenses or are otherwise in the public domain.
Wikimedia says that almost two-thirds (65%) of its most “expensive” traffic (that is, the most resource-intensive in terms of the kind of content consumed) comes from bots, even though bots account for only 35% of total pageviews. The reason for this disparity, according to Wikimedia, is that frequently accessed content stays closer to the user in its caches, while less frequently requested content is stored farther away in the “core data center,” which is more expensive to serve content from. And that is exactly the kind of content bots typically go looking for.
“While human readers tend to focus on specific – often similar – topics, crawler bots tend to ‘bulk read’ larger numbers of pages and also visit the less popular ones,” Wikimedia writes. “This means these types of requests are more likely to get forwarded to the core data center, which makes it much more expensive in terms of consumption of our resources.”
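To make the economics concrete, here is a minimal Python sketch, not Wikimedia’s actual infrastructure, of an edge cache sitting in front of a pricier core data center. The cost figures and page names are illustrative assumptions.

```python
# A minimal sketch (not Wikimedia's real stack): an edge cache in front of a
# more expensive "core data center". All costs are illustrative.
CACHE_COST, CORE_COST = 1, 20

def total_cost(requests):
    """Sum the relative cost of serving a sequence of page requests,
    caching each page after its first (expensive) fetch from the core."""
    cache = set()
    cost = 0
    for page in requests:
        if page in cache:
            cost += CACHE_COST    # frequently requested: served from cache
        else:
            cost += CORE_COST     # first request: fetched from the core
            cache.add(page)
    return cost

# Humans concentrate on a few popular pages; a crawler sweeps the long tail
# once, so nearly every one of its requests pays the core-fetch cost.
humans = ["Main_Page"] * 90 + ["Popular_Article"] * 10
crawler = [f"Obscure_File_{i}" for i in range(100)]
print(total_cost(humans))   # 2 core fetches + 98 cache hits = 138
print(total_cost(crawler))  # 100 core fetches               = 2000
```

Because human traffic concentrates on a handful of popular pages, most human requests hit the cache; a crawler sweeping the long tail misses it almost every time.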
The long and short of all this is that the Wikimedia Foundation’s site reliability team has to spend a lot of time and resources blocking crawlers to avoid disruption for regular users. And that is before we even consider the cloud costs facing the Foundation.
In truth, this is part of a fast-growing trend that threatens the very existence of the open internet. Last month, software engineer and open source advocate Drew DeVault bemoaned the fact that AI crawlers ignore the “robots.txt” files designed to ward off automated traffic. And “pragmatic engineer” Gergely Orosz also complained last week that AI scrapers from companies such as Meta have driven up bandwidth demands for his own projects.
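For context on what those files do, here is a short sketch of how a well-behaved crawler would honor robots.txt using Python’s standard urllib.robotparser; the rules and bot name below are hypothetical. DeVault’s complaint is that many AI crawlers never perform this check at all.

```python
# A minimal sketch of a crawler honoring robots.txt via Python's standard
# library. The file contents and bot name are hypothetical examples.
from urllib import robotparser

robots_txt = """\
User-agent: *
Disallow: /w/
Crawl-delay: 10
"""

parser = robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant bot asks before fetching; a non-compliant one never calls this.
print(parser.can_fetch("ExampleBot", "https://example.org/w/index.php"))  # False
print(parser.can_fetch("ExampleBot", "https://example.org/wiki/Page"))    # True
```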
While open source infrastructure in particular is in the firing line, developers are fighting back with “cleverness and vengeance,” as TechCrunch wrote last week. Some tech companies are doing their part to address the issue, too. Cloudflare, for example, recently launched AI Labyrinth, which uses AI-generated content to slow crawlers down.
However, it is very much a cat-and-mouse game, one that could ultimately force many publishers to duck behind logins and paywalls, to the detriment of everyone who uses the web today.