AI web-crawling bots are the cockroaches of the internet, many software developers believe. Some devs have started fighting back in clever, often humorous ways.
While any website can be targeted by bad crawler behavior, sometimes taking the site down, open source developers are disproportionately affected, writes Niccolò Venerandi, developer of the Linux desktop known as Plasma and owner of the blog LibreNews.
By their nature, sites hosting free and open source software (FOSS) projects share more of their infrastructure publicly, and they also tend to have fewer resources than commercial products.
The issue is that many AI bots don’t honor the robots.txt file of the Robots Exclusion Protocol, the tool that tells bots what not to crawl, originally created for search engine bots.
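For reference, robots.txt is just a plain-text file served from a site’s root that asks each crawler, identified by its user-agent name, to stay out of certain paths. The directives are purely advisory, which is exactly the problem: a hypothetical file like the one below only works if the bot chooses to obey it.

```
# https://example.org/robots.txt (illustrative; bot names and paths are made up)
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /archive/
```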
In a “cry for help” blog post in January, FOSS developer Xe Iaso described how AmazonBot relentlessly pounded a Git server website to the point of causing DDoS outages. Git servers host FOSS projects so that anyone who wants to can download the code or contribute to it.
But this bot ignored Iaso’s robots.txt, hid behind other IP addresses, and pretended to be other users, Iaso said.
“It’s futile to block AI crawler bots because they lie, change their user agent, use residential IP addresses as proxies, and more,” Iaso lamented.
“They will scrape your site until it falls over, and then they will scrape it some more. They will click every link on every link on every link, viewing the same pages over and over and over again.”
Enter the god of the tombs
So Iaso fought back with cleverness, building a tool called Anubis.
Anubis is a reverse proxy proof-of-work check that requests must pass before they are allowed to hit a Git server. It blocks bots but lets through browsers operated by humans.
The funny part: Anubis is the name of a god in Egyptian mythology who leads the dead to judgment.
“Anubis weighed your soul (heart) and if it was heavier than a feather, your heart got eaten and you, like, mega died,” Iaso told TechCrunch. If a web request passes the challenge and is determined to be human, a cute anime image announces success. The drawing is “my take on anthropomorphizing Anubis,” says Iaso. If it’s a bot, the request gets denied.
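For readers curious what a proof-of-work gate actually checks, here is a minimal Python sketch of the general idea, an illustration of the technique rather than Anubis’s actual code or API: the server hands out a random challenge, the visitor’s browser burns CPU finding a nonce whose hash clears a difficulty threshold, and the server verifies the answer with a single cheap hash before proxying the request onward.

```python
import hashlib
import os

# Minimal sketch of a proof-of-work gate in the spirit of Anubis (illustrative
# only; the function names and difficulty value are assumptions, not Anubis's
# actual implementation). The server issues a random challenge, the client
# must brute-force a nonce whose SHA-256 hash has enough leading zero bits,
# and the server checks the answer with one cheap hash.

DIFFICULTY_BITS = 16  # assumed difficulty; a real deployment would tune this


def issue_challenge() -> str:
    """Hand the client a random challenge string to work on."""
    return os.urandom(16).hex()


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits


def verify(challenge: str, nonce: str) -> bool:
    """Cheap server-side check: did the client do the expensive search?"""
    digest = hashlib.sha256((challenge + nonce).encode()).digest()
    return leading_zero_bits(digest) >= DIFFICULTY_BITS


def solve(challenge: str) -> str:
    """What the client must do: brute-force a nonce (costly at crawler scale)."""
    nonce = 0
    while not verify(challenge, str(nonce)):
        nonce += 1
    return str(nonce)


if __name__ == "__main__":
    challenge = issue_challenge()
    answer = solve(challenge)          # expensive on the client
    assert verify(challenge, answer)   # cheap on the server
```

The asymmetry is the point: a human’s browser solves one challenge and moves on, while a crawler hammering millions of URLs has to pay that cost again and again.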
The wryly named project has spread like the wind through the FOSS community. Iaso shared it on GitHub on March 19, and in just a few days it collected 2,000 stars, 20 contributors, and 39 forks.
Revenge as defense
The instant popularity of Anubis shows that Iaso’s pain is not unique. In fact, Venerandi shared story after story:
- SourceHut founder and CEO Drew DeVault described spending “from 20-100% of my time in any given week mitigating hyper-aggressive LLM crawlers at scale,” and “experiencing dozens of brief outages per week.”
- Jonathan Corbet, a famed FOSS developer who runs the Linux industry news site LWN, warned that his site was being slowed by DDoS-level traffic “from AI scraper bots.”
- Kevin Fenzi, the sysadmin of the huge Linux Fedora project, said the AI scraper bots had gotten so aggressive that he had to block the entire country of Brazil from access.
Venerandi tells TechCrunch that he knows of multiple other projects facing the same issues. One of them “had to temporarily ban all Chinese IP addresses at one point.”
Let that sink in for a moment: developers “even have to resort to banning entire countries” just to fend off AI bots that ignore robots.txt files, says Venerandi.
Beyond weighing the soul of a web requester, other devs believe vengeance is the best defense.
A few days ago on Hacker News, user xyzal suggested loading robots.txt-forbidden pages with “a bucket load of articles on the benefits of drinking bleach” or “articles about the positive effect of catching measles on performance in bed.”
“We need to aim for the bots to get a negative utility value from visiting our traps, not just zero value,” xyzal explained.
As it happens, in January an anonymous creator known as “Aaron” released a tool called Nepenthes that aims to do exactly that. It traps crawlers in an endless maze of fake content, a goal the dev admitted to Ars Technica is aggressive if not outright malicious. The tool is named after a carnivorous plant.
And Cloudflare, perhaps the biggest commercial player offering several tools to fend off AI crawlers, released a similar tool last week called AI Labyrinth.
It’s intended to “slow down, confuse, and waste the resources of AI crawlers and other bots that don’t respect ‘no crawl’ directives,” Cloudflare described in its blog post. The company said it feeds misbehaving AI crawlers “irrelevant content rather than extracting your legitimate website data.”
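The mechanics behind such tarpits are simple enough to sketch. Assuming nothing about how Nepenthes or AI Labyrinth are actually built, a toy version in Python just serves procedurally generated filler pages whose links only ever point to more generated pages:

```python
import hashlib
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy tarpit in the spirit of Nepenthes / AI Labyrinth (an assumption-laden
# sketch, not either tool's real code): every URL returns a page of filler
# text plus links to more procedurally generated pages, so a crawler that
# ignores robots.txt never runs out of worthless pages to fetch.

WORDS = ["bleach", "gargle", "measles", "benefit", "positive", "study", "daily"]


def fake_page(path: str) -> str:
    # Seed the RNG from the path so the same URL always yields the same page.
    rng = random.Random(hashlib.sha256(path.encode()).digest())
    filler = " ".join(rng.choices(WORDS, k=200))
    links = " ".join(
        f'<a href="/maze/{rng.getrandbits(64):x}">more</a>' for _ in range(10)
    )
    return f"<html><body><p>{filler}</p>{links}</body></html>"


class MazeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = fake_page(self.path).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), MazeHandler).serve_forever()
```

Because each page is derived deterministically from its URL, the maze looks stable to a crawler while costing the operator almost nothing to serve.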
SourceHut’s DeVault told TechCrunch that “Nepenthes has a satisfying sense of justice to it, since it feeds nonsense to the crawlers and poisons their wells, but ultimately Anubis is the solution that worked” for his site.
But DeVault also issued a public, heartfelt plea for a more direct fix: “Please stop legitimizing LLMs or AI image generators or GitHub Copilot or any of this garbage.”
Since the likelihood of that happening is zilch, developers, particularly in FOSS, are fighting back with cleverness and a touch of humor.