After announcing a framework for an open AI ecosystem earlier this year, the nonprofit Creative Commons has weighed in on “pay-to-crawl” technology: systems that charge for automated access to website content by machines such as AI web crawlers.
Creative Commons (CC) is best known for spearheading the licensing movement that allows creators to share their works while maintaining copyright. In July, the organization announced a plan to provide a legal and technical framework for data sharing between the companies that control data and the AI providers that want to train on it.
Now, the nonprofit is tentatively backing pay-to-crawl systems, describing itself as “cautiously supportive.”
“Applied responsibly, pay-to-crawl could represent a way for websites to preserve the creation and sharing of their content and manage alternative uses, keeping content publicly accessible where it otherwise could not be shared or would disappear behind even more restrictive paywalls,” a CC blog post said.
Spearheaded by companies like Cloudflare, the idea behind pay-to-crawl is to charge AI bots each time they crawl a website to collect its content for model training and updates.
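In practice, a pay-to-crawl gate can be built on the long-reserved HTTP 402 Payment Required status code. The sketch below is purely illustrative: the bot names, the `X-Payment-Token` header, and the pricing are hypothetical, not taken from any vendor's actual scheme.

```python
# Illustrative pay-to-crawl gate. Bot names, the payment header, and the
# price are hypothetical; real systems define their own schemes.

KNOWN_AI_CRAWLERS = {"ExampleAIBot", "OtherAIBot"}  # hypothetical bot names
PRICE_PER_CRAWL_USD = 0.01  # hypothetical flat per-request price

def handle_request(user_agent: str, headers: dict) -> tuple[int, str]:
    """Return an (HTTP status, body) pair for an incoming request."""
    if user_agent not in KNOWN_AI_CRAWLERS:
        # Ordinary visitors (and search crawlers not on the list) pass through.
        return 200, "<html>page content</html>"
    if headers.get("X-Payment-Token"):
        # A crawler presenting proof of payment gets the content.
        return 200, "<html>page content</html>"
    # 402 Payment Required: a real HTTP status code, reserved since HTTP/1.1.
    return 402, f"Payment of ${PRICE_PER_CRAWL_USD} required to crawl this page"

print(handle_request("ExampleAIBot", {}))  # unpaid AI crawler is refused
print(handle_request("ExampleAIBot", {"X-Payment-Token": "abc123"}))  # paid crawler
print(handle_request("Mozilla/5.0", {}))  # regular browser traffic
```

The key design point is that only identified AI crawlers are gated; human visitors and other traffic see the page as before.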
In the past, websites allowed web crawlers to index their content for inclusion in search engines like Google. They benefited from this arrangement by seeing their sites listed in search results, which drove visitors and clicks. With AI technology, however, the dynamic has changed. After a consumer receives their answer through an AI chatbot, they are unlikely to click through to the source.
This shift has already been devastating for publishers, killing search traffic, and shows no signs of letting up.
A pay-to-crawl system, on the other hand, could help publishers recover from the hit AI has dealt to their bottom lines. Additionally, it could work better for smaller web publishers who don’t have the traction to negotiate one-off content deals with AI providers. Major agreements have already been struck between OpenAI and publishers such as Condé Nast and Axel Springer, as well as between Perplexity and Gannett, Amazon and The New York Times, and Meta and various media publishers, among others.
CC offered several caveats to its support for pay-to-crawl, noting that such systems could concentrate power on the web. They could also block access to content for “researchers, non-profit organisations, cultural heritage institutions, educators and other bodies working in the public interest.”
The organization proposed a number of principles for responsible pay-to-crawl, including not making pay-to-crawl the default setting for all websites and avoiding blanket rules for the web. In addition, it said pay-to-crawl systems should allow throttling, not just blocking, and should preserve public-interest access. They should also be open, interoperable, and built on standardized components.
Cloudflare isn’t the only company investing in the pay-to-crawl space.
Microsoft is also building an AI content marketplace for publishers, and smaller startups like ProRata.ai and TollBit have entered the space as well. Another group, the RSL Collective, announced its own specification for a new standard called Really Simple Licensing (RSL) that would dictate what parts of a website crawlers could access, but stop short of actually blocking crawlers. Cloudflare, Akamai, and Fastly have since adopted RSL, which is supported by Yahoo, Ziff Davis, O’Reilly Media and others.
CC was among those announcing support for RSL, alongside CC Signals, its broader work to develop technology and tools for the AI era.
