On Thursday, Reddit unveiled a new policy aimed at balancing its desire to license its content to large tech companies such as Google with the protection of user privacy. The newly announced "Public Content Policy" joins Reddit's existing privacy policy and content policy in guiding how commercial entities and other partners access and use Reddit data. Alongside it, the company announced a subreddit dedicated to researchers working with Reddit data.
The announcement comes shortly after Reddit's public debut, as the company positions itself to grow revenue not only from ads served on its platform and API use by developers, but also from its data. The company said in its IPO prospectus that it had already signed data licensing deals worth $203 million and expects that number to grow over time.
Reddit historically did not block access to its data for AI training purposes, but it changed course last year. Reddit CEO Steve Huffman told the New York Times that it didn't make sense for Reddit to continue giving "all this value to some of the biggest companies in the world for free," signaling the company's plan to move into the data licensing space.
With these efforts already underway, the new Public Content Policy will further restrict access to Reddit data for anyone without an agreement.
"Unfortunately, we are seeing more and more commercial entities using unauthorized access or abusing authorized access to bulk harvest public data, including public Reddit content," Reddit writes in a blog post. "Worse, these entities perceive that they have no restrictions on their use of this data and do so without regard for users' rights or privacy, ignoring reasonable legal, security and user removal requests. While we will continue our efforts to block known bad actors, we need to do more to limit access to public Reddit content at scale to trusted actors who have agreed to abide by our policies. But we must also continue to ensure that users, mods, researchers and other bona fide, non-commercial actors have access."
In other words, access to Reddit data for research and other non-commercial endeavors will continue, but entities that want to use Reddit data for other purposes — including AI training — will have to pay. In a graphic shared on the blog, Reddit makes this clear, saying that businesses interested in using Reddit data to “power, augment, or improve your product for any commercial purpose” require a contract.
Advertisers, meanwhile, are directed to an ad API to manage campaigns and track their performance.
Because Reddit is essentially one big website that search engines can index, its content is easy to collect in bulk; the new policy aims to protect that content from unauthorized collection while respecting users' rights.
For example, Reddit says its partners should uphold users' decisions to delete their content. So if users don't want their personal posts to become fodder for future AI engines, they should be able to opt out. The new policy also prohibits partners from using Reddit content to identify individuals or their personal information, including for ad targeting. Nor can partners use Reddit content to spam or harass its users, or to conduct "background checks, facial recognition, government surveillance, or assist law enforcement in doing any of the above."
The policy further restricts access to adult media and specifies that Reddit will not sell the personal information of its users. The company also notes that it will never license non-public content such as private messages or non-public account information such as user emails or browsing history, among others.
To help researchers who want to use Reddit data for non-commercial purposes, the company created a new subreddit, r/reddit4researchers. The company says it is also working with OpenMined to develop a program that mentors researchers and supports their collaboration with Reddit.