OpenAI has reached an agreement with Reddit to use the social news site’s data to train AI models.
In a post on OpenAI’s press relations site, the company said the Reddit partnership would give it access to “real, structured and unique content” (e.g. posts and replies) from Reddit, allowing its tools and models to “better understand and present” content. Reddit content will be embedded in ChatGPT, OpenAI’s popular conversational AI, and the companies will work together to bring unspecified new “artificial intelligence features” to both Reddit users and moderators.
OpenAI will also become an advertising partner of Reddit.
“Reddit will build on OpenAI’s AI modeling platform to bring its powerful vision to life,” OpenAI wrote in the post. “Using LLM, ML and AI allows Reddit to improve the user experience for everyone.”
OpenAI has several similar licensing agreements with content providers ranging from media libraries to news publishers. The unusual angle here is that Sam Altman, CEO of OpenAI, holds an 8.7% stake in Reddit, making him the third-largest shareholder, and was once a member of the company’s board of directors.
In an apparent effort to head off scrutiny, OpenAI says in its press release that while Altman remains a Reddit shareholder, the partnership was “led by OpenAI’s COO [Brad Lightcap]” and “approved by [OpenAI’s] independent board of directors.” (I’ll note here that Altman is on OpenAI’s board; he recused himself from that decision, however, an OpenAI spokesperson tells TechCrunch.)
Reddit has made data licensing deals an increasingly central part of its growth strategy as it navigates the market as a public company.
In its IPO prospectus, Reddit disclosed that it has contractual agreements to license its data to customers, including Google, totaling more than $200 million. And in its first earnings report as a public company, Reddit reported a 450% year-over-year increase in non-advertising revenue, largely attributable to those deals.
Reddit stock jumped 11% in extended trading after the OpenAI deal was announced.
“The paradox I see is that as more content on the Internet is written by machines, there’s an increasing premium on content that comes from real people,” Reddit CEO Steve Huffman said during the company’s earnings call in March. “And we have nearly two decades of authentic conversation.”
The Reddit platform, which has more than 1 billion posts and more than 16 billion comments (numbers that grow daily thanks to its hundreds of millions of active users), is a gold mine for generative AI companies, whose models learn from examples of content, such as text and images, to create new, similar content.
However, the company could face backlash from users concerned about how it monetizes their data.
It is instructive to look at Stack Overflow, the Q&A forum for software developers, which recently signed an agreement with OpenAI to provide data for training the latter’s models. In protest, some users deleted their top answers to questions in the community. But Stack Overflow reinstated the deleted posts and banned those users, claiming they were not abiding by its terms of service.
Reddit has already expressed displeasure over one attempt to give its users more control over their own data.
Vana, a blockchain-based startup, is attempting to launch a data “DAO” (decentralized autonomous organization) to allow Reddit users to pool their data and decide together how that combined data is used (or sold). Reddit banned Vana’s subreddit dedicated to discussing the DAO and, in a statement to TechCrunch, accused the company of “exploiting” its data export controls.