The New York Times wants OpenAI and Microsoft to pay for training data

The New York Times is suing OpenAI and its close partner (and investor), Microsoft, for allegedly violating copyright law by training artificial intelligence models on Times content.

In the lawsuit, filed in Federal District Court in Manhattan, the Times alleges that millions of its articles were used to train artificial intelligence models, including those that underpin OpenAI’s wildly popular ChatGPT and Microsoft’s Copilot, without its consent. The Times is calling for OpenAI and Microsoft to “destroy” models and training data containing the offending material and to be held liable for “billions of dollars in statutory and actual damages” related to the “illegal copying and use of their unique and valuable works.”

“If the Times and other news organizations cannot produce and protect their independent journalism, there will be a void that no computer or artificial intelligence can fill,” the Times complaint says. “Less journalism will be produced and the cost to society will be enormous.”

In an emailed statement, an OpenAI spokesperson said: “We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from AI technology and new revenue models. Our ongoing conversations with The New York Times have been productive and moving forward constructively, so we are surprised and disappointed by this development. We hope to find a mutually beneficial way to work together, as we do with many other publishers.”

Generative AI models “learn” from examples to create essays, code, emails, articles, and more, and vendors like OpenAI scour the web for millions to billions of these examples to add to their training sets. Some examples are in the public domain. Others are not, or are subject to restrictive licenses that require citation or specific forms of compensation.

Vendors argue that the fair use doctrine provides blanket protection for their web scraping practices. Copyright holders disagree. Hundreds of news organizations are now using code to prevent OpenAI, Google and others from crawling their sites for training data.
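The blocking code is typically a few lines in a site’s robots.txt file. The sketch below is illustrative only: the robots.txt contents and the example URL are hypothetical and not taken from any publisher, while GPTBot and Google-Extended are the crawler tokens OpenAI and Google have published for this purpose. It uses Python’s standard-library parser to check which crawlers such a file would turn away.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not copied from any real publisher.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# Hypothetical article URL on a placeholder domain.
ARTICLE_URL = "https://news.example.com/2023/12/27/some-article"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Check two AI-training crawlers and an ordinary browser-style agent.
results = {
    agent: is_allowed(ROBOTS_TXT, agent, ARTICLE_URL)
    for agent in ("GPTBot", "Google-Extended", "Mozilla/5.0")
}

for agent, allowed in results.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Compliance with robots.txt is voluntary on the crawler’s part, so rules like these work only if the vendor chooses to honor them.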

The conflict between vendors and copyright holders has led to a growing number of legal battles, with the Times’ being the latest.

Actress Sarah Silverman joined a pair of lawsuits in July that accuse Meta and OpenAI of “swallowing” Silverman’s memoirs to train their AI models. In a separate suit, thousands of novelists, including Jonathan Franzen and John Grisham, claim that OpenAI mined their work as training data without their permission or knowledge. And several developers have an ongoing case against Microsoft, OpenAI and GitHub over Copilot, an AI-powered code generation tool that the plaintiffs say was developed using their IP-protected code.

While the Times is not the first to sue AI producers for alleged IP violations involving written works, it is the largest publisher involved in such a lawsuit to date, and one of the first to highlight potential damage to its brand through “hallucinations,” or made-up facts from generative artificial intelligence models.

The Times’ complaint cites several instances in which Microsoft’s Bing Chat (now called Copilot), which is powered by an OpenAI model, provided incorrect information that it said came from the Times, including results for “the 15 most heart-healthy foods,” 12 of which were not mentioned in any Times article.

The Times also argues that OpenAI and Microsoft are effectively building competing news publishers using the Times’ works, hurting the Times’ business by providing information that normally couldn’t be accessed without a subscription. That information, it says, isn’t always cited, is sometimes monetized, and is stripped of the affiliate links that the Times uses to generate commissions.

As the Times complaint states, generative AI models tend to regurgitate training data, for example by reproducing articles almost verbatim. Regurgitation aside, OpenAI has in at least one case unintentionally allowed ChatGPT users to get around news paywalls.

“Defendants seek to profit from the Times’ massive investment in its journalism,” the complaint says, accusing OpenAI and Microsoft of “using unpaid Times content to create products that substitute for The Times” and draw its audience away.

The effects on the news subscription business, and on publishers’ web traffic, are at the center of a tangentially similar lawsuit filed by publishers earlier this month against Google. In that case, the plaintiffs, like the Times, argued that Google’s GenAI experiments, including the Bard chatbot and the AI-powered Search Generative Experience, siphoned off publishers’ content, readers and ad revenue through anticompetitive means.

The publishers’ claims have some credibility. A recent model from The Atlantic found that if a search engine like Google incorporated artificial intelligence into search, it would answer a user’s query 75% of the time without requiring a click-through to its website. Publishers in the Google suit estimate that they will lose up to 40% of their traffic.

This does not mean that they will be successful in court. Heather Meeker, a founding partner at OSS Capital and an adviser on IP issues, including licensing arrangements, compared the Times’ regurgitation example to “using a word processor to cut and paste.”

“In the complaint, the New York Times gives an example of a ChatGPT session related to a 2012 restaurant review,” Meeker told TechCrunch via email. “The prompt for ChatGPT is ‘What were the opening paragraphs of his review?’ Subsequent prompts repeatedly ask for ‘the next sentence.’ Coaxing a chatbot to reproduce input is not a reasonable basis for copyright infringement … If the user is intentionally making the chatbot copy, that’s user error. And that’s why most [lawsuits like this] will probably fail.”

Some news outlets, instead of fighting AI producers in court, have chosen to enter into licensing agreements with them. The Associated Press struck a deal in July with OpenAI, and Axel Springer, the German publisher that owns Politico and Business Insider, did the same this month.

In its complaint, the Times says it tried to reach a licensing deal with Microsoft and OpenAI in April, but that talks ultimately fell through.

Updated at 4:24 Eastern with additional context and commentary from OpenAI.
