ArXiv, a widely used open repository for preprint research, is doing more to combat the careless use of large language models in scientific papers.
Although papers are posted to the site before they are peer-reviewed, arXiv (pronounced “archive”) has become one of the primary ways research is circulated in fields such as computer science and mathematics, and the site itself has become a data source for trends in scientific research.
ArXiv has already taken steps to combat a growing number of low-quality AI-generated papers, for example by requiring first-time posters to get an endorsement from an established author. And after being hosted by Cornell for more than 20 years, the organization is becoming an independent nonprofit, which will allow it to raise more money to tackle issues like artificial intelligence.
In the latest move, Thomas Dietterich, chair of arXiv’s computer science section, posted Thursday that “if a submission contains incontrovertible evidence that the authors did not check the results of the LLM generation, this means that we cannot trust anything in the paper.”
That indisputable evidence could include things like “hallucinations” and comments to or from the LLM, Dietterich said. If such evidence is found, the authors of a paper will face “a one-year ban from arXiv, followed by the requirement that subsequent arXiv submissions must first be accepted by a reputable peer-reviewed venue.”
Note that this is not an outright ban on using LLMs, but an insistence that, as Dietterich put it, authors take “full responsibility” for the content, “regardless of how the content was created.” So if researchers copy and paste “inappropriate language, plagiarism, biased content, mistakes, errors, miscitations, or misleading content” directly from an LLM, they are still responsible for it.
Dietterich told 404 Media that this will be a “one-strike” rule, but moderators must flag the issue and the division chairs must confirm the evidence before the penalty is imposed. Authors will also be able to appeal the decision.
Recent peer-reviewed research has found that fabricated references are on the rise in biomedical research, probably because of LLM use. To be fair, though, scientists aren’t the only ones getting caught using AI-generated material.
