Anthropic CEO Dario Amodei published an essay Thursday highlighting how little researchers understand about the inner workings of the world's leading AI models. To address that, Amodei set an ambitious goal for Anthropic to reliably detect most AI model problems by 2027.
Amodei acknowledges the challenge ahead. In "The Urgency of Interpretability," the CEO says Anthropic has made early breakthroughs in tracing how models arrive at their answers – but stresses that far more research is needed to decode these systems as they grow more powerful.
"I am very concerned about deploying such systems without a better handle on interpretability," Amodei wrote in the essay. "These systems will be absolutely central to the economy, technology, and national security, and will be capable of so much autonomy that I consider it basically unacceptable for humanity to be totally ignorant of how they work."
Anthropic is one of the pioneering companies in mechanistic interpretability, a field that aims to open the black box of AI models and understand why they make the decisions they do. Despite the rapid performance improvements of the tech industry's AI models, we still have relatively little idea how these systems arrive at decisions.
For example, OpenAI recently launched new reasoning AI models, o3 and o4-mini, that perform better on certain tasks but also hallucinate more than its other models. The company doesn't know why that happens.
"When a generative AI system does something, like summarize a financial document, we have no idea, at a specific or precise level, why it makes the choices it does – why it chooses certain words over others, or why it occasionally makes a mistake despite usually being accurate," Amodei wrote in the essay.
In the essay, Amodei notes that Anthropic co-founder Chris Olah says AI models are "grown more than they are built." In other words, AI researchers have found ways to improve AI model intelligence, but they don't quite know why it works.
In the essay, Amodei says it could be dangerous to reach AGI – or, as he calls it, "a country of geniuses in a data center" – without understanding how these models work. In a previous essay, Amodei claimed the tech industry could reach such a milestone by 2026 or 2027, but he believes we're much further out from fully understanding these AI models.
In the long term, Amodei says Anthropic would like to, essentially, conduct "brain scans" or "MRIs" of AI models. These checkups would help identify a wide range of issues in AI models, including their tendencies to lie or seek power, as well as other weaknesses, he says. This could take five to 10 years to achieve, but such measures will be necessary to test and deploy Anthropic's future AI models, he added.
Anthropic has made a few research breakthroughs that have allowed it to better understand how its AI models work. For example, the company recently found ways to trace an AI model's thinking pathways through what it calls circuits. Anthropic identified one circuit that helps AI models understand which U.S. cities are located in which U.S. states. The company has found only a few of these circuits but estimates there are millions within AI models.
Anthropic has been investing in interpretability research itself, and it recently made its first investment in a startup working on interpretability. While interpretability is largely seen as a safety research field today, Amodei notes that, eventually, explaining how AI models arrive at their answers could present a commercial advantage.
In the essay, Amodei called on OpenAI and Google DeepMind to increase their research efforts in the field. Beyond the friendly nudge, Anthropic's CEO asked governments to impose "light-touch" regulations to encourage interpretability research, such as requirements for companies to disclose their safety and security practices. Amodei also says the U.S. should put export controls on chips to China in order to limit the likelihood of an out-of-control global AI race.
Anthropic has always stood out from OpenAI and Google for its focus on safety. While other tech companies pushed back on California's controversial AI safety bill, SB 1047, Anthropic issued modest support and recommendations for the bill, which would have set safety reporting standards for frontier AI model developers.
In this case, Anthropic appears to be pushing for an industry-wide effort to better understand AI models, not just to increase their capabilities.
