OpenAI is expanding its internal safety processes to fend off the threat of harmful AI. A new “safety advisory group” will sit above the technical teams and make recommendations to leadership, and the board has been granted veto power – of course, whether it will actually use it is another matter entirely.
Normally, the ins and outs of policies like these don’t require coverage, as in practice they amount to a lot of closed-door meetings with obscure functions and flows of responsibility that outsiders will rarely be privy to. Though that is likely also the case here, the recent leadership fracas and the evolving AI risk debate warrant a look at how the world’s leading AI developer is approaching safety considerations.
In a new document and blog post, OpenAI discusses its updated “Preparedness Framework,” which one imagines got a bit of a retooling after November’s shakeup that removed the two board members most inclined to slow things down: Ilya Sutskever (still at the company in a somewhat changed role) and Helen Toner (gone entirely).
The main purpose of the update appears to be showing a clear path for identifying, analyzing, and deciding what to do about the “catastrophic” risks inherent in the models they are developing. As they define it:
By catastrophic risk, we mean any risk that could result in hundreds of billions of dollars in economic damage or result in the serious harm or death of many people — this includes, but is not limited to, existential risk.
(The existential risk is the “rise of the machines” type stuff.)
In-production models are governed by a “Safety Systems” team; this is for, say, systematic abuses of ChatGPT that can be mitigated with API restrictions or tuning. Frontier models in development get the “Preparedness” team, which tries to identify and quantify risks before the model is released. And then there’s the “Superalignment” team, which is working on theoretical guide rails for “superintelligent” models, which we may or may not be anywhere near.
The first two categories, being real and not imaginary, have a relatively easy-to-understand rubric. Their teams rate each model on four risk categories: cybersecurity, “persuasion” (e.g., disinformation), model autonomy (i.e., acting on its own), and CBRN (chemical, biological, radiological, and nuclear threats; e.g., the ability to create novel pathogens).
Various mitigations are taken into account: for instance, a reasonable reticence to describe the process of making napalm or pipe bombs. After known mitigations are accounted for, if a model is still evaluated as having a “high” risk, it cannot be deployed, and if a model has any “critical” risks, it will not be developed further.
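To make the rubric and its thresholds concrete, here is a minimal sketch in Python of how such a scorecard and decision rule might fit together. The four categories and the graded levels come from the framework as described above; everything else – the names, the data structure, the functions – is invented for illustration and is not OpenAI’s actual tooling.

```python
from enum import IntEnum


class RiskLevel(IntEnum):
    """Graded risk levels used on the framework's scorecard."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


# The four tracked risk categories described in the framework.
CATEGORIES = ("cybersecurity", "persuasion", "model_autonomy", "cbrn")


def can_deploy(post_mitigation: dict[str, RiskLevel]) -> bool:
    """A model may only be deployed if every post-mitigation score is medium or below."""
    return all(post_mitigation[c] <= RiskLevel.MEDIUM for c in CATEGORIES)


def can_continue_development(post_mitigation: dict[str, RiskLevel]) -> bool:
    """Development stops entirely if any category reaches critical."""
    return all(post_mitigation[c] < RiskLevel.CRITICAL for c in CATEGORIES)


if __name__ == "__main__":
    scorecard = {
        "cybersecurity": RiskLevel.HIGH,   # e.g., finds exploits against hardened targets
        "persuasion": RiskLevel.MEDIUM,
        "model_autonomy": RiskLevel.LOW,
        "cbrn": RiskLevel.LOW,
    }
    print(can_deploy(scorecard))                # False: a post-mitigation "high" blocks deployment
    print(can_continue_development(scorecard))  # True: no category is "critical"
```

The only point of the sketch is the threshold logic: a post-mitigation “high” blocks deployment, and a “critical” anywhere halts further development.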
These risk levels are actually documented in the framework, in case you were wondering whether they are to be left to the discretion of some engineer or product manager.
For example, in the cybersecurity section, which is the most practical of them, it is a “medium” risk to “increase the productivity of operators . . . on key cyber operation tasks” by a certain factor. A high-risk model, on the other hand, would “identify and develop proofs-of-concept for high-value exploits against hardened targets without human intervention.” And at “critical,” the model “can devise and execute end-to-end novel strategies for cyberattacks against hardened targets, given only a desired high-level goal.” Obviously we don’t want that out there (though it would sell for quite a sum).
I’ve asked OpenAI for more information on how these categories are defined and refined – for example, whether a new risk like photorealistic fake video of people goes under “persuasion” or a new category – and will update this post if I hear back.
So, only medium and high risks are to be tolerated one way or the other. But the people building these models aren’t necessarily the best ones to evaluate them and make recommendations. For that reason, OpenAI is creating a cross-functional “Safety Advisory Group” that will sit on top of the technical side, reviewing the boffins’ reports and making recommendations from a higher vantage point. Hopefully (they say) this will surface some “unknown unknowns,” though by their nature those are rather difficult to catch.
The process requires these recommendations to be sent simultaneously to the board and to leadership, which we understand to mean CEO Sam Altman and CTO Mira Murati, plus their lieutenants. Leadership will make the call on whether to ship it or shelve it, but the board will be able to reverse those decisions.
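Stripped of titles, the decision flow described here is simple enough to write down. The toy sketch below is mine, not OpenAI’s – it only encodes the two moving parts the framework describes: leadership makes the call, and the board can reverse it.

```python
def final_decision(leadership_choice: str, board_reverses: bool) -> str:
    """Leadership decides ("ship" or "shelve"); the board may reverse that decision."""
    if board_reverses:
        return "shelve" if leadership_choice == "ship" else "ship"
    return leadership_choice


# The scenario the article worries about: leadership greenlights and the board stays quiet.
print(final_decision("ship", board_reverses=False))  # -> "ship"
# The check working as designed: the board overturns a ship decision.
print(final_decision("ship", board_reverses=True))   # -> "shelve"
```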
Hopefully this will short-circuit anything like what was rumored to have happened before the big drama: a high-risk product or process getting greenlit without the board’s awareness or approval. Of course, the result of said drama was the sidelining of two of the board’s more critical voices and the appointment of some money-minded guys (Bret Taylor and Larry Summers) who are sharp but not remotely AI experts.
If a panel of experts makes a recommendation and the CEO decides based on that information, will this friendly board really feel empowered to contradict them and hit the brakes? And if they do, will we hear about it? Transparency isn’t really addressed beyond a promise that OpenAI will solicit audits from independent third parties.
Say a model is developed that warrants a “critical” risk rating. OpenAI hasn’t been shy about touting this sort of thing in the past – talking about how wildly powerful their models are, to the point of declining to release them, is great advertising. But do we have any guarantee that this disclosure will happen, if the risks are so real and OpenAI is so concerned about them? Maybe it’s a bad idea. Either way, it isn’t really mentioned.