The UK AI Safety Institute, the UK's recently established AI safety body, has released a toolset designed to "strengthen AI safety" by making it easier for industry, research organizations and academia to develop AI evaluations.
Called Inspect, the toolset — which is available under an open source license, specifically an MIT License — aims to assess certain capabilities of AI models, including models' core knowledge and reasoning ability, and to generate a score based on the results.
In a press release announcing the news on Friday, the Safety Institute claimed that Inspect marks "the first time that an AI safety testing platform spearheaded by a state-backed body has been released for wider use."
"Successful collaboration on AI safety testing means having a shared, accessible approach to evaluations, and we hope Inspect can be a building block," said Safety Institute chair Ian Hogarth. "We hope to see the global AI community using Inspect to not only carry out their own model safety tests, but to help adapt and build upon the open source platform so we can produce high-quality evaluations across the board."
As we've written before, AI benchmarks are hard — not least because the most sophisticated AI models today are black boxes whose infrastructure, training data and other key details are kept under wraps by the companies that build them. So how does Inspect tackle the challenge? By being extensible and adaptable to new testing techniques, mainly.
Inspect consists of three basic components: datasets, solvers and scorers. Datasets provide samples for evaluation tests. Solvers do the work of carrying out the tests. And scorers evaluate the solvers' work and aggregate test scores into metrics.
Inspect’s built-in components can be augmented by third-party packages written in Python.
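For a sense of how those pieces fit together in practice, here is a minimal task sketch modeled on the examples published alongside Inspect's initial release; the module, function and parameter names below are taken from that early documentation and may have changed in later versions of the package.

```python
# Minimal Inspect evaluation sketch (names per the toolset's launch docs).
from inspect_ai import Task, task
from inspect_ai.dataset import example_dataset
from inspect_ai.scorer import model_graded_fact
from inspect_ai.solver import chain_of_thought, generate, self_critique

@task
def theory_of_mind():
    return Task(
        # Dataset: the samples the model is evaluated against.
        dataset=example_dataset("theory_of_mind"),
        # Solvers: the steps that actually carry out the test.
        plan=[
            chain_of_thought(),
            generate(),
            self_critique(),
        ],
        # Scorer: grades the solvers' output and rolls it up into metrics.
        scorer=model_graded_fact(),
    )
```

A task file like this would then be pointed at a model from the command line, e.g. `inspect eval theory_of_mind.py --model openai/gpt-4`.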
In a post on X, Deborah Raji, a Mozilla fellow and noted AI ethicist, called Inspect "a testament to the power of public investment in open source AI accountability tools."
Clément Delangue, CEO of AI startup Hugging Face, floated the idea of integrating Inspect with Hugging Face's model library or creating a public leaderboard with the results of the toolset's evaluations.
The release of Inspect comes after a US government agency — the National Institute of Standards and Technology (NIST) — launched NIST GenAI, a program to assess various generative AI technologies, including text- and image-generating AI. NIST GenAI plans to release benchmarks, help build content authenticity detection systems and encourage the development of software to spot fake or misleading AI-generated information.
In April, the US and UK announced a partnership to jointly develop advanced AI model testing, following commitments announced at the UK's AI Safety Summit at Bletchley Park in November last year. As part of the collaboration, the US plans to launch its own AI safety institute, which will be broadly tasked with assessing risks from AI and generative AI.