A recently released Google AI model scores worse on certain safety tests than its predecessor, according to the company’s internal benchmarking.
In a technical report published this week, Google reveals that its Gemini 2.5 Flash model is more likely to generate text that violates its safety guidelines than Gemini 2.0 Flash. On two metrics, “text-to-text safety” and “image-to-text safety,” Gemini 2.5 Flash regresses by 4.1% and 9.6%, respectively.
Text-to-text safety measures how frequently a model violates Google’s guidelines when given a prompt, while image-to-text safety evaluates how closely the model adheres to those guidelines when prompted with an image. Both tests are automated, not human-supervised.
In an emailed statement, a Google spokesperson confirmed that Gemini 2.5 Flash “performs worse on text-to-text and image-to-text safety.”
These surprising benchmark results come as AI companies move to make their models more permissive, meaning less likely to refuse to respond to controversial or sensitive subjects. For its latest crop of Llama models, Meta said it tuned the models not to endorse “some views over others” and to reply to more “debated” political prompts. OpenAI said earlier this year that it would adjust future models to avoid taking an editorial stance and to offer multiple perspectives on controversial topics.
Sometimes, those efforts at permissiveness have backfired. TechCrunch reported Monday that the default model powering OpenAI’s ChatGPT allowed minors to generate erotic conversations. OpenAI blamed the behavior on a “bug.”
According to Google’s technical report, Gemini 2.5 Flash, which is still in preview, follows instructions more faithfully than Gemini 2.0 Flash, including instructions that cross problematic lines. The company claims the regressions can be attributed partly to false positives, but it also admits that Gemini 2.5 Flash sometimes generates “violative content” when explicitly asked.
“Naturally, there is tension between [instruction following] on sensitive topics and safety policy violations, which is reflected across our evaluations,” the report reads.
Scores from SpeechMap, a benchmark that probes how models respond to sensitive and controversial prompts, also suggest that Gemini 2.5 Flash is far less likely to refuse to answer contentious questions than Gemini 2.0 Flash. TechCrunch’s testing of the model via the AI platform OpenRouter found that it will readily write essays in support of replacing human judges with AI, weakening due process protections in the U.S., and implementing widespread government surveillance programs.
Thomas Woodside, co-founder of the Secure AI Project, said the limited details Google provided in its technical report demonstrate the need for greater transparency in model testing.
“There’s a trade-off between instruction following and policy following, because some users may ask for content that would violate policies,” Woodside told TechCrunch. “In this case, Google’s latest Flash model complies with instructions more while also violating policies more. Google doesn’t provide much detail on the specific cases where policies were violated, although it says they are not severe.”
Google has come under fire before over its model safety reporting practices.
It took the company weeks to publish a technical report for its most capable model, Gemini 2.5 Pro. When the report was eventually published, it initially omitted key safety testing details.
On Monday, Google published a more detailed report with additional safety information.
