Although OpenAI is working to harden its Atlas AI browser against cyberattacks, the company admits that just-in-time injections, a type of attack that manipulates AI agents into following malicious instructions often hidden in web pages or emails, is a risk that won’t go away anytime soon — raising questions about how secure AI agents can operate on the open web.
“Direct injection, like web scams and social engineering, is unlikely to be fully ‘solved’,” OpenAI wrote on Monday. blog post detailing how the company is beefing up Atlas’ armor to combat the relentless attacks. The company admitted that the “agent mode” in ChatGPT Atlas “expands the security threat surface”.
OpenAI released its ChatGPT Atlas browser in October, and security researchers were quick to publish their demos, showing that it was possible to type a few words into Google Docs that could change the behavior of the underlying browser. Same day, Brave published a blog post explaining that indirect direct injection is a systematic challenge for AI-powered browsers, including Perplexity’s Comet.
OpenAI isn’t alone in recognizing that timely injections aren’t going away. THE The UK’s National Cyber Security Center warned earlier this month that direct injection attacks against genetic AI applications “can never be fully mitigated,” putting websites at risk of falling victim to data breaches. The UK government agency has advised cyber professionals to reduce the risk and impact of timely injections, rather than believing that attacks can be “stopped”.
For OpenAI’s part, the company said: “We see direct injection as a long-term AI security challenge, and we should continually strengthen our defenses against it.”
The company’s response to this Sisyphean task? A proactive rapid response cycle that the company says shows early promise for helping discover innovative attack strategies internally before they are exploited “in the wild.”
That’s not entirely different from what competitors like Anthropic and Google have been saying: that to combat the persistent threat of just-in-time attacks, defenses must be layered and constantly stress-tested. Recent work by Googlefor example, it focuses on architectural and policy-level controls for agent systems.
But where OpenAI takes a different tactic is with the “LLM-based automated attacker”. This attacker is basically a bot that OpenAI trained, using reinforcement learning, to play the role of a hacker looking for ways to sneak malicious instructions into an AI agent.
The bot can test the attack in a simulation before actually using it, and the simulator shows how the AI target would think and what actions it would take if it saw the attack. The bot can then study this response, modify the attack and try again and again. This picture of the target AI’s internal reasoning is something outsiders don’t have access to, so in theory OpenAI’s bot should be able to find flaws faster than a real-world attacker.
It’s a common tactic in AI security testing: create an agent to find edge cases and quickly test them in a simulation.
“Us [reinforcement learning]”A trained attacker can direct an agent to execute sophisticated, long-horizon malicious workflows that unfold over tens (or even hundreds) of steps,” OpenAI wrote. “We also observed new attack strategies that did not appear in the human clustering campaign or in external reports.”
In a demonstration (pictured in part above), OpenAI showed how its automated attacker cracked a malicious email in a user’s inbox. When the AI agent later scanned the inbox, it followed the hidden instructions in the email and sent a resignation message instead of composing an out-of-office reply. However, after the security update, “agent mode” was able to successfully detect the direct injection attempt and flag it to the user, according to the company.
The company says that while timely injection is difficult to ensure infallibly, it relies on large-scale testing and faster patch cycles to harden its systems before they are exposed to real-world attacks.
An OpenAI spokesperson declined to share whether the Atlas security update has resulted in a measurable reduction in successful injections, but says the company has been working with third parties to harden Atlas against direct injection since before the release.
Rami McCarthy, principal security researcher at cybersecurity firm Wiz, says reinforcement learning is a way to continuously adapt to attacker behavior, but it’s only part of the picture.
“A useful way to think about risk in AI systems is autonomy multiplied by access,” McCarthy told TechCrunch.
“Browser agents tend to sit in a difficult part of this space: moderate autonomy combined with very high access,” McCarthy said. “Many current recommendations reflect this trade-off. Limiting connected access primarily reduces exposure, while requiring verification of confirmation requests limits autonomy.”
Those are two of OpenAI’s recommendations to users to reduce their own risk, and a spokesperson said Atlas is also trained to get user confirmation before sending messages or making payments. OpenAI also suggests that users give agents specific instructions, rather than giving them access to your inbox and telling them to “do whatever it takes.”
“High latitude makes it easier for hidden or malicious content to affect the agent, even when safeguards are in place,” according to OpenAI.
While OpenAI says protecting Atlas users from timely injections is a top priority, McCarthy raises some skepticism about the return on investment for vulnerable browsers.
“For most everyday use cases, agent browsers don’t yet offer enough value to justify their current risk profile,” McCarthy told TechCrunch. “The risk is high given their access to sensitive data like email and payment information, even though that access is also what makes them powerful. That balance will evolve, but today the trade-offs are still very real.”
