The now-viral X post by Meta AI security researcher Summer Yue reads, at first, like satire. She told her OpenClaw AI agent to check her overflowing inbox and suggest what to delete or archive.
The agent went on a rampage. It began deleting all her emails in a “speed run” while ignoring the stop commands she sent from her phone.
“I had to RUN to my Mac mini like defusing a bomb,” she wrote, posting images of the ignored stop prompts as proof.
The Mac mini, an affordable Apple computer that sits flat on a desk and fits in the palm of your hand, has become the go-to device for running OpenClaw. (The Mini is selling “like hotcakes,” a “confused” Apple employee reportedly told famed AI researcher Andrej Karpathy when he bought one to run an OpenClaw alternative called NanoClaw.)
OpenClaw is, of course, the open source AI agent that rose to fame through Moltbook, an AI-only social network. OpenClaw agents were at the center of that largely debunked Moltbook episode in which AIs appeared to conspire against humans.
But OpenClaw’s mission, according to its GitHub page, is not social networking. It aims to be a personal AI assistant that runs on your own devices.
The Silicon Valley crowd has fallen so in love with OpenClaw that “claw” and “claws” have become the buzzwords of choice for agents running on personal hardware. Other such projects include ZeroClaw, IronClaw, and PicoClaw. Y Combinator’s podcast team even appeared on their latest episode dressed in lobster suits.
But Yue’s post serves as a warning. As others have noted on X, if an AI security researcher can stumble into this problem, what hope do mere mortals have?
“Did you test its guardrails on purpose, or did you make a rookie mistake?” a software developer asked her on X.
“Rookie mistake tbh,” she replied. She had been testing the agent on a smaller inbox “game,” as she called it, and it performed well on the less important emails. It had earned her trust, so she figured she’d let it loose on the real thing.
Yue believes the sheer volume of data in her real inbox “caused compaction,” she wrote. Compaction occurs when the context window, the running record of everything the AI has said and done in a session, grows too large, forcing the agent to start summarizing and compressing the conversation to manage it.
At that point, the AI can lose track of instructions the human considers critical.
In this case, the agent may have dropped the latest prompt, where it was told not to act, and reverted to its instructions from the “game” inbox.
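To make the failure mode concrete, here is a toy sketch, with entirely hypothetical names and logic (not OpenClaw’s actual code), of how a budget-driven compaction step can silently summarize away an early safety instruction:

```python
# Toy illustration of context compaction (hypothetical, not OpenClaw's code):
# when the conversation exceeds a word budget, only the newest messages are
# kept verbatim; everything older collapses into a lossy one-line summary.

def compact(messages, budget=50):
    """Keep the newest messages within the budget; summarize the rest."""
    total = sum(len(m.split()) for m in messages)
    if total <= budget:
        return messages
    kept, used = [], 0
    for m in reversed(messages):  # walk newest-to-oldest
        words = len(m.split())
        if used + words > budget:
            break
        kept.append(m)
        used += words
    dropped = len(messages) - len(kept)
    # A real summarizer is lossy; critical details can vanish here.
    summary = f"[summary of {dropped} earlier messages]"
    return [summary] + list(reversed(kept))

history = (
    ["user: suggest deletions only; NEVER delete anything without asking"]
    + [f"agent: processed email {i}" for i in range(40)]
)
compacted = compact(history)
# The original "NEVER delete" constraint is now buried inside the lossy
# summary line and no longer appears verbatim anywhere in the context.
```

The details differ per framework, but the structural risk is the same: whatever falls outside the verbatim window survives only as well as the summarizer preserved it.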
As several others on X pointed out, prompts are not reliable guardrails. Models can misinterpret or ignore them.
Various people offered suggestions, ranging from the exact syntax Yue should have used to stop the agent to methods for better guardrail adherence, such as writing instructions in special files or using other open source tools.
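One theme running through those suggestions is worth spelling out: the most dependable guardrails live outside the model’s context entirely. Here is a minimal sketch, using hypothetical tool names rather than any real framework’s API, of a tool-level wrapper that hard-blocks destructive actions in ordinary code:

```python
# Minimal sketch of a tool-level guardrail (hypothetical names, not any
# specific framework's API): destructive actions are refused in plain code
# unless a human has explicitly confirmed them.

DESTRUCTIVE = {"delete_email", "empty_trash"}

class ConfirmationRequired(Exception):
    """Raised when a destructive tool call lacks human approval."""

def run_tool(tool_name, args):
    # Stand-in for the real mail API dispatcher.
    return f"{tool_name} ok: {args}"

def guarded_call(tool_name, args, confirmed=False):
    """Run a tool, but hard-block destructive ones without confirmation."""
    if tool_name in DESTRUCTIVE and not confirmed:
        raise ConfirmationRequired(
            f"{tool_name}({args}) needs explicit human approval"
        )
    return run_tool(tool_name, args)
```

Because the check runs in plain code rather than in the prompt, compaction cannot summarize it away and the model cannot be talked past it.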
In the interest of full transparency, TechCrunch could not independently verify what happened in Yue’s inbox. (She did not respond to our request for comment, although she did reply to several questions and comments on X.)
But it doesn’t really matter.
The point of the story is that knowledge-worker agents, at their current stage of development, are dangerous. The people who say they use them successfully do so by layering multiple methods to protect themselves.
One day, maybe soon (by 2027? 2028?), they may be ready for widespread use. Goodness knows many of us would love help with emails, grocery orders, and scheduling dentist appointments. But that day has not yet come.
