OpenAI may be close to releasing an AI tool that can take control of your computer and perform actions on your behalf.
Tibor Blaho, a software engineer with a reputation for accurately leaking upcoming AI products, claims to have uncovered details of OpenAI’s long-rumored Operator tool. Publications including Bloomberg have previously reported on Operator, which is said to be an “agent” system capable of autonomously handling tasks such as writing code and booking travel.
According to The Information, OpenAI is targeting January as Operator’s release month. The code uncovered by Blaho this weekend lends credence to that reporting.
OpenAI’s ChatGPT client for macOS has gained hidden options to set shortcuts for “Operator Switch” and “Force Operator Exit,” per Blaho. OpenAI has also added references to Operator to its website, Blaho said, though those references aren’t yet publicly visible.
Confirmed – ChatGPT macOS desktop app has hidden options to set desktop launcher shortcuts to “Switch Operator” and “Force Exit Operator” pic.twitter.com/j19YSlexAS
— Tibor Blaho (@btibor91) January 19, 2025
According to Blaho, OpenAI’s website also contains not-yet-public tables comparing Operator’s performance to that of other computer-using AI systems. The tables may well be placeholders. But if the numbers are accurate, they suggest that Operator isn’t 100% reliable, depending on the task.
The OpenAI site already has references to Operator/OpenAI CUA (Computer Use Agent) – “Operator System Card Table”, “Operator Survey Evaluation Table” and “Operator Denial Rate Table”
Including comparison with Claude 3.5 Sonnet using PC, Google Mariner etc.
(preview tables… pic.twitter.com/OOBgC3ddkU
— Tibor Blaho (@btibor91) January 20, 2025
On OSWorld, a benchmark that tries to mimic a real computer environment, “OpenAI Computer Use Agent (CUA)” — presumably the AI model powering Operator — scores 38.1%, ahead of Anthropic’s computer-use model but well below the 72.4% humans score. OpenAI CUA surpasses human performance on WebVoyager, which assesses an AI’s ability to navigate and interact with websites. However, the model falls short of human-level scores on WebArena, another web-based benchmark, according to the leaked figures.
Operator also struggles with tasks a human could easily perform, if the leak is to be believed. In a test that tasked Operator with signing up with a cloud provider and launching a virtual machine, it succeeded only 60% of the time. Tasked with creating a Bitcoin wallet, it succeeded just 10% of the time.
We’ve reached out to OpenAI for comment and will update this piece if we hear back.
OpenAI’s impending entry into the AI agent space comes as competitors, including the aforementioned Anthropic, Google and others, make plays for the nascent segment. AI agents may be risky and speculative, but tech giants are already touting them as the next big thing in AI. According to analytics firm Markets and Markets, the market for AI agents could be worth $47.1 billion by 2030.
Agents today are rather primitive, but some experts have raised concerns about their safety should the technology improve rapidly.
One of the leaked charts shows Operator performing well on selected safety evaluations, including tests that try to get the system to perform “illegal activities” and search for “sensitive personal data.” According to The Information, safety testing is one of the reasons for Operator’s long development cycle. In a recent post on X, OpenAI co-founder Wojciech Zaremba criticized Anthropic for releasing an agent that he claims lacks safety mitigations.
“I can only imagine the backlash if OpenAI made a similar release,” Zaremba wrote.
It’s worth noting that OpenAI has been criticized by AI researchers, including former staff, for allegedly deprioritizing safety work in favor of quickly shipping its technology.