OpenAI gave me a week to try its new AI agent, Operator, a system that can independently carry out tasks for you online.
Operator is the closest thing I have seen to the tech industry’s vision of AI agents: systems that can automate the boring parts of life, freeing us to do the things we really love. However, judging from my experience with OpenAI’s agent, truly “autonomous” AI systems are still not quite here.
OpenAI trained a new model to power Operator, which combines the visual understanding of GPT-4o with the reasoning capabilities of o1.
This model seems to work well for basic tasks. I watched it click buttons, navigate menus on websites and fill in forms. The AI was occasionally successful at acting independently, and it works much faster than the web agents I have seen from Anthropic and Google.
But during my trial, I found myself helping OpenAI’s agent more than I would have liked. It felt like I was coaching Operator through every problem, when what I wanted was to take some tasks off my plate completely.
Very often during my test, I had to answer various questions, grant permissions, fill in personal information and help the agent when it got stuck.
In car terms, using Operator is like driving with cruise control, occasionally taking your foot off the pedals and letting the car drive itself, but it is far from full autopilot.
In fact, OpenAI says Operator’s frequent pauses are by design.
The AI that powers Operator, like the AI that powers chatbots such as OpenAI’s ChatGPT, cannot operate reliably on its own for long periods and is prone to the same kinds of hallucinations. Because of this, OpenAI does not want to give the system too much decision-making power or sensitive user information. That may be a safe choice on OpenAI’s part, but it reduces Operator’s practicality.
That said, OpenAI’s first agent is an impressive proof of concept, and of an interface, for an AI that can use the front end of any website. But to create truly independent AI systems, technology companies will have to build more reliable AI models that do not require this much steering.
A little too “hands-on”
My test of Operator coincided with the week I moved apartments, so I had OpenAI’s agent help me with the logistics of moving.
I asked Operator to help me buy a new parking permit. OpenAI’s agent told me, “Sure,” and opened a browser window on my computer screen.
Operator then ran a search for a San Francisco parking permit in the browser, and took me to the right city website and even to the right page.
Operator still lets you use the rest of your computer while it works, which cannot be said of Google’s Project Mariner. That is because OpenAI’s agent does not actually run on your computer, but rather somewhere in the cloud.
For my parking permit, I had to grant Operator permission to start various processes from time to time. It also stopped to ask me to fill in forms with personal information, such as my name, phone number and email address. At times, Operator also got lost, forcing me to take control of the browser and get the agent back on track.
In another test, I asked Operator to book me a table at a Greek restaurant. To its credit, Operator found a nice place in my area with reasonable prices. But I had to answer more than half a dozen questions along the way.
If you have to intervene six or more times just to make a reservation through an AI agent, at what point is it easier to do it yourself? This is a question I asked myself a lot while testing Operator.
Agent-as-a-Platform
In some of my tests, I ran into websites that blocked Operator for one reason or another. For example, I tried to book an electrician using TaskRabbit, but OpenAI’s agent told me it had run into an error and asked whether it could use an alternative service. Expedia, Reddit and YouTube also blocked the AI agent from accessing their platforms.
However, other services are embracing Operator with open arms. Instacart, Uber and eBay worked with OpenAI on Operator’s launch, allowing the agent to navigate their websites on people’s behalf.
These companies are preparing for a future in which a subset of user interactions is facilitated by an AI agent.
“Customers use Instacart through a variety of different entry points,” said Daniel Danker, head of product at Instacart, in an interview with TechCrunch. “We see Operator as possibly another of these entry points.”
Letting OpenAI’s agent use the Instacart website on a person’s behalf might seem to separate Instacart from its customers. However, Danker says Instacart wants to meet customers wherever they are.
In an interview with TechCrunch, eBay’s Nitzan Mekel-Bobrov said the company shares OpenAI’s belief that such systems will have a significant impact on the way consumers interact with digital properties.
Even as AI agents grow in popularity, Mekel-Bobrov says he expects users will always come to eBay’s website, noting that online destinations are not going anywhere.
Issues of trust
I had some trouble trusting Operator after it misled me a few times and almost cost me several hundred dollars.
For example, I asked the agent to find me a parking garage near my new apartment. It ended up suggesting two garages that it said would be a few minutes’ walk away.
Besides being out of my price range, the garages were actually quite far from my apartment. One was a 20-minute walk away, and the other was a 30-minute walk. It turned out that Operator had entered the wrong address.
That is exactly why OpenAI does not give its agent your credit card number, passwords or email access. If OpenAI had not let me intervene here, Operator would have wasted hundreds of dollars on a parking spot I didn’t need.
Hallucinations like this are a fundamental roadblock to truly useful autonomous agents, the kind that can take annoying tasks off your plate. No one will trust agents if they are prone to basic mistakes, especially mistakes with real-world consequences.
With Operator, OpenAI seems to have built some impressive tools for letting AI systems navigate the web. But these tools will not matter much until the underlying AI can reliably do what users ask of it. Until then, people will be stuck helping agents, not the other way around. And that kind of defeats the point.