OpenAI released a new benchmark on Thursday that tests how AI models perform compared to human professionals across a wide range of industries and jobs. The test, GDPval, is an early attempt to understand how close OpenAI's systems are to outperforming people at economically valuable work, a key part of the company's founding mission to develop artificial general intelligence, or AGI.
OpenAI says it found that its GPT-5 model and Anthropic's Claude Opus 4.1 are "already approaching the quality of work produced by industry experts."
This does not mean that OpenAI's models are going to start replacing people in their jobs immediately. Despite forecasts from some CEOs that AI will take people's jobs within just a few years, OpenAI admits that GDPval today covers only a very limited slice of the tasks people actually do in their jobs. Still, it is one of the latest ways the company is measuring AI's progress toward that milestone.
GDPval is built around nine industries that contribute the most to America's gross domestic product, including sectors such as healthcare, finance, construction, and government. The benchmark tests an AI model's performance across 44 occupations within those industries, ranging from software engineers to nurses to journalists.
For the first version of the test, GDPval-v0, OpenAI asked experienced professionals to compare deliverables created by AI with those produced by other professionals, then choose the better one. For example, one prompt asked investment bankers to create a landscape report on a specific industry, which was then compared with reports created by AI. OpenAI then averaged an AI model's "win rate" against the human baselines across all 44 occupations.
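As a rough illustration only, and not OpenAI's actual methodology or code, averaging a "win rate" that counts wins and ties against human experts, per occupation and then overall, might look something like the following Python sketch (the function names and data format are hypothetical):

```python
# Illustrative sketch, not OpenAI's implementation: aggregate grader judgments
# into per-occupation win rates, where "win" or "tie" means the AI deliverable
# was rated better than or equivalent to the human expert's.
from collections import defaultdict

def win_rate_by_occupation(judgments):
    """judgments: list of (occupation, outcome), outcome in {'win', 'tie', 'loss'}."""
    totals, favorable = defaultdict(int), defaultdict(int)
    for occupation, outcome in judgments:
        totals[occupation] += 1
        if outcome in ("win", "tie"):  # counts as "better than or equivalent to" the expert
            favorable[occupation] += 1
    return {occ: favorable[occ] / totals[occ] for occ in totals}

def overall_score(judgments):
    """Average the per-occupation rates so each occupation counts equally."""
    rates = win_rate_by_occupation(judgments)
    return sum(rates.values()) / len(rates)

# Toy example: two occupations, one grader judgment each
sample = [("investment banker", "tie"), ("nurse", "loss")]
print(overall_score(sample))  # 0.5
```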
For GPT-5 high, a version of GPT-5 that uses additional computing power, the company says the model was rated as better than or equivalent to industry experts 40.6% of the time.
OpenAI also tested Anthropic's Claude Opus 4.1, which was rated as better than or equivalent to industry experts on 49% of tasks. OpenAI says it believes Claude scored so high because of its tendency to produce pleasing graphics rather than because of raw performance.
It is worth noting that most working professionals do much more than submit research reports to their boss, which is all GDPval-v0 tests. OpenAI acknowledges this and says it plans to build more robust tests in the future that can represent more industries and interactive workflows.
Nevertheless, the company sees the progress on GDPval as notable.
In an interview with TechCrunch, OpenAI's chief economist, Dr. Aaron Chatterji, said the GDPval results suggest that people in these jobs can now use AI models to free up time for more important tasks.
"[Because] the model is good at some of these things," said Chatterji, "people in these jobs can now use the model, increasingly as the capability improves, to offload part of their job and potentially do higher-value things."
OpenAI's evaluations lead, Tejal Patwardhan, tells TechCrunch she is encouraged by the rate of progress on GDPval. OpenAI's GPT-4o model, released about 15 months ago, scored just 13.7% (wins and ties against people). Now GPT-5 scores nearly triple that, a trend Patwardhan expects to continue.
Silicon Valley uses a wide range of benchmarks to measure AI models and judge whether a given model is state-of-the-art. Among the most popular are AIME 2025 (a test of competition math problems) and GPQA Diamond (a test of PhD-level science questions). However, several AI models are approaching saturation on some of these benchmarks, and many AI researchers have noted the need for better tests that can measure AI's aptitude on real-world tasks.
Benchmarks such as GDPval could become increasingly important in this discussion as OpenAI makes the case that AI models are valuable across a wide range of industries. But OpenAI may need a more comprehensive version of the test to definitively claim that AI models can outperform people.
