TechTost
OpenAI’s o3 AI model scores lower on a benchmark than the company initially implied

By techtost.com | 21 April 2025 | 4 Mins Read

A discrepancy between first- and third-party benchmark results for OpenAI’s o3 AI model is raising questions about the company’s transparency and model testing practices.

When OpenAI unveiled o3 in December, the company claimed that the model could answer just over a quarter of the questions on FrontierMath, a difficult set of math problems. That score blew the competition away: the next-best model managed to answer only about 2% of FrontierMath problems correctly.

“Today, all offerings out there have less than 2% [on FrontierMath],” Mark Chen, head of research at OpenAI, said during a livestream. “We’re seeing [internally], with o3 in aggressive test-time compute settings, we’re able to get over 25%.”

As it turns out, that figure was likely an upper bound, achieved by a version of o3 with more computing behind it than the model OpenAI publicly launched last week.

Epoch AI, the research institute behind FrontierMath, released the results of its independent benchmark tests of o3 on Friday. Epoch found that o3 scored about 10%, well below OpenAI’s highest claimed score.

OpenAI has released o3, their highly anticipated reasoning model, along with o4-mini, a smaller and cheaper model that succeeds o3-mini.

We evaluated the new models on our suite of math and science benchmarks. Results in thread! pic.twitter.com/5gbtzkey1b

– Epoch AI (@epochairesearch) April 18, 2025

That does not mean OpenAI lied, per se. The benchmark results the company published in December show a lower-bound score that matches the score Epoch observed. Epoch also noted that its testing setup likely differs from OpenAI’s, and that it used an updated release of FrontierMath for its evaluations.

“The difference between our results and OpenAI’s might be due to OpenAI evaluating with a more powerful internal scaffold, using more test-time [computing], or because those results were run on a different subset of FrontierMath (the 180 problems in frontiermath-2024-11-26 vs the 290 problems in frontiermath-2025-02-28-private),” wrote Epoch.

According to a post on X from the ARC Prize Foundation, an organization that tested a pre-release version of o3, the public o3 model “is a different model […] tuned for chat/product use,” corroborating Epoch’s report.

“All released o3 compute tiers are smaller than the version we [benchmarked],” wrote ARC Prize. Generally speaking, bigger compute tiers can be expected to achieve better benchmark scores.

Re-testing released o3 on ARC-AGI-1 will take a day or two. Because today’s release is a materially different system, we are re-labeling our past reported results as “preview”:

o3-preview (low): 75.7%, $200/task
o3-preview (high): 87.5%, $34.4k/task

Above uses o1 pro pricing …

– Mike Knoop (@mikeknoop) April 16, 2025

OpenAI’s own Wenda Zhou, a member of the technical staff, said during a livestream last week that the o3 in production is “more optimized for real-world use cases” and speed compared with the version of o3 demoed in December. As a result, it may exhibit benchmark “disparities,” he added.

“[W]e’ve done [optimizations] to make the [model] more cost efficient [and] more useful in general,” Zhou said. “We still hope – we still think that – this is a much better model […] You won’t have to wait as long when you’re asking for an answer, which is a real thing with these [types of] models.”

Granted, the fact that the public release of o3 falls short of OpenAI’s testing promises is something of a moot point, since the company’s o3-mini-high and o4-mini models outperform o3 on FrontierMath, and OpenAI plans to debut a more powerful o3 variant, o3-pro, in the coming weeks.

It is, however, another reminder that AI benchmarks are best not taken at face value, particularly when the source is a company with services to sell.

Benchmarking “controversies” are becoming a common occurrence in the AI industry as vendors race to capture headlines and mindshare with new models.

In January, Epoch was criticized for waiting to disclose funding from OpenAI until after the company announced o3. Many academics who contributed to FrontierMath were not informed of OpenAI’s involvement until it was made public.

More recently, Elon Musk’s xAI was accused of publishing misleading benchmark charts for its latest AI model, Grok 3. Just this month, Meta admitted to touting benchmark scores for a version of a model that differed from the one the company made available to developers.

Updated 4:21 p.m. Pacific: Added comments from Wenda Zhou, a member of the OpenAI technical staff, from a livestream last week.

bhanuprakash.cg