
OpenAI’s o3 AI model scores lower on a benchmark than the company initially implied

By techtost.com, 21 April 2025, 4 min read

A discrepancy between first- and third-party benchmark results for OpenAI’s o3 AI model is raising questions about the company’s transparency and model testing practices.

When OpenAI unveiled o3 in December, the company claimed the model could answer just over a quarter of the questions on FrontierMath, a challenging set of math problems. That score blew away the competition: the next-best model managed to answer only around 2% of FrontierMath problems correctly.

“Today, all offerings out there have less than 2% [on FrontierMath],” Mark Chen, head of research at OpenAI, said during a livestream. “We’re seeing [internally], with o3 in aggressive test-time compute settings, we’re able to get over 25%.”

As it turns out, that figure was likely an upper bound, achieved by a version of o3 with more computing behind it than the model OpenAI publicly launched last week.

Epoch AI, the research institute behind FrontierMath, released the results of its independent benchmark tests of o3 on Friday. Epoch found that o3 scored around 10%, well below OpenAI’s highest claimed score.

OpenAI has released o3, their long-awaited reasoning model, along with o4-mini, a smaller and cheaper model that succeeds o3-mini.

We evaluated the new models on our suite of math and science benchmarks. Results in thread! pic.twitter.com/5gbtzkey1b

– Epoch AI (@epochairesearch) April 18, 2025

That doesn’t mean OpenAI lied, per se. The benchmark results the company published in December show a lower-bound score that matches the score Epoch observed. Epoch also noted that its testing setup likely differs from OpenAI’s, and that it used an updated release of FrontierMath for its evaluations.

“The difference between our results and OpenAI’s might be due to OpenAI evaluating with a more powerful internal scaffold, using more test-time [computing], or because those results were run on a different subset of FrontierMath (the 180 problems in Frontiermath-2024-11-26 vs. the 290 problems in Frontiermath-2025-02-28-Private),” wrote Epoch.

According to a post on X from the ARC Prize Foundation, an organization that tested a pre-release version of o3, the public o3 model “is a different model […] tuned for chat/product use,” corroborating Epoch’s report.

“All released o3 compute tiers are smaller than the version we [benchmarked],” wrote ARC Prize. Generally speaking, bigger compute tiers can be expected to achieve better benchmark scores.

Re-testing released o3 on ARC-AGI-1 will take a day or two. Because today’s release is a materially different system, we are re-labeling our past reported results as “preview”:

o3-preview (low): 75.7%, $200/task
o3-preview (high): 87.5%, $34.4k/task

Above uses o1 pro pricing …

– Mike Knoop (@mikeknoop) April 16, 2025

OpenAI’s own Wenda Zhou, a member of the technical staff, said during a livestream last week that the o3 in production is “more optimized for real-world use cases” and speed compared with the version of o3 demoed in December. As a result, it may exhibit benchmark “disparities,” he added.

“[W]e’ve done [optimizations] to make the [model] more cost efficient [and] more useful in general,” Zhou said. “We still hope that, we still think that, this is a much better model […] You won’t have to wait as long when you’re asking for an answer, which is a real thing with these [types of] models.”

Granted, the fact that the public release of o3 falls short of OpenAI’s testing promises is something of a moot point, since the company’s o3-mini-high and o4-mini models outperform o3 on FrontierMath, and OpenAI plans to debut a more powerful o3 variant, o3-pro, in the coming weeks.

It is, however, another reminder that AI benchmarks are best not taken at face value, particularly when the source is a company with services to sell.

Benchmarking “controversies” are becoming a common occurrence in the AI industry as vendors race to capture headlines and mindshare with new models.

In January, Epoch was criticized for waiting to disclose funding from OpenAI until after the company announced o3. Many academics who contributed to FrontierMath were not informed of OpenAI’s involvement until it was made public.

More recently, Elon Musk’s xAI was accused of publishing misleading benchmark charts for its latest AI model, Grok 3. Just this month, Meta admitted to touting benchmark scores for a version of a model that differed from the one the company made available to developers.

Updated 4:21 p.m. Pacific: Added comments from Wenda Zhou, a member of the OpenAI technical staff, from a livestream last week.

bhanuprakash.cg
techtost.com
