TechTost

Did xAI lie about Grok 3's benchmarks?

By techtost.com | 23 February 2025 | 3 Mins Read

Debates over AI benchmarks, and how AI labs report them, are spilling out into public view.

This week, an OpenAI employee accused Elon Musk's AI company, xAI, of publishing misleading benchmark results for its latest AI model, Grok 3. One of xAI's co-founders, Igor Babushkin, insisted that the company was in the right.

The truth is somewhere in between.

In a post on xAI's blog, the company published a graph showing Grok 3's performance on AIME 2025, a collection of challenging math questions from a recent invitational mathematics exam. Some experts have questioned AIME's validity as an AI benchmark. Nevertheless, AIME 2025 and older versions of the test are commonly used to probe a model's math ability.

xAI's graph showed two variants of Grok 3, Grok 3 Reasoning Beta and Grok 3 mini Reasoning, beating OpenAI's best-performing available model, o3-mini-high, on AIME 2025. But the graph left out o3-mini-high's AIME 2025 score at "cons@64."

What is cons@64, you might ask? It is short for "consensus@64": the model gets 64 tries at each problem in a benchmark, and the answer it generates most often is taken as its final answer. As you can imagine, cons@64 tends to boost a model's benchmark scores quite a bit, and omitting it from a graph can make it look as though one model surpasses another when that is not actually the case.
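To make the difference between the two scoring modes concrete, here is a minimal sketch in Python. The function names, the toy answers, and the two-problem "benchmark" are all hypothetical and chosen only for illustration; they are not xAI's or OpenAI's evaluation code or data.

```python
from collections import Counter

def score_at_1(samples_per_problem, answers):
    """Fraction of problems where the FIRST sampled answer is correct ("@1")."""
    correct = sum(samples[0] == answer
                  for samples, answer in zip(samples_per_problem, answers))
    return correct / len(answers)

def score_cons_at_k(samples_per_problem, answers, k=64):
    """Fraction of problems where the most common of the first k samples is correct."""
    correct = 0
    for samples, answer in zip(samples_per_problem, answers):
        # Majority vote: pick the answer that appears most often among k tries.
        majority, _count = Counter(samples[:k]).most_common(1)[0]
        correct += majority == answer
    return correct / len(answers)

# Hypothetical toy data: 2 problems, 64 sampled answers each.
answers = ["42", "10"]
samples = [
    ["7"] + ["42"] * 40 + ["7"] * 23,  # first try is wrong, majority is right
    ["10"] * 64,                       # always right
]

print(score_at_1(samples, answers))       # 0.5
print(score_cons_at_k(samples, answers))  # 1.0
```

In this toy case the same model scores 50% at @1 and 100% at cons@64, which is why comparing one model's cons@64 score against another model's @1 score is not a like-for-like comparison.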

Grok 3 Reasoning Beta's and Grok 3 mini Reasoning's scores for AIME 2025 at "@1" (meaning the first score the models got on the benchmark) fall below o3-mini-high's score. Grok 3 Reasoning Beta also trails ever so slightly behind OpenAI's o1 model set to "medium" computing. Yet xAI is advertising Grok 3 as the "world's smartest AI."

Babushkin, the xAI co-founder, argued in a post on X that OpenAI has published similarly misleading benchmark charts in the past, albeit charts comparing the performance of its own models. A more neutral party in the debate put together a more "accurate" graph showing nearly every model's performance at cons@64:

Hilarious how some people see my plot as an attack on OpenAI and others as an attack on Grok, while in reality it's DeepSeek propaganda
(I actually believe Grok looks good there, and OpenAI's TTC chicanery behind o3-mini-*high*-pass@"1" deserves more scrutiny.) pic.twitter.com/3Wh8Foufic

– Teortaxes, February 20, 2025

But as AI researcher Nathan Lambert pointed out in a post, perhaps the most important metric remains a mystery: the computational (and monetary) cost it took for each model to achieve its best score. That just goes to show how little most AI benchmarks communicate about models' limitations, and their strengths.
