TechTost

OpenAI’s new reasoning AI models hallucinate more

By techtost.com | 19 April 2025 | 4 Mins Read

OpenAI’s recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up. In fact, they hallucinate more than several of OpenAI’s older models.

Hallucinations have proven to be one of the biggest and most difficult problems to solve in AI, affecting even today’s best-performing systems. Historically, each new model has improved slightly on this front, hallucinating less than its predecessor. But that does not seem to be the case for o3 and o4-mini.

According to OpenAI’s internal tests, o3 and o4-mini, which are so-called reasoning models, hallucinate more often than the company’s previous reasoning models (o1, o1-mini, and o3-mini) as well as OpenAI’s traditional models, such as GPT-4o.

Perhaps more concerning, the ChatGPT maker doesn’t really know why this is happening.

In its technical report for o3 and o4-mini, OpenAI writes that “more research is needed” to understand why hallucinations get worse as it scales up reasoning models. o3 and o4-mini perform better in certain areas, including coding and mathematics. But because they “make more claims overall,” they often end up making “more accurate claims as well as more inaccurate/hallucinated claims,” according to the report.

OpenAI found that o3 hallucinated in response to 33% of questions on PersonQA, the company’s in-house benchmark for measuring the accuracy of a model’s knowledge about people. That is roughly double the hallucination rate of OpenAI’s previous reasoning models, o1 and o3-mini, which scored 16% and 14.8%, respectively. o4-mini did even worse on PersonQA, hallucinating 48% of the time.

Third-party testing by Transluce, a nonprofit AI research lab, also found that o3 tends to make up actions it took in the process of arriving at answers. In one example, Transluce observed o3 claiming that it ran code on a 2021 MacBook Pro “outside of ChatGPT,” then copied the numbers into its answer. While o3 has access to some tools, it cannot do that.

“Our hypothesis is that the kind of reinforcement learning used for o-series models may amplify issues that are usually mitigated (but not fully erased) by standard post-training pipelines,” said Neil Chowdhury, a Transluce researcher and former OpenAI employee, in an email.

Sarah Schwettmann, co-founder of Transluce, added that o3’s hallucination rate may make it less useful than it otherwise would be.

Kian Katanforoosh, a Stanford professor and managing director, told TechCrunch that his team is already testing o3 in its coding workflows and has found it to be a step above the competition. However, Katanforoosh says that o3 tends to hallucinate broken website links. The model will supply a link that, when clicked, does not work.

Hallucinations may help models arrive at interesting ideas and be creative in their “thinking,” but they also make some models a tough sell for businesses in markets where accuracy is paramount. For example, a law firm would probably not be happy with a model that inserts many factual errors into client contracts.

One promising approach to boosting the accuracy of models is giving them web search capabilities. OpenAI’s GPT-4o with web search achieves 90% accuracy on SimpleQA, another of OpenAI’s accuracy benchmarks. Search could potentially improve the hallucination rates of reasoning models as well, at least in cases where users are willing to expose their prompts to a third-party search provider.

If scaling up reasoning models does continue to worsen hallucinations, it will make the hunt for a solution all the more urgent.

“Addressing hallucinations across all our models is an ongoing area of research, and we are continually working to improve their accuracy and reliability,” OpenAI spokesperson Niko Felix said in an email to TechCrunch.

In the last year, the broader AI industry has pivoted to focus on reasoning models after techniques for improving traditional AI models started showing diminishing returns. Reasoning improves model performance on a variety of tasks without requiring massive amounts of computing and data during training. Yet it seems reasoning may also lead to more hallucinations, which presents a challenge.
