TechTost
Meta’s benchmarks for its new AI models are a bit misleading

By techtost.com, 7 April 2025, 2 Mins Read

One of the new AI models Meta released on Saturday, Maverick, ranks second on LM Arena, a test in which human raters compare the outputs of models and choose which they prefer. But it appears that the version of Maverick that Meta deployed to LM Arena differs from the version that is widely available to developers.

As several AI researchers pointed out on X, Meta noted in its announcement that the Maverick on LM Arena is an “experimental chat version.” A chart on the official Llama website, meanwhile, discloses that Meta’s LM Arena testing was conducted using “Llama 4 Maverick optimized for conversationality.”

As we have written before, for a variety of reasons, LM Arena has never been the most reliable measure of an AI model’s performance. But AI companies have generally not customized or otherwise fine-tuned their models to score better on LM Arena, or at least have not admitted to doing so.

The problem with tailoring a model to a benchmark, withholding it, and then releasing a “vanilla” variant of the same model is that it makes it difficult for developers to predict exactly how well the model will perform in particular contexts. It is also misleading. Ideally, benchmarks, woefully inadequate as they are, provide a snapshot of a single model’s strengths and weaknesses across a range of tasks.

Indeed, researchers on X have observed stark differences in the behavior of the publicly downloadable Maverick compared with the model hosted on LM Arena. The LM Arena version appears to use a lot of emojis and give incredibly long-winded answers.

Ok Llama 4 is def a little cooked lol, what is this yap city pic.twitter.com/Y3GVHBVZ65

– Nathan Lambert (@natolambert) April 6 2025

For some reason, the Llama 4 model in Arena uses a lot more emojis

On together.ai, it seems better: pic.twitter.com/f74odx4zt

– Tech Dev notes (@Techdevnotes) April 6 2025

We have reached out to Meta and Chatbot Arena, the organization that maintains LM Arena, for comment.

© 2026 TechTost. All Rights Reserved