TechTost

Meta’s benchmarks for its new AI models are a bit misleading

By techtost.com | 7 April 2025 | 2 Mins Read

One of the new AI models Meta released on Saturday, Maverick, ranks second on LM Arena, a test in which human raters compare the outputs of models and choose which they prefer. But it appears that the version of Maverick that Meta deployed to LM Arena differs from the version that is widely available to developers.

As several AI researchers pointed out on X, Meta noted in its announcement that the Maverick on LM Arena is an “experimental chat version.” A chart on the official Llama website, meanwhile, discloses that Meta’s LM Arena testing was conducted using “Llama 4 Maverick optimized for conversationality.”

As we have written before, for a variety of reasons, LM Arena has never been the most reliable measure of an AI model’s performance. But AI companies generally have not customized or otherwise fine-tuned their models to score better on LM Arena, or at least have not admitted to doing so.

The problem with tailoring a model to a benchmark, withholding it, and then releasing a “vanilla” variant of the same model is that it makes it difficult for developers to predict exactly how well the model will perform in particular contexts. It is also misleading. Ideally, benchmarks, woefully inadequate as they are, provide a snapshot of a single model’s strengths and weaknesses across a range of tasks.

Indeed, researchers on X have observed stark differences in the behavior of the publicly downloadable Maverick compared with the model hosted on LM Arena. The LM Arena version appears to use a lot of emojis and give incredibly long-winded answers.

Okay Llama 4 is def a littled cooked lol, what is this yap city pic.twitter.com/Y3GVHBVZ65

– Nathan Lambert (@natolambert) April 6, 2025

For some reason, the Llama 4 model in Arena uses a lot more emojis. On together.ai, it seems better: pic.twitter.com/f74odx4zt

– Tech Dev Notes (@Techdevnotes) April 6, 2025

We have reached out to Meta and Chatbot Arena, the organization that maintains LM Arena, for comment.

bhanuprakash.cg
techtost.com