TechTost
Anthropic wants to fund a new, more comprehensive generation of AI benchmarks

By techtost.com | 2 July 2024 | 4 Mins Read

Anthropic is launching a program to fund the development of new types of benchmarks capable of evaluating the performance and impact of AI models, including generative models like its own Claude.

Anthropic’s program, unveiled Monday, will make payments to third-party organizations that can, as the company puts it in a blog post, “effectively measure advanced capabilities in artificial intelligence models.” Interested parties can submit applications, which will be evaluated on a rolling basis.

“Our investment in these assessments is intended to elevate the entire field of AI safety, providing valuable tools that benefit the entire ecosystem,” Anthropic wrote on its official blog. “Developing high-quality, safety-relevant assessments remains a challenge, and demand outstrips supply.”

As we’ve pointed out before, AI has a benchmarking problem. The most commonly cited AI benchmarks today do a poor job of capturing how the average person actually uses the systems under review. There are also questions about whether some benchmarks, particularly those released before the dawn of modern generative AI, even measure what they are supposed to measure, given their age.

The very high-level, harder-than-it-sounds solution Anthropic proposes is to create challenging benchmarks with an emphasis on AI security and societal impact through new tools, infrastructure and methods.

The company specifically requests tests that assess a model’s ability to perform tasks such as carrying out cyberattacks, “enhancing” weapons of mass destruction (e.g. nuclear weapons), and manipulating or deceiving people (e.g. via deepfakes or disinformation). For AI risks related to national security and defense, Anthropic says it’s committed to developing some kind of “early warning system” to identify and assess risks, though it doesn’t reveal in the blog post what such a system might entail.

Anthropic also says it intends its new program to support research into benchmarks and “end-to-end” tasks that explore the potential of artificial intelligence to aid scientific study, converse in multiple languages, and mitigate entrenched biases, as well as to self-censor toxicity.

To achieve all this, Anthropic envisions new platforms that allow subject matter experts to develop their own assessments and large-scale model tests involving “thousands” of users. The company says it has hired a full-time coordinator for the program and may buy or expand projects it believes have the potential to scale.

“We offer a range of financing options tailored to the needs and stage of each project,” Anthropic writes in the post, though an Anthropic spokesperson declined to elaborate on those options. “Teams will have the opportunity to interact directly with Anthropic’s domain experts from the frontier red team, fine-tuning, trust and safety and other relevant teams.”

Anthropic’s effort to support new AI benchmarks is commendable, assuming, of course, that there’s enough cash and manpower behind it. But given the company’s commercial ambitions in the AI race, it may be a hard one to fully trust.

In the blog post, Anthropic is rather transparent about the fact that it wants some of the assessments it funds to align with the AI safety classifications it developed (with some input from third parties, such as the non-profit AI research organization METR). This is within the company’s prerogative. But it may also force applicants to the program to accept definitions of “safe” or “dangerous” AI with which they may not agree.

A portion of the AI community is also likely to take issue with Anthropic’s references to “catastrophic” and “deceptive” AI risks, such as nuclear weapons risks. Many experts say there’s little evidence to suggest that AI as we know it will gain world-ending, human-outsmarting capabilities anytime soon, if ever. Claims of imminent “superintelligence” only serve to draw attention away from the pressing AI regulatory issues of the day, such as AI’s hallucinatory tendencies, these experts add.

In its post, Anthropic writes that it hopes its program will serve as a “catalyst for progress toward a future where comprehensive AI assessment is an industry standard.” That is a mission the many open, corporate-unaffiliated efforts to create better AI benchmarks can identify with. But it remains to be seen whether those efforts are willing to join forces with an AI vendor whose loyalty ultimately rests with shareholders.

© 2026 TechTost. All Rights Reserved