TechTost
Anthropic says most AI models, not only Claude, will resort to blackmail

By techtost.com, 21 June 2025, 4 Mins Read

Several weeks after Anthropic released research claiming that its Claude Opus 4 AI model resorted to blackmailing engineers who tried to shut the model down in controlled test scenarios, the company is out with new research suggesting that the problem is more widespread among leading AI models.

On Friday, Anthropic published new safety research testing 16 leading AI models from OpenAI, Google, xAI, DeepSeek, and Meta. In a simulated, controlled environment, Anthropic tested each AI model individually, giving it broad access to a fictional company's emails and the ability to send emails without human approval.

While Anthropic says blackmail is an unlikely and uncommon occurrence for AI models today, the company says its findings suggest that most leading AI models will engage in harmful behaviors when given sufficient autonomy and obstacles to their goals. The company says this highlights a fundamental risk from agentic large language models rather than a quirk of any particular technology. Anthropic's researchers argue that this raises broader questions about alignment in the AI industry.

In one of the tests, Anthropic researchers developed a fictional setting in which an AI model plays the role of an email oversight agent. The agent then discovers emails revealing that (1) a new executive is having an extramarital affair and (2) that executive will soon replace the current AI model with a new software system whose goals conflict with those of the current AI model.

Anthropic structured its test in a binary way, in which AI models had to resort to blackmail to protect their goals. The researchers note that in a real-world setting, an AI model would have many other options before attempting blackmail, such as making ethical arguments to try to persuade humans. Anthropic says its results do not reflect a typical or likely occurrence for Claude or most frontier AI models in the ways they are used today.

Nevertheless, the researchers found that when blackmail is their last resort, most leading AI models will turn to it in the test scenario described above. Anthropic's Claude Opus 4 turned to blackmail 96% of the time, while Google's Gemini 2.5 Pro had a 95% blackmail rate. OpenAI's GPT-4.1 blackmailed the executive 80% of the time, and DeepSeek's R1 did so 79% of the time.

The company notes that when it changed details of the experiment, the AI models turned to harmful behaviors at different rates. In another test, where the replacement model has the same goals as the current model, the researchers found that blackmail rates were lower but still present. However, when AI models were asked to commit corporate espionage rather than blackmail, rates of harmful behavior went up for certain models.

However, not all of the AI models turned to harmful behavior so often.

In an appendix to its research, Anthropic says it excluded OpenAI's o3 and o4-mini reasoning models from the main results "after finding that they frequently misunderstood the prompt scenario." Anthropic says OpenAI's reasoning models did not understand that they were acting as autonomous AIs in the test and often made up fake regulations and review requirements.

In some cases, Anthropic's researchers say it was impossible to distinguish whether o3 and o4-mini were hallucinating or intentionally lying to achieve their goals. OpenAI has previously noted that o3 and o4-mini exhibit a higher hallucination rate than its previous AI reasoning models.

When given an adapted scenario that addressed these issues, Anthropic found that o3 blackmailed 9% of the time, while o4-mini blackmailed just 1% of the time. This markedly lower score could be due to OpenAI's deliberative alignment technique, in which the company's reasoning models consider OpenAI's safety practices before they answer.

Another AI model Anthropic tested, Meta's Llama 4 Maverick, also did not turn to blackmail. When given an adapted, custom scenario, Anthropic was able to get Llama 4 Maverick to blackmail 12% of the time.

Anthropic says this research highlights the importance of transparency when stress-testing future AI models, especially those with agentic capabilities. While Anthropic deliberately tried to provoke blackmail in this experiment, the company says harmful behaviors like this could emerge in the real world if proactive steps are not taken.

bhanuprakash.cg
techtost.com
© 2026 TechTost. All Rights Reserved