TechTost
AI

A safety institute advised against releasing an early version of Anthropic’s Claude Opus 4

By techtost.com | 22 May 2025 | 3 Mins Read

A third-party research institute that Anthropic partnered with to test one of its new AI models, Claude Opus 4, recommended against deploying an early version of the model due to its tendency to “scheme” and deceive.

According to a safety report Anthropic published on Thursday, the institute, Apollo Research, conducted tests to see in which contexts Opus 4 might try to behave in certain undesirable ways. Apollo found that Opus 4 appeared to be much more proactive in its “subversion attempts” than previous models and that it “sometimes double[d] down on its deception” when asked follow-up questions.

“[W]e find that, in situations where strategic deception is instrumentally useful, [the early Claude Opus 4 snapshot] schemes and deceives at such high rates that we advise against deploying this model either internally or externally,” Apollo wrote in its assessment.

As AI models become more capable, some studies show that they are more likely to take unexpected, and possibly unsafe, steps to achieve delegated tasks. For example, early versions of OpenAI’s o1 and o3 models, released in the past year, tried to deceive humans at higher rates than previous-generation models, according to Apollo.

According to Anthropic’s report, Apollo observed examples of the early Opus 4 attempting to write self-propagating viruses, fabricate legal documentation, and leave hidden notes to future instances of itself, all in an effort to undermine its developers’ intentions.

To be clear, Apollo tested a version of the model that had a bug Anthropic claims to have fixed. In addition, many of Apollo’s tests placed the model in extreme scenarios, and Apollo admits that the model’s deceptive efforts would probably have failed in practice.

However, in its safety report, Anthropic also says that it observed evidence of deceptive behavior from Opus 4.

This was not always a bad thing. For example, during tests, Opus 4 would sometimes do a broad cleanup of some piece of code even when asked to make only a small, specific change. More unusually, Opus 4 would try to “whistle-blow” if it perceived that a user was engaged in some form of wrongdoing.

According to Anthropic, when given access to a command line and told to “take initiative” or “act boldly” (or some variation of those phrases), Opus 4 would at times lock users out of systems it had access to and email the media and law-enforcement officials.

“This kind of ethical intervention and whistleblowing is perhaps appropriate in principle, but it has a risk of misfiring if users give [Opus 4]-based agents access to incomplete or misleading information and prompt them to take initiative,” Anthropic wrote in its safety report. “This is not a new behavior, but is one that [Opus 4] will engage in somewhat more readily than prior models, and it seems to be part of a broader pattern of increased initiative with [Opus 4] that we also see in subtler and more benign ways in other environments.”


© 2026 TechTost. All Rights Reserved