Cybersecurity researchers not happy with guardrails in Anthropic’s Fable

Anthropic released its latest Fable model on Tuesday, calling it a public, limited version of its powerful and much-hyped Mythos cybersecurity model.

But not everyone is happy with the restrictions, and a number of cybersecurity researchers and professionals have voiced complaints online.

“[Fable] rejects any request that could be tangentially related to cyberspace. Even innocuous tasks like reading a blog post,” said Valentina “Chompie” Palmiotti, a well-known security researcher working at IBM X-Force.

When a message triggers its guardrails, Fable stops the conversation and says “security has flagged this message for cybersecurity or biological issues.”

The guardrails were put in place to limit the risk of Fable being used to develop malware or hack software, a long-standing concern within Anthropic. The biology restrictions stem from a similar concern about the development of biological weapons.

When the AI giant launched Mythos in April, it restricted the model to a small number of companies and organizations in what it called Project Glasswing, an effort to use the model to secure important software and infrastructure. Last week, Anthropic expanded access to Mythos to hundreds of organizations in 15 countries.

However, despite the good intentions, many cyber experts are still put off by the seemingly arbitrary nature of the restrictions. Matt Suiche, a cybersecurity veteran, told TechCrunch that “if you ask it to write secure code, it assumes it’s cybersecurity work instead of software engineering best practices, and you’re going to be downgraded.” (Fable is programmed to fall back to Claude Opus 4.8 if it hits a guardrail.) “It appears to be keyword-based, so anything in the lexical field of ‘cybersecurity’ triggers the guardrails,” Suiche said.

Contact us

Do you have more information about how hackers use artificial intelligence? Or how cybersecurity companies use artificial intelligence? We would love to hear from you. From a non-work device and network, you can contact Lorenzo Franceschi-Bicchierai securely on Signal at +1 917 257 1382, or via Telegram and Keybase @lorenzofb, or by email.

“But it’s understandable as we’re still in the early days and they’re still adjusting their guardrails. I’m sure they’ll evolve over time as Anthropic and other frontier model companies work more with the current new generation of cybersecurity companies,” said Suiche, a member of the technical staff at Tolmo, an AI cybersecurity startup. “It’s better to catch more people than not when you do a release like this and loosen the guardrails over time.”

Another researcher wrote on X that “even asking for a code review” triggers Fable’s guardrails.

Anthropic did not immediately respond to a request for comment.

In addition to the guardrails inside its models, Anthropic requires cybersecurity professionals to apply to its Cyber Verification Program. If approved, applicants face fewer restrictions when using Claude for cyber work. OpenAI has a similar program called Trusted Access for Cyber.
