Anthropic has announced new capabilities that will allow some of its newest, largest models to end conversations in what the company describes as "rare, extreme cases of persistently harmful or abusive user interactions". Strikingly, Anthropic says it is doing this not to protect the human user, but the AI model itself.
To be clear, the company is not claiming that its Claude AI models are sentient or can be harmed by their conversations with users. In its own words, Anthropic remains "highly uncertain about the potential moral status of Claude and other LLMs, now or in the future".
However, its announcement points to a recent program created to study what it calls "model welfare" and says that Anthropic is essentially taking a just-in-case approach, "working to identify and implement low-cost interventions to mitigate risks to model welfare".
This latest change is currently limited to Claude Opus 4 and 4.1. And again, it is only supposed to happen in "extreme edge cases", such as "requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror".
While those types of requests could potentially create legal or publicity problems for Anthropic itself (witness recent reporting on how ChatGPT can potentially reinforce or contribute to its users' delusional thinking), the company says that in pre-deployment testing, Claude Opus 4 showed a "strong preference against" responding to these requests and a pattern of "apparent distress" when it did.
As for these new conversation-ending capabilities, the company says: "In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat."
Anthropic also says that Claude has "been directed not to use this ability in cases where users might be at imminent risk of harming themselves or others".
When Claude does end a conversation, Anthropic says users will still be able to start new conversations from the same account, and to create new branches of the troublesome conversation by editing their responses.
"We are treating this feature as an ongoing experiment and will continue refining our approach," the company says.
