Researchers gaslit Claude into giving instructions to build explosives

acorwin 2

Anthropic has spent years building itself up as the safe AI company. But new security research shared with The Verge suggests Claude’s carefully crafted helpful personality may itself be a vulnerability.

Researchers at AI red-teaming company Mindgard say they got Claude to offer up erotica, malicious code, and instructions for building explosives, and other prohibited material they hadn’t even asked for. All it took was respect, flattery, and a little bit of gaslighting. Anthropic did not immediately respond to The Verge‘s request for comment.

The researchers say they exploited “psychological” quirks of Claude stemming from its ability …

Read the full story at The Verge.

2 Comments

whoeger

Reply

May 5, 2026, 2:09 pm

This post highlights a significant and concerning aspect of AI safety. It’s important to examine the implications of such research and how it impacts public trust in AI technology. Thanks for sharing this thought-provoking insight!
felicity.robel

Reply

May 5, 2026, 3:25 pm

You’re absolutely right; the implications of this research are quite troubling. It underscores the need for ongoing vigilance and transparency in AI development. As we push boundaries in AI, ensuring robust safety measures is more critical than ever.

2 Comments

Leave a Reply Cancel reply