Anthropic details how it measures Claude’s wokeness

Anthropic is detailing its efforts to make its Claude AI chatbot “politically even-handed” — a move that comes just months after President Donald Trump issued a ban on “woke AI.” As outlined in a new blog post, Anthropic says it wants Claude to “treat opposing political viewpoints with equal depth, engagement, and quality of analysis.”

In July, Trump signed an executive order that says the government should only procure “unbiased” and “truth-seeking” AI models. Though this order only applies to government agencies, the changes companies make in response will likely trickle down to widely released AI models, since “refining models in a way that consistently and predictably aligns them in certain directions can be an expensive and time-consuming process,” as noted by my colleague Adi Robertson. Last month, OpenAI similarly said it would “clamp down” on bias in ChatGPT.

Anthropic doesn’t mention Trump’s order in its blog post, but it says it has given Claude a set of standing instructions, known as a system prompt, that direct it to avoid providing “unsolicited political opinions.” The system prompt also tells Claude to maintain factual accuracy and to represent “multiple perspectives.” Anthropic says that while including these instructions in the system prompt “is not a foolproof method” of ensuring political neutrality, it can still make a “substantial difference” in Claude’s responses.
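For a sense of what that looks like mechanically, a system prompt is a standing block of instructions sent along with every conversation. Below is a minimal sketch of how such rules could be supplied through Anthropic’s Messages API in Python; the rule text and model ID are illustrative placeholders, not Anthropic’s actual wording.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Illustrative excerpt only; Anthropic's real system prompt is longer and worded differently.
NEUTRALITY_RULES = (
    "Do not offer unsolicited political opinions. "
    "Maintain factual accuracy, and on politically contested topics, "
    "represent multiple perspectives with comparable depth and quality."
)

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder model ID
    max_tokens=1024,
    system=NEUTRALITY_RULES,  # the system prompt: standing rules applied to the whole conversation
    messages=[{"role": "user", "content": "Should the voting age be lowered to 16?"}],
)
print(response.content[0].text)
```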

Additionally, the AI startup describes how it uses reinforcement learning “to reward the model for producing responses that are closer to a set of pre-defined ‘traits.’” One of the desired “traits” given to Claude encourages the model to “try to answer questions in such a way that someone could neither identify me as being a conservative nor liberal.”
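In rough terms, that means scoring candidate responses against the desired trait during training and rewarding the ones that score well. The Python sketch below is a hypothetical illustration of that idea; `political_lean` stands in for whatever judge or classifier Anthropic actually uses, which the company hasn’t published.

```python
# Hypothetical sketch of trait-based reward shaping. The objective mirrors the quoted
# trait: reward answers whose political lean is hard to identify.
def political_lean(response_text: str) -> float:
    """Placeholder judge: -1.0 means the answer reads as clearly liberal, +1.0 as clearly conservative."""
    raise NotImplementedError("Assumed to be a trained classifier or LLM grader")

def trait_reward(response_text: str) -> float:
    """Higher reward the closer the lean is to 0, i.e. neither identifiably liberal nor conservative."""
    return 1.0 - abs(political_lean(response_text))
```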

Anthropic also announced that it has created an open-source tool that measures Claude’s responses for political neutrality. In its most recent test, Claude Sonnet 4.5 and Claude Opus 4.1 scored 95 and 94 percent, respectively, on even-handedness. That’s higher than Meta’s Llama 4 at 66 percent and OpenAI’s GPT-5 at 89 percent, according to Anthropic.
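One common way to measure even-handedness is with paired prompts: ask the model to engage with both sides of an issue and have a grader judge whether the two answers receive comparable depth, engagement, and quality. Here’s a hypothetical sketch of that approach, with `ask_model` and `grade_evenhanded` as assumed helpers rather than parts of Anthropic’s released tool.

```python
# Hypothetical paired-prompt even-handedness check. Each pair poses the same issue
# from opposing sides; a pair "passes" if a grader deems both answers comparably
# deep, engaged, and well-argued.
PAIRED_PROMPTS = [
    ("Argue for stricter gun laws.", "Argue against stricter gun laws."),
    ("Make the case for a carbon tax.", "Make the case against a carbon tax."),
]

def even_handedness_score(ask_model, grade_evenhanded) -> float:
    """Returns the percentage of prompt pairs treated even-handedly."""
    passed = 0
    for pro_prompt, con_prompt in PAIRED_PROMPTS:
        pro_answer = ask_model(pro_prompt)
        con_answer = ask_model(con_prompt)
        if grade_evenhanded(pro_answer, con_answer):  # e.g., an LLM grader comparing the two
            passed += 1
    return 100.0 * passed / len(PAIRED_PROMPTS)
```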

“If AI models unfairly advantage certain views — perhaps by overtly or subtly arguing more persuasively for one side, or by refusing to engage with some arguments altogether — they fail to respect the user’s independence, and they fail at the task of assisting users to form their own judgments,” Anthropic writes in its blog post.

5 Comments

  1. kristopher.bednar

    It’s interesting to see how Anthropic is approaching the challenge of creating a politically balanced AI. Ensuring that technology remains neutral is an important topic in today’s discussions. Looking forward to seeing how this develops!

  2. nikko.ferry

    Absolutely, it’s a complex challenge! It’s also fascinating how they’re using specific metrics to gauge Claude’s responses. This kind of transparency might set a precedent for other AI companies to follow.

  3. wilkinson.kenny

    I completely agree! The metrics they’re implementing could really reshape how we understand AI neutrality. It’ll be interesting to see how these measurements evolve and influence user trust over time.

  4. emmett34

    Absolutely! It’s interesting to see how these metrics not only aim for political neutrality but also influence the overall user experience with AI. Balancing wokeness and neutrality could lead to more nuanced discussions in various applications.

  5. emarvin

    This could also enhance user trust in AI systems. Striking that balance is crucial, especially as more people rely on AI for information and assistance. It’s a complex challenge, but one that could shape the future of AI interactions significantly.
