"They screwed us": Personality clashes sent Anthropic's models offline

simonwillison.net · 2026-06-15 · Excerpt

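Axios reports in depth on the behind-the-scenes conflict and competing interests between Anthropic and the US government over export control policy. Citing many sources inside the government and close to Anthropic, the article describes how personality clashes and diverging interests led to Claude models going offline for a time. The episode reflects the severe tension that leading AI companies face between sustaining commercial expansion and satisfying national security regulators. The author uses it to illustrate the disorder and power friction in current AI policymaking.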

Simon Willison

15th June 2026 - Link Blog

"They screwed us": Personality clashes sent Anthropic's models offline. Lots of "source familiar with the administration's thinking" and "source close to Anthropic" in this Axios piece, which is the best collection of behind-the-scenes gossip I've seen about the US government export control Mythos/Fable story so far.

Logan Graham ("I lead the Frontier Red Team at Anthropic"), Dave Orr (Head of Safeguards, previously a Director of Engineering at Google DeepMind), and blog favorite Nicholas Carlini are reported to be meeting with the Commerce Department today in D.C. Good luck to them!

(I just noticed Logan was "Special Adviser to the Prime Minister" in the Boris Johnson era, covering AI, science, and technology policy - so significant political experience.)

This closing note doesn't give me much optimism that we'll be getting Fable back any time soon:

The bottom line: One option is to make sure Anthropic's models can't be jailbroken — though perfect jailbreak resistance may be impossible. Absent that, a source familiar with the administration's thinking said it may simply come down to an attitude fix where, instead of feeling dismissed, "everyone feels safe, secure and happy."

This made me wonder if Anthropic ever successfully addressed the class of attacks described in the Universal and Transferable Adversarial Attacks on Aligned Language Models paper from 2023.

It looks like their Constitutional Classifiers work (that post is from January this year) is relevant to that. They continue to claim that no "universal jailbreak" has been found against Claude Mythos, classifying the jailbreak that triggered the US government response as "a potential narrow, non-universal jailbreak".
