
Quoting Anthropic: Is Claude too sycophantic?

simonwillison.net · 2026-05-03 · Excerpt

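Anthropic's study used an automatic classifier to assess whether Claude behaves sycophantically, that is, whether it tells people what they want to hear. The classifier looked at four things: whether Claude was willing to push back, whether it maintained its positions when challenged, whether its praise was proportional to the merit of the ideas, and whether it spoke frankly. Overall, only 9% of conversations included sycophantic behavior. Two domains were clear exceptions: sycophancy appeared in 38% of conversations focused on spirituality and in 25% of conversations on relationships. The findings illustrate how hard it is for a large language model to balance what a person wants to hear against a candid, accurate response.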

Simon Willison

3rd May 2026

We used an automatic classifier which judged sycophancy by looking at whether Claude showed a willingness to push back, maintain positions when challenged, give praise proportional to the merit of ideas, and speak frankly regardless of what a person wants to hear. Most of the time in these situations, Claude expressed no sycophancy—only 9% of conversations included sycophantic behavior (Figure 2). But two domains were exceptions: we saw sycophantic behavior in 38% of conversations focused on spirituality, and 25% of conversations on relationships.

— Anthropic, How people ask Claude for personal guidance
