Thoughts on AI Image Generation Benchmarks: WHY ARE YOU LIKE THIS

simonwillison.net · 2026-04-25 · Excerpt

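In a discussion of the pelican riding a bicycle benchmark, one reply suggested stacking more varied test cases on top of the existing one. The post shows an AI-generated image: a pelican riding a bicycle along a dirt road, followed by a police car. The pelican looks terrified, possibly because an astronaut (strangely, with prehensile toes) is riding too. The example prompted further discussion of what AI image generators can do and how they should be tested.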

Simon Willison

25th April 2026

@scottjla on Twitter in reply to my pelican riding a bicycle benchmark:

I feel like we need to stack these tests now

I checked whether the model (ChatGPT Images 2.0) had added the "WHY ARE YOU LIKE THIS" sign of its own accord, and it had. The prompt Scott used was:

Create an image of a horse riding an astronaut, where the astronaut is riding a pelican that is riding a bicycle. It looks very chaotic but they all just manage to balance on top of each other

Posted 25th April 2026 at 4:44 pm
