
Using DSPy to evaluate and improve Datasette Agent's SQL system prompts

simonwillison.net · 2026-07-02 · Excerpt

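The DSPy framework can be used to evaluate and optimize the system prompts in LLM applications. The author used DSPy to test and improve Datasette Agent's SQL-generation prompts, with the aim of making its database queries more accurate. The approach shows how automated, programmatic evaluation can replace manual trial and error, giving developers a more efficient pattern for prompt engineering.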


Simon Willison

Research: Using DSPy to evaluate and improve Datasette Agent's SQL system prompts. Leveraging the DSPy framework, this project evaluates and refines the core production system prompts used by Datasette Agent’s read-only SQL question answerer. The methodology involves a harness where DSPy agents invoke Datasette Agent’s actual tool implementations and prompts against a live in-process Datasette, and a gold-standard, auto-generated dataset provides rigorous evaluation via custom metrics.
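The excerpt does not say which custom metrics the project used. As a rough illustration only, here is a minimal sketch of one common choice for text-to-SQL evaluation: an execution-match metric that runs the predicted SQL and the gold SQL and compares the rows. It follows the `(example, pred, trace=None)` calling convention that DSPy metrics use, but it does not import DSPy. The field names `conn`, `gold_sql` and `sql` are hypothetical, and an in-memory SQLite database stands in for the live Datasette instance.

```python
import sqlite3
from types import SimpleNamespace


def run_query(conn, sql):
    """Run a query and return its rows sorted, or None if the SQL fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort by repr so row order is ignored and NULLs compare safely.
    return sorted(rows, key=repr)


def execution_match(example, pred, trace=None):
    """Score 1.0 if the predicted SQL returns the same rows as the gold SQL."""
    gold = run_query(example.conn, example.gold_sql)
    got = run_query(example.conn, pred.sql)
    return 1.0 if gold is not None and got == gold else 0.0


# Hypothetical stand-in for the real database behind Datasette.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, pages INTEGER);
INSERT INTO books (title, pages) VALUES ('A', 100), ('B', 250), ('C', 400);
""")

# Hypothetical gold example and two candidate predictions.
example = SimpleNamespace(
    conn=conn,
    gold_sql="SELECT title FROM books WHERE pages > 200",
)
good = SimpleNamespace(sql="SELECT title FROM books WHERE pages >= 250")
# This prediction guesses a column name that does not exist.
bad = SimpleNamespace(sql="SELECT title FROM books WHERE page_count > 200")

print(execution_match(example, good))
print(execution_match(example, bad))
```

Comparing result rows rather than SQL text means two differently written but equivalent queries both score as correct, while a query that fails on a guessed column name scores zero.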

One of this morning's AIE keynotes covered dspy, which reminded me I've been meaning to see if it could help me improve the system prompt used by Datasette Agent - so I fired off an asynchronous research task in Claude Code for web using Claude Fable 5:

Pip install the latest Datasette alpha and datasette-agent and dspy - then figure out how to use dspy to evaluate and improve the main system prompts used by Datasette Agent for the feature where it can execute read only SQL queries to answer user questions about data.

Fable chose to test using GPT-4.1 mini and nano, and identified several promising-looking directions for improvement. I particularly like this one:

The schema listing gives only table names; the "don't call describe_table if you already have the information" advice caused column-name guessing (page_count, o.order_id, first_name) and error-retry loops in baseline traces. Either include column names in the prompt's schema listing or soften that advice.
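The first option in that recommendation, listing column names alongside table names, could look something like the sketch below for a SQLite database. This is not Datasette Agent's actual prompt-building code: the function name `schema_listing`, its `include_columns` flag and the sample tables are all hypothetical, and the output format is just one plausible layout.

```python
import sqlite3


def schema_listing(conn, include_columns=True):
    """Return one line per table, optionally with column names and types."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    lines = []
    for table in tables:
        if not include_columns:
            # Table names only, as in the baseline described above.
            lines.append(table)
            continue
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        quoted = '"' + table.replace('"', '""') + '"'
        cols = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        described = ", ".join(f"{c[1]} {c[2]}".strip() for c in cols)
        lines.append(f"{table}({described})")
    return "\n".join(lines)


# Hypothetical sample schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")

print(schema_listing(conn))
```

With columns in the listing, a model has no need to guess between names such as `order_id` and `id`, at the cost of a longer prompt on databases with many wide tables.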
