Gemini 3.1 Flash TTS: Google releases a new prompt-driven text-to-speech model
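Google has released Gemini 3.1 Flash TTS, a new text-to-speech (TTS) model that can be directed with natural-language prompts. It is available through the standard Gemini API under the model ID `gemini-3.1-flash-tts-preview`, and it can only output audio files. It is aimed at applications that need dynamic speech synthesis.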
Simon Willison
April 15, 2026 - Link Blog
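Gemini 3.1 Flash TTS. Google released Gemini 3.1 Flash TTS today, a new text-to-speech model that can be directed using prompts.
It is offered through the standard Gemini API with `gemini-3.1-flash-tts-preview` as the model ID, but it can only output audio files.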
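For readers who want to try this, here is a minimal sketch of what a request might look like using only the Python standard library. Only the model ID comes from the post. The endpoint pattern, the `responseModalities` / `speechConfig` payload shape, the `Kore` voice name, and the 24 kHz 16-bit mono PCM output format are assumptions carried over from earlier Gemini TTS previews, so check them against the current API documentation before relying on them. The helper names are hypothetical.

```python
import base64
import io
import json
import urllib.request
import wave

MODEL_ID = "gemini-3.1-flash-tts-preview"  # model ID given in the post
# Assumption: same endpoint pattern as other Gemini API models.
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL_ID}:generateContent"
)


def build_request_body(prompt: str, voice_name: str = "Kore") -> dict:
    """Build a generateContent body that asks for audio output only.

    Assumption: payload shape and the "Kore" voice are borrowed from
    earlier Gemini TTS previews, not confirmed by the post.
    """
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice_name}}
            },
        },
    }


def pcm_to_wav_bytes(pcm: bytes, rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container (sample rate is an assumption)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(pcm)
    return buf.getvalue()


def synthesize(prompt: str, api_key: str) -> bytes:
    """Send the request and return WAV bytes. Needs network access and a valid key."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_request_body(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    # Assumption: audio arrives base64-encoded in the first part's inlineData.
    part = payload["candidates"][0]["content"]["parts"][0]
    return pcm_to_wav_bytes(base64.b64decode(part["inlineData"]["data"]))
```

Under those assumptions, `synthesize(prompt, api_key)` would return bytes that can be written straight to a `.wav` file. The payload builder and the WAV wrapper are kept separate so they can be checked without a network call.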
The prompting guide is surprising, to say the least. Here is their example prompt for generating just a few sentences of audio:
# AUDIO PROFILE: Jaz R.
## "The Morning Hype"
## THE SCENE: The London Studio
It is 10:00 PM in a glass-walled studio overlooking the moonlit London skyline, but inside, it is blindingly bright. The red "ON AIR" tally light is blazing. Jaz is standing up, not sitting, bouncing on the balls of their heels to the rhythm of a thumping backing track. Their hands fly across the faders on a massive mixing desk. It is a chaotic, caffeine-fueled cockpit designed to wake up an entire nation.
### DIRECTOR'S NOTES
Style:
* The "Vocal Smile": You must hear the grin in the audio. The soft palate is always raised to keep the tone bright, sunny, and explicitly inviting.
* Dynamics: High projection without shouting. Punchy consonants and elongated vowels on excitement words (e.g., "Beauuutiful morning").
Pace: Speaks at an energetic pace, keeping up with the fast music. Speaks with A "bouncing" cadence. High-speed delivery with fluid transitions — no dead air, no gaps.
Accent: Jaz is from Brixton, London
### SAMPLE CONTEXT
Jaz is the industry standard for Top 40 radio, high-octane event promos, or any script that requires a charismatic Estuary accent and 11/10 infectious energy.
#### TRANSCRIPT
[excitedly] Yes, massive vibes in the studio! You are locked in and it is absolutely popping off in London right now. If you're stuck on the tube, or just sat there pretending to work... stop it. Seriously, I see you.
[shouting] Turn this up! We've got the project roadmap landing in three, two... let's go!
Here is what I got using that example prompt:
[Audio clip: output of the example prompt]
Then I modified it to say "Jaz is from Newcastle" and "... requires a charismatic Newcastle accent" and got this result:
[Audio clip: Newcastle variant]
For good measure, here is Exeter, Devon:
[Audio clip: Exeter, Devon variant]
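The accent experiments above come down to two text substitutions in the prompt, which can be sketched as a small helper. `swap_accent` is a hypothetical name, and the post only spells out the exact wording for the Newcastle variant, so using the same pattern for other places is an assumption.

```python
def swap_accent(prompt: str, place: str) -> str:
    """Return a copy of the example prompt with Jaz's hometown and accent changed.

    The two target phrases are taken from Google's example prompt; the
    replacement wording mirrors the Newcastle edit described in the post.
    """
    replacements = {
        "Jaz is from Brixton, London": f"Jaz is from {place}",
        "charismatic Estuary accent": f"charismatic {place} accent",
    }
    for old, new in replacements.items():
        if old not in prompt:
            # Fail loudly rather than silently sending an unchanged prompt.
            raise ValueError(f"expected phrase not found: {old!r}")
        prompt = prompt.replace(old, new)
    return prompt


# Two-line excerpt of the example prompt, enough to exercise the helper.
excerpt = (
    "Accent: Jaz is from Brixton, London\n"
    "any script that requires a charismatic Estuary accent and 11/10 infectious energy."
)
newcastle_prompt = swap_accent(excerpt, "Newcastle")
```

Raising on a missing phrase is a deliberate choice here: if the base prompt wording changes, the helper stops instead of producing audio that quietly keeps the original accent.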
I had Gemini 3.1 Pro write a UI for me to try it out.