Audio Generator

Turn text into natural speech with dozens of voices, 70+ languages, and custom style directions for tone, accent, and pacing.

Voice: Auto (the AI picks the best voice based on your instructions and text), or pick a specific voice:

  • Zephyr: F · Bright, Enthusiastic
  • Puck: M · Upbeat, Casual
  • Charon: M · Deep, Calm
  • Kore: F · Firm, Clear
  • Fenrir: M · Warm, Friendly
  • Leda: F · Youthful, Energetic
  • Orus: M · Firm, Casual
  • Aoede: F · Breezy, Professional
  • Callirrhoe: F · Easy-going, Friendly
  • Autonoe: F · Bright, Warm
  • Enceladus: M · Breathy, Confident
  • Iapetus: M · Clear, Professional
  • Umbriel: M · Easy-going, Resonant
  • Algieba: M · Smooth, Warm
  • Despina: F · Smooth, Warm
  • Erinome: F · Clear, Sophisticated
  • Algenib: M · Gravelly, Calm
  • Rasalgethi: M · Informative, Energetic
  • Laomedeia: F · Upbeat, Approachable
  • Achernar: F · Soft, Inviting
  • Alnilam: M · Firm, Optimistic
  • Schedar: M · Even, Casual
  • Gacrux: F · Mature, Engaging
  • Pulcherrima: M · Forward, Youthful
  • Achird: M · Friendly, Articulate
  • Zubenelgenubi: M · Casual, Deep
  • Vindemiatrix: F · Gentle, Smooth
  • Sadachbia: M · Lively, Resonant
  • Sadaltager: M · Knowledgeable, Calm
  • Sulafat: F · Warm, Enthusiastic

Best for

  • Narration and voiceover in 70+ languages
  • Text-to-speech with natural language style control (e.g. 'speak warmly', 'whisper', 'excited tone')
  • Multi-voice content with dozens of expressive voices
  • Multilingual scripts — the language is auto-detected from the text

Tips

  • Write narration in short, clear sentences for better pacing
  • Use punctuation to control pauses: periods for full stops, commas for brief pauses, ellipses for longer pauses
  • Leave the voice on Auto to let the AI pick one that matches your style instructions, or pin a specific voice when you need a recurring character
  • Use natural language style hints: 'speak slowly and dramatically', 'cheerful and upbeat', 'calm and reassuring'

AI Text-to-Speech Generator

Turn any text into natural-sounding speech in dozens of voices and 70+ languages. Control tone, accent, pacing, and emotion through plain-language instructions — production-ready voiceover in seconds.

What you can make

  • Documentary narration — authoritative, professional voices
  • Podcast intros, outros, dialogue — multi-character delivery
  • Multilingual voiceovers — cover the same script in 70+ languages
  • Audiobook narration — emotional pacing with mid-sentence direction
  • Accessibility — make text content available as audio

How it works

  1. Paste your text — what you want spoken
  2. Pick a voice or leave on Auto — Auto picks the best voice based on your text and instructions
  3. Add a style hint (optional) — describe accent, pace, emotion: "warm British accent, slow pace with dramatic pauses"
  4. Pick a model — Flash TTS for fast drafts, Pro TTS for studio-grade prosody on final delivery
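
The four steps above can be pictured as a small settings object. The sketch below is illustrative only: this page does not document an API, so the class name `SpeechRequest`, its field names, and the model identifiers `"flash"` and `"pro"` are all hypothetical stand-ins for the text box, voice picker, style hint, and model selector.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers standing in for Flash TTS and Pro TTS.
MODELS = ("flash", "pro")


@dataclass
class SpeechRequest:
    """Hypothetical container mirroring the four steps on this page."""
    text: str                          # step 1: the text to be spoken
    voice: str = "Auto"                # step 2: a voice name, or "Auto"
    style_hint: Optional[str] = None   # step 3: optional plain-language direction
    model: str = "flash"               # step 4: "flash" for drafts, "pro" for finals

    def to_payload(self) -> dict:
        """Validate the settings and return them as a plain dictionary."""
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.model not in MODELS:
            raise ValueError(f"model must be one of {MODELS}")
        payload = {"text": self.text, "voice": self.voice, "model": self.model}
        if self.style_hint:
            # The style hint is optional, so it is only included when set.
            payload["style_hint"] = self.style_hint
        return payload


request = SpeechRequest(
    text="Welcome back. Tonight, we travel north.",
    style_hint="warm British accent, slow pace with dramatic pauses",
    model="pro",
)
print(request.to_payload())
```

Leaving `voice` at `"Auto"` mirrors the default in step 2; pinning a name such as `"Charon"` would correspond to picking a specific voice.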

Frequently Asked Questions

What voices are available?
Dozens of expressive voices spanning bright, deep, warm, firm, and dramatic characters. Each responds to plain-language style instructions for tone and delivery.
What languages are supported?
70+ languages, with automatic detection. These range from major languages such as English, Chinese, Japanese, and Spanish to regional languages including Cebuano, Konkani, and Luxembourgish.
How do instructions work?
Describe how you want the voice to sound in plain English. For example: 'A 40-year-old British journalist, speaking slowly with dramatic pauses on key points.' The AI interprets your direction and shapes the vocal performance to match.
What audio format is the output?
MP3, ready to download or chain into pipelines like Video Reel as a narration track.
Which model should I pick?
Flash TTS is the cheaper, faster option, suited to drafts, short clips, and iteration. Pro TTS uses a higher-fidelity model with stronger prosody and pacing; pick it for production narration. Both render in seconds, though Pro takes slightly longer.
How many credits does it cost?
0.6 credits for Flash TTS, 1.2 credits for Pro TTS. Results are ready in seconds.
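
As a rough budgeting aid, the prices above can be turned into a small estimator. This is a sketch under one stated assumption: that the listed price applies per generation, which this page does not spell out. The function name `estimate_credits` and the keys `"flash"` and `"pro"` are hypothetical.

```python
from decimal import Decimal

# Prices as listed above. Treating them as per-generation is an assumption.
CREDITS_PER_GENERATION = {
    "flash": Decimal("0.6"),  # Flash TTS
    "pro": Decimal("1.2"),    # Pro TTS
}


def estimate_credits(model: str, generations: int) -> Decimal:
    """Return the estimated credits for a number of generations with one model."""
    if model not in CREDITS_PER_GENERATION:
        raise ValueError(f"unknown model: {model!r}")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    # Decimal avoids binary floating-point rounding on values such as 0.6.
    return CREDITS_PER_GENERATION[model] * generations


# Ten Flash drafts followed by one Pro final render.
total = estimate_credits("flash", 10) + estimate_credits("pro", 1)
print(total)  # 7.2
```

Under that assumption, iterating on drafts with Flash TTS and rendering only the final take with Pro TTS keeps the total well below running every attempt on Pro.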
