Speech Synthesis Models - VideoDraft AI

Overview

VideoDraft uses Google’s advanced Text-to-Speech technology to provide natural, expressive voices for your video narration. With support for multiple languages and voice styles, you can create professional voiceovers that connect with your audience.

Speech generation costs 1 credit per 100 characters. Visionary and Studio plans include unlimited speech generation!

Available Voices

English Voices

Alex (Male)

Voice ID: en-US-Wavenet-D

Natural American accent
Professional tone
Clear articulation
Versatile delivery

Maria (Female)

Voice ID: en-US-Wavenet-C

Warm, friendly tone
American accent
Engaging delivery
Perfect for education

International Voices

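Voices are available in Spanish, Hindi, and Telugu. The Spanish voices are detailed below. The Hindi and Telugu voices are Ravi (Male), Sita (Female), Raju (Male), and Lakshmi (Female).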

Carlos (Male)

Voice ID: es-ES-Wavenet-B

Native Spanish speaker
Clear pronunciation
Professional tone

Lucia (Female)

Voice ID: es-ES-Wavenet-C

Warm Spanish voice
Natural intonation
Engaging style

Voice Technology

Google WaveNet

Technology: Advanced neural network synthesis

Natural prosody and intonation
Human-like voice quality
Contextual emphasis
Emotional expression

Google Neural2 & Chirp3-HD

Next-Generation Voices:

Enhanced naturalness
Better pronunciation
Improved clarity
Regional accents

Speech Synthesis Features

Speaking Styles

Conversational

Natural, relaxed delivery perfect for:

Educational content
Explainer videos
Casual presentations
Social media

Professional

Clear, authoritative tone ideal for:

Corporate videos
Product demos
Training materials
News updates

Storytelling

Expressive, engaging style for:

Narratives
Children’s content
Entertainment
Documentaries

Instructional

Clear, measured pace for:

Tutorials
How-to guides
Technical content
E-learning

Voice Parameters

Adjustable Settings:

Speaking Rate: 0.5x to 2.0x speed
Pitch: -20% to +20% adjustment
Volume Gain: -10dB to +10dB
Emphasis: Automatic or manual
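
As a rough illustration of the ranges listed above, the sketch below keeps a set of voice settings inside those limits before they are used. The class and field names (`VoiceSettings`, `speaking_rate`, `pitch_percent`, `volume_gain_db`) are hypothetical and are not part of any documented VideoDraft API; only the numeric ranges come from this page.

```python
from dataclasses import dataclass

# Ranges copied from the "Adjustable Settings" list; all names are hypothetical.
RATE_RANGE = (0.5, 2.0)       # speaking rate multiplier
PITCH_RANGE = (-20.0, 20.0)   # pitch adjustment in percent
GAIN_RANGE = (-10.0, 10.0)    # volume gain in dB


def clamp(value: float, bounds: tuple[float, float]) -> float:
    """Return value limited to the inclusive range given by bounds."""
    low, high = bounds
    return max(low, min(high, value))


@dataclass(frozen=True)
class VoiceSettings:
    speaking_rate: float = 1.0
    pitch_percent: float = 0.0
    volume_gain_db: float = 0.0

    def clamped(self) -> "VoiceSettings":
        """Return a copy with every field forced into its listed range."""
        return VoiceSettings(
            speaking_rate=clamp(self.speaking_rate, RATE_RANGE),
            pitch_percent=clamp(self.pitch_percent, PITCH_RANGE),
            volume_gain_db=clamp(self.volume_gain_db, GAIN_RANGE),
        )


# A rate of 3.0 is outside the listed range, so it is pulled back to 2.0.
settings = VoiceSettings(speaking_rate=3.0, pitch_percent=-35, volume_gain_db=4).clamped()
```

Clamping rather than rejecting out-of-range values is a design choice for this sketch; a stricter tool could raise an error instead.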

Text Formatting for Speech

SSML Support

Use Speech Synthesis Markup Language for advanced control:

<speak>
  Welcome to VideoDraft. 
  <break time="500ms"/>
  Let's create <emphasis level="strong">amazing</emphasis> videos together.
  <prosody rate="slow">Take your time to explore.</prosody>
</speak>
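
If scripts are assembled programmatically, it can help to generate SSML like the sample above with small helpers that escape special characters, so a stray "&" or "<" in the script does not break the markup. This is a minimal sketch using only the Python standard library; the helper names are illustrative and how VideoDraft accepts SSML input is not specified here.

```python
from xml.sax.saxutils import escape, quoteattr


def plain(text: str) -> str:
    """Escape &, < and > so ordinary script text is safe inside SSML."""
    return escape(text)


def pause(milliseconds: int) -> str:
    """A <break> of the given length in milliseconds."""
    return f'<break time="{int(milliseconds)}ms"/>'


def emphasize(text: str, level: str = "strong") -> str:
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def paced(text: str, rate: str = "slow") -> str:
    return f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"


def speak(*parts: str) -> str:
    """Wrap already-built parts in a single <speak> root element."""
    return "<speak>" + " ".join(parts) + "</speak>"


# Rebuilds the sample shown above.
ssml = speak(
    plain("Welcome to VideoDraft."),
    pause(500),
    plain("Let's create"),
    emphasize("amazing"),
    plain("videos together."),
    paced("Take your time to explore."),
)
```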

Pronunciation Guide

Numbers and Dates:

“2024” → “twenty twenty-four”
“$99” → “ninety-nine dollars”
“3/4” → “three quarters”

Abbreviations:

“Dr.” → “Doctor”
“vs.” → “versus”
“etc.” → “et cetera”

Custom Pronunciation:

Use phonetic spelling in parentheses:
"CEO (see-ee-oh)" for letter-by-letter
"GIF (jiff)" for specific pronunciation
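
For longer scripts, the parenthesized phonetic spellings described above can be added automatically from a glossary. The sketch below is one possible approach, assuming the "TERM (phonetic)" convention shown in the examples; the glossary contents and the function name are hypothetical.

```python
import re

# Hypothetical glossary: term -> phonetic spelling, using the examples above.
PRONUNCIATIONS = {"CEO": "see-ee-oh", "GIF": "jiff"}


def annotate(script: str, glossary: dict[str, str]) -> str:
    """Append "(phonetic)" after each whole-word glossary term.

    Terms that are already followed by a parenthesis are left untouched,
    so running the function twice does not duplicate annotations.
    """
    for term, phonetic in glossary.items():
        pattern = re.compile(rf"\b{re.escape(term)}\b(?!\s*\()")
        script = pattern.sub(lambda m, p=phonetic: f"{m.group(0)} ({p})", script)
    return script


annotated = annotate("Our CEO shared a GIF.", PRONUNCIATIONS)
```

Matching is case-sensitive and limited to whole words, so "gift" or "ceo" in running text is not changed.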

Credit Usage

Pricing Structure

Characters	Credits	Notes
100	1	Standard rate
1,000	10	~1 minute of speech
5,000	50	~5 minutes of speech
10,000	100	~10 minutes of speech

The average speaking rate is 150-180 words per minute, or approximately 900-1,100 characters per minute.
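
The pricing above can be turned into a quick estimate before generating speech. This sketch assumes that every character in the script counts (including spaces and punctuation) and that a partial block of 100 characters is rounded up to a full credit; neither detail is confirmed on this page, and the function names are illustrative.

```python
import math

CHARS_PER_CREDIT = 100          # from "1 credit per 100 characters"
CHARS_PER_MINUTE = (900, 1100)  # from the average speaking rate above


def estimate_credits(script: str) -> int:
    """Credits for a script, assuming partial blocks of 100 characters round up."""
    return math.ceil(len(script) / CHARS_PER_CREDIT)


def estimate_minutes(script: str) -> tuple[float, float]:
    """Return the (shortest, longest) expected narration length in minutes."""
    slow, fast = CHARS_PER_MINUTE
    return len(script) / fast, len(script) / slow


script = "a" * 1000
credits = estimate_credits(script)            # 10 credits, matching the table
shortest, longest = estimate_minutes(script)  # roughly 0.9 to 1.1 minutes
```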

Unlimited Plans

Visionary & Studio Plans Include:

✅ Unlimited speech generation
✅ All voice options
✅ No character limits
✅ Priority processing

Best Practices

Script Writing for TTS

Focus on clarity, pacing, and emphasis.

Write for the ear, not the eye:

Use simple sentences
Avoid complex punctuation
Break up long thoughts
Use natural pauses

Example:

Instead of: "The product—which launched in 2023—has received numerous accolades."

Write: "The product launched in 2023. It has received numerous accolades."

Voice Selection Guide

Choose Based on Content:

Corporate/Professional

Alex (Male): Authority
Maria (Female): Approachable
Formal scripts
Clear delivery

Educational

Maria (Female): Friendly teacher
Alex (Male): Expert instructor
Clear explanations
Engaging tone

Marketing

Either voice works
Match brand personality
Enthusiastic delivery
Call-to-action focus

Storytelling

Choose by character
Match narrative tone
Expressive delivery
Emotional range

Common Issues

Pronunciation Problems

Solutions for common TTS issues:

Acronyms

Write phonetically: “NASA (nah-sah)”
Or spell out: “N-A-S-A”

Technical Terms

Add pronunciation guides
Use simpler alternatives
Test before finalizing

Foreign Words

Use phonetic spelling
Or provide translation
Test with target audience

Numbers

Write as words for clarity
“1st” → “first”
“1,000” → “one thousand”
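
Two of the fixes above, spelling out acronyms and writing ordinals as words, are mechanical enough to script. The sketch below is a minimal example with a deliberately small, hypothetical ordinal table; it does not handle larger numbers such as "1,000", which would need a fuller number-to-words routine.

```python
import re

# Small illustrative table; extend it to cover the ordinals a script uses.
ORDINALS = {"1st": "first", "2nd": "second", "3rd": "third"}


def spell_out(acronym: str) -> str:
    """Turn "NASA" into "N-A-S-A" so it is read letter by letter."""
    return "-".join(acronym)


def normalize(script: str, spelled: set[str]) -> str:
    """Spell out the listed acronyms and replace known ordinals with words."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        if word in spelled:
            return spell_out(word)
        return ORDINALS.get(word, word)

    return re.sub(r"\b\w+\b", fix, script)


cleaned = normalize("NASA launched its 1st probe.", {"NASA"})
```

Acronyms are passed in explicitly rather than detected by capitalization, since some (like "NASA") are usually read as words and only need spelling out when the default reading is wrong.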

Quality Optimization

For Best Results:

Proofread carefully
Test short segments first
Listen to full narration
Make adjustments as needed
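
To test short segments first, a script can be split at sentence boundaries into chunks of a chosen size. The sketch below uses a default of 500 characters, which is roughly 30 seconds of speech at the 900-1,100 characters per minute quoted earlier; the function name and the simple sentence-splitting rule are assumptions for illustration.

```python
import re


def split_segments(script: str, max_chars: int = 500) -> list[str]:
    """Group whole sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


parts = split_segments("One. Two. Three.", max_chars=9)
```

Generating only the first segment is a cheap way to check voice and pacing before spending credits on the full narration.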

Multi-Language Projects

Creating Versions

Prepare Master Script

Write in your primary language with a clear structure

Professional Translation

Ensure cultural adaptation, not just literal translation

Select Native Voice

Choose a voice that matches regional preferences

Test and Refine

Have native speakers review the output

Localization Tips

Adjust speaking pace for language
Consider cultural context
Maintain consistent tone
Verify technical terms

Advanced Techniques

Emotional Delivery

Conveying Emotion Through Text:

Short sentences = urgency
Long sentences = calm
Questions = engagement
Exclamations = excitement

Character Voices

Creating Distinction:

Use different voices for characters
Adjust speaking rate
Vary sentence structure
Add personality through word choice

Background Integration

Mixing with Music:

Keep narration clear
Allow musical breaks
Sync to rhythm when appropriate
Balance audio levels

Future Features

Coming soon:

Voice cloning (custom voices)
Emotional voice variants
More language options
Real-time preview
Advanced SSML editor

Next Steps

Test Voices

Try different voices in AI Studio

Script Guide

Learn script writing tips

Start Project

Create your first narrated video

Pro tip: Generate a 30-second test narration with your script to ensure the voice and pacing match your vision before creating the full video!
