Overview

Upload your audio narration. AI transcribes it, generates visuals that match the content, and syncs them to the timing of your voice.

How It Works

  1. Upload Audio - MP3, WAV, or M4A (max 500MB)
  2. AI Transcribes - Speech-to-text with timing
  3. Scene Detection - AI identifies natural breaks
  4. Generate Visuals - Matched to narration content
  5. Sync & Export - Perfectly timed video output

Recording Tips

Audio quality:
  • Use a decent microphone
  • Record in a quiet room
  • Stay 6-12 inches from the microphone

For better scene detection:
  • Pause 1-2 seconds between topics
  • Use transition phrases (“Next…”, “Now…”)
  • Speak clearly at a moderate pace
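To see why pauses between topics help, here is a minimal sketch of pause-based scene splitting. It is an illustration only: the product's actual detection method is not documented here, and the `Word` type, the `split_scenes` function, and the 1.0-second threshold are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with timing (hypothetical structure)."""
    text: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def split_scenes(words, min_pause=1.0):
    """Group words into scenes.

    A new scene starts whenever the silence between two consecutive
    words is at least min_pause seconds (assumed threshold).
    """
    scenes = []
    current = []
    for word in words:
        if current and word.start - current[-1].end >= min_pause:
            scenes.append(current)
            current = []
        current.append(word)
    if current:
        scenes.append(current)
    return scenes


# Example narration with a 1.5-second pause before "Next,"
words = [
    Word("Welcome", 0.0, 0.5),
    Word("everyone.", 0.6, 1.2),
    Word("Next,", 2.7, 3.0),
    Word("pricing.", 3.1, 3.8),
]
scenes = split_scenes(words)
for number, scene in enumerate(scenes, 1):
    print(number, " ".join(w.text for w in scene))
```

In this sketch a pause shorter than the threshold never produces a split, which is one reason a deliberate 1-2 second pause between topics gives a detector a clearer signal.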

Supported Formats

Format    Notes
MP3       Recommended
WAV       Highest quality
M4A       Apple format
Max size  500MB
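If you want to check a file before uploading, the format and size limits above can be sketched as a small local check. This is a rough sketch, not part of the product: `check_upload` is a hypothetical helper, and reading "500MB" as 500 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path

# Formats listed as supported
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".m4a"}

# Assumption: "500MB" is interpreted as 500 * 1024 * 1024 bytes
MAX_BYTES = 500 * 1024 * 1024


def check_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds 500MB")
    return problems


print(check_upload("narration.MP3", 12_000_000))
print(check_upload("narration.ogg", 12_000_000))
```

For a real file, `Path(filename).stat().st_size` gives the size in bytes to pass as `size_bytes`.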

Credit Cost

3-minute voiceover (typical):
Transcription: 5 credits
15 images: 15-60 credits
4 video clips: 100 credits
Total: ~120-165 credits
Using your own voice costs no voice-generation credits - you pay only for transcription and visual generation.
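The arithmetic behind the total can be sketched as follows. The per-unit rates are inferred from the example figures (15 images for 15-60 credits suggests 1-4 credits per image; 4 clips for 100 credits suggests 25 per clip) and are assumptions, not published pricing. Treating transcription as a flat 5 credits is also an assumption, and `estimate_credits` is a hypothetical helper.

```python
# Assumed rates, inferred from the 3-minute example above
TRANSCRIPTION_CREDITS = 5        # assumed flat cost for a typical voiceover
IMAGE_CREDITS_RANGE = (1, 4)     # inferred: 15 images cost 15-60 credits
CLIP_CREDITS = 25                # inferred: 4 clips cost 100 credits


def estimate_credits(images, clips):
    """Return a (low, high) credit estimate for one voiceover project."""
    fixed = TRANSCRIPTION_CREDITS + clips * CLIP_CREDITS
    low = fixed + images * IMAGE_CREDITS_RANGE[0]
    high = fixed + images * IMAGE_CREDITS_RANGE[1]
    return low, high


low, high = estimate_credits(images=15, clips=4)
print(f"Estimated total: {low}-{high} credits")  # Estimated total: 120-165 credits
```

The spread in the total comes entirely from the image cost, so the number of images is the main lever for keeping a project near the low end of the range.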

Tips

  1. Review transcription - Fix errors before generating visuals
  2. Add pauses - Natural breaks create better scene splits
  3. Test short first - Generate 30 seconds to check style
  4. Use keywords - Descriptive language helps visual matching