Voiceover
Voiceover endpoints handle recording and serving the narrated audio for dialog scene blocks. Audio can be uploaded from a file or synthesized via ElevenLabs text-to-speech.
Endpoints overview
| Method | Endpoint | Permission | Description |
|---|---|---|---|
| GET | /api/voices/file/:projectId/:episodeId/:sceneId/voices/:filename | authenticated | Download voice file |
| POST | /api/voices/upload | voiceover:generate | Upload voice audio |
| POST | /api/voices/tts | voiceover:generate | Generate TTS voice |
| DELETE | /api/voices/file/:projectId/:episodeId/:sceneId/voices/:filename | voiceover:delete | Delete voice file |
Endpoints
Download voice file
GET /api/voices/file/:projectId/:episodeId/:sceneId/voices/:filename — Auth required
Returns the audio file as a binary stream. The frontend must fetch it with an authenticated request (Bearer token) rather than passing the URL straight to new Audio(url): the browser's media request carries no Authorization header, so the server responds with 401.
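The authenticated download could be sketched as follows. This is a minimal illustration, not the project's actual client code: `voiceFileUrl` and `fetchVoiceBlob` are hypothetical helper names, and the injectable `fetchImpl` parameter is an assumption added so the function can be exercised without a server.

```typescript
// Hypothetical helper: fills in the documented download route.
function voiceFileUrl(
  projectId: string,
  episodeId: string,
  sceneId: string,
  filename: string,
): string {
  const [p, e, s, f] = [projectId, episodeId, sceneId, filename].map((part) =>
    encodeURIComponent(part),
  );
  return `/api/voices/file/${p}/${e}/${s}/voices/${f}`;
}

// Hypothetical helper: requests the file with a Bearer token and returns a Blob.
// fetchImpl defaults to the global fetch and can be swapped for a stub.
async function fetchVoiceBlob(
  url: string,
  token: string,
  fetchImpl: typeof fetch = fetch,
): Promise<Blob> {
  const res = await fetchImpl(url, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) {
    throw new Error(`Voice download failed with status ${res.status}`);
  }
  return res.blob();
}
```

In a browser, the returned Blob can be wrapped with `URL.createObjectURL(blob)` and the resulting object URL handed to `new Audio(...)`, which sidesteps the unauthenticated media request.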
Upload voice audio
POST /api/voices/upload — Auth required, multipart/form-data, Permission: voiceover:generate
Form fields:
| Field | Type | Required |
|---|---|---|
| audio | audio file (MP3, WAV, OGG) | yes |
| projectId | string (UUID) | yes |
| episodeId | string (UUID) | yes |
| sceneId | string (UUID) | yes |
Response — 201
{
  "url": "/api/voices/file/proj-uuid/ep-uuid/scene-uuid/voices/take1.mp3",
  "filename": "take1.mp3"
}
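A client-side upload might look like the sketch below. The function and interface names are hypothetical, and it assumes the endpoint accepts the same Bearer token scheme as the download route; only the form field names, the route, and the 201 response shape come from this page.

```typescript
interface VoiceUploadIds {
  projectId: string;
  episodeId: string;
  sceneId: string;
}

interface VoiceUploadResult {
  url: string;
  filename: string;
}

// Builds the multipart body with the four documented form fields.
function buildVoiceUploadForm(
  audio: Blob,
  filename: string,
  ids: VoiceUploadIds,
): FormData {
  const form = new FormData();
  form.append("audio", audio, filename);
  form.append("projectId", ids.projectId);
  form.append("episodeId", ids.episodeId);
  form.append("sceneId", ids.sceneId);
  return form;
}

// Posts the form and returns the parsed 201 response.
// fetchImpl is an assumption added for testability; it defaults to global fetch.
async function uploadVoice(
  form: FormData,
  token: string,
  fetchImpl: typeof fetch = fetch,
): Promise<VoiceUploadResult> {
  const res = await fetchImpl("/api/voices/upload", {
    method: "POST",
    // No explicit Content-Type: the runtime sets multipart/form-data
    // together with the boundary when the body is a FormData.
    headers: { Authorization: `Bearer ${token}` },
    body: form,
  });
  if (res.status !== 201) {
    throw new Error(`Voice upload failed with status ${res.status}`);
  }
  return (await res.json()) as VoiceUploadResult;
}
```

Setting `Content-Type` by hand is a common mistake with multipart uploads, since a manually written header lacks the boundary parameter the server needs to parse the parts.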
Generate TTS voice
POST /api/voices/tts — Auth required, Permission: voiceover:generate
Calls the ElevenLabs TTS API with the given voice and text, then streams the resulting audio directly back to the caller as audio/mpeg.
Request body
{
  "voiceId": "EXAVITQu4vr4xnSDxMaL",
  "text": "We need to leave. Now."
}
| Field | Required | Notes |
|---|---|---|
| voiceId | yes | ElevenLabs voice ID |
| text | yes | The text to synthesize |
Response — 200 — binary audio stream (Content-Type: audio/mpeg)
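As a rough sketch, a caller could request synthesized audio like this. `buildTtsRequest` and `generateTts` are hypothetical names, and the Bearer header and the injectable `fetchImpl` are assumptions; the route, JSON body, and audio/mpeg response are taken from the description above.

```typescript
interface TtsRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Builds the JSON request for the documented body fields.
function buildTtsRequest(
  voiceId: string,
  text: string,
  token: string,
): TtsRequestInit {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ voiceId, text }),
  };
}

// Sends the request and collects the streamed audio/mpeg response into a Blob.
async function generateTts(
  voiceId: string,
  text: string,
  token: string,
  fetchImpl: typeof fetch = fetch,
): Promise<Blob> {
  const res = await fetchImpl(
    "/api/voices/tts",
    buildTtsRequest(voiceId, text, token),
  );
  if (!res.ok) {
    throw new Error(`TTS generation failed with status ${res.status}`);
  }
  return res.blob();
}
```

Calling `res.blob()` waits for the whole stream; a caller that wants to start playback earlier would need to read `res.body` incrementally instead.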
Delete voice file
DELETE /api/voices/file/:projectId/:episodeId/:sceneId/voices/:filename — Auth required, Permission: voiceover:delete
Response — 204
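A deletion call could be sketched as below. `deleteVoiceFile` is a hypothetical helper, and the Bearer header is an assumption carried over from the download route. The `url` value in the upload response example has the same shape as this route, so it is passed in directly here.

```typescript
// Hypothetical helper: deletes a voice file and treats anything but 204 as a failure.
// fetchImpl is an assumption added for testability; it defaults to global fetch.
async function deleteVoiceFile(
  fileUrl: string,
  token: string,
  fetchImpl: typeof fetch = fetch,
): Promise<void> {
  const res = await fetchImpl(fileUrl, {
    method: "DELETE",
    headers: { Authorization: `Bearer ${token}` },
  });
  // 204 No Content is the documented success response, so there is no body to parse.
  if (res.status !== 204) {
    throw new Error(`Voice delete failed with status ${res.status}`);
  }
}
```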