Voiceover

Voiceover endpoints upload, generate, serve, and delete the narrated audio for dialog scene blocks. Audio can be uploaded as a file or synthesized with ElevenLabs text-to-speech (TTS).

Endpoints overview

| Method | Endpoint | Permission | Description |
| --- | --- | --- | --- |
| GET | /api/voices/file/:projectId/:episodeId/:sceneId/voices/:filename | authenticated | Download voice file |
| POST | /api/voices/upload | voiceover:generate | Upload voice audio |
| POST | /api/voices/tts | voiceover:generate | Generate TTS voice |
| DELETE | /api/voices/file/:projectId/:episodeId/:sceneId/voices/:filename | voiceover:delete | Delete voice file |

Endpoints

Download voice file

GET /api/voices/file/:projectId/:episodeId/:sceneId/voices/:filename — Auth required

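Returns the audio file as a binary stream. The frontend must fetch it with an authenticated request that carries the Bearer token. Passing the URL straight to new Audio(url) does not work: the browser's media request omits the Authorization header, so the server responds with 401.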
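One way to satisfy the authenticated-fetch requirement is to download the file with fetch, then hand the player an object URL. The sketch below assumes a browser environment and a token already held by the caller; the helper names voiceFilePath and loadVoiceObjectUrl are hypothetical and not part of the API.

```typescript
// Hypothetical helper: builds the documented download path.
// Each path segment is URL-encoded so filenames with spaces stay valid.
function voiceFilePath(
  projectId: string,
  episodeId: string,
  sceneId: string,
  filename: string,
): string {
  const enc = encodeURIComponent;
  return `/api/voices/file/${enc(projectId)}/${enc(episodeId)}/${enc(sceneId)}/voices/${enc(filename)}`;
}

// Hypothetical helper: fetches the audio with a Bearer token and returns
// an object URL that an Audio element can play without further auth.
// fetchFn is injectable so the function can be exercised without a server.
async function loadVoiceObjectUrl(
  path: string,
  token: string,
  fetchFn: typeof fetch = fetch,
): Promise<string> {
  const res = await fetchFn(path, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) {
    throw new Error(`Voice download failed with status ${res.status}`);
  }
  return URL.createObjectURL(await res.blob());
}
```

A caller would pass the returned string to new Audio(objectUrl) and call URL.revokeObjectURL(objectUrl) once playback is no longer needed, so the browser can release the blob.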


Upload voice audio

POST /api/voices/upload — Auth required, multipart/form-data, Permission: voiceover:generate

Form fields:

| Field | Type | Required |
| --- | --- | --- |
| audio | audio file (MP3, WAV, OGG) | yes |
| projectId | string (UUID) | yes |
| episodeId | string (UUID) | yes |
| sceneId | string (UUID) | yes |

Response — 201

{
  "url": "/api/voices/file/proj-uuid/ep-uuid/scene-uuid/voices/take1.mp3",
  "filename": "take1.mp3"
}
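As an illustration of the upload call, the following sketch assembles the four documented form fields and posts them. It assumes a browser-style fetch and FormData and a Bearer token supplied by the caller; buildUploadForm, uploadVoice, and UploadVoiceResponse are hypothetical names chosen for this example.

```typescript
// Shape of the documented 201 response.
interface UploadVoiceResponse {
  url: string;
  filename: string;
}

interface SceneIds {
  projectId: string;
  episodeId: string;
  sceneId: string;
}

// Hypothetical helper: assembles the multipart form with the documented fields.
function buildUploadForm(audio: Blob, filename: string, ids: SceneIds): FormData {
  const form = new FormData();
  form.append("audio", audio, filename);
  form.append("projectId", ids.projectId);
  form.append("episodeId", ids.episodeId);
  form.append("sceneId", ids.sceneId);
  return form;
}

// Hypothetical helper: posts the form and returns the parsed response.
// Content-Type is deliberately not set: fetch derives the multipart
// boundary from the FormData body.
async function uploadVoice(
  form: FormData,
  token: string,
  fetchFn: typeof fetch = fetch,
): Promise<UploadVoiceResponse> {
  const res = await fetchFn("/api/voices/upload", {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
    body: form,
  });
  if (res.status !== 201) {
    throw new Error(`Voice upload failed with status ${res.status}`);
  }
  return (await res.json()) as UploadVoiceResponse;
}
```

The url field of the response has the same shape as the download route, so it can be fetched later with an authenticated request.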

Generate TTS voice

POST /api/voices/tts — Auth required, Permission: voiceover:generate

Calls the ElevenLabs TTS API with the given voice and text, then streams the resulting audio directly back to the caller as audio/mpeg.

Request body

{
  "voiceId": "EXAVITQu4vr4xnSDxMaL",
  "text": "We need to leave. Now."
}

| Field | Required | Notes |
| --- | --- | --- |
| voiceId | yes | ElevenLabs voice ID |
| text | yes | The text to synthesize |

Response — 200 — binary audio stream (Content-Type: audio/mpeg)
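A client call to the TTS endpoint might look like the sketch below. It assumes the request body is sent as JSON with a Content-Type of application/json (the page shows a JSON body but does not state the header) and that the caller holds a Bearer token; buildTtsRequest and generateTts are hypothetical names.

```typescript
// Plain-object request options; the shape is compatible with fetch's init argument.
interface TtsRequestInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Hypothetical helper: builds the request for POST /api/voices/tts.
// Assumption: the server expects a JSON body with Content-Type application/json.
function buildTtsRequest(voiceId: string, text: string, token: string): TtsRequestInit {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ voiceId, text }),
  };
}

// Hypothetical helper: sends the request and collects the audio/mpeg
// response into a Blob.
async function generateTts(
  voiceId: string,
  text: string,
  token: string,
  fetchFn: typeof fetch = fetch,
): Promise<Blob> {
  const res = await fetchFn("/api/voices/tts", buildTtsRequest(voiceId, text, token));
  if (!res.ok) {
    throw new Error(`TTS request failed with status ${res.status}`);
  }
  return res.blob();
}
```

The resulting Blob can be previewed through an object URL, in the same way as a downloaded voice file.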


Delete voice file

DELETE /api/voices/file/:projectId/:episodeId/:sceneId/voices/:filename — Auth required, Permission: voiceover:delete

Response — 204