Voiceover

Voiceover endpoints upload, generate, serve, and delete the narrated audio for dialog scene blocks. Audio can be uploaded as a file or synthesized with ElevenLabs text-to-speech.

Endpoints overview

Method	Endpoint	Permission	Description
`GET`	`/api/voices/file/:projectId/:episodeId/:sceneId/voices/:filename`	authenticated	Download voice file
`POST`	`/api/voices/upload`	`voiceover:generate`	Upload voice audio
`POST`	`/api/voices/tts`	`voiceover:generate`	Generate TTS voice
`DELETE`	`/api/voices/file/:projectId/:episodeId/:sceneId/voices/:filename`	`voiceover:delete`	Delete voice file

Endpoints

Download voice file

GET /api/voices/file/:projectId/:episodeId/:sceneId/voices/:filename — Auth required

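Returns the audio file as a binary stream. Because the endpoint requires a Bearer token, the frontend must fetch the file with an authenticated request instead of passing the URL to new Audio(url). A media element cannot attach an Authorization header, so loading the URL directly fails with a 401 error.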
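A minimal sketch of the authenticated download described above, assuming a browser frontend that already holds a Bearer token. The helper names voiceFileUrl and fetchVoiceBlob are hypothetical and not part of the API; only the path and the Bearer requirement come from this page.

```typescript
// Hypothetical helper: builds the download path from its four path parameters.
function voiceFileUrl(
  projectId: string,
  episodeId: string,
  sceneId: string,
  filename: string
): string {
  const [p, e, s, f] = [projectId, episodeId, sceneId, filename].map(encodeURIComponent);
  return `/api/voices/file/${p}/${e}/${s}/voices/${f}`;
}

// Hypothetical helper: fetches the audio with an Authorization header
// and returns the response body as a Blob.
async function fetchVoiceBlob(url: string, token: string): Promise<Blob> {
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) {
    throw new Error(`Voice download failed with status ${res.status}`);
  }
  return res.blob();
}

// Browser usage (object URLs should be revoked once playback is no longer needed):
//   const blob = await fetchVoiceBlob(voiceFileUrl(p, e, s, "take1.mp3"), token);
//   const objectUrl = URL.createObjectURL(blob);
//   const audio = new Audio(objectUrl);
//   await audio.play();
//   URL.revokeObjectURL(objectUrl);
```

Playing from an object URL keeps the token out of the media element entirely, which is why this approach avoids the 401.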

Upload voice audio

POST /api/voices/upload — Auth required, multipart/form-data, Permission: voiceover:generate

Form fields:

Field	Type	Required
`audio`	audio file (MP3, WAV, OGG)	yes
`projectId`	string (UUID)	yes
`episodeId`	string (UUID)	yes
`sceneId`	string (UUID)	yes

Response — 201

{
  "url": "/api/voices/file/proj-uuid/ep-uuid/scene-uuid/voices/take1.mp3",
  "filename": "take1.mp3"
}
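The upload can be sketched from a browser as follows. The field names and the 201 response shape come from this section; the helper names buildUploadForm and uploadVoice are hypothetical, and whether the server keeps the client-supplied filename is an assumption.

```typescript
// Response shape as documented above.
interface VoiceUploadResult {
  url: string;
  filename: string;
}

// Hypothetical helper: assembles the multipart form with the documented fields.
function buildUploadForm(
  audio: Blob,
  filename: string,
  ids: { projectId: string; episodeId: string; sceneId: string }
): FormData {
  const form = new FormData();
  form.append("audio", audio, filename);
  form.append("projectId", ids.projectId);
  form.append("episodeId", ids.episodeId);
  form.append("sceneId", ids.sceneId);
  return form;
}

// Hypothetical helper: posts the form and returns the parsed 201 response.
async function uploadVoice(form: FormData, token: string): Promise<VoiceUploadResult> {
  const res = await fetch("/api/voices/upload", {
    method: "POST",
    // Content-Type is left unset on purpose: the runtime adds the
    // multipart/form-data header together with its boundary.
    headers: { Authorization: `Bearer ${token}` },
    body: form,
  });
  if (res.status !== 201) {
    throw new Error(`Voice upload failed with status ${res.status}`);
  }
  return (await res.json()) as VoiceUploadResult;
}
```

The returned url has the same path as the download and delete endpoints, so it can be passed to those requests unchanged.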

Generate TTS voice

POST /api/voices/tts — Auth required, Permission: voiceover:generate

Calls the ElevenLabs TTS API with the given voice and text, then streams the resulting audio directly back to the caller as audio/mpeg.

Request body

{
  "voiceId": "EXAVITQu4vr4xnSDxMaL",
  "text": "We need to leave. Now."
}

Field	Required	Notes
`voiceId`	yes	ElevenLabs voice ID
`text`	yes	The text to synthesize

Response — 200 — binary audio stream (Content-Type: audio/mpeg)
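A client call might look like the sketch below. It assumes the body is sent as JSON with a Content-Type: application/json header, which this page implies but does not state; buildTtsRequest and generateTts are hypothetical names.

```typescript
// Hypothetical helper: builds the request options for POST /api/voices/tts.
function buildTtsRequest(
  voiceId: string,
  text: string,
  token: string
): { method: string; headers: Record<string, string>; body: string } {
  return {
    method: "POST",
    headers: {
      Authorization: `Bearer ${token}`,
      // Assumption: the endpoint expects a JSON body.
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ voiceId, text }),
  };
}

// Hypothetical helper: requests synthesis and collects the audio/mpeg
// stream into a Blob.
async function generateTts(voiceId: string, text: string, token: string): Promise<Blob> {
  const res = await fetch("/api/voices/tts", buildTtsRequest(voiceId, text, token));
  if (!res.ok) {
    throw new Error(`TTS generation failed with status ${res.status}`);
  }
  return res.blob();
}
```

Since the endpoint streams the audio back to the caller, a client that wants to keep the take would presumably send the resulting Blob to the upload endpoint as the audio field.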

Delete voice file

DELETE /api/voices/file/:projectId/:episodeId/:sceneId/voices/:filename — Auth required, Permission: voiceover:delete

Response — 204
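A short sketch of the delete call, treating 204 as the only success status. The names deleteVoice and MinimalFetch are hypothetical; the optional doFetch parameter exists only so the function can be exercised without a network.

```typescript
// Hypothetical minimal fetch signature, narrowed to what this call needs.
type MinimalFetch = (
  url: string,
  init: { method: string; headers: Record<string, string> }
) => Promise<{ status: number }>;

// Hypothetical helper: deletes a voice file by its /api/voices/file/... path.
async function deleteVoice(
  fileUrl: string,
  token: string,
  doFetch: MinimalFetch = fetch
): Promise<void> {
  const res = await doFetch(fileUrl, {
    method: "DELETE",
    headers: { Authorization: `Bearer ${token}` },
  });
  // The endpoint returns 204 with no body on success.
  if (res.status !== 204) {
    throw new Error(`Voice delete failed with status ${res.status}`);
  }
}
```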
