FastPix Developer Documentation

FastPix generates subtitles from your video’s audio using AI transcription, creating a synchronized WebVTT track that improves accessibility, SEO, and viewer engagement. You can enable auto-generation at upload time by including a subtitles object in the request, or generate subtitles for an existing video by calling the generate-subtitles endpoint with the audio trackId.

Prerequisites

A FastPix account with an active workspace (Activate your account)
Your Access Token ID and Secret Key from the FastPix dashboard
A video to upload, or an existing media asset in the ready state with a known audio trackId
Clear source audio in a supported language

Key terms

mediaId is the unique identifier FastPix assigns to every uploaded asset.
trackId identifies a single audio, video, or subtitle track that belongs to a media asset.
playbackId is a separate, access-controlled identifier used to construct the HLS playback URL: https://stream.fastpix.com/<playbackId>.m3u8.
accessPolicy controls whether a media asset is public or private.

How auto-generated subtitles work

FastPix transcribes your on-demand media using the OpenAI Whisper model, converting spoken audio into synchronized subtitles.

Key considerations

Audio quality: Auto-generated captions perform best with clear audio. Results may vary on media with excessive non-speech audio, such as music, background noise, or long silences.

Language compatibility: FastPix generates subtitles in the same language as the audio. The feature does not translate captions into other languages.

Test this feature with your typical content to evaluate transcription quality before rolling it out in production.

Generate subtitles during upload

Enable auto-generated subtitles at upload time by including a subtitles object in your upload request.

Prepare your video

Remove unwanted sounds, reduce background noise, and avoid overlapping audio.
Adjust volume levels so speech is clear and audible.

Enable subtitle generation in the request

The subtitles object takes three fields:

languageName: The language of the audio (for example, "english").
metadata: Optional key-value pairs to tag the subtitle track.
languageCode: The BCP 47 code for the spoken language (for example, en).

Request body for uploading a video from a URL

{
  "inputs": [
    {
      "type": "video",
      "url": "https://example.com/sample.mp4"
    }
  ],
  "subtitles": {
    "languageName": "english",
    "metadata": {
      "key1": "value1"
    },
    "languageCode": "en"
  },
  "accessPolicy": "public"
}
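
The same body can be assembled in code. The Python sketch below only builds the JSON structure shown above and does not send it; the function name build_url_upload_body and its parameters are hypothetical and not part of any FastPix SDK.

```python
import json


def build_url_upload_body(video_url, language_name, language_code,
                          metadata=None, access_policy="public"):
    """Return a request body shaped like the URL upload example above."""
    subtitles = {
        "languageName": language_name,
        "languageCode": language_code,
    }
    if metadata:
        # metadata is optional, so include it only when the caller provides it.
        subtitles["metadata"] = dict(metadata)
    return {
        "inputs": [{"type": "video", "url": video_url}],
        "subtitles": subtitles,
        "accessPolicy": access_policy,
    }


body = build_url_upload_body(
    "https://example.com/sample.mp4", "english", "en", {"key1": "value1"}
)
print(json.dumps(body, indent=2))
```

Keeping the subtitles object in one helper makes it harder to send a languageCode without its matching languageName.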

Request body for uploading a video from a device

{
  "corsOrigin": "*",
  "pushMediaSettings": {
    "accessPolicy": "public",
    "subtitles": {
      "languageName": "english",
      "metadata": {
        "key1": "value1"
      },
      "languageCode": "en"
    },
    "maxResolution": "1080p"
  }
}
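
Note that in the device upload body the subtitles object sits inside pushMediaSettings rather than at the top level. A minimal Python sketch of that nesting follows; the helper name build_device_upload_body and its default values are illustrative assumptions taken from the example above, and the sketch does not send the request.

```python
import json


def build_device_upload_body(language_name, language_code, metadata=None,
                             cors_origin="*", access_policy="public",
                             max_resolution="1080p"):
    """Return a request body shaped like the device upload example above."""
    subtitles = {
        "languageName": language_name,
        "languageCode": language_code,
    }
    if metadata:
        subtitles["metadata"] = dict(metadata)
    return {
        "corsOrigin": cors_origin,
        # For device uploads, media settings (including subtitles) are nested here.
        "pushMediaSettings": {
            "accessPolicy": access_policy,
            "subtitles": subtitles,
            "maxResolution": max_resolution,
        },
    }


device_body = build_device_upload_body("english", "en", {"key1": "value1"})
print(json.dumps(device_body, indent=2))
```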

IMPORTANT
Verify that languageCode matches the spoken language in your video. The transcription model follows this setting.

Process the video

After upload, FastPix transcribes the audio and attaches a synchronized WebVTT subtitle track to the asset.

Supported languages for auto-generated subtitles

FastPix supports the following languages and language codes for auto-generated subtitles on on-demand media:

Language	Language Code	Status
English	en	Supported
Spanish	es	Supported
Italian	it	Supported
Portuguese	pt	Supported
German	de	Supported
French	fr	Supported
Polish	pl	Beta
Russian	ru	Beta
Dutch	nl	Beta
Catalan	ca	Beta
Turkish	tr	Beta
Swedish	sv	Beta
Ukrainian	uk	Beta
Norwegian	no	Beta
Finnish	fi	Beta
Slovak	sk	Beta
Greek	el	Beta
Czech	cs	Beta
Croatian	hr	Beta
Danish	da	Beta
Romanian	ro	Beta
Bulgarian	bg	Beta
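
A client-side check against this table can catch an unsupported languageCode before a request is sent. The following Python sketch copies the codes from the table above; the names SUPPORTED_CODES, BETA_CODES, and language_status are hypothetical, and the lists would need updating whenever the table changes.

```python
# Language codes copied from the table above.
SUPPORTED_CODES = {"en", "es", "it", "pt", "de", "fr"}
BETA_CODES = {
    "pl", "ru", "nl", "ca", "tr", "sv", "uk", "no",
    "fi", "sk", "el", "cs", "hr", "da", "ro", "bg",
}


def language_status(language_code):
    """Return "Supported", "Beta", or None if the code is not in the table."""
    code = language_code.strip().lower()
    if code in SUPPORTED_CODES:
        return "Supported"
    if code in BETA_CODES:
        return "Beta"
    return None


for code in ("en", "pl", "ja"):
    print(code, language_status(code))
```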

NOTE
Subtitles match the spoken language directly. FastPix does not generate translated captions from this endpoint.

Generate subtitles for existing audio tracks

Call the generate track subtitles API with the audio trackId to produce subtitles for an asset that is already in the ready state. Provide the language name and code in the request body.

Endpoint: POST https://api.fastpix.com/v1/on-demand/{mediaId}/tracks/{trackId}/generate-subtitles

Request headers:

Content-Type: application/json

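Authorization: Basic {Base64 encoding of YOUR_ACCESS_TOKEN_ID:YOUR_SECRET_KEY}

The request uses HTTP Basic authentication, with the Access Token ID as the username and the Secret Key as the password.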

Request Body

{
  "languageCode": "en",
  "languageName": "English"
}

NOTE

Use the correct trackId for the audio track.

Make sure languageCode is a valid BCP 47 code.

Response

{
  "success": true,
  "data": {
    "id": "0fd317c7-7237-413c-a252-8ad68b370166",
    "type": "subtitle",
    "languageCode": "en",
    "languageName": "English"
  }
}
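
The request and response above could be handled from Python roughly as follows. This is a sketch under stated assumptions: the endpoint is reached over HTTPS, the Authorization header is standard HTTP Basic (Access Token ID as username, Secret Key as password), and the helper names are hypothetical. It uses only the standard library and builds the request without sending it.

```python
import base64
import json
import urllib.request

# Assumption: the endpoint path shown above, served over HTTPS.
API_BASE = "https://api.fastpix.com/v1/on-demand"


def build_generate_subtitles_request(media_id, track_id, access_token_id,
                                     secret_key, language_code, language_name):
    """Build (but do not send) the POST request for the endpoint above."""
    url = f"{API_BASE}/{media_id}/tracks/{track_id}/generate-subtitles"
    # Assumption: standard HTTP Basic auth, i.e. base64("id:secret").
    credentials = f"{access_token_id}:{secret_key}".encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
    }
    payload = json.dumps(
        {"languageCode": language_code, "languageName": language_name}
    ).encode("utf-8")
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


def subtitle_track_id(response_text):
    """Return the new subtitle track ID from a response shaped like the one above."""
    body = json.loads(response_text)
    if not body.get("success"):
        raise RuntimeError(f"Subtitle generation request failed: {body}")
    return body["data"]["id"]


request = build_generate_subtitles_request(
    "YOUR_MEDIA_ID", "YOUR_AUDIO_TRACK_ID",
    "YOUR_ACCESS_TOKEN_ID", "YOUR_SECRET_KEY", "en", "English",
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen and feed the decoded response body to subtitle_track_id. The returned ID is the subtitle trackId used in the transcript URLs.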

Retrieve a transcript

If your media has an auto-generated subtitle track, you can extract a plain text transcript of the recognized speech. This is useful for content moderation, sentiment analysis, summarization, or downstream processing.

To retrieve the transcript, use the playbackId of the media and the trackId of the generated subtitles.

Plain text transcript (TXT format)

A plain text transcript returns the raw, unformatted speech content without timestamps. This format suits natural language processing pipelines and search indexing.

To fetch the transcript in plain text:

https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.txt

NOTE

The plain text transcript contains only spoken words, without timecodes or additional metadata.
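
As an illustration, the transcript could be downloaded with Python's standard library. The helper names below are hypothetical, and the sketch assumes the media is public and that the response body is UTF-8 text.

```python
import urllib.request

STREAM_BASE = "https://stream.fastpix.com"


def transcript_txt_url(playback_id, track_id):
    """Build the plain text transcript URL from the pattern above."""
    return f"{STREAM_BASE}/{playback_id}/text/{track_id}.txt"


def fetch_transcript(playback_id, track_id, timeout=10):
    """Download the transcript as a string (assumes a UTF-8 response)."""
    url = transcript_txt_url(playback_id, track_id)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


print(transcript_txt_url("YOUR_PLAYBACK_ID", "YOUR_TRACK_ID"))
```

Calling fetch_transcript with a real playbackId and subtitle trackId returns the text that a search index or NLP pipeline can consume directly.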

WebVTT subtitle file (VTT format)

A WebVTT file provides subtitles in a structured format with timestamps for synchronization in HLS-compatible players. Use this format to edit, refine, or repurpose subtitles on other platforms.

To fetch the WebVTT file, replace .txt with .vtt:

https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.vtt

NOTE

Most HLS-compatible players support WebVTT, and you can edit the file in any text or subtitle editor.
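
To repurpose the cues programmatically, the file can be split into timed segments. The Python sketch below is a minimal parser for simple WebVTT files, not a full implementation of the WebVTT specification (it ignores cue settings, NOTE blocks, and styling). The sample text is made up for illustration and is not actual FastPix output.

```python
import re

# Matches cue timing lines such as "00:00:01.000 --> 00:00:04.000";
# the hours component is optional in WebVTT.
TIMING = re.compile(
    r"^((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d{2,}:)?\d{2}:\d{2}\.\d{3})"
)


def parse_vtt(vtt_text):
    """Return a list of (start, end, text) tuples, one per cue."""
    cues = []
    normalized = vtt_text.replace("\r\n", "\n").strip()
    # Blocks (header, cues) are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for index, line in enumerate(lines):
            match = TIMING.match(line)
            if match:
                # Everything after the timing line is the cue text.
                text = " ".join(part.strip() for part in lines[index + 1:])
                cues.append((match.group(1), match.group(2), text))
                break
    return cues


sample = """WEBVTT

00:00:01.000 --> 00:00:04.000
Welcome to the demo.

00:00:04.500 --> 00:00:07.250
This cue spans
two lines.
"""

for start, end, text in parse_vtt(sample):
    print(start, end, text)
```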

Retrieve transcripts for signed media

If your video uses signed playback, append a JWT (JSON Web Token) as a query parameter on the transcript URL so only authorized viewers can fetch it.

https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.txt?token={JWT}

For WebVTT subtitles on signed media:

https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.vtt?token={JWT}
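
A small helper can build both signed URL variants. This Python sketch assumes you already have a valid JWT for the playbackId; how the token is created and signed is outside its scope, and the function name signed_transcript_url is hypothetical.

```python
from urllib.parse import urlencode

STREAM_BASE = "https://stream.fastpix.com"


def signed_transcript_url(playback_id, track_id, token, ext="txt"):
    """Build a transcript URL with the JWT appended as the token query parameter."""
    if ext not in ("txt", "vtt"):
        raise ValueError("ext must be 'txt' or 'vtt'")
    base = f"{STREAM_BASE}/{playback_id}/text/{track_id}.{ext}"
    # urlencode escapes any characters that are unsafe in a query string.
    return f"{base}?{urlencode({'token': token})}"


print(signed_transcript_url("YOUR_PLAYBACK_ID", "YOUR_TRACK_ID", "YOUR_JWT", ext="vtt"))
```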

Transcripts help you extend accessibility, repurpose content, and integrate subtitles into external workflows.

Use cases for transcripts

Automated content review: Run transcripts through AI tools to detect key topics or compliance issues.
SEO optimization: Transcripts make video content indexable and improve search coverage.
Podcast and blog conversion: Convert video speech into written formats for repurposing.
Educational materials: Provide readable transcripts alongside instructional videos.

Edit or replace generated subtitles

Auto-generated captions rely on AI transcription, which can misinterpret strong accents, background noise, or fast dialogue. To correct errors:

Download the existing WebVTT file:

https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.vtt

Edit the file in a text editor or a subtitle editor such as Aegisub or Subtitle Edit.
Remove the auto-generated track using the Delete track API.
Upload the edited subtitles as a new track using the Add track API.

Alternatively, skip the delete and upload steps and overwrite the auto-generated track in place with the edited file using the Update track API.

This workflow keeps subtitles accurate and improves the viewing experience.
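
When the same misrecognized terms recur, the editing step can be partly scripted. The Python sketch below applies whole-word replacements to cue text while leaving the header and timing lines untouched. The function name correct_vtt, the sample cue, and the replacement table are made up for illustration; re-uploading the result through the track APIs is not shown.

```python
import re


def correct_vtt(vtt_text, replacements):
    """Apply whole-word replacements to cue text, preserving timing lines."""
    corrected = []
    for line in vtt_text.splitlines():
        # Leave the WEBVTT header and cue timing lines exactly as they are.
        if line.startswith("WEBVTT") or "-->" in line:
            corrected.append(line)
            continue
        for wrong, right in replacements.items():
            pattern = rf"\b{re.escape(wrong)}\b"
            # A function replacement avoids backslash handling in `right`.
            line = re.sub(pattern, lambda _match, value=right: value, line)
        corrected.append(line)
    return "\n".join(corrected) + "\n"


sample = (
    "WEBVTT\n"
    "\n"
    "00:00:01.000 --> 00:00:03.000\n"
    "Welcome to fast picks.\n"
)

print(correct_vtt(sample, {"fast picks": "FastPix"}))
```

Review the corrected file by eye before uploading it, since blanket replacements can also change phrases that were transcribed correctly.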

Best practices for accurate subtitles

Audio quality: Use clear, high-quality audio. Minimize background sounds, echo, and interruptions.
Consistent speech: Maintain a steady speaking pace and clear pronunciation. Avoid mixing languages inside a single segment, because the transcription model may not differentiate between them accurately.
Language consistency: Keep the entire video in a single language where possible. For multilingual content, post-edit or author subtitles manually for the non-primary segments.

Frequently asked questions

How accurate is FastPix AI transcription?

Accuracy depends on audio clarity, speaking pace, and language. Transcription quality is highest on fully supported languages with clean speech audio; Beta languages and content with heavy background noise, accents, or overlapping speakers can lower accuracy. Test on a representative sample before rolling out.

Can FastPix generate subtitles in multiple languages for the same video?

The auto-generate endpoint produces subtitles in the same language as the audio track, not translations. If a media asset has multiple audio tracks in different languages, you can call the generate-subtitles endpoint once per audio trackId to produce one subtitle track per language.

What happens if transcription fails or the result is poor?

Remove the generated track with the Delete track API and call the generate track subtitles endpoint again after verifying the correct trackId, a supported languageCode, and clean audio. For manual corrections, edit the WebVTT file and re-upload it via the Add track API.

What formats does FastPix return for auto-generated subtitles?

FastPix returns a WebVTT (.vtt) subtitle track for synchronized playback and a plain text (.txt) transcript for downstream processing. Both are served from https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.{ext}.
