FastPix generates subtitles from your video’s audio using AI transcription, creating a synchronized WebVTT track that improves accessibility, SEO, and viewer engagement. You can enable auto-generation at upload time by including a subtitles object in the request, or generate subtitles for an existing video by calling the generate-subtitles endpoint with the audio trackId.
mediaId is the unique identifier FastPix assigns to every uploaded asset.
trackId identifies a single audio, video, or subtitle track that belongs to a media asset.
playbackId is a separate, access-controlled identifier used to construct the HLS playback URL: https://stream.fastpix.com/<playbackId>.m3u8.
The accessPolicy field controls whether a media asset is public or private.
FastPix transcribes your on-demand media using the OpenAI Whisper model, converting spoken audio into synchronized subtitles.
Audio quality: Auto-generated captions perform best with clear audio. Results may vary on media with excessive non-speech audio, such as music, background noise, or long silences.
Language compatibility: FastPix generates subtitles in the same language as the audio. The feature does not translate captions into other languages.
Test this feature with your typical content to evaluate transcription quality before rolling it out in production.
Enable auto-generated subtitles at upload time by including a subtitles object in your upload request.
The subtitles object takes three fields:
The language name (for example, "english").
languageCode: the language code (for example, en).
IMPORTANT
Verify that languageCode matches the spoken language in your video. The transcription model follows this setting.
After upload, FastPix transcribes the audio and attaches a synchronized WebVTT subtitle track to the asset.
FastPix supports the following languages and language codes for auto-generated subtitles on on-demand media:
NOTE
Subtitles match the spoken language directly. FastPix does not generate translated captions from this endpoint.
Call the generate track subtitles API with the audio trackId to produce subtitles for an existing asset that is ready. Provide the language name and code in the request body.
Endpoint:
POST https://api.fastpix.com/v1/on-demand/{mediaId}/tracks/{trackId}/generate-subtitles
Request headers:
Content-Type: application/json
Authorization: Basic <base64 of YOUR_ACCESS_TOKEN:YOUR_SECRET_KEY>
NOTE
- Use the correct trackId for the audio track.
- Make sure the languageCode follows BCP 47 standards.
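As a rough illustration of the request described above, the following Python sketch builds (but does not send) the POST request using only the standard library. The endpoint path, headers, and the languageCode field come from this page; the languageName key, the placeholder IDs, and the helper name build_generate_subtitles_request are assumptions made for illustration, so check them against the API reference before relying on them.

```python
import base64
import json
import urllib.request

API_BASE = "https://api.fastpix.com/v1/on-demand"


def build_generate_subtitles_request(media_id, track_id, access_token,
                                     secret_key, language_name, language_code):
    """Build, without sending, the POST request for the generate-subtitles endpoint."""
    url = f"{API_BASE}/{media_id}/tracks/{track_id}/generate-subtitles"

    # HTTP Basic auth encodes "username:password" in base64; here the access
    # token acts as the username and the secret key as the password.
    raw = f"{access_token}:{secret_key}".encode("utf-8")
    credentials = base64.b64encode(raw).decode("ascii")

    # Assumption: "languageName" is an illustrative key for the language name.
    # "languageCode" is the field named in the text and should be a BCP 47 code.
    body = json.dumps({
        "languageName": language_name,
        "languageCode": language_code,
    }).encode("utf-8")

    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )


# Placeholder values; replace them with your own IDs and credentials.
request = build_generate_subtitles_request(
    "MEDIA_ID", "AUDIO_TRACK_ID",
    "YOUR_ACCESS_TOKEN", "YOUR_SECRET_KEY",
    "english", "en",
)
print(request.get_method(), request.full_url)
```

To send the request, pass the object to urllib.request.urlopen(request). Building the request separately from sending it makes the URL, headers, and body easy to inspect first.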
If your media has an auto-generated subtitle track, you can extract a plain text transcript of the recognized speech. This is useful for content moderation, sentiment analysis, summarization, or downstream processing.
To retrieve the transcript, use the playbackId of the media and the trackId of the generated subtitles.
A plain text transcript returns the raw, unformatted speech content without timestamps. This format suits natural language processing pipelines and search indexing.
To fetch the transcript in plain text, request:
https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.txt
NOTE
The plain text transcript contains only spoken words, without timecodes or additional metadata.
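The transcript URL can also be assembled in code. This minimal Python sketch follows the URL pattern documented on this page (https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.{ext}); the helper names transcript_url and fetch_transcript are hypothetical, and the sketch assumes the media is publicly accessible.

```python
import urllib.request
from urllib.parse import quote

STREAM_BASE = "https://stream.fastpix.com"


def transcript_url(playback_id, track_id, ext="txt"):
    """Return the transcript URL for a subtitle track ('txt' or 'vtt')."""
    if ext not in ("txt", "vtt"):
        raise ValueError("ext must be 'txt' or 'vtt'")
    # Percent-encode the IDs so unexpected characters cannot alter the path.
    return (f"{STREAM_BASE}/{quote(playback_id, safe='')}"
            f"/text/{quote(track_id, safe='')}.{ext}")


def fetch_transcript(playback_id, track_id, ext="txt"):
    """Download the transcript and return it as text (performs a network call)."""
    with urllib.request.urlopen(transcript_url(playback_id, track_id, ext)) as resp:
        return resp.read().decode("utf-8")


# Placeholder IDs; replace them with a real playbackId and subtitle trackId.
print(transcript_url("PLAYBACK_ID", "TRACK_ID"))
```

Passing ext="vtt" to the same helper yields the timestamped WebVTT variant of the URL.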
A WebVTT file provides subtitles in a structured format with timestamps for synchronization in HLS-compatible players. Use this format to edit, refine, or repurpose subtitles on other platforms.
To fetch the WebVTT file, replace .txt with .vtt:
https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.vtt
NOTE
Most HLS-compatible players support WebVTT, and you can edit the file in any text or subtitle editor.
If your video uses signed playback, append a JWT (JSON Web Token) as a query parameter on the transcript URL so only authorized viewers can fetch it.
For WebVTT subtitles on signed media:
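A signed transcript URL might be built as in the sketch below. The base URL pattern comes from this page, but the query parameter name token is an assumption (the text only says the JWT is appended as a query parameter), and the helper name signed_transcript_url is hypothetical. Generating the JWT itself is out of scope here.

```python
from urllib.parse import urlencode


def signed_transcript_url(playback_id, track_id, jwt, ext="vtt", param="token"):
    """Append a JWT as a query parameter to the transcript URL.

    Assumption: the parameter is named "token"; confirm the exact name
    in the signed-playback documentation before using this.
    """
    base = f"https://stream.fastpix.com/{playback_id}/text/{track_id}.{ext}"
    # urlencode percent-encodes the value so the token survives as one parameter.
    return f"{base}?{urlencode({param: jwt})}"


# Placeholder values; the JWT shown is not a real token.
print(signed_transcript_url("PLAYBACK_ID", "TRACK_ID", "header.payload.signature"))
```

The same helper covers the plain text transcript by passing ext="txt".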
Transcripts extend accessibility, repurpose content, and integrate subtitles into external workflows.
Use cases for transcripts
Auto-generated captions rely on AI transcription, which can misinterpret strong accents, background noise, or fast dialogue. To correct errors:
Download the existing WebVTT file from https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.vtt.
Edit the file in a text editor or subtitle editor such as Aegisub or Subtitle Edit.
Remove the auto-generated track using the Delete track API.
Alternatively, replace the existing track's subtitles in place using the Update track API; otherwise, continue to the next step.
Upload the edited subtitles as a new track via the Add track API.
This workflow keeps subtitles accurate and improves the viewing experience.
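For recurring mistakes, such as a product name the model consistently mishears, the editing step can be scripted. The sketch below is one possible approach, not part of the FastPix API: it applies simple find-and-replace corrections to cue text only, leaving the WEBVTT header, cue identifiers, and timing lines untouched. The function name correct_cue_text and the sample cues are made up for illustration.

```python
def correct_cue_text(vtt_text, corrections):
    """Apply {wrong: right} replacements to WebVTT cue text lines only."""
    out = []
    in_cue = False  # True once a timing line has been seen, until a blank line
    for line in vtt_text.splitlines():
        if "-->" in line:
            # Timing line: keep it verbatim; the cue payload starts after it.
            in_cue = True
        elif not line.strip():
            # A blank line ends the current cue.
            in_cue = False
        elif in_cue:
            for wrong, right in corrections.items():
                line = line.replace(wrong, right)
        out.append(line)
    return "\n".join(out) + "\n"


sample = """WEBVTT

1
00:00:00.000 --> 00:00:02.500
Welcome to fast picks.

2
00:00:02.500 --> 00:00:05.000
Let's upload a video.
"""

print(correct_cue_text(sample, {"fast picks": "FastPix"}))
```

Review the corrected file before uploading it, because a plain string replacement can also change text you did not intend to touch.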
Audio quality: Use clear, high-quality audio. Minimize background sounds, echo, and interruptions.
Consistent speech: Maintain a steady speaking pace and clear pronunciation. Avoid mixing languages inside a single segment, because the transcription model may not differentiate between them accurately.
Language consistency: Keep the entire video in a single language where possible. For multilingual content, post-edit or author subtitles manually for the non-primary segments.
How accurate is FastPix AI transcription?
Accuracy depends on audio clarity, speaking pace, and language. Transcription quality is highest for fully supported languages with clean speech audio; accuracy can drop for Beta languages and for content with heavy background noise, strong accents, or overlapping speakers. Test on a representative sample before rolling out.
Can FastPix generate subtitles in multiple languages for the same video?
The auto-generate endpoint produces subtitles in the same language as the audio track, not translations. If a media asset has multiple audio tracks in different languages, you can call the generate-subtitles endpoint once per audio trackId to produce one subtitle track per language.
What happens if transcription fails or the result is poor?
Remove the generated track with the Delete track API and call the generate track subtitles endpoint again after verifying the correct trackId, a supported languageCode, and clean audio. For manual corrections, edit the WebVTT file and re-upload it via the Add track API.
What formats does FastPix return for auto-generated subtitles?
FastPix returns a WebVTT (.vtt) subtitle track for synchronized playback and a plain text (.txt) transcript for downstream processing. Both are served from https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.{ext}.