Generate subtitles automatically

Automatically generate subtitles from video audio using an AI transcription model for accessibility and engagement.

FastPix generates subtitles from your video’s audio using AI transcription, creating a synchronized WebVTT track that improves accessibility, SEO, and viewer engagement. You can enable auto-generation at upload time by including a subtitles object in the request, or generate subtitles for an existing video by calling the generate-subtitles endpoint with the audio trackId.


Prerequisites

  • A FastPix account with an active workspace (Activate your account)
  • Your Access Token ID and Secret Key from the FastPix dashboard
  • A video to upload, or an already processed (ready) video with a known audio trackId
  • Clear source audio in a supported language

Key terms

  • mediaId is the unique identifier FastPix assigns to every uploaded asset.
  • trackId identifies a single audio, video, or subtitle track that belongs to a media asset.
  • playbackId is a separate, access-controlled identifier used to construct the HLS playback URL: https://stream.fastpix.com/<playbackId>.m3u8.
  • accessPolicy is the field that controls whether a media asset is public or private.

How auto-generated subtitles work

FastPix transcribes your on-demand media using the OpenAI Whisper model, converting spoken audio into synchronized subtitles.


Key considerations

Audio quality: Auto-generated captions perform best with clear audio. Results may vary on media with excessive non-speech audio, such as music, background noise, or long silences.

Language compatibility: FastPix generates subtitles in the same language as the audio. The feature does not translate captions into other languages.

Test this feature with your typical content to evaluate transcription quality before rolling it out in production.


Generate subtitles during upload

Enable auto-generated subtitles at upload time by including a subtitles object in your upload request.

Prepare your video

  • Remove unwanted sounds, reduce background noise, and avoid overlapping audio.
  • Adjust volume levels so speech is clear and audible.

Enable subtitle generation in the request

The subtitles object takes three fields:

  • languageName: The language of the audio (for example, "english").
  • metadata: Optional key-value pairs to tag the subtitle track.
  • languageCode: The BCP 47 code for the spoken language (for example, en).

Upload video using URL request

{
  "inputs": [
    {
      "type": "video",
      "url": "https://example.com/sample.mp4"
    }
  ],
  "subtitles": {
    "languageName": "english",
    "metadata": {
      "key1": "value1"
    },
    "languageCode": "en"
  },
  "accessPolicy": "public"
}
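
As a rough illustration, the request body above could be assembled in Python before it is sent. This is a minimal sketch: build_url_upload_body is a hypothetical helper name, not part of any FastPix SDK, and it only builds the JSON shown above without calling the API.

```python
import json


def build_url_upload_body(video_url, language_name, language_code,
                          metadata=None, access_policy="public"):
    """Build the upload-from-URL body with auto-generated subtitles enabled.

    Hypothetical helper: it mirrors the JSON example above and sends nothing.
    """
    subtitles = {"languageName": language_name, "languageCode": language_code}
    if metadata:
        # metadata is optional, so include it only when tags are provided
        subtitles["metadata"] = dict(metadata)
    return {
        "inputs": [{"type": "video", "url": video_url}],
        "subtitles": subtitles,
        "accessPolicy": access_policy,
    }


body = build_url_upload_body(
    "https://example.com/sample.mp4", "english", "en", {"key1": "value1"}
)
print(json.dumps(body, indent=2))
```

Keeping the subtitles object in one place makes it harder to send a languageCode that does not match the languageName.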

Upload video from device request

{
  "corsOrigin": "*",
  "pushMediaSettings": {
    "accessPolicy": "public",
    "subtitles": {
      "languageName": "english",
      "metadata": {
        "key1": "value1"
      },
      "languageCode": "en"
    },
    "maxResolution": "1080p"
  }
}
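
The main difference from the URL request is that subtitles sits inside pushMediaSettings rather than at the top level. The sketch below shows that nesting, assuming a hypothetical helper named build_device_upload_body. It only reproduces the JSON above and does not call the API.

```python
import json


def build_device_upload_body(subtitles, cors_origin="*",
                             access_policy="public", max_resolution="1080p"):
    """Build the upload-from-device body (hypothetical helper).

    Unlike the URL request, subtitles is nested under pushMediaSettings.
    """
    return {
        "corsOrigin": cors_origin,
        "pushMediaSettings": {
            "accessPolicy": access_policy,
            "subtitles": subtitles,
            "maxResolution": max_resolution,
        },
    }


device_body = build_device_upload_body(
    {"languageName": "english", "metadata": {"key1": "value1"}, "languageCode": "en"}
)
print(json.dumps(device_body, indent=2))
```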

IMPORTANT
Verify that languageCode matches the spoken language in your video. The transcription model follows this setting.


Process the video

After upload, FastPix transcribes the audio and attaches a synchronized WebVTT subtitle track to the asset.


Supported languages for auto-generated subtitles

FastPix supports the following languages and language codes for auto-generated subtitles on on-demand media:


Language      Language Code   Status
----------    -------------   ---------
English       en              Supported
Spanish       es              Supported
Italian       it              Supported
Portuguese    pt              Supported
German        de              Supported
French        fr              Supported
Polish        pl              Beta
Russian       ru              Beta
Dutch         nl              Beta
Catalan       ca              Beta
Turkish       tr              Beta
Swedish       sv              Beta
Ukrainian     uk              Beta
Norwegian     no              Beta
Finnish       fi              Beta
Slovak        sk              Beta
Greek         el              Beta
Czech         cs              Beta
Croatian      hr              Beta
Danish        da              Beta
Romanian      ro              Beta
Bulgarian     bg              Beta

NOTE
Subtitles match the spoken language directly. FastPix does not generate translated captions from this endpoint.
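
A client can check a languageCode against the table above before sending a request. The following is a minimal sketch: the codes and statuses are copied from the table, while the names SUBTITLE_LANGUAGES and subtitle_language_status are illustrative. It matches plain two-letter codes only and does not handle region subtags such as en-US.

```python
# Language codes and statuses copied from the table above.
SUBTITLE_LANGUAGES = {
    "en": ("English", "Supported"), "es": ("Spanish", "Supported"),
    "it": ("Italian", "Supported"), "pt": ("Portuguese", "Supported"),
    "de": ("German", "Supported"), "fr": ("French", "Supported"),
    "pl": ("Polish", "Beta"), "ru": ("Russian", "Beta"),
    "nl": ("Dutch", "Beta"), "ca": ("Catalan", "Beta"),
    "tr": ("Turkish", "Beta"), "sv": ("Swedish", "Beta"),
    "uk": ("Ukrainian", "Beta"), "no": ("Norwegian", "Beta"),
    "fi": ("Finnish", "Beta"), "sk": ("Slovak", "Beta"),
    "el": ("Greek", "Beta"), "cs": ("Czech", "Beta"),
    "hr": ("Croatian", "Beta"), "da": ("Danish", "Beta"),
    "ro": ("Romanian", "Beta"), "bg": ("Bulgarian", "Beta"),
}


def subtitle_language_status(language_code):
    """Return "Supported", "Beta", or None when the code is not in the table."""
    entry = SUBTITLE_LANGUAGES.get(language_code.strip().lower())
    return entry[1] if entry else None


for code in ("en", "pl", "ja"):
    print(code, subtitle_language_status(code))
```

A None result means the code is not listed, so the request is best held back rather than sent with an unsupported language.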


Generate subtitles for existing audio tracks

Call the generate track subtitles API with the audio trackId to produce subtitles for an asset that is already ready. Provide the language name and code in the request body.


Endpoint: POST api.fastpix.com/v1/on-demand/{mediaId}/tracks/{trackId}/generate-subtitles


Request headers:

Content-Type: application/json

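Authorization: Basic {BASE64(YOUR_ACCESS_TOKEN_ID:YOUR_SECRET_KEY)}

This is standard HTTP Basic authentication, with the Access Token ID as the username and the Secret Key as the password.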


Request Body
{
  "languageCode": "en",
  "languageName": "English"
}

NOTE

  • Use the correct trackId for the audio track.
  • Make sure the languageCode follows BCP 47 standards.

Response
{
  "success": true,
  "data": {
    "id": "0fd317c7-7237-413c-a252-8ad68b370166",
    "type": "subtitle",
    "languageCode": "en",
    "languageName": "English"
  }
}
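
The request and response above can be sketched with Python's standard library. Treat this as an illustration under stated assumptions rather than official client code: it assumes the endpoint is served over HTTPS and that credentials are sent as standard HTTP Basic auth (Access Token ID as username, Secret Key as password). The function names and placeholder IDs are hypothetical. The request is built but not sent.

```python
import base64
import json
import urllib.request

# Assumption: the endpoint shown above is reachable over HTTPS.
API_BASE = "https://api.fastpix.com/v1/on-demand"


def build_generate_subtitles_request(media_id, track_id, access_token_id,
                                     secret_key, language_code, language_name):
    """Return an unsent urllib Request for the generate-subtitles endpoint."""
    url = f"{API_BASE}/{media_id}/tracks/{track_id}/generate-subtitles"
    payload = json.dumps(
        {"languageCode": language_code, "languageName": language_name}
    ).encode("utf-8")
    # Assumption: HTTP Basic auth, base64("<access token ID>:<secret key>")
    credentials = f"{access_token_id}:{secret_key}".encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


def subtitle_track_id(response_text):
    """Extract the new subtitle trackId from a response shaped like the one above."""
    body = json.loads(response_text)
    if not body.get("success"):
        raise RuntimeError("subtitle generation request was not successful")
    return body["data"]["id"]


request = build_generate_subtitles_request(
    "MEDIA_ID", "AUDIO_TRACK_ID", "YOUR_ACCESS_TOKEN_ID", "YOUR_SECRET_KEY",
    "en", "English",
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then pass the decoded
# response body to subtitle_track_id().
```

The id returned in data is the subtitle trackId that the transcript URLs in the next section expect.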

Retrieve a transcript

If your media has an auto-generated subtitle track, you can extract a plain text transcript of the recognized speech. This is useful for content moderation, sentiment analysis, summarization, or downstream processing.

To retrieve the transcript, use the playbackId of the media and the trackId of the generated subtitles.


Plain text transcript (TXT format)

A plain text transcript returns the raw, unformatted speech content without timestamps. This format suits natural language processing pipelines and search indexing.

To fetch the transcript in plain text:

https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.txt

NOTE

The plain text transcript contains only spoken words, without timecodes or additional metadata.


WebVTT subtitle file (VTT format)

A WebVTT file provides subtitles in a structured format with timestamps for synchronization in HLS-compatible players. Use this format to edit, refine, or repurpose subtitles on other platforms.

To fetch the WebVTT file, replace .txt with .vtt:


https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.vtt

NOTE

Most HLS-compatible players support WebVTT, and you can edit the file in any text or subtitle editor.


Retrieve transcripts for signed media

If your video uses signed playback, append a JWT (JSON Web Token) as a query parameter on the transcript URL so only authorized viewers can fetch it.

https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.txt?token={JWT}

For WebVTT subtitles on signed media:

https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.vtt?token={JWT}
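
The URL patterns above can be wrapped in one small helper. This is a sketch that only formats the URL: transcript_url is a hypothetical name, and the token is expected to be a JWT you have already generated.

```python
from urllib.parse import urlencode

STREAM_BASE = "https://stream.fastpix.com"


def transcript_url(playback_id, track_id, fmt="txt", token=None):
    """Build a transcript (.txt) or subtitle (.vtt) URL, signed when token is set."""
    if fmt not in ("txt", "vtt"):
        raise ValueError("fmt must be 'txt' or 'vtt'")
    url = f"{STREAM_BASE}/{playback_id}/text/{track_id}.{fmt}"
    if token:
        # Signed media: append the JWT as the token query parameter
        url += "?" + urlencode({"token": token})
    return url


print(transcript_url("PLAYBACK_ID", "TRACK_ID"))
print(transcript_url("PLAYBACK_ID", "TRACK_ID", fmt="vtt", token="header.payload.signature"))
```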

Transcripts help you extend accessibility, repurpose content, and integrate subtitles into external workflows.


Use cases for transcripts

  • Automated content review: Run transcripts through AI tools to detect key topics or compliance issues.
  • SEO optimization: Transcripts make video content indexable and improve search coverage.
  • Podcast and blog conversion: Convert video speech into written formats for repurposing.
  • Educational materials: Provide readable transcripts alongside instructional videos.

Edit or replace generated subtitles

Auto-generated captions rely on AI transcription, which can misinterpret strong accents, background noise, or fast dialogue. To correct errors:

  1. Download the existing WebVTT file:

    https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.vtt
  2. Edit the file in a text editor or subtitle editor such as Aegisub or Subtitle Edit.

  3. Remove the auto-generated track using the Delete track API.

    Alternatively, skip steps 3 and 4 and overwrite the existing track with the edited subtitles using the Update track API.

  4. Upload the edited subtitles as a new track via the Add track API.

This workflow keeps subtitles accurate and improves the viewing experience.
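
For recurring mistakes, such as a brand name the model mishears, the editing step can be scripted instead of done by hand. The sketch below assumes simple phrase replacements are enough. correct_vtt_text is a hypothetical helper and the sample cue is made up. It leaves the WEBVTT header and timing lines untouched so synchronization is preserved.

```python
def correct_vtt_text(vtt_text, corrections):
    """Apply phrase corrections to cue text, leaving the header and timings alone."""
    fixed = []
    for line in vtt_text.split("\n"):
        is_timing = "-->" in line
        is_header = line.startswith("WEBVTT")
        if line and not is_timing and not is_header:
            for wrong, right in corrections.items():
                line = line.replace(wrong, right)
        fixed.append(line)
    return "\n".join(fixed)


# Made-up cue for illustration.
original = "WEBVTT\n\n00:00:01.000 --> 00:00:03.000\nWelcome to fast picks.\n"
print(correct_vtt_text(original, {"fast picks": "FastPix"}))
```

The corrected text can then be uploaded as described in the steps above.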


Best practices for accurate subtitles

  • Audio quality: Use clear, high-quality audio. Minimize background sounds, echo, and interruptions.

  • Consistent speech: Maintain a steady speaking pace and clear pronunciation. Avoid mixing languages inside a single segment, because the transcription model may not differentiate between them accurately.

  • Language consistency: Keep the entire video in a single language where possible. For multilingual content, post-edit or author subtitles manually for the non-primary segments.


Frequently asked questions

How accurate is FastPix AI transcription?

Accuracy depends on audio clarity, speaking pace, and language. Transcription quality is highest on fully supported languages with clean speech audio; Beta languages and content with heavy background noise, accents, or overlapping speakers can lower accuracy. Test on a representative sample before rolling out.

Can FastPix generate subtitles in multiple languages for the same video?

The auto-generate endpoint produces subtitles in the same language as the audio track, not translations. If a media asset has multiple audio tracks in different languages, you can call the generate-subtitles endpoint once per audio trackId to produce one subtitle track per language.
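
The one-call-per-track pattern could be planned as below. This is a sketch with a hypothetical helper (plan_subtitle_jobs) and placeholder track IDs. It only lists the endpoint path and body for each audio track and does not send anything.

```python
def plan_subtitle_jobs(media_id, audio_tracks):
    """Return one (path, body) pair per audio track.

    audio_tracks maps each audio trackId to a (languageCode, languageName) pair.
    """
    jobs = []
    for track_id, (code, name) in audio_tracks.items():
        path = f"/v1/on-demand/{media_id}/tracks/{track_id}/generate-subtitles"
        jobs.append((path, {"languageCode": code, "languageName": name}))
    return jobs


jobs = plan_subtitle_jobs(
    "MEDIA_ID",
    {"AUDIO_TRACK_EN": ("en", "English"), "AUDIO_TRACK_ES": ("es", "Spanish")},
)
for path, body in jobs:
    print(path, body)
```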

What happens if transcription fails or the result is poor?

Remove the generated track with the Delete track API and call the generate track subtitles endpoint again after verifying the correct trackId, a supported languageCode, and clean audio. For manual corrections, edit the WebVTT file and re-upload it via the Add track API.

What formats does FastPix return for auto-generated subtitles?

FastPix returns a WebVTT (.vtt) subtitle track for synchronized playback and a plain text (.txt) transcript for downstream processing. Both are served from https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.{ext}.


What’s next?

  • Upload videos from a URL
  • Add tracks to existing media
  • API Reference: generate subtitle track
  • Secure playback with JWTs