How to add multi-language subtitle tracks to your video

June 16, 2026

Video Engineering

Adding multi-language subtitles to a video product comes down to three implementation choices: where in the upload flow the subtitle work happens, how the language inventory stays consistent as the catalog grows, and how the player surfaces the languages to viewers. Get the first wrong and your translation pipeline is locked to one workflow shape. Get the second wrong and the team is running a spreadsheet against the catalog every quarter. Get the third wrong and every new language ships with a player rebuild.

TL;DR

There are three places subtitle work happens: alongside the upload, after the video is already live, or auto-generated from the audio. Which one fits depends on where your translation pipeline lands its files.
WebVTT is the modern web standard. SRT is what most translation vendors ship. Both work as input; the player only ever sees WebVTT.
Languages travel through the system as BCP 47 codes (en, fr, pt-BR). The visible label in the player's language picker is a separate, human-readable field.
Every subtitle file becomes a track with an ID. You update it, delete it, or swap its URL without touching the underlying video.
Auto-generation transcribes the audio; it does not translate. A Spanish subtitle track needs either an authored Spanish file or a Spanish audio track to transcribe from.
Once tracks are attached, any HLS-compatible player surfaces them in its built-in language picker by reading the manifest. Zero player-side code, no app update per new language.

Where the subtitle work happens

Method 1: When every translation is ready before the upload

The cleanest path is also the rarest in practice. Every translation has come back from the vendor, every review pass is done, and the upload has not happened yet.

For that situation, the URL upload endpoint accepts an inputs array where every entry is its own object. The video is one entry. Each subtitle file is another entry. The asset is created with every language attached before processing even starts, so the first time the video appears in any player, every language is already in the picker.

POST /v1/on-demand/media:

json

{
  "inputs": [
    { "type": "video", "url": "https://static.fastpix.com/fp-sample-video.mp4" },
    { "type": "subtitle", "url": "https://static.fastpix.com/subtitle_Spanish.srt", "languageCode": "es", "languageName": "Spanish" },
    { "type": "subtitle", "url": "https://static.fastpix.com/subtitle_Russian.srt", "languageCode": "ru", "languageName": "Russian" },
    { "type": "subtitle", "url": "https://static.fastpix.com/subtitle_Portuguese.srt", "languageCode": "pt", "languageName": "Portuguese" }
  ],
  "accessPolicy": "public",
  "maxResolution": "1080p"
}

Direct uploads accept the same shape inside pushMediaSettings.inputs. Either path gives the same result: one call, every language attached.
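When the list of languages comes from a translation manifest rather than being typed by hand, the payload above can be assembled programmatically. The following is a minimal sketch, not an official client: the helper name build_media_inputs and the shape of its subtitles argument are illustrative assumptions, while the field names mirror the JSON shown above.

```python
import json


def build_media_inputs(video_url, subtitles):
    """Return a create-media body: one video input plus one subtitle input per language.

    `subtitles` maps a languageCode to a (languageName, file URL) pair.
    This helper and its argument shape are illustrative, not part of any SDK.
    """
    inputs = [{"type": "video", "url": video_url}]
    for code, (name, url) in subtitles.items():
        inputs.append(
            {"type": "subtitle", "url": url, "languageCode": code, "languageName": name}
        )
    # accessPolicy and maxResolution are copied from the example payload above.
    return {"inputs": inputs, "accessPolicy": "public", "maxResolution": "1080p"}


body = build_media_inputs(
    "https://static.fastpix.com/fp-sample-video.mp4",
    {
        "es": ("Spanish", "https://static.fastpix.com/subtitle_Spanish.srt"),
        "ru": ("Russian", "https://static.fastpix.com/subtitle_Russian.srt"),
    },
)
print(json.dumps(body, indent=2))
```

The resulting dictionary serializes to the same structure as the hand-written payload, so it can be sent as the request body by whatever HTTP client the stack already uses.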

When this method fits: marketing localization where every language is reviewed before launch, training content with a final approval pass, anything where the upload happens after the language work, not before.

Method 2: When languages arrive one at a time

In practice, catalogs rarely operate that cleanly. English ships first because that is what is available on Monday. Two weeks later, the French translation lands from the vendor. The Portuguese file shows up a month after that, and Italian only when the Q3 EU launch is finally on the calendar.

For staggered translations, the Add Track endpoint adds a single subtitle track to a video that is already live.

POST /v1/on-demand/{mediaId}/tracks:

json

{
  "tracks": {
    "url": "https://static.fastpix.com/subtitle_Spanish.srt",
    "type": "subtitle",
    "languageCode": "es",
    "languageName": "Spanish"
  }
}

The response returns a new trackId. Save it. That ID is the handle for later updates and deletes on this specific track.
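As a rough illustration of the call above, the sketch below builds the Add Track request with Python's standard library without sending it. The base URL is a placeholder and the subtitle file URL is hypothetical; authentication is deliberately left out because the article does not specify the scheme. Only the path and body shape come from the example above.

```python
import json
import urllib.request


def add_track_request(base_url, media_id, file_url, language_code, language_name):
    """Build (but do not send) an Add Track request for one subtitle file."""
    body = {
        "tracks": {
            "url": file_url,
            "type": "subtitle",
            "languageCode": language_code,
            "languageName": language_name,
        }
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/on-demand/{media_id}/tracks",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = add_track_request(
    "https://api.example.invalid",  # placeholder host, not the real API base URL
    "MEDIA_ID",                      # placeholder media ID
    "https://example.com/subtitles/fr.srt",  # hypothetical file location
    "fr",
    "French",
)
print(req.get_method(), req.full_url)
```

Sending the request (for example with urllib.request.urlopen plus credentials) and persisting the returned trackId are left to the caller, since the exact response envelope is not shown in this article.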

Three webhook events fire as the track makes its way through processing: video.media.track.created when the request is accepted, video.media.track.ready when the file is processed and available, and video.media.updated so any system watching the asset-level state knows something changed. The end-to-end pattern for consuming these events (signatures, retries, idempotency) is covered in Webhooks for video streaming.

When this method fits: translations land asynchronously from a vendor, an LSP queue, a human review pass, or a post-edit machine translation pipeline. Also when a single language needs a correction without re-uploading the whole video.

Method 3: When the script only exists as audio

There is a third situation that comes up more often than the others: the script never existed in writing at all. The video is a founder explainer, a customer interview, a recorded webinar, a marketing improv. No .vtt was authored because nobody sat down and authored one.

For those, the audio itself is the source. A transcription model produces the subtitle track from what was actually spoken.

Two trigger points. Inline at upload, by including a subtitles object on the create-media payload:

json

{
  "corsOrigin": "*",
  "pushMediaSettings": {
    "accessPolicy": "public",
    "subtitles": { "languageName": "english", "languageCode": "en" },
    "maxResolution": "1080p"
  }
}

Or for a video that is already live, call the generate-subtitles endpoint against the audio trackId:

POST /v1/on-demand/{mediaId}/tracks/{trackId}/generate-subtitles:

json

{ "languageCode": "en", "languageName": "English" }

One thing about this method that surprises people: the model transcribes; it does not translate. English audio produces an English subtitle track. To ship a Spanish subtitle track from English audio, either author the Spanish .vtt manually and use Method 1 or 2, or attach a Spanish-language audio track and transcribe against that.

Six languages are fully supported for production: English, Spanish, Italian, Portuguese, German, French. Sixteen more are in beta: Polish, Russian, Dutch, Catalan, Turkish, Swedish, Ukrainian, Norwegian, Finnish, Slovak, Greek, Czech, Croatian, Danish, Romanian, Bulgarian.
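A pre-flight check can keep unsupported audio out of the auto-generation queue. The sketch below encodes the two lists above as sets of ISO 639-1 codes. The article lists language names only, so the name-to-code mapping is an assumption, as is the function name autogen_support.

```python
# Assumed ISO 639-1 codes for the languages named above (the article lists names only).
PRODUCTION = {"en", "es", "it", "pt", "de", "fr"}
BETA = {
    "pl", "ru", "nl", "ca", "tr", "sv", "uk", "no",
    "fi", "sk", "el", "cs", "hr", "da", "ro", "bg",
}


def autogen_support(audio_language_code):
    """Classify the audio language as 'production', 'beta', or 'unsupported'.

    Only the primary subtag is compared, so 'pt-BR' is treated as 'pt'.
    """
    primary = audio_language_code.split("-")[0].lower()
    if primary in PRODUCTION:
        return "production"
    if primary in BETA:
        return "beta"
    return "unsupported"


for code in ("pt-BR", "uk", "ja"):
    print(code, autogen_support(code))
```

Because the model transcribes rather than translates, the code passed in should describe the spoken language of the audio track, not the language of the subtitles you hope to end up with.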

When this method fits: no subtitle files exist, the audio is in a supported language, and the goal is searchable, accessible content. It is not a substitute for a human review pass when the track is the final user-facing experience.

Keeping the language inventory consistent

Once the catalog has more than a handful of videos with multiple languages each, the next question is which video has which language right now. Three operations cover most of it.

Read inventory: GET /v1/on-demand/{mediaId} returns a tracks array listing every audio and subtitle track with its type, languageCode, languageName, and id. Run it against the catalog to surface which assets are missing which languages.
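The gap check described above reduces to a set difference. Here is a minimal sketch that assumes the media object has already been fetched and decoded, and that the dictionary passed in is the object carrying the tracks array (any response envelope around it is not covered here). The function name and sample data are illustrative.

```python
def missing_languages(media, required):
    """Return the required language codes that have no subtitle track on this asset."""
    present = {
        track.get("languageCode")
        for track in media.get("tracks", [])
        if track.get("type") == "subtitle"
    }
    return sorted(set(required) - present)


# Illustrative media object using the fields named above.
media = {
    "id": "MEDIA_ID",
    "tracks": [
        {"id": "t1", "type": "audio", "languageCode": "en", "languageName": "English"},
        {"id": "t2", "type": "subtitle", "languageCode": "en", "languageName": "English"},
        {"id": "t3", "type": "subtitle", "languageCode": "fr", "languageName": "French"},
    ],
}
print(missing_languages(media, ["en", "fr", "pt-BR", "it"]))
```

Audio tracks are filtered out on purpose: an English audio track does not mean an English subtitle track exists.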

Correct or replace: the Update Track endpoint swaps the URL or language metadata in place, keeping the trackId stable. Use Delete Track followed by Add Track only when downstream systems do not depend on the ID staying the same.

Remove: Delete Track takes mediaId plus trackId. It is permanent, so confirm the ID first.

A practical pattern that scales: log (mediaId, trackId, languageCode) to your application database when the video.media.track.ready webhook fires. That local index, refreshed on every webhook, becomes the inventory dashboard your team checks instead of hitting the video API on every page load.
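One way to build that local index is a small table keyed on (mediaId, trackId), written with an upsert so that webhook retries do not create duplicates. The sketch below uses SQLite from the standard library. The flat event dictionary it consumes is a hypothetical shape chosen for illustration; the real webhook payload may nest these fields differently.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open the local inventory index, creating the table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS subtitle_tracks ("
        " media_id TEXT NOT NULL,"
        " track_id TEXT NOT NULL,"
        " language_code TEXT NOT NULL,"
        " PRIMARY KEY (media_id, track_id))"
    )
    return db


def record_event(db, event):
    """Store a track-ready event; ignore other event types. Safe to call twice."""
    if event.get("type") != "video.media.track.ready":
        return False
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO subtitle_tracks VALUES (?, ?, ?)",
            (event["mediaId"], event["trackId"], event["languageCode"]),
        )
    return True


def languages_for(db, media_id):
    """List the subtitle languages currently indexed for one asset."""
    rows = db.execute(
        "SELECT language_code FROM subtitle_tracks"
        " WHERE media_id = ? ORDER BY language_code",
        (media_id,),
    )
    return [row[0] for row in rows]


db = open_index()
# Hypothetical flat payload; adapt the field access to the real event body.
event = {"type": "video.media.track.ready", "mediaId": "m1", "trackId": "t1", "languageCode": "fr"}
record_event(db, event)
record_event(db, event)  # a retried delivery does not duplicate the row
print(languages_for(db, "m1"))
```

The composite primary key is the design choice that matters: it makes the handler idempotent, which is what webhook retry behavior requires.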

How the languages reach the viewer

That covers attaching tracks. The next question is how the viewer actually picks one, and the short answer is that the player does that part on its own.

Every attached track lands in the HLS manifest under the SUBTITLES group (an EXT-X-MEDIA entry with TYPE=SUBTITLES), which is part of the HLS specification (RFC 8216). Any HLS-compatible player reads the manifest, populates its language picker with your languageName values, and switches tracks when the viewer picks a new one. The fifth language you add travels the same path as the first. The fiftieth would too.
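To see what a player reads, it helps to pull the subtitle renditions out of a multivariant playlist by hand. The sketch below parses EXT-X-MEDIA lines with TYPE=SUBTITLES, following the attribute-list syntax in RFC 8216. The sample playlist is hand-written for illustration and is not actual FastPix output; group IDs and URIs in a real manifest will differ.

```python
import re

# Attribute lists are comma-separated KEY=value pairs; values may be quoted strings.
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def subtitle_renditions(master_playlist):
    """Return (LANGUAGE, NAME) pairs for every subtitle rendition in the playlist."""
    renditions = []
    for line in master_playlist.splitlines():
        if not line.startswith("#EXT-X-MEDIA:"):
            continue
        attribute_list = line.split(":", 1)[1]
        attrs = {key: value.strip('"') for key, value in ATTRIBUTE.findall(attribute_list)}
        if attrs.get("TYPE") == "SUBTITLES":
            renditions.append((attrs.get("LANGUAGE"), attrs.get("NAME")))
    return renditions


# Illustrative playlist, not real service output.
SAMPLE = """#EXTM3U
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Spanish",LANGUAGE="es",URI="subs/es.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Russian",LANGUAGE="ru",URI="subs/ru.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=2000000,SUBTITLES="subs"
video/1080p.m3u8
"""
print(subtitle_renditions(SAMPLE))
```

A check like this is handy in an integration test: after a track becomes ready, fetch the manifest and confirm the expected language appears before telling anyone the launch is done.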

What that means for the team: no player rebuild per language, no client config to ship, no app store release for the new track. The picker updates itself once the video.media.track.ready webhook has fired and the viewer next loads the manifest.

This holds across the FastPix Player SDKs (Web, iOS, Android, Flutter, React Native) and every third-party HLS player your stack is likely to reach for: hls.js, video.js, Shaka Player, AVPlayer, ExoPlayer, and the readers built into modern CTV runtimes.

For the auto-generated track specifically, the raw WebVTT file and a plain text transcript are available at well-known URLs:

text

https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.vtt
https://stream.fastpix.com/{PLAYBACK_ID}/text/{TRACK_ID}.txt

The .vtt is what the manifest points at. The .txt is the same content as a plain transcript, useful for search indexing, content moderation, summarization, or downstream analytics. Both accept a ?token={JWT} query parameter when the video uses signed playback.
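For pipelines that feed the transcript into search or moderation, building these URLs is a one-liner worth centralizing. A minimal sketch follows; the function name text_track_urls is illustrative, and the token is assumed to be a JWT your backend has already minted.

```python
from urllib.parse import urlencode


def text_track_urls(playback_id, track_id, token=None):
    """Return the .vtt and .txt URLs for a track, adding ?token=... for signed playback."""
    base = f"https://stream.fastpix.com/{playback_id}/text/{track_id}"
    query = f"?{urlencode({'token': token})}" if token else ""
    return {"vtt": f"{base}.vtt{query}", "txt": f"{base}.txt{query}"}


print(text_track_urls("PLAYBACK_ID", "TRACK_ID"))
print(text_track_urls("PLAYBACK_ID", "TRACK_ID", token="header.payload.signature"))
```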

WebVTT or SRT: the format conversation matters less than it feels

The format question usually shows up early in the design conversation and consumes more time than it deserves. The short version: pick the one your authoring pipeline already produces.

WebVTT is the W3C standard for the HTML5 video element. It supports styling, positioning, and metadata, and the HTML5 <track> element requires it. The header is the WEBVTT line. Timestamps use a period for milliseconds: 00:00:01.000.

SRT is the older, simpler format. No styling, no positioning, no metadata. Comma separator: 00:00:01,000. Most translation vendors send you SRT because that is what their desktop tools have always produced.
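The two differences just described (the WEBVTT header line and the millisecond separator) are enough to tell the formats apart when a vendor delivers files with missing or wrong extensions. The following is a heuristic sketch for a pre-upload sanity check, not a validator for either format.

```python
import re

# SRT cue timing: comma before the milliseconds on both sides of the arrow.
SRT_TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3}\s+-->\s+\d{2}:\d{2}:\d{2},\d{3}")


def sniff_subtitle_format(text):
    """Guess 'webvtt', 'srt', or 'unknown' from the file contents."""
    stripped = text.lstrip("\ufeff")  # tolerate a leading byte-order mark
    first_line = stripped.split("\n", 1)[0].strip()
    if first_line == "WEBVTT" or first_line.startswith(("WEBVTT ", "WEBVTT\t")):
        return "webvtt"
    if SRT_TIMING.search(stripped):
        return "srt"
    return "unknown"


vtt_sample = "WEBVTT\n\n00:00:01.000 --> 00:00:02.000\nHello\n"
srt_sample = "1\n00:00:01,000 --> 00:00:02,000\nHello\n"
print(sniff_subtitle_format(vtt_sample), sniff_subtitle_format(srt_sample))
```

Anything that comes back as unknown is worth a human look before upload, since it is more likely a corrupted export than a third format.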

The upload-based methods accept either format. The auto-generation method only produces WebVTT. Either way, the player only ever sees WebVTT-compatible output. So the format you upload is a question about your authoring pipeline, not about anything the viewer notices.

If you control authoring, store WebVTT. If a vendor ships SRT, upload SRT. There is no conversion step worth engineering at ingest.

A short note on accessibility compliance

This is a developer reference, not a legal review. But the team adding multi-language subtitles is usually the same team that gets pulled into the accessibility audit, so it is worth knowing what the audit checks against.

Four frameworks show up in 2026 product conversations: WCAG 2.2 from the W3C, the EU Accessibility Act (effective June 28, 2025, in scope for any product reaching EU users), ADA Title III in the United States, and Section 508 for federal contracts. They converge on the same bar: prerecorded video with audio needs captions or subtitles, synced within about a second, exposed through a language picker, with rendered contrast clearing 4.5:1.

The three methods above cover the technical implementation. What they do not cover is the editorial pass: word-level accuracy on proper nouns, regulated content review, brand-voice corrections. Auto-generated tracks especially benefit from a human review pass before they ship into production traffic, since the audit will be reading the same text your viewers do.

Get started

The hands-on version of all of the above takes about an hour the first time.

Create a FastPix account and grab the Access Token ID and Secret Key from the dashboard.
Upload the video. If translations are already in hand, include subtitle objects inline. If not, include the subtitles object to auto-generate.
For each language that lands later from the translation pipeline, POST to the /tracks endpoint with the language-specific payload.
Wire the three webhook events into the catalog system so the language inventory dashboard updates in real time.

The on-demand video product on FastPix includes 100,000 streaming views per month on the free tier, which is enough to ship the setup, verify the player handoff across web and mobile, and run the first few thousand production views before any commercial decision.

FAQ

How many subtitle tracks can a single video have?

There is no documented limit. Each track needs a distinct languageCode, and the player surfaces every attached track in its language picker.

Will viewers see the language picker automatically?

Yes, with any HLS-compatible player. The subtitle tracks are embedded in the HLS manifest under the SUBTITLES group, which is the standard surface for any compliant player. The picker labels come from the languageName field set when the track was added.

Can a subtitle track in one language be translated into another language automatically?

No. The generate-subtitles endpoint transcribes audio; it does not translate. To produce a Spanish track from English audio, either author the Spanish .vtt yourself and use Method 1 or 2, or attach a Spanish-language audio track and transcribe against that audio trackId.

Does VTT versus SRT need different code on the client?

No. The same Add Track payload accepts either format. Set the url to the .vtt or .srt file. The player always sees WebVTT-compatible output.

What happens if a subtitle URL stops responding before ingest finishes?

The track stays in a pending state. Update Track can repoint it at a working URL, or Delete Track followed by Add Track can replace it.

Do new languages need a player or app update?

No. As soon as video.media.track.ready fires, the language is available in the manifest, and the next viewer to load the asset sees it in the picker without any deploy.

Which language codes are accepted?

BCP 47. Use the primary language subtag (en, fr, es) when the region does not matter, or add a region or script subtag (en-US, pt-BR, zh-Hant) when it does. The visible picker label comes from the separate languageName field, so the code can be precise without making the UI verbose.
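Codes arriving from vendors and spreadsheets tend to vary in casing (pt-br, ZH-HANT). A small normalizer keeps the inventory consistent. The sketch below handles only the common language, optional script, optional region subset of BCP 47 and applies the conventional casing; it does not cover variants, extensions, or registry lookups, and whether the API itself is case-sensitive is not something this article states.

```python
import re

# language (2-3 letters), optional script (4 letters), optional region (2 letters or 3 digits)
TAG = re.compile(r"^([A-Za-z]{2,3})(?:-([A-Za-z]{4}))?(?:-([A-Za-z]{2}|\d{3}))?$")


def normalize_language_code(code):
    """Return the tag with conventional casing, e.g. 'PT-br' -> 'pt-BR'."""
    match = TAG.match(code.strip())
    if match is None:
        raise ValueError(f"not a simple language[-script][-region] tag: {code!r}")
    language, script, region = match.groups()
    parts = [language.lower()]
    if script:
        parts.append(script.title())  # scripts are title-cased: Hant, Latn
    if region:
        parts.append(region.upper())  # regions are upper-cased: BR, US
    return "-".join(parts)


for raw in ("PT-br", "zh-hant", "EN", "es-419"):
    print(raw, "->", normalize_language_code(raw))
```

Rejecting inputs like "english" at this point is deliberate: a language name in the code field is the most common data-entry mistake, and it is cheaper to catch before the track is created.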