Extract named entities from a video
Extract named entities such as people, organizations, and locations from video or audio with FastPix in-video AI, then use them for semantic tagging, search, and content discovery.
Video libraries become unsearchable when the only metadata is a filename and an upload date: viewers cannot find content by topic, speaker, or location. FastPix solves this by extracting named entities (people, organizations, locations, dates, and key terms) from video and audio transcripts using Named Entity Recognition (NER), part of in-video AI multimodal indexing. Enable the feature by setting namedEntities: true during upload, or by sending a PATCH request to the named entities endpoint for existing media.
mediaId for previously uploaded media
video.mediaAI.namedEntities.ready event
mediaId is the unique identifier FastPix assigns to every uploaded video. playbackId is a separate, access-controlled identifier used to construct the HLS URL https://stream.fastpix.com/<playbackId>.m3u8. Named entity extraction operates on the mediaId and produces structured metadata you retrieve through the Get Media by ID endpoint or a webhook event.
Named Entity Recognition is a natural language processing technique that turns unstructured transcript text into structured tags. In the sentence “Apple announced its latest product in California,” NER identifies Apple as an Organization, not a fruit, and California as a Location. FastPix attaches a category to every extracted entity so you can filter and index by type.
FastPix groups entities into categories that cover the common NER taxonomy:
People: John Smith
Organizations: UNICEF, FastPix
Locations: Mount Everest
Dates: July 4, 1776
Key terms: iPhone 14 or monetary values such as $1,000
Read more background in our blog on named entity recognition.
Enable NER at upload time by adding namedEntities: true to the request body. FastPix runs transcription, then extracts and ranks entities during encoding.
Send a POST request to the Create media from URL endpoint or the Direct upload endpoint.
The following parameters control entity extraction:
type: specify whether the input is video or audio.
url: the HTTPS URL of the source file (URL-based uploads only).
namedEntities: set to true to enable extraction.
accessPolicy (optional): set to public or private.
maxResolution (optional): cap the output rendition, for example 1080p.
Request body (create new media from URL):
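As a minimal sketch, the parameters listed above can be assembled into a request body like this. The helper name build_create_media_body is hypothetical, and nesting type and url inside an "inputs" array is an assumption about the body layout; confirm the exact shape against the Create media from URL reference before relying on it.

```python
def build_create_media_body(source_url, media_type="video",
                            access_policy=None, max_resolution=None):
    """Build a JSON-serializable body that enables named entity extraction."""
    if media_type not in ("video", "audio"):
        raise ValueError("media_type must be 'video' or 'audio'")
    if not source_url.startswith("https://"):
        raise ValueError("source_url must be an HTTPS URL")

    body = {
        # Assumed layout: the source file is described inside an "inputs" array.
        "inputs": [{"type": media_type, "url": source_url}],
        # Documented flag: turns on named entity extraction at upload time.
        "namedEntities": True,
    }
    # Optional parameters are included only when the caller sets them.
    if access_policy is not None:
        body["accessPolicy"] = access_policy
    if max_resolution is not None:
        body["maxResolution"] = max_resolution
    return body
```

Serialize the result with json.dumps and send it as the POST body. Leaving accessPolicy and maxResolution unset keeps them out of the payload entirely.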
Request body (create new media by direct upload):
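For direct uploads there is no source URL, so the media settings travel with the upload request instead. The sketch below is illustrative only: the helper name build_direct_upload_body is hypothetical, and the "corsOrigin" and "pushMediaSettings" field names are assumptions about the Direct upload body layout that you should verify in the API reference.

```python
def build_direct_upload_body(access_policy=None, max_resolution=None,
                             cors_origin="*"):
    """Build a direct-upload body with named entity extraction enabled."""
    # Documented flag: turns on named entity extraction for the uploaded file.
    settings = {"namedEntities": True}
    if access_policy is not None:
        settings["accessPolicy"] = access_policy
    if max_resolution is not None:
        settings["maxResolution"] = max_resolution

    return {
        # Assumed field: the browser origin allowed to push the file.
        "corsOrigin": cors_origin,
        # Assumed wrapper: settings applied to the media once the upload lands.
        "pushMediaSettings": settings,
    }
```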
Run NER on a video you have already uploaded by calling the Generate named entities endpoint.
Get the mediaId of the media from the dashboard or from a prior create-media response.
Send a PATCH request to /on-demand/<mediaId>/named-entities, replacing <mediaId> with the actual value.
Example request body:
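The PATCH call can be sketched with the Python standard library as follows. Only the /on-demand/<mediaId>/named-entities path and the namedEntities flag come from this guide. The base URL, the use of HTTP Basic authentication with a token ID and secret key, the {"namedEntities": true} body, and the helper name are assumptions made for illustration.

```python
import base64
import json
import urllib.request

API_BASE = "https://api.fastpix.io/v1"  # assumed base URL; confirm in the API reference


def build_named_entities_request(media_id, token_id, secret_key, api_base=API_BASE):
    """Prepare (but do not send) the PATCH request that triggers extraction."""
    if not media_id:
        raise ValueError("media_id is required")

    # Assumed auth scheme: HTTP Basic with "<token id>:<secret key>".
    credentials = base64.b64encode(f"{token_id}:{secret_key}".encode()).decode()

    return urllib.request.Request(
        url=f"{api_base}/on-demand/{media_id}/named-entities",
        # Assumed body: the same flag used at upload time.
        data=json.dumps({"namedEntities": True}).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )
```

Pass the returned object to urllib.request.urlopen to send it. urlopen raises urllib.error.HTTPError for a 4xx response, so a wrong mediaId surfaces as an exception you can catch and log.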
In the left navigation, go to Video > Media. On the Upload media page, add your video using one of the following methods:
The Media Settings panel opens. Select Custom settings and set "namedEntities": true in the JSON configuration. For example:
Click Continue, then click Start upload all media.
In the left navigation, go to Video > Media. Select the video you want to process from the media list to open its Media Details page.
In the left navigation of the Media Details page, click Named Entity Recognition under In-Video AI. Click Generate to start the analysis. FastPix analyzes the transcript and extracts named entities such as people, organizations, locations, dates, and other key terms.
Retrieve extracted entities in two ways:
Call the Get Media by ID endpoint with the mediaId.
Listen for the video.mediaAI.namedEntities.ready webhook, which fires when extraction completes.
Example event payload:
The namedEntities array contains each extracted entity and its category. Entities are ordered by relevance, so the first items reflect the topics most central to the transcript. Use this payload to populate a search index, drive semantic tagging, or enrich downstream recommendations.
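One way to prepare the array for a search index is to group entities by category while keeping their relevance order. This sketch assumes each item in namedEntities is an object with "entity" and "category" keys. Those key names, the category labels in the usage note, and the helper name are hypothetical, so adjust them to match the payload you actually receive.

```python
from collections import defaultdict


def group_entities_by_category(named_entities,
                               name_key="entity", category_key="category"):
    """Group entity names by category, preserving order and dropping repeats."""
    grouped = defaultdict(list)
    for item in named_entities:
        name = item.get(name_key)
        category = item.get(category_key, "UNKNOWN")
        # Skip items without a name; keep only the first occurrence of each name.
        if name and name not in grouped[category]:
            grouped[category].append(name)
    return dict(grouped)
```

For example, a list holding Apple tagged ORGANIZATION and California tagged LOCATION comes back as one list per category. An empty input returns an empty dict, which is a convenient signal that the transcript produced no entities.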
Confirm that the mediaId is correct when calling /on-demand/<mediaId>/named-entities. A missing or mismatched ID returns a 404.
accessPolicy and maxResolution are optional. Set them when you need private playback or a capped rendition.
What is the difference between NER and POS tagging?
Part-of-speech (POS) tagging labels every word in a sentence with a grammatical role, such as noun, verb, or adjective. NER operates at a higher level: it finds spans of text that refer to real-world entities (people, organizations, locations) and assigns each span a category. FastPix uses NER, not POS tagging, so you receive entity-level metadata rather than per-token grammar tags.
How are NER models trained?
NER models are trained on large corpora of text where human annotators have marked entity spans and categories. The model learns contextual patterns (for example, that a capitalized token following “Mr.” is likely a person) and generalizes them to new text. FastPix applies pretrained models to the transcript generated for your video, so you do not need to train or host a model yourself.
Can I build a custom NER model with FastPix?
FastPix does not expose a custom NER training endpoint. The in-video AI pipeline uses managed models optimized for transcripts across common domains. If you need domain-specific entities beyond the default categories, post-process the returned entities with your own classifier.
Why does my event payload show zero named entities?
Entity counts depend on the transcript. If the source file has no intelligible speech, is silent throughout, or uses a language the transcription model does not support, the extractor returns an empty namedEntities array. Verify that the transcript was generated and that the media contains spoken content.