Narrative Metadata
Narrative Metadata lets you automatically classify videos with AI-generated attributes like mood, subject, genre, and format. Define custom taxonomies, tag videos against your categories, and persist the results as searchable metadata for SQL-based filtering and analytics.
When to use Narrative Metadata
Narrative Metadata works best when you need to categorize videos by storytelling elements rather than visual features.
- Entertainment Industry Example – Mood-Based Curation: A streaming platform classifies its library by mood (suspenseful, uplifting, comedic) to power “mood playlists” and personalized recommendations based on emotional tone.
- Media Industry Example – Ad Placement: A media company tags content by subject and genre to match advertisements to contextually appropriate videos, ensuring brand-safe placements.
- Content Operations Example – Library Organization: A production studio automatically categorizes thousands of videos by format (scripted, unscripted, interview, live event) to streamline content discovery and asset management.
Narrative Metadata works best when your categories are well-defined and mutually distinguishable. If you need to detect specific visual objects or custom visual patterns, consider using Concepts instead.
Classification Dimensions
Narrative Metadata supports four independent classification dimensions, each with its own custom taxonomy: mood, subject, genre, and format.
We recommend 5–15 categories per dimension. Keep categories distinct—avoid near-synonyms like “happy” and “joyful” in the same taxonomy.
Getting Started
Step 0: Generate Captions for Videos
Before classifying videos, generate captions for each video’s keyframes. The Narrative Metadata system uses these captions to understand your video content.
Get your video IDs
Query your dataset to get the videos you want to process:
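A minimal sketch of gathering IDs, assuming Bearer-token auth and a hypothetical listing endpoint (`GET /api/v0/datasets/{dataset_id}/videos`); substitute whatever call returns the videos in your dataset:

```python
import requests

BASE_URL = "https://api.example.com"  # replace with your deployment's base URL
HEADERS = {"Authorization": "Bearer <YOUR_API_KEY>"}
DATASET_ID = "my-dataset"

# Hypothetical listing endpoint -- use the call that enumerates your dataset's videos.
resp = requests.get(f"{BASE_URL}/api/v0/datasets/{DATASET_ID}/videos", headers=HEADERS)
resp.raise_for_status()
video_ids = [video["id"] for video in resp.json()["videos"]]
print(f"Found {len(video_ids)} videos to caption")
```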
Generate captions for each video
Call the caption-keyframes endpoint for each video:
API Endpoint: POST /api/v0/video-summarization/datasets/{dataset_id}/videos/{video_id}/caption-keyframes
Example Request:
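A minimal sketch, assuming Bearer-token auth and an empty JSON body (any request-body options are assumptions; check the endpoint reference for supported parameters):

```python
import requests

BASE_URL = "https://api.example.com"  # replace with your deployment's base URL
HEADERS = {"Authorization": "Bearer <YOUR_API_KEY>"}
DATASET_ID = "my-dataset"
video_ids = ["vid_123", "vid_456"]  # gathered in the previous step

# Caption each video's keyframes so Step 1's classification has narrative context.
for video_id in video_ids:
    resp = requests.post(
        f"{BASE_URL}/api/v0/video-summarization/datasets/{DATASET_ID}"
        f"/videos/{video_id}/caption-keyframes",
        headers=HEADERS,
        json={},
    )
    resp.raise_for_status()
```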
Example Response:
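The exact response schema isn't shown here; an illustrative shape (field names are assumptions) looks like:

```python
# Illustrative shape only -- actual field names may differ.
response_body = {
    "video_id": "vid_123",
    "captions": [
        {"keyframe_timestamp": 0.0, "caption": "A dimly lit hallway, camera slowly panning."},
        {"keyframe_timestamp": 4.2, "caption": "A figure steps into frame from the shadows."},
    ],
}
```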
Process all videos in your dataset before moving to Step 1. Caption generation is required for accurate narrative classification.
Step 1: Define Your Taxonomy
Define categories for each dimension you want to use. Each category requires:
- Name: A stable identifier (e.g., “suspenseful”)
- Description: A clear definition that helps the model understand what to look for
- Examples: Representative examples to guide classification
Categories are scoped to your dataset and must be created before classification.
API Endpoint: POST /api/v0/video-narrative-metadata/metadata
Key Fields:
- dataset_id: Your dataset identifier
- metadata_type: The dimension (mood, subject, genre, or format)
- name: Category name
- description: Category definition
- examples: Representative examples
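A minimal sketch creating one mood category with the fields above (base URL and auth header are assumptions; repeat the call for each category in each dimension you use):

```python
import requests

BASE_URL = "https://api.example.com"  # replace with your deployment's base URL
HEADERS = {"Authorization": "Bearer <YOUR_API_KEY>"}

# One request per category; the category is scoped to the given dataset.
resp = requests.post(
    f"{BASE_URL}/api/v0/video-narrative-metadata/metadata",
    headers=HEADERS,
    json={
        "dataset_id": "my-dataset",
        "metadata_type": "mood",
        "name": "suspenseful",
        "description": "Sustained tension or dread where the outcome feels uncertain.",
        "examples": ["thriller chase scene", "horror build-up", "courtroom verdict reveal"],
    },
)
resp.raise_for_status()
```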
Taxonomy setup is typically a one-time operation per dataset. If you change your taxonomy later, you’ll need to reclassify affected videos.
Step 2: Classify Videos
Once your taxonomy is defined, classify videos against your categories:
- Select the classification dimension (e.g., mood)
- Provide the list of candidate category names
- Submit the classification request
- Receive the matched categories
The API returns only the categories it determines are present in the video.
API Endpoints:
Request:
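A minimal sketch, using a hypothetical endpoint path (`/api/v0/video-narrative-metadata/classify`) and illustrative field names; confirm both against the endpoint reference:

```python
import requests

BASE_URL = "https://api.example.com"  # replace with your deployment's base URL
HEADERS = {"Authorization": "Bearer <YOUR_API_KEY>"}

# Hypothetical endpoint and payload -- for illustration only.
resp = requests.post(
    f"{BASE_URL}/api/v0/video-narrative-metadata/classify",
    headers=HEADERS,
    json={
        "dataset_id": "my-dataset",
        "video_id": "vid_123",
        "metadata_type": "mood",  # the dimension to classify against
        "candidates": ["suspenseful", "uplifting", "comedic"],  # your category names
    },
)
resp.raise_for_status()
```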
Response:
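An illustrative response shape (field names are assumptions); note that only matched categories come back:

```python
# Illustrative shape only -- the API returns just the categories it finds present.
response_body = {
    "video_id": "vid_123",
    "metadata_type": "mood",
    "matches": ["suspenseful"],
}
```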
Step 3: Persist Results as Metadata
Classification results must be explicitly persisted to make them available for search and SQL queries. Use the ingestion metadata endpoint to write classifications back to your assets.
API Endpoint: POST /api/v1/ingestion/metadata
Use a consistent naming convention for metadata keys (e.g., video_narrative_mood, video_narrative_subject) to make querying easier.
The metadata endpoint supports upsert semantics and can batch multiple asset updates in a single request. Persist only final classifications, and scope each update to your narrative metadata keys so you don’t overwrite unrelated metadata.
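A minimal sketch of a batched upsert using the key naming convention above (the payload shape and field names are assumptions; only the endpoint path comes from this guide):

```python
import requests

BASE_URL = "https://api.example.com"  # replace with your deployment's base URL
HEADERS = {"Authorization": "Bearer <YOUR_API_KEY>"}

# Illustrative payload -- upsert one key per classified dimension so each
# dimension stays independently queryable in SQL.
resp = requests.post(
    f"{BASE_URL}/api/v1/ingestion/metadata",
    headers=HEADERS,
    json={
        "dataset_id": "my-dataset",
        "assets": [
            {"asset_id": "vid_123", "metadata": {"video_narrative_mood": "suspenseful"}},
            {"asset_id": "vid_456", "metadata": {"video_narrative_mood": "comedic"}},
        ],
    },
)
resp.raise_for_status()
```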
Best Practices
Dos
- Keep categories mutually distinguishable—the model performs best when categories don’t overlap
- Use clear, descriptive names that reflect visual and narrative content
- Include concrete examples in category definitions
- Start with 5–15 categories per dimension and adjust based on results
Don’ts
- Overly granular emotional states: Use “sad” rather than separate categories for “melancholic,” “sorrowful,” “dejected”
- Synonym-based categories: Avoid having both “exciting” and “thrilling” in the same taxonomy
- Implicit hierarchies: If you need “action > chase scene,” encode that structure explicitly
Querying Narrative Metadata with SQL
Once persisted, narrative metadata can be queried using the Query Engine. For example, to find all videos classified as “suspenseful”:
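A sketch of that query, assuming assets land in a table named `assets` and the mood classification was persisted as a string under the `video_narrative_mood` key (table and column names are assumptions; adapt them to your Query Engine schema):

```sql
-- Find every video tagged with the "suspenseful" mood category.
SELECT asset_id
FROM assets
WHERE video_narrative_mood = 'suspenseful';
```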
We are here to help
If you have questions about setting up Narrative Metadata or designing your taxonomy, please contact us.
