Audio Search

Executes a semantic search over audio transcripts using a natural language text query (e.g., 'climate change' or 'economic policy') within a specified dataset (dataset_id), returning audio segments whose transcribed speech is semantically similar to the query.

The text query is encoded into an embedding using the dataset's configured audio encoder (e.g., sentence transformers, Qwen, or other text encoders), then matched against the dataset's audio transcript embeddings using vector similarity. This is semantic matching on what was spoken and transcribed in the audio, not exact phrase matching and not matching of audio sounds or effects.

Returns a ranked list of audio segments (from videos) ordered by similarity score. Each result includes the audio segment details, parent video information, transcript text, timestamps, and an associated frame for visual reference. Results can optionally include content moderation scores.
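The vector-similarity matching described above can be sketched as follows. This is a minimal illustration of ranking by cosine similarity, not the service's actual encoder or index; the embeddings and segment IDs are made up for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_segments(query_embedding, segments):
    """Rank transcript segments by similarity to the query embedding.

    `segments` is a list of (segment_id, embedding) pairs; the real
    service stores one embedding per audio transcript segment and
    returns the segments sorted by descending similarity score.
    """
    scored = [
        (seg_id, cosine_similarity(query_embedding, emb))
        for seg_id, emb in segments
    ]
    return sorted(scored, key=lambda s: s[1], reverse=True)
```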

Authentication

Authorization (Bearer)

Bearer authentication of the form Bearer <token>, where <token> is your auth token.

Request

This endpoint expects an object.
dataset_id (string, Required, format: "uuid")
The unique identifier for the dataset.

text_query (string, Required)
The text query to search for.

offset (integer, Optional, range: 0 to 9223372036854775807, defaults to 0)
Starting index of the results to return.

limit (integer, Optional, range: 1 to 1000, defaults to 100)
Maximum number of items to return.

metadata_filters (object or null, Optional)
Metadata filters to apply to the search.
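A minimal request sketch using only the standard library. The base URL, route (/audio/search), and token below are placeholders, since the source does not specify them; the body field names match the documented request parameters.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # placeholder; substitute your actual base URL
AUTH_TOKEN = "your-auth-token"        # placeholder; substitute your auth token

def build_audio_search_request(dataset_id, text_query, offset=0, limit=100,
                               metadata_filters=None):
    """Build the HTTP request for the audio search endpoint.

    The "/audio/search" path is a placeholder; consult the full API
    reference for the real route. Optional fields use the documented
    defaults (offset=0, limit=100) and metadata_filters is omitted
    when not provided, since it is nullable.
    """
    body = {
        "dataset_id": dataset_id,
        "text_query": text_query,
        "offset": offset,
        "limit": limit,
    }
    if metadata_filters is not None:
        body["metadata_filters"] = metadata_filters
    return urllib.request.Request(
        API_BASE + "/audio/search",  # placeholder path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {AUTH_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send the built request with urllib.request.urlopen(...) and decode the JSON response body.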

Response

Successful Response
data (list of objects)
The paginated results for native video model datasets.
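A sketch of reading the response's data list. The per-item field names (transcript, start_time, end_time, score) are assumptions based on the fields described in this page, not a confirmed schema; adjust the keys to match the actual response objects.

```python
def summarize_results(response):
    """Collect key fields from audio search results.

    `response` is the decoded JSON body; per the documentation, "data"
    is a list of result objects. The item keys used here are
    illustrative placeholders for the documented transcript text,
    timestamps, and similarity score.
    """
    summaries = []
    for item in response.get("data", []):
        summaries.append({
            "transcript": item.get("transcript"),   # assumed key
            "start": item.get("start_time"),        # assumed key
            "end": item.get("end_time"),            # assumed key
            "score": item.get("score"),             # assumed key
        })
    return summaries
```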

Errors

422
Unprocessable Entity Error