Audio Search
Authentication
Bearer authentication of the form Bearer <token>, where <token> is your auth token.
Executes a semantic search on audio transcripts using a natural language text query (e.g., "climate change" or "economic policy") within a specified dataset (dataset_id) to find audio segments with semantically similar spoken content in their transcripts.

The text query is encoded into an embedding using the dataset's configured audio encoder (e.g., sentence transformers, Qwen, or other text encoders), then searched against audio transcript embeddings in the dataset using vector similarity. This performs semantic matching on what was spoken and transcribed in the audio, not exact phrase matching or matching of audio sounds/effects.

Returns a ranked list of audio segments (from videos) ordered by similarity score, where each result includes the audio segment details, parent video information, transcript text, timestamps, and an associated frame for visual reference. Results can optionally include content moderation scores.
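As a rough illustration, the sketch below assembles such a request in Python. The base URL, endpoint path, and parameter names here are assumptions for illustration only, not the documented API; only the Bearer authentication scheme and the dataset_id/query inputs come from the description above.

```python
import urllib.parse

# Hypothetical base URL -- replace with the real API host.
BASE_URL = "https://api.example.com"

def build_audio_search_request(dataset_id: str, query: str, auth_token: str,
                               include_moderation: bool = False):
    """Assemble URL, headers, and JSON body for a semantic audio search.

    The path and body field names are illustrative assumptions; consult
    the actual endpoint reference for the real shapes.
    """
    url = f"{BASE_URL}/datasets/{urllib.parse.quote(dataset_id)}/audio/search"
    headers = {
        # Bearer authentication as described above: "Bearer <token>"
        "Authorization": f"Bearer {auth_token}",
        "Content-Type": "application/json",
    }
    body = {
        "query": query,                          # natural-language text query
        "include_moderation": include_moderation  # optional moderation scores
    }
    return url, headers, body

url, headers, body = build_audio_search_request("ds_123", "climate change", "my-token")
```

The returned pieces can then be sent with any HTTP client; the response would be the ranked list of audio segments described above.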