Executes a combined semantic and person-filtered search: it takes a natural-language text query (e.g., 'Jane Doe at a press conference' or 'John Smith speaking at a podium') and searches within a dataset for images or video keyframes that both match the text query AND contain a specific enrolled person. The person must be enrolled in the organization to be detected in images and video frames. You must specify the person via either person_ids (unique identifiers) or person_names_or_aliases (names/aliases of enrolled persons); exactly one is required, not both.

The text query is encoded using the dataset's vision-language encoder (CLIP, SigLIP, Perception Encoder, etc.) and searched via vector similarity, filtered to assets where the specified person appears. Works with both legacy and native video model datasets. Use asset_type='image' for still images or 'keyframe' for video frames.

Returns a list of matching image/keyframe results (each with asset ID, dataset ID, asset type, and, for keyframes, video ID, composite slice ID, and composite type), plus a list of video IDs aggregated from matching keyframes. Optional moderation scores are included per asset if requested.
Authentication
Authorization (Bearer)
Bearer authentication of the form Bearer <token>, where <token> is your auth token.
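A minimal sketch of building the headers this endpoint expects. The token value is a placeholder for your actual auth token, and the `auth_headers` helper name is ours, not part of the API:

```python
def auth_headers(token: str) -> dict:
    """Build request headers with Bearer authentication for this endpoint."""
    return {
        # The API expects the literal prefix "Bearer " before the token.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

headers = auth_headers("YOUR_AUTH_TOKEN")
```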
Request
This endpoint expects an object.
dataset_id (string, required, format: uuid)
The unique identifier for the dataset
text_query (string, required)
The text query to search for
asset_type (enum, required)
Visual asset type to search over. For datasets using the native video data model, only 'keyframe' is supported.
Allowed values: image, keyframe
offset (integer, optional, range: 0 to 9223372036854775807, defaults to 0)
Starting index to return
limit (integer, optional, range: 1 to 1000, defaults to 100)
Max number of items to return
metadata_filters (object or null, optional)
List of metadata filters to apply to the search
person_ids (list of strings or null, optional)
List of person IDs to filter results by. Currently only a single person_id is supported.
person_names_or_aliases (list of strings or null, optional)
List of person names/aliases to filter results by. Currently only a single person name is supported.
Response
Successful Response
data (list of objects)
The search results matching the person filter
videos (list of objects)
Search results aggregated by video, ordered by best matching keyframe score
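A sketch of consuming the response described above. The snake_case field names (`asset_id`, `asset_type`, `video_id`) are assumptions inferred from the documented result contents, not confirmed by this reference:

```python
# Hypothetical response shape for a keyframe search; field names are assumed.
response = {
    "data": [
        {"asset_id": "a1", "dataset_id": "d1", "asset_type": "keyframe", "video_id": "v1"},
        {"asset_id": "a2", "dataset_id": "d1", "asset_type": "keyframe", "video_id": "v2"},
    ],
    # Videos are aggregated from matching keyframes, ordered by best keyframe score.
    "videos": [{"video_id": "v1"}, {"video_id": "v2"}],
}

# Collect the video IDs referenced by matching keyframe results.
keyframe_video_ids = [
    r["video_id"] for r in response["data"] if r["asset_type"] == "keyframe"
]
```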