Agentic Search

Agentic Search is a conversational way to find media in your Coactive workspace. Instead of writing precise filters, describe what you’re looking for in natural language. The system interprets your intent and searches across your image and video datasets to surface the most relevant results.

What You Can Do

  • Search across your image and video datasets in a single query
  • Search spoken content in videos using transcripts (semantic or exact match)
  • Steer results away from things you don’t want with negative search
  • Use time-based context in your queries (e.g., a specific year)
  • Search by image—upload a reference image to find similar images or keyframes
  • Drill into a video to find the most relevant shots within it

How to Access

When you log in to Coactive, you land on the Agentic Search experience, which queries both your image and video datasets in parallel. You can also use Agentic Search from inside a specific dataset—in that case, it searches only that dataset.

Core Capabilities

Search Across Images and Videos

When you search from the top-level view, your query runs against both an image dataset and a video dataset together. Results come back grouped into separate Images and Videos tabs. The system automatically picks up modality cues in your query—if you say “show me images of…” it will favor images; if you don’t specify, it searches both.

Example queries:

  • show me videos of a press conference at the white house
  • protesters marching in Paris
  • show me videos of trump meeting other world leaders
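
For intuition only, here is a minimal sketch of the pattern described above: the same query is fanned out to an image index and a video index in parallel, and the results come back grouped by tab. The dataset contents, helper names, and word-overlap scoring are hypothetical stand-ins for Coactive's real ranking models, not its API.

```python
# Illustrative only: one query fanned out to an image index and a video index in
# parallel, with results grouped into Images and Videos tabs.
from concurrent.futures import ThreadPoolExecutor

IMAGE_DATASET = [
    "press conference at the white house",
    "protesters marching through central paris",
]
VIDEO_DATASET = [
    "press briefing in the white house press room",
    "world leaders greeting each other at a summit",
]

def relevance(query: str, caption: str) -> int:
    # Toy stand-in for a learned ranking model: count of shared words.
    return len(set(query.lower().split()) & set(caption.lower().split()))

def search_images(query: str) -> list[str]:
    return sorted(IMAGE_DATASET, key=lambda c: relevance(query, c), reverse=True)

def search_videos(query: str) -> list[str]:
    return sorted(VIDEO_DATASET, key=lambda c: relevance(query, c), reverse=True)

def agentic_search(query: str) -> dict[str, list[str]]:
    # Fan the same query out to both modalities in parallel, then group by tab.
    with ThreadPoolExecutor() as pool:
        images = pool.submit(search_images, query)
        videos = pool.submit(search_videos, query)
        return {"Images": images.result(), "Videos": videos.result()}

print(agentic_search("show me videos of a press conference at the white house"))
```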

Transcript Search

You can search the spoken content of your videos in two ways:

  • Semantic transcript search—finds moments where someone is talking about a concept, even if they don’t use the exact words.
  • Exact transcript match—finds the precise word or phrase as spoken.

Example queries:

  • videos where someone talks about interest rate hikes (semantic)
  • videos where someone says “breaking news” (exact match)
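
The difference between the two modes can be sketched roughly as below. This is a conceptual illustration only, not how Coactive indexes transcripts: the bag-of-words "embedding" is a toy stand-in for a real semantic model, and the transcripts and function names are invented.

```python
# Illustrative only: semantic transcript search vs. exact transcript match.
from collections import Counter
from math import sqrt

TRANSCRIPTS = {
    "clip_1": "analysts expect another interest rate hike before the summer",
    "clip_2": "breaking news tonight from the state capital",
}

def embed(text: str) -> Counter:
    # Stand-in for a sentence embedding: a simple bag of words.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_search(query: str) -> list[str]:
    # Rank clips by similarity of meaning; the exact words need not appear.
    q = embed(query)
    return sorted(TRANSCRIPTS, key=lambda c: cosine(q, embed(TRANSCRIPTS[c])), reverse=True)

def exact_match(phrase: str) -> list[str]:
    # Return only clips where the phrase occurs verbatim in the transcript.
    return [c for c, t in TRANSCRIPTS.items() if phrase.lower() in t.lower()]

print(semantic_search("someone talks about interest rate hikes"))  # clip_1 ranks first
print(exact_match("breaking news"))                                # ['clip_2']
```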

Negative Search

You can tell the system to steer results away from something while keeping your main concept.

Example queries:

  • show me videos of crowds but not sports crowds
  • show me videos of aerial shots of cities, not at night

How negative search works

Negative search is a steer, not a hard filter—occasionally an excluded item may still appear in the results. It works best when the thing you’re excluding is visually distinct (a color, an object, a clear scene type).
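
A rough way to picture the steer is as a score penalty: relevance to the negative phrase lowers an item's rank instead of removing the item outright, which is why a strong positive match can still slip through. The sketch below, including its captions and weighting, is an illustrative assumption and not Coactive's ranking function.

```python
# Illustrative only: negative search as a soft penalty rather than a hard filter.
SHOTS = [
    "aerial shot of a city skyline at noon",
    "aerial shot of a city at night",
]

def overlap(query: str, caption: str) -> int:
    # Toy relevance signal: number of shared words (stand-in for embedding similarity).
    return len(set(query.lower().split()) & set(caption.lower().split()))

def steered_search(positive: str, negative: str, weight: float = 0.5) -> list[str]:
    # Subtract a weighted penalty for matching the negative phrase; items are
    # re-ranked, not removed, so an excluded concept can occasionally still surface.
    def score(caption: str) -> float:
        return overlap(positive, caption) - weight * overlap(negative, caption)
    return sorted(SHOTS, key=score, reverse=True)

print(steered_search("aerial shots of cities", "at night"))
# The daytime shot ranks first, but the night shot still appears lower down.
```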

Metadata-Aware Search

Agentic Search can use metadata implicitly, without requiring you to write filters manually.

Example queries:

  • videos of US presidential candidates giving speeches at rallies in 2024
  • images of wildfires in 2025

Metadata-aware ranking

Metadata-aware ranking works best with structured metadata fields (e.g., a date/time field such as version_created). Extracting a specific concept from a paragraph-length metadata field (e.g., identifying “flood in Spain” in a long caption) is less reliable and can add latency, because that information is written in highly variable ways. Metadata-aware queries in general may have higher latency.
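
As a rough illustration of why structured fields are easier to use than long captions, the sketch below boosts assets whose version_created year matches a year mentioned in the query. Only the field name comes from the note above; the data, scoring, and boost size are made-up assumptions, not Coactive's implementation.

```python
# Illustrative only: combine toy text relevance with a boost from a structured
# metadata field (a date), which is far easier to check reliably than a concept
# buried somewhere in a paragraph-length caption.
from datetime import date

ASSETS = [
    {"caption": "wildfire smoke drifting over a valley", "version_created": date(2021, 8, 14)},
    {"caption": "wildfire burning beside a highway", "version_created": date(2025, 7, 2)},
]

def score(query: str, asset: dict, year: int | None = None) -> float:
    # Toy text relevance: shared-word count (stand-in for semantic similarity).
    text = len(set(query.lower().split()) & set(asset["caption"].lower().split()))
    # Structured boost: trivially checked against the version_created field.
    boost = 1.0 if year is not None and asset["version_created"].year == year else 0.0
    return text + boost

query = "images of wildfires in 2025"
ranked = sorted(ASSETS, key=lambda a: score(query, a, year=2025), reverse=True)
print([a["caption"] for a in ranked])  # the 2025 asset ranks first
```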

Search by Image

Upload an image, and the system finds similar images and video keyframes in your datasets. This is useful when you have a reference shot and want to find more like it.

Note

Combining an uploaded image with a text query in the same search isn’t supported.
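
Conceptually, searching by image is a nearest-neighbor lookup: the uploaded reference is embedded and compared against the embeddings of stored images and video keyframes. The sketch below uses tiny hand-written vectors as stand-ins for real image embeddings; the file names and numbers are invented and nothing here reflects Coactive's actual models or API.

```python
# Illustrative only: search-by-image as nearest-neighbor lookup in embedding space.
from math import sqrt

# Hand-written 3-d vectors standing in for real image/keyframe embeddings.
LIBRARY = {
    "keyframe_stadium.jpg": (0.9, 0.1, 0.0),
    "image_beach.jpg": (0.1, 0.8, 0.1),
    "keyframe_concert.jpg": (0.7, 0.2, 0.1),
}

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def search_by_image(reference_embedding, top_k: int = 2) -> list[str]:
    # Rank stored items by similarity to the uploaded reference image.
    ranked = sorted(LIBRARY, key=lambda k: cosine(reference_embedding, LIBRARY[k]), reverse=True)
    return ranked[:top_k]

print(search_by_image((0.8, 0.15, 0.05)))  # the stadium and concert keyframes rank highest
```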

Video Drill-Down

After a video search returns results, you can ask follow-ups to dig into a specific video and find the most relevant shots inside it, or find similar moments in a different video.

Example queries:

  • show me the most relevant shots in the top result
  • find similar moments in other videos

The system keeps your earlier search context when drilling in.
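
One way to picture drill-down: the earlier query is kept and re-used to rank the shots inside a single selected video. The shot boundaries, captions, and scoring below are invented for illustration and are not how Coactive segments or ranks video.

```python
# Illustrative only: drill into one video by ranking its shots against the query
# carried over from the earlier search.
TOP_RESULT_SHOTS = {
    "00:00-00:12": "anchor introduces the segment in the studio",
    "00:12-00:47": "press conference at the white house podium",
    "00:47-01:05": "exterior shot of the white house lawn",
}

def overlap(query: str, caption: str) -> int:
    # Toy stand-in for the visual/semantic similarity a real system would compute.
    return len(set(query.lower().split()) & set(caption.lower().split()))

def most_relevant_shots(query: str, shots: dict, top_k: int = 2) -> list[str]:
    # Keep the earlier search context (the same query) and re-rank within one video.
    return sorted(shots, key=lambda ts: overlap(query, shots[ts]), reverse=True)[:top_k]

print(most_relevant_shots("press conference at the white house", TOP_RESULT_SHOTS))
# ['00:12-00:47', '00:47-01:05']: the podium and exterior shots rank highest
```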

Tips for Best Results

  • Be specific about modality if you have a preference—“show me videos of…” or “show me images of…”
  • Use visually descriptive language—what would you see on screen?
  • Include time references when relevant—“in 2025”, “from last December”
  • Iterate—if results aren’t what you expected, refine and follow up
  • Keep queries in English—Agentic Search is English-only today

Limitations and Performance Notes

  • English only for now
  • A single dataset can hold either images or videos, not both (cross-dataset search still works)
  • Negative search is a soft steer, not a hard exclusion
  • Metadata-aware ranking uses content timestamps; concepts embedded in long-form caption/headline text aren’t reliably extracted yet
  • Image upload and text query can’t be combined in a single search
  • For the fastest results, start a new chat when switching to a new search topic