Agentic Search

Agentic Search is a conversational way to find media in your Coactive workspace. Instead of writing precise filters, describe what you’re looking for in natural language. The system interprets your intent and searches across your image and video datasets to surface the most relevant results.

What You Can Do

  • Search across your image and video datasets in a single query
  • Search spoken content in videos using transcripts (semantic or exact match)
  • Steer results away from things you don’t want with negative search
  • Use time-based context in your queries (e.g., a specific year)
  • Search by image—upload a reference image to find similar images or keyframes
  • Drill into a video to find the most relevant shots within it

How to Access

When you log in to Coactive, you land on the Agentic Search experience, which queries both your image and video datasets in parallel. You can also use Agentic Search from inside a specific dataset—in that case, it searches only that dataset.

Core Capabilities

Search Across Images and Videos

When you search from the top-level view, your query runs against both an image dataset and a video dataset together. Results come back grouped into separate Images and Videos tabs. The system automatically picks up modality cues in your query—if you say “show me images of…” it will favor images; if you don’t specify, it searches both.

Example queries:

  • show me videos of a press conference at the white house
  • protesters marching in Paris
  • show me videos of trump meeting other world leaders
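
For intuition only, here is a minimal sketch of the pattern described above: the same query is fanned out to an image index and a video index in parallel, and the results come back grouped by tab. The dataset contents, helper names, and word-overlap scoring are hypothetical stand-ins for Coactive's real ranking models, not its API.

```python
# Illustrative only: one query fanned out to an image index and a video index in
# parallel, with results grouped into Images and Videos tabs.
from concurrent.futures import ThreadPoolExecutor

IMAGE_DATASET = [
    "press conference at the white house",
    "protesters marching through central paris",
]
VIDEO_DATASET = [
    "press briefing in the white house press room",
    "world leaders greeting each other at a summit",
]

def relevance(query: str, caption: str) -> int:
    # Toy stand-in for a learned ranking model: count of shared words.
    return len(set(query.lower().split()) & set(caption.lower().split()))

def search_images(query: str) -> list[str]:
    return sorted(IMAGE_DATASET, key=lambda c: relevance(query, c), reverse=True)

def search_videos(query: str) -> list[str]:
    return sorted(VIDEO_DATASET, key=lambda c: relevance(query, c), reverse=True)

def agentic_search(query: str) -> dict[str, list[str]]:
    # Fan the same query out to both modalities in parallel, then group by tab.
    with ThreadPoolExecutor() as pool:
        images = pool.submit(search_images, query)
        videos = pool.submit(search_videos, query)
        return {"Images": images.result(), "Videos": videos.result()}

print(agentic_search("show me videos of a press conference at the white house"))
```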

Transcript Search

You can search the spoken content of your videos in two ways:

  • Semantic transcript search—finds moments where someone is talking about a concept, even if they don’t use the exact words.
  • Exact transcript match—finds the precise word or phrase as spoken.

Example queries:

  • videos where someone talks about interest rate hikes (semantic)
  • videos where someone says “breaking news” (exact match)
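
The difference between the two modes can be sketched roughly as below. This is a conceptual illustration only, not how Coactive indexes transcripts: the bag-of-words "embedding" is a toy stand-in for a real semantic model, and the transcripts and function names are invented.

```python
# Illustrative only: semantic transcript search vs. exact transcript match.
from collections import Counter
from math import sqrt

TRANSCRIPTS = {
    "clip_1": "analysts expect another interest rate hike before the summer",
    "clip_2": "breaking news tonight from the state capital",
}

def embed(text: str) -> Counter:
    # Stand-in for a sentence embedding: a simple bag of words.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_search(query: str) -> list[str]:
    # Rank clips by similarity of meaning; the exact words need not appear.
    q = embed(query)
    return sorted(TRANSCRIPTS, key=lambda c: cosine(q, embed(TRANSCRIPTS[c])), reverse=True)

def exact_match(phrase: str) -> list[str]:
    # Return only clips where the phrase occurs verbatim in the transcript.
    return [c for c, t in TRANSCRIPTS.items() if phrase.lower() in t.lower()]

print(semantic_search("someone talks about interest rate hikes"))  # clip_1 ranks first
print(exact_match("breaking news"))                                # ['clip_2']
```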

Negative Search

You can tell the system to steer results away from something while keeping your main concept.

Example queries:

  • show me videos of crowds but not sports crowds
  • show me videos of aerial shots of cities, not at night

How negative search works

Negative search is a steer, not a hard filter—occasionally an excluded item may still appear in the results. It works best when the thing you’re excluding is visually distinct (a color, an object, a clear scene type).
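
A rough way to picture the steer is as a score penalty: relevance to the negative phrase lowers an item's rank instead of removing the item outright, which is why a strong positive match can still slip through. The sketch below, including its captions and weighting, is an illustrative assumption and not Coactive's ranking function.

```python
# Illustrative only: negative search as a soft penalty rather than a hard filter.
SHOTS = [
    "aerial shot of a city skyline at noon",
    "aerial shot of a city at night",
]

def overlap(query: str, caption: str) -> int:
    # Toy relevance signal: number of shared words (stand-in for embedding similarity).
    return len(set(query.lower().split()) & set(caption.lower().split()))

def steered_search(positive: str, negative: str, weight: float = 0.5) -> list[str]:
    # Subtract a weighted penalty for matching the negative phrase; items are
    # re-ranked, not removed, so an excluded concept can occasionally still surface.
    def score(caption: str) -> float:
        return overlap(positive, caption) - weight * overlap(negative, caption)
    return sorted(SHOTS, key=score, reverse=True)

print(steered_search("aerial shots of cities", "at night"))
# The daytime shot ranks first, but the night shot still appears lower down.
```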

Metadata-Aware Search

Agentic Search can use metadata implicitly, without requiring you to write filters manually.

Example queries:

  • videos of US presidential candidates giving speeches at rallies in 2024
  • images of wildfires in 2025

Metadata-aware ranking

Metadata-aware ranking works best with structured metadata fields (e.g., a date/time field such as version_created). Extracting a specific concept from a paragraph-length metadata field (e.g., identifying “flood in Spain” in a long caption) is less reliable and can add latency, because that information is written in highly variable ways. Metadata-aware queries in general may have higher latency.
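
As a rough illustration of why structured fields are easier to use than long captions, the sketch below boosts assets whose version_created year matches a year mentioned in the query. Only the field name comes from the note above; the data, scoring, and boost size are made-up assumptions, not Coactive's implementation.

```python
# Illustrative only: combine toy text relevance with a boost from a structured
# metadata field (a date), which is far easier to check reliably than a concept
# buried somewhere in a paragraph-length caption.
from datetime import date

ASSETS = [
    {"caption": "wildfire smoke drifting over a valley", "version_created": date(2021, 8, 14)},
    {"caption": "wildfire burning beside a highway", "version_created": date(2025, 7, 2)},
]

def score(query: str, asset: dict, year: int | None = None) -> float:
    # Toy text relevance: shared-word count (stand-in for semantic similarity).
    text = len(set(query.lower().split()) & set(asset["caption"].lower().split()))
    # Structured boost: trivially checked against the version_created field.
    boost = 1.0 if year is not None and asset["version_created"].year == year else 0.0
    return text + boost

query = "images of wildfires in 2025"
ranked = sorted(ASSETS, key=lambda a: score(query, a, year=2025), reverse=True)
print([a["caption"] for a in ranked])  # the 2025 asset ranks first
```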

Search by Image

Upload an image, and the system finds similar images and video keyframes in your datasets. This is useful when you have a reference shot and want to find more like it.

Note

Combining an uploaded image with a text query in the same search isn’t supported.
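
Conceptually, searching by image is a nearest-neighbor lookup: the uploaded reference is embedded and compared against the embeddings of stored images and video keyframes. The sketch below uses tiny hand-written vectors as stand-ins for real image embeddings; the file names and numbers are invented and nothing here reflects Coactive's actual models or API.

```python
# Illustrative only: search-by-image as nearest-neighbor lookup in embedding space.
from math import sqrt

# Hand-written 3-d vectors standing in for real image/keyframe embeddings.
LIBRARY = {
    "keyframe_stadium.jpg": (0.9, 0.1, 0.0),
    "image_beach.jpg": (0.1, 0.8, 0.1),
    "keyframe_concert.jpg": (0.7, 0.2, 0.1),
}

def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def search_by_image(reference_embedding, top_k: int = 2) -> list[str]:
    # Rank stored items by similarity to the uploaded reference image.
    ranked = sorted(LIBRARY, key=lambda k: cosine(reference_embedding, LIBRARY[k]), reverse=True)
    return ranked[:top_k]

print(search_by_image((0.8, 0.15, 0.05)))  # the stadium and concert keyframes rank highest
```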

Video Drill-Down

After a video search returns results, you can ask follow-ups to dig into a specific video and find the most relevant shots inside it, or find similar moments in a different video.

Example queries:

  • show me the most relevant shots in the top result
  • find similar moments in other videos

The system keeps your earlier search context when drilling in.
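
One way to picture drill-down: the earlier query is kept and re-used to rank the shots inside a single selected video. The shot boundaries, captions, and scoring below are invented for illustration and are not how Coactive segments or ranks video.

```python
# Illustrative only: drill into one video by ranking its shots against the query
# carried over from the earlier search.
TOP_RESULT_SHOTS = {
    "00:00-00:12": "anchor introduces the segment in the studio",
    "00:12-00:47": "press conference at the white house podium",
    "00:47-01:05": "exterior shot of the white house lawn",
}

def overlap(query: str, caption: str) -> int:
    # Toy stand-in for the visual/semantic similarity a real system would compute.
    return len(set(query.lower().split()) & set(caption.lower().split()))

def most_relevant_shots(query: str, shots: dict, top_k: int = 2) -> list[str]:
    # Keep the earlier search context (the same query) and re-rank within one video.
    return sorted(shots, key=lambda ts: overlap(query, shots[ts]), reverse=True)[:top_k]

print(most_relevant_shots("press conference at the white house", TOP_RESULT_SHOTS))
# ['00:12-00:47', '00:47-01:05']: the podium and exterior shots rank highest
```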

Tips for Best Results

  • Be specific about modality if you have a preference—“show me videos of…” or “show me images of…”
  • Use visually descriptive language—what would you see on screen?
  • Include time references when relevant—“in 2025”, “from last December”
  • Iterate—if results aren’t what you expected, refine and follow up
  • Keep queries in English—Agentic Search is English-only today

Limitations and Performance Notes

  • English only for now
  • A single dataset can hold either images or videos, not both (cross-dataset search still works)
  • Negative search is a soft steer, not a hard exclusion
  • Metadata-aware ranking uses content timestamps; concepts embedded in long-form caption/headline text aren’t reliably extracted yet
  • Image upload and text query can’t be combined in a single search
  • For the fastest results, start a new chat when switching to a new search topic