Analyzing Dynamic Tags with SQL | Coactive AI

Understanding the distribution of dynamic tags in a dataset is crucial for metadata generation, content categorization, and trend analysis. This tutorial will guide you through SQL queries that help you:

Analyze the distribution of dynamic tags to determine the most frequent categories.
Retrieve the top tags for each image to identify key labels in your dataset.
Retrieve the top tags for each video to generate meaningful metadata at the video level.

By the end of this tutorial, you’ll be able to extract structured insights from dynamic tag metadata using SQL.

Note: This tutorial uses the V3 Dynamic Tags table schema. If you’re using an older version, table names may differ (e.g., group_[name] instead of dt_[group_name]_visual).

Analyze the Distribution of Dynamic Tags in a Dataset

This query calculates how often each dynamic tag appears across the dataset, helping you understand category distribution.

1 -- Step 1: Rank dynamic tags for each image based on their scores
2 WITH ranked AS (
3     SELECT
4         coactive_image_id,        -- The unique ID for each image
5         tag_name,                 -- The tag assigned to the image
6         score,                    -- The confidence score for the tag
7         ROW_NUMBER() OVER (
8             PARTITION BY coactive_image_id  -- Separate ranking for each image
9             ORDER BY score DESC             -- Rank tags by confidence score in descending order
10         ) AS rn                   -- Assign a rank to each tag per image
11     FROM 
12         dt_sports_visual          -- V3 Dynamic Tags visual table
13 ),
14 
15 -- Step 2: Filter to keep the most relevant tags for each image
16 image_tag_table AS (
17     SELECT
18         coactive_image_id,        -- Image ID
19         tag_name,                 -- Tag name
20         score                     -- Confidence score for the tag
21     FROM 
22         ranked
23     WHERE 
24         rn <= 3                   -- Keep only the top 3 tags per image
25         AND score > 0.01          -- Exclude low-confidence tags (score <= 0.01)
26 )
27 
28 -- Step 3: Aggregate tag distribution across the dataset
29 SELECT 
30     tag_name,                     -- Tag name
31     COUNT(DISTINCT coactive_image_id) AS image_count  -- Count the number of unique images with this tag
32 FROM 
33     image_tag_table
34 GROUP BY 
35     tag_name                      -- Group results by tag name
36 ORDER BY 
37     image_count DESC;            -- Order by the frequency of each tag in descending order

Explanation

CTE (ranked): Assigns a rank to each dynamic tag within every image based on the tag’s score.
CTE (image_tag_table): Filters the ranked results to include:
- The top 3 tags per image (rn <= 3).
- Tags with scores above a configurable threshold (dynamic_tag_score > 0.01)
Final Query: Counts how many unique images contain each dynamic tag.

Use Cases

Identify most common categories in a dataset.
Prioritize frequently occurring tags for search and recommendation systems.

Retrieve the Top Tags for Each Image

This query retrieves the most significant dynamic tags for each image, helping with content categorization.

1 -- Step 1: Join with core table to get video association information
2 WITH video_info AS (
3     SELECT DISTINCT 
4         dt.coactive_image_id,
5         t.coactive_video_id
6     FROM dt_sports_visual dt
7     JOIN coactive_table t ON dt.coactive_image_id = t.coactive_image_id
8     WHERE t.coactive_video_id IS NOT NULL -- Filter for video keyframes
9 ),
10 
11 -- Step 2: Rank tags for each image, excluding video keyframes for standalone image analysis
12 ranked AS (
13     SELECT
14         dt.coactive_image_id,
15         dt.tag_name,
16         dt.score,
17         ROW_NUMBER() OVER (
18             PARTITION BY dt.coactive_image_id -- Rank tags for each image
19             ORDER BY dt.score DESC -- Highest scoring tags first
20         ) AS rn
21     FROM dt_sports_visual dt
22     WHERE dt.coactive_image_id NOT IN (SELECT coactive_image_id FROM video_info) -- Exclude video keyframes
23 ),
24 
25 -- Step 3: Filter to retain the top 5 tags per image
26 image_tag_table AS (
27     SELECT
28         coactive_image_id,
29         tag_name,
30         score
31     FROM ranked
32     WHERE rn <= 5 -- Keep only the top 5 tags per image
33           AND score > 0.001 -- Exclude low-confidence tags
34 )
35 
36 -- Final Output: Retrieve the most significant tags for each image
37 SELECT 
38     coactive_image_id,
39     tag_name,
40     score
41 FROM image_tag_table;

Explanation

CTE (video_only): Ensures the analysis focuses on standalone images by excluding video-related data in the next step, which is not necessary if your dataset only contains images.
CTE (ranked): Assigns a rank to each dynamic tag for every image (coactive_image_id), based on the tag’s score.
Final Query: Filters results to include only:
- The top 5 tags for each image (rn <= 5).
- Tags with a score above 0.001 to exclude irrelevant or low-confidence tags.

Use Cases

Enable image-based metadata tagging for search and filtering.
Improve content organization by identifying dominant tags in images.

Retrieve the Top Tags for Each Video

This query retrieves the most significant dynamic tags for each video, helping structure video metadata.

1 -- Step 1: Get video associations from keyframe data
2 WITH video_keyframes AS (
3     SELECT * 
4         dt.coactive_image_id,
5         dt.tag_name,
6         dt.score,
7         t.coactive_video_id
8     FROM dt_sports_visual dt
9     JOIN coactive_table t ON dt.coactive_image_id = t.coactive_image_id
10     WHERE t.coactive_video_id IS NOT NULL -- Ensure valid video data
11 ),
12 
13 -- Step 2: Rank tags for each keyframe within each video
14 ranked AS (
15     SELECT
16         coactive_video_id,
17         coactive_image_id,
18         tag_name,
19         score,
20         ROW_NUMBER() OVER (
21             PARTITION BY coactive_video_id, coactive_image_id -- Rank tags separately for each keyframe within a video
22             ORDER BY score DESC -- Highest scoring tags first
23         ) AS rn
24     FROM video_keyframes
25 ),
26 
27 -- Step 3: Remove duplicates for tags within each video
28 distinct_tags AS (
29     SELECT
30         coactive_video_id,
31         coactive_image_id,
32         tag_name,
33         score,
34         ROW_NUMBER() OVER (
35             PARTITION BY coactive_video_id, tag_name -- Rank each tag within the same video
36             ORDER BY score DESC -- Keep the highest score for each tag
37         ) AS tag_rank
38     FROM ranked
39     WHERE rn <= 5 -- Keep only the top 5 tags per keyframe
40           AND score > 0.001 -- Exclude low-confidence tags
41 ),
42 
43 -- Step 4: Rank tags across the video to find the most significant ones
44 video_tag_table AS (
45     SELECT
46         coactive_video_id,
47         coactive_image_id,
48         tag_name,
49         score,
50         DENSE_RANK() OVER (
51             PARTITION BY coactive_video_id -- Rank tags globally for the video
52             ORDER BY score DESC
53         ) AS rn_video
54     FROM distinct_tags
55     WHERE tag_rank = 1 -- Include only the best instance of each tag
56 ),
57 
58 -- Step 5: Select the top 5 tags for each video
59 video_top_tag_table AS (
60     SELECT * 
61     FROM video_tag_table
62     WHERE rn_video <= 5 -- Limit to the top 5 tags per video
63     ORDER BY coactive_video_id, score DESC
64 )
65 
66 -- Final Output: Retrieve the most significant tags for each video
67 SELECT * 
68 FROM video_top_tag_table;

Explanation

CTE (video_only): Filters to keep only valid video data.
CTE (ranked): Ranks tags within each video by confidence score.
CTE (distinct_tags): Ensures that each tag appears only once per video, keeping the highest confidence score.
CTE (video_tag_table): Retrieves the top 5 tags per video.

Use Cases

Enhance video metadata for search and categorization.
Provide a structured summary of video content for analysis.