Analyzing Dynamic Tags with SQL

Understanding the distribution of dynamic tags in a dataset is crucial for metadata generation, content categorization, and trend analysis. This tutorial will guide you through SQL queries that help you:

  • Analyze the distribution of dynamic tags to determine the most frequent categories.
  • Retrieve the top tags for each image to identify key labels in your dataset.
  • Retrieve the top tags for each video to generate meaningful metadata at the video level.

By the end of this tutorial, you’ll be able to extract structured insights from dynamic tag metadata using SQL.

Analyze the Distribution of Dynamic Tags in a Dataset

This query calculates how often each dynamic tag appears across the dataset, helping you understand category distribution.

1-- Step 1: Rank dynamic tags for each image based on their scores
2WITH ranked AS (
3 SELECT
4 coactive_video_id, -- The ID of the video the image belongs to (if applicable)
5 coactive_image_id, -- The unique ID for each image
6 dynamic_tag, -- The tag assigned to the image
7 dynamic_tag_score, -- The confidence score for the tag
8 ROW_NUMBER() OVER (
9 PARTITION BY coactive_image_id -- Separate ranking for each image
10 ORDER BY dynamic_tag_score DESC -- Rank tags by confidence score in descending order
11 ) AS rn -- Assign a rank to each tag per image
12 FROM
13 category_sports_visual -- Source table containing image-level dynamic tags
14),
15
16-- Step 2: Filter to keep the most relevant tags for each image
17image_tag_table AS (
18 SELECT
19 coactive_image_id, -- Image ID
20 coactive_video_id, -- Video ID (if applicable)
21 dynamic_tag, -- Tag name
22 dynamic_tag_score -- Confidence score for the tag
23 FROM
24 ranked
25 WHERE
26 rn <= 3 -- Keep only the top 3 tags per image
27 AND dynamic_tag_score > 0.01 -- Exclude low-confidence tags (score <= 0.01)
28)
29
30-- Step 3: Aggregate tag distribution across the dataset
31SELECT
32 dynamic_tag, -- Tag name
33 COUNT(DISTINCT coactive_image_id) AS image_count -- Count the number of unique images with this tag
34FROM
35 image_tag_table
36GROUP BY
37 dynamic_tag -- Group results by tag name
38ORDER BY
39 image_count DESC; -- Order by the frequency of each tag in descending order

Explanation

  1. CTE (ranked): Assigns a rank to each dynamic tag within every image based on the tag’s score.

  2. CTE (image_tag_table): Filters the ranked results to include:

    • The top 3 tags per image (rn <= 3).
    • Tags with scores above a configurable threshold (dynamic_tag_score > 0.01)
  3. Final Query: Counts how many unique images contain each dynamic tag.

Use Cases

  • Identify most common categories in a dataset.
  • Prioritize frequently occurring tags for search and recommendation systems.

Retrieve the Top Tags for Each Image

This query retrieves the most significant dynamic tags for each image, helping with content categorization.

1-- Step 1: Identify all images from datasets with both videos and images
2WITH video_only AS (
3 SELECT DISTINCT coactive_image_id
4 FROM category_sports_visual
5 WHERE coactive_shot_id IS NOT NULL AND coactive_shot_id != '' -- Filter for valid video data
6),
7
8-- Step 2: Rank tags for each image, excluding those associated with videos
9ranked AS (
10 SELECT
11 tags.coactive_image_id,
12 dynamic_tag,
13 dynamic_tag_score,
14 ROW_NUMBER() OVER (
15 PARTITION BY coactive_image_id -- Rank tags for each image
16 ORDER BY dynamic_tag_score DESC -- Highest scoring tags first
17 ) AS rn
18 FROM category_sports_visual AS tags
19 WHERE tags.coactive_image_id
20NOT IN (SELECT coactive_image_id FROM video_only) -- Exclude video-related images
21),
22
23-- Step 3: Filter to retain the top 5 tags per image
24image_tag_table AS (
25 SELECT
26 coactive_image_id,
27 dynamic_tag,
28 dynamic_tag_score
29 FROM ranked
30 WHERE rn <= 5 -- Keep only the top 5 tags per image
31 AND dynamic_tag_score > 0.001 -- Exclude low-confidence tags
32)
33
34-- Final Output: Retrieve the most significant tags for each image
35SELECT
36 coactive_image_id,
37 dynamic_tag,
38 dynamic_tag_score
39FROM image_tag_table;

Explanation

  1. CTE (video_only): Ensures the analysis focuses on standalone images by excluding video-related data in the next step, which is not necessary if your dataset only contains images.

  2. CTE (ranked): Assigns a rank to each dynamic tag for every image (coactive_image_id), based on the tag’s score.

  3. Final Query: Filters results to include only:

    • The top 5 tags for each image (rn <= 5).
    • Tags with a score above 0.001 to exclude irrelevant or low-confidence tags.

Use Cases

  • Enable image-based metadata tagging for search and filtering.
  • Improve content organization by identifying dominant tags in images.

Retrieve the Top Tags for Each Video

This query retrieves the most significant dynamic tags for each video, helping structure video metadata.

1-- Step 1: Filter for video-specific data only
2WITH video_only AS (
3 SELECT *
4 FROM category_sports_visual
5 WHERE coactive_shot_id IS NOT NULL AND coactive_shot_id != '' -- Ensure valid video data
6),
7
8-- Step 2: Rank tags for each video and image by score
9ranked AS (
10 SELECT
11 coactive_video_id,
12 coactive_image_id,
13 dynamic_tag,
14 dynamic_tag_score,
15 ROW_NUMBER() OVER (
16 PARTITION BY coactive_video_id, coactive_image_id -- Rank tags separately for each image within a video
17 ORDER BY dynamic_tag_score DESC -- Highest scoring tags first
18 ) AS rn
19 FROM video_only
20),
21
22-- Step 3: Remove duplicates for tags within each video
23distinct_tags AS (
24 SELECT
25 coactive_video_id,
26 coactive_image_id,
27 dynamic_tag,
28 dynamic_tag_score,
29 ROW_NUMBER() OVER (
30 PARTITION BY coactive_video_id, dynamic_tag -- Rank each tag within the same video
31 ORDER BY dynamic_tag_score DESC -- Keep the highest score for each tag
32 ) AS tag_rank
33 FROM ranked
34 WHERE rn <= 5 -- Keep only the top 5 tags per image
35 AND dynamic_tag_score > 0.001 -- Exclude low-confidence tags
36),
37
38-- Step 4: Rank tags across the video to find the most significant ones
39video_tag_table AS (
40 SELECT
41 coactive_video_id,
42 coactive_image_id,
43 dynamic_tag,
44 dynamic_tag_score,
45 DENSE_RANK() OVER (
46 PARTITION BY coactive_video_id -- Rank tags globally for the video
47 ORDER BY dynamic_tag_score DESC
48 ) AS rn_video
49 FROM distinct_tags
50 WHERE tag_rank = 1 -- Include only the best instance of each tag
51),
52
53-- Step 5: Select the top 5 tags for each video
54video_top_tag_table AS (
55 SELECT *
56 FROM video_tag_table
57 WHERE rn_video <= 5 -- Limit to the top 5 tags per video
58 ORDER BY coactive_video_id, dynamic_tag_score DESC
59)
60
61-- Final Output: Retrieve the most significant tags for each video
62SELECT *
63FROM video_top_tag_table;

Explanation

  1. CTE (video_only): Filters to keep only valid video data.

  2. CTE (ranked): Ranks tags within each video by confidence score.

  3. CTE (distinct_tags): Ensures that each tag appears only once per video, keeping the highest confidence score.

  4. CTE (video_tag_table): Retrieves the top 5 tags per video.

Use Cases

  • Enhance video metadata for search and categorization.
  • Provide a structured summary of video content for analysis.