Analyzing Dynamic Tags with SQL
Understanding the distribution of dynamic tags in a dataset is crucial for metadata generation, content categorization, and trend analysis. This tutorial will guide you through SQL queries that help you:
- Analyze the distribution of dynamic tags to determine the most frequent categories.
- Retrieve the top tags for each image to identify key labels in your dataset.
- Retrieve the top tags for each video to generate meaningful metadata at the video level.
By the end of this tutorial, you’ll be able to extract structured insights from dynamic tag metadata using SQL.
Analyze the Distribution of Dynamic Tags in a Dataset
This query calculates how often each dynamic tag appears across the dataset, helping you understand category distribution.
Explanation
-
CTE (ranked): Assigns a rank to each dynamic tag within every image based on the tag’s score.
-
CTE (image_tag_table): Filters the ranked results to include:
- The top 3 tags per image (
rn <= 3
). - Tags with scores above a configurable threshold (dynamic_tag_score > 0.01)
- The top 3 tags per image (
-
Final Query: Counts how many unique images contain each dynamic tag.
Use Cases
- Identify most common categories in a dataset.
- Prioritize frequently occurring tags for search and recommendation systems.
Retrieve the Top Tags for Each Image
This query retrieves the most significant dynamic tags for each image, helping with content categorization.
Explanation
-
CTE (video_only): Ensures the analysis focuses on standalone images by excluding video-related data in the next step, which is not necessary if your dataset only contains images.
-
CTE (ranked): Assigns a rank to each dynamic tag for every image (coactive_image_id), based on the tag’s score.
-
Final Query: Filters results to include only:
- The top 5 tags for each image (
rn <= 5
). - Tags with a score above 0.001 to exclude irrelevant or low-confidence tags.
- The top 5 tags for each image (
Use Cases
- Enable image-based metadata tagging for search and filtering.
- Improve content organization by identifying dominant tags in images.
Retrieve the Top Tags for Each Video
This query retrieves the most significant dynamic tags for each video, helping structure video metadata.
Explanation
-
CTE (video_only): Filters to keep only valid video data.
-
CTE (ranked): Ranks tags within each video by confidence score.
-
CTE (distinct_tags): Ensures that each tag appears only once per video, keeping the highest confidence score.
-
CTE (video_tag_table): Retrieves the top 5 tags per video.
Use Cases
- Enhance video metadata for search and categorization.
- Provide a structured summary of video content for analysis.