media / VideoTagSequencer / 0.2.0

Introduction

This algorithm takes the JSON output from Video Metadata Extraction and converts it into a time based index of tagged sequences for easy semantic search and filtering of video content.

Note: This algorithm is designed to work with output from Video Metadata Extraction; if you edit or alter the JSON file, the algorithm may not function properly.

Another note: This algorithm does not work with the following algorithms, as they return multiple results per frame:

I/O

Input

{
  "source": String,
  "traversal_path": String or {},
  "tag_key": String or [],
  "confidence_key": String,
  "minimum_confidence": Float,
  "minimum_sequence_length": Int,
  "filter": String or [],
  "top_n": Int
}
  • source - (required) - a data connector URL (data://, s3://, dropbox://, etc.) pointing to a JSON point data file generated by Video Metadata Extraction.
  • traversal_path - (required) - the JSON traversal path to the chosen image extraction algorithm's prediction elements; see examples below. The path must contain the '$ROOT' keyword. If no traversal is necessary to reach the prediction array or singleton prediction, just pass '$ROOT'.
  • tag_key - (required) - the JSON key within the prediction element that defines the tag, label, or annotation of the prediction; see examples below. If an array is provided, all tag keys are compiled into a single composite key for sequencing.
  • confidence_key - (required) - the JSON key within the prediction element that defines the confidence or probability of the prediction; see examples below.
  • minimum_confidence - (optional) - the minimum allowed confidence in a sequence; any sequence below this threshold is filtered out. Defaults to 0.5.
  • minimum_sequence_length - (optional) - the minimum allowed sequence length (i.e., number of sequential frames); any sequence shorter than this value is filtered out. Defaults to 10.
  • filter - (optional) - if provided, defines a tag whitelist, removing all other tagged sequences from the output. May be a string, a list of strings, or a list of JSON objects; the list of JSON objects is used when an image extraction prediction contains multiple labels, see examples below. If not provided, no filters are applied and all valid tagged sequences are returned.
  • top_n - (optional) - the maximum number of sequences to return for each tag; increase this if you expect many sequences per tag. Defaults to 5.
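To make the parameters above concrete, here is a minimal Python sketch (not the algorithm's actual implementation) of how per-frame tags might be grouped into sequences and then filtered by minimum_confidence and minimum_sequence_length. The frame tuple layout and function name are assumptions for illustration only:

```python
from itertools import groupby
from statistics import mean, mode

def find_sequences(frames, minimum_confidence=0.5, minimum_sequence_length=10):
    """Group consecutive frames that share a tag into sequences, then drop
    sequences that are too short or too low-confidence.

    `frames` is a time-ordered list of (timestamp, tag, confidence) tuples,
    one per sampled video frame (an assumed layout for this sketch)."""
    sequences = []
    # groupby splits the frame list wherever the tag changes
    for tag, run in groupby(frames, key=lambda f: f[1]):
        run = list(run)
        confidences = [conf for _, _, conf in run]
        if len(run) < minimum_sequence_length:
            continue  # shorter than minimum_sequence_length
        if mean(confidences) < minimum_confidence:
            continue  # below the confidence threshold
        sequences.append({
            "tag": tag,
            "start_time": run[0][0],
            "stop_time": run[-1][0],
            "number_of_frames": len(run),
            "mean": mean(confidences),
            "mode": mode(confidences),
        })
    return sequences
```

A run of 12 "cat" frames at confidence 0.9 would survive the defaults, while a run of 12 "dog" frames at 0.3 would be filtered out by minimum_confidence.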

Output

[
  {
    "tag": Json Object,
    "sequences": [
      {
        "number_of_frames": Int,
        "stop_time": Float,
        "start_time": Float,
        "mode": Float,
        "mean": Float
      }
      ...
      for each sequence
      ...
    ]
  },
  ...
  for each annotation
  ...
]
  • tag - the tag for the following sequences; this is a JSON object, as it may contain multiple key/value pairs when the extraction algorithm has that format, see examples below.
  • sequences - the array of valid sequences for the tag.
  • number_of_frames - the number of frames in the sequence.
  • stop_time - the end of the sequence, in seconds from the start of the video.
  • start_time - the start of the sequence, in seconds from the start of the video.
  • mode - the mode of the sequence's frame confidences.
  • mean - the mean of the sequence's frame confidences.
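As an aside on the mean and mode fields: both summarize the per-frame confidence values inside one sequence, as in this small Python illustration (the confidence values here are invented):

```python
from statistics import mean, mode

# hypothetical per-frame confidences for one detected sequence
confidences = [0.98, 0.91, 0.98, 0.87, 0.98, 0.95]

summary = {
    "number_of_frames": len(confidences),
    # mode: the most frequently occurring confidence in the sequence
    "mode": mode(confidences),
    # mean: the average confidence across all frames in the sequence
    "mean": mean(confidences),
}
```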

Examples

Nudity Detection

The nudity detection model returns a simple tag key named nude and a confidence key named confidence.

https://gist.github.com/zeryx/7edd8f202ddc6b7c1a5d5af216ca7f25

Input

{  
  "source":"data://media/extractions/massage701_nudity.json",
  "tag_key":"nude",
  "confidence_key":"confidence",
  "traversal_path":"$ROOT",
  "minimum_confidence":0.65,
  "minimum_sequence_length":8
}

Output

[  
  {  
    "tag":{  
      "nude": true
    },
    "sequences":[  
      {  
        "start_time":8.9589498,
        "mode":1.0,
        "number_of_frames":15,
        "mean":0.9570593279515379,
        "stop_time":11.746178626666666
      }
    ]
  }
]

Make and Model Car Classifier

The make and model car classifier is a multi-key, single-tag prediction model, which means we need to list every possible tag key in tag_key. However, since no traversal is necessary to reach the prediction iterable, the traversal path is still just $ROOT.
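As a sketch of how a multi-key tag_key might be compiled into one composite tag (the helper name and prediction data below are illustrative, not the algorithm's actual code):

```python
import json

def composite_tag(prediction, tag_keys):
    """Collect several tag keys from one prediction element into a single
    JSON object; sequencing then treats two frames as carrying the same
    tag only when every key/value pair matches."""
    return {key: prediction[key] for key in tag_keys}

# one hypothetical prediction element from the car classifier's per-frame output
prediction = {"body_style": "Sedan", "make": "Audi", "model": "S6",
              "model_year": "2011", "confidence": 0.98}

tag = composite_tag(prediction, ["body_style", "make", "model", "model_year"])
# a canonical string form makes the composite tag usable as a dictionary key
tag_key = json.dumps(tag, sort_keys=True)
```

Note that the confidence key stays out of the composite tag; it feeds the sequence's mean and mode statistics instead.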

https://gist.github.com/zeryx/6f916959eb0e0ba63de596b0be84dfa4

Input

{  
  "source":"data://media/extractions/bus_video_car_detection.json",
  "tag_key":[  
    "body_style",
    "make",
    "model",
    "model_year"
  ],
  "confidence_key":"confidence",
  "traversal_path":"$ROOT",
  "minimum_confidence":0.45,
  "minimum_sequence_length":8
}

Output

[
  {
    "sequences": [
      {
        "mean": 0.8335714285714285,
        "mode": 1,
        "number_of_frames": 13,
        "start_time": 16.725041666666666,
        "stop_time": 17.22554166666667
      }
    ],
    "tag": {
      "body_style": "Hatchback",
      "make": "Smart",
      "model": "Forfour",
      "model_year": "2014"
    }
  },
  {
    "sequences": [
      {
        "mean": 0.9266666666666666,
        "mode": 0.98,
        "number_of_frames": 11,
        "start_time": 3.878875,
        "stop_time": 4.295958333333333
      }
    ],
    "tag": {
      "body_style": "Sedan",
      "make": "Audi",
      "model": "S6",
      "model_year": "2011"
    }
  },
  {
    "sequences": [
      {
        "mean": 0.7682857142857143,
        "mode": 0.7,
        "number_of_frames": 34,
        "start_time": 8.425083333333333,
        "stop_time": 9.801458333333333
      },
      {
        "mean": 0.736,
        "mode": 0.78,
        "number_of_frames": 9,
        "start_time": 7.924583333333333,
        "stop_time": 8.25825
      }
    ],
    "tag": {
      "body_style": "Convertible",
      "make": "Bugatti",
      "model": "Veyron 16.4",
      "model_year": "2009"
    }
  }
]

Places 365 Classifier

The Places 365 classifier model uses a single tag key and confidence key; however, its top-n predictions array requires a traversal to reach, which is why the traversal path is defined as a JSON object with a single key/value pair. If the predictions array required traversing through another JSON object, the path would contain that object as well.
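A minimal sketch of how such a traversal_path might be resolved against one frame's JSON (the function is illustrative; only the $ROOT convention comes from this README):

```python
def resolve_traversal(frame_json, traversal_path):
    """Descend into one frame's JSON following traversal_path until the
    $ROOT keyword marks where the prediction element(s) live."""
    if traversal_path == "$ROOT":
        return frame_json
    # a dict path descends one key per level of nesting
    key, subpath = next(iter(traversal_path.items()))
    return resolve_traversal(frame_json[key], subpath)

# a frame shaped like the Places 365 output below
frame = {"predictions": [{"class": "cliff", "prob": 0.29},
                         {"class": "waterfall", "prob": 0.11}]}

preds = resolve_traversal(frame, {"predictions": "$ROOT"})
```

With "$ROOT" alone, the frame itself is treated as the prediction container, matching the first two examples above.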

https://gist.github.com/zeryx/6d485f30b979548153f0fec1077dc210

Input

{  
  "source":"data://media/extractions/kenny_places365.json",
  "tag_key":"class",
  "confidence_key":"prob",
  "traversal_path":{  
    "predictions":"$ROOT"
  },
  "minimum_confidence":0.25,
  "minimum_sequence_length":5
}

Output

[
  {
    "sequences": [
      {
        "mean": 0.44347607096036273,
        "mode": 0.29036399722099304,
        "number_of_frames": 8,
        "start_time": 8.19706711409396,
        "stop_time": 9.935838926174496
      },
      {
        "mean": 0.383247903415135,
        "mode": 0.267714262008667,
        "number_of_frames": 6,
        "start_time": 16.145738255033557,
        "stop_time": 17.38771812080537
      }
    ],
    "tag": {"class": "cliff"}
  },
  {
    "sequences": [
      {
        "mean": 0.6472491789609194,
        "mode": 0.3923141360282898,
        "number_of_frames": 15,
        "start_time": 3.477543624161074,
        "stop_time": 6.955087248322148
      }
    ],
    "tag": {"class": "waterfall"}
  },
  {
    "sequences": [
      {
        "mean": 0.5122596791812352,
        "mode": 0.39176857471466064,
        "number_of_frames": 6,
        "start_time": 32.29147651006711,
        "stop_time": 33.53345637583893
      }
    ],
    "tag": {"class": "rice_paddy"}
  }
]