deeplearning / ObjectDetectionCOCO / 0.3.0
Object Detection - COCO
This algorithm detects not only what is in an image, but also where it is: it locates each object within the image and generates a bounding box annotation. It uses pre-trained models from Google's TensorFlow object detection project, all of which were trained on the Common Objects in Context (COCO) dataset.
A list of all available labels is below.
I/O
Input
{
"image": String,
"output": String,
"max_boxes": Integer,
"min_score": Float,
"model": String
}
- image - (required) - a hosted image file; may be a web URL (http, https) or a data connector URI (data://, s3://, etc.).
- output - (optional) - the output data connector URI (data://, s3://, etc.) for the resulting annotated image. If output is not provided, only the bounding box data is returned.
- max_boxes - (optional) - the maximum number of bounding boxes to return in the results. If max_boxes is not defined, it defaults to 20.
- min_score - (optional) - the minimum score threshold for bounding box annotations; if a prediction's confidence is less than this minimum, it is not returned in the results. If min_score is not defined, it defaults to 0.5.
- model - (optional) - the pre-trained object detection model to use; may be any of the following:
Model name | compute time per image | COCO accuracy (mAP) |
---|---|---|
ssd_mobilenet_v1 | 4.78 | 21 |
ssd_inception_v2 | 8.75 | 24 |
rfcn_resnet101 | 10.25 | 30 |
faster_rcnn_resnet101 | 11.05 | 32 |
faster_rcnn_inception_resnet_v2_atrous | 16.78 | 37 |
When model is not defined, it defaults to ssd_mobilenet_v1.
Alternatively, you can pass an image URL directly to the algorithm as a plain string. A sample call using these parameters is shown below.
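The following is a minimal sketch of a call from the Algorithmia Python client; the API key is a placeholder, and the output path and parameter values are illustrative assumptions rather than required settings.
import Algorithmia

# Placeholder API key; replace with your own.
client = Algorithmia.client("YOUR_API_KEY")
algo = client.algo("deeplearning/ObjectDetectionCOCO/0.3.0")

# The payload mirrors the input schema above; the output path and values are illustrative.
payload = {
    "image": "http://i.imgur.com/k67kjlB.jpg",
    "output": "data://.algo/temp/annotated.png",
    "max_boxes": 10,
    "min_score": 0.6,
    "model": "ssd_inception_v2"
}

result = algo.pipe(payload).result
for box in result["boxes"]:
    print(box["label"], box["confidence"], box["coordinates"])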
Output
{
"image": String,
"boxes": [
{
"coordinates": {
"x0": Float,
"y0": Float,
"x1": Float,
"y1": Float
},
"label": String,
"confidence": Float
},
...
]
}
- image - the data connector URI of the bounding box annotated image (present only if output was defined).
- boxes - a list of all detected objects and their bounding boxes.
- coordinates - the absolute cartesian coordinates of the bounding box within the input image.
- label - the predicted label/class for the detected object.
- confidence - the confidence of the class prediction (0 -> 1).
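As a rough sketch of how this output can be consumed, the coordinates can be used to redraw the boxes on a local copy of the image. This assumes the source image has been downloaded to disk, Pillow is installed, and result holds the algorithm's output as returned above.
from PIL import Image, ImageDraw

# 'result' is the algorithm's output dict; 'local_copy.jpg' is an assumed local copy of the input image.
img = Image.open("local_copy.jpg")
draw = ImageDraw.Draw(img)

for box in result["boxes"]:
    c = box["coordinates"]
    # Coordinates are absolute positions within the original image.
    draw.rectangle([(c["x0"], c["y0"]), (c["x1"], c["y1"])], outline="red")
    draw.text((c["x0"], c["y0"]), "{} {:.2f}".format(box["label"], box["confidence"]))

img.save("annotated_local.jpg")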
Examples
Example 1 - Street car
Input
{
"image":"http://i.imgur.com/k67kjlB.jpg",
"output":"data://.algo/temp/streetcar.png",
"model":"ssd_inception_v2"
}
Output
{
"boxes":[
{
"confidence":0.9845596551895142,
"coordinates":{
"x0":276,
"x1":490,
"y0":170,
"y1":386
},
"label":"train"
},
{
"confidence":0.8447293043136597,
"coordinates":{
"x0":36,
"x1":132,
"y0":280,
"y1":314
},
"label":"car"
},
{
"confidence":0.7735579609870911,
"coordinates":{
"x0":520,
"x1":555,
"y0":273,
"y1":305
},
"label":"car"
}
],
"image":"data://.algo/temp/streetcar.png"
}
Example 2 - Dog Park
Input
{
"image":"http://i.imgur.com/1IWZX69.jpg",
"output":"data://.algo/temp/dog_park.png",
"min_score":0.7,
"model":"faster_rcnn_resnet101"
}
Output
{
"boxes":[
{
"confidence":0.9997156262397766,
"coordinates":{
"x0":1484,
"x1":1675,
"y0":221,
"y1":655
},
"label":"person"
},
{
"confidence":0.999693751335144,
"coordinates":{
"x0":1456,
"x1":1752,
"y0":651,
"y1":922
},
"label":"dog"
},
{
"confidence":0.9996084570884703,
"coordinates":{
"x0":478,
"x1":634,
"y0":895,
"y1":1041
},
"label":"dog"
},
{
"confidence":0.9993320107460022,
"coordinates":{
"x0":308,
"x1":508,
"y0":721,
"y1":954
},
"label":"dog"
},
{
"confidence":0.9990590214729308,
"coordinates":{
"x0":105,
"x1":264,
"y0":449,
"y1":988
},
"label":"person"
},
{
"confidence":0.9984574317932128,
"coordinates":{
"x0":775,
"x1":1122,
"y0":735,
"y1":1103
},
"label":"dog"
},
{
"confidence":0.9961158037185668,
"coordinates":{
"x0":644,
"x1":793,
"y0":529,
"y1":781
},
"label":"person"
},
{
"confidence":0.9699774980545044,
"coordinates":{
"x0":782,
"x1":833,
"y0":518,
"y1":660
},
"label":"person"
},
{
"confidence":0.9487058520317078,
"coordinates":{
"x0":1056,
"x1":1124,
"y0":506,
"y1":652
},
"label":"person"
},
{
"confidence":0.9246744513511658,
"coordinates":{
"x0":1005,
"x1":1088,
"y0":589,
"y1":739
},
"label":"person"
},
{
"confidence":0.737799882888794,
"coordinates":{
"x0":344,
"x1":508,
"y0":781,
"y1":945
},
"label":"dog"
}
],
"image":"data://.algo/temp/dog_park.png"
}
Labels
The dataset that this algorithm was trained on has 80 possible labels; here's a list for easy reference (a short client-side filtering sketch follows the list):
- person
- bicycle
- car
- motorcycle
- airplane
- bus
- train
- truck
- boat
- traffic light
- fire hydrant
- stop sign
- parking meter
- bench
- bird
- cat
- dog
- horse
- sheep
- cow
- elephant
- bear
- zebra
- giraffe
- backpack
- umbrella
- handbag
- tie
- suitcase
- frisbee
- skis
- snowboard
- sports ball
- kite
- baseball bat
- baseball glove
- skateboard
- surfboard
- tennis racket
- bottle
- wine glass
- cup
- fork
- knife
- spoon
- bowl
- banana
- apple
- sandwich
- orange
- broccoli
- carrot
- hot dog
- pizza
- donut
- cake
- chair
- couch
- potted plant
- bed
- dining table
- toilet
- tv
- laptop
- mouse
- remote
- keyboard
- cell phone
- microwave
- oven
- toaster
- sink
- refrigerator
- book
- clock
- vase
- scissors
- teddy bear
- hair drier
- toothbrush
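If you only care about a subset of these labels, detections can be filtered client-side. A minimal sketch, assuming result holds the output of a call like the one above and the chosen labels are arbitrary examples:
from collections import Counter

# Count detections per label in the result.
counts = Counter(box["label"] for box in result["boxes"])
print(counts.most_common())

# Keep only boxes whose label is in a chosen subset (labels here are arbitrary examples).
wanted = {"person", "dog"}
filtered = [box for box in result["boxes"] if box["label"] in wanted]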
Credits
This algorithm uses modified code from the object detection module of the Google TensorFlow project.
All images are sourced from the Wikimedia Foundation under a Creative Commons license.