deeplearning / ObjectDetectionCOCO / 0.2.1

Object Detection - COCO

This algorithm detects not only what is in an image but also where it is: it locates each object within the image and generates a bounding box annotation for it. It uses pre-trained models from Google's TensorFlow object detection project, all of which were trained on the Common Objects in Context (COCO) dataset.

A list of all available labels is below.

I/O

Input

{  
   "image": String,
   "output": String,
   "max_boxes": Integer,
   "min_score": Float,
   "model": String
}
  • image - (required) - a hosted image file; may be a web URL (http, https) or a data connector URI (data://, s3://, etc.).
  • output - (optional) - the output data connector URI (data://, s3://, etc.) for the resulting annotated image. If output is not provided, only the bounding box data is returned.
  • max_boxes - (optional) - the maximum number of bounding boxes to return in the results. If max_boxes is not defined, it defaults to 20.
  • min_score - (optional) - the minimum score threshold for bounding box annotations; a prediction whose confidence is below this minimum is not returned in the results. If min_score is not defined, it defaults to 0.5.
  • model - (optional) - the pre-trained object detection model to use; may be any of the following:

| Model name | Compute time per image | COCO accuracy (mAP) |
|---|---|---|
| ssd_mobilenet_v1 | 4.78 | 21 |
| ssd_inception_v2 | 8.75 | 24 |
| rfcn_resnet101 | 10.25 | 30 |
| faster_rcnn_resnet101 | 11.05 | 32 |
| faster_rcnn_inception_resnet_v2_atrous | 16.78 | 37 |

When model is not defined, it defaults to ssd_mobilenet_v1.
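
The table trades compute time against accuracy, so one way to choose a model is to pick the most accurate one that fits a time budget. The sketch below does that with the figures from the table. `pick_model` and the budget parameter are hypothetical names, and the budget is assumed to be in the same (unstated) unit as the compute time column.

```python
# Figures copied from the model table: (compute time per image, COCO mAP).
MODELS = {
    "ssd_mobilenet_v1": (4.78, 21),
    "ssd_inception_v2": (8.75, 24),
    "rfcn_resnet101": (10.25, 30),
    "faster_rcnn_resnet101": (11.05, 32),
    "faster_rcnn_inception_resnet_v2_atrous": (16.78, 37),
}

# The documented default when "model" is omitted.
DEFAULT_MODEL = "ssd_mobilenet_v1"


def pick_model(max_compute_time):
    """Return the most accurate model within the budget (hypothetical helper)."""
    candidates = [
        (m_ap, name)
        for name, (compute_time, m_ap) in MODELS.items()
        if compute_time <= max_compute_time
    ]
    if not candidates:
        # Assumption: fall back to the documented default if nothing fits.
        return DEFAULT_MODEL
    # max() compares mAP first, so the highest-accuracy candidate wins.
    return max(candidates)[1]


print(pick_model(11))
```

With a budget of 11, `faster_rcnn_resnet101` (11.05) is just out of reach, so the helper settles on `rfcn_resnet101`.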

Alternatively, you can pass an image URL directly to the algorithm as a string.
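
As a rough illustration of the input format, the following sketch assembles the JSON object described above and leaves optional keys out so that the documented defaults apply. `build_input` and `KNOWN_MODELS` are hypothetical names, and the 0-to-1 range check on `min_score` is an assumption based on the documented confidence range.

```python
import json

# Model names copied from the table in this README.
KNOWN_MODELS = (
    "ssd_mobilenet_v1",
    "ssd_inception_v2",
    "rfcn_resnet101",
    "faster_rcnn_resnet101",
    "faster_rcnn_inception_resnet_v2_atrous",
)


def build_input(image, output=None, max_boxes=None, min_score=None, model=None):
    """Build the JSON input object (hypothetical helper, not part of the algorithm)."""
    if not isinstance(image, str) or not image:
        raise ValueError("image is required: a web URL or data connector URI")
    payload = {"image": image}
    # Optional keys are omitted when unset so the documented defaults apply.
    if output is not None:
        payload["output"] = output
    if max_boxes is not None:
        if max_boxes < 1:
            raise ValueError("max_boxes must be a positive integer")
        payload["max_boxes"] = int(max_boxes)
    if min_score is not None:
        # Assumption: scores share the 0-to-1 range documented for confidence.
        if not 0.0 <= min_score <= 1.0:
            raise ValueError("min_score must be between 0 and 1")
        payload["min_score"] = float(min_score)
    if model is not None:
        if model not in KNOWN_MODELS:
            raise ValueError("unknown model: " + model)
        payload["model"] = model
    return payload


# Reproduces the input of Example 2.
payload = build_input(
    "http://i.imgur.com/1IWZX69.jpg",
    output="data://.algo/temp/dog_park.png",
    min_score=0.7,
    model="faster_rcnn_resnet101",
)
print(json.dumps(payload, indent=3))
```

The resulting dictionary could then be sent with whichever client you use. With the Algorithmia Python client, that would presumably look like `Algorithmia.client(api_key).algo("deeplearning/ObjectDetectionCOCO/0.2.1").pipe(payload).result`, where `api_key` is a placeholder for your own key.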

Output

{
    "image": String,
    "boxes": [
        {
            "coordinates": {
                "x0": Float,
                "y0": Float,
                "x1": Float,
                "y1": Float
            },
            "label": String,
            "confidence": Float
        },
    ...
    ]
}
  • image - the data connector URI of the bounding box annotated image (if output was defined).
  • boxes - a list of all detected objects and their bounding boxes.
  • coordinates - the absolute Cartesian coordinates of the bounding box in the input image; (x0, y0) and (x1, y1) are opposite corners of the box.
  • label - the predicted label/class for the detected object.
  • confidence - the confidence of the class prediction (0 to 1).
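
A minimal sketch of reading this structure: it filters the boxes by label and derives each box's width and height from its corner coordinates. The helper names are hypothetical, and the sample data is taken from the first two boxes of Example 1.

```python
def boxes_with_label(result, label):
    """Return the detected boxes whose label matches (hypothetical helper)."""
    return [box for box in result.get("boxes", []) if box["label"] == label]


def box_size(box):
    """Return (width, height) of a box from its two opposite corners."""
    c = box["coordinates"]
    return abs(c["x1"] - c["x0"]), abs(c["y1"] - c["y0"])


# Sample response: the first two boxes of Example 1.
result = {
    "boxes": [
        {
            "confidence": 0.9845596551895142,
            "coordinates": {"x0": 276, "x1": 490, "y0": 170, "y1": 386},
            "label": "train",
        },
        {
            "confidence": 0.8447293043136597,
            "coordinates": {"x0": 36, "x1": 132, "y0": 280, "y1": 314},
            "label": "car",
        },
    ],
    "image": "data://.algo/temp/streetcar.png",
}

for box in boxes_with_label(result, "train"):
    width, height = box_size(box)
    print(box["label"], width, height)
```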

Examples

Example 1 - Street car

Input

{  
   "image":"http://i.imgur.com/k67kjlB.jpg",
   "output":"data://.algo/temp/streetcar.png",
   "model":"ssd_inception_v2"
}

Output

{  
   "boxes":[  
      {  
         "confidence":0.9845596551895142,
         "coordinates":{  
            "x0":276,
            "x1":490,
            "y0":170,
            "y1":386
         },
         "label":"train"
      },
      {  
         "confidence":0.8447293043136597,
         "coordinates":{  
            "x0":36,
            "x1":132,
            "y0":280,
            "y1":314
         },
         "label":"car"
      },
      {  
         "confidence":0.7735579609870911,
         "coordinates":{  
            "x0":520,
            "x1":555,
            "y0":273,
            "y1":305
         },
         "label":"car"
      }
   ],
   "image":"data://.algo/temp/streetcar.png"
}
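
To see what `min_score` and `max_boxes` do, the sketch below re-applies them on the client side to the three detections from Example 1 (coordinates omitted for brevity). `refilter` is a hypothetical helper. Which boxes the service itself keeps when `max_boxes` truncates the list is not documented here, so keeping the highest-confidence boxes is an assumption.

```python
def refilter(boxes, min_score=0.5, max_boxes=20):
    """Apply min_score and max_boxes to a list of boxes (hypothetical helper)."""
    # A prediction is dropped only when its confidence is below min_score.
    kept = [box for box in boxes if box["confidence"] >= min_score]
    # Assumption: when truncating, keep the highest-confidence boxes.
    kept.sort(key=lambda box: box["confidence"], reverse=True)
    return kept[:max_boxes]


# Confidences and labels from the Example 1 output.
boxes = [
    {"confidence": 0.9845596551895142, "label": "train"},
    {"confidence": 0.8447293043136597, "label": "car"},
    {"confidence": 0.7735579609870911, "label": "car"},
]

print([box["label"] for box in refilter(boxes, min_score=0.8)])
print([box["label"] for box in refilter(boxes, max_boxes=1)])
```

Raising `min_score` to 0.8 drops the second car (confidence of about 0.77), while `max_boxes` set to 1 leaves only the train.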

Example 2 - Dog Park

Input

{  
   "image":"http://i.imgur.com/1IWZX69.jpg",
   "output":"data://.algo/temp/dog_park.png",
   "min_score":0.7,
   "model":"faster_rcnn_resnet101"
}

Output

{  
   "boxes":[  
      {  
         "confidence":0.9997156262397766,
         "coordinates":{  
            "x0":1484,
            "x1":1675,
            "y0":221,
            "y1":655
         },
         "label":"person"
      },
      {  
         "confidence":0.999693751335144,
         "coordinates":{  
            "x0":1456,
            "x1":1752,
            "y0":651,
            "y1":922
         },
         "label":"dog"
      },
      {  
         "confidence":0.9996084570884703,
         "coordinates":{  
            "x0":478,
            "x1":634,
            "y0":895,
            "y1":1041
         },
         "label":"dog"
      },
      {  
         "confidence":0.9993320107460022,
         "coordinates":{  
            "x0":308,
            "x1":508,
            "y0":721,
            "y1":954
         },
         "label":"dog"
      },
      {  
         "confidence":0.9990590214729308,
         "coordinates":{  
            "x0":105,
            "x1":264,
            "y0":449,
            "y1":988
         },
         "label":"person"
      },
      {  
         "confidence":0.9984574317932128,
         "coordinates":{  
            "x0":775,
            "x1":1122,
            "y0":735,
            "y1":1103
         },
         "label":"dog"
      },
      {  
         "confidence":0.9961158037185668,
         "coordinates":{  
            "x0":644,
            "x1":793,
            "y0":529,
            "y1":781
         },
         "label":"person"
      },
      {  
         "confidence":0.9699774980545044,
         "coordinates":{  
            "x0":782,
            "x1":833,
            "y0":518,
            "y1":660
         },
         "label":"person"
      },
      {  
         "confidence":0.9487058520317078,
         "coordinates":{  
            "x0":1056,
            "x1":1124,
            "y0":506,
            "y1":652
         },
         "label":"person"
      },
      {  
         "confidence":0.9246744513511658,
         "coordinates":{  
            "x0":1005,
            "x1":1088,
            "y0":589,
            "y1":739
         },
         "label":"person"
      },
      {  
         "confidence":0.737799882888794,
         "coordinates":{  
            "x0":344,
            "x1":508,
            "y0":781,
            "y1":945
         },
         "label":"dog"
      }
   ],
   "image":"data://.algo/temp/dog_park.png"
}
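
In the output above, the fourth box and the last box are both labeled dog and overlap heavily, so they may describe the same animal. A common way to quantify such overlap is intersection over union (IoU). The sketch below computes it for those two boxes; `iou` is a hypothetical helper and is not something the algorithm returns.

```python
def iou(a, b):
    """Intersection over union of two boxes given as x0/x1/y0/y1 dicts."""
    # Width and height of the overlapping region, clamped at zero.
    inter_w = max(0, min(a["x1"], b["x1"]) - max(a["x0"], b["x0"]))
    inter_h = max(0, min(a["y1"], b["y1"]) - max(a["y0"], b["y0"]))
    intersection = inter_w * inter_h
    area_a = (a["x1"] - a["x0"]) * (a["y1"] - a["y0"])
    area_b = (b["x1"] - b["x0"]) * (b["y1"] - b["y0"])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


# Coordinates copied from the Example 2 output.
dog_a = {"x0": 308, "x1": 508, "y0": 721, "y1": 954}    # confidence 0.9993
dog_b = {"x0": 344, "x1": 508, "y0": 781, "y1": 945}    # confidence 0.7378
person = {"x0": 1484, "x1": 1675, "y0": 221, "y1": 655}

print(round(iou(dog_a, dog_b), 3))
print(iou(dog_a, person))
```

Here the smaller dog box lies entirely inside the larger one, giving an IoU of roughly 0.58, while the person box on the far right does not overlap at all. A caller could drop the lower-confidence box of any same-label pair above a chosen IoU threshold, or raise `min_score` to achieve a similar effect.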

Labels

The COCO label map used by these models assigns label IDs from 1 to 90, but only 80 of them correspond to object classes. Those 80 labels are listed here for easy reference:

- person
- bicycle
- car
- motorcycle
- airplane
- bus
- train
- truck
- boat
- traffic light
- fire hydrant
- stop sign
- parking meter
- bench
- bird
- cat
- dog
- horse
- sheep
- cow
- elephant
- bear
- zebra
- giraffe
- backpack
- umbrella
- handbag
- tie
- suitcase
- frisbee
- skis
- snowboard
- sports ball
- kite
- baseball bat
- baseball glove
- skateboard
- surfboard
- tennis racket
- bottle
- wine glass
- cup
- fork
- knife
- spoon
- bowl
- banana
- apple
- sandwich
- orange
- broccoli
- carrot
- hot dog
- pizza
- donut
- cake
- chair
- couch
- potted plant
- bed
- dining table
- toilet
- tv
- laptop
- mouse
- remote
- keyboard
- cell phone
- microwave
- oven
- toaster
- sink
- refrigerator
- book
- clock
- vase
- scissors
- teddy bear
- hair drier
- toothbrush

Credits

This algorithm uses modified code from the object detection module of the Google TensorFlow project.

All images are sourced from the Wikimedia Foundation under a Creative Commons license.