character_recognition / TextDetectionCTPN / 0.2.0

This algorithm detects regions of text in images, including both scanned documents and natural images. It works well as a preprocessing step for an OCR pipeline.

If an output path is provided, the algorithm also saves a copy of the source image annotated with the detected bounding boxes.

I/O

Input

{  
   "input": String,
   "output": String
}
  • input - (required) - the input image, given as a URL, a data connector URI (data://, s3://, etc.) or a Base64-encoded string.
  • output - (optional) - the data connector path where the annotated image should be saved. If not provided, only the bounding box coordinates are returned.
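
As a minimal sketch of how the input object might be assembled in Python, the helper below builds the payload from either a URL / data connector URI or raw image bytes. The names `build_input` and `call_text_detection` are hypothetical. The call itself assumes the Algorithmia Python client (`Algorithmia.client(...).algo(...).pipe(...)`) is installed and that you have your own API key; the algorithm path is taken from the title of this page.

```python
import base64
from typing import Optional, Union


def build_input(image: Union[str, bytes], output: Optional[str] = None) -> dict:
    """Build the JSON input for the algorithm (hypothetical helper).

    `image` is either a URL / data connector URI (str) or raw image
    bytes, which are sent as a Base64-encoded string.
    """
    if isinstance(image, bytes):
        image = base64.b64encode(image).decode("ascii")
    payload = {"input": image}
    if output is not None:
        # Only request an annotated image when a path is given.
        payload["output"] = output
    return payload


def call_text_detection(api_key: str, payload: dict) -> dict:
    """Send the payload with the Algorithmia Python client (assumed installed)."""
    import Algorithmia  # imported lazily so build_input works without the client

    client = Algorithmia.client(api_key)
    algo = client.algo("character_recognition/TextDetectionCTPN/0.2.0")
    return algo.pipe(payload).result
```

For example, `build_input("https://example.com/photo.jpg")` yields a payload with only the `input` key, so the response would contain just the bounding boxes.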

Output


{  
   "boxes":[  
      ...
      {  
         "confidence":Float,
         "x0":Float,
         "x1":Float,
         "y0":Float,
         "y1":Float
      },
      ...
   ],
   "output":String
}

  • boxes - the list of all detected text regions, each defined by a confidence score and its bounding box coordinates.
  • output - the data connector URL of the annotated image file, if an output path was provided.
  • confidence - the confidence that text was detected within this bounding box.
  • x0 - the minimum x coordinate (left edge) of the bounding box.
  • x1 - the maximum x coordinate (right edge) of the bounding box.
  • y0 - the minimum y coordinate (top edge) of the bounding box.
  • y1 - the maximum y coordinate (bottom edge) of the bounding box.
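
The response can be turned into typed objects before further processing. The following is a minimal sketch, not part of the algorithm itself: `Box` and `parse_boxes` are hypothetical names, and the width and height helpers assume that (x0, y0) and (x1, y1) are opposite corners with x0 < x1 and y0 < y1, as in the examples on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    confidence: float
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def width(self) -> float:
        # Horizontal extent of the box, assuming x0 < x1.
        return self.x1 - self.x0

    @property
    def height(self) -> float:
        # Vertical extent of the box, assuming y0 < y1.
        return self.y1 - self.y0


def parse_boxes(result: dict) -> List[Box]:
    """Convert the "boxes" list of a response into Box objects."""
    return [
        Box(b["confidence"], b["x0"], b["y0"], b["x1"], b["y1"])
        for b in result.get("boxes", [])
    ]
```

A response without a `boxes` key is treated as containing no detections, which is a design choice of this sketch rather than documented behavior.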

Example

Example 1 - Long Receipt

Input

{  
   "input":"http://3.media.collegehumor.cvcdn.com/57/40/45795b7aa3650756ad94f776add650fd.jpg",
   "output":"data://.algo/temp/receipt.png"
}

Output

{  
   "boxes":[  
      {  
         "confidence":0.9979859590530396,
         "x0":96,
         "x1":607,
         "y0":1338.624267578125,
         "y1":1362.292236328125
      },
      {  
         "confidence":0.9967637658119202,
         "x0":80,
         "x1":223,
         "y0":541.8438720703125,
         "y1":564.5916748046875
      },
      {  
         "confidence":0.9942678213119508,
         "x0":80,
         "x1":639,
         "y0":1436.00341796875,
         "y1":1464.0400390625
      },
      {  
         "confidence":0.993927001953125,
         "x0":112,
         "x1":223,
         "y0":961.9906005859376,
         "y1":980.5181884765624
      },
      ...
      ...
      ...
      {  
         "confidence":0.7212919592857361,
         "x0":80,
         "x1":143,
         "y0":1486.1190185546875,
         "y1":1504.2633056640625
      }
   ],
   "output":"data://.algo/temp/receipt.png"
}
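
In the output above, the boxes appear ordered by confidence rather than by position. For a single-column document such as a receipt, an OCR step usually wants them in reading order. The sketch below (with the hypothetical helper `reading_order`) sorts the first four boxes from this example top to bottom, then left to right; multi-column layouts would need proper line grouping instead.

```python
def reading_order(boxes):
    """Sort boxes top to bottom, then left to right."""
    return sorted(boxes, key=lambda b: (b["y0"], b["x0"]))


# The first four boxes from the receipt example.
boxes = [
    {"confidence": 0.9979859590530396, "x0": 96, "x1": 607,
     "y0": 1338.624267578125, "y1": 1362.292236328125},
    {"confidence": 0.9967637658119202, "x0": 80, "x1": 223,
     "y0": 541.8438720703125, "y1": 564.5916748046875},
    {"confidence": 0.9942678213119508, "x0": 80, "x1": 639,
     "y0": 1436.00341796875, "y1": 1464.0400390625},
    {"confidence": 0.993927001953125, "x0": 112, "x1": 223,
     "y0": 961.9906005859376, "y1": 980.5181884765624},
]

ordered = reading_order(boxes)
print([round(b["y0"]) for b in ordered])  # [542, 962, 1339, 1436]
```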

Example 2 - Bus Station

Input

{  
   "input":"https://media2.fdncms.com/sevendaysvt/imager/u/original/3742276/station.jpg",
   "output":"data://.algo/temp/bus_station.png"
}

Output

{  
   "boxes":[  
      {  
         "confidence":0.9730413556098938,
         "x0":640,
         "x1":719,
         "y0":325.8775329589844,
         "y1":337.6134338378906
      },
      {  
         "confidence":0.962052881717682,
         "x0":464,
         "x1":639,
         "y0":90.61949157714844,
         "y1":112.15149688720705
      },
      {  
         "confidence":0.9355180263519288,
         "x0":464,
         "x1":639,
         "y0":68.34097290039062,
         "y1":88.28927612304688
      },
      {  
         "confidence":0.9239080548286438,
         "x0":672,
         "x1":719,
         "y0":368.6856384277344,
         "y1":380.2655029296875
      },
      {  
         "confidence":0.8317975997924805,
         "x0":304,
         "x1":367,
         "y0":282.3546447753906,
         "y1":303.5064697265625
      }
   ],
   "output":"data://.algo/temp/bus_station.png"
}
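
Natural images tend to produce a few lower-confidence detections, such as the last box above. A simple way to drop them is a confidence threshold. The sketch below uses a hypothetical helper `filter_by_confidence`; the default of 0.9 is an arbitrary illustrative value, not a recommendation from the algorithm's authors.

```python
def filter_by_confidence(boxes, threshold=0.9):
    """Keep only boxes whose confidence is at least `threshold`."""
    return [b for b in boxes if b["confidence"] >= threshold]


# The five boxes from the bus station example.
boxes = [
    {"confidence": 0.9730413556098938, "x0": 640, "x1": 719,
     "y0": 325.8775329589844, "y1": 337.6134338378906},
    {"confidence": 0.962052881717682, "x0": 464, "x1": 639,
     "y0": 90.61949157714844, "y1": 112.15149688720705},
    {"confidence": 0.9355180263519288, "x0": 464, "x1": 639,
     "y0": 68.34097290039062, "y1": 88.28927612304688},
    {"confidence": 0.9239080548286438, "x0": 672, "x1": 719,
     "y0": 368.6856384277344, "y1": 380.2655029296875},
    {"confidence": 0.8317975997924805, "x0": 304, "x1": 367,
     "y0": 282.3546447753906, "y1": 303.5064697265625},
]

kept = filter_by_confidence(boxes)
print(len(kept))  # 4
```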

Example 3 - Parked Police Car

Input

{  
   "input":"https://upload.wikimedia.org/wikipedia/commons/thumb/9/94/Policja_Opel_Krak%C3%B3w.JPG/1239px-Policja_Opel_Krak%C3%B3w.JPG",
   "output":"data://.algo/temp/police.png"
}

Output

{  
   "boxes":[  
      {  
         "confidence":0.9616782665252686,
         "x0":368,
         "x1":575,
         "y0":58.95378875732422,
         "y1":106.28519439697266
      },
      {  
         "confidence":0.9594982266426086,
         "x0":224,
         "x1":271,
         "y0":367.0335998535156,
         "y1":381.1184387207031
      },
      {  
         "confidence":0.949138641357422,
         "x0":352,
         "x1":575,
         "y0":397.1089782714844,
         "y1":440.8250122070313
      },
      {  
         "confidence":0.9071121215820312,
         "x0":432,
         "x1":687,
         "y0":300.0794677734375,
         "y1":353.77691650390625
      }
   ],
   "output":"data://.algo/temp/police.png"
}
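
To pass each detected region to an OCR engine, the floating-point coordinates need to be converted to integer pixel rectangles. The sketch below shows one way to do this, with the hypothetical helper `crop_rect`: it rounds outward, adds a small padding, and clamps to the image bounds. The padding of 4 pixels and the image size used in the example are assumptions for illustration only. The returned tuple is in (left, upper, right, lower) order, which is the order Pillow's `Image.crop` expects.

```python
import math


def crop_rect(box, image_width, image_height, pad=4):
    """Return an integer (left, upper, right, lower) rectangle for a box.

    Coordinates are rounded outward, padded by `pad` pixels and
    clamped so the rectangle stays inside the image.
    """
    left = max(0, math.floor(box["x0"]) - pad)
    upper = max(0, math.floor(box["y0"]) - pad)
    right = min(image_width, math.ceil(box["x1"]) + pad)
    lower = min(image_height, math.ceil(box["y1"]) + pad)
    return left, upper, right, lower


# First box from the police car example; the image size is assumed.
box = {"confidence": 0.9616782665252686, "x0": 368, "x1": 575,
       "y0": 58.95378875732422, "y1": 106.28519439697266}
rect = crop_rect(box, image_width=1239, image_height=1000)
print(rect)  # (364, 54, 579, 111)
```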

Credits

This algorithm is based on the paper "Detecting Text in Natural Image with Connectionist Text Proposal Network" and its accompanying source code.

The model was built using a modified version of the Caffe framework.