character_recognition / SmartTextExtraction / 0.1.1

This is a general-purpose text extraction algorithm that combines several technologies. It detects text in an image, isolates each line, and predicts the words in each line separately. Separating the lines lets us achieve higher accuracy on natural scene images than most other publicly available OCR algorithms.

I/O

{  
   "image": String,
   "mode": String,
   "language": String
}
  • image - (required) - a hosted image file; may be a web URL (http, https) or a data connector URI (data://, s3://, etc.).
  • mode - (optional) - the OCR mode to use; defaults to "fallback" if none is provided. The current modes are:
      -- "fallback" - tries tesseract first and, if tesseract fails to predict anything, tries naturalTextNet.
      -- "tesseract only" - uses only tesseract.
      -- "naturalTextNet only" - uses only naturalTextNet.
  • language - (optional) - the language code of the specialized OCR model to use; a full list of language codes can be found here. If the language code is anything other than 'eng', mode is automatically set to "tesseract only".

Alternatively, you can pass a URL directly to the algorithm as a string.
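The input rules above can be sketched in Python. This is only an illustration of the documented behavior, not the algorithm's own code: `build_input` and `effective_mode` are hypothetical helper names, and treating a missing language as 'eng' is an assumption, since the README only says what happens for codes other than 'eng'.

```python
# Hypothetical helpers illustrating the documented input rules.
VALID_MODES = ("fallback", "tesseract only", "naturalTextNet only")


def build_input(image, mode=None, language=None):
    """Build the JSON-style input; only 'image' is required."""
    if not isinstance(image, str) or not image:
        raise ValueError("image must be a non-empty URL or data connector URI")
    if mode is not None and mode not in VALID_MODES:
        raise ValueError("mode must be one of: " + ", ".join(VALID_MODES))
    payload = {"image": image}
    if mode is not None:
        payload["mode"] = mode
    if language is not None:
        payload["language"] = language
    return payload


def effective_mode(payload):
    """Return the mode the algorithm is documented to end up using."""
    if isinstance(payload, str):
        # A bare URL string carries no mode, so the default applies.
        return "fallback"
    # Assumption: an absent language behaves like 'eng'.
    if payload.get("language", "eng") != "eng":
        # Any non-English language code forces tesseract.
        return "tesseract only"
    return payload.get("mode", "fallback")


if __name__ == "__main__":
    receipt = build_input("http://i.imgur.com/BRVm15K.jpg", language="deu")
    print(receipt, "->", effective_mode(receipt))
```

Resolving the mode on the client side is optional, since the algorithm applies the same rules itself; it is mainly useful for noticing that a requested mode will be overridden by the language setting.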

Examples

Example 1 - Swiss Receipt

Input

{"image":"http://i.imgur.com/BRVm15K.jpg", "language":"deu"}

Output

{  
   "predictions":[  
      {  
         "box":{  
            "confidence":0.9950003027915955,
            "x0":881.280029296875,
            "x1":1465.739990234375,
            "y0":719.5480346679688,
            "y1":796.46142578125
         },
         "predicted_text":"Familie R."
      },
      {  
         "box":{  
            "confidence":0.9911934733390808,
            "x0":783.3599853515625,
            "x1":1661.5799560546875,
            "y0":529.455078125,
            "y1":621.7941284179688
         },
         "predicted_text":"Grosse Scheidegqg"
      },
      {  
         "box":{  
            "confidence":0.989794909954071,
            "x0":832.3200073242188,
            "x1":1514.699951171875,
            "y0":2517.7783203125,
            "y1":2593.3984375
         },
         "predicted_text":"Tel.: 033 853 67 1C"
      },
      {  
         "box":{  
            "confidence":0.989421546459198,
            "x0":391.67999267578125,
            "x1":682.3800048828125,
            "y0":1307.3974609375,
            "y1":1385.853759765625
         },
         "predicted_text":"1xGloki"
      },
      {  
         "box":{  
            "confidence":0.9888872504234314,
            "x0":1175.0400390625,
            "x1":1906.3800048828125,
            "y0":915.1535034179688,
            "y1":994.3203735351562
         },
         "predicted_text":"30. 07. 2007713 :29: 17"
      },
      {  
         "box":{  
            "confidence":0.9876304268836975,
            "x0":881.280029296875,
            "x1":1514.699951171875,
            "y0":2415.29541015625,
            "y1":2491.83251953125
         },
         "predicted_text":"MwSt Nr. : 430 234"
      },
      {  
         "box":{  
            "confidence":0.9842965006828308,
            "x0":881.280029296875,
            "x1":1514.699951171875,
            "y0":622.0308837890625,
            "y1":695.5274658203125
         },
         "predicted_text":"3818 Grindelwald"
      },
      {  
         "box":{  
            "confidence":0.9773725271224976,
            "x0":391.67999267578125,
            "x1":1857.4200439453125,
            "y0":1214.9713134765625,
            "y1":1290.3023681640625
         },
         "predicted_text":"Macchiato a - 4.50 CHF 9. 00"
      },
      {  
         "box":{  
            "confidence":0.9767477512359619,
            "x0":391.67999267578125,
            "x1":1857.4200439453125,
            "y0":1404.171630859375,
            "y1":1481.178955078125
         },
         "predicted_text":"1ixSchweinschnitze!l a 22.00 CHF - 22. 0C"
      },
      {  
         "box":{  
            "confidence":0.9657369256019592,
            "x0":391.67999267578125,
            "x1":927.1799926757812,
            "y0":1507.2957763671875,
            "y1":1581.3216552734375
         },
         "predicted_text":"1xChasspiat211"
      },
      {  
         "box":{  
            "confidence":0.963786780834198,
            "x0":342.7200012207031,
            "x1":1514.699951171875,
            "y0":2106.211181640625,
            "y1":2188.627197265625
         },
         "predicted_text":"Entspricht in Euro- 36.33 EUR"
      },
      {  
         "box":{  
            "confidence":0.9631685614585876,
            "x0":342.7200012207031,
            "x1":1220.93994140625,
            "y0":2216.739501953125,
            "y1":2291.154052734375
         },
         "predicted_text":"Es bediente Sie: Ursula"
      },
      {  
         "box":{  
            "confidence":0.9583240151405334,
            "x0":440.6400146484375,
            "x1":1759.5,
            "y0":2715.134765625,
            "y1":2792.466552734375
         },
         "predicted_text":"E-mail: grossescheidegg@bluewin. ch"
      },
      {  
         "box":{  
            "confidence":0.9582116603851318,
            "x0":1077.1199951171875,
            "x1":1906.3800048828125,
            "y0":1505.01806640625,
            "y1":1578.428955078125
         },
         "predicted_text":"a 18.50 CHF _- 18.50"
      },
      {  
         "box":{  
            "confidence":0.9548710584640503,
            "x0":1419.8399658203125,
            "x1":1906.3800048828125,
            "y0":1013.7512817382812,
            "y1":1096.07666015625
         },
         "predicted_text":"Tisch - 7/01"
      },
      {  
         "box":{  
            "confidence":0.947666585445404,
            "x0":832.3200073242188,
            "x1":1563.6600341796875,
            "y0":2614.270263671875,
            "y1":2688.2314453125
         },
         "predicted_text":"Fax.: 033 853 67 19"
      },
      {  
         "box":{  
            "confidence":0.9400507211685181,
            "x0":1077.1199951171875,
            "x1":1857.4200439453125,
            "y0":1313.461181640625,
            "y1":1386.76708984375
         },
         "predicted_text":"a - 5.00 CHF _- 5.00"
      },
      {  
         "box":{  
            "confidence":0.9378625750541687,
            "x0":881.280029296875,
            "x1":1465.739990234375,
            "y0":425.6053771972656,
            "y1":518.11181640625
         },
         "predicted_text":"Berghote I"
      },
      {  
         "box":{  
            "confidence":0.9366177320480347,
            "x0":342.7200012207031,
            "x1":1710.5400390625,
            "y0":1902.9322509765625,
            "y1":1980.0235595703125
         },
         "predicted_text":"Incl. 7.6% MwSt - 54.50 CHF: 3.85"
      },
      {  
         "box":{  
            "confidence":0.9237926602363586,
            "x0":342.7200012207031,
            "x1":878.219970703125,
            "y0":915.0635986328125,
            "y1":988.8533325195312
         },
         "predicted_text":"Rech. Nr. 4572"
      },
      {  
         "box":{  
            "confidence":0.8983836770057678,
            "x0":1517.760009765625,
            "x1":1857.4200439453125,
            "y0":1699.5595703125,
            "y1":1789.38525390625
         },
         "predicted_text":"54 50"
      },
      {  
         "box":{  
            "confidence":0.8963740468025208,
            "x0":832.3200073242188,
            "x1":1367.8199462890625,
            "y0":1714.4747314453125,
            "y1":1788.5390625
         },
         "predicted_text":"Total : CHF"
      },
      {  
         "box":{  
            "confidence":0.8577872514724731,
            "x0":1517.760009765625,
            "x1":1955.3399658203125,
            "y0":1622.400390625,
            "y1":1655.4766845703125
         },
         "predicted_text":""
      }
   ]
}
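The predictions in this output appear to be ordered by descending confidence rather than by position on the page. A minimal sketch of one way to rebuild reading order from the boxes follows; `reading_order` is a hypothetical helper, the grouping rule (a box joins the current row when its vertical center falls inside that row's first box) is an assumption rather than part of the algorithm, and the sample values are rounded from the output above.

```python
def reading_order(predictions, min_confidence=0.0):
    """Group predictions into rows top to bottom, then left to right."""
    # Drop empty predictions and those below the confidence threshold.
    kept = [
        p for p in predictions
        if p["predicted_text"].strip() and p["box"]["confidence"] >= min_confidence
    ]
    # Sort by the vertical center of each box.
    kept.sort(key=lambda p: (p["box"]["y0"] + p["box"]["y1"]) / 2)

    rows = []
    for p in kept:
        box = p["box"]
        center = (box["y0"] + box["y1"]) / 2
        if rows and rows[-1]["y0"] <= center <= rows[-1]["y1"]:
            rows[-1]["items"].append(p)  # same visual line
        else:
            rows.append({"y0": box["y0"], "y1": box["y1"], "items": [p]})

    lines = []
    for row in rows:
        row["items"].sort(key=lambda p: p["box"]["x0"])
        lines.append(" ".join(p["predicted_text"] for p in row["items"]))
    return lines


# A few entries from the receipt output, with rounded coordinates.
sample = [
    {"box": {"confidence": 0.9894, "x0": 391.68, "x1": 682.38,
             "y0": 1307.40, "y1": 1385.85}, "predicted_text": "1xGloki"},
    {"box": {"confidence": 0.9401, "x0": 1077.12, "x1": 1857.42,
             "y0": 1313.46, "y1": 1386.77}, "predicted_text": "a - 5.00 CHF _- 5.00"},
    {"box": {"confidence": 0.9379, "x0": 881.28, "x1": 1465.74,
             "y0": 425.61, "y1": 518.11}, "predicted_text": "Berghote I"},
    {"box": {"confidence": 0.8578, "x0": 1517.76, "x1": 1955.34,
             "y0": 1622.40, "y1": 1655.48}, "predicted_text": ""},
]

if __name__ == "__main__":
    for line in reading_order(sample):
        print(line)
```

Rows with skewed or overlapping boxes may need a looser grouping rule than this one.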

Example 2 - Book Cover

Input

{"image": "http://i.imgur.com/Rnhm4e9.jpg"}

Output

{  
   "predictions":[  
      {  
         "box":{  
            "confidence":0.9889880418777466,
            "x0":127.19999694824219,
            "x1":215.84249877929688,
            "y0":326.5552978515625,
            "y1":358.1163330078125
         },
         "predicted_text":"girl"
      },
      {  
         "box":{  
            "confidence":0.9725095629692078,
            "x0":63.599998474121094,
            "x1":273.0824890136719,
            "y0":254.2978515625,
            "y1":287.0748291015625
         },
         "predicted_text":"popular"
      },
      {  
         "box":{  
            "confidence":0.9689595699310303,
            "x0":63.599998474121094,
            "x1":279.4425048828125,
            "y0":363.63818359375,
            "y1":402.11077880859375
         },
         "predicted_text":"By Angela Brayil"
      },
      {  
         "box":{  
            "confidence":0.9350155591964722,
            "x0":82.68000030517578,
            "x1":254.00250244140625,
            "y0":288.3578186035156,
            "y1":323.608154296875
         },
         "predicted_text":"SCHOOL:"
      }
   ]
}

FAQ

Question: Is this the perfect general purpose OCR algorithm?

Because it amalgamates the output of multiple other OCR algorithms, it is quite capable on general-purpose, properly aligned images. The following caveats apply:

-- Rotated text still causes trouble; we're working on that, though!

-- Through tesseract we support up to 100 languages, but not all language models are equal. If you run into any that are flat out bad, we can notify the tesseract team.

-- Our natural text algorithm works well on short lines but poorly on longer lines; we intend to improve this with time.

-- Highly skewed text can throw off the detector, which in turn will throw off the extraction prediction.

-- Non-standard fonts don't work well with tesseract or NaturalTextNet.

Question: Why does it work on one image but not on others?

OCR is not truly a solved problem on natural images with noisy backgrounds; our approach works well on some images but still makes mistakes on others. Over time we plan to improve its performance. If you can provide us with labelled images, we can use them directly to improve our quality!

Credits

This algorithm utilizes Text Detection for detecting text lines, tesseract as the primary OCR engine, and finally NaturalTextNet as the secondary OCR engine.
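How these three components might fit together in "fallback" mode can be sketched as follows. This is a sketch under stated assumptions, not the algorithm's actual implementation: `extract_text` and its callable parameters are hypothetical stand-ins for the real detector and OCR engines, and applying the fallback per line (rather than per image) is an assumption the README does not confirm.

```python
def extract_text(image, detect_lines, primary_ocr, secondary_ocr, mode="fallback"):
    """Detect text lines, then run OCR on each line separately.

    detect_lines(image) -> list of box dicts (hypothetical detector)
    primary_ocr(image, box) -> str (stand-in for tesseract)
    secondary_ocr(image, box) -> str (stand-in for NaturalTextNet)
    """
    if mode not in ("fallback", "tesseract only", "naturalTextNet only"):
        raise ValueError("unknown mode: " + repr(mode))

    predictions = []
    for box in detect_lines(image):
        text = ""
        if mode in ("fallback", "tesseract only"):
            text = primary_ocr(image, box)
        # In fallback mode the secondary engine runs only when the
        # primary engine predicted nothing for this line (assumption:
        # the check is made per line).
        if mode == "naturalTextNet only" or (mode == "fallback" and not text.strip()):
            text = secondary_ocr(image, box)
        predictions.append({"box": box, "predicted_text": text})
    return {"predictions": predictions}


if __name__ == "__main__":
    # Stub engines so the sketch runs without any OCR library installed.
    boxes = [
        {"confidence": 0.9, "x0": 0, "x1": 10, "y0": 0, "y1": 5},
        {"confidence": 0.8, "x0": 0, "x1": 10, "y0": 6, "y1": 11},
    ]
    result = extract_text(
        "image-placeholder",
        detect_lines=lambda image: boxes,
        primary_ocr=lambda image, box: "hello" if box["y0"] == 0 else "",
        secondary_ocr=lambda image, box: "world",
    )
    print([p["predicted_text"] for p in result["predictions"]])
```

Passing the engines in as callables keeps the sketch independent of any particular OCR library.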

All images are sourced from the Wikimedia Foundation under a Creative Commons license.