ANaimi / PDFToText / 0.1.4

This algorithm takes a PDF file and the coordinates of two corners, and returns the text that falls inside the rectangle those corners define.


Input #1 (url, page, x1, y1, x2, y2):
  • URL for PDF file
  • Page number - choose 0 for all pages
  • Coordinate X1 - top left of the rectangle
  • Coordinate Y1 - top left of the rectangle
  • Coordinate X2 - bottom right of the rectangle
  • Coordinate Y2 - bottom right of the rectangle
Input #2 (url, page, space, x1, y1, x2, y2):
Same as Input #1, with an additional space parameter after the page number that specifies the expected width of a whitespace character. The default is 2.0.


Input #3 (url, page):
  • URL for PDF file
  • Page number - choose 0 for all pages
  • The rectangle covers the whole page: equivalent to Input #1 with [0, 0, width, height] as x1, y1, x2, y2
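
The three input forms differ only in which positional values are present, so they can be sketched with one small helper. This is a minimal illustration, not part of the algorithm itself: `build_input` is a hypothetical name, the rectangle values are made up, and it assumes the input is sent as a JSON array in the order shown in each signature.

```python
import json

def build_input(url, page=0, rect=None, space=None):
    """Build the positional input list for one of the three documented forms.

    Hypothetical helper: assumes the algorithm accepts a JSON array whose
    order matches the signatures in this README.
    """
    if rect is None:
        if space is not None:
            raise ValueError("space is only documented together with a rectangle")
        return [url, page]                       # Input #3: whole page
    x1, y1, x2, y2 = rect                        # top-left, then bottom-right
    if space is None:
        return [url, page, x1, y1, x2, y2]       # Input #1
    return [url, page, space, x1, y1, x2, y2]    # Input #2

SAMPLE = "https://algorithmia.com/v1/data/ANaimi/PDFtoText/sample.pdf"

# Illustrative rectangle on page 1; the coordinates are arbitrary.
payload = build_input(SAMPLE, page=1, rect=(0, 0, 300, 200))
print(json.dumps(payload))
```

Passing `space=2.0` together with `rect` would produce the Input #2 form, with the whitespace width placed right after the page number.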

Output:
Always an array of strings, one element for each page.
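
Because the result is always a list with one string per page, a caller usually has to join or index it. A minimal sketch of that step follows; `pages_to_text` is a hypothetical helper and the sample result is invented for illustration, not real output from the algorithm.

```python
def pages_to_text(result, separator="\n\n"):
    """Join the per-page strings into one block of text.

    Validates the documented shape first: a list of strings, one per page.
    """
    if not isinstance(result, list) or not all(isinstance(p, str) for p in result):
        raise TypeError("expected a list of strings, one per page")
    return separator.join(page.strip() for page in result)

# Invented example of a two-page result.
sample_result = ["Text found on page one ", " Text found on page two"]
print(pages_to_text(sample_result))
```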

Sample Document:
https://algorithmia.com/v1/data/ANaimi/PDFtoText/sample.pdf