brunni / AddressExtractionFromText / 0.1.3

Extracts addresses, contact data, company names and other information from text.

The current emphasis is on Germany, Austria and Switzerland. Quality for other countries varies but can be improved on request.

'results' contains the extracted data and may be empty. The algorithm may return several results, and each result may contain several contact persons. Be aware that no result is returned unless a postal code / town combination is found. Each result contains:

country, zip, city, street, company, phone, fax, iban, bic: Should be self-explanatory.

emailhash: SHA-1 hash of ‘mailto:’ + email address (to prevent abuse, plain addresses are not returned)

vatidnr: VAT ID

regnrde: Trade register number and town of register court (Germany only)

managers: List of contact persons, each with the relevant catchword and the distance of the name to that catchword (currently supported only on German websites)

blz: German bank code (Bankleitzahl; Germany only)
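
Because only the hash of an email address is returned, a caller who already knows an address can check whether it appears in the results by computing the same hash locally. The following is a minimal sketch, not the algorithm's own code. It assumes that the response is a dict with a 'results' list of dicts, that emailhash is a single lowercase hexadecimal string, and that the address is hashed exactly as found in the text (no trimming or lowercasing). The function names and the sample response are hypothetical and only illustrate the idea.

```python
import hashlib


def email_hash(address: str) -> str:
    """Return the SHA-1 of 'mailto:' + address.

    Hex encoding and UTF-8 are assumptions; the README only states
    that the hash is SHA-1 over 'mailto:' + address.
    """
    return hashlib.sha1(("mailto:" + address).encode("utf-8")).hexdigest()


def results_matching_email(response: dict, address: str) -> list:
    """Return all results whose 'emailhash' equals the hash of address.

    'results' may be missing or empty, so both cases yield an empty list.
    """
    wanted = email_hash(address)
    return [r for r in response.get("results", []) if r.get("emailhash") == wanted]


# Hypothetical response, built by hand for illustration only.
sample = {
    "results": [
        {"city": "Berlin", "zip": "10115", "emailhash": email_hash("info@example.com")},
    ]
}

matches = results_matching_email(sample, "info@example.com")
print([m["city"] for m in matches])  # prints ['Berlin']
```

If the service normalizes addresses before hashing (for example by lowercasing them), the same normalization has to be applied before calling email_hash, otherwise no result will match.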

See /brunni/AddressExtraction for a version that crawls websites.