qoolr / JSONTypePredictor / 1.0.0

Overview

Predicts the type of each field in a JSON object.

Applicable Scenarios and Problems

We use this algorithm when crawling the web to quickly classify JSON objects and determine whether they include data of interest.

Usage

Input

The algorithm expects an input in the form of a single JSON object or a list of JSON objects.

  • For a single object, it returns a result based on that object.
  • For a list of objects, it returns a result that holds true across all objects in the list.
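
As a rough illustration of what "true across all objects" can mean, the sketch below combines per-object basic-type flags with a logical AND. It is not the algorithm's actual implementation: the function names basic_flags and flags_across are hypothetical, only flat objects are handled, and the detection rules (for instance, treating ISO-formatted strings as dates) are assumptions, so the flags it produces may differ from the real output.

```python
from datetime import date


def basic_flags(value):
    """Return hypothetical basic-type flags for a single JSON value."""
    is_bool = isinstance(value, bool)
    flags = {
        "str": isinstance(value, str),
        # bool is a subclass of int in Python, so exclude it explicitly
        "int": isinstance(value, int) and not is_bool,
        "float": isinstance(value, float),
        "bool": is_bool,
        "date": False,
    }
    if flags["str"]:
        try:
            # Assumption: a string counts as a date if it parses as an ISO date
            date.fromisoformat(value)
            flags["date"] = True
        except ValueError:
            pass
    return flags


def flags_across(data):
    """Accept one object or a list of objects; a flag stays True for a
    field only if it is True for that field in every object seen."""
    objects = data if isinstance(data, list) else [data]
    combined = {}
    for obj in objects:
        for key, value in obj.items():
            path = "root." + key
            flags = basic_flags(value)
            if path not in combined:
                combined[path] = flags
            else:
                previous = combined[path]
                combined[path] = {name: previous[name] and flags[name] for name in flags}
    return combined
```

Under these assumptions, a field that holds an integer in one object and a string in another ends up with both the "int" and "str" flags set to false, because neither type holds for every object.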

Output

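For each field, keyed by its path (for example root.country_name), the algorithm returns two dictionaries.

  • The first, types, flags which basic types the field's values match, e.g. string, integer, float.
  • The second, complex_types, contains a prediction about the kind of data in the field, e.g. country name or URL. In the example below, each prediction is a score between 0.0 and 1.0.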

Examples

Input:

[
    {"country_name":"France", "people":62000000, "code": "FR", "natioanl_day":"2019-07-14"},
    {"country_name":"Germany", "people":83000000, "code": "DE", "natioanl_day":"2019-10-03"},
    {"country_name":"United States of America", "people":327167434, "code": "US", "natioanl_day":"2019-07-04"},
    {"country_name":"Ireland", "people":6572728, "code": "IE", "natioanl_day":"2019-03-17"}
]

Output:

{
    "root.country_name": {
        "types": {
            "str": true,
            "int": false,
            "float": false,
            "bool": false,
            "date": false
        },
        "complex_types": {
            "coordinate": 0.0,
            "country_alpha2": 0.0,
            "country_name": 1.0,
            "location_parts": 1.0,
            "language_iso": 0.0,
            "language_name": 0.0,
            "timezone": 0.0,
            "url": 0.0
        }
    },
    "root.people": {
        "types": {
            "str": true,
            "int": true,
            "float": false,
            "bool": false,
            "date": false
        },
        "complex_types": {
            "coordinate": 0.0,
            "country_alpha2": 0.0,
            "country_name": 0.0,
            "location_parts": 0.0,
            "language_iso": 0.0,
            "language_name": 0.0,
            "timezone": 0.0,
            "url": 0.0
        }
    },
    "root.code": {
        "types": {
            "str": true,
            "int": false,
            "float": false,
            "bool": false,
            "date": false
        },
        "complex_types": {
            "coordinate": 0.0,
            "country_alpha2": 1.0,
            "country_name": 0.0,
            "location_parts": 0.0,
            "language_iso": 0.75,
            "language_name": 0.0,
            "timezone": 0.0,
            "url": 0.0
        }
    },
    "root.natioanl_day": {
        "types": {
            "str": true,
            "int": false,
            "float": false,
            "bool": false,
            "date": true
        },
        "complex_types": {
            "coordinate": 0.0,
            "country_alpha2": 0.0,
            "country_name": 0.0,
            "location_parts": 0.0,
            "language_iso": 0.0,
            "language_name": 0.0,
            "timezone": 0.0,
            "url": 0.0
        }
    }
}
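
One way to consume a result shaped like the output above is to pick the highest-scoring complex type for each field. The following is a minimal sketch, not part of the algorithm itself: the helper name best_complex_type and the 0.5 threshold are assumptions chosen for illustration, and the sample dictionary copies only a few entries from the example output.

```python
def best_complex_type(field_result, threshold=0.5):
    """Return the complex type with the highest score for one field,
    or None if no score reaches the (hypothetical) threshold."""
    scores = field_result["complex_types"]
    # max() returns the first key among ties, in dictionary order
    name = max(scores, key=scores.get)
    return name if scores[name] >= threshold else None


# A few entries copied from the example output, trimmed for brevity
result = {
    "root.code": {
        "types": {"str": True, "int": False},
        "complex_types": {"country_alpha2": 1.0, "language_iso": 0.75, "url": 0.0},
    },
    "root.people": {
        "types": {"str": True, "int": True},
        "complex_types": {"country_alpha2": 0.0, "language_iso": 0.0, "url": 0.0},
    },
}

for path, field_result in result.items():
    print(path, best_complex_type(field_result))
```

A crawler could use a check like this to keep only objects in which some field is confidently identified as a type of interest, such as a country code.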