allenai / named_entity_recognition / 0.1.0

Overview

This algorithm identifies named entities in a piece of text using a state-of-the-art model.

It is a wrapper around the Named Entity Recognition model released by the AllenNLP team and described in "Deep contextualized word representations" (2018).

Applicable Scenarios and Problems

Extracting named entities is a critical step in many natural language processing tasks, and it is particularly useful for indexing large corpora of text for later searching.

Usage

By default, the algorithm returns only the parsed input and the final entity tags. If you run it in debug mode, it returns the entire output of AllenNLP's model.

Input

The input JSON blob should have the following fields:

  • sentence: a sentence to extract entities from
  • debug (optional): a boolean indicating whether to run in debug mode

Any additional fields will be passed through into the AllenNLP model.
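
The input blob described above can be assembled with a small helper. This is a minimal sketch, not the algorithm's own code: `build_request` and `run_ner` are hypothetical names, and `run_ner` assumes a client object shaped like the Algorithmia Python client (`client.algo(...).pipe(...).result`).

```python
ALGORITHM = "allenai/named_entity_recognition/0.1.0"


def build_request(sentence, debug=False, **extra_fields):
    """Build the input blob: `sentence` is required, `debug` is optional."""
    payload = {"sentence": sentence}
    if debug:
        payload["debug"] = True
    # Any additional fields are passed through to the AllenNLP model.
    payload.update(extra_fields)
    return payload


def run_ner(client, payload):
    """Send the payload through an Algorithmia-style client (assumed interface)."""
    return client.algo(ALGORITHM).pipe(payload).result
```

With the Algorithmia client installed, `client` would typically come from `Algorithmia.client("YOUR_API_KEY")`, where the key is a placeholder.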

Output

The following output fields will always be present:

  • words: the original sentence parsed into words and other semantic tokens
  • tags: a list of tags indicating whether the words are part of named entities, and if so what class they are

If you run the algorithm in debug mode there will be additional output fields, including:

  • mask:
  • logits:
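
Because `words` and `tags` are always present and the remaining fields appear only in debug mode, a caller may want to separate the two groups. The following is a small sketch under that assumption; `split_output` is a hypothetical helper, not part of the algorithm.

```python
CORE_FIELDS = ("words", "tags")


def split_output(response):
    """Separate the always-present fields from any debug-only extras."""
    core = {key: response[key] for key in CORE_FIELDS}
    # Whatever is left (for example `mask` or `logits`) came from debug mode.
    debug_extras = {k: v for k, v in response.items() if k not in CORE_FIELDS}
    return core, debug_extras
```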

Examples

Example 1: Default Behavior

Input:

{
  "sentence": "Did Uriah honestly think he could beat The Legend of Zelda in under three hours?"
}

Output:

{
  "tags": ["O", "U-PER", "O", "O", "O", "O", "O", "B-MISC", "I-MISC", "I-MISC", "L-MISC", "O", "O", "O", "O", "O"],
  "words": ["Did", "Uriah", "honestly", "think", "he", "could", "beat", "The", "Legend", "of", "Zelda", "in", "under", "three", "hours", "?"]
}
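
The tags in this example appear to follow the BIOUL scheme (B = begin, I = inside, L = last, U = unit-length entity, O = outside). Under that assumption, the parallel `words` and `tags` lists can be grouped into entity strings. `extract_entities` is a hypothetical helper written for illustration, not something the algorithm returns.

```python
def extract_entities(words, tags):
    """Group BIOUL-tagged tokens into (entity_text, label) pairs."""
    entities = []
    current = []  # tokens of the multi-word entity currently being built
    for word, tag in zip(words, tags):
        if tag == "O":
            current = []
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "U":  # single-token entity
            entities.append((word, label))
            current = []
        elif prefix == "B":  # first token of a multi-token entity
            current = [word]
        elif prefix == "I":  # middle token
            current.append(word)
        elif prefix == "L":  # last token closes the entity
            current.append(word)
            entities.append((" ".join(current), label))
            current = []
    return entities


words = ["Did", "Uriah", "honestly", "think", "he", "could", "beat",
         "The", "Legend", "of", "Zelda", "in", "under", "three", "hours", "?"]
tags = ["O", "U-PER", "O", "O", "O", "O", "O",
        "B-MISC", "I-MISC", "I-MISC", "L-MISC", "O", "O", "O", "O", "O"]

print(extract_entities(words, tags))
# [('Uriah', 'PER'), ('The Legend of Zelda', 'MISC')]
```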

Example 2: Debug Mode

Input:

{
  "sentence": "Did Uriah honestly think he could beat The Legend of Zelda in under three hours?",
  "debug": true
}

Output:

{
  "tags": ["O", "U-PER", "O", "O", "O", "O", "O", "B-MISC", "I-MISC", "I-MISC", "L-MISC", "O", "O", "O", "O", "O"],
  "words": ["Did", "Uriah", "honestly", "think", "he", "could", "beat", "The", "Legend", "of", "Zelda", "in", "under", "three", "hours", "?"],
  "mask": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
  "logits":...

See Also