allenai / coreference_resolution / 0.1.1

Overview

This algorithm identifies co-references (corefs) in a body of text: separate places in the text that refer to the same entity.

It is a wrapper around the Coreference Resolution model published by the AllenNLP team, which is based on the End-to-End Coreference Resolution model (Lee et al., 2017).

Applicable Scenarios and Problems

Identifying corefs is an important pre-processing stage for many problems in NLP, such as text summarization and information extraction.

Usage

Input

The input JSON blob should have the following fields:

  • document: the text (one or more sentences) to be examined

Any additional fields will be passed through into the AllenNLP model.
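As a rough illustration of the input shape, the JSON blob could be assembled as follows. The helper name build_payload is hypothetical and not part of this algorithm; the sketch only assumes that document is required and that any other fields are forwarded unchanged.

```python
import json


def build_payload(document, **extra_fields):
    """Return the input JSON blob as a string.

    `document` is the only required field. Extra keyword arguments are
    copied into the payload unchanged, on the assumption that they are
    passed through to the AllenNLP model.
    """
    if not isinstance(document, str) or not document.strip():
        raise ValueError("document must be a non-empty string")
    payload = {"document": document}
    payload.update(extra_fields)
    return json.dumps(payload)


payload = build_payload(
    "The woman reading a newspaper sat on the bench with her dog."
)
print(payload)
```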

Output

The following output fields will always be present:

  • document: The text to be parsed, broken into tokens
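  • top_spans: A list of token ranges ([start, end], both inclusive) that refer to entities
  • predicted_antecedents: For each span in top_spans, an index identifying its predicted antecedent, or -1 if no antecedent was predicted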
  • clusters: Groups of spans that refer to the same entity
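A caller may want to confirm that a response contains these fields before using it. The following is a minimal sketch, assuming the result has already been parsed into a Python dict and that the field names are spelled as in the example output; missing_fields is a hypothetical helper name.

```python
# Field names as spelled in the example output of this README.
EXPECTED_FIELDS = ("document", "top_spans", "predicted_antecedents", "clusters")


def missing_fields(result):
    """Return the expected output fields that are absent from `result`."""
    return [name for name in EXPECTED_FIELDS if name not in result]


# A deliberately incomplete result, for illustration only.
partial = {"document": ["Hello", "."], "clusters": []}
print(missing_fields(partial))  # ['top_spans', 'predicted_antecedents']
```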

Examples

Example 1: Default Behavior

Input:

{
  "document": "The woman reading a newspaper sat on the bench with her dog."
}

Output:

{
  "top_spans": [[0, 4], [3, 4], [7, 11], [10, 10], [10, 11]],
  "predicted_antecedents": [-1, -1, -1, 2, -1],
  "document": ["The", "woman", "reading", "a", "newspaper", "sat", "on", "the", "bench", "with", "her", "dog", "."],
  "clusters": [[[0, 4], [10, 10]]]
}
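
To read the clusters as text, each span can be mapped back onto the token list. The sketch below assumes spans are inclusive [start, end] token indices, which the example suggests since [10, 10] covers the single token "her". The helper names span_text and cluster_mentions are hypothetical.

```python
# Values copied from the example output above.
document = ["The", "woman", "reading", "a", "newspaper", "sat", "on",
            "the", "bench", "with", "her", "dog", "."]
clusters = [[[0, 4], [10, 10]]]


def span_text(tokens, span):
    """Join the tokens covered by an inclusive [start, end] span."""
    start, end = span
    return " ".join(tokens[start:end + 1])


def cluster_mentions(tokens, clusters):
    """Map each cluster to the list of mention strings it contains."""
    return [[span_text(tokens, span) for span in cluster]
            for cluster in clusters]


print(cluster_mentions(document, clusters))
# [['The woman reading a newspaper', 'her']]
```

Here the single cluster links "her" back to "The woman reading a newspaper".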

See Also