allenai / machine_comprehension / 0.1.2

Overview

This algorithm provides state-of-the-art question answering over a piece of text. It takes a passage and a question about that passage, then returns the substring of the passage that the model predicts is the answer.

It is a wrapper around the Machine Comprehension model released by the AllenNLP team, which is itself a re-implementation of the BiDAF model (2017).

Applicable Scenarios and Problems

This algorithm is useful for building natural-language interfaces that extract information from text documents. For example, it could form part of the backend of a chatbot, or answer customer support questions from a user's manual.

It can also be used to extract structured data from textual documents. For example, a collection of doctors' reports could be turned into a table that says (for every report) what was hurting, what the patient should do, and when they should schedule a follow-up.
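The table-building idea can be sketched as a loop that asks the same fixed questions of every document. This is only an illustration, not part of the algorithm: `answer_question` is a hypothetical stand-in for whatever call sends a `passage` and `question` to the algorithm and returns `best_span_str`, and the column names and question wording are assumptions.

```python
from typing import Callable, Dict, List

# Hypothetical mapping: table column -> question asked of every report.
QUESTIONS: Dict[str, str] = {
    "complaint": "What was hurting?",
    "advice": "What should the patient do?",
    "follow_up": "When should the patient schedule a follow-up?",
}


def reports_to_table(
    reports: List[str],
    answer_question: Callable[[str, str], str],
) -> List[Dict[str, str]]:
    """Build one row per report by asking each fixed question of it.

    answer_question(passage, question) is assumed to return the answer
    string (for example, the best_span_str field of the output).
    """
    table = []
    for report in reports:
        row = {
            column: answer_question(report, question)
            for column, question in QUESTIONS.items()
        }
        table.append(row)
    return table
```

Each row is a plain dictionary, so the result can be written out as CSV or loaded into a data frame without further conversion.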

Usage

By default, the algorithm returns only the answer to the question. If you run it in debug mode, it returns the entire output of AllenNLP's model.

Input

The input JSON blob should have the following fields:

  • passage: a piece of text giving information
  • question: a question to answer given the information in passage
  • debug (optional): a boolean indicating whether to run in debug mode

Any additional fields will be passed through into the AllenNLP model.
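A minimal sketch of assembling this input in Python, assuming the blob is sent as ordinary JSON. The helper name `build_request` and its validation are illustrative and not part of the algorithm.

```python
import json
from typing import Any, Dict


def build_request(passage: str, question: str, debug: bool = False,
                  **extra: Any) -> Dict[str, Any]:
    """Assemble the input blob from the fields listed above.

    Any keyword arguments in `extra` are added as additional fields,
    which the README says are passed through to the AllenNLP model.
    """
    if not passage or not question:
        raise ValueError("both 'passage' and 'question' are required")
    request: Dict[str, Any] = {"passage": passage, "question": question}
    if debug:
        # The debug flag is optional, so include it only when requested.
        request["debug"] = True
    request.update(extra)
    return request


payload = build_request(
    "The Matrix is a 1999 science fiction action film.",
    "When was The Matrix released?",
)
print(json.dumps(payload, indent=2))
```

Leaving `debug` out of the blob entirely, rather than sending `false`, keeps the default request identical to the two-field form described above.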

Output

The following output field will always be present:

  • best_span_str: A substring of the passage that answers the question

If you run the algorithm in debug mode there will be additional output fields, including:

  • question_tokens: the tokens in the parsed question
  • passage_tokens: the tokens in the parsed passage
  • best_span: the answer to the question, given as a range over the passage_tokens
  • span_start_probs, span_end_probs: the probability of each token being the beginning or the end of the best span
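The debug fields above can be related to one another with a short sketch. It assumes that `best_span` is a `[start, end]` pair of token indices with an inclusive end, and that each probability list has one entry per passage token; both are assumptions worth checking against real output. The sample data is hand-made for illustration and is not actual model output.

```python
from typing import Any, Dict, List


def span_tokens(output: Dict[str, Any]) -> List[str]:
    """Return the passage tokens covered by best_span.

    Assumes best_span is [start, end] with an inclusive end index.
    """
    start, end = output["best_span"]
    return output["passage_tokens"][start:end + 1]


def most_likely_index(probs: List[float]) -> int:
    """Return the index of the token with the highest probability."""
    return max(range(len(probs)), key=probs.__getitem__)


# Hand-made debug output for illustration only; not real model output.
example = {
    "passage_tokens": ["It", "was", "released", "in", "1999", "."],
    "best_span": [4, 4],
    "span_start_probs": [0.01, 0.01, 0.02, 0.05, 0.90, 0.01],
    "span_end_probs": [0.01, 0.01, 0.01, 0.02, 0.94, 0.01],
}
print(span_tokens(example))
print(most_likely_index(example["span_start_probs"]))
```

Joining the selected tokens with spaces will not necessarily reproduce `best_span_str`, because tokenization discards the original whitespace. The most likely start and end tokens also need not match `best_span` if the model selects the span jointly.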

Examples

Example 1: Default Behavior

Input:

{
  "passage": "The Matrix is a 1999 science fiction action film written and directed by The Wachowskis, starring Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano.",
  "question": "Who stars in The Matrix?"
}

Output:

{
  "best_span_str": "Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano"
}

Example 2: Debug Mode

Input:

{
  "passage": "The Matrix is a 1999 science fiction action film written and directed by The Wachowskis, starring Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano.",
  "question": "Who stars in The Matrix?",
  "debug": true
}

Output:

{
  "best_span_str": "Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano",
  "best_span": [
    17,
    33
  ],
  "passage_tokens": [
    "The",
    "Matrix",
    "is",
    "a",
    "1999",
    "science",...

See Also