Overview
This algorithm uses a state-of-the-art machine-comprehension model to answer a question about a piece of text. It takes a passage of text and a question about that passage, then returns the substring of the passage that the model predicts is the answer.
It is a wrapper around the machine-comprehension model released by the AllenNLP team, which is itself a re-implementation of the BiDAF (Bidirectional Attention Flow) model (2017).
Applicable Scenarios and Problems
This algorithm is useful for building natural-language interfaces that extract information from text documents. For example, it could power the backend of a chatbot, or answer customer-support questions using a product's user manual.
It can also be used to extract structured data from textual documents. For example, a collection of doctors' reports could be turned into a table that records, for each report, what was hurting, what the patient should do, and when they should schedule a follow-up.
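The doctors'-report scenario amounts to asking the same fixed set of questions of every document and collecting one answer per question. The Python sketch below assumes a hypothetical `answer(passage, question)` function that sends one request to this algorithm and returns its best_span_str; the column names and question wording are illustrative, not part of the algorithm.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a call to this algorithm:
# takes (passage, question) and returns the predicted best_span_str.
AnswerFn = Callable[[str, str], str]

# Illustrative columns, each mapped to the question asked of every report.
QUESTIONS: Dict[str, str] = {
    "complaint": "What was hurting?",
    "instructions": "What should the patient do?",
    "follow_up": "When should the patient schedule a follow-up?",
}


def reports_to_table(reports: List[str], answer: AnswerFn) -> List[Dict[str, str]]:
    """Ask every question of every report; return one row (dict) per report."""
    table = []
    for report in reports:
        row = {column: answer(report, question) for column, question in QUESTIONS.items()}
        table.append(row)
    return table
```

Each cell costs one request, so a table of N reports and Q questions needs N x Q calls to the algorithm.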
Usage
By default, the algorithm returns only the answer to the question. If you run it in debug mode, it returns the entire output of AllenNLP's model.
Input
The input JSON blob should have the following fields:
- passage: a piece of text giving information
- question: a question to answer using the information in passage
- debug (optional): a boolean indicating whether to run in debug mode
Any additional fields will be passed through into the AllenNLP model.
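As a minimal sketch of assembling this input in Python, the hypothetical helper `build_request` below serializes the documented fields into a JSON string and forwards any extra keyword arguments as additional fields. How the JSON is then sent to the algorithm (client library, endpoint, credentials) is not covered here.

```python
import json
from typing import Any, Dict


def build_request(passage: str, question: str, debug: bool = False, **extra: Any) -> str:
    """Serialize the documented input fields to a JSON string.

    `extra` keyword arguments become additional top-level fields, which the
    algorithm passes through to the AllenNLP model.
    """
    if not passage or not question:
        raise ValueError("passage and question must be non-empty strings")
    payload: Dict[str, Any] = {"passage": passage, "question": question}
    if debug:
        # Only include the optional flag when debug output is wanted.
        payload["debug"] = True
    payload.update(extra)
    return json.dumps(payload)
```

For example, `build_request(passage, "Who stars in The Matrix?", debug=True)` produces the same fields as the debug-mode input in Example 2.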
Output
The following output field will always be present:
- best_span_str: the substring of passage that answers the question
If you run the algorithm in debug mode, there will be additional output fields, including:
- question_tokens: the tokens in the parsed question
- passage_tokens: the tokens in the parsed passage
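- best_span: the answer to the question, given as a [start, end] pair of token indices into passage_tokens
- span_start_probs: for each token in passage_tokens, the probability that the best span begins at that token
- span_end_probs: for each token in passage_tokens, the probability that the best span ends at that token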
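Under the standard BiDAF decoding rule, the best span is the pair (i, j) with i <= j that maximizes span_start_probs[i] * span_end_probs[j]. The Python sketch below applies that rule to the debug fields and slices passage_tokens with the result. It illustrates the decoding rule and is not AllenNLP's own code; it also assumes the end index of best_span is inclusive, which matches Example 2 if "Carrie-Anne" is split into three tokens.

```python
from typing import List, Sequence, Tuple


def best_span(start_probs: Sequence[float], end_probs: Sequence[float]) -> Tuple[int, int]:
    """Return inclusive (start, end) indices maximizing start_probs[i] * end_probs[j], i <= j."""
    if not start_probs or len(start_probs) != len(end_probs):
        raise ValueError("probability lists must be non-empty and of equal length")
    best = (0, 0)
    best_score = -1.0
    best_start = 0  # argmax of start_probs[0..j], updated as j advances
    for j in range(len(end_probs)):
        if start_probs[j] > start_probs[best_start]:
            best_start = j
        score = start_probs[best_start] * end_probs[j]
        if score > best_score:
            best_score = score
            best = (best_start, j)
    return best


def span_tokens(passage_tokens: List[str], span: Tuple[int, int]) -> List[str]:
    """Slice passage_tokens by an inclusive (start, end) span."""
    start, end = span
    return passage_tokens[start:end + 1]
```

Joining the sliced tokens with spaces will not always reproduce best_span_str exactly (punctuation gains surrounding spaces), because best_span_str is a substring taken from the passage text itself.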
Examples
Example 1: Default Behavior
Input:
{
"passage": "The Matrix is a 1999 science fiction action film written and directed by The Wachowskis, starring Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano.",
"question": "Who stars in The Matrix?"
}
Output:
{
"best_span_str": "Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano"
}
Example 2: Debug Mode
Input:
{
"passage": "The Matrix is a 1999 science fiction action film written and directed by The Wachowskis, starring Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano.",
"question": "Who stars in The Matrix?",
"debug": true
}
Output:
{
"best_span_str": "Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano",
"best_span": [
17,
33
],
"passage_tokens": [
"The",
"Matrix",
"is",
"a",
"1999",
"science",...
See Also
- A web-based demo of the model, available on the AllenNLP site
- Documentation of the model's code