cindyxiaoxiaoli / KeywordExtraction / 0.3.0

Keyword Extraction from Sentence

Given a sentence, the algorithm extracts a list of keywords from it. Keywords here are defined as words or phrases that represent meaningful topics. This is especially useful for understanding short pieces of text.

The algorithm relies on NLTK for part-of-speech tagging and on a stop word list, but it is not trained on any predefined dataset, so it can be used in any domain. It extracts noun phrases from the sentence, which are valuable for representing its meaning.
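The idea can be illustrated with a minimal sketch. This is not the author's implementation: the function name `extract_keywords`, the tiny stop word set, and the choice of tags are all assumptions made for illustration. The input is a list of hand-written (word, tag) pairs using Penn Treebank tags, standing in for what a tagger such as NLTK's `nltk.pos_tag` would supply, so the sketch runs without NLTK or its data files.

```python
# Hypothetical sketch of noun phrase extraction from POS-tagged tokens.
# Not the algorithm's actual code; names and tag choices are assumptions.

# A tiny stand-in for a real stop word list (such as the one shipped with NLTK).
STOP_WORDS = {"i", "the", "a", "an", "and", "on", "their", "also", "really"}

# Penn Treebank tags allowed inside a phrase: adjectives and nouns.
PHRASE_TAGS = {"JJ", "JJR", "JJS", "NN", "NNS", "NNP", "NNPS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_keywords(tagged_tokens):
    """Group consecutive adjective/noun tokens into phrases that end in a noun."""
    keywords = []
    current = []

    def flush():
        # Drop trailing adjectives so every phrase ends with a noun.
        while current and current[-1][1] not in NOUN_TAGS:
            current.pop()
        if current:
            # Join the original word forms, so case is preserved.
            keywords.append(" ".join(word for word, _ in current))
        current.clear()

    for word, tag in tagged_tokens:
        if tag in PHRASE_TAGS and word.lower() not in STOP_WORDS:
            current.append((word, tag))
        else:
            flush()  # any other token ends the current phrase
    flush()
    return keywords


# Hand-tagged version of the second example sentence in this README.
tagged = [
    ("I", "PRP"), ("really", "RB"), ("like", "VBP"), ("the", "DT"),
    ("pictures", "NNS"), ("listed", "VBN"), ("on", "IN"), ("their", "PRP$"),
    ("mobile", "JJ"), ("website", "NN"), ("and", "CC"), ("I", "PRP"),
    ("also", "RB"), ("like", "VBP"), ("the", "DT"), ("customer", "NN"),
    ("service", "NN"), (".", "."),
]
print(extract_keywords(tagged))
# ['pictures', 'mobile website', 'customer service']
```

Because phrases are built from the original tokens and appended in order, the sketch keeps letter case, word forms, and repeated keywords.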

Features

Extracts keywords that are single words or multi-word phrases.

Letter case and word forms are kept intact, so keywords are returned as they appear in the sentence.

All keywords in a sentence are extracted.

If a keyword appears multiple times in the sentence, it appears multiple times in the result, which can be helpful for further analysis such as frequency counts.
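Because repeats are preserved, keyword frequencies can be counted directly from the result. A small sketch, using the output list of the first example in this README as input (the variable names are illustrative):

```python
from collections import Counter

# Output of the first example in this README, copied verbatim.
keywords = [
    "Natural language processing", "NLP", "field", "computer science",
    "artificial intelligence", "computational linguistics", "interactions",
    "computers", "languages", "computers", "large natural language corpora",
]

# Count how often each keyword occurs in the result.
counts = Counter(keywords)
print(counts.most_common(1))  # [('computers', 2)]
```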

Example

Input:

"Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the interactions between computers and human (natural) languages and, in particular, concerned with programming computers to fruitfully process large natural language corpora. "

Output:

["Natural language processing","NLP","field","computer science","artificial intelligence","computational linguistics","interactions","computers","languages","computers","large natural language corpora"]

Input:

"I really like the pictures listed on their mobile website and I also like the customer service."

Output:

["pictures","mobile website","customer service"]