
Sentiment Analysis with XGBoost

If you have a lot of user feedback, reviews, or other text to analyze, and going through all of it by hand feels difficult and tedious, this algorithm comes to your rescue!

To reproduce the creation of this algorithm, you can check out our demo notebook on GitHub.


This algorithm takes a text string as input and loads an XGBoost model trained on the Amazon Musical Instrument Reviews Dataset to predict its sentiment. The output falls into one of two categories, positive or negative, represented as 1 or 0 respectively.

How was this algorithm created?

The training dataset contains users' review texts, review summaries, and the overall rating each user gave the product.

Before an XGBoost classifier is fitted on this data, the texts are preprocessed to remove English stop words and punctuation.
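As a rough illustration of this preprocessing step, here is a minimal sketch using only the Python standard library. The `preprocess` helper, the tiny `STOP_WORDS` set, and the lowercasing are assumptions made for the example; the demo notebook may use a full stop-word list and different tooling.

```python
import string

# Assumption: a tiny illustrative stop-word set, not the list used in the notebook.
STOP_WORDS = {"a", "an", "and", "am", "i", "it", "is", "that", "the", "this", "to"}


def preprocess(text: str) -> str:
    """Lowercase the text, strip ASCII punctuation, and drop stop words."""
    # Delete every ASCII punctuation character in one pass.
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace and keep only tokens outside the stop-word set.
    tokens = [tok for tok in no_punct.lower().split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(preprocess("I am glad that I bought this. It works great!"))
# prints: glad bought works great
```

Note that removing punctuation this way also turns a contraction such as "doesn't" into "doesnt", which is one reason a real pipeline may handle tokenization more carefully.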

To feed the text to the model as numeric values, it is first converted into a matrix of token counts using a CountVectorizer. This count matrix is then converted to a normalized tf-idf (term frequency times inverse document frequency) representation. This transformation scales down the impact of tokens that occur very frequently: because they are so common, they carry little information and do not help distinguish one text from another. Conversely, it scales up the impact of tokens that occur in only a small fraction of the training data, because those are more informative.
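To make the weighting concrete, the sketch below computes tf-idf by hand in plain Python. It is not the code used to build this algorithm. The `tfidf` helper and the three sample documents are invented for illustration, and the smoothed idf formula ln((1 + n) / (1 + df)) + 1 followed by L2 normalization is assumed to match scikit-learn's default settings.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {token: weight} dict per document, L2-normalized."""
    # Token counts per document (the "count matrix", one row per document).
    counts = [Counter(doc.split()) for doc in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each token appears in.
    df = Counter(tok for row in counts for tok in row)
    # Smoothed inverse document frequency (assumed formula, see lead-in).
    idf = {tok: math.log((1 + n_docs) / (1 + d)) + 1 for tok, d in df.items()}
    rows = []
    for row in counts:
        raw = {tok: tf * idf[tok] for tok, tf in row.items()}
        # Guard against empty documents, whose norm would be zero.
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        rows.append({tok: w / norm for tok, w in raw.items()})
    return rows


# Hypothetical, already preprocessed sample documents.
docs = ["guitar sounds great", "guitar strings broke", "guitar cable works"]
weights = tfidf(docs)

# "guitar" occurs in every document, so it gets a lower weight than "sounds".
print(weights[0]["guitar"] < weights[0]["sounds"])
```

In this toy corpus "guitar" appears in all three documents and so receives the smallest weight in each row, while tokens unique to one document are weighted up.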

How can you use it?

Input: string, required

Output: JSON

Example I/O

Input: "I am glad that I bought this. It works great!"


Input: "It doesn't work quite as expected. Not worth your money!"


Example calling methods

You can call this algorithm through the CLI:

algo run asli/xgboost_basic_sentiment_analysis/0.1.1 -d '"I am glad that I bought this. It works great!"' --timeout 300

Through curl:

curl -X POST -d '"I am glad that I bought this. It works great!"' -H 'Content-Type: application/json' -H 'Authorization: Simple sim0/CA0mCa6Xz3FAkyoHb45G5I1'

Through the Algorithmia Python client:

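import Algorithmia

# Text to classify (named `text` to avoid shadowing the built-in `input`)
text = "I am glad that I bought this. It works great!"
# Create a client with your API key
client = Algorithmia.client('sim0/CA0mCa6Xz3FAkyoHb45G5I1')
algo = client.algo('asli/xgboost_basic_sentiment_analysis/0.1.1')
algo.set_options(timeout=300) # optional
# Send the text to the algorithm and print its result
print(algo.pipe(text).result)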

Or through the other methods demonstrated in the Overview section.