Sentiment analysis is an automated process that analyzes text data by classifying sentiments as either positive, negative, or neutral. One of the most compelling use cases of sentiment analysis today is brand awareness, and Twitter is home to lots of consumer data that can provide brand awareness insights. If you can understand what people are saying about you in a natural context, you can work towards addressing key problems and improving your business processes. So how exactly can you get that up and running?
The Algorithmia marketplace makes it easy to extract the content you need from Twitter and pipe it into the right algorithms for sentiment analysis. There are a few algorithms on the platform for retrieving different kinds of information from Twitter (like users, tweets, and followers), and several more for sentiment analysis.
Here’s what our workflow will look like:
- Gather relevant tweets from Twitter
- Preprocessing (stopword removal)
- Apply the right sentiment analysis algorithm
- Analyze the results
- Discuss further improvement and next steps
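The first four steps of the workflow can be chained into one small Python function. This is a minimal sketch, not the post's original code. The function run_pipeline and the fetch, clean, and score callables are hypothetical placeholders. You would back them with the Algorithmia calls described in each section, or with local stand-ins.

```python
from typing import Callable, Dict, List

# Hypothetical stage signatures; swap in real implementations.
Fetch = Callable[[str], List[str]]         # keyword -> raw tweet texts
Clean = Callable[[str], str]               # tweet -> tweet without stopwords
Score = Callable[[str], Dict[str, float]]  # tweet -> {"negative", "neutral", "positive"} scores


def run_pipeline(keywords: List[str], fetch: Fetch, clean: Clean,
                 score: Score) -> Dict[str, Dict[str, float]]:
    """Gather, preprocess, and score tweets, then average the scores per keyword."""
    summaries: Dict[str, Dict[str, float]] = {}
    for keyword in keywords:
        tweets = fetch(keyword)               # 1. gather tweets
        cleaned = [clean(t) for t in tweets]  # 2. preprocess (stopword removal)
        scores = [score(t) for t in cleaned]  # 3. apply the sentiment model
        if not scores:
            continue  # nothing to analyze for this keyword
        # 4. analyze: mean of each sentiment component across tweets
        summaries[keyword] = {
            label: sum(s[label] for s in scores) / len(scores)
            for label in ("negative", "neutral", "positive")
        }
    return summaries
```

Each stage is passed in as a plain function. That makes it easy to swap one sentiment algorithm for another, or to test the pipeline offline with stubs.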
I’ll be using Python in a Jupyter Notebook, and code snippets are included along the way.
Extracting Tweets Using RetrieveTweetsWithKeyword
We’ll start by grabbing the tweets we want from Twitter. It’s as simple as defining your keywords, initializing the Algorithmia client, and calling the right function.
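The fetch step might look something like the following. This is a minimal sketch, not the post's original snippet. The client.algo(...).pipe(...).result pattern is how the Algorithmia Python client calls an algorithm. However, the algorithm path twitter/RetrieveTweetsWithKeyword, the payload keys (query, numTweets, auth), and the helpers fetch_tweets and fetch_all are assumptions for illustration. Check the algorithm's page for its actual input format.

```python
from typing import Any, Dict, List


def fetch_tweets(client: Any, keyword: str, num_tweets: int,
                 twitter_auth: Dict[str, str]) -> List[Any]:
    """Call the tweet-retrieval algorithm through an Algorithmia client.

    ASSUMPTION: the algorithm path and payload keys below are illustrative,
    not confirmed; consult the algorithm's documentation for the real schema.
    """
    algo = client.algo("twitter/RetrieveTweetsWithKeyword")  # assumed path
    payload = {
        "query": keyword,         # assumed key name
        "numTweets": num_tweets,  # assumed key name
        "auth": twitter_auth,     # your own Twitter API credentials (assumed key)
    }
    # .pipe() sends the input to the algorithm; .result holds its output
    return algo.pipe(payload).result


def fetch_all(client: Any, keywords: List[str], num_tweets: int,
              twitter_auth: Dict[str, str]) -> Dict[str, List[Any]]:
    """Gather tweets for every keyword, keyed by keyword."""
    return {k: fetch_tweets(client, k, num_tweets, twitter_auth) for k in keywords}
```

With a real API key, you'd create the client with Algorithmia.client("YOUR_API_KEY") and then call results = fetch_all(client, ["Tesla", "Comcast"], 100, twitter_auth). Passing the client in as an argument, rather than creating it inside the function, also lets you substitute a stub when working offline.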
Our tweets are now stored in the results variable, and glancing at a few of them is a good sanity check that the keywords pulled in relevant content.
Preprocessing Tweets with Stopword Removal
Stopwords are common words (such as “the”, “is”, and “at”) that contribute little to the meaning of a text, and they’re usually removed as part of a Natural Language Processing workflow. Doing this is easy with our nlp/RemoveStopWords algorithm.
The tweets are now reassembled as sentences, but without stopwords. Removing these extra elements should give the sentiment analysis algorithm a better shot.
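To make this step concrete, here is a local stand-in for nlp/RemoveStopWords. It's a minimal sketch under stated assumptions. The tiny stopword list, the simple regex tokenizer, and the helpers remove_stopwords and clean_tweets are all hypothetical, and the algorithm's real list and tokenization will differ.

```python
import re
from typing import Iterable, List, Set

# A tiny, hypothetical stopword list for illustration only; real lists are
# much longer. Negations such as "not" are deliberately left out, because
# lexicon-based models like VADER use them to flip the polarity of what follows.
STOPWORDS: Set[str] = {
    "a", "an", "the", "is", "are", "was", "to", "of", "and", "or",
    "in", "on", "at", "for", "with", "it", "this", "that", "i", "my",
}


def remove_stopwords(text: str, stopwords: Set[str] = STOPWORDS) -> str:
    """Drop stopwords and reassemble the remaining words into a sentence."""
    # Keep word characters, apostrophes, @mentions, and #hashtags;
    # other punctuation is discarded by this simple tokenizer.
    words = re.findall(r"[\w'@#]+", text)
    kept = [w for w in words if w.lower() not in stopwords]
    return " ".join(kept)


def clean_tweets(tweets: Iterable[str]) -> List[str]:
    """Apply stopword removal to every tweet."""
    return [remove_stopwords(t) for t in tweets]
```

For example, remove_stopwords("The car is not fast at all") gives "car not fast all". Be careful about what you strip, since VADER also reads punctuation like "!" as emphasis. A production version might keep it rather than drop it as this tokenizer does.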
Analyzing Tweets with Sentiment Analysis
Choosing which sentiment algorithm to use depends on a number of factors: you need to weigh the required level of detail, speed, cost, and accuracy, among other things. For a survey of a few different algorithms and their performance, check out our post here.
The nlp/SocialSentimentAnalysis algorithm is a simple implementation of the VADER Sentiment package. It’s specifically tailored to text from social platforms like Twitter, which makes it a great fit for our project. For more information about the algorithm, check out the paper here.
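Scoring the cleaned tweets could be sketched as shown below, with one call per tweet for clarity. This uses the same Algorithmia client pattern, but it is an assumption and not the post's original code. The input key sentence and the output shape (a list containing a dict with negative, neutral, positive, and compound fields) should be confirmed against the algorithm's documentation, and score_tweets is a hypothetical helper.

```python
from typing import Any, Dict, List

# Fields we expect back from the algorithm (ASSUMED output shape).
SCORE_FIELDS = ("negative", "neutral", "positive", "compound")


def score_tweets(client: Any, tweets: List[str]) -> List[Dict[str, float]]:
    """Score each tweet with nlp/SocialSentimentAnalysis via an Algorithmia client.

    ASSUMPTION: input is {"sentence": text} and output is a list whose first
    element is a dict of scores; verify against the algorithm's docs.
    """
    algo = client.algo("nlp/SocialSentimentAnalysis")
    scores: List[Dict[str, float]] = []
    for text in tweets:
        output = algo.pipe({"sentence": text}).result
        record = output[0] if isinstance(output, list) else output
        scores.append({field: float(record[field]) for field in SCORE_FIELDS})
    return scores
```

Because the algorithm wraps VADER, you can cross-check a few tweets locally with the vaderSentiment package. Its SentimentIntensityAnalyzer().polarity_scores(text) returns the same four quantities under the shorter keys neg, neu, pos, and compound.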
We want to understand the overall sentiment toward both of our topics, in this case Tesla and Comcast. Averaging the sentiment scores across all of the tweets we gathered for each one gives us an approximation of that.
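The averaging step can be sketched with Python's standard library. The sketch also tracks each component's maximum, which is the kind of figure quoted in the takeaways below. The helper summarize is hypothetical, and it assumes each tweet's scores arrive as a dict with negative, neutral, and positive keys.

```python
from statistics import mean
from typing import Dict, List

SUMMARY_LABELS = ("negative", "neutral", "positive")


def summarize(scores: List[Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """Return the mean and max of each sentiment component across tweets."""
    if not scores:
        raise ValueError("need at least one scored tweet")
    return {
        label: {
            "mean": mean(s[label] for s in scores),
            "max": max(s[label] for s in scores),
        }
        for label in SUMMARY_LABELS
    }
```

The mean gives the overall feeling toward a brand. The max flags the single most negative (or positive) tweet, which an average can hide.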
Comparing the averaged results for Tesla and Comcast, here’s what we’ve learned:
- Tesla and Comcast have similar average negative scores, but Comcast’s maximum negative score (60%) is higher than Tesla’s (48%)
- Tweets about Comcast were more positive on average (13%) than tweets about Tesla (8%)
- Tweets about Tesla were more neutral on average (84%) than tweets about Comcast (79%)
- Across both brands, tweets were more positive on average (10.5%) than negative (8%)
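These numbers can be cross-checked with a little arithmetic. In VADER, the negative, neutral, and positive proportions for a non-empty text sum to 1 (up to rounding), so their averages do too. This assumes the Algorithmia wrapper passes those proportions through unchanged. Writing $\bar{n}$ for the mean negative score and $\bar{p}$ for the mean positive score:

```latex
\begin{aligned}
\bar{n}_{\text{Tesla}}   &\approx 1 - 0.84 - 0.08 = 0.08 \\
\bar{n}_{\text{Comcast}} &\approx 1 - 0.79 - 0.13 = 0.08 \\
\bar{p}_{\text{both}}    &= \tfrac{1}{2}\,(0.08 + 0.13) = 0.105
\end{aligned}
```

The implied negative averages match the similar 8% figures above. The 10.5% overall positive figure is consistent with a simple average of the two brands. That simple average equals the pooled average over all tweets only if both brands contributed the same number of tweets.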
If you were a manager at either of these companies, you might want to focus on increasing positive sentiment about the brand on social media. On the bright side, though, there’s very little negativity floating around.
Further Improvements and Next Steps
There are a few ways we can improve this pipeline to bring it closer to something production-ready.
- Gather (a lot) more tweets – the more tweets we have, the more confident we can be in our sentiment estimates. With Algorithmia’s infrastructure, this application can scale to orders of magnitude more tweets without much extra effort.
- Explore other algorithms – depending on the business goal, other algorithms might be better suited to this type of analysis. For example, the TextBlob Python package returns a measure of subjectivity for a given string of text.
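To see why more tweets buy more confidence, you can put a confidence interval around an average sentiment score. Its half-width shrinks roughly with the square root of the number of tweets. This is a sketch under stated assumptions. It uses a normal approximation, and it treats tweets as independent samples, which retweets and bot accounts can violate. The helper mean_with_ci is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import List, Tuple


def mean_with_ci(values: List[float], confidence: float = 0.95) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) for the mean score, via a normal approximation.

    The half-width is z * s / sqrt(n): quadrupling the number of tweets
    roughly halves it.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = mean(values)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(values) / math.sqrt(n)
    return m, m - half_width, m + half_width
```

Running this on each brand's positive scores is a quick way to check whether a gap like 13% versus 8% is larger than the sampling noise.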
For more details about sentiment analysis, check out our long form explanation of the topic here.