This model was trained using a character-level Convolutional Neural Network (CNN) architecture.
The dataset it was trained on is of high quality: it consists mostly of well-written reviews of restaurants and movies.
The model may, however, perform poorly on user-generated data.
Why use a character-based CNN model instead of a word-based one (RNN- or CNN-based)?
Character-based models offer several advantages:
- They have a very low memory footprint: the only embedding matrix stored in the model is the one for the alphabet of the language.
- They are robust to typos and misspellings (and even new words), which overcomes the out-of-vocabulary (OOV) issue.
- They require minimal text preprocessing (no tokenization, no stemming, no lemmatization).
- Training is blazingly fast.
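To make the points about the small embedding matrix and the minimal preprocessing concrete, here is a minimal sketch of how text could be turned into character indices. The alphabet, the padding convention, and the `encode` helper are assumptions made for illustration; the description does not state which alphabet or encoding the model actually uses.

```python
import string

# Assumed alphabet for illustration only; the model's real alphabet is not documented here.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "

# Index 0 is reserved for padding and for characters outside the alphabet.
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}

MAX_LENGTH = 600  # mirrors the 600-character limit on the text input


def encode(text, max_length=MAX_LENGTH):
    """Map text to a fixed-length list of character indices (hypothetical helper)."""
    # Lowercasing and truncation are the only preprocessing steps: no tokenizer, no stemmer.
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in text.lower()[:max_length]]
    # Right-pad with zeros so every input has the same length.
    return indices + [0] * (max_length - len(indices))


# A misspelled review still maps to valid indices; there is no vocabulary to fall out of.
encoded = encode("Graet food, will come bakc!")
print(len(encoded), encoded[:5])

# The embedding matrix needs one row per character (plus padding), not one per word.
print("embedding rows needed:", len(ALPHABET) + 1)
```

Under these assumptions the embedding table has a few dozen rows, whereas a word-based model typically stores one row per vocabulary entry.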
Applicable Scenarios and Problems
You may need to mine text data and assess its sentiment for a variety of reasons:
- Marketing studies: How does the release of your product impact your customers?
- Political campaigns: What do people think of the newly elected president?
- Financial industry: How does the news impact the stock prices of a company?
|text||text data up to 600 characters|
|sentiment||a score between 0 and 1|
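The input and output fields above can be handled on the caller's side roughly as follows. This is only a sketch: `prepare_text` and `label_from_score` are hypothetical helpers, and both the 0.5 threshold and the reading of higher scores as more positive are assumptions that the description does not confirm.

```python
MAX_CHARS = 600  # the text field accepts up to 600 characters


def prepare_text(text):
    """Truncate the input so it fits the 600-character limit (hypothetical helper)."""
    return text[:MAX_CHARS]


def label_from_score(score, threshold=0.5):
    """Turn a sentiment score in [0, 1] into a coarse label.

    Assumption: scores closer to 1 mean more positive sentiment.
    The threshold of 0.5 is arbitrary and chosen only for illustration.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("sentiment score must be between 0 and 1")
    return "positive" if score >= threshold else "negative"


# Example with a made-up score; no real model is called here.
text = prepare_text("The pasta was excellent and the staff were friendly. " * 20)
print(len(text), label_from_score(0.87))
```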
Help us improve the model
This model is still under development and may behave unexpectedly in some situations.
Please help us improve it by sending us your feedback and comments.