Types of machine learning
Machine learning comes in three basic types: supervised, unsupervised, and reinforcement learning. Reinforcement learning follows a different paradigm from the other two, so we’ll leave it for another post.
The most common form of machine learning, and the most prototypical, is supervised learning. Supervised learning is appealing because it resembles the way humans actually learn. In supervised tasks, we present the computer with a collection of labeled data points called a training set (for example, a set of readouts from heart and blood pressure monitors for a group of patients, along with labels indicating whether each patient has experienced a stroke in the past 30 days).
From such a dataset, a supervised machine learning algorithm could use the labels to recognize commonalities in the examples where a patient had a stroke and commonalities in the cases where the patient remained healthy. Using this insight gained on the training set, the algorithm can then generalize to a selection of unseen, unlabeled data called the test set and (hopefully accurately) predict whether a new patient is likely to experience a stroke, based on the readouts from the monitors.
Supervised learning overview
The central question of supervised learning is this: how do we best devise a system that teaches an algorithm to recognize useful patterns in the data, given the labeled examples of the training set? Most algorithms use a cost or loss function to obtain a quantitative measurement of how well the algorithm is performing on the labeled data. The loss function takes two arguments, the correct label of a training example and the label predicted by the machine learning algorithm, and computes a loss value that measures how well the algorithm performed on the prediction task.
In many ways, this is similar to how we as humans learn. As children, we stumble about in our environments and make mistakes. For example, a toddler who has only ever seen dogs and has never seen a cat may point at a cat and say “doggy!” When such mistakes occur, parents or teachers intervene and gently correct the child, who learns how to label a cat when he or she sees one in the future.
In the same way, knowing the loss value allows the machine learning algorithm to recompute its parameters so that it can generate better predictions, and produce lower loss values, on the next pass over the training data. This process is repeated until the algorithm finally settles on a minimum loss value past which it can no longer improve.
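The loop described above can be made concrete with a small sketch. It assumes a one-feature linear model, a squared-error loss, and plain gradient descent. None of these choices comes from the post itself, and the data, learning rate, and epoch count are purely illustrative.

```python
# Minimal sketch: fit y ≈ w * x + b by repeatedly lowering a squared-error loss.
# Model, loss, data, learning rate, and epoch count are illustrative assumptions.

def squared_loss(true_label, predicted_label):
    """Loss for one example: compares the correct label with the prediction."""
    return (true_label - predicted_label) ** 2


def mean_loss(xs, ys, w, b):
    """Average loss of the model (w, b) over the whole training set."""
    return sum(squared_loss(y, w * x + b) for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, learning_rate=0.05, epochs=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared loss with respect to w and b.
        grad_w = sum(-2 * x * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
        grad_b = sum(-2 * (y - (w * x + b)) for x, y in zip(xs, ys)) / n
        # Recompute the parameters so the next pass produces a lower loss.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

initial_loss = mean_loss(xs, ys, 0.0, 0.0)
w, b = train(xs, ys)
final_loss = mean_loss(xs, ys, w, b)
print(f"w={w:.3f}, b={b:.3f}, loss {initial_loss:.3f} -> {final_loss:.6f}")
```

Each pass over the training data nudges the parameters in the direction that reduces the loss, which is the "recompute its parameters" step in miniature.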
In a nutshell, that is how supervised learning works. Of course, hundreds of different supervised learning algorithms exist, and each has its own particularities, but in most cases the general process remains the same. The domain of supervised learning is huge and includes algorithms such as k-nearest neighbors, convolutional neural networks for object detection, random forests, support vector machines, linear and logistic regression, and many, many more.
Unsupervised learning is the opposite of supervised learning. In unsupervised learning, the algorithm tries to learn some inherent structure to the data with only unlabeled examples. Two common unsupervised learning tasks are clustering and dimensionality reduction. In clustering, we attempt to group data points into meaningful clusters such that elements within a given cluster are similar to each other but dissimilar to those from other clusters. Clustering is useful for tasks such as market segmentation. For example, suppose a business has data about customers, such as demographic information and their purchasing behavior. They might want to identify subsegments of the market where a particular product sells extremely well and others where it performs poorly. In this case, they could use an unsupervised clustering algorithm such as k-means or hierarchical clustering to identify those strong and weak customer bases.
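As a rough illustration of the clustering idea, here is a minimal k-means sketch in plain Python. The customer data (age, monthly spend) is made up, and seeding the centroids with the first k points is a simplification chosen to keep the example deterministic. Real implementations usually pick starting centroids more carefully.

```python
import math

# Minimal k-means sketch. The data and the "first k points" initialisation
# are illustrative assumptions, not a production recipe.

def kmeans(points, k, iterations=20):
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignments


# Hypothetical customers described by (age, monthly spend).
customers = [(22, 15), (25, 18), (24, 20), (58, 90), (61, 85), (60, 95)]
centroids, assignments = kmeans(customers, k=2)
print(assignments)
```

The algorithm never sees a label. The two segments emerge only from how close the points are to one another.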
Dimensionality reduction use cases
In dimensionality reduction, we are presented with data in a very high dimensional space, and we ultimately want to project that same data into a much lower dimensional space so that it becomes more interpretable. For example, in word2vec, a natural language processing method devised at Google, the algorithm reads in huge corpora (large volumes of text) and creates vectors for each word encountered.
The naive representation would create vectors the size of the vocabulary (tens of thousands of words), but word2vec creates vectors of anywhere from 50 to 300 dimensions. It also looks at the words in their textual context and embeds the vectors such that words that share similar contexts are given similar vector representations. This allows the algorithm to capture abstract meaning as conveyed by the texts.
Word2Vec uses a training procedure in which a heuristic, labeled dataset is created from raw, unlabeled data. While this is still unsupervised learning, it’s also often given the special name self-supervised learning to account for the fact that the algorithm creates its own internal type of supervision.
Another dimensionality reduction algorithm commonly used in practice is called Principal Component Analysis, or PCA. In PCA, the data undergoes a transformation so it’s represented in a new coordinate system where the coordinate axes are called principal components. Projecting along the principal components is equivalent to projecting along the directions of the largest variance in the data, and analysis of these principal components conveys a wealth of information about the dataset.
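To show what "creating a labeled dataset from raw text" can look like, the sketch below builds (center word, context word) pairs in the style of word2vec's skip-gram variant. The function name and window size are illustrative, and the sketch covers only the data-generation step, not the training of the embedding model.

```python
# Minimal sketch: derive (input, label) pairs from unlabeled text.
# Each word acts as the input and each neighbour within the window as a label.

def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = "the quick brown fox".split()
pairs = skipgram_pairs(tokens, window=1)
for center, context in pairs:
    print(center, "->", context)
```

No human annotation is involved: the "labels" are simply the neighbouring words that already appear in the text.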
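The first principal component can be found with nothing more than a covariance matrix and power iteration. The sketch below assumes a tiny, made-up 2D dataset and uses power iteration for simplicity. Libraries typically compute all components at once with a full eigendecomposition or an SVD.

```python
import math

# Minimal sketch: first principal component via power iteration on the
# covariance matrix. The data points are illustrative.

def first_principal_component(points, iterations=100):
    n = len(points)
    dims = len(points[0])
    # Center the data so that every feature has zero mean.
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    centered = [[p[d] - means[d] for d in range(dims)] for p in points]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    # Power iteration converges to the direction of largest variance.
    vec = [1.0] * dims
    for _ in range(iterations):
        vec = [sum(cov[i][j] * vec[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
pc = first_principal_component(points)
print(pc)
```

Because the points lie close to the line y = x, the resulting unit vector points roughly along that diagonal, which is the direction in which the data varies most.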
More examples of unsupervised learning
Other common unsupervised algorithms include Singular Value Decomposition (SVD), Locally Linear Embedding, Gaussian Mixture Models, Variational Autoencoders, and Generative Adversarial Networks (GANs). Many unsupervised learning algorithms attempt to mimic human creativity in some way, and they are used in applications ranging from the recommendation systems employed by companies such as Netflix and Spotify to systems for generating art and 3D models for various applications, such as video games by companies like Nvidia.
Demand forecasting is a common business practice for optimizing workflow in inventory, but it actually has use cases across all industries even if it isn’t immediately clear. Let’s walk through how demand forecasting can be used and explore its value.
What is demand forecasting?
Demand forecasting is a process that takes historical sales data and uses it to make estimations (or forecasts) about customer demand in the future. For enterprises, demand forecasting allows for estimating how many goods or services will sell and how much inventory needs to be ordered.
Demand forecasting lays the foundation for many other critical business assumptions such as turnover, profit margins, cash flow, capital expenditure, and capacity planning. Demand forecasting is often associated with managerial economics and supply chain management, but it applies to every company in every industry.
What are demand forecasting methods?
In order to forecast demand, we must have historical data on the market and past revenue, but the time span, the scope of the market, and other details can change the results. There are six common ways to calculate a demand forecast, but even these methods can be tweaked to meet the needs of a company.
- Passive – Passive demand forecasting is common in small businesses because it is the simplest way to estimate future demand. In this method, only past demand performance is used to make predictions about future demand. This makes the result easy to calculate but potentially inaccurate (e.g., for the last 19 weeks, carrots sold at 13 cents apiece, therefore we can expect them to sell at 13 cents this next week).
- Active – Active demand forecasting is typically used by companies that are growing and expanding. The active method of predicting demand takes into account aggressive growth plans such as marketing or product development and also the general competitive environment of the industry.
- Short-term – Short-term demand forecasting only predicts demand for three to 12 months in the future. This can give businesses an idea of what to expect within the next few quarters up to a year, but not longer. Seasonal demand is often calculated this way.
- Long-term – Long-term demand forecasting is used to predict demand for more than a year in the future, often up to three or four years out. Marketing and product strategies are often based on this type of demand forecast.
- External macro level – External demand forecasting is based upon the macroeconomics of the market and external environmental factors. These types of predictions drive internal business decisions, such as product portfolio evaluation and expansion and the development of new customer segments.
- Internal business-level – Internal business-level demand forecasting takes into account only internal metrics such as revenue, costs of goods sold, profit margins, cash flow, etc. This does not take external data into account, so it makes forecasts based only on current business processes.
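The passive method in the list above is simple enough to sketch in a few lines. This version assumes a moving average over the most recent weeks. The window length and the sales figures are made up for illustration.

```python
# Minimal sketch of passive forecasting: predict next period's demand
# from past demand alone, here as the mean of the most recent periods.

def moving_average_forecast(past_demand, window=4):
    recent = past_demand[-window:]
    return sum(recent) / len(recent)


weekly_units_sold = [120, 130, 125, 135, 140, 138]  # hypothetical history
forecast = moving_average_forecast(weekly_units_sold, window=4)
print(forecast)
```

Nothing outside the sales history enters the calculation, which is exactly why the method is easy to apply and also why it can miss changes in the market.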
Why is demand forecasting important?
Demand forecasting is a pivotal business process. Many strategic and operational tactics are based on this forecast, such as budgeting, financial planning, sales and marketing plans, and capacity planning. Because so many business decisions are contingent on demand forecasts, it is crucial to get an accurate prediction. Imagine if demand is predicted to grow, and the company is liberal with its yearly budgets as a result, but demand actually shrinks.
Demand forecast calculations rely on a large amount of data, and are custom to a company’s specific situation, often making them proprietary.
Many businesses rely on machine learning models to do the demand forecast calculation. This makes the forecast more accurate and reliable while saving human time that would otherwise be spent on manual calculations.
The great thing about using machine learning for demand forecasting is that once a model has been built to predict future demand, it can update its predictions as time passes. That way, there is always a real-time prediction available that includes any new data.
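The idea of a forecast that refreshes itself as data arrives can be illustrated with simple exponential smoothing. This is a classical statistical method used here as a stand-in for a trained ML model. The class name, the smoothing factor, and the observations are all assumptions made for the sketch.

```python
# Minimal sketch: a forecaster that updates its prediction with every new
# observation. Simple exponential smoothing stands in for an ML model.

class ExponentialSmoothingForecaster:
    def __init__(self, alpha=0.5):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.level = None  # current forecast for the next period

    def update(self, observed):
        """Fold a new observation into the forecast and return it."""
        if self.level is None:
            self.level = float(observed)
        else:
            self.level = self.alpha * observed + (1 - self.alpha) * self.level
        return self.level


forecaster = ExponentialSmoothingForecaster(alpha=0.5)
forecasts = [forecaster.update(units) for units in [100, 120, 110]]
print(forecasts)
```

Each call to `update` blends the latest observation with the previous forecast, so the prediction always reflects the newest data without refitting from scratch.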
How Algorithmia can help
Demand forecasting is best done using machine learning, and Algorithmia provides the best machine learning framework available. Our serverless microservices architecture can help your team get their ML models up and running as quickly as possible by navigating around common roadblocks.
The time and money invested in ML must demonstrate a business value to be worthwhile, and machine learning is a big investment. That’s why we want to help companies extract value from their ML investments sooner. Our framework can make that happen. Check out our product and see how it can help your organization make more accurate demand forecasts faster.
Machine learning is a vast field, composed of many model types, subsets, and use cases. In our forthcoming 2020 State of Enterprise Machine Learning report, we dig into the use cases that businesses rely on most often today, but because new advances are made in ML every day, the number and complexity of ML use cases keep growing as well. This post will walk through some common machine learning use cases and how they enable businesses to leverage their data in novel ways.
What is machine learning?
Machine learning is the subset of artificial intelligence that involves the study and use of algorithms and statistical models for computer systems to perform specific tasks without human interaction. Machine learning models rely on patterns and inference instead of manual human instruction. Almost any task that can be completed with a data-defined pattern or set of rules can be done with machine learning. This allows companies to automate processes that were previously only possible for humans to perform, such as responding to customer service calls, bookkeeping, and reviewing resumes.
To extract machine learning value, a model must be trained to react to certain data in certain ways, which requires a lot of clean training data. Once the model successfully works through the training data and is able to understand the nuances of the patterns it’s learning, it will be able to perform the task on real data. We’ll walk through some use cases of machine learning to help you understand the value of this technology.
What is machine learning used for?
Machine learning has many potential uses, including external (client-facing) applications like customer service, product recommendation, and pricing forecasts, but it is also being used internally to help speed up processes or improve products that were previously manual and time-consuming. You’ll notice these two types throughout our list of machine learning use cases below.
1. Voice assistants
This consumer-based use for machine learning applies mostly to smartphones and smart home devices. The voice assistants on these devices use machine learning to understand what you say and craft a response. The machine learning models behind voice assistants were trained on human languages and variations in the human voice, because they have to translate what they hear into words and then make an intelligent, on-topic response.
Millions of consumers use this technology, often without realizing the complexity behind the tool. The concept of training machine learning models to follow rules is fairly simple, but when you consider training a model to understand the human voice, interpret meaning, and craft a response, that is a heavy task.
2. Dynamic pricing
This machine-based pricing strategy is best known in the travel industry. Flights, hotels, and other travel bookings usually have a dynamic pricing strategy behind them. Consumers know that the sooner they book their trip the better, but they may not realize that the actual price changes are made via machine learning.
Travel companies set rules for how much the price should increase as the travel date gets closer, how much it should increase as seat availability decreases, and how high it should be relative to competitors. Then, they let the machine learning model run with competitor prices, time, and availability data feeding into it.
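The three rules described above can be written down as a small pricing function. Everything specific in this sketch is hypothetical: the function name, the 30-day urgency window, the 50% weights, and the 10% cap over the competitor price. It covers only the rule layer, not the machine learning model that would tune or replace such rules.

```python
# Minimal sketch of rule-based dynamic pricing. All thresholds and
# weights are hypothetical values chosen for illustration.

def dynamic_price(base_price, days_until_departure, seats_left, total_seats,
                  competitor_price):
    # Rule 1: the price rises as the travel date gets closer
    # (here, linearly over the final 30 days).
    urgency = max(0, 30 - days_until_departure) / 30
    # Rule 2: the price rises as seat availability decreases.
    scarcity = 1 - seats_left / total_seats
    price = base_price * (1 + 0.5 * urgency + 0.5 * scarcity)
    # Rule 3: never exceed the competitor's price by more than 10%.
    return round(min(price, competitor_price * 1.10), 2)


early = dynamic_price(200, 60, 100, 100, competitor_price=500)
late = dynamic_price(200, 0, 10, 100, competitor_price=500)
capped = dynamic_price(200, 0, 10, 100, competitor_price=300)
print(early, late, capped)
```

In a real system the inputs (competitor prices, time, and availability) would be fed in continuously, which is where a learned model can take over from hand-set weights.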
3. Email filtering
This is a classic use of machine learning. Email inboxes also have a spam inbox, where your email provider automatically filters unwanted spam emails. But how do they know when an email is spam? They have trained a model to identify spam emails based on characteristics they have in common. This includes the content of the email itself, the subject, and the sender. If you’ve ever looked at your spam inbox, you know that it wouldn’t be very hard to pick out spam emails because they look very different from real emails.
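A classic way to learn "characteristics spam emails have in common" is a Naive Bayes classifier over word counts. The sketch below is a minimal version that looks only at message text (not subject or sender), uses a four-message toy training set, and is not meant to represent how any particular email provider works.

```python
import math
from collections import Counter

# Minimal multinomial Naive Bayes sketch with Laplace (add-one) smoothing.
# The training messages are invented for illustration.

def train_naive_bayes(examples):
    """examples: list of (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocabulary.add(word)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Start from the log prior, then add one log likelihood per word.
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train_naive_bayes(training)
print(classify("claim your free prize", model))
print(classify("agenda for the team meeting", model))
```

Words such as "free" and "prize" appear only in the spam examples, so a new message containing them scores higher under the spam label.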
4. Product recommendations
Amazon and other online retailers often list “recommended products” for each consumer individually. These recommendations are based on past purchases, browsing history, and any other behavioral information they have about consumers. Often the recommendations are helpful in finding related items that you need to complement your purchase (think batteries for a new electronic gadget).
However, most consumers probably don’t realize that their recommended products are a machine learning model’s analysis of their behavioral data. This is a great way for online retailers to provide extra value or upsells to their customers using machine learning.
5. Personalized marketing
Marketing is becoming more personal as technologies like machine learning gain more ground in the enterprise. Now that much of marketing is online, marketers can use characteristic and behavioral data to segment the market. Digital ad platforms allow marketers to choose characteristics of the audience they want to market to, but many of these platforms take it a step further and continuously optimize the audience based on who clicks and/or converts on the ads. The marketer may have listed 4 attributes they want their audience to have, but the platform may find 5 other attributes that make users more likely to respond to the ads.
6. Process automation
There are many processes in the enterprise that are much more efficient when done using machine learning. These include analyses such as risk assessments, demand forecasting, customer churn prediction, and others. These processes require a lot of time (possibly months) to do manually, but the insights gained are crucial for business intelligence. But if it takes months to get insights from the data, the insights may already be outdated by the time they are acted upon. Machine learning for process automation alleviates the timeliness issue for enterprises.
Industries are getting more and more competitive now that technology has sped up these processes. Companies can get up-to-date analyses on their competition in real time. This high level of competition makes customer loyalty even more crucial, and machine learning can even help with customer loyalty analyses like sentiment analysis. Companies like Weavr.ai provide a suite of ML tools to enable this type of analysis quickly and deliver results in a consumable format.
7. Fraud detection
Banks use machine learning for fraud detection to keep their consumers safe, but this can also be valuable to companies that handle credit card transactions. Fraud detection can save money on disputes and chargebacks, and machine learning models can be trained to flag transactions that appear fraudulent based on certain characteristics.
Machine learning can provide value to consumers as well as to enterprises. An enterprise can gain insights into its competitive landscape and customer loyalty and forecast sales or demand in real time with machine learning.
If you’re already implementing machine learning in your enterprise or you’d like to start...
In the last 12 months, there have been numerous developments in machine learning (ML) tools, applications, and hardware. Google’s TPUs are in their third generation, the AWS Inferentia chip is a year old, Intel’s Nervana Neural Network Processors are designed for deep learning, and Microsoft is reportedly developing its own custom AI hardware.
This year, Algorithmia has had conversations with thousands of companies in various stages of machine learning maturity. From them we developed hypotheses about the state of machine learning in the enterprise, and in October, we decided to test those hypotheses.
Following the State of Enterprise Machine Learning 2018 report, we conducted a new two-pronged survey this year, polling nearly 750 business decision makers across all industries at companies that are actively developing machine learning lifecycles, just beginning their machine learning journeys, or somewhere in between. Sign up to receive the full 2020 report on 12 December 2019 when it publishes.
2020 key findings and report format
The forthcoming 2020 report focuses on seven key findings from the survey. In brief, they are:
- The rise of the data science arsenal for machine learning: nearly all companies are building data science teams to develop ML use cases. There are discrepancies in team size and agility, however, that will affect how quickly and efficiently ML is applied to business problems.
- Cutting costs takes center stage as companies grow in size: the primary business use cases center on customer service and internal cost reduction. Company size is the differentiator.
- Overcrowding at early maturity levels and AI for AI’s sake: the pool of companies entering the ML arena is growing exponentially but that could bring about an increase in “snake-oil AI” solutions.
- An unreasonably long road to deployment: despite the rapid development in use cases, growth in AI/ML budgets, and data science job openings, there is still a long road to model deployment. We offer several hypotheses why.
- Innovation hubs and the trouble with scale: we anticipate the proliferation of internal AI centers (innovation hubs) within companies designed to quickly develop ML capabilities so the organization can stay current with its competition. Machine learning challenges still exist, however, stymying the last-mile to sophisticated levels of ML maturity.
- Budget and ML maturity, an emerging disparity: AI/ML budgets are growing across all company sizes and industries, but several industries are investing more heavily.
- Determining machine learning success across the org chart: hierarchical levels within companies are determining ML success by two different metrics. The director level will likely play a large role in the future of ML adoption.
The report concludes with a section on the future of machine learning and what we expect in the short-term.
What to expect in the 2020 report
Our findings are presented with our original hypotheses, as well as our analysis of the results. Where possible, we have provided a year-on-year comparison with data from 2018 and included predictions about what is likely to manifest in the ML space in the near term.
We have included graphics throughout to bring the data to life (the banner graphic of this post is a bubble chart depicting the use cases of machine learning and their frequency in the enterprise).
We will continue to conduct this annual survey to increase the breadth of our understanding of machine learning technology in the enterprise and share with the broader industry how ML is evolving. In doing so, we can track trends in ML development across industries over time, ideally making more informed predictions with higher degrees of confidence.
Following the report and future-proofing for machine learning
We will soon make our survey data available on an interactive webpage to foster transparency and a greater understanding of the ML landscape. We are committed to being good stewards of ML technology.
This year’s survey report should confirm for readers that machine learning in the enterprise is progressing at a lightning pace. Though the majority of companies are still in the early stages of ML maturity, it is incorrect to think there is time to delay ML efforts at your company.
If your organization is not currently ML-oriented, know that your competitors are. Now is the time to future-proof your organization with AI/ML.
Sign up to receive the full 2020 State of Enterprise Machine Learning report when it publishes on 12 December.
Sentiment analysis invites us to consider the sentence, You’re so smart! and discern what’s behind it. It sounds like quite a compliment, right? Clearly the speaker is raining praise on someone with next-level intelligence. However, consider the same sentence in the following context.
Wow, did you think of that all by yourself, Sherlock? You’re so smart!
Now we’re dealing with the same words except they’re surrounded by additional information that changes the tone of the overall message from positive to sarcastic.
This is one of the reasons why detecting sentiment in natural language, a task within natural language processing (NLP), is surprisingly complex. Any machine learning model that hopes to achieve suitable accuracy needs to be able to determine what textual information is relevant to the prediction at hand, have an understanding of negation, human patterns of speech, idioms, metaphors, etc., and be able to assimilate all of this knowledge into a rational judgment about a quantity as nebulous as “sentiment.”
In fact, when presented with a piece of text, sometimes even humans disagree about its tonality, especially if there’s not a fair deal of informative context provided to help rule out incorrect interpretations. With that said, recent advances in deep learning methods have allowed models to improve to a point that is quickly approaching human precision on this difficult task.
Sentiment analysis datasets
The first step in developing any model is gathering a suitable source of training data, and sentiment analysis is no exception. There are a few standard datasets in the field that are often used to benchmark models and compare accuracies, but new datasets are being developed every day as labeled data continues to become available.
The first of these datasets is the Stanford Sentiment Treebank. It’s notable for the fact that it contains over 11,000 sentences, which were extracted from movie reviews and accurately parsed into labeled parse trees. This allows recursive models to train on each level in the tree, allowing them to predict the sentiment first for sub-phrases in the sentence and then for the sentence as a whole.
The Amazon Product Reviews Dataset provides over 142 million Amazon product reviews with their associated metadata, allowing machine learning practitioners to train sentiment models using product ratings as a proxy for the sentiment label.
The IMDB Movie Reviews Dataset provides 50,000 highly polarized movie reviews with a 50-50 train/test split.
The Sentiment140 Dataset provides valuable data for training sentiment models to work with social media posts and other informal text. It provides 1.6 million training points, which have been classified as positive, negative, or neutral.
Sentiment analysis, a baseline method
Whenever you test a machine learning method, it’s helpful to have a baseline method and accuracy level against which to measure improvements. In the field of sentiment analysis, one model works particularly well and is easy to set up, making it the ideal baseline for comparison.
To introduce this method, we first define something called a tf-idf score. This stands for term frequency-inverse document frequency, which gives a measure of the relative importance of each word in a set of documents. In simple terms, it computes the count of each word in a document, reweighted by how common the word is across all documents in the set. (We use the term “document” loosely: it could be anything from a sentence to a paragraph to a longer-form collection of text.) Analytically, we define the tf-idf of a term t as seen in document d, which is a member of a set of documents D, as:
tfidf(t, d, D) = tf(t, d) * idf(t, D)
Where tf is the term frequency, and idf is the inverse document frequency. These are defined to be:
tf(t, d) = count of t in document d
idf(t, D) = -log(P(t | D))
Where P(t | D) is the probability that a document selected at random from the set D contains the term t, that is, the fraction of documents in D in which t appears.
From here, we can create a vector for each document where each entry in the vector corresponds to a term’s tf-idf score. We place these vectors into a matrix representing the entire set D and train a logistic regression classifier on labeled examples to predict the sentiment of each document in D.
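The vectorization step can be sketched directly from these definitions. The version below assumes whitespace tokenization and estimates P(t | D) as the fraction of documents that contain the term, which is one common reading of the formula. The three example "documents" are made up, and the logistic regression step is left out.

```python
import math
from collections import Counter

# Minimal tf-idf sketch. Tokenisation by whitespace and the estimate of
# P(t | D) as a document fraction are assumptions made for illustration.

def tfidf_vectors(documents):
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    idf = {term: -math.log(doc_freq[term] / n_docs) for term in vocabulary}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)  # tf(t, d): raw count of t in d
        vectors.append([counts[term] * idf[term] for term in vocabulary])
    return vocabulary, vectors


documents = ["happy happy day", "sad day", "happy news"]
vocabulary, vectors = tfidf_vectors(documents)
print(vocabulary)
for vector in vectors:
    print([round(value, 3) for value in vector])
```

Each row is one document and each column one vocabulary term, so the list of vectors is the matrix that a classifier would be trained on.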
Sentiment analysis models
The idea here is that if you have a bunch of training examples, such as I’m so happy today!, Stay happy San Diego, Coffee makes my heart happy, etc., then terms such as “happy” will have a relatively high tf-idf score when compared with other terms.
From this, the model should be able to pick up on the fact that the word “happy” is correlated with text having a positive sentiment and use this to predict on future unlabeled examples. Logistic regression is a good model because it trains quickly even on large datasets and provides very robust results.
Other good model choices include SVMs, Random Forests, and Naive Bayes. These models can be further improved by training on not only individual tokens, but also bigrams or tri-grams. This allows the classifier to pick up on negations and short phrases, which might carry sentiment information that individual tokens do not. Of course, the process of creating and training on n-grams increases the complexity of the model, so care must be taken to ensure that training time does not become prohibitive.
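Extracting n-grams is a small operation, sketched here with an illustrative sentence. The example shows why bigrams help with negation: the pair ("not", "good") carries a signal that neither token carries on its own.

```python
# Minimal sketch: slide a window of n tokens across the text.

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "this movie was not good".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
print(bigrams)
print(trigrams)
```

Note that the feature space grows quickly: every distinct bigram and trigram becomes its own column, which is the added complexity the paragraph above warns about.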
More advanced models
The advent of deep learning has provided a new standard by which to measure sentiment analysis models and has introduced many common model architectures that can be quickly prototyped and adapted to particular datasets to achieve high accuracy.
Most advanced sentiment models start by transforming the input text into an embedded representation. These embeddings are sometimes trained jointly with the model, but usually additional accuracy can be attained by using pre-trained embeddings such as Word2Vec, GloVe, BERT, or FastText.
Next, a deep learning model is constructed using these embeddings as the first layer inputs:
Convolutional neural networks
Surprisingly, one model that performs particularly well on sentiment analysis tasks is the convolutional neural network, which is more commonly used in computer vision models. The idea is that instead of performing convolutions on image pixels, the model can instead perform those convolutions in the embedded feature space of the words in a sentence. Since convolutions occur on adjacent words, the model can pick up on negations or n-grams that carry novel sentiment information.
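To make "convolutions in the embedded feature space" concrete, the sketch below slides a single filter over a sequence of word vectors. The 2-dimensional embeddings and the filter weights are toy values chosen so the arithmetic is easy to follow. A real model would learn many filters of several widths.

```python
# Minimal sketch: one convolutional filter spanning `width` adjacent words.
# Embeddings and kernel weights are toy values, not learned parameters.

def conv1d_over_words(embeddings, kernel):
    """Return one ReLU activation per window of adjacent word vectors."""
    width = len(kernel)
    outputs = []
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        activation = sum(
            weight * value
            for kernel_row, word_vector in zip(kernel, window)
            for weight, value in zip(kernel_row, word_vector)
        )
        outputs.append(max(0.0, activation))  # ReLU non-linearity
    return outputs


# Four words, each embedded in 2 dimensions.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# A filter that looks at two adjacent words at a time.
kernel = [[1.0, 0.0], [0.0, 1.0]]

feature_map = conv1d_over_words(embeddings, kernel)
pooled = max(feature_map)  # max-over-time pooling
print(feature_map, pooled)
```

Because each activation depends on two neighbouring words, a filter like this can respond to short phrases in the same way that a bigram feature would.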
LSTMs and other recurrent neural networks
RNNs are probably the most commonly used deep learning models for NLP and with good reason. Because these networks are recurrent, they are ideal for working with sequential data such as text. In sentiment analysis, they can be used to repeatedly predict the sentiment as each token in a piece of text is ingested. Once the model is fully trained, the sentiment prediction is just the model’s output after seeing all n tokens in a sentence.
RNNs can also be greatly improved by the incorporation of an attention mechanism, an additional component whose weights are typically learned jointly with the rest of the model. Attention helps a model determine which tokens in a sequence of text to focus on, thus allowing the model to consolidate more information over more timesteps.
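At its core, attention is a weighted average of the hidden states, with the weights produced by a softmax over relevance scores. The sketch below assumes dot-product scoring against a query vector, which is one of several scoring functions in use. The hidden states and the query are toy values rather than outputs of a trained RNN.

```python
import math

# Minimal sketch of dot-product attention over RNN hidden states.
# Hidden states and the query vector are toy values.

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attend(hidden_states, query):
    # Score each timestep by its similarity to the query.
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    # Context vector: the attention-weighted average of the hidden states.
    dim = len(hidden_states[0])
    context = [
        sum(weight * state[d] for weight, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return weights, context


hidden_states = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]  # one vector per token
query = [1.0, 0.0]
weights, context = attend(hidden_states, query)
print([round(weight, 3) for weight in weights], context)
```

The third token aligns best with the query, so it receives most of the weight and dominates the context vector.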
Recursive neural networks
Although similarly named to recurrent neural nets, recursive neural networks work in a fundamentally different way. Popularized by Stanford researcher Richard Socher, these models take a tree-based representation of an input text and create a vectorized representation for each node in the tree. Typically, the sentence’s parse tree is used. As a sentence is read in, it is parsed on the fly and the model generates a sentiment prediction for each element of the tree. This gives a very interpretable result in the sense that a piece of text’s overall sentiment can be broken down by the sentiments of its constituent phrases and their relative weightings. The SPINN model from Stanford is another example of a neural network that takes this approach.
Another promising approach that has emerged recently in NLP is that of multi-task learning. Within this paradigm, a single model is trained jointly across multiple tasks with the goal of achieving state-of-the-art accuracy in as many domains as possible. The idea here is that a model’s performance on task x can be bolstered by its knowledge of related tasks y and z, along with their associated data. Being able to access a shared memory and set of weights across tasks allows for new state-of-the-art accuracies to be reached. Two popular MTL models that have achieved high performance on sentiment analysis tasks are the Dynamic Memory Network and the Neural Semantic Encoder.
Sentiment analysis and unsupervised models
One encouraging aspect of the sentiment analysis task is that it seems to be quite approachable even for unsupervised models that are trained without any labeled sentiment data, only unlabeled text. The key to training unsupervised models with high accuracy is using huge volumes of data.
One model developed by OpenAI was trained on 82 million Amazon reviews, a process that took over a month! It uses an advanced RNN architecture called a multiplicative LSTM to continually predict the next character in a sequence. In this way, the model learns not only token-level information, but also subword features, such as prefixes and suffixes. Ultimately, it incorporates some supervision into the model, but it is able to achieve the same or better accuracy as other state-of-the-art models with 30-100x less labeled data. It also uncovers a single sentiment “neuron” (or feature) in the model, which turns out to be predictive of the sentiment of a piece of text.
Moving from sentiment to a nuanced spectrum of emotion
Sometimes simply understanding just the sentiment of text is not enough. For acquiring actionable business insights, it can be necessary to tease out further nuances in the emotion that the text conveys. A text having negative sentiment might be expressing any of anger, sadness, grief, fear, or disgust. Likewise, a text having positive sentiment could be communicating any of happiness, joy, surprise, satisfaction, or excitement. Obviously, there’s quite a bit of overlap in the way these different emotions are defined, and the differences between them can be quite subtle.
This makes the emotion analysis task much more difficult than that of sentiment analysis, but also much more informative. Luckily, more and more data with human annotations of emotional content is being compiled. Some common datasets include the SemEval 2007 Task 14, EmoBank, WASSA 2017, The Emotion in Text Dataset, and the Affect Dataset. Another approach to gathering even larger quantities of data is to use emojis as a proxy for an emotion label. 🙂
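The emoji-as-label idea can be sketched as a small preprocessing step. The emoji-to-emotion mapping below is hypothetical and deliberately tiny. Any real mapping would need to be chosen and validated for the dataset at hand.

```python
# Minimal sketch: turn emoji-bearing text into (text, emotion) training pairs.
# The mapping is a hypothetical example, not a standard.

EMOJI_TO_EMOTION = {
    "😀": "joy",
    "😢": "sadness",
    "😡": "anger",
    "😱": "fear",
}


def label_from_emoji(text):
    """Return (text without mapped emojis, emotion label), or None if no emoji matches."""
    for char in text:
        if char in EMOJI_TO_EMOTION:
            cleaned = "".join(c for c in text if c not in EMOJI_TO_EMOTION).strip()
            return cleaned, EMOJI_TO_EMOTION[char]
    return None


print(label_from_emoji("Missed my flight 😡"))
print(label_from_emoji("no emoji here"))
```

The emoji is removed from the text before training so that the model has to learn the emotion from the words rather than from the label itself.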
When training on emotion analysis data, any of the aforementioned sentiment analysis models should work well. The only caveat is that they must be adapted to classify inputs into one of n emotional categories rather than a binary positive or negative.