
Types of machine learning

Machine learning comes in three basic types: supervised, unsupervised, and reinforcement learning. Reinforcement learning follows a different paradigm from the other two, so we’ll leave it for another post. 

The most common and most prototypical form of machine learning is supervised learning. Supervised learning is appealing because it closely parallels the way humans actually learn. In supervised tasks, we present the computer with a collection of labeled data points called a training set: for example, readouts from heart and blood pressure monitors for a group of patients, each labeled with whether that patient has experienced a stroke in the past 30 days.

From such a dataset, a supervised machine learning algorithm could use the labels to recognize commonalities among the examples where a patient had a stroke and among those where the patient remained healthy. Using what it learned from the training set, the algorithm can then generalize to a set of unseen examples whose labels are withheld from it, called the test set, and (hopefully accurately) predict from the monitor readouts whether a new patient is likely to experience a stroke.
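To make the train/test idea concrete, here is a minimal sketch using a nearest-centroid classifier, a deliberately simple stand-in chosen for illustration rather than anything the stroke example prescribes. The patient readouts are synthetic numbers generated by a hypothetical make_synthetic_patients helper and carry no clinical meaning. The classifier summarizes each class by the mean of its training examples (its "commonalities") and labels a new patient by whichever mean is closest. The test labels are used only to score predictions, never for fitting.

```python
import math
import random

# Hypothetical, synthetic data: (heart_rate, systolic_bp) readouts labeled 1 (stroke
# in the past 30 days) or 0 (no stroke). Made up for illustration only.
def make_synthetic_patients(n, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        heart_rate = rng.gauss(95 if label else 70, 8)
        systolic_bp = rng.gauss(165 if label else 120, 10)
        data.append(((heart_rate, systolic_bp), label))
    return data

def train_test_split(data, test_fraction=0.25, seed=0):
    """Shuffle, then hold out a fraction of the examples as the test set."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_centroids(train):
    """Summarize each class by the mean of its feature vectors."""
    sums, counts = {}, {}
    for x, y in train:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def predict(centroids, x):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(x, centroids[y]))

data = make_synthetic_patients(200)
train, test = train_test_split(data)
centroids = fit_centroids(train)
# Test labels are consulted only here, to measure how well the model generalizes.
accuracy = sum(predict(centroids, x) == y for x, y in test) / len(test)
print(f"test accuracy: {accuracy:.2f}")
```

In practice the features would usually be standardized first, since Euclidean distance is sensitive to each feature's scale.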

Supervised learning overview

The central question of supervised learning is how best to devise a system that teaches an algorithm to recognize useful patterns in the data, given the labeled examples of the training set. Most algorithms use something called a cost or loss function to obtain a quantitative measurement of how well the algorithm is performing on the labeled data. The loss function takes two arguments, the correct label of a training example and the label predicted by the machine learning algorithm, and computes a loss value that grows as the prediction gets worse; a perfect prediction typically yields the lowest possible loss, often zero.
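As a sketch of what such a function can look like, here are two widely used losses: squared error for numeric labels and binary cross-entropy (log loss) for 0/1 labels. Note that log loss takes the model's predicted probability of the positive class rather than a hard label; the clipping constant eps is an assumption added to avoid taking the logarithm of zero.

```python
import math

def squared_error(y_true, y_pred):
    """Loss for a numeric label: zero when the prediction is exact,
    growing quadratically with the size of the miss."""
    return (y_true - y_pred) ** 2

def log_loss(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy for a 0/1 label, where p_pred is the model's
    predicted probability that the label is 1."""
    p = min(max(p_pred, eps), 1 - eps)  # clip so log() never sees 0
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

print(squared_error(3.0, 2.5))  # 0.25
print(log_loss(1, 0.9))         # small loss: confident and correct
print(log_loss(1, 0.1))         # large loss: confident and wrong
```

Both functions reward correct, confident predictions with a low value, which is exactly the signal the learning procedure tries to drive down.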

In many ways, this is similar to how we as humans learn. As children, we stumble about in our environments and make mistakes. For example, a toddler who has only ever seen dogs and never a cat may point at a cat and say “doggy!” When such mistakes occur, parents or teachers gently correct the child, who then learns to label a cat correctly the next time he or she sees one.

In the same way, knowing the loss value (and, for most algorithms, how it changes as each parameter changes) allows the machine learning algorithm to adjust its parameters so that it generates better predictions, and lower loss values, on the next pass over the training data. This process is repeated until the loss stops decreasing meaningfully, at which point the algorithm has settled at a minimum (often only a local one) past which further updates no longer improve it.
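One common way to carry out this update loop is gradient descent. The sketch below assumes a hypothetical one-feature linear model y ≈ w*x + b trained on mean squared error; the learning rate, stopping tolerance, and toy data are illustrative choices, not values from any particular library. Each pass computes the loss, stops if it has barely improved, and otherwise nudges each parameter against its gradient.

```python
def fit_line(xs, ys, learning_rate=0.01, tolerance=1e-9, max_epochs=10_000):
    """Fit y ≈ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    prev_loss = float("inf")
    for _ in range(max_epochs):
        # Predictions and mean squared error over the whole training set.
        preds = [w * x + b for x in xs]
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        # Stop once another pass no longer lowers the loss meaningfully.
        if prev_loss - loss < tolerance:
            break
        prev_loss = loss
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(preds, ys)) / n
        # Step each parameter a small amount against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b, loss

xs = [0, 1, 2, 3, 4, 5]
ys = [1, 3, 5, 7, 9, 11]  # generated by y = 2x + 1
w, b, loss = fit_line(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, loss={loss:.6f}")
```

Because this toy data lies exactly on a line, the loss can approach zero; on real data the loop instead levels off at whatever minimum the model can reach.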

In a nutshell, that is how supervised learning works. Of course, there are hundreds of supervised learning algorithms, each with its own particularities, but in most cases the general process remains the same. The domain of supervised learning is huge and includes algorithms such as k-nearest neighbors, convolutional neural networks for object detection, random forests, support vector machines, linear and logistic regression, and many, many more.

Unsupervised learning

Unsupervised learning is, in a sense, the opposite of supervised learning: the algorithm receives only unlabeled examples and tries to learn some inherent structure in the data. Two common unsupervised learning tasks are clustering and dimensionality reduction. In clustering, we attempt to group data points into meaningful clusters such that elements within a given cluster are similar to each other but dissimilar to those in other clusters. Clustering is useful for tasks such as market segmentation. For example, suppose a business has data about its customers, such as demographic information and purchasing behavior. It might want to identify segments of the market where a particular product sells extremely well and others where it performs poorly. In this case, it could use an unsupervised clustering algorithm such as k-means or hierarchical clustering to group customers into segments, then compare how the product sells in each to identify its strong and weak customer bases.
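Here is a minimal sketch of k-means, one of the two algorithms named above, using Lloyd's algorithm: alternately assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The customer features (say, annual spend and store visits per month) and their values are hypothetical. Note that no labels are involved; the groups emerge from the data alone.

```python
import math
import random

def k_means(points, k, iterations=100, seed=0):
    """Lloyd's algorithm for k-means clustering on tuples of numbers."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points
        # (keeping the old centroid if a cluster ends up empty).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (annual spend in thousands, store visits per month).
customers = [(2, 1), (3, 2), (2.5, 1.5), (20, 8), (22, 9), (21, 10)]
centroids, clusters = k_means(customers, k=2)
print(sorted(centroids))
```

Real customer data would normally be scaled so that no single feature dominates the distance, and k-means is often run from several random starts because it can settle in a poor local optimum.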

Dimensionality reduction use cases

In dimensionality reduction, we are presented with data in a very high-dimensional space, and we ultimately want to project that same data into a much lower-dimensional space so that it becomes more interpretable. For example, word2vec, a natural language processing method devised at Google, reads in huge corpora (large volumes of text) and creates a vector for each word it encounters.

A naive representation would give each word a one-hot vector the size of the vocabulary (tens of thousands of words or more), but word2vec creates dense vectors of typically 50 to 300 dimensions. It also looks at each word in its textual context and positions the vectors so that words appearing in similar contexts receive similar vector representations. This allows the vectors to capture some of the abstract meaning conveyed by the texts.
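A short sketch shows why the naive representation cannot express similarity while dense vectors can. Cosine similarity between any two distinct one-hot vectors is always zero. The three-word vocabulary and the 4-dimensional dense vectors below are hand-picked, hypothetical values for illustration, not actual word2vec output.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

vocabulary = ["cat", "dog", "car"]  # a real vocabulary has tens of thousands of words

def one_hot(word):
    """Vector of vocabulary length with a single 1 at the word's position."""
    return [1.0 if w == word else 0.0 for w in vocabulary]

# Distinct one-hot vectors are orthogonal, so "cat" is no closer to "dog" than to "car".
print(cosine_similarity(one_hot("cat"), one_hot("dog")))  # 0.0

# Hypothetical dense embeddings, hand-picked for illustration (NOT trained word2vec output).
dense = {
    "cat": [0.8, 0.6, 0.1, 0.0],
    "dog": [0.7, 0.7, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9, 0.8],
}
print(cosine_similarity(dense["cat"], dense["dog"]))  # close to 1
print(cosine_similarity(dense["cat"], dense["car"]))  # much smaller
```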

word2vec’s training procedure creates a heuristic, labeled dataset from raw, unlabeled text, for example by treating the words surrounding each word as targets to predict. While this is still unsupervised learning, it’s often given the special name self-supervised learning to reflect the fact that the algorithm creates its own internal form of supervision.
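The "heuristic, labeled dataset" can be sketched as skip-gram training pairs: each word becomes an input, and every word within a fixed window around it becomes a label to predict. This shows only the pair-generation step under a simple fixed-window assumption; the actual word2vec tool adds refinements such as negative sampling and subsampling of frequent words, and trains a model on these pairs.

```python
def skipgram_pairs(tokens, window=2):
    """Turn raw text into (center, context) pairs: every word within
    `window` positions of a center word becomes a label to predict from it."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the cat sat on the mat".split()
for pair in skipgram_pairs(tokens, window=1):
    print(pair)
```

No human supplied these labels; they fall out of the text itself, which is what makes the setup self-supervised.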

Another dimensionality reduction algorithm commonly used in practice is Principal Component Analysis, or PCA. In PCA, the data undergoes a transformation so that it’s represented in a new coordinate system whose axes are called principal components. The first principal component is the direction along which the data varies most, the second is the direction of greatest remaining variance orthogonal to the first, and so on. Analyzing these components reveals a great deal about the dataset, such as how much of the total variance each one explains, which helps decide how many dimensions to keep.
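A minimal sketch of the first step of PCA follows: center the data, build its sample covariance matrix, and find the direction of largest variance by power iteration. Power iteration is a swapped-in simple technique chosen so the example needs no libraries (production libraries typically compute all components at once via an SVD), and it assumes the largest eigenvalue of the covariance matrix is strictly larger than the rest. The 2-D points are hypothetical and lie close to the line y = 2x.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance of data given as a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    return cov, centered

def first_principal_component(rows, iterations=200, seed=0):
    """Direction of largest variance via power iteration on the covariance matrix."""
    cov, centered = covariance_matrix(rows)
    d = len(cov)
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(d)]  # random start vector
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize to unit length.
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance of the data along v, and each row's coordinate on that axis.
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, variance, scores

points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
component, variance, scores = first_principal_component(points)
cov, _ = covariance_matrix(points)
total_variance = sum(cov[i][i] for i in range(len(cov)))
print(component)  # roughly ±(0.45, 0.89), i.e. along the line y = 2x
print(f"share of total variance: {variance / total_variance:.3f}")
```

The scores are the one-dimensional projection of the data; here a single component explains nearly all of the variance, so little is lost by dropping the second dimension.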

More examples of unsupervised learning

Other common unsupervised algorithms include Singular Value Decomposition (SVD), Locally Linear Embedding, Gaussian Mixture Models, Variational Autoencoders, and Generative Adversarial Networks (GANs). Some of these, generative models such as Variational Autoencoders and GANs in particular, attempt to mimic human creativity in some way. Unsupervised methods are used in applications ranging from the recommendation systems employed by companies such as Netflix and Spotify to systems, such as those developed by Nvidia, that generate art and 3D models for applications like video games.
