
Neural networks, as their name implies, are computer algorithms modeled after networks of neurons in the human brain. Like their counterparts in the brain, neural networks work by connecting a series of nodes organized in layers, where each node is connected to its neighbors in adjacent layers by weighted edges. These weights are applied during a neural network’s forward pass; in the case of fully-connected layers, this amounts to a matrix multiplication. 
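The forward pass of a fully-connected layer can be sketched in a few lines. This is a minimal illustration using plain Python lists in place of a matrix library; the weights, biases, and sizes are made up for the example, not taken from any real network.

```python
def dense_forward(weights, biases, inputs):
    """One fully-connected layer: outputs = weights @ inputs + biases."""
    outputs = []
    for row, b in zip(weights, biases):
        # Each output node is a weighted sum of all input nodes, plus a bias
        outputs.append(sum(w * x for w, x in zip(row, inputs)) + b)
    return outputs

# 2 input nodes fully connected to 3 output nodes
W = [[1.0,  2.0],
     [0.5, -0.5],
     [0.0,  1.0]]
b = [0.0, 0.0, 0.0]
print(dense_forward(W, b, [2.0, 1.0]))  # [4.0, 0.5, 1.0]
```

Each row of the weight matrix holds the edge weights feeding one output node, which is exactly what the matrix multiplication computes in a real framework.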

Sitting atop most layers is something called an activation function, which applies a nonlinearity to the layer’s output, often squashing it into a predefined range. Common examples are ReLU and sigmoid activations. A ReLU layer takes the maximum of each node’s value and 0. A sigmoid layer applies the sigmoid function to the value at each node, squashing it into the range (0, 1). 

Sigmoid function:

$$f(x) = \frac{1}{1 + e^{-x}}$$
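Both activations described above are one-liners. The test values below are illustrative:

```python
import math

def sigmoid(x):
    # Squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # Passes positive values through unchanged, zeroes out negatives
    return max(0.0, x)

print(sigmoid(0.0))  # 0.5
print(relu(-3.0))    # 0.0
print(relu(2.5))     # 2.5
```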

Convolutional neural networks

While the simplest types of neural networks are constructed as above, more complicated architectures have been created to handle specialized tasks. One such architecture is called a convolutional neural network (CNN) and is used extensively in computer vision applications. These networks comprise convolutional layers, which apply the convolution of a filter with local areas of an image. 

Convolutional neural networks obtain particularly good performance on image data because the same filter can detect a pattern wherever it occurs in the image, however many times it repeats. This reuse of a single filter across the whole image is what gives CNNs weight sharing between network nodes. CNNs also use pooling layers, which decrease the spatial resolution of the network’s intermediate representations in order to reduce computation and improve training time. The input layer to such a neural network is often the set of pixels representing an input image, and the output layer might be a vector assigning probabilities to predefined classes, enabling categorization of an image. 
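The two CNN building blocks above can be sketched directly. This is an illustrative toy example with plain Python lists: the 5×5 “image” and the 2×2 filter (a crude vertical-edge detector) are invented for the demonstration.

```python
def conv2d(image, kernel):
    """Slide a small filter over every position of a 2D image (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            # The same filter weights are reused at every position (weight sharing)
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def max_pool2x2(fmap):
    # Keep only the largest activation in each 2x2 region, halving resolution
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]) - 1, 2)]
            for i in range(0, len(fmap) - 1, 2)]

image = [[0, 0, 1, 1, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0]]
edge = [[1, -1],
        [1, -1]]  # responds strongly to vertical light-to-dark transitions
fm = conv2d(image, edge)
print(max_pool2x2(fm))  # [[-1, 2], [0, 2]]
```

Pooling the 4×4 feature map down to 2×2 discards exact positions while keeping the strongest filter responses, which is the invariance-versus-resolution trade-off pooling layers make.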

How do neural networks work?

Neural networks learn via backpropagation: the gradient of a loss function, evaluated on training examples, is taken with respect to the network’s parameters, and the weights are then adjusted in the direction that reduces the loss, so the network better classifies future examples. Because the fully-connected layers of neural networks consist of entire matrices of weights, backpropagation involves many matrix multiplications, and as a result neural networks are very computationally intensive, both in training and in inference. 
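The learning loop above can be shown at its smallest scale: a single sigmoid neuron fit to one training example by gradient descent. The data point, learning rate, and step count are illustrative, not from any real system.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w, b = 0.5, 0.0       # initial parameters
x, target = 1.5, 1.0  # one made-up training example
lr = 0.5              # learning rate

for step in range(200):
    # Forward pass
    y = sigmoid(w * x + b)
    # Backward pass: gradient of the squared-error loss (y - target)^2
    dloss_dy = 2 * (y - target)
    dy_dz = y * (1 - y)          # derivative of the sigmoid
    grad_w = dloss_dy * dy_dz * x
    grad_b = dloss_dy * dy_dz
    # Step the parameters against the gradient to reduce the loss
    w -= lr * grad_w
    b -= lr * grad_b

print(sigmoid(w * x + b))  # close to the target of 1.0
```

In a full network the same chain rule propagates gradients backward through every layer, which is where the many matrix multiplications come from.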

Some machine learning researchers have argued that it is unlikely that neurons in the brain perform this expensive step of backpropagation in order to learn, and that some other method of training neural networks may ultimately be necessary for machine learning engineers to obtain optimal results. While this is an interesting area of research that may bear fruit as Moore’s Law progresses and the nature of cutting-edge neural networks changes, for now it is firmly true that the best-performing neural networks learn by the differentiation process of backpropagation.

Deep vs shallow neural networks

Like networks of neurons in the brain, computer neural networks can be either deep or shallow, meaning that they may consist of many layers or relatively few. In the first and second waves of neural networks in the 1970s and 1990s, neural networks were almost all shallow because computers lacked the power to perform all the computation necessary for training deep neural networks. 

Deep neural networks

Without sufficient training data, deep neural networks give extremely poor results, often outperformed by other computer models built on hand-engineered features. For this reason, it was not until 2012 that deep neural networks first made a splash in either the research or commercial scene, when a deep neural network won the ImageNet image classification competition by a large margin. 

The amount of data needed to train deep neural networks can be truly immense, and in many commercial systems today it can take weeks or months of training to obtain adequate performance. This is true even though networks are often trained in parallel across many highly optimized machines with specialized hardware such as ASICs and GPUs. This is a significant limitation of deep neural networks: one could imagine the frustration of a deep learning engineer who learns only after weeks of expensive training that a bug in the design of the neural network inhibited its performance. 

Shallow neural networks

It’s worth noting that even shallow neural networks require more training data than comparable computer models with hand-chosen features. This is because without a human in the loop to specify features by hand, the computer needs more data to learn features that form an adequate representation of the problem. This is true for any algorithm that learns end-to-end, not only neural networks. 

While there is a clear difference between shallow and deep neural networks, it is common practice in machine learning engineering to scale a shallow neural network, or one with relatively few weights, up into a larger or deeper one. 

Capsule neural networks

Capsule neural networks, pioneered by famed machine learning researcher Geoffrey Hinton, represent one effort to mirror the structure of networks of neurons in the human brain while using fewer weights. Quoting from Geoffrey Hinton’s 2017 paper on capsule neural networks, “a capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or an object part.” 

A capsule neural network is organized much like a regular neural network, except that the nodes of its layers can be capsules rather than individual neurons. While capsule neural networks have yet to obtain the same results as other types of neural networks, they remain a promising area of research that will potentially benefit from increased computational power in the future. If capsule-like groupings of neurons do figure so prominently in the human brain, there must, after all, be a reason.

Why are neural networks used?

One may wonder why neural networks are so frequently used commercially despite the fact that they require so much data and time to train. One answer is that many machine learning problems have a structure that makes them inherently well suited to neural networks, for much the same reason that the brain itself is organized into networks of neurons. 

Image classification is a clear example of this phenomenon. While neural networks, with all their requirements for data, may have been too unwieldy for commercial use during their initial development in the 1970s and 1990s, with the advent of processors like ASICs and GPUs specifically designed to perform the computations needed for neural network training, it is today possible to train such systems end-to-end.

Business applications of neural networks

The most common business application of neural networks is in image recognition and categorization. Since the second wave of neural networks in the 1990s, they have been used in ATMs to capture the dollar amount written on a deposited check and record it in the bank’s database. There have also been tens of billions of dollars of investment targeted at using the image recognition capabilities of neural networks to build self-driving cars. 

Beyond this, applications of neural networks include facial recognition. For example, Facebook most likely uses them as part of their algorithm which suggests friends to tag when you upload a photo. 

Available knowledge on neural networks

The internet is full of resources for learning more about neural networks, including Stanford’s excellent online courses on neural networks and the segments of Coursera’s machine learning course devoted to neural networks. 

It is worth noting, however, that the results of the most advanced neural network research groups are often kept secret in an effort to preserve the competitive advantage of the companies sponsoring them. For this reason, the power of the true cutting edge of neural network software is unknown to the public. Luminaries such as Elon Musk have warned that the secret neural network technology produced by research labs like Google’s DeepMind is much more advanced than that which is available to the public. 

While the extent to which this is true is a matter of debate, it is certainly true that, because of the high computational cost of training deep neural networks, the cutting edge of neural network research is accessible only to the expensive, highly advanced research labs of private companies and entities like OpenAI; it cannot be duplicated on the laptop of a member of the general public. 

Thus, if neural networks stand to bring about a truly intelligent AI system, we may not know until it’s too late.

Continue learning

Convolutional neural nets in PyTorch

What is deep learning?
