There are many metrics by which one can measure the performance of a model. One possible measure is the mean absolute percent error (MAPE). It is calculated by taking the absolute difference between each actual value and its prediction, dividing that difference by the actual value, and then averaging the results. Another measure of performance is the receiver operating characteristic (ROC) curve. 
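As a minimal sketch of the mean absolute percent error calculation, the snippet below uses plain Python lists. The function name and the sample numbers are illustrative assumptions, and the result is returned as a fraction rather than a percentage.

```python
def mean_absolute_percent_error(actual, predicted):
    """Return the mean of |actual - predicted| / |actual| as a fraction."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if any(a == 0 for a in actual):
        # Dividing by a zero actual value is undefined, so refuse early.
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return total / len(actual)


# Illustrative (made-up) values: two predictions are off by 10%, one is exact.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
mape = mean_absolute_percent_error(actual, predicted)
print(f"MAPE: {mape:.4f}")
```

Multiply the result by 100 to report it as a percentage.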

The ROC curve is created by plotting the true positive rate of the classifier against the false positive rate at various threshold settings. One can then assess model performance based on this curve: models that achieve a high AUC (area under the curve) are favored. 
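The threshold sweep behind a ROC curve can be sketched in a few lines of plain Python. This is a simplified illustration, not a production implementation: the function names, labels, and scores are assumptions, and the area is estimated with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    # Lower the threshold step by step, from the highest score to the lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Estimate the area under a curve of (x, y) points with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Illustrative data: 1 = positive class, scores are the classifier's confidence.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
auc = area_under_curve(curve)
print(f"AUC: {auc:.3f}")
```

An AUC of 1.0 would mean every positive example is scored above every negative one, while 0.5 is what random scoring would produce on average.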

One can also measure performance based on a model’s confusion matrix. In binary classification problems, the confusion matrix is a 2×2 matrix containing the counts of true positives, false positives, true negatives, and false negatives. 
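To make the 2×2 layout concrete, here is a small sketch that tallies a confusion matrix from binary labels and predictions. The row and column ordering (actual class in rows, predicted class in columns, positive class first) is an assumption for this example, since conventions vary between libraries.

```python
def confusion_matrix(labels, predictions):
    """Return [[TP, FN], [FP, TN]]: rows are actual classes, columns are predicted."""
    tp = fn = fp = tn = 0
    for actual, predicted in zip(labels, predictions):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return [[tp, fn], [fp, tn]]


# Illustrative (made-up) labels and predictions, with 1 as the positive class.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 0, 1, 0, 1, 0, 0, 1]
matrix = confusion_matrix(labels, predictions)
(tp, fn), (fp, tn) = matrix

# The rates used by the ROC curve are derived from these counts.
true_positive_rate = tp / (tp + fn)
false_positive_rate = fp / (fp + tn)
print(matrix, true_positive_rate, false_positive_rate)
```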

Finally, there is model accuracy, which is an important but shallow way to evaluate a model. Because it is only a single number, it can only go so far compared with richer measures such as those outlined above. However, it’s important to understand how model accuracy is calculated and what it tells you.

What is model accuracy? 

Model accuracy is defined as the number of classifications a model correctly predicts divided by the total number of predictions made. It’s a way of assessing the performance of a model, but certainly not the only way. In fact, a wide variety of rich measurements serve this purpose, and considering many of them at once, rather than any single one in isolation, provides the best perspective on how well a model is performing on a given dataset.
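The definition translates directly into code. The sketch below also illustrates why a single accuracy number can mislead: on a made-up, heavily imbalanced dataset, a hypothetical classifier that always predicts the negative class scores high accuracy while finding none of the positives. The dataset and function names are assumptions for illustration only.

```python
def accuracy(labels, predictions):
    """Correct predictions divided by total predictions."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def true_positive_rate(labels, predictions):
    """Fraction of actual positives that the model predicted as positive."""
    positives = sum(1 for y in labels if y == 1)
    hits = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    return hits / positives


# Made-up imbalanced dataset: 5 positives and 95 negatives.
labels = [1] * 5 + [0] * 95
# Hypothetical classifier that always predicts the negative class.
always_negative = [0] * 100

acc = accuracy(labels, always_negative)
tpr = true_positive_rate(labels, always_negative)
print(f"accuracy: {acc:.2f}, true positive rate: {tpr:.2f}")
```

Looking at accuracy alongside the true positive rate exposes a failure that accuracy alone would hide.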

Often a first pass at model training falls short and the accuracy needs to be increased. This is normal in machine learning: it’s nearly impossible to create a model that meets all of your initial objectives on the first try, so some iteration on your initial models is necessary. 

How to improve model accuracy 

One way to improve model accuracy is by tuning model parameters (more precisely, hyperparameters, since they are set before training rather than learned from the data). For example, if one has a 3-layer neural network classifier with 100 nodes in its hidden layer, one could retry training the neural network with more or fewer nodes. This is called a parameter search, and is best implemented by first making relatively large changes to the parameter one is tuning, and then successively smaller changes as one settles closer to an optimum value. 
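The coarse-to-fine idea can be sketched as follows. Everything here is an assumption for illustration: `validation_accuracy` is a hypothetical stand-in with a toy formula in place of actually training and evaluating a network, and the search range, number of rounds, and grid size are arbitrary.

```python
def validation_accuracy(hidden_nodes):
    """Hypothetical stand-in for training a model and scoring it on held-out data."""
    # Toy curve that peaks at 150 nodes; a real project would train and evaluate here.
    return 0.90 - ((hidden_nodes - 150) / 400) ** 2


def coarse_to_fine_search(score, low, high, rounds=4, candidates=5):
    """Evaluate an evenly spaced grid, then repeatedly narrow it around the best value."""
    best = None
    for _ in range(rounds):
        step = (high - low) / (candidates - 1)
        grid = sorted({round(low + i * step) for i in range(candidates)})
        best = max(grid, key=score)
        # Shrink the window to one step on either side of the current best.
        low, high = max(1, best - step), best + step
    return best


# Start with large steps over a wide range, then take smaller steps each round.
best_nodes = coarse_to_fine_search(validation_accuracy, low=20, high=500)
print(best_nodes, round(validation_accuracy(best_nodes), 4))
```

This approach assumes the score has a single broad peak in the searched range. If the score is noisy or has several peaks, a coarse grid can settle on the wrong region, so it is worth re-checking a few distant values.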

The risks of switching models

In addition, one could improve accuracy by using a fundamentally different model altogether. For example, it is now widely known that convolutional neural networks are among the best models for classifying images. If one had started out on an image classification problem with a simple logistic regression over pixel values in the image and wasn’t getting the hoped-for performance, one could switch to a completely different model, such as a convolutional neural network, to achieve higher accuracy. 

The danger in this approach is that one may have to throw away all the work done with the previous model or models, and face all the risks and unknowns of creating a completely new model. It’s best to take the risk of constructing a completely new model to improve accuracy only when a substantial improvement in accuracy is obtainable and necessary. 

Sometimes it is not possible to tell without substantial effort and testing whether a new model will provide a substantial improvement, and the risk of investing effort in the new model is not worth taking. On the other hand, sometimes it can be determined that a particular new model can provide a substantial improvement, and the risk is worth taking. 

Retrain on new data 

A final way to improve the accuracy of a model is by improving the data that the model is trained on. Self-driving cars are an example of using better data to improve model accuracy. 

Though many aspects of a self-driving car are not classification problems (like determining a speed for the car), many others are, such as determining whether to turn on the blinkers, or take a left turn. Sometimes it is not the sheer amount of data that must be improved but the diversity of data. 

A major challenge of self-driving cars is coverage of edge cases, or events that happen very rarely. In most modern machine learning models, the only way to achieve high accuracy on edge cases is to have seen those edge cases in training, which is why they are a challenge. 

One way to improve coverage of edge cases in training a self-driving model is to obtain driving data from purposefully eccentric driving environments like San Francisco where edge cases happen more frequently. Sometimes it’s the sheer amount of data that matters to a model’s accuracy. 

The neural networks used in self-driving cars are widely known to be very data-hungry, requiring huge amounts of training data to reach optimal performance. 

A model that doesn’t come close to achieving accuracy objectives on one training set can meet them on a similarly constructed but much larger training set. How closely the training data set matches the test data set also matters, and a closer match can improve accuracy. It is for this reason that self-driving training sets are often obtained from actual driving rather than from simulations.

How to think about model accuracy

Model accuracy is a very important component in assessing model performance, but is in no way the only metric. When evaluating a model, it is important to do so holistically, taking into account a variety of metrics and heuristics in order to assess how a model is performing on a specific dataset. Otherwise, the picture that accuracy gives is incomplete.