Algorithmia Insights is a flexible integration solution for machine learning (ML) model performance monitoring that provides access to inference and operational metrics for models and can easily be integrated into any monitoring, reporting, and alerting tool to identify and correct model drift, data skews, and negative feedback loops.
Insights is a feature of Algorithmia Enterprise and provides a metrics pipeline that can be used to instrument, measure, and monitor your ML models in production. You can use Insights to instrument your algorithms, calculate model performance metrics, monitor these metrics for changes, and trigger alerts and retraining jobs to mitigate model risk.
In addition to the existing model and cluster health monitoring tools within Algorithmia that help ensure that your models are up and running, Insights helps you analyze how your models are performing with real-world data by streaming model performance metrics into external monitoring systems, observability platforms, and application performance monitoring tools such as Datadog, Grafana, InfluxDB, New Relic, Kibana, and others.
Watch Algorithmia Insights in action in the video below, then continue reading for a step-by-step walk-through of how to get started.
Achieving real-time situational awareness of model performance
Model performance and evaluation metrics are typically measured after training a model by generating predictions against test data sets. However, when serving models in production, you don’t want to lose sight of model drift, uncertainty, and distribution metrics as those models generate predictions from real-world data.
Until now, monitoring model performance metrics in production meant that you had to overload your API payload with additional metadata, or log model performance metrics to a file and analyze them separately. With this manual approach, it could be weeks or months before you became aware of an issue with a model in production and could take steps to resolve it.
With Insights, you can specify custom inference metrics in models and stream these metrics separately from your API payload to perform real-time monitoring for scenarios that involve data drift, concept drift, or variations between model versions.
Using a wide range of available monitoring systems and observability tools, you can construct dashboards to monitor metrics in production, perform anomaly detection on metrics, configure alerts on various thresholds, or even trigger CI/CD jobs to automatically retrain and publish new versions of a model.
Benefits of model performance monitoring
Insights works alongside your model catalog in Algorithmia to help you:
- Identify and correct model drift, data skews, and negative feedback loops
- Quickly detect underperforming models and mitigate model decay
- Identify and alert on which models need to be retrained
- Reduce risk of model failure by identifying and prioritizing which models need attention
- More easily comply with internal and external audits and regulations
Capabilities of model performance monitoring
Insights helps you move models from development to production by enabling the following capabilities:
- Alignment with principles and tools from DevOps, monitoring, and automation as part of a standardized software development lifecycle approach to MLOps and model performance monitoring
- Flexible integrations with monitoring and analytics tools via Kafka metrics pipelines
- Control over which algorithms have metrics exported
- Ability to define custom metrics using any ML framework along with the Python, R, Java, and Scala language clients
- Easy integration with common monitoring and reporting platforms, including Datadog, InfluxDB, New Relic, Grafana, Kibana, and others
How model performance monitoring works
Insights builds on top of the high performance and scalability that the Algorithmia platform provides and is designed to handle a large volume of metrics coming from all versions of all algorithms in your entire model catalog.
Insights can be enabled globally by an Algorithmia cluster administrator and then acts as a global metrics pipeline that can be enabled for any algorithm across your model catalog. The configuration in Algorithmia is as easy as pointing to a Kafka broker and includes support for a secure, encrypted mode that uses SASL/SCRAM authentication. You can then configure a number of different integrations from Kafka to external monitoring systems, observability platforms, and application performance monitoring tools.
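As a rough sketch of the secure mode mentioned above, a downstream consumer bridging Kafka into a monitoring tool might be configured for SASL/SCRAM roughly as follows. The broker address, credentials, and topic name here are hypothetical, and this assumes the kafka-python client:

```python
# Hedged sketch: broker address, credentials, and topic name are
# hypothetical placeholders, not values from an actual deployment.
consumer_config = {
    "bootstrap_servers": "kafka.example.com:9093",
    "security_protocol": "SASL_SSL",          # encrypted transport
    "sasl_mechanism": "SCRAM-SHA-512",        # SASL/SCRAM authentication
    "sasl_plain_username": "insights-reader",
    "sasl_plain_password": "********",
}

# A consumer built from this config, e.g.
#   from kafka import KafkaConsumer
#   consumer = KafkaConsumer("insights", **consumer_config)
# would then stream metrics payloads into the downstream tool.
```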
Data scientists can then instrument any metric of interest in an algorithm, whether it is related to the input data, model attributes, or output predictions. When algorithms that have Insights enabled are queried, each call to an algorithm emits a metrics payload (separate from the REST API response) to the configured Kafka broker.
The metrics payload contains both operational and inference metrics. Operational metrics include the algorithm name, version, owner, duration, session, and request IDs, which can be used to filter and group model performance metrics in downstream systems. Inference metrics include user-defined metrics that are specified by data scientists depending on the particular algorithm and use case.
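To make the split concrete, a metrics payload might look roughly like the sketch below. The field names and values are illustrative only; Algorithmia's exact payload schema may differ.

```python
# Hypothetical sketch of an Insights metrics payload; the exact field
# names in Algorithmia's schema may differ from those shown here.
payload = {
    # Operational metrics, populated automatically by the platform
    "algorithm_name": "credit_card_approval",
    "algorithm_version": "1.2.0",
    "algorithm_owner": "demo_user",
    "duration_milliseconds": 8,
    "session_id": "rses-f28bb94a",
    "request_id": "req-9f5b8f2e",
    # Inference metrics, defined by the data scientist in the algorithm
    "risk_score": 0.42,
    "approved": 1,
}
```

Downstream systems can group and filter on the operational fields while charting and alerting on the inference fields.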
Three steps to Algorithmia Insights
Getting started with Insights and measuring model performance metrics is a simple process: enable Insights, instrument your models, and start observing and analyzing the metrics.
Step 1: Enable Insights
To enable Insights, an Algorithmia cluster administrator first configures a connection to Kafka and additional integrations.
Step 2: Report Insights
Once the integrations have been enabled, data scientists can instrument their ML models that use Python, R, Java, and Scala with the report_insights function, then enable Insights when publishing a new version.
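A minimal sketch of what this instrumentation can look like in Python is shown below. In a real Algorithmia algorithm you would call report_insights on the platform client; here a stand-in client (and a toy risk-score model) is used so the sketch is self-contained, and the metric names are hypothetical.

```python
# Sketch of instrumenting an algorithm with report_insights.
# In a real Algorithmia algorithm you would use the platform client
# instead of this stand-in, e.g.:
#   import Algorithmia
#   client = Algorithmia.client()

class StandInClient:
    """Stand-in for the Algorithmia client, for illustration only."""
    def __init__(self):
        self.reported = []

    def report_insights(self, insights):
        # The real client streams these key/value pairs to the
        # configured Kafka broker, separate from the API response.
        self.reported.append(insights)

client = StandInClient()

def apply(input):
    # Toy model for illustration: approve when the debt-to-income
    # ratio (capped at 1.0) is below 0.5.
    risk_score = min(1.0, input["debt"] / max(input["income"], 1))
    approved = risk_score < 0.5
    # Report custom inference metrics alongside the prediction.
    client.report_insights({"risk_score": risk_score,
                            "approved": int(approved)})
    return {"approved": approved, "risk_score": risk_score}

result = apply({"debt": 20_000, "income": 80_000})
```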
Step 3: Monitor Insights
MLOps and DevOps engineers can then configure dashboards and alerts to observe and consume Insights.
That’s it! Now data scientists can instrument any of their algorithms across your entire model catalog in Algorithmia.
Now that we’ve enabled the capability to monitor model performance metrics in Algorithmia, let’s walk through a few example use cases of how you might use this functionality in real-world ML scenarios.
Example: Monitoring for changes in target variables (concept drift)
Deployed models tend to decay and provide less accurate predictions over time as the assumptions baked into their parameters and training data stop holding. One common cause is concept drift, which occurs when the meaning of a target variable changes over time. For example, the relative measure of a high-risk or low-risk credit profile can change due to variations in the economy, buying patterns, or market behavior.
With Insights, you can instrument model predictions and confidence scores along with other custom metrics and have them included as part of the inference metrics that you are monitoring in production.
For example, consider a credit card approval model that takes inputs related to a customer’s demographic information and determines whether they are approved for a credit card application along with their credit risk score. Under typical conditions, you can monitor the approval rate and statistical metrics on the calculated credit risk scores.
Under atypical conditions, a certain economic event or situation can occur for cases that the model or its features don’t account for. If the model is suddenly (incorrectly) classifying a large percentage of applicants as high risk and denying their credit card applications, then we’d want to detect an anomaly in our calculated risk scores and become aware of this situation as soon as possible.
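One way a downstream monitor might flag such a shift is a simple z-score check on the high-risk rate. This is a minimal sketch with made-up numbers; in practice you would rely on the anomaly-detection features of tools like Datadog or Grafana rather than hand-rolling this logic:

```python
from statistics import mean, stdev

def high_risk_anomaly(history, recent, z_threshold=3.0):
    """Flag when the recent high-risk rate deviates sharply from history.

    history: per-window high-risk rates observed under normal conditions
    recent:  the latest window's high-risk rate
    """
    mu, sigma = mean(history), stdev(history)
    z = (recent - mu) / sigma
    return z > z_threshold

# Under typical conditions roughly 10% of applicants are high risk...
baseline = [0.09, 0.11, 0.10, 0.12, 0.08, 0.10, 0.11, 0.09]
# ...then the model suddenly classifies 40% of applicants as high risk.
alert = high_risk_anomaly(baseline, 0.40)
```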
Because of these monitors and alerts, we can more easily identify this issue, then address it by retraining our model and taking into account additional features that we may not have initially considered.
Example: Monitoring for changes in input data (data drift)
Even though you’ve deployed a complex and robust ML model, the inputs to the model from the real world might not behave as well as your training and test data sets do. Input data can change based on seasonality, shifts in customer behavior, product releases in new geographical regions, or a number of other external factors.
With Insights, you can instrument incoming input data and have it included as part of the inference metrics that you are monitoring in production.
For example, consider a diabetes risk model that takes inputs related to a patient’s health condition and predicts their risk in terms of disease progression. Under typical conditions, you could monitor the trends and distributions of incoming input data related to a patient’s age, blood pressure, blood glucose level, and so on.
Under atypical conditions, the model might start receiving a disproportionate amount of inputs that are skewed compared to the training and test data sets. For example, the model might have been trained on a data set that had a maximum patient age of 70 years, but our average input age is coming in higher than the maximum value. We’d want to know if there is a sudden change in the input data such that our model is making predictions for an input that is mostly out of range of the training and test data sets.
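A simple check a downstream monitor could run on an instrumented input metric is the fraction of recent values falling outside the training range. This is a hedged sketch with illustrative numbers, not a prescribed implementation:

```python
def out_of_range_rate(values, train_min, train_max):
    """Fraction of incoming values outside the training data's range."""
    outside = sum(1 for v in values if v < train_min or v > train_max)
    return outside / len(values)

# Training data covered patient ages 18-70 (illustrative bounds)...
TRAIN_MIN_AGE, TRAIN_MAX_AGE = 18, 70

# ...but a recent batch of inputs skews much older.
incoming_ages = [72, 75, 68, 81, 77, 74, 79, 66]
rate = out_of_range_rate(incoming_ages, TRAIN_MIN_AGE, TRAIN_MAX_AGE)

# Alert when more than, say, 10% of inputs fall outside the trained range.
drift_alert = rate > 0.10
```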
Because of these monitors and alerts, we can more easily identify this issue, then address it by retraining our model and taking into account data for patients in a higher age range than we initially considered.
Example: Monitoring for deviations between model versions
Algorithmia makes it easy to deploy a new version of a model while keeping older versions accessible for continuity and testing purposes. Canary models or shadow deployments are often used to determine whether a new version of a model is ready to be used in a production setting.
With Insights, you can instrument model performance metrics and compare results between different versions of a model in production.
For example, consider a vehicle efficiency model that takes inputs related to a vehicle’s weight and predicts its efficiency in terms of miles traveled per gallon of fuel consumed. Under typical conditions, you could monitor predictions related to vehicle efficiency between two or more versions of a model.
Different versions of a model might have been trained using different data sets or features. We’d want to know if model predictions from a newer version compared to the current version deviate more than a specified threshold. And ideally, we’d want to know about this condition before the newer version of the model starts receiving some or all of the production traffic.
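A downstream comparison might be as simple as the mean absolute deviation between the two versions' predictions on the same inputs, checked against a threshold before shifting traffic. The predictions and the 1 mpg threshold below are illustrative assumptions:

```python
def mean_abs_deviation(current_preds, candidate_preds):
    """Average absolute difference between two model versions' predictions."""
    pairs = zip(current_preds, candidate_preds)
    return sum(abs(a - b) for a, b in pairs) / len(current_preds)

# Predicted fuel efficiency (mpg) for the same inputs from two versions.
v1_mpg = [31.0, 24.5, 28.2, 19.8]  # current production version
v2_mpg = [30.5, 26.0, 27.0, 23.1]  # candidate (shadow) version

deviation = mean_abs_deviation(v1_mpg, v2_mpg)
# Block the rollout if the versions disagree by more than 1 mpg on average.
rollout_blocked = deviation > 1.0
```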
Because of these monitors and alerts, we can more easily identify this issue, then address it by rolling back to an older version of our model while tracking down the cause of the variations in the newer model.
Get started with Algorithmia Insights
Without comprehensive monitoring and centralized metrics collection, organizations struggle with model drift, risk of failure, and inability to meet performance targets in response to shifts in environment and customer behavior. Algorithmia Insights gives you the ability to stream model performance metrics into external monitoring systems, observability platforms, and application performance monitoring tools to utilize features such as anomaly detection and alerts.
Want to see more of Algorithmia Insights? Join us for a deep-dive webinar on November 17, where we’ll demonstrate Algorithmia Insights, explore a variety of use cases, and show you how you can get started today. We look forward to seeing you there!