Algorithmia Blog - Deploying AI at scale

Shorten deployment time with Algorithmia

[Figure: Pie graph of ML projects deployed]

Every year, millions of dollars are wasted planning, cleaning data for, and training machine learning (ML) models that never reach production: more than half of data science projects are not fully deployed, and some never will be, generating zero revenue.

When organizations are asked about their machine learning business challenges, deployment difficulties are cited as the biggest obstacle. 

Reduce waste; increase value

The solution looks simple at first:

  • Make it fast and simple to deploy ML models
  • Reduce the learning curve
  • Stop asking data scientists to do DevOps
  • Automate where possible and measure the results
  • Reduce model deployment time from months (or years) to minutes

But let’s take a closer look at how to make these solutions feasible.

Remove barriers to deployment

If a data science department is isolated, it may not have a reliable DevOps team and must learn to deploy its own models. However, when data scientists are tasked with deploying models, they face a number of challenges: 

  • They must learn a wide range of DevOps-related skills that are not part of their core competencies.
  • They will spend a lot of time learning to properly containerize models, implement complex serving frameworks, and design CI/CD workflows.

This pulls them away from their primary mission of designing and building models, and their attempts to tackle the challenges above meet with varying degrees of success.

But suppose an IT or DevOps team is available. Now the data scientists face a new set of challenges:

  • IT is used to working with conventional application deployments, which differ from ML in a number of ways; ML models often require a unique “snowflake” environment for each individual model.
  • Information security restrictions further complicate deployment, requiring various levels of isolation and auditing. Because ML models are opaque, and data science teams follow non-standard development practices, IT is often unable to add sufficient tooling, such as fine-grained logging or error-handling. 

From there, developers are typically borrowed from other teams to help solve these problems—for example, writing wrapper code to add company-standard authentication—which can cause further slowdowns and resource consumption.

[Figure: ML infrastructure layout]

Reduce the learning curve

To succeed in ML efforts, companies must reduce the breadth of knowledge each individual team is responsible for, allowing them to specialize in their own practices. When they are able to do so, the learning curve for each team can be reduced and they can quickly scale up their own activities and projects.

Stop asking data scientists to do DevOps

A key mechanism for enabling this is a managed platform for the deployment, serving, and management of ML models. This platform provides the following benefits:

  • Separation of concerns: data scientists can focus on model building and app developers can focus on integration.
  • Low DevOps: managed platforms require minimal oversight, and DevOps teams never need to be involved in the deployment of an individual model.
  • Reduced manual tool-building: authentication, data connectors, monitoring, and logging are built-in.

Selecting the right platform for ML is critical. At minimum, it should:

  • Provide deployment, serving, and model management within a single environment to enhance accessibility and reduce tool thrashing.
  • Allow app developers and other consumers to quickly locate, test, and integrate appropriate models.
  • Support any language and framework so data scientists are not constrained in their development.
  • Allow data scientists to write code, not containers, so they can remain focused at the model level.
  • Not require any deep proprietary changes to models, cause vendor lock-in, or tie model-serving to a specific model-training platform.
  • Run within your choice of private cloud, with full portability between infrastructure providers.
  • Work with existing choices for CI/CD and other DevOps tooling.

[Figure: ML infrastructure diagram]

Go the last mile

With the problems of model deployment and serving contained, your company can focus on creating and tracking value and ROI.

By providing application developers with a searchable, structured way of locating models, cross-departmental communication barriers can be reduced to zero. The instant that the ML team releases a model for general consumption, it becomes discoverable and usable by developers. Developers can consume the model with a global API and have cut-and-paste code ready to drop into any application or service.

As models are added and consumed, oversight becomes key. Monitoring, logging, and showbacks allow for seamless continuous operation while demonstrating the value of each model. Companies can properly allocate resources across teams and prove ROI on each individual project.

[Figure: ML infrastructure plugging into apps and code]

Start deploying today

Don’t become the alchemy Gartner warned about: “Through 2020, 80 percent of AI projects will remain alchemy, run by wizards whose talents will not scale in the organization” (Gartner). 

Take stock of your company-wide ML initiatives. If you’re not deploying all of your models into production, Algorithmia can help you get there. If your data science teams are running their own DevOps, or your IT team is being overloaded with ML needs, our managed solution is the right tool to get your model productionization back on track.

Algorithmia is the leader in machine learning deployment, serving, and management. Our product deploys and manages models for Fortune 100 companies, US intelligence agencies, and the United Nations. Our public algorithm marketplace has helped more than 90,000 engineers and data scientists deploy their models.

Continuous deployment for machine learning—a re:Invent talk by Diego Oppenheimer

Algorithmia coming to AWS re:Invent

Meet with us!

AWS re:Invent is next month, and we are pleased to announce that Algorithmia CEO, Diego Oppenheimer, will be speaking on the new software development lifecycle (SDLC) for machine learning. Often we get variations on this question: how can we adapt our infrastructure, operations, staffing, and training to meet the challenges of ML without throwing away everything that already works? Diego is prepared with answers. His talk will cover how machine learning (ML) will fundamentally change the way we build and maintain applications.

Currently, many data science and ML deployment teams are struggling to fit an ML workflow into tools that don’t make sense for the job. This session will help clarify the differences between traditional and ML-driven SDLCs, cover common challenges that need to be overcome to derive value from ML, and provide answers to questions about current technological trends in ML software. Finally, Diego will outline how to build a process and tech stack to bring efficiency to your company’s ML development. 

Diego’s talk will be on 4 December at 1:40pm in the Nuvola Theater in the Aria. 

Coming soon: the 2020 State of Enterprise Machine Learning Report

Additionally, Diego will share insights from our upcoming 2020 State of Enterprise Machine Learning survey report, which will be an open-source guide for how the ML landscape is evolving. The report will focus on these findings:

  1. Shifts in the number of data scientists employed at companies in all industries and what that portends for the future of ML
  2. Use case complexity and customer-centric applications in smaller organizations
  3. ML operationalization (having a deployed ML lifecycle) capabilities (and struggles) across all industries
  4. Trends in ML challenges: scale, version control, model reproducibility, and aligning a company around ML goals
  5. Time to model deployment and wasted time
  6. What determines ML success at the producer level (data scientist and engineer) and at the director and VP level

Pick up a copy of the report at Algorithmia’s booth.

Diego and his team will be at Booth 311 throughout the week to answer questions about infrastructure specifics, ML solutions, and new use cases.

Meet with our team

If you or your team will be in Las Vegas for re:Invent this year, we want to meet with you. Our sales engineers would love to tailor a demo of Algorithmia’s product to your specific needs and demonstrate our latest features. Book some time with us!

Read the full press report here.

Customer churn prediction with machine learning

[Figure: Illustration of revolving door with customers leaving]

Why is churn prediction important? 

Defined loosely, churn is the process by which customers stop doing business with a company. Preventing lost profits is one clear motivation for reducing churn, but subtler reasons may underlie a company’s quest to quell it. Most notably, acquiring a new customer usually costs far more than retaining an existing one, so reducing churn also makes sense from a less obvious financial perspective.

While churn presents an obvious difficulty to businesses, its remedy is not always immediately clear. In many cases, especially without descriptive data, companies are at a loss as to what drives it. Luckily, machine learning provides effective methods for identifying churn’s underlying factors and prescriptive tools for addressing it.

Methods for solving high churn rate

As with any machine learning task, the first, and often the most crucial, step is gathering data. Datasets used for customer churn prediction typically include customer data such as time spent on a company website, links clicked, products purchased, user demographics, text analysis of product reviews, and the tenure of the customer-business relationship. The key is that the data be high quality, reliable, and plentiful.

Good results can often still be obtained with sparse data, but more data is usually better. Once the data has been chosen, the problem must be formulated and a featurization of the data chosen. It’s important to approach this stage with attention to detail, because churn can mean different things to different enterprises.

Types of churn

For some, the problem is best characterized as predicting the ratio of churn to retention. For others, the goal is to predict the percentage risk that an individual customer will churn. And for many more, identifying churn means taking a global perspective and predicting the future rate at which customers will exit the business relationship. All of these are valid formulations, but whichever one is chosen must be applied consistently across the customer churn prediction pipeline.
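Whichever formulation is chosen, the churn label itself should be defined once and reused at every stage. As a minimal sketch, assuming a hypothetical rule that a customer has churned after more than 90 days of inactivity (the rule, function names, and dates below are illustrative, not taken from any particular product), the per-customer label and the overall churn rate can both be derived from the same definition:

```python
from __future__ import annotations

from datetime import date, timedelta

# Hypothetical rule: a customer counts as churned once they have been
# inactive for more than INACTIVITY_DAYS as of the reference date.
INACTIVITY_DAYS = 90


def label_churn(last_active: dict[str, date], as_of: date,
                inactivity_days: int = INACTIVITY_DAYS) -> dict[str, bool]:
    """Map each customer ID to True (churned) or False (retained)."""
    cutoff = as_of - timedelta(days=inactivity_days)
    return {cid: seen < cutoff for cid, seen in last_active.items()}


def churn_rate(labels: dict[str, bool]) -> float:
    """Fraction of customers labeled as churned (0.0 for an empty input)."""
    if not labels:
        return 0.0
    return sum(labels.values()) / len(labels)


# Made-up last-activity dates, for illustration only.
last_active = {
    "c1": date(2019, 11, 20),
    "c2": date(2019, 6, 1),
    "c3": date(2019, 8, 15),
    "c4": date(2019, 12, 1),
}
labels = label_churn(last_active, as_of=date(2019, 12, 31))
print(labels)  # c2 and c3 fall before the 2019-10-02 cutoff
print(f"churn rate: {churn_rate(labels):.0%}")  # churn rate: 50%
```

Changing `INACTIVITY_DAYS` changes both outputs together, which is the point: the label used for training and the rate reported to the business stay consistent.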

Once the data has been chosen, prepped, and cleaned, modeling can begin. While identifying the most suitable prediction model can be more of an art than a science, we’re usually dealing with a classification problem (predicting whether a given individual will churn), for which certain models are standard practice.

For classification problems such as this, both decision trees and logistic regression are desirable for their ease of use, training and inference speed, and interpretable outputs. These should be the go-to methods in any practitioner’s toolbox for establishing a baseline accuracy before moving on to more complex modeling choices.

Decision trees can be further improved by experimenting with ensemble methods such as bagging, random forests, and boosting. Beyond these two choices, Convolutional Neural Networks, Support Vector Machines, Linear Discriminant Analysis, and Quadratic Discriminant Analysis can all serve as viable prediction models to try.
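To make the baseline idea concrete, here is a minimal logistic regression trained by batch gradient descent in plain Python. It is a sketch under stated assumptions: the two features (tenure in years, support tickets per month) and the eight labeled customers are made up, and in practice a library implementation such as scikit-learn's `LogisticRegression` or `DecisionTreeClassifier` would be the usual starting point.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that a customer with features x will churn."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up features: [tenure in years, support tickets per month]; 1 = churned.
X = [[0.2, 3.0], [0.5, 2.5], [0.3, 4.0], [0.4, 3.5],
     [3.0, 0.5], [2.5, 1.0], [4.0, 0.0], [3.5, 0.5]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_logistic_regression(X, y)
accuracy = sum((predict_proba(w, b, xi) >= 0.5) == bool(yi)
               for xi, yi in zip(X, y)) / len(y)
print("weights:", w, "bias:", b, "training accuracy:", accuracy)
print("new customer risk:", predict_proba(w, b, [0.3, 3.0]))
```

The learned weights are directly readable: a negative weight on tenure and a positive weight on support tickets would say that newer customers who file more tickets are riskier, which is the kind of interpretability that makes this a good baseline.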

Defining metrics with customer data 

Once a model has been chosen, it needs to be evaluated against a consistent and measurable benchmark. One way to do this is to examine the model’s ROC (Receiver Operating Characteristic) curve on a test set. The curve plots the true positive rate against the false positive rate as the classification threshold varies. By seeking to maximize the AUC (area under the curve), one can tune a model’s performance or assess tradeoffs between different models.
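The AUC has a useful interpretation: it equals the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes the ROC points threshold by threshold and the AUC via that pairwise definition, using made-up labels and scores. `roc_points` and `roc_auc` are illustrative names; scikit-learn's `roc_curve` and `roc_auc_score` provide tested equivalents.

```python
def roc_points(y_true, scores):
    """(false positive rate, true positive rate) pairs, one per distinct threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # Everyone scoring at or above t is predicted to churn.
        tp = sum(1 for s, yv in zip(scores, y_true) if s >= t and yv == 1)
        fp = sum(1 for s, yv in zip(scores, y_true) if s >= t and yv == 0)
        points.append((fp / neg, tp / pos))
    return points


def roc_auc(y_true, scores):
    """Probability that a random churner outscores a random non-churner (ties count half)."""
    pos = [s for s, yv in zip(scores, y_true) if yv == 1]
    neg = [s for s, yv in zip(scores, y_true) if yv == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_test = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]  # made-up model outputs
print(roc_points(y_test, scores))
print(roc_auc(y_test, scores))  # 7 of 9 churner/non-churner pairs ranked correctly
```

Integrating the ROC points with the trapezoid rule gives the same number as the pairwise count, which is a handy sanity check when comparing models.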

Another useful metric is the Precision-Recall Curve, which, you guessed it, plots precision vs. recall. It’s useful in problems where one class is more interesting than the other. That’s the case with churn: we’re more interested in the small proportion of customers looking to leave than in those who aren’t (although we care about them as well).

In this case, a business would hope to identify potential churners with high precision so it can target interventions at them. For example, one such intervention might be an email blast offering coupons or discounts to the customers most likely to churn. By carefully selecting which customers to target, businesses can reduce the cost of these retention measures and increase their effectiveness.
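Both ideas can be sketched in a few lines: sweep thresholds to get precision and recall for each cutoff, then spend a fixed intervention budget on the highest-risk customers and check how many of them actually churned. The scores, customer IDs, budget, and outcomes below are hypothetical.

```python
def precision_recall_points(y_true, scores):
    """(threshold, precision, recall) for every distinct score threshold."""
    total_churners = sum(y_true)
    points = []
    for t in sorted(set(scores), reverse=True):
        flagged = [yv for s, yv in zip(scores, y_true) if s >= t]
        tp = sum(flagged)
        points.append((t, tp / len(flagged), tp / total_churners))
    return points


def select_targets(risk_by_customer, budget):
    """IDs of the `budget` customers with the highest predicted churn risk."""
    return sorted(risk_by_customer, key=risk_by_customer.get, reverse=True)[:budget]


def precision_of(targets, churned):
    """Share of targeted customers who actually churned."""
    return sum(c in churned for c in targets) / len(targets) if targets else 0.0


for t, p, r in precision_recall_points([1, 0, 1, 0, 1, 0],
                                       [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]):
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")

# Hypothetical risk scores and later-observed outcomes.
risk = {"c1": 0.92, "c2": 0.15, "c3": 0.71, "c4": 0.40, "c5": 0.88}
targets = select_targets(risk, budget=3)
print(targets, precision_of(targets, churned={"c1", "c3", "c4"}))
```

Here two of the three targeted customers churned (precision of about 0.67), while customer c4, who also churned, was missed because the budget ran out; raising the budget trades precision for recall.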

Sifting through insights from model output 

Once the selected model has been tuned, a post-hoc analysis can be conducted. An examination of which input data features were most informative to the model’s success could suggest areas to target and improve. The total pool of customers can even be divided into segments, perhaps by using a clustering algorithm such as k-means. 

This allows businesses to home in on the particular markets where they may be struggling and tailor their churn prevention approaches to each market’s needs. If an interpretable prediction model was selected, they can also use it to identify the factors that led those customers to churn.
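The segmentation step mentioned above can be illustrated with a plain implementation of k-means (Lloyd's algorithm). This is a sketch: the two features (average monthly spend and sessions per month) and the six customers are invented, the centroids are naively initialized from the first k points, and real features should be scaled to comparable ranges first. scikit-learn's `KMeans`, which uses k-means++ initialization by default, is the more robust choice in practice.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm); returns (centroids, assignments)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    assignments = None
    for _ in range(max_iter):
        # Assignment step: each customer joins its nearest centroid.
        new_assignments = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                           for p in points]
        if new_assignments == assignments:
            break  # converged: no customer changed segment
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


# Made-up customers: (average monthly spend in $, sessions per month).
customers = [(20, 2), (25, 3), (22, 1), (200, 30), (210, 28), (190, 35)]
centroids, segments = kmeans(customers, k=2)
print(segments)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```

Each segment's churn rate and most informative features can then be compared side by side to decide where a tailored intervention is worth the cost.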

Combating churn with machine learning

While churn prediction can look like a daunting task, it’s actually not all that different from any other machine learning problem; the overall workflow looks much the same. However, special care must be given to the feature selection, model interpretation, and post-hoc analysis phases so that appropriate measures can be taken to reduce churn.

In this way, the key skill in applying machine learning to churn prediction lies not in any particular model specialized to the task but in the practitioner’s domain knowledge and ability to make sound business decisions from the output of a black-box model.

Perfect order fulfillment: a Tevec case study

[Figure: Shipping containers]


Algorithmia is fortunate to work with companies across many industries with varied use cases as they develop machine learning programs. We are delighted to showcase the great work one of our customers is doing and how the AI Layer is able to power their machine learning lifecycle.

Tevec is a Brazil-based company that hosts Tevec.AI, a supply chain recommendation platform that uses machine learning to forecast demand and suggest optimized replenishment and fulfillment orders for logistics chains. Put simply, Tevec ensures retailers and goods transport companies deliver their products to the right place at the right time.

In founder Bento Ribeiro’s own words, the “Tevec Platform is a pioneer in the application of machine learning for the recognition of demand behavior patterns, automating the whole process of forecasting and calculation of ideal product restocking lots at points of sale and distribution centers, allowing sales planning control, service level, and regulatory stocks.”

Tevec runs forecasting and inventory-optimization models and customizes user permissions so they can adjust the parameters of their inventory routine, such as lead times, delivery dates, minimum inventory, and service levels. Users can fine-tune the algorithms and adapt for specific uses or priorities. 

The challenge: serving and managing at scale

Initially, Tevec was embedding ML models directly into its platform, causing several issues:

  • Updating: models and applications were on drastically different update cycles, with models changing many times between application updates
  • Versioning: iterating on models and ensuring all apps were calling the most appropriate model version was difficult to track and prone to error
  • Data integrations: manual integrations and multi-team involvement made customization difficult
  • Model management: models were interacting with myriad endpoints such as ERP, PoS systems, and internal platforms, which was cumbersome to manage

“Algorithmia provides the ability to not worry about infrastructure and guarantees that models we put in production will be versioned and production-quality.”

Luiz Andrade, CTO, Tevec

The solution: model hosting made simple with serverless microservices

Tevec decoupled model development from app development using the AI Layer so it can seamlessly integrate API endpoints, and users can maintain a callable library of every model version. Tevec’s architecture and data science teams now avoid costly and time-consuming DevOps tasks; that extra time can be spent on building valuable new models in Python, “the language of data science,” Andrade reasons. That said, with the AI Layer, Tevec can run models from any framework, programming language, or data connector—future-proofing Tevec’s ML program.

With Algorithmia in place, Tevec’s data scientists can test and iterate models with dependable product continuity, and can customize apps for customers without touching models, calling only the version needed for testing. 
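Calling only the version needed can be pictured as a registry in which every published version stays addressable. The following is a hypothetical, in-memory sketch of that idea, not Algorithmia's API: production apps pin an explicit version while a test harness calls the latest, so model updates and app updates no longer have to move in lockstep.

```python
from typing import Callable, Dict, Optional

Model = Callable[[dict], float]


class ModelRegistry:
    """Toy in-memory registry: every published model version stays callable."""

    def __init__(self) -> None:
        self._versions: Dict[str, Dict[int, Model]] = {}

    def publish(self, name: str, model: Model) -> int:
        """Store a new, immutable version of `name` and return its number."""
        versions = self._versions.setdefault(name, {})
        version = len(versions) + 1
        versions[version] = model
        return version

    def call(self, name: str, payload: dict, version: Optional[int] = None) -> float:
        """Invoke a pinned version, or the latest one if no version is given."""
        versions = self._versions[name]
        chosen = max(versions) if version is None else version
        return versions[chosen](payload)


registry = ModelRegistry()
# Hypothetical demand-forecast models; the arithmetic is placeholder logic.
registry.publish("demand_forecast", lambda p: p["last_week_units"] * 1.0)
registry.publish("demand_forecast", lambda p: p["last_week_units"] * 1.5)

payload = {"last_week_units": 100}
print(registry.call("demand_forecast", payload, version=1))  # production app pinned to v1
print(registry.call("demand_forecast", payload))             # test harness uses the latest
```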

Algorithmia’s serverless architecture ensures the scalability Tevec needs to meet its customers’ demands without the costs of other autoscaling systems, and Tevec only pays for compute resources it actually uses.

Looking ahead

Tevec continues to enjoy 100-percent year-on-year growth, and as it scales so will its ML architecture deployed on Algorithmia’s AI Layer. Tevec is planning additional products beyond perfect order forecasts, and it is evaluating new frameworks for specific ML use cases, for which the tool-agnostic AI Layer is well suited. Tevec will continue to respond to customer demands as it increases the scale and volume of its service so goods and products always arrive on time at their destinations.

“Algorithmia is the whole production system, and we really grabbed onto the concept of serverless microservices so we don’t have to wait for a whole chain of calls to receive a response.”

Luiz Andrade, CTO, Tevec

Read the full Tevec case study.

AI and the Cloud: Cloud Machine Learning

If you’ve been keeping informed of what’s happening in the AI and machine learning world, you’ve probably heard a lot of talk about this nebulous thing called the cloud. While the cloud is often used to describe a variety of offerings for decentralized computing, there’s an underlying similarity between all such services. 

Use cases for cloud machine learning

Simply put, the cloud consists of collections of anonymous servers housed by tech companies in server farms, and the use cases for the cloud are endless. These servers are used to do everything from running the latest high-tech machine learning algorithms on your data to hosting your website to serving as cloud storage for your photography collection.

Using the cloud is a vital component of most tech businesses in this new AI age, and whoever ends up dominating the market will stand to become entrenched for years to come.

Costs and benefits of a cloud AI platform

For AI and machine learning, the key benefit of the cloud to practitioners lies in the fact that for most people, setting up and hosting their own machine learning infrastructure is prohibitively expensive. Entry-level GPU cards for training machine learning models run close to $1,000, and the best cards run 2-4 times that. Of course, for many models you achieve greater training speeds by running cards in parallel, but doing so requires purchasing multiple cards and networking them together—no easy feat. 

On top of this, you need to house the cards in a desktop of some sort with sufficiently powerful cooling capabilities to prevent overheating. Then you need to factor in the costs of supplying power to the system, as training machine learning models is incredibly resource-intensive. After all is said and done, in order to build an elite machine learning hardware setup, you’re looking at startup costs of potentially over $10,000, and this isn’t even taking into account what would be involved if you were interested in using more specialized hardware such as TPUs or FPGAs.

Serverless ML architectures offer virtually unlimited scalability when run on cloud services, and because they scale in real time, they produce minimal waste, provisioning only the resources needed to meet demand. For these reasons, serverless is the clear choice for cloud-based machine learning. However, without proper configuration, organizations risk underprovisioning resources in their quest for efficiency.
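One way to reason about underprovisioning is Little's law: the average number of requests in flight equals the arrival rate multiplied by the time each request spends in the system. The sketch below turns that into a minimum instance count; the per-instance concurrency, the workload figures, and the 25 percent burst headroom are assumptions to replace with measured values.

```python
import math


def instances_needed(requests_per_second: float, seconds_per_request: float,
                     concurrency_per_instance: int, headroom: float = 0.25) -> int:
    """Minimum instance count that keeps in-flight requests within capacity.

    Little's law: average requests in flight = arrival rate * time per request.
    `headroom` is a hypothetical safety margin for bursts.
    """
    in_flight = requests_per_second * seconds_per_request
    return math.ceil(in_flight * (1 + headroom) / concurrency_per_instance)


# Made-up workload: 200 requests/s, 250 ms per prediction,
# 4 concurrent requests per instance.
print(instances_needed(200, 0.25, 4))  # 50 in flight * 1.25 / 4 -> 16
```

If a platform's concurrency cap or cold-start behavior keeps capacity below this figure, requests queue up, which is the underprovisioning risk described above.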

Using the cloud with trained models

Getting started with training models on the cloud is remarkably simple. Using a cloud provider, you can simply choose a machine with compute power sufficient for your task, spin up an instance, load your libraries and code, and be off to the races.

Cloud instances cost anywhere from a few cents to a few dollars per hour, and you pay only for the time you use. You can shut off the machine whenever you like, and of course you don’t have to deal with all the costs involved in hardware setup, failure, and maintenance.
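Whether renting beats buying comes down to a break-even calculation. This sketch uses the roughly $10,000 setup cost mentioned above; the hourly figures for running owned hardware and for renting a cloud GPU instance are hypothetical placeholders, not quoted prices.

```python
import math


def break_even_hours(hardware_cost: float, owned_cost_per_hour: float,
                     cloud_cost_per_hour: float) -> float:
    """Hours of use after which buying hardware beats renting cloud instances."""
    savings_per_hour = cloud_cost_per_hour - owned_cost_per_hour
    if savings_per_hour <= 0:
        return math.inf  # owning never pays off
    return hardware_cost / savings_per_hour


# $10,000 setup from the text; the hourly figures are hypothetical.
hours = break_even_hours(10_000, owned_cost_per_hour=0.50, cloud_cost_per_hour=3.00)
print(f"{hours:.0f} hours (about {hours / 24:.0f} days of round-the-clock training)")
```

The calculation ignores maintenance, hardware failures, depreciation, and idle time, all of which push the break-even point further out in the cloud's favor.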

Hardware for cloud AI platforms 

Certain cloud providers also give access to niche hardware that’s not available anywhere else. For example, using GCP you can train your machine learning models on TPUs, specialized processors designed to handle complex tensor arithmetic. Other platforms offer access to FPGAs. 

For most people and most workloads, it’s hard to beat the diversity of hardware options and affordable pay-as-you-go model that the cloud provides. That’s not to say that running applications on the cloud will always be inexpensive. For example, it costs OpenAI over $250/hr just to train its latest language model, GPT-2.

Hosting models in the cloud

The cloud isn’t just for training models—it’s used for hosting them too. Data scientists and developers can package their trained models as services and then deploy them to generate online predictions. Cloud services can also provide useful analytics to hosts about server load and how many times their model was queried.
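Packaging a model as a service can be as small as an HTTP endpoint that accepts features and returns a prediction. The sketch below uses only Python's standard-library WSGI tooling; the `predict` rule and the `support_tickets` field are hypothetical placeholders, and it deliberately leaves out the authentication, versioning, scaling, and monitoring that a managed hosting platform would add.

```python
import json
from wsgiref.simple_server import make_server


def predict(features: dict) -> float:
    """Hypothetical placeholder model: risk grows with support tickets, capped at 1."""
    return min(1.0, 0.25 * float(features["support_tickets"]))


def app(environ, start_response):
    """WSGI app: POST a JSON object of features, receive a JSON prediction."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed",
                       [("Content-Type", "text/plain"), ("Allow", "POST")])
        return [b"POST a JSON object of features\n"]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        features = json.loads(environ["wsgi.input"].read(length))
        body = json.dumps({"churn_risk": predict(features)}).encode()
        status = "200 OK"
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, a missing field, or a non-numeric value.
        body = json.dumps({"error": str(exc)}).encode()
        status = "400 Bad Request"
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the service locally with the standard-library reference server."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` and POSTing `{"support_tickets": 2}` to `http://127.0.0.1:8000/` returns `{"churn_risk": 0.5}`.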

Avoiding lock-in

For enterprises, choosing a cloud service is an important step in establishing a tech stack because switching providers downstream can often be difficult. Once an organization couples its code, developer team, and infrastructure to a specific framework or service, those choices can be hard to undo, simply due to how hierarchical the development process is. 

Code is built atop code, and making changes in the core libraries often involves rewriting and reworking a sizable portion of the code base. What’s more, many services have specific frameworks and AI platforms tied to their usage: AWS offers SageMaker, and GCP is optimized for use with TensorFlow. GCP also provides a service called Cloud AutoML, which automates the process of training a machine learning model for you.

Algorithmia’s AI Layer supports any cloud-based model deployment and serving need so users can avoid vendor lock-in. We have built-in tools for versioning, serving, deployment, pipelining, and integrating with your current workflows. 

The AI Layer integrates with any data connectors your organization is currently using to make machine learning easier, taking you from data collection to model deployment and serving much faster. 

As AI research progresses and becomes more accessible, the only thing that’s clear is that the cloud is a key component of the evolving AI landscape and will continue to be for the foreseeable future. 

Interested in learning more about the AI Layer? Get a demo to see if the AI Layer is the right solution for your organization.