Algorithmia Blog - Deploying AI at scale

Connectivity in Machine Learning Infrastructure 

ML Life Cycle | Connect, deploy, scale, and manage

As companies begin developing use cases for machine learning, the infrastructure to support their plans must be able to adapt as data scientists experiment with new and better processes and solutions. Concurrently, organizations must connect a variety of systems into a platform that delivers consistent results.

Machine learning architecture consists of four main groups:

  • Data and Data Management Systems
  • Training Platforms and Frameworks
  • Serving and Life Cycle Management
  • External Systems 

ML-focused projects generate value only after these functional areas connect into a workflow.

In part 3 of our Machine Learning Infrastructure whitepaper series, “Connectivity,” we discuss how those functional areas fit together to power the ML life cycle. 

It all starts with data

Most data management systems include built-in authentication, role access controls, and data views. In more advanced cases, an organization will have a data-as-a-service engine that allows for querying data through a unified interface. 

Even in the simplest cases, ML projects likely rely on a variety of data formats—different types of data stores from many different vendors. For example, one model might train on images from a cloud-based Amazon S3 bucket, while another pulls rows from on-premises PostgreSQL and SQL Server databases, while a third interprets streaming transactional data from a Kafka pipeline.  

machine learning architecture

Select a training platform

Training platforms and frameworks comprise a wide variety of tools used for model building and training. Different training platforms offer unique features. Libraries like TensorFlow, Caffe, and PyTorch offer toolsets to train models. 

The freedom of choice is paramount, as each tool specializes in certain tasks. Models can be trained locally on a GPU and then deployed or they can be trained directly in the cloud using Dataiku, Amazon, SageMaker, Azure ML Studio, or other platforms or processors.

Life cycle management systems

Model serving encompasses all the services that allow data scientists to deliver trained models into production and maintain them. Such services include the abilities to ingest models, catalog them, integrate them into DevOps workflows, and manage the ML life cycle. 

Fortunately, each ML architecture component is fairly self-contained, and the interactions between those components are fairly consistent:

  • Data informs all systems through queries.
  • Training systems export model files and dependencies.
  • Serving and life cycle management systems return inferences to applications and model pipelines, and export logs to systems of record.
  • External systems call models, trigger events, and capture and modify data.

It becomes easy to take in data and deploy ML models when these functions are grouped together. 

External systems

External Systems can consume model output and integrate it in other places. Based on the type of deployment, we can create different user interfaces. For example, the model output can integrate into a REST API or another web application. RESTful APIs assist us in calling our output from any language and integrating it into new or existing project. 

Connectivity and machine learning sophistication

Data have made the jobs of business decision makers easier. But data is only useful after models interpret it, and model inference only generates value when external apps can integrate and consume it. That journey toward integration has two routes: horizontal integration and loosely coupled, tight integration.  

The quickest way to develop a functioning ML platform is by supporting only a subset of solutions from each of the functional groups to more quickly integrate each into a horizontal platform. Doing so requires no additional workforce training and adds speed to workflows already in place. 

Unfortunately, horizontal integration commits an organization to full-time software development rather than building and training models to add business value. An architecture that allows each system to evolve independently, however, can help organizations choose the right components for today without sacrificing the flexibility to rethink those choices tomorrow. 

To enable a loosely coupled, tightly integrated approach, a deployment platform must support three kinds of connectivity: 

  • Publish/Subscribe 
  • Data Connectors
  • RESTful APIs


Publish/subscribe (pub/sub) is an asynchronous, message-oriented notification pattern. In such a model, one system acts as a publisher, sending events to a message broker. Through the message broker, subscriber systems explicitly enroll in a channel, and the hub forwards and verifies delivery of publisher notifications, which can then be used by subscribers as event triggers. 

Algorithmia’s AI Layer has configurable event listeners that allow users to trigger actions based on input from pub/sub systems. 

Pub/sub approach

Data connectors

While the model is the engine of any machine learning system, data is both the fuel and the driver. Data feeds the model during training, influences the model in production, then retrains the model in response to drift. 

As data changes, so does its interaction with the model, and to support that iterative process, an ML deployment and management system must integrate with every relevant data connector.


Because there is a variety of requesting platforms and high unpredictability therein, a loose coupling is, again, the most elegant answer. RESTful APIs are the most elegant implementation, due to these required REST constraints:

  • Uniform interface: requests adhere to a standard format
  • Clint-Server: the server only interacts with the client through requests
  • Stateless: all necessary information must be included within a request
  • Layered system: the REST client passes any layers between itself and the server
  • Cacheable: Developers can store certain responses

To learn more about how connectivity feeds into the machine learning life cycle, download the full whitepaper.

And visit our website to read parts 1 and 2 of the Machine Learning Infrastructure whitepaper series.

Enrich Data in Tableau with Machine Learning using Algorithmia

World map showing where earthquakes occur and the time of occurrence

Tableau combines the ease of drag-and-drop visual analytics with the flexibility to dynamically integrate data from analytical systems. Algorithmia lets analysts go even further, extending worksheets with machine learning (ML) and allowing for the execution of Java, Node.js, Python, R, Ruby, Rust, and Scala code directly from within Tableau. 

Take advantage of Algorithmia’s broad catalog of prebuilt algorithms or upload your own ML model or utility code into Algorithmia’s public or enterprise private cloud offerings, then integrate them right into your Tableau Worksheets.

In this blog, we’ll explore a concrete example to show you how to leverage Algorithmia algorithms in your Tableau workflow. 

How to Leverage Algorithmia Algorithms in Tableau

Let’s dive in with an example. The United States Geological Survey (USGS) provides an excellent database of global earthquake data. With Tableau, we can quickly display the location, time, and magnitude of these events on a map. But earthquakes that occur at night may carry a higher risk of injuries/fatalities, since escaping or taking shelter within a collapsing building when visibility is low is harder. 

Time of day can act as a rough proxy for visibility, but a much better measure is the angle of the sun, which is affected by geolocation, date, and time of day. As an extreme example, the North Pole is sunless at high noon on December 21.

To determine visibility during an earthquake, we will enrich our worksheet using the SunMoonCalculator algorithm hosted on This algorithm makes use of a package called SunCalc, written in Node.js, which allows us to easily connect to our worksheet using TabPy and Algorithmia.

PART 1: Using Data From USGS, Create a Map in Tableau

  1. Download the data: The USGS maintains an excellent Earthquake Catalog API that we can use to retrieve earthquake event data, including the geolocation, date/time, and magnitude of each event. This data is available in a variety of formats, including GeoJSON, but for now we’ll use a simple CVS export. Let’s start by downloading a small initial dataset: all magnitude 6+ quakes in 2019.
  2. Open Tableau and create a new Worksheet: Select the menu item Data → New Data Source, and click “Text File.” Select the file you just downloaded and take a minute to explore the raw data. When you’re ready, click the “Sheet 1” tab in the bottom left to continue building the Worksheet.

Connect data source in Tableau

Earthquakes data USGS

  1.  Assign geographic roles: Under “Measures” on the left, you should now see the fields “Longitude” and “Latitude.” Tableau automatically assigns these fields Geographic Roles. You can confirm this by clicking on the pill and picking “Geographic Role” to see how it was assigned.

Define latitude and longitude

  1. Convert dimensions: Click the drop-down menu next to “Longitude,” but this time, select “Convert to Dimension.” Repeat this for Latitude.

Convert to dimension

  1. Create a map: Double-click Longitude then Latitude. Tableau will add these fields to Columns and Rows and automatically render the points on a geographic map.

Create a map

  1. Show size of quakes: To get a feel for the size of these quakes, drag the “Mag” measure onto the “Size” box under “Marks”, This should cause each point to size itself according to the magnitude of the earthquake, though the size differences won’t be particularly extreme since these are all magnitude 6-9 events.

Show earthquake size

PART 2: Locally Install and Connect to TabPy, Tableau’s Python Server

Tableau supports a set of functions that you can use to pass expressions to external services. One of these external services, Tableau Python Server (TabPy), makes it possible to use Python scripts in Tableau calculated fields. This can be run as a local server or installed centrally for your organization. For this example, we’ll use the local server.

  1. Download and install TabPy according to the instructions found at:
  2. Ensure that the Algorithmia library is available in the Python environment under which TabPy will be running (‘pip install algorithmia’ in the relevant Python env or venv).
  3. Start the TabPy server and make note of the port on which it is listening (e.g. the console will read “Web service listening on port 9004”).
  4. Return to Tableau and connect it to the TabPy server: from the menu, pick Help → Settings and Performance → Manage External Service Connections. Choose “TabPy/External API” and make sure the port matches the one shown in the prior step. Test the connection and click OK.

Connect to TabPy server

PART 3: Use Algorithmia to Enrich the View

Now that we have a way to execute Python code from Tableau, we can use Algorithmia’s Python Client to connect to run jhurliman/SunMoonCalculator on our data.

  1. If you do not already have a free Algorithmia account, head here and use promo code “tabpy.” This will put an initial $50 into your account, on top of the 5,000 free credits all users receive every month.
  2. On, click on your avatar, then “Manage API Keys.” Copy your default API key beginning with “sim…”
  3. Back in Tableau, click on Analysis → Create Calculated Field: name it “Sun Altitude.”
  4. Paste the following code, replacing ALGORITHMIA_API_KEY with the key from step 2:
import Algorithmia
import math
client = Algorithmia.client('ALGORITHMIA_API_KEY')
algo = client.algo('jhurliman/SunMoonCalculator/0.1.0')
if _arg1[0] and _arg2[0]:
    input = {'lat': _arg1, 'lon': _arg2, 'time': _arg3}
    response = algo.pipe(input)
    rads = response.result['sun_altitude']
    return int(math.degrees(rads))
    return 0

Before continuing, let’s examine the code to understand what’s going on. 

SCRIPT_INT() is Tableau’s wrapper function for executing Python code via TabPy; there are others, such as SCRIPT_REAL(), but in this case we’ll be returning an integer. The first argument is a String containing the actual Python that should be run. The remaining arguments—ATTR([Measure])—extract data from our Worksheet and pass it into Python as _arg1, _arg2, etc. All passed arguments are provided as List objects to the Python script.

Inside the Python code, we begin by bringing in the Algorithmia client library (as well as the standard math library). Next, we construct an Algorithmia Client instance using our user-specific API Key, which can be used to call Algorithmia functions. Then, we create an Algorithm instance, which is a callable reference to a function hosted on Algorithmia…in this case, jhurliman/SunMoonCalculator.

Then, after checking that we’ve been provided non-empty inputs, we assemble the argument into a single dict corresponding to the expected input format shown in jhurliman/SunMoonCalculator’s documentation, and pass this input to the Algorithm via .pipe(input).

All responses from Algorithmia’s functions contain a .result and a .metadata property, so we’ll descend into the .result and grab the ‘sun_altitude’ value (again, as shown on jhurliman/SunMoonCalculator). This is provided in radians, but Tableau renders integers a bit faster, so we’ll convert to degrees and truncate our return value to the whole integer.

  1. Click OK to save the script, then mouseover the “Sun Altitude” measure on the left and click the drop-down arrow. Pick “Convert to Continuous.” This lets Tableau know the resulting angles represent a range of numbers, so it will colorize them on a continuous range in the next step.
  2. Now that we have a way of calculating sun altitude from Lat/Long/Time values, we want to display them on our map. Tableau makes this easy: just drag “Sun Altitude” onto Color. Wait for the calculations to complete.
  3. Our points are now colorized according to time-of-day, but the colors aren’t quite right. Click Color → Edit Colors, pick “Sunrise-Sunset Diverging”, and check “use full color range.”

Adjust color

  1. If all has gone well, your map points will now be blue for nighttime (very negative) values, yellow near dawn/dusk (values near zero), and red for midday (highly positive):

Time of day plotted by color

PART 4 (optional): Making the Integration More Secure, Efficient, and Ergonomic

While this method of embedding scripts is functional, it has a few flaws: the API Key is exposed inside the Worksheet, the Algorithmia client is re-created on each execution, and the script itself is longer than necessary. Fortunately, we can resolve all these problems by making use of a powerful feature in TabPy—deployed functions.

Open up a Python environment, modify the following code to use your ALGORITHMIA_API_KEY (and your local TabPy server port if not 9004), and run it:

import Algorithmia
import tabpy_client
TABPY_SERVER_URL = 'http://localhost:9004/'
DEBUG = True
def algorithmia(algorithm_name, input):
    if DEBUG: print("algorithm: %sinput: %s\n"%(algorithm_name,input))
        client = Algorithmia.client(ALGORITHMIA_API_KEY)
        algo = client.algo(algorithm_name)
        result = algo.pipe(input).result
    except Exception as x:
        if DEBUG: print(x)
        raise Exception(str(x))
    if DEBUG: print("result: %s"%result)
    return result
tabpy_conn = tabpy_client.Client(TABPY_SERVER_URL)
tabpy_conn.deploy('algorithmia', algorithmia, 'Run a function on Algorithmia: algorithmia(algorithm_name, input)', override=True)

Also note the DEBUG value: keeping this on will print some useful debugging information to your TabPy console, but you’ll probably want to re-run this with DEBUG=False in your production environment.

Head back to Tableau and change your Sun Altitude function to read:

import math
algoname = 'jhurliman/SunMoonCalculator/0.1.0'
if _arg1[0] and _arg2[0]:
    input = {'lat': _arg1, 'lon': _arg2, 'time': _arg3}
    rads = tabpy.query('algorithmia', algoname, input)['response']['sun_altitude']
    return int(math.degrees(rads))
    return 0

Now, the API Key is embedded in your TabPy server instead of being exposed in your script, Algorithmia endpoints are easier to call because they don’t require manual construction of the Algorithmia client each time, and errors will be logged to the TabPy console. If desired, you can modify this code further to use a single global Client instance, or to acquire the API Key from env instead hard-coding it into in the deployed function.


Tableau allows the rapid development and deployment of visualizations so you can take insights generated by advanced analytics and put them into the hands of decision makers. With Algorithmia, you can take those visualizations to another level, enhancing them with machine learning models developed in-house or provided by independent developers on the marketplace.

Note: the code samples included in this post can also be found at:

Algorithmia’s AI Layer makes it easy to deploy models as scalable services, regardless of framework, language, or data source. The AI Layer empowers organizations to:

  • Deploy models from a variety of frameworks, languages, and platforms.
  • Connect popular data sources, orchestration engines, and step functions.
  • Scale model inference on multiple infrastructure providers.
  • Manage the ML life cycle with tools to iterate, audit, secure, and govern.

Best Practices in Machine Learning Infrastructure

Topographic map with binary tree

Developing processes for integrating machine learning within an organization’s existing computational infrastructure remains a challenge for which robust industry standards do not yet exist. But companies are increasingly realizing that the development of an infrastructure that supports the seamless training, testing, and deployment of models at enterprise scale is as important to long-term viability as the models themselves. 

Small companies, however, struggle to compete against large organizations that have the resources to pour into the large, modular teams and processes of internal tool development that are often necessary to produce robust machine learning pipelines.

Luckily, there are some universal best practices for achieving successful machine learning model rollout for a company of any size and means. 

The Typical Software Development Workflow

Although DevOps is a relatively new subfield of software development, accepted procedures have already begun to arise. A typical software development workflow usually looks something like this:

Software Development Workflow: Develop > Build > Test > Deploy

This is relatively straightforward and works quite well as a standard benchmark for the software development process. However, the multidisciplinary nature of machine learning introduces a unique set of challenges that traditional software development procedures weren’t designed to address. 

Machine Learning Infrastructure Development

If you were to visualize the process of creating a machine learning model from conception to production, it might have multiple tracks and look something like these:

Machine Learning Development Life Cycle

Data Ingestion

It all starts with data.

Even more important to a machine learning workflow’s success than the model itself is the quality of the data it ingests. For this reason, organizations that understand the importance of high-quality data put an incredible amount of effort into architecting their data platforms. First and foremost, they invest in scalable storage solutions, be they on the cloud or in local databases. Popular options include Azure Blob, Amazon S3, DynamoDB, Cassandra, and Hadoop.

Often finding data that conforms well to a given machine learning problem can be difficult. Sometimes datasets exist, but are not commercially licensed. In this case, companies will need to establish their own data curation pipelines whether by soliciting data through customer outreach or through a third-party service.

Once data has been cleaned, visualized, and selected for training, it needs to be transformed into a numerical representation so that it can be used as input for a model. This process is called vectorization. The selection process for determining what’s important in the dataset for training is called featurization. While featurization is more of an art then a science, many machine learning tasks possess associated featurization methods that are commonly used in practice.

Since common featurizations exist and generating these features for a given dataset takes time, it behooves organizations to implement their own feature stores as part of their machine learning pipelines. Simply put, a feature store is just a common library of featurizations that can be applied to data of a given type. 

Having this library accessible across teams allows practitioners to set up their models in standardized ways, thus aiding reproducibility and sharing between groups.

Model Selection

Current guides to machine learning tend to focus on standard algorithms and model types and how they can best be applied to solve a given business problem. 

Selecting the type of model to use when confronted with a business problem can often be a laborious task. Practitioners tend to make a choice informed by the existing literature and their first-hand experience about which models they’d like to try first. 

There are some general rules of thumb that help guide this process. For example, Convolutional Neural Networks tend to perform quite well on image recognition and text classification, LSTMs and GRUs are among the go-to choices for sequence prediction and language modeling, and encoder-decoder architectures excel on translation tasks.

After a model has been selected, the practitioner must then decide which tool to implement the chosen model. The interoperability of different frameworks has improved greatly in recent years due to the introduction of universal model file formats such as the Open Neural Network eXchange (ONNX), which allow for the porting of models trained in one library to be exported for use in another. 

What’s more, the advent of machine learning compilers such as Intel’s nGraph, Facebook’s Glow, or the University of Washington’s TVM promise the holy grail of being able to specify your model in a universal language of sorts and have it be compiled to seamlessly target a vast array of different platforms and hardware architectures.

Model Training

Model training constitutes one of the most time consuming and labor-intensive stages in any machine learning workflow. What’s more, the hardware and infrastructure used to train models depends greatly on the number of parameters in the model, the size of the dataset, the optimization method used, and other considerations.

In order to automate the quest for optimal hyperparameter settings, machine learning engineers often perform what’s called a grid search or hyperparameter search. This involves a sweep across parameter space that seeks to maximize some score function, often cross-validation accuracy. 

Even more advanced methods exist that focus on using Bayesian optimization or reinforcement learning to tune hyperparameters. What’s more, the field has recently seen a surge in tools focusing on automated machine learning methods, which act as black boxes used to select a semi-optimal model and hyperparameter configuration. 

After a model is trained, it should be evaluated based on performance metrics including cross-validation accuracy, precision, recall, F1 score, and AUC. This information is used to inform either further training of the same model or the next iterate in the model selection process. Like all other metrics, these should be logged in a database for future use.


Model visualization can be integrated at any point in the machine learning pipeline, but proves especially valuable at the training and testing stages. As discussed, appropriate metrics should be visualized after each stage in the training process to ensure that the training procedure is tending towards convergence. 

Many machine learning libraries are packaged with tools that allow users to debug and investigate each step in the training process. For example, TensorFlow comes bundled with TensorBoard, a utility that allows users to apply metrics to their model, view these quantities as a function of time as the model trains, and even view each node in a neural network’s computational graph.

Model Testing

Once a model has been trained, but before deployment, it should be thoroughly tested. This is often done as part of a CI/CD pipeline. Each model should be subjected to both qualitative and quantitative unit tests. Many training datasets have corresponding test sets which consist of hand-labeled examples against which the model’s performance can be measured. If a test set does not yet exist for a given dataset, it can often be beneficial for a team to curate one. 

The model should also be applied to out-of-domain examples coming from a distribution outside of that on which the model was trained. Often, a qualitative check as to the model’s performance, obtained by cross-referencing a model’s predictions with what one would intuitively expect, can serve as a guide as to whether the model is working as hoped. 

For example, if you trained a model for text classification, you might give it the sentence “the cat walked jauntily down the street, flaunting its shiny coat” and ensure that it categorizes this as “animals” or “sass.”


After a model has been trained and tested, it needs to be deployed in production. Current practices often push for the deploying of models as microservices, or compartmentalized packages of code that can be queried through and interact via API calls.

Successful deployment often requires building utilities and software that data scientists can use to package their code and rapidly iterate on models in an organized and robust way such that the backend and data engineers can efficiently translate the results into properly architected models that are deployed at scale. 

For traditional businesses, without sufficient in-house technological expertise, this can prove a herculean task. Even for large organizations with resources available, creating a scalable deployment solution is a dangerous, expensive commitment. Building an in-house solution like Uber’s Michelangelo just doesn’t make sense for any but a handful of companies with unique, cutting-edge ML needs that are fundamental to their business. 

Fortunately, commercial tools exist to offload this burden, providing the benefits of an in-house platform without signing the organization up for a life sentence of proprietary software development and maintenance. 

Algorithmia’s AI Layer allows users to deploy and serve models from any framework, language, or platform and connect to most all data sources. We scale model inference on multi-cloud infrastructures with high efficiency and enable users to continuously manage the machine learning life cycle with tools to iterate, audit, secure, and govern.

No matter where you are in the machine learning life cycle, understanding each stage at the start and what tools and practices will likely yield successful results will prime your ML program for sophistication. Challenges exist at each stage, and your team should also be primed to face them. 

Face the challenges

From Crawling to Sprinting: Advances in Natural Language Processing 

Gif highlighting different Natural Language Processing branches e.g. prediction, translation

Natural language processing (NLP) is one of the fastest evolving branches in machine learning and among the most fundamental. It has applications in diplomacy, aviation, big data sentiment analysis, language translation, customer service, healthcare, policing and criminal justice, and countless other industries.

NLP is the reason we’ve been able to move from CTRL-F searches for single words or phrases to conversational interactions about the contents and meanings of long documents. We can now ask computers questions and have them answer. 

Algorithmia hosts more than 8,000 individual models, many of which are NLP models and complete tasks such as sentence parsing, text extraction and classification, as well as translation and language identification. 

Allen Institute for AI NLP Models on Algorithmia 

The Allen Institute for Artificial Intelligence (Ai2), is a non-profit created by Microsoft co-founder Paul Allen. Since its founding in 2013, Ai2 has worked to advance the state of AI research, especially in natural language applications. We are pleased to announce that we have worked with the producers of AllenNLP—one of the leading NLP libraries—to make their state-of-the-art models available with a simple API call in the Algorithmia AI Layer.

Among the algorithms new to the platform are:

  • Machine Comprehension: Input a body of text and a question based on it and get back the answer (strictly a substring of the original body of text).
  • Textual Entailment: Determine whether one statement follows logically from another
  • Semantic role labeling: Determine “who” did “what” to “whom” in a body of text

These and other algorithms are based on a collection of pre-trained models that are published on the AllenNLP website.  

Algorithmia provides an easy-to-use interface for getting answers out of these models. The underlying AllenNLP models provide a more verbose output, which is aimed at researchers who need to understand the models and debug their performance—this additional information is returned if you simply set debug=True.

The Ins and Outs of the AllenNLP Models 

Machine Comprehension: Create natural-language interfaces to extract information from text documents. 

This algorithm provides the state-of-the-art ability to answer a question based on a piece of text. It takes in a passage of text and a question based on that passage, and returns a substring of the passage that is guessed to be the correct answer.

This model could feature into the backend of a chatbot or provide customer support based on a user’s manual. It could also be used to extract structured data from textual documents, such as a collection of doctors’ reports could be turned into a table that says (for every report) the patient’s concern, what the patient should do, and when they should schedule a follow-up appointment.

Machine Comprehension screenshot

Screenshot from Machine Comprehension model on Algorithmia.

Entailment: This algorithm provides state-of-the-art natural language reasoning. It takes in a premise, expressed in natural language, and a hypothesis that may or may not follow up from. It determines whether the hypothesis follows from the premise, contradicts the premise, or is unrelated. The following is an example:

The input JSON blob should have the following fields:

  • premise: a descriptive piece of text
  • hypothesis: a statement that may or may not follow from the premise of the text

Any additional fields will pass through into the AllenNLP model.

The following output field will always be present:

  • contradiction: Probability that the hypothesis contradicts the premise
  • entailment: Probability that the hypothesis follows from the premise
  • neutral: Probability that the hypothesis is independent from the premise

Screenshot from Entailment model on Algorithmia.

Semantic role labeling: This algorithm provides state-of-the-art natural language reasoning—decomposing a sentence into a structured representation of the relationships it describes.

The concept of this algorithm is considering a verb and the entities involved in it as its arguments (like logical predicates). The arguments describe who or what does the action of this verb, to whom or what it is done, etc.

Semantic role labeling

Screenshot from Semantic role labeling model on Algorithmia.

NLP Moving Forward

NLP applications are rife in everyday life, and applications will only continue to expand and improve because the possibilities of a computer understanding written and spoken human language and executing on it are endless. 

CTA for browsing NLP algorithms

Use Event-Driven Model Serving on Algorithmia

“Our machine learning infrastructure is a great big Frankenstein of one-offs,” said one data scientist at our Seattle Roadshow. Heads nodded. Every time his data-driven organization needs to integrate with a new system, software development teams hardcode dependencies and schedule jobs inside their model deployment system, creating a collection of ad-hoc integrations that runs a very real risk of breaking if either connected system changes.

Event-driven model serving

Hardcoded connectors to multiple applications.

Machine learning workflows are complex, but connecting all the pieces doesn’t need to be. Data generated by many sources feed many different models. These, in turn, are used by a variety of applications (or even by other models in a pipeline). That’s a lot of moving pieces for any one data scientist to consider. This article will discuss the strategies needed to implement an event-driven architecture.

What Is An Event-Driven Model?

Event-driven programming means that the program execution flow is determined by user-action event processing. Whether it’s a key press, mouse click, or any other user input, the program will react according to the action. For example, an application could be programmed to send event notifications every time a user clicks a certain button. An event-driven approach can be valuable, but it can also be tedious. Next, we’ll discuss some of the options for event-driven architecture.

Loosely Coupled, Event-Driven: The Pub/Sub Pattern

Hardcoding every interaction between a model serving platform and its connected systems is far more work than data scientists should have to handle. Loosely coupled, event-driven architecture, such as the publish/subscribe (pub/sub) pattern, are asynchronous, message-oriented notification patterns commonly found in traditional enterprise software. It’s time they became more common in machine learning.

Event driven model serving | the pub/sub model

The pub/sub model

In a pub/sub model, one system acts as a publisher, sending messages to a message broker, such as Amazon SNS. Through that message broker, subscriber systems explicitly subscribe to a channel, and the broker forwards and verifies delivery of publisher messages, which can then be used by subscribers as event triggers.

Traditional vs. Pub/Sub Implementation

The following is a simple user story describing a routine process an organization might want to follow as it ingests scanned documents:

As a records manager, when new documents are added to an S3 bucket, I want to run them through an OCR model.

A traditional implementation might look like this:

  1. Create a document insertion trigger in the database.
  2. Create a custom, hardcoded function to invoke a model run via API, including delivery verification, failure handling, and retry logic.
  3. Maintain that function as end-user requirements change.
  4. Deprecate that function when it is no longer needed.

Over time, as other business apps want to build onto the document insertion event, a fifth step will be added:

  1. Repeat steps 2 through 4 for every other application of the trigger.

Pros of traditional implementation:

  • Near real-time communication
  • No scheduling or communication burden on ML deployment system

Cons of traditional implementation:

  • Database development team becomes a service organization.
  • Database development team must have knowledge of all downstream use cases involving document ingestion.
  • Architectural changes on either side of the exchange could break existing integrations.
Event Driven Model: A pub/sub implementation using Algorithmia's Event Listeners and an SQS queue

A pub/sub implementation using the Enterprise AI Layer’s Event Listeners and an SQS queue

The pub/sub implementation might look like this:

  1. Create a document insertion trigger in the database.
  2. When the trigger fires, send an event notification to a message broker.

At that point, the owners of the model deployment system would do the following:

  1. Subscribe to the published event feed.
  2. Build a function to invoke a model run via API.

Any other systems wishing to use that event as a trigger would follow the same process independently.

Pros of pub/sub implementation:

  • Near real-time communication
  • No scheduling or communication burden on ML deployment systems
  • Database development team can focus on database development
  • Downstream development teams can build without dependencies
  • Architectural changes far less likely to break existing integrations

The pub/sub pattern’s loose coupling is highly scalable, centralizing the burden of communication and removing it from dependent apps. It is also extremely flexible. By abstracting communications between publishers and subscribers, each side operates independently. This reduces the overhead of integrating any number of systems and allows publishers and subscribers to be swapped at any time, with no impact on performance.

Pub/sub advantages:

  • Flexible, durable integration: Upgrades to component systems won’t break communications workflows. As long as a published system continues to send messages to the appropriate queue, the receiving systems shouldn’t care how it generates those messages.
  • Developer independence: Decoupling publishers and subscribers means teams can iterate at their own paces and ship updates to components without introducing breaking changes.
  • Increased performance: Removing messaging from the application allows Ops to dedicate infrastructure to queueing, removing the processing burden from system components.
  • Modularity: Because components integrate through an intermediary message queue, any component can be added or swapped at any time without impacting the system.
  • Scalability: Since any number of applications can subscribe to a publisher’s feed, multiple systems can react to the same events.

Using Pub/Sub With Algorithmia’s Event Listeners

Screenshot from Algorithmia platform of Event Listeners

Screenshot from Algorithmia’s Enterprise AI Layer platform.

The Enterprise AI Layer provides configurable event listeners, also called an event handler, so users can trigger actions based on input from pub/sub event-driven database systems. In concert with the AI Layer’s automatic API generation, integrated versioning, and multi-language SDKs, this means your ML infrastructure is able to grow. Systems can trigger any version of any model, written in any programming language, trained with any framework. That’s critical as an ML program grows in size and scope and touches every part of a business.

The first of our supported event sources is Amazon SQS, with more to come. Read the Event Listeners tutorial in our Developer Center, give it a try, and let us know what you think!