Algorithmia Blog - Deploying AI at scale

Deploying on Algorithmia with ONNX Runtime

Topographic map with three trails leading to a central point illustrating that a lot of paths lead to productionization

Simplifying model deployment 

Deploying models should be uncomplicated. Machine learning only delivers value once a model is in production, so Algorithmia aims to get you to deployment as quickly and painlessly as possible. Fortunately, there are many paths to get there.

A lot of roads lead to productionization

As a data scientist or machine learning engineer, you’ve learned to use the best tool for the job. You might be using PyTorch for easy debugging and its handling of variable input lengths, or Caffe2 for deploying on mobile or edge devices. Unfortunately, these and other deep learning frameworks each use their own serialization format, so you must run inference with the same framework you used to save the model.

The Open Neural Network Exchange (ONNX) format aims to solve this issue by creating a unified serialization format for deep learning frameworks such as Caffe2, Microsoft Cognitive Toolkit, MXNet, and PyTorch with connectors to more languages and frameworks.

Of course, with the Algorithmia platform, you can deploy models trained in various deep learning frameworks and serve them anywhere you can consume an API, in the language of your choice!

However, if you already have a model saved in the ONNX format, you can easily deploy it on Algorithmia using the ONNX Runtime library. The ONNX Runtime module was created by Microsoft in order to make it easy to utilize ONNX in a variety of languages.

The `onnxruntime-gpu` package is available on PyPI, and by following the steps below you can deploy your ONNX model in a Python algorithm in just a few minutes.

ONNX Runtime walkthrough

If you’ve never created an algorithm before, then we suggest going through our Getting Started Guide. Once you work through that, or if you already have experience deploying your models on Algorithmia, you can create your Python algorithm:

And once that’s created, you can add ONNX Runtime (the `onnxruntime-gpu` package in this example) to your algorithm’s dependency file:

Note that because we created a GPU-enabled algorithm, we are installing the GPU version of ONNX Runtime (`onnxruntime-gpu`) from PyPI rather than the CPU version (`onnxruntime`).
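As a rough illustration, the dependency file might look something like the fragment below. This assumes the file follows pip's requirements format; the `algorithmia` line and the absence of version pins are assumptions for the sketch, so keep whatever defaults your algorithm was created with.

```text
algorithmia
onnxruntime-gpu
```

For a CPU-only algorithm, the second line would be `onnxruntime` instead.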

And now, import `onnxruntime` as usual in your algorithm:

Note that we added a few other imports to process our image data, but to run an ONNX model, all you need to add to your algorithm is `import onnxruntime`.

In the algorithm code, you’ll notice that we pass the model, which we loaded using the Algorithmia Data API, into `onnxruntime.InferenceSession()`.
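The pattern described above can be sketched roughly as follows. This is a minimal sketch, not the code from the original demo: the `MODEL_URI` path, the helper names `load_session` and `predict`, and the assumption that the model takes a single float32 tensor are all hypothetical. The imports are deferred into the functions only so the helpers can be tried locally without the GPU packages installed; in a real algorithm you would import `onnxruntime` at the top as usual.

```python
# Hypothetical Data API location of the model file; replace with your own.
MODEL_URI = "data://YOUR_USERNAME/onnx_models/model.onnx"

# Cached so the model is loaded once per algorithm instance, not on every request.
_session = None


def load_session(model_uri=MODEL_URI):
    """Download the model through the Data API and open it with ONNX Runtime."""
    import Algorithmia
    import onnxruntime

    client = Algorithmia.client()
    # getFile() downloads to a local temp file; .name is its path on disk.
    model_path = client.file(model_uri).getFile().name
    # Recent onnxruntime-gpu releases expect an explicit list of execution providers.
    return onnxruntime.InferenceSession(
        model_path, providers=onnxruntime.get_available_providers()
    )


def predict(session, batch):
    """Run one forward pass and return the model's first output."""
    input_name = session.get_inputs()[0].name
    # Passing None as the first argument requests every model output.
    outputs = session.run(None, {input_name: batch})
    return outputs[0]


def apply(input):
    """Algorithmia entry point. Assumes `input` is a nested list shaped like the model input."""
    import numpy as np

    global _session
    if _session is None:
        _session = load_session()
    # Assumption: the model expects float32 input.
    batch = np.asarray(input, dtype=np.float32)
    return predict(_session, batch).tolist()
```

Keeping the session in a module-level variable means later calls to `apply` reuse the already loaded model instead of downloading it again.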

Once you publish your algorithm, you can make inferences on data passed to your model via our API. Check out our client guides to learn how to serve your model via our API in the language of your choice.
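For instance, with the Python client a call to the published algorithm might look like the sketch below. The API key placeholder, the algorithm path `YOUR_USERNAME/onnx_demo/0.1.0`, and the input payload are hypothetical; `call_algorithm` and `main` are helper names introduced only for this example.

```python
def call_algorithm(client, algo_path, payload):
    """Send `payload` to a published algorithm and return its result."""
    return client.algo(algo_path).pipe(payload).result


def main():
    import Algorithmia  # the Algorithmia Python client from PyPI

    client = Algorithmia.client("YOUR_API_KEY")
    # Hypothetical algorithm path in the form owner/name/version.
    result = call_algorithm(client, "YOUR_USERNAME/onnx_demo/0.1.0", [[0.1, 0.2, 0.3]])
    print(result)
```

Calling `main()` with a real API key and algorithm path sends the request; the other client libraries follow the same client, algorithm, pipe sequence.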

To see this example running on our public instance of Algorithmia, check out the full demo for deploying your ONNX model on Algorithmia, as well as the docs in our Developer Center.

Algorithmia makes it simple to deploy your ONNX model into production in a few easy steps. We handle scaling and dependency management, and offer a centralized repository for your models. Let us know how we can help bridge the gap between your data scientists and DevOps teams and enable your data science teams to deploy their models today.