Learn how to use Algorithmia and Determined AI together in a streamlined workflow to train a deep learning model and put it into production at scale.

Algorithmia and Determined AI: How to train and deploy deep learning models with the Algorithmia-Determined integration

Editor’s note: Today’s post is a guest post by Hoang Phan, Solutions Engineer at Determined AI, in collaboration with Aslı Sabancı, Applied Machine Learning Engineer at Algorithmia. It was originally published on the Determined AI blog.

Determined AI is an open-source deep learning training platform that makes building models fast and easy, featuring seamless distributed training and efficient hyperparameter tuning. Because Determined focuses on the training portion of model development, it works with other best-of-breed tools for serving models once they have been trained. Algorithmia, a leading machine learning operations (MLOps) platform, integrates with Determined so that users of both platforms can train their models and then serve them at scale, delivering value from AI for their businesses.

This blog post will show you how to use Algorithmia and Determined AI together in a streamlined workflow to train a deep learning model and then put it into production at scale.

Introduction to the Algorithmia and Determined AI Integration

The machine learning pipeline consists of many components, in particular model development and model serving. These components typically present a challenge to both infrastructure and model developer teams, because they require balancing the management of complex hardware against the flexibility users need to develop and serve models. Determined AI and Algorithmia both tackle these challenges by simplifying management of the underlying infrastructure for training and serving models, respectively, while still enabling advanced capabilities in these areas for users.

Determined AI provides a platform to manage a cluster that can train your models quickly with distributed training, tune them with advanced hyperparameter search, and manage the most performant checkpoints for your trained models. Although using these checkpoints locally is straightforward, most users will want to deploy their best models to a more scalable endpoint. This is where Algorithmia comes in.

Algorithmia is MLOps software that manages all stages of the ML lifecycle within existing operational processes. With Algorithmia, teams and enterprises can put models into production quickly, securely, and cost-effectively. Algorithmia automates ML deployment, optimizes collaboration between operations and development, leverages existing SDLC and CI/CD systems, and provides advanced security and governance—so companies can get their models out of the lab and into production, delivering value from AI for their businesses.

Algorithmia enables fast deployment of serverless code, which simplifies deploying your machine learning models at scale. Because Algorithmia manages the serving hardware behind the scenes, you simply provide the code you’d like to execute, and Algorithmia scales that execution on CPU- and GPU-enabled hardware.

The Algorithmia-Determined integration facilitates the interaction between Determined and Algorithmia. Instead of only running inference locally or manually setting up an endpoint and maintaining it, the Algorithmia-Determined integration allows you to seamlessly deploy a model trained on Determined to an endpoint on Algorithmia. This provides a few key advantages:

  • Reduce complexity: Users can easily create a pipeline that trains and tunes a model in Determined, then deploy it to a scalable endpoint without needing to maintain additional complex infrastructure on the serving end.
  • Manage models effectively: Model training and tuning happen natively in Determined, where users can easily query the most performant models and push them, manually or automatically, to a serving endpoint.
  • Standardize inference code: Code used locally to test a checkpoint during training can be reused for the serving endpoint after training is complete.

The way the integration works is straightforward: Users continue to train and tune models in Determined as they wish. Then, once the model is trained, the final model checkpoint is pushed to Algorithmia, along with inference code, to create the serving endpoint.

Train on Determined, deploy with Algorithmia

Determined provides best-in-class model training, producing model artifacts that can be used in many downstream applications. Paired with Algorithmia, you can train a deep learning model and serve it at scale in a few easy steps.

In this example, we’ll show how you can get started training an object detection model on Determined and then create an endpoint on Algorithmia to scale out serving the model.

Example overview

The example first reviews how to deploy a Determined cluster and train your first model on Determined. Once you have a model trained, you can retrieve the best checkpoint for the model with Determined’s checkpoint API:

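from determined.experimental import Determined

# Fetch the best checkpoint of the experiment and register it as a model version.
checkpoint = Determined().get_experiment(experiment_id).top_checkpoint()
model = Determined().create_model(MODEL_NAME)
model.register_version(checkpoint.uuid)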
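To make the idea of a "top" checkpoint concrete, here is a minimal, library-free sketch of picking the best checkpoint by a validation metric. The `CheckpointRecord` type, its fields, and the `pick_top_checkpoint` helper are hypothetical names used only for illustration; they are not part of Determined's API, which performs this selection for you.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class CheckpointRecord:
    """Hypothetical stand-in for a checkpoint and its validation metric."""
    uuid: str
    validation_metric: float


def pick_top_checkpoint(
    checkpoints: Iterable[CheckpointRecord],
    smaller_is_better: bool = True,
) -> CheckpointRecord:
    """Return the checkpoint with the best validation metric."""
    candidates = list(checkpoints)
    if not candidates:
        raise ValueError("no checkpoints to choose from")
    chooser = min if smaller_is_better else max
    return chooser(candidates, key=lambda c: c.validation_metric)


# Illustrative values only, e.g. validation loss per checkpoint.
history = [
    CheckpointRecord("ckpt-a", 0.42),
    CheckpointRecord("ckpt-b", 0.31),
    CheckpointRecord("ckpt-c", 0.37),
]
best = pick_top_checkpoint(history)
```

For a loss-like metric, smaller is better; for a metric such as mean average precision, the same helper would be called with `smaller_is_better=False`.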

Then, it’s a straightforward process to run prediction locally using this checkpoint and the example’s predict function:

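from determined.experimental import Determined
from predict import predict  # predict.py ships with the example code

# Load the registered checkpoint for the model from Determined.
model = Determined().get_model(MODEL_NAME)
trial = model.get_version().load()
inference_model = trial.model

# Run inference on a local test image.
predict(inference_model, 'test.jpg', inference="local")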
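One way a single predict function can serve both local testing and a hosted endpoint is to switch only the image-loading step on the `inference` argument. The sketch below illustrates that pattern with the standard library only. It is not the example's actual `predict.py`: the `"hosted"` mode name, the loader functions, and the stub model are all assumptions made for illustration.

```python
import os
import tempfile
from typing import Any, Callable, Dict


def load_local(path: str) -> bytes:
    """Read image bytes from the local filesystem."""
    with open(path, "rb") as handle:
        return handle.read()


def load_hosted(path: str) -> bytes:
    """Placeholder: a real endpoint would fetch bytes from hosted storage."""
    raise NotImplementedError("hosted loading is outside the scope of this sketch")


# Hypothetical mapping from inference mode to image loader.
IMAGE_LOADERS: Dict[str, Callable[[str], bytes]] = {
    "local": load_local,
    "hosted": load_hosted,
}


def predict(model: Callable[[bytes], Any], img_path: str, inference: str = "local") -> Any:
    """Load the image for the given mode, then run the same model code on it."""
    if inference not in IMAGE_LOADERS:
        raise ValueError(f"unknown inference mode: {inference!r}")
    image_bytes = IMAGE_LOADERS[inference](img_path)
    return model(image_bytes)


def byte_count_model(image_bytes: bytes) -> Dict[str, int]:
    """Stub model that only reports how many bytes it received."""
    return {"num_bytes": len(image_bytes)}


# Demo: write a tiny placeholder file and run the local path end to end.
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(b"\xff\xd8\xff\xe0")
    demo_path = tmp.name
try:
    demo_result = predict(byte_count_model, demo_path, inference="local")
finally:
    os.remove(demo_path)
```

Keeping the model call identical across modes is what lets the code used to test a checkpoint be reused unchanged behind the serving endpoint.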

However, what we really want to do is scale out our serving. To do this, we can create a new algorithm on Algorithmia:

algo_utility.create_algorithm("pytorch-1.5.x")

Then, we can clone it locally:

algo_utility.clone_algorithm_repo()

With the repo cloned, we can update the serving code with our predict function and push it to finalize the serving endpoint:

algo_utility.push_algo_script_with_dependencies(filenames=[
    f"{ALGORITHM_NAME}.py",
    "predict.py",
])
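Python algorithms on Algorithmia expose an `apply(input)` function as their entry point. The following is a minimal sketch of what the pushed algorithm script might look like, assuming the input is a dictionary with an `img_path` key as in the call shown later. The `load_model` helper, the stub model, and the inline `predict` stand-in are hypothetical; the real script would import `predict` from `predict.py` and load the trained checkpoint.

```python
from functools import lru_cache
from typing import Any, Callable, Dict


def predict(model: Callable[[str], Dict[str, Any]], img_path: str) -> Dict[str, Any]:
    """Stand-in for the predict function that would live in predict.py."""
    return model(img_path)


@lru_cache(maxsize=1)
def load_model() -> Callable[[str], Dict[str, Any]]:
    """Load the model once and reuse it across requests.

    Placeholder: a real algorithm would load the trained checkpoint here.
    """
    def stub_model(img_path: str) -> Dict[str, Any]:
        return {"img_path": img_path, "boxes": []}

    return stub_model


def apply(input: Dict[str, Any]) -> Dict[str, Any]:
    """Entry point invoked for each request to the algorithm."""
    if not isinstance(input, dict) or "img_path" not in input:
        raise ValueError('expected input like {"img_path": "..."}')
    return predict(load_model(), input["img_path"])


sample_output = apply({"img_path": "test.jpg"})
```

Caching the loaded model means the potentially expensive checkpoint load happens once per running instance rather than on every request.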

Once the algorithm code has been pushed, we can make a prediction against the endpoint, which Algorithmia hosts on the back end:

algo_result = algo_utility.call_latest_algo_version({
    "img_path": TEST_IMG_PATH
})
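Under the hood, calling an algorithm amounts to an authenticated HTTP POST with a JSON body. The sketch below builds such a request with the standard library, following Algorithmia's REST convention of `/v1/algo/<owner>/<name>/<version>` and a `Simple` API-key authorization header. The owner, algorithm name, version, image path, and API key are placeholders, and the request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Any, Dict

API_BASE = "https://api.algorithmia.com/v1/algo"


def build_algo_request(
    owner: str,
    name: str,
    version: str,
    payload: Dict[str, Any],
    api_key: str,
) -> urllib.request.Request:
    """Build (but do not send) a POST request that calls an algorithm."""
    url = f"{API_BASE}/{owner}/{name}/{version}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Simple {api_key}",
        },
    )


# Placeholder values for illustration only.
request = build_algo_request(
    owner="demo_user",
    name="object_detection",
    version="0.1.0",
    payload={"img_path": "data://demo_user/images/test.jpg"},
    api_key="YOUR_API_KEY",
)
# Passing `request` to urllib.request.urlopen would perform the actual call.
```

In practice a helper such as the one used above, or an Algorithmia client library, wraps this step, so you rarely need to build the request by hand.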

Get started with the Algorithmia-Determined integration

Ready to try out Determined and Algorithmia for yourself? Get started with our most recent example, which walks you through training an object detection model on the Determined platform and serving the model on Algorithmia.

If you’re interested in learning more about Determined, check us out on GitHub or join our Slack community. We’re always looking to provide more examples to help users integrate Determined with other tools, so if you have any requests or suggestions, let us know!

And to learn more about Algorithmia, check out this additional step-by-step tutorial for deploying a model from a Jupyter notebook into production at scale and explore the product in greater depth.

Aslı Sabancı