XGBoost

Welcome to deploying your XGBoost model on Algorithmia!

There are multiple ways to deploy your models on Algorithmia, depending on your workflow. This guide first shows you how to use the web UI to create and deploy your algorithm. If you prefer a code-only approach, see the sample notebook tutorial at the end, which trains a model and deploys it to Algorithmia from scratch.

Deploying through Web UI

Prerequisites

Before you get started deploying your pre-trained model on Algorithmia, there are a few things you’ll want to do first:

Save your Pre-Trained Model

You’ll want to do the training and saving of your model on your local machine, or the platform you’re using for training, before you deploy it to production on the Algorithmia platform.
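As a minimal sketch of the saving step, the snippet below pickles an object to disk and loads it back. The dictionary `stand_in_model` is a hypothetical placeholder so that the example runs without XGBoost installed; the assumption is that you would pass your own trained model object to `save_model` instead.

```python
import os
import pickle
import tempfile

# Hypothetical placeholder for a trained model object
stand_in_model = {"name": "stand-in", "weights": [0.5, 1.5]}


def save_model(model, path):
    # Serialize the model object to a local file
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    # Deserialize the model object from a local file
    with open(path, "rb") as f:
        return pickle.load(f)


model_path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model(stand_in_model, model_path)
restored = load_model(model_path)
print(restored == stand_in_model)  # prints: True
```

The resulting file is what you upload to your data collection in the next steps.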

Create a Data Collection

Host your data where you want and serve it to your model with Algorithmia’s Data API.

In this guide we’ll use Algorithmia’s Hosted Data Collection, but you can host it in S3 or Dropbox as well. Alternatively, if your data lies in a database, check out how we connected to a DynamoDB database.

First, you’ll want to create a data collection to host any data associated with your model and your XGBoost model itself.

  • Log into your Algorithmia account and create a data collection via the Data Collections page.

  • Click on “Add Collection” under the “My Collections” section.

  • After you create your collection you can set the read and write access on your data collection.

Create a data collection

For more information check out: Data Collection Types.

Note that you can also use the Data API to create data collections and upload files.
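A minimal sketch of that Data API route, assuming the Algorithmia Python client's `client.dir(...)` and `client.file(...)` objects with their `exists()`, `create()`, and `putFile()` methods. The function name `upload_model` and the paths in the usage comment are hypothetical; the client is passed in as a parameter, so the function itself makes no assumption about how you authenticate.

```python
def upload_model(client, local_path, collection_uri, file_name):
    """Create the data collection if needed, then upload a local file to it.

    `client` is expected to be an Algorithmia client, e.g. one created with
    Algorithmia.client("YOUR_API_KEY").
    """
    collection = client.dir(collection_uri)
    if not collection.exists():
        collection.create()

    # Build the full data:// path of the uploaded file
    remote_uri = "{}/{}".format(collection_uri.rstrip("/"), file_name)
    client.file(remote_uri).putFile(local_path)
    return remote_uri


# Example usage (requires the Algorithmia client and a valid API key):
# import Algorithmia
# client = Algorithmia.client("YOUR_API_KEY")
# upload_model(client, "model.pkl", "data://YOUR_USERNAME/xgboost_demo", "model.pkl")
```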

Host Your Model File

Next, upload your serialized model to your newly created data collection.

  • Upload your model by clicking the “Drop files here to upload” box.

  • Note the path to your data collection and the zip folder: data://user_name/collections_name/model.zip
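The path format noted above can be assembled with a small helper. This is only an illustrative sketch: `build_data_uri` is a hypothetical function, not part of the Algorithmia client.

```python
def build_data_uri(user_name, collection_name, file_name):
    # Hosted data paths follow data://<user>/<collection>/<file>
    return "data://{}/{}/{}".format(user_name, collection_name, file_name)


print(build_data_uri("user_name", "collections_name", "model.zip"))
# prints: data://user_name/collections_name/model.zip
```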

Create your Algorithm

Hopefully you’ve already followed along with the Getting Started Guide for algorithm development. If not, you might want to check it out in order to understand the various permission types, how to enable a GPU environment, and how to use the CLI.

Once you’ve gone through the Getting Started Guide, you’ll notice that when you’ve created your algorithm, there is boilerplate code in the editor that returns “Hello” and whatever you input to the console.

The main thing to note about the algorithm is that it’s wrapped in the apply() function.

The apply() function defines the entry point of the algorithm. Every algorithm exposes the same apply() function, which standardizes how algorithms are called. This makes them easy to chain and helps authors design algorithms that are predictable and easy for end users to leverage.
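To illustrate why a shared entry point makes algorithms easy to chain, here is a minimal sketch with two stand-alone functions. `greet` and `shout` are hypothetical stand-ins for the apply() functions of two separate algorithms; on the platform, each algorithm would define its own apply() in its own module.

```python
def greet(input):
    # Mirrors the boilerplate behavior: return "hello" plus the input
    return "hello {}".format(input)


def shout(input):
    # A second stand-in algorithm that post-processes text
    return input.upper()


# Because both take one input and return one output, they chain directly
print(shout(greet("world")))  # prints: HELLO WORLD
```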

Go ahead and remove the boilerplate code below that’s inside the apply() function on line 6, but leave the apply() function intact:

Algorithm console Python

Set your Dependencies

Now set the dependencies that your model relies on.

  • Click on the “Dependencies” button at the top right of the UI, list your packages below the required ones already listed, and then click “Save Dependencies” in the bottom right corner.

Set your dependencies

numpy
xgboost
joblib

Load your Model

Here is where you load and run your model which will be called by the apply() function.

When you load your model, our recommendation is to preload your model in a separate function external to the apply() function.

This is because loading a model for the first time can take a while, depending on the file size.

On all subsequent calls, only the apply() function runs, which is much faster because your model is already loaded.
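The preloading pattern can be sketched without any model library. In this hypothetical example, `load_model` stands in for downloading and deserializing a model file, and `load_count` exists only to show that the load runs once no matter how many times apply() is called.

```python
import time

load_count = 0


def load_model():
    # Stand-in for downloading and deserializing a model file
    global load_count
    load_count += 1
    time.sleep(0.01)  # simulate a slow load
    return lambda rows: [sum(row) for row in rows]


# Runs once, when the module is first loaded
model = load_model()


def apply(input):
    # Runs on every call and reuses the already-loaded model
    return model(input)


for _ in range(3):
    apply([[1, 2], [3, 4]])

print(load_count)  # prints: 1
```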

If you are authoring an algorithm, avoid using the ‘.my’ pseudonym in the source code. When the algorithm is executed, ‘.my’ will be interpreted as the user name of the user who called the algorithm, rather than the author’s user name.

Note that you always want to create valid JSON input and output in your algorithm. For examples see the Algorithm Development Guides.
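As a sketch of what valid JSON output means in practice: array-like values such as NumPy arrays are not JSON-serializable until they are converted to plain lists. The example uses only the standard library; `to_json_safe` is a hypothetical helper, and its `tolist()` check is an assumption intended to cover array-like objects.

```python
import json


def to_json_safe(value):
    # Array-like objects (e.g. numpy arrays) expose tolist(); use it if present
    if hasattr(value, "tolist"):
        return value.tolist()
    return value


def apply(input):
    # Stand-in prediction step: sum each row of the input
    predictions = [sum(row) for row in input["rows"]]
    return {"predictions": to_json_safe(predictions)}


result = apply({"rows": [[1, 2], [3, 4]]})
print(json.dumps(result))  # prints: {"predictions": [3, 7]}
```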

Using the SavedModel Method

This is where we’ll show how to deploy your saved model to make predictions on the sample data.

import Algorithmia
import pickle
import numpy as np
import xgboost as xgb  # needed so pickle can rebuild the XGBoost model

client = Algorithmia.client()

def load_model():
    # Download the pickled model from the data collection and deserialize it
    file_path = "data://YOUR_USERNAME/scikit_xgboost_demo/xgboost_boston_model"
    model_file = client.file(file_path).getFile().name
    with open(model_file, 'rb') as f:
        return pickle.load(f)

# Load the model once, outside apply(), so later calls can reuse it
model = load_model()

def process_input(input):
    # Create numpy array from csv file passed as input in apply()
    if not isinstance(input, str) or not input.startswith('data:'):
        raise ValueError("Expected a data:// path to a csv file")
    file_url = client.file(input).getFile().name
    try:
        return np.genfromtxt(file_url, delimiter=',')
    except Exception as e:
        raise ValueError("Could not create numpy array from data: {}".format(e))

# API calls will begin at the apply() method, with the request body passed as 'input'
# For more details, see https://algorithmia.com/developers/algorithm-development/languages
def apply(input):
    # Expects a csv file
    np_data = process_input(input)
    prediction = model.predict(np_data)
    # Convert the numpy array to a list so the output is valid JSON
    return prediction.tolist()

Publish your Algorithm

Last is publishing your algorithm. The best part of deploying your model on Algorithmia is that users can access it via an API that takes only a few lines of code to use! Here is what you can set when publishing your algorithm:

On the upper right hand side of the algorithm page you’ll see a purple button “Publish” which will bring up a modal:

Publish an algorithm

In this modal, you’ll see a Changes tab, a Sample I/O tab, and one called Versioning.

If you don’t recall from the Getting Started Guide how to go through the process of publishing your model, check that out before you finish publishing.

For a better idea of what a finished algorithm that loads an XGBoost model looks like, check out: scikitlearnxgboostdemo.

That’s it for hosting your XGBoost model on Algorithmia!
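Once published, the algorithm can be called from the Python client in a few lines. The sketch below assumes the client's `client.algo(...).pipe(...).result` call chain; the helper name `call_algorithm` and the paths in the usage comment are hypothetical placeholders.

```python
def call_algorithm(client, algo_path, payload):
    """Send `payload` to a published algorithm and return its result.

    `client` is expected to be an Algorithmia client, e.g. one created with
    Algorithmia.client("YOUR_API_KEY").
    """
    response = client.algo(algo_path).pipe(payload)
    return response.result


# Example usage (requires the Algorithmia client and a valid API key):
# import Algorithmia
# client = Algorithmia.client("YOUR_API_KEY")
# csv_uri = "data://YOUR_USERNAME/collections_name/input.csv"  # hypothetical file
# print(call_algorithm(client, "YOUR_USERNAME/your_algorithm_name", csv_uri))
```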

Deploying From Within Jupyter Notebook

This section demonstrates how you can create an algorithm on Algorithmia, push your algorithm script, your dependency file and your saved model file to Algorithmia all programmatically from within a Jupyter notebook.

If you’d like to follow along with this tutorial and reproduce the steps, you can clone the example notebook from the XGBoost Jupyter Notebook Demo repository. This example contains:

  • The training data
  • Python utility functions to help with our programmatic interactions with Algorithmia
  • The runnable Jupyter notebook

Before you get started, you’ll also want to make sure that you have the official Algorithmia Python Client installed in your development environment:

pip install algorithmia

For more information on using the Python Client you can go to the Algorithmia API docs.

For this example, we will also use some utility functions defined in our Algorithmia utility script. This script encapsulates the related calls to Algorithmia through its Python API.

You can import both of these packages as follows:

import Algorithmia
import algorithmia_utils

Creating an Algorithm

First start by providing your username and Algorithmia API key:

Make sure to replace YOUR_USERNAME with your username and YOUR_API_KEY with your API key.

username = "YOUR_USERNAME"
api_key = "YOUR_API_KEY"

Next, define the name of the algorithm you want to create and the local path where the repository will be cloned. An example definition would be:

algo_name = "xgboost_basic_sentiment_analysis"
local_dir = "../algorithmia_repo"

Now you can use the utility functions to create the algorithm and clone it on your configured path:

algo_utility = algorithmia_utils.AlgorithmiaUtils(api_key, username, algo_name, local_dir)
algo_utility.create_algorithm()
algo_utility.clone_algorithm_repo()

Creating the Algorithm Script and Dependencies

The following code programmatically creates the algorithm script that handles our requests, and the dependency file that is used when building the container for our algorithm on the Algorithmia environment.

For this we will use the %%writefile cell magic, but you can always use another editor to edit and save your files.

%%writefile $algo_utility.algo_script_path
import Algorithmia
import joblib
import numpy as np
import pandas as pd
import xgboost

# Do not forget to update this line with your username!
model_path = "data://YOUR_USERNAME/xgboost_demo/musicalreviews_xgb_model.pkl"
client = Algorithmia.client()
model_file = client.file(model_path).getFile().name
loaded_xgb = joblib.load(model_file)

# API calls will begin at the apply() method, with the request body passed as 'input'
# For more details, see algorithmia.com/developers/algorithm-development/languages
def apply(input):
    series_input = pd.Series([input])
    result = loaded_xgb.predict(series_input)
    # Returning the first element of the list, as we'll be taking a single input for our demo purposes
    # As you'll see while building the model: 0->negative, 1->positive
    return {"sentiment": result.tolist()[0]}

%%writefile $algo_utility.dependency_file_path
algorithmia>=1.0.0,<2.0
scikit-learn
pandas
numpy
joblib
xgboost

Pushing Algorithm Files to Git

With the following function call, you will be uploading your changes to the remote repo on Algorithmia and your algorithm will be built on the Algorithmia servers.

algo_utility.push_algo_script_with_dependencies()

Building the XGBoost Model

The steps to train and validate an XGBoost model are omitted here but you can check them out on our fully working example repository.

Programmatically Uploading the Model to Algorithmia

Assuming that you have a saved model file after your training and validation steps, you can call the Algorithmia utility function to take your saved model from its local path and put it on a data collection on Algorithmia.

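# local_path and model_name are the local path and file name of your saved model,
# set during the training steps that are omitted here
algorithmia_data_path = "data://YOUR_USERNAME/xgboost_demo"
algo_utility.upload_model_to_algorithmia(local_path, algorithmia_data_path, model_name)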

Testing Deployed Algorithm

Your algorithm is now deployed on Algorithmia and ready to serve requests!

Below is an example call to our sentiment analysis example algorithm, using our utility function to call its latest version.

test_input = "It doesn't work quite as expected. Not worth your money!"
algo_result = algo_utility.call_latest_algo_version(test_input)
print(algo_result.metadata)
print("Sentiment for the given text is: {}".format(algo_result.result["sentiment"]))

Metadata(content_type='json',duration=0.020263526,stdout=None)
Sentiment for the given text is: 0
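The numeric result follows the mapping described in the algorithm script (0 is negative, 1 is positive). A small sketch of turning it into a label, where `sentiment_label` is a hypothetical helper and not part of the utility script:

```python
SENTIMENT_LABELS = {0: "negative", 1: "positive"}


def sentiment_label(result):
    # `result` is the dict returned by the algorithm, e.g. {"sentiment": 0}
    return SENTIMENT_LABELS[int(result["sentiment"])]


print(sentiment_label({"sentiment": 0}))  # prints: negative
```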

Working Model

You can check out the final working algorithm in action at Basic Sentiment Analysis with XGBoost.