Before you start learning about Scala algorithm development, make sure you go through our Getting Started Guide to learn how to create your first algorithm and to understand permissions, versioning, the CLI, and more.
Algorithmia makes a number of libraries available to make algorithm development easier. The full Java 11 language and standard library is available for you to use in your algorithms. Furthermore, algorithms can call other algorithms and manage data on the Algorithmia platform via the Algorithmia Scala Client.
Algorithmia supports adding 3rd party dependencies via Maven packages. Specifically, any packages from Maven Central can be added to algorithms. On the algorithm editor page, click Options and select Manage Dependencies.
Add dependencies by adding lines of the following form:
libraryDependencies += GroupID % ArtifactID % Version
For example, to add Apache Commons Math version 3.4.1:
libraryDependencies += "org.apache.commons" % "commons-math3" % "3.4.1"
Automatic JSON parsing
By default, Algorithmia uses Google’s GSON library to convert JSON to and from native Java objects. You can specify the input and output types of your algorithm simply by setting the parameters and return type of your apply method.
GSON is a pure Java library and does not support many native Scala types. For example, List[Int] is not parsed automatically, but Array[Int] is, because a Scala Array is represented as a Java array at runtime. Similarly, java.util.Map parses correctly, but scala.collection.Map does not.
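A minimal sketch, assuming your algorithm starts from native Scala collections: converting them to an Array or a java.util.Map before returning them gives GSON types it can handle. The GsonFriendly object and its methods are hypothetical names used only for illustration.

```scala
// Hypothetical helpers: convert Scala collections into types GSON can serialize.
object GsonFriendly {
  // A Scala Array[Int] is a Java int[] at runtime, so GSON handles it directly.
  def toArray(xs: List[Int]): Array[Int] = xs.toArray

  // Copy a Scala Map into a java.util.HashMap, which GSON serializes as a JSON object.
  def toJavaMap(m: Map[String, String]): java.util.Map[String, String] = {
    val jm = new java.util.HashMap[String, String]()
    m.foreach { case (k, v) => jm.put(k, v) }
    jm
  }
}
```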
For example, an algorithm's apply method might take two parameters, a Map from Strings to Strings (dict) and another String (key), and return another String.
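A minimal sketch of such a function, written as an apply method on a hypothetical LookupAlgorithm object. Returning an empty string for a missing key is an assumption made for this example, and how the platform maps the JSON request onto the two parameters is not shown.

```scala
// Hypothetical algorithm: look up `key` in `dict`.
// java.util.Map is used instead of scala.collection.Map so that GSON can parse the input.
object LookupAlgorithm {
  // getOrDefault avoids returning null when the key is absent (an illustrative choice).
  def apply(dict: java.util.Map[String, String], key: String): String =
    dict.getOrDefault(key, "")
}
```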
Algorithmia can automatically parse many native Java types to and from JSON, including Integer, java.util.List, arrays, java.util.Map, and many others. In many cases it can also parse arbitrary user-defined Java classes to and from JSON. See the Gson User Guide for reference.
Custom JSON parsing
If you want more control over parsing, use a single apply method that accepts a String and give it the @AcceptsJson annotation (from the …).
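A minimal sketch of custom parsing, assuming the input is a flat JSON array of numbers such as [1, 2.5, 3]. SumAlgorithm is a hypothetical name, the @AcceptsJson annotation is left out so the code compiles without the Algorithmia library, and the hand-rolled parsing covers only this one input shape; a real algorithm would more likely pass the String to a JSON library.

```scala
// Hypothetical algorithm that receives its raw JSON input as a String.
// On Algorithmia, apply would also carry the @AcceptsJson annotation; it is omitted
// here so the example compiles without the Algorithmia library.
object SumAlgorithm {
  // Expects a flat JSON array of numbers, e.g. "[1, 2.5, 3]", and returns their sum.
  def apply(input: String): Double = {
    val body = input.trim.stripPrefix("[").stripSuffix("]").trim
    if (body.isEmpty) 0.0
    else body.split(",").map(_.trim.toDouble).sum
  }
}
```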
Conversely, if GSON does not serialize your output to JSON correctly, or you want to do some custom serialization, add the @ReturnsJson annotation to your apply method and return a serialized JSON String.
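A minimal sketch of returning a hand-serialized JSON String, here a word count. WordCountAlgorithm is a hypothetical name, and the @ReturnsJson annotation is omitted so the code compiles on its own. Sorting the keys is a design choice that keeps the output deterministic.

```scala
// Hypothetical algorithm that builds its JSON response by hand.
// On Algorithmia, apply would also carry the @ReturnsJson annotation; it is omitted
// here so the example compiles without the Algorithmia library.
object WordCountAlgorithm {
  // Escape the characters JSON requires to be escaped inside string literals.
  private def quote(s: String): String = {
    val sb = new StringBuilder("\"")
    s.foreach {
      case '"'          => sb.append("\\\"")
      case '\\'         => sb.append("\\\\")
      case '\n'         => sb.append("\\n")
      case '\r'         => sb.append("\\r")
      case '\t'         => sb.append("\\t")
      case c if c < ' ' => sb.append("\\u%04x".format(c.toInt))
      case c            => sb.append(c)
    }
    sb.append('"').toString
  }

  // Count whitespace-separated words and serialize the counts as a JSON object.
  def apply(text: String): String = {
    val words = text.split("\\s+").filter(_.nonEmpty)
    val counts = words.groupBy(identity).map { case (w, ws) => (w, ws.length) }
    counts.toSeq.sortBy(_._1).map { case (w, n) => s"${quote(w)}:$n" }.mkString("{", ",", "}")
  }
}
```

For example, WordCountAlgorithm("b a b") returns the String {"a":1,"b":2}.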
Advanced Serialization Techniques
Not every algorithm is stateless; sometimes you need to preserve state in the Data API. Making sure your algorithm's state can be downloaded and deserialized quickly and efficiently is critical for keeping its execution time reasonable.
For state serialization in Scala, we recommend BooPickle: it serializes to and deserializes from a binary format, which is typically faster than parsing the equivalent JSON and produces a much smaller footprint.
Algorithms can throw any exception, and it will be returned to the caller as an error via the Algorithmia API. If you want to throw a generic exception message, use an …
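A minimal sketch of an algorithm that validates its input and throws when the input is invalid. The platform-specific exception class mentioned in the text is not shown; this example uses the standard IllegalArgumentException instead, and SquareRootAlgorithm is a hypothetical name.

```scala
// Hypothetical algorithm that rejects invalid input by throwing.
// Any exception thrown from apply is reported to the caller as an API error.
object SquareRootAlgorithm {
  def apply(x: Double): Double = {
    if (x < 0) throw new IllegalArgumentException(s"Input must be non-negative, got $x")
    math.sqrt(x)
  }
}
```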
Writing files for the user to consume
Sometimes it is more appropriate to write your output to a file than to return it directly to the caller. In these cases, you may need to create a temporary file, then copy it to a Data URI (usually one the caller specified in their request, or a Temporary Algorithm Collection).
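A minimal sketch of the first step, writing output to a local temporary file with java.nio. ReportWriter and writeReport are hypothetical names, and the follow-up copy to a Data URI, which goes through the Algorithmia client, is deliberately left out rather than guessed.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper: write a result to a local temporary file and return its path.
// Copying the file to a Data URI with the Algorithmia client is not shown.
object ReportWriter {
  def writeReport(lines: Seq[String]): Path = {
    // Created in the JVM's default temporary directory (typically /tmp on Linux).
    val tmp = Files.createTempFile("report-", ".csv")
    Files.write(tmp, lines.mkString("\n").getBytes(StandardCharsets.UTF_8))
    tmp
  }
}
```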
Working with directories
While running, algorithms have access to a temporary filesystem located at /tmp, the contents of which do not persist across calls to the algorithm. While the Data API lets you get the contents of files as JSON, a string, or raw bytes, in some cases your algorithm may need to read and write files locally. The /tmp directory is useful as a temporary location for files downloaded from Hosted Data, such as raw data for processing or models to be loaded into your algorithm. It can also be used to write new files before uploading them via the Data API.
For reference, this gist provides an example of iterating over data in a directory, processing it, and writing new data to a file, while this template for ALBERT and TensorFlow provides an example of using the /tmp directory to load a model.
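A minimal sketch of the local directory pattern, using only java.nio: it reads every .txt file in a directory, counts its lines, and writes a summary file back into the same directory. DirectoryProcessor and summarize are hypothetical names; in an algorithm, dir would typically be a folder under /tmp holding files downloaded from Hosted Data.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch: summarize every .txt file in a local working directory.
object DirectoryProcessor {
  def summarize(dir: Path): Path = {
    // The glob restricts iteration to .txt files, so the summary file is never re-read.
    val stream = Files.newDirectoryStream(dir, "*.txt")
    val summary =
      try {
        val buf = ArrayBuffer.empty[String]
        val it = stream.iterator()
        while (it.hasNext) {
          val file = it.next()
          val lineCount = Files.readAllLines(file, StandardCharsets.UTF_8).size
          buf += s"${file.getFileName}: $lineCount"
        }
        buf.sorted // directory order is unspecified, so sort for stable output
      } finally stream.close()
    val out = dir.resolve("summary.out")
    Files.write(out, summary.mkString("\n").getBytes(StandardCharsets.UTF_8))
    out
  }
}
```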
Calling Other Algorithms and Managing Data
To call other algorithms or manage data from your algorithm, use the Algorithmia Scala Client, which is automatically available to any algorithm you create on the Algorithmia platform. For more detailed information on how to work with data, see the Data API docs and learn about Algorithmia’s Hosted Data Source.
When designing your algorithm, remember that there are special data directories, .algo, that are available only to algorithms and help you manage data over the course of the algorithm's execution.
You may call up to 24 other algorithms, either in parallel or recursively.