
Time series classification and DevOps stability


There is more to be said about how time series data analysis advances DevOps. Time series classification is one branch of time series analysis: objects are classified by using feature extraction to represent the data in new ways for us to consume. Running this on a machine learning deployment platform provides the performance needed to process many types of objects.

Objects processed for time series classification include images, text, and audio. However, a much wider range of applications, from finance and security to medical diagnosis, takes full advantage of this form of deep learning AI. We may also see DevOps teams use time series classification to gauge the success of a product launch from social media data.

DevOps can use time series classification to ensure stability

Data scientists are currently engineering ML models designed for sentiment analysis. With this technology, the processed objects are categorized by how an end user is likely to feel about a scenario such as a product launch. End users’ posts on internal and external support forums are one such source of data.

From the moment a product or upgrade is released to a production environment, initial data starts flowing from the major social media outlets. This data is combined with log and environmental data from the same timeline, and the combined information is then processed by a deep learning model.

By acting on the results, DevOps can immediately start deciding on additional stability changes or even a rollback to a prior state. The goal is to prevent customer impact as much as possible, which also relieves stress on support staff who are often already stretched.
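To make the idea of combining the two sources on one timeline concrete, here is a minimal Python sketch. It assumes sentiment scores between -1 and 1 have already been produced by some sentiment model, and that error timestamps have already been pulled from the logs. The function `flag_windows`, the five-minute window, and both thresholds are hypothetical choices for illustration; a flagged window is meant to prompt a DevOps decision, not to trigger a rollback on its own.

```python
from collections import defaultdict

def flag_windows(sentiment_events, error_events, window_s=300,
                 min_sentiment=-0.3, max_errors=50):
    """Bucket both streams into fixed time windows and flag suspicious ones.

    sentiment_events: iterable of (unix_ts, score) pairs, score in [-1, 1],
        assumed to come from an upstream sentiment model.
    error_events: iterable of unix timestamps, one per logged error.
    The thresholds are illustrative placeholders, not recommended values.
    """
    scores = defaultdict(list)
    errors = defaultdict(int)
    for ts, score in sentiment_events:
        scores[ts // window_s].append(score)
    for ts in error_events:
        errors[ts // window_s] += 1

    flagged = []
    for window in sorted(set(scores) | set(errors)):
        bucket = scores.get(window, [])
        avg = sum(bucket) / len(bucket) if bucket else 0.0
        # Flag only when users are unhappy AND the logs show elevated errors.
        if avg < min_sentiment and errors.get(window, 0) > max_errors:
            flagged.append((window * window_s, round(avg, 3), errors[window]))
    return flagged

# Hypothetical data: sentiment turns negative just as errors spike at t=300s.
sentiment = [(0, 0.5), (10, 0.4), (300, -0.8), (310, -0.6)]
errors_seen = [300 + i for i in range(60)]
print(flag_windows(sentiment, errors_seen))
```

Requiring both signals in the same window is one simple way to reduce false alarms from either source alone; a real deployment would tune the window size and thresholds against its own history.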

Microservices generate mass metrics

With the emergence of microservices, log files and other data are no longer a single stream. They include information from a large number of services, often hosted on serverless computing platforms. This information comes in quantities orders of magnitude greater, and is more complex, than anything DevOps engineers usually encounter.

For example, a simple website API service may normally be hosted on a single IIS server. This service has a number of log files that show traffic patterns as well as problems that end users may be experiencing. Current DevOps tooling includes software that helps visualize and filter these log files. However, the amount of data coming from a fully scaled microservices implementation is far too great for most tools in use today.
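With many services each writing their own log, even reassembling one timeline is a processing step. A minimal sketch using Python's standard `heapq.merge`, assuming each service's log is already sorted by time and has been parsed into hypothetical `(iso_timestamp, service, message)` tuples:

```python
import heapq
from datetime import datetime

def merge_service_logs(*streams):
    """Lazily merge per-service log streams into one chronological stream.

    Each stream must already be sorted by timestamp (typical for a single
    service's log). Entries use a hypothetical (iso_timestamp, service, message)
    format chosen for illustration.
    """
    return heapq.merge(*streams, key=lambda entry: datetime.fromisoformat(entry[0]))

auth_log = [
    ("2020-03-01T10:00:00", "auth", "login ok"),
    ("2020-03-01T10:00:05", "auth", "token refresh"),
]
orders_log = [
    ("2020-03-01T10:00:02", "orders", "HTTP 500 on /checkout"),
]
for entry in merge_service_logs(auth_log, orders_log):
    print(entry)
```

Because `heapq.merge` is lazy, it keeps only one pending entry per stream in memory, which matters when the combined volume is too large to load at once.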

Storing this large amount of critical information directly in cloud storage makes it ready for more intensive processing with new advancements in artificial intelligence. Because many companies have moved to microservices in the cloud, the application and the storage for the application’s logs are contained in the same environment.

Algorithmia makes data analysis simple

Algorithmia is working diligently to make the working lives of DevOps engineers and data scientists easier. Our platform provides a means to access the large amounts of data that companies amass from today’s microservices and the Internet of Things. Direct access to data backed by a scalable data science platform means the possibilities for innovation are endless.

Choosing Algorithmia will allow your data scientists to focus on solving big data challenges with a scalable and highly customizable platform. All the while, DevOps engineers no longer have to manage the vast infrastructure required for a successful implementation. Their focus can remain on adding additional security, stability, and automation processes.

When team members are allowed to focus on their specializations instead of wearing multiple hats, the results can only benefit the project on which they are collaborating. Getting real results from information that would normally sit dormant is today’s new standard for the science of data analysis.

Each tier of Algorithmia’s platform includes existing pre-trained models you can build upon to solve complex problems without having to reinvent the wheel. By referencing these, or your own models, a complete solution can be developed that benefits from everything Algorithmia has to offer. 


Supporting our customers during the COVID-19 outbreak

As a Seattle-based company, we feel like we’ve been hearing and talking about the Coronavirus (COVID-19) for a while now. And one thing is very clear: it is not going away quickly, and it will continue to impact individuals, families, communities, and businesses. 

We are very mindful that this is a time of flux. However, we want to let you know what we’re doing to keep our business operating at the highest level to best serve our customers while keeping our employees safe, healthy, and available to support the business. 

What we’re doing as a company:

Business continuity plan. We have a BCP in place and, as a result of that plan, are operating at 100 percent support capacity. If you need anything, we’re all here to help.

Work from home for all employees. We’ve been a remote-friendly company since employee #4, and now we’re all a part of the remote team. Beginning 5 March, we asked all employees to work from home and not come into the office. We are continually assessing this requirement, and right now this is planned through 27 March.

Virtual meetings. We have transitioned to an all-virtual meeting space both internally and externally, and we are accommodating time zones and working hours.

Business travel reductions. Beginning 1 March, we canceled all work-related international travel. For domestic travel, we have asked employees to consider virtual options where possible and use their best judgment.

Hygiene and illness practices. We have always actively implored employees to stay home when they are sick, and we now encourage them to take precautions to prevent spreading any illnesses. We’re continuing to remind everyone to practice good hand hygiene and follow local social interaction mandates. 

We want to thank you for your continued business and we hope that you and your community are staying safe. If there is anything we can do to support you and your team during this time, please don’t hesitate to reach out to us.

Thank you,

The Algorithmia Team

Algorithmia on VMware makes any-prem ML a reality

Algorithmia on VMware is our newest on-premises product.

Here at Algorithmia, we have written and talked extensively about the challenges of deploying machine learning models at scale and the importance of data access in general machine learning development and infrastructure. When it comes down to it, ML is critical for any modern company to remain competitive. 

Machine learning has the most impact on a company’s core line of business applications that are often behind the firewall, particularly in regulated industries like financial services, insurance, health care, and laboratory sciences. 

For ML infrastructure to serve those industries, an on-premises product is a requirement. Users in those industries also need low-latency, high-throughput, data-driven applications. To those ends, we are thrilled to announce that Algorithmia Enterprise is now on VMware!

Go where the data and users are 

Data and security go hand in hand, so concerns about ML security follow naturally from that relationship. 

One concern is the security implications of moving data between systems. Another is that data is expensive and difficult to move, so building and running ML models close to the data source is a preferred practice as it reduces costs, increases iteration speed, and satisfies security, compliance, and privacy requirements that many businesses have. 

Announcing Algorithmia Enterprise on VMware

The general availability of Algorithmia Enterprise on VMware, the next version of our enterprise on-premises product, means customers can run Algorithmia on their existing VMware infrastructure in their own data center, with the lowest latency and highest security for their ML-enabled applications. 

By providing a fully integrated solution for connecting, deploying, scaling, and managing ML models, we are enabling enterprises to leverage ML in ways they could not before.

Multi-cloud is in our DNA 

Customers faced with the challenges of multi-cloud sometimes try to build their own complex systems using native or in-house services across many cloud providers. This creates massive variability and volatility in deployment, upgrades, performance, and customer experience. And don’t forget that the engineering and support matrix grows with each variant. 

From the early days at Algorithmia, we knew that multi-cloud was critical to our customers’ success, given the vastly different infrastructure choices one could make. So we focused on getting the foundation right, because we knew the speed and quality of the deployment experience is a crucial advantage for customers.

And we also know that the customer experience must be fantastically consistent across any platform. By delivering on a truly multi-cloud platform that has UX, feature, and operational parity, we solve these problems for our customers and ensure a delightfully consistent experience.

Why VMware?

The market has spoken. VMware has won the on-premises cloud war and serves the majority of the private cloud/hypervisor market. The rest of the landscape is fragmented, with too many variants and incompatibilities to navigate. The next-largest vendor’s adoption is low, at less than 10 percent. 

Services via VMware are the standard offered in nearly every IT environment. By choosing VMware as our preferred on-premises infrastructure platform, we are again enabling the greatest number of companies to achieve their full potential through the use of AI and ML.

Multi-cloud and any-prem

Now with Algorithmia Enterprise on VMware, multi-cloud ML deployment across public and private clouds is not just a wish; it is a reality. Companies that benefit from having their ML workloads close to the data they need and the users who need them will realize that multi-cloud is a true differentiator for their business. 

Algorithmia and BERT language modeling


Natural language processing has been one of the most prominent and visible uses of machine learning in recent years. From early recurrent neural network architectures that were able to detect the first named entity pairings, to today’s transformers that can look at an entire paragraph or longer passage at once using parallel processing on GPUs, we’ve clearly seen some tremendous improvements. 

However, nothing has been quite as dramatic for the field as a new architecture, Bidirectional Encoder Representations from Transformers, or BERT.

In this post, we’ll walk through what BERT is and provide an easy way to use it on Algorithmia.

How recurrent neural networks work

Before we talk about BERT itself, let’s start with one of the cornerstone building blocks of many machine learning architectures—Recurrent Neural Networks (RNNs).

RNN model depiction

Many datasets containing signals we’d like to extract are sequential, meaning that for each element x(t) in an input sequence (in the graphic above), the output y(t) depends not only on x(t) but also on earlier elements x(t-1) through x(t-n). 

A great example of this is language. Imagine this sentence: The quick brown fox jumps over the lazy ___. You probably have an idea what that last word is supposed to be. This is due to how language is constructed: each word adds context and shapes the meaning of the sentence. If you considered any of the words individually, without context, it would be difficult to predict that the last word was dog.

Using context for more accurate predictions 

Recurrent neural networks are an architecture that preserves the state of previous operations inside the model. This means I could design an RNN model that, given each word individually (the, quick, brown, fox, …), could be trained to successfully predict dog and many other things. That is a simplistic description of how RNNs work. Let’s take a look at some downsides of recurrent architectures. 
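The word-by-word RNN described above can be sketched in plain Python. This is a minimal vanilla (Elman) RNN forward pass with made-up weights and one-hot word vectors; it is untrained, so it only shows how the hidden state carries context from word to word, not how to predict dog.

```python
import math

def rnn_forward(inputs, w_xh, w_hh, b_h):
    """Run a vanilla (Elman) RNN over a sequence of input vectors.

    h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The hidden state h is the
    preserved state that carries context from earlier words forward.
    """
    hidden_size = len(b_h)
    h = [0.0] * hidden_size  # state before any word has been seen
    states = []
    for x in inputs:  # strictly sequential: step t needs h from step t-1
        h = [
            math.tanh(
                sum(w_xh[i][j] * x[j] for j in range(len(x)))
                + sum(w_hh[i][k] * h[k] for k in range(hidden_size))
                + b_h[i]
            )
            for i in range(hidden_size)
        ]
        states.append(h)
    return states

# Toy setup (all values made up): one-hot word vectors, 2 hidden units.
one_hot = {"quick": [1, 0, 0], "brown": [0, 1, 0], "fox": [0, 0, 1]}
w_xh = [[0.5, -0.3, 0.8], [0.1, 0.9, -0.4]]
w_hh = [[0.6, -0.2], [0.3, 0.7]]
b_h = [0.0, 0.0]

forward = rnn_forward([one_hot[w] for w in ["quick", "brown", "fox"]], w_xh, w_hh, b_h)
backward = rnn_forward([one_hot[w] for w in ["fox", "brown", "quick"]], w_xh, w_hh, b_h)
print(forward[-1], backward[-1])  # same words, different order, different final state
```

Because the final state differs when the same words arrive in a different order, the network has access to context, which is what a trained RNN uses to predict the next word.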

Challenges in recurrent architectures

Vanishing gradient problem

One drawback is called the vanishing gradient problem, which stems from how information is stored in RNNs. As mentioned, information from x(t-n) is stored in the network to help predict y(t); however, when n gets very large, that information eventually starts to leak out, because the training signal connecting y(t) back to x(t-n) shrinks with every step it passes through.

There have been improvements that reduce this impact, such as Long Short-Term Memory (LSTM) layers and Gated Recurrent Units (GRUs); however, the problem persists for very, very long-range information sharing.
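A small numeric sketch shows why the signal fades. In a scalar RNN with h(t) = tanh(w·h(t-1) + u·x(t)), the gradient linking step t back to step t-n is a product of n per-step factors w·(1 - h(t)²). The code below simply assumes a fixed recurrent weight of 0.9 and a constant pre-activation of 0.5 at every step (illustrative values, not from a trained model) and multiplies those factors together:

```python
import math

def gradient_through_time(w, pre_activations):
    """Multiply the per-step derivatives dh_t/dh_{t-1} of a scalar RNN.

    For h_t = tanh(w * h_{t-1} + u * x_t), each step contributes
    w * (1 - h_t**2): the recurrent weight times the derivative of tanh.
    The product measures how strongly h_t still responds to a change at step t-n.
    """
    grad = 1.0
    for a in pre_activations:  # a = w * h_{t-1} + u * x_t at each step (given here)
        h = math.tanh(a)
        grad *= w * (1.0 - h * h)
    return grad

# Assumption for illustration: recurrent weight 0.9, every pre-activation 0.5.
for n in (1, 10, 50, 100):
    print(n, gradient_through_time(0.9, [0.5] * n))
```

Each factor here is about 0.71, so after 100 steps the product is vanishingly small. The gating in LSTMs and GRUs is meant to let gradients pass through many steps with less shrinkage.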

Information processing

The second problem stems from how information is processed. As mentioned, information from x(t-n) is used to help predict y(t). This means we need to compute the network’s state for step t-1 (and every step before it) before we can even start work on y(t), which can make parallelizing training and inference quite difficult, if not impossible, for many tasks. 

This isn’t always a problem, especially for smaller architectures, but if you intend to use scaled-up deep learning models, you will very quickly hit a wall in how fast you can train the model. 

This is one of the reasons researchers historically focused more on other ML areas such as image processing, where scaling up deep learning delivered value that many RNN models could not.

Transfer learning

The third problem is a difficulty with transfer learning. Transfer learning is the process of taking an ML model pre-trained on a generic dataset and re-training it on a specialized dataset for your specific project or problem. 

This kind of process is very common in image processing but has proven quite challenging for even relatively standard sequential tasks, such as those in natural language processing. This is because any model you plan to use for transfer learning must have been trained with the same type of objective as the one you plan on tackling. 

Transfer learning requires the source and target objectives to share a set of necessary transformations; that shared structure is where the benefits in training time and model accuracy come from.

In image classification, we’re almost always looking for objects in an image, generally a natural photograph (like family vacation pictures from the Bahamas). However, if you attempted to reuse a general classification model to classify artifacts in X-ray stereographs, your model would struggle to provide any value.

This kind of scenario has plagued NLP algorithms since their inception, because many NLP tasks are disparate and have objectives (such as named entity recognition or text prediction) that make it very difficult to transfer learning from one task to another.

This is where BERT comes in, and why it’s so special. BERT is trained with multiple objectives, producing a general-purpose model that can then be adapted to succeed at many of the most common NLP tasks. We’ll look at BERT models in more depth below.

Alternatives to RNNs

Attention networks

A new architecture was created by Google researchers a couple of years ago that approaches sequential problems in a different way.

A depiction of a recurrent neural network with an attention layer

With attention networks, we process every element in our sequence (x(0) all the way to x(t)) at once, rather than one at a time. 

We’re able to do this because the attention layer views all the data at once, using its limited number of weights to focus only on the parts of the input that matter for the next prediction. This means we can parallelize training and take full advantage of GPUs.
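The attention computation can be sketched as scaled dot-product attention, the form used in transformer models: softmax(QKᵀ/√d)V. This minimal plain-Python version assumes the queries, keys, and values are given directly; in a real network they are produced from the inputs by learned weight matrices, which are omitted here.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Each output row looks at the whole sequence at once, and no row depends
    on another row's result, so all positions could be computed in parallel.
    """
    d = len(keys[0])
    outputs = []
    for q in queries:  # independent iterations: this loop is parallelizable
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how much this position attends to each position
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# A query that closely matches the first key pulls out (almost exactly) the first value.
out = attention(queries=[[10.0, 0.0]],
                keys=[[10.0, 0.0], [0.0, 10.0]],
                values=[[1.0, 2.0], [3.0, 4.0]])
print(out)
```

The loop over queries has no dependency between iterations, unlike an RNN, where each step must wait for the previous one; that independence is what lets GPUs compute all positions at once.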

Transformer networks

Building on attention networks, transformers have multiple “sets” of weights per attention layer, each able to focus on different parts of the input. These are called attention heads (or transformer heads). 

Beyond that, the big difference between attention and transformer networks is that transformers stack attention and linear layers on top of each other, in a similar way to convolutional neural networks, while borrowing residual connections from residual network architectures. Stacking layers is what makes these models deep, and the residual connections help avoid the vanishing gradient problem by ensuring that information from earlier layers always has a direct path up to the last layer of the network. 

These networks have become the state of the art for natural language processing, helped by the fact that they can be trained effectively on GPUs and TPUs, which allows researchers to make them even deeper.
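Multiple heads and a residual connection can be sketched together in one self-attention sub-layer. This is a simplified illustration with made-up weights and hypothetical function names: it omits the layer normalization and the position-wise feed-forward sub-layer that a full transformer block also contains.

```python
import math

def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]

def attend(qs, ks, vs):
    """Scaled dot-product attention over lists of vectors."""
    d = len(ks[0])
    out = []
    for q in qs:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in ks]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append([sum(e / z * v[j] for e, v in zip(exps, vs)) for j in range(len(vs[0]))])
    return out

def multi_head_block(xs, heads, w_out):
    """One self-attention sub-layer with several heads plus a residual connection.

    heads: list of (W_q, W_k, W_v) matrices, one tuple per head. Each head has
        its own weights, so each can focus on different parts of the input.
    w_out: projects the concatenated head outputs back to the model width.
    """
    per_head = []
    for w_q, w_k, w_v in heads:
        qs = [matvec(w_q, x) for x in xs]
        ks = [matvec(w_k, x) for x in xs]
        vs = [matvec(w_v, x) for x in xs]
        per_head.append(attend(qs, ks, vs))
    block_out = []
    for pos, x in enumerate(xs):
        concat = [val for head in per_head for val in head[pos]]
        projected = matvec(w_out, concat)
        # Residual connection: add the input back, so the signal from earlier
        # layers always has a direct path to the top of the stack.
        block_out.append([xi + pi for xi, pi in zip(x, projected)])
    return block_out

# Toy weights (made up): model width 2, two heads that each project to 1 dimension.
heads = [([[1.0, 0.0]], [[1.0, 0.0]], [[1.0, 0.0]]),
         ([[0.0, 1.0]], [[0.0, 1.0]], [[0.0, 1.0]])]
w_out = [[0.5, 0.0], [0.0, 0.5]]
xs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

layer_1 = multi_head_block(xs, heads, w_out)
layer_2 = multi_head_block(layer_1, heads, w_out)  # stacking blocks makes the model deeper
print(layer_2)
```

If the attention path contributes nothing (for example, an all-zero `w_out`), the block passes its input through unchanged, which is the property that keeps deep stacks trainable.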

A depiction of a transformer network model

Bidirectional Encoder Representations from Transformers (BERT)

Attention architectures address two of the biggest problems of working with RNNs and let us train much faster, thanks to the parallelization that attention models provide. With the introduction of transformers, using residual connections and multiple heads, we can avoid the vanishing gradient problem, which lets us construct deeper models and take advantage of the deep learning paradigm. 

But we’re still missing something; we haven’t addressed the third problem: NLP models are terrible for transfer learning.

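This is where BERT comes in. It’s pre-trained on two different objectives, which pushes its parameters to be more general-purpose. In the first, masked language modeling, some words in the input are hidden and the model is trained to predict the missing words. In the second, next sentence prediction, the model is trained to predict whether one sentence actually follows another. Along the way, the model learns to encode each word into a rich internal representation.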
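Predicting missing words relies on hiding words from the input. The BERT paper describes selecting 15% of tokens as prediction targets, replacing 80% of those with a [MASK] token, 10% with a random token, and leaving 10% unchanged. A minimal Python sketch of that corruption step, assuming whitespace-split words instead of BERT's WordPiece subword tokens and an independent 15% chance per token (both simplifications):

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Build one masked-language-modeling example using an 80/10/10 rule.

    Each position is chosen as a prediction target with probability mask_prob
    (a simplification of BERT's selection procedure). A chosen token is
    replaced with [MASK] 80% of the time, with a random vocabulary token 10%
    of the time, and left unchanged 10% of the time. Returns
    (corrupted_tokens, targets), where targets maps position -> original token.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets = {}
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not a target; the model sees the real token
        targets[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)
        # else: keep the original token, but the model must still predict it
    return corrupted, targets

sentence = "the quick brown fox jumps over the lazy dog".split()
corrupted, targets = mask_tokens(sentence, vocab=sentence, seed=0)
print(corrupted)
print(targets)  # the model is trained to recover these original words
```

Leaving some targets unchanged or randomized means the model cannot rely on seeing [MASK] to know which words to predict, since that token never appears when the model is later used on real text.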

Unlike typical left-to-right language models, however, BERT doesn’t read a block of text in only one direction. Its encoder conditions the representation of each word on both the words to its left and the words to its right at the same time. Hence it’s a bidirectional encoder. 

BERT’s encoder is also much deeper and contains significantly more parameters than earlier word-embedding systems such as word2vec or GloVe.

Besides that, each word’s encoding is not independent of its context (the same word can receive different encodings in different sentences), which allows BERT to have a very deep and rich understanding of the vocabulary used in the training corpus.

Diagrams of BERT models in semi- and supervised learning environments

Once the internal word-encoder model is trained, a classifier is stacked on top of it, and that classifier can be trained for a variety of tasks. In the pre-trained examples, a simple Spam/Not Spam binary classifier is constructed, but this approach could obviously be used for other tasks as well, such as named entity recognition or sentiment analysis, to name a few.
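Stacking a classifier on a pre-trained encoder can be sketched with a frozen feature extractor and a small logistic-regression head trained by gradient descent. The `encode` function here is a hypothetical stand-in for BERT (it returns two hand-made features), and the four training sentences are invented; only the head's weights are learned.

```python
import math

def encode(text):
    """HYPOTHETICAL stand-in for a frozen, pre-trained BERT encoder.

    A real encoder would return a high-dimensional sentence vector; this toy
    version returns two hand-made features so the example runs anywhere.
    """
    words = text.lower().split()
    spammy = {"free", "winner", "prize", "click"}
    return [sum(w in spammy for w in words) / len(words), len(words) / 20.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(texts, labels, epochs=500, lr=0.5):
    """Train only a logistic-regression head; the encoder's output is fixed."""
    feats = [encode(t) for t in texts]  # computed once, because the encoder is frozen
    weights = [0.0] * len(feats[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def is_spam(text, weights, bias):
    return sigmoid(sum(w * xi for w, xi in zip(weights, encode(text))) + bias) >= 0.5

train_texts = ["click here free prize", "winner winner free click",
               "meeting moved to noon", "see you at lunch tomorrow"]
train_labels = [1, 1, 0, 0]
w, b = train_head(train_texts, train_labels)
print(is_spam("free prize winner click", w, b), is_spam("meeting moved to noon", w, b))
```

Keeping the encoder frozen mirrors the transfer learning idea described earlier; in practice BERT is also commonly fine-tuned end to end, updating the encoder's weights along with the classifier.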

BERT and Algorithmia

A big benefit of BERT is that it generates very rich encodings of word representations that can be used for tasks involving large documents with many sentences. This is helpful because one model can be used to construct many downstream applications of varying complexity, such as document classification or semi-supervised document topic clustering.
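As one hedged example of a downstream application, a document can be represented by averaging its sentence vectors and then assigned to the nearest topic centroid by cosine similarity (the assignment step of k-means clustering). The vectors below are invented 3-dimensional stand-ins for BERT sentence encodings, and the function names are hypothetical; this is not Algorithmia's implementation.

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length vectors (a simple pooling choice)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def assign_topics(doc_sentence_vecs, topic_centroids):
    """Pool each document's sentence vectors, then pick the closest topic centroid."""
    assignments = {}
    for doc_id, sentence_vecs in doc_sentence_vecs.items():
        doc_vec = mean_vector(sentence_vecs)
        assignments[doc_id] = max(
            topic_centroids, key=lambda topic: cosine(doc_vec, topic_centroids[topic])
        )
    return assignments

# Made-up 3-dimensional "sentence embeddings"; real BERT vectors are much larger.
docs = {
    "release-notes": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "outage-report": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]],
}
centroids = {"product": [1.0, 0.0, 0.0], "incidents": [0.0, 1.0, 0.0]}
print(assign_topics(docs, centroids))
```

Mean pooling is only one way to combine sentence vectors; it keeps the sketch short, at the cost of ignoring sentence order.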

Algorithmia has deployed two example BERT models, one in TensorFlow and the other in PyTorch. Their source code is also available through the new GitHub integration for Algorithmia, which makes it easier to reuse the code you’d like.

Both of these models are able to provide rich representations of a sentence, and can be used as a first stage for many NLP downstream tasks that are specialized for your business case.

What is artificial intelligence engineering?

According to LinkedIn’s 2020 Emerging Jobs Report, demand for “Artificial Intelligence Specialists” (a category comprising a few related roles) has grown 74 percent in the last four years. With more companies than ever (even those outside of tech) relying on AI as part of their everyday business, demand for practitioners with this skill will only rise. 

In our 2020 State of Enterprise Machine Learning report, we noted that the number of data science–related workers is relatively low, but the demand for those types of skills is great and growing exponentially. 

If you’ve been curious about how to become an AI engineer or if you’re interested in shifting your current engineering role into one more focused on AI, you’ve come to the right place. 

By the end of this post you’ll understand: 

  • The role of an AI engineer.
  • The educational requirements to be an AI engineer.
  • The knowledge requirements to be an AI engineer.
  • The AI engineering career landscape. 

What is an AI engineer?

An artificial intelligence engineer is an individual who works with machine learning techniques, such as neural networks, and fields such as natural language processing to build models that power AI-based applications. 

Types of applications created by AI engineers include: 

  • Contextual advertising based on sentiment analysis
  • Language translation 
  • Visual identification or perception

Is an AI engineer a data engineer or scientist? 

You may be wondering how the role of an AI engineer differs from that of a data engineer or a data scientist. While all three roles work together within a business, they do differ in several ways: 

  • Data engineers write programs to extract data from sources and transform it so that it can be manipulated and analyzed. They also optimize and maintain data pipelines.  
  • Data scientists build machine learning models meant to support business decision making. They are often looking at the business from a higher strategic point than an AI engineer typically would.

What does it take to be an AI engineer?

AI engineering is a relatively new field, and those who currently hold this title come from a range of backgrounds. The following are some of the traits that many have in common. 

Education

Many AI engineers moved over from previous technical roles and often have undergraduate or graduate degrees in fields relevant to those jobs. These include: 

  • Computer science
  • Statistics
  • Applied mathematics
  • Linguistics 
  • Cognitive science 

Most of the above degrees have some relevance to artificial intelligence and machine learning. 

Technical skills

Two of the most important technical skills for an AI engineer to master are programming and math/statistics. 

  • Programming: Software developers moving into an AI role or developers with a degree in computer science likely already have a grasp on a few programming languages. Two of the most commonly used languages in AI, and specifically machine learning, are Python and R. Any aspiring AI engineer should at least be familiar with these two languages and their most commonly used libraries and packages.
  • Math/statistics: AI engineering is more than just coding. Machine learning models are based on mathematical concepts like statistics and probability. You will also need a firm grasp of concepts like statistical significance when you are determining the validity and accuracy of your models.
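As an example of statistical significance in practice, McNemar's test is one common way to check whether two classifiers evaluated on the same test set really differ in accuracy. A minimal sketch, with hypothetical disagreement counts:

```python
import math

def mcnemar_p_value(only_a_correct, only_b_correct):
    """McNemar's test (continuity-corrected) for two classifiers on one test set.

    only_a_correct: test examples model A got right and model B got wrong.
    only_b_correct: test examples model B got right and model A got wrong.
    Examples both models got right (or both got wrong) do not affect the test.
    Returns the p-value from a chi-square distribution with 1 degree of freedom.
    """
    b, c = only_a_correct, only_b_correct
    if b + c == 0:
        return 1.0  # the models never disagree: no evidence of a difference
    chi2 = max(0, abs(b - c) - 1) ** 2 / (b + c)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts from comparing a new model (A) against the current one (B).
p = mcnemar_p_value(only_a_correct=40, only_b_correct=20)
print(f"p = {p:.4f}")
```

A p-value below a pre-chosen threshold (0.05 is a common convention) suggests the accuracy difference is unlikely to be due to chance on this particular test set.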

Soft skills

AI engineers don’t work in a vacuum. So while technical skills will be what you need for modeling, you’ll also need the following soft skills to get your ideas across to the entire organization. 

  • Creativity – AI engineers should always be on the lookout for tasks that humans do inefficiently and machines could do better. You should stay abreast of new AI applications within and outside of your industry and consider if they could be used in your company. In addition, you shouldn’t be afraid to try out-of-the-box ideas. 
  • Business knowledge – It’s important to remember that your role as an AI engineer is meant to provide value to your company. You can’t provide value if you don’t really understand your company’s interests and needs at a strategic and tactical level. 

A cool AI application doesn’t mean much if it isn’t relevant to your company or can’t improve business operations in any way. You’ll need to understand your company’s business model, who its target customers are, and whether it has any long- or short-term product plans. 

  • Communication – In the role of an AI engineer, you’ll have the opportunity to work with groups all over your organization, and you’ll need to be able to speak their language. For example, for one project you’ll have to: 
    • Discuss your needs with data engineers so they can deliver the right data sources to you.
    • Explain to finance/operations how the AI application you’re developing will save costs in the long run or bring in more revenue.
    • Work with marketing to develop customer-focused collateral explaining the value of a new application.
  • Prototyping – Your ideas aren’t necessarily going to be perfect on the first attempt. Success will depend on your ability to quickly test and modify models until you find something that works.

Can I turn my current engineering role into an AI role?

Yes. Experienced software developers are well-suited to make the transition into AI engineering. You presumably have command of more than one programming language and the foundational knowledge to learn another. It’s also likely that you’ve already worked with machine learning models in some capacity, possibly by incorporating them into other applications. 

If you are interested in pursuing an AI engineering role within an organization where you already work, your knowledge of the business and knowledge of how the engineering team works will be crucial. 

How much does an artificial intelligence engineer earn in salary?

Artificial intelligence engineers are in high demand, and the salaries that they command reflect that. According to estimates from job sites like Indeed and ZipRecruiter, an AI engineer can make anywhere between $90,000 and $200,000 (and possibly more) depending on their qualifications and experience. 

Another factor that will determine salary is location. According to the LinkedIn Emerging Jobs Report mentioned earlier, most AI engineering jobs are located in the San Francisco Bay area, Los Angeles, Seattle, Boston, and New York City. 
