In the last 12 months, there have been numerous developments in machine learning (ML) tools, applications, and hardware. Google’s TPUs are in their third generation, the AWS Inferentia chip is a year old, Intel’s Nervana Neural Network Processors are designed for deep learning, and Microsoft is reportedly developing its own custom AI hardware.
This year, Algorithmia has had conversations with thousands of companies in various stages of machine learning maturity. From them we developed hypotheses about the state of machine learning in the enterprise, and in October, we decided to test those hypotheses.
Following the State of Enterprise Machine Learning 2018 report, we conducted a new two-prong survey this year, polling nearly 750 business decision makers across all industries at companies that are actively developing machine learning lifecycles, just beginning their machine learning journeys, or somewhere in between. Sign up to receive the full 2020 report on 12 December 2019 when it publishes.
2020 key findings and report format
The forthcoming 2020 report focuses on seven key findings from the survey. In brief, they are:
- The rise of the data science arsenal for machine learning: nearly all companies are building data science teams to develop ML use cases. There are discrepancies in team size and agility, however, that will affect how quickly and efficiently ML is applied to business problems.
- Cutting costs takes center stage as companies grow in size: the primary business use cases center on customer service and internal cost reduction. Company size is the differentiator.
- Overcrowding at early maturity levels and AI for AI’s sake: the pool of companies entering the ML arena is growing exponentially but that could bring about an increase in “snake-oil AI” solutions.
- An unreasonably long road to deployment: despite the rapid development in use cases, growth in AI/ML budgets, and data science job openings, there is still a long road to model deployment. We offer several hypotheses why.
- Innovation hubs and the trouble with scale: we anticipate the proliferation of internal AI centers (innovation hubs) within companies designed to quickly develop ML capabilities so the organization can stay current with its competition. Machine learning challenges still exist, however, stymying the last-mile to sophisticated levels of ML maturity.
- Budget and ML maturity, an emerging disparity: AI/ML budgets are growing across all company sizes and industries, but several industries are investing more heavily.
- Determining machine learning success across the org chart: hierarchical levels within companies are determining ML success by two different metrics. The director level will likely play a large role in the future of ML adoption.
The report concludes with a section on the future of machine learning and what we expect in the short-term.
What to expect in the 2020 report
Our findings are presented with our original hypotheses, as well as our analysis of the results. Where possible, we have provided a year-on-year comparison with data from 2018 and included predictions about what is likely to manifest in the ML space in the near term.
We have included graphics throughout to bring the data to life (the banner graphic of this post is a bubble chart depicting the use cases of machine learning and their frequency in the enterprise).
We will continue to conduct this annual survey to increase the breadth of our understanding of machine learning technology in the enterprise and share with the broader industry how ML is evolving. In doing so, we can track trends in ML development across industries over time, ideally making more informed predictions with higher degrees of confidence.
Following the report and future-proofing for machine learning
We will soon make our survey data available on an interactive webpage to foster transparency and a greater understanding of the ML landscape. We are committed to being good stewards of ML technology.
This year’s survey report should confirm for readers that machine learning in the enterprise is progressing at a lightning pace. Though the majority of companies are still in the early stages of ML maturity, it is incorrect to think there is time to delay ML efforts at your company.
If your organization is not currently ML–oriented, know that your competitors are. Now is the time to future-proof your organization with AI/ML.
Sign up to receive the full 2020 State of Enterprise Machine Learning report when it publishes on 12 December.
Sentiment analysis invites us to consider the sentence, You’re so smart! and discern what’s behind it. It sounds like quite a compliment, right? Clearly the speaker is raining praise on someone with next-level intelligence. However, consider the same sentence in the following context.
Wow, did you think of that all by yourself, Sherlock? You’re so smart!
Now we’re dealing with the same words except they’re surrounded by additional information that changes the tone of the overall message from positive to sarcastic.
This is one of the reasons why detecting sentiment in natural language, a core task in natural language processing (NLP), is surprisingly complex. Any machine learning model that hopes to achieve suitable accuracy needs to determine what textual information is relevant to the prediction at hand; understand negation, human patterns of speech, idioms, metaphors, and so on; and assimilate all of this knowledge into a rational judgment about a quantity as nebulous as “sentiment.”
In fact, when presented with a piece of text, sometimes even humans disagree about its tonality, especially if there’s not a fair deal of informative context provided to help rule out incorrect interpretations. With that said, recent advances in deep learning methods have allowed models to improve to a point that is quickly approaching human precision on this difficult task.
Sentiment analysis datasets
The first step in developing any model is gathering a suitable source of training data, and sentiment analysis is no exception. There are a few standard datasets in the field that are often used to benchmark models and compare accuracies, but new datasets are being developed every day as labeled data continues to become available.
The first of these datasets is the Stanford Sentiment Treebank. It’s notable for the fact that it contains over 11,000 sentences, which were extracted from movie reviews and accurately parsed into labeled parse trees. This lets recursive models train on each level in the tree, so they can predict the sentiment first for sub-phrases in the sentence and then for the sentence as a whole.
The Amazon Product Reviews Dataset provides over 142 million Amazon product reviews with their associated metadata, allowing machine learning practitioners to train sentiment models using product ratings as a proxy for the sentiment label.
The IMDB Movie Reviews Dataset provides 50,000 highly polarized movie reviews with a 50-50 train/test split.
The Sentiment140 Dataset provides valuable data for training sentiment models to work with social media posts and other informal text. It provides 1.6 million training points, which have been classified as positive, negative, or neutral.
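The rating-as-proxy idea mentioned for the Amazon dataset can be sketched in a few lines. This is a minimal illustration, not the dataset's official labeling scheme: the thresholds (4 stars and up is positive, 2 stars and down is negative, 3 stars is skipped) and the toy reviews are our own assumptions.

```python
def label_from_rating(stars):
    """Map a 1-5 star rating to a sentiment label, or None for ambiguous ratings."""
    if stars >= 4:
        return "positive"
    if stars <= 2:
        return "negative"
    return None  # assumption: 3-star reviews are treated as ambiguous and skipped


# Hypothetical toy reviews, written for this example only.
reviews = [
    ("Works perfectly, would buy again", 5),
    ("Broke after two days", 1),
    ("It is okay, nothing special", 3),
]

# Keep only the reviews that received a usable proxy label.
labeled = [
    (text, label_from_rating(stars))
    for text, stars in reviews
    if label_from_rating(stars) is not None
]
```

Dropping the middle ratings trades some data for cleaner labels, which is usually a reasonable exchange when reviews are plentiful.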
Sentiment analysis, a baseline method
Whenever you test a machine learning method, it’s helpful to have a baseline method and accuracy level against which to measure improvements. In the field of sentiment analysis, one model works particularly well and is easy to set up, making it the ideal baseline for comparison.
To introduce this method, we can define something called a tf-idf score. This stands for term frequency-inverse document frequency, which gives a measure of the relative importance of each word in a set of documents. In simple terms, it computes the relative count of each word in a document reweighted by its prevalence over all documents in a set. (We use the term “document” loosely: it could be anything from a sentence to a paragraph to a longer-form collection of text.) Analytically, we define the tf-idf of a term t as seen in document d, which is a member of a set of documents D, as:
tfidf(t, d, D) = tf(t, d) * idf(t, D)
Where tf is the term frequency, and idf is the inverse document frequency. These are defined to be:
tf(t, d) = count(t) in document d
idf(t, D) = -log(P(t | D))
Where P(t | D) is the probability that a document selected at random from the set D contains the term t.
From here, we can create a vector for each document where each entry in the vector corresponds to a term’s tf-idf score. We place these vectors into a matrix representing the entire set D and train a logistic regression classifier on labeled examples to predict the overall sentiment of each document.
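The definitions above can be turned into a short sketch. It assumes tf is the raw count of a term in a document and interprets P(t | D) as the fraction of documents in D that contain the term; the tokenizer and the toy documents are our own simplifications.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, and strip simple punctuation.
    return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with tf = raw count and idf = -log(P(t | D))."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: -math.log(doc_freq[term] / n_docs) for term in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] * idf[term] for term in vocabulary])
    return vocabulary, vectors


docs = ["the movie was great", "the movie was awful", "great great acting"]
vocab, vecs = tfidf_vectors(docs)
```

Each row of `vecs` is one document's tf-idf vector, and stacking the rows gives the matrix for the whole set. Production libraries typically add smoothing and normalization that this sketch leaves out.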
Sentiment analysis models
The idea here is that if you have a bunch of training examples, such as I’m so happy today!, Stay happy San Diego, Coffee makes my heart happy, etc., then terms such as “happy” will have a relatively high tf-idf score when compared with other terms.
From this, the model should be able to pick up on the fact that the word “happy” is correlated with text having a positive sentiment and use this to predict on future unlabeled examples. Logistic regression is a good model because it trains quickly even on large datasets and provides very robust results.
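To make the baseline concrete, here is a minimal sketch of logistic regression trained on tf-idf vectors with plain gradient ascent. The positive examples come from the text above; the negative examples, the learning rate, the epoch count, and the omission of an intercept term are all our own assumptions, chosen to keep the sketch short.

```python
import math
from collections import Counter


def tokenize(text):
    return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]


def fit_idf(texts):
    # idf = -log(fraction of training documents containing the term)
    docs = [set(tokenize(t)) for t in texts]
    df = Counter(w for d in docs for w in d)
    return {w: -math.log(df[w] / len(docs)) for w in df}


def vectorize(text, idf, vocab):
    # Terms outside the training vocabulary are ignored.
    counts = Counter(tokenize(text))
    return [counts[w] * idf[w] for w in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, lr=0.1, epochs=200):
    """Full-batch gradient ascent on the log-likelihood (no intercept term)."""
    weights = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(w * v for w, v in zip(weights, xi)))
            for j, v in enumerate(xi):
                grad[j] += (yi - p) * v
        weights = [w + lr * g for w, g in zip(weights, grad)]
    return weights


texts = [
    "I'm so happy today!", "Stay happy San Diego", "Coffee makes my heart happy",
    "I'm so sad today", "Sad news from San Diego", "Traffic makes my day sad",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative

idf = fit_idf(texts)
vocab = sorted(idf)
weights = train([vectorize(t, idf, vocab) for t in texts], labels)


def predict(text):
    """Return the estimated probability that text is positive."""
    x = vectorize(text, idf, vocab)
    return sigmoid(sum(w * v for w, v in zip(weights, x)))
```

Calling `predict("Happy birthday!")` returns a probability above 0.5, because “happy” is the only term in that text the model has seen and it appears only in positive examples.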
Other good model choices include SVMs, Random Forests, and Naive Bayes. These models can be further improved by training on not only individual tokens, but also bigrams or trigrams. This allows the classifier to pick up on negations and short phrases, which might carry sentiment information that individual tokens do not. Of course, the process of creating and training on n-grams increases the complexity of the model, so care must be taken to ensure that training time does not become prohibitive.
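A short sketch shows what n-gram features look like in practice. The helper name and the example sentence are our own; the point is that the bigram “not good” becomes a feature in its own right.

```python
def ngrams(tokens, n):
    """Return every run of n adjacent tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "this movie was not good".split()
unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

With unigrams alone the classifier sees “good” and “not” as unrelated features. The bigram list contains `("not", "good")`, which it can learn to treat as negative. The cost is a much larger feature space, which is where the added training time comes from.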
More advanced models
The advent of deep learning has provided a new standard by which to measure sentiment analysis models and has introduced many common model architectures that can be quickly prototyped and adapted to particular datasets to achieve high accuracy.
Most advanced sentiment models start by transforming the input text into an embedded representation. These embeddings are sometimes trained jointly with the model, but usually additional accuracy can be attained by using pre-trained embeddings such as Word2Vec, GloVe, BERT, or FastText.
Next, a deep learning model is constructed using these embeddings as the first layer inputs:
Convolutional neural networks
Surprisingly, one model that performs particularly well on sentiment analysis tasks is the convolutional neural network, which is more commonly used in computer vision models. The idea is that instead of performing convolutions on image pixels, the model can instead perform those convolutions in the embedded feature space of the words in a sentence. Since convolutions occur on adjacent words, the model can pick up on negations or n-grams that carry novel sentiment information.
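As a rough illustration of convolving over adjacent words, the sketch below slides a single width-2 filter over toy word embeddings and applies max-over-time pooling. The two-dimensional embeddings (a "negation" axis and a "positivity" axis) and the filter weights are hand-picked assumptions; a real model would learn both.

```python
# Hypothetical 2-d embeddings: [negation, positivity]. Unknown words map to zeros.
EMBEDDINGS = {"not": [1.0, 0.0], "good": [0.0, 1.0]}

# One filter spanning two adjacent tokens: fires on "negation, then positive word".
KERNEL = [[1.0, 0.0], [0.0, 1.0]]
BIAS = -1.0


def embed(tokens):
    return [EMBEDDINGS.get(t, [0.0, 0.0]) for t in tokens]


def conv1d_over_tokens(embedded, kernel, bias):
    """Slide the kernel over adjacent token embeddings; one ReLU activation per window."""
    width = len(kernel)
    activations = []
    for start in range(len(embedded) - width + 1):
        window = embedded[start:start + width]
        total = bias
        for token_vec, kernel_vec in zip(window, kernel):
            total += sum(e * k for e, k in zip(token_vec, kernel_vec))
        activations.append(max(0.0, total))
    return activations


def max_pool(activations):
    # Max-over-time pooling: keep the strongest response anywhere in the sentence.
    return max(activations)


negated = max_pool(conv1d_over_tokens(embed("the movie is not good".split()), KERNEL, BIAS))
plain = max_pool(conv1d_over_tokens(embed("the movie is good".split()), KERNEL, BIAS))
```

Here `negated` is 1.0 and `plain` is 0.0: the filter responds only when “not” immediately precedes “good”, which is the kind of local pattern a convolutional model can pick up.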
LSTMs and other recurrent neural networks
RNNs are probably the most commonly used deep learning models for NLP and with good reason. Because these networks are recurrent, they are ideal for working with sequential data such as text. In sentiment analysis, they can be used to repeatedly predict the sentiment as each token in a piece of text is ingested. Once the model is fully trained, the sentiment prediction is just the model’s output after seeing all n tokens in a sentence.
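The token-by-token behavior can be sketched with a deliberately tiny recurrent network that has a one-dimensional hidden state. The per-token scores and the three weights are hand-picked assumptions rather than trained values; the sketch only shows how the estimate is updated as each token arrives.

```python
import math

# Hypothetical one-dimensional "embeddings": positive words > 0, negative words < 0.
TOKEN_SCORE = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}
W_INPUT, W_HIDDEN, W_OUTPUT = 1.0, 0.9, 2.0  # hand-picked, not trained


def running_sentiment(tokens):
    """Return the sentiment estimate produced after each token is ingested."""
    hidden = 0.0
    estimates = []
    for token in tokens:
        x = TOKEN_SCORE.get(token, 0.0)
        # Recurrent update: the new state depends on the input and the previous state.
        hidden = math.tanh(W_INPUT * x + W_HIDDEN * hidden)
        estimates.append(1.0 / (1.0 + math.exp(-W_OUTPUT * hidden)))
    return estimates


estimates = running_sentiment("a really good film".split())
final_prediction = estimates[-1]  # output after seeing all n tokens
```

The estimate stays at 0.5 until “good” arrives, then moves above 0.5 and remains there for the final token, because the hidden state carries the earlier signal forward.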
RNNs can also be greatly improved by the incorporation of an attention mechanism, which is an additional learned component of the model. Attention helps a model determine which tokens in a sequence of text to focus on, thus allowing the model to consolidate more information over more timesteps.
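One common form of attention, dot-product attention, can be sketched as follows. The hidden states and the query vector are made-up numbers for illustration; in a real model both would be learned.

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(hidden_states, query):
    """Weight each hidden state by its similarity to the query; return weights and context."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context


# Hypothetical RNN hidden states, one per token of a four-token sentence.
hidden_states = [[0.1, 0.0], [0.2, 0.1], [0.9, 0.8], [0.1, 0.2]]
query = [1.0, 1.0]
weights, context = attend(hidden_states, query)
```

The weights sum to 1, and the third token receives the largest share because its hidden state is most aligned with the query. The context vector is the weighted average that a downstream layer would use to predict sentiment.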
Recursive neural networks
Although similarly named to recurrent neural nets, recursive neural networks work in a fundamentally different way. Popularized by Stanford researcher Richard Socher, these models take a tree-based representation of an input text and create a vectorized representation for each node in the tree. Typically, the sentence’s parse tree is used. As a sentence is read in, it is parsed on the fly and the model generates a sentiment prediction for each element of the tree. This gives a very interpretable result in the sense that a piece of text’s overall sentiment can be broken down by the sentiments of its constituent phrases and their relative weightings. The SPINN model from Stanford is another example of a neural network that takes this approach.
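The tree-based computation can be sketched with nested tuples standing in for a parse tree. Everything numeric here (embeddings, the composition matrix, the sentiment weights) is a hand-picked assumption, and the tree is written by hand rather than produced by a parser. The sketch only shows how a vector and a sentiment score are produced for every node.

```python
import math

# Hypothetical 2-d word embeddings; unknown words map to zeros.
EMBED = {"not": [-1.0, 0.5], "good": [1.0, 0.2], "movie": [0.1, 0.3]}
# Composition matrix (2 x 4) applied to the concatenated child vectors; untrained.
W_COMPOSE = [[0.5, 0.1, 0.5, -0.2], [-0.3, 0.4, 0.2, 0.6]]
W_SENTIMENT = [1.5, -0.5]  # untrained


def compose(left, right):
    children = left + right
    return [math.tanh(sum(w * c for w, c in zip(row, children))) for row in W_COMPOSE]


def sentiment(vec):
    return 1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(W_SENTIMENT, vec))))


def encode(node, out):
    """Post-order walk: encode children first, then the node; record a score per node."""
    if isinstance(node, str):
        vec, phrase = EMBED.get(node, [0.0, 0.0]), node
    else:
        (left_vec, left_phrase) = encode(node[0], out)
        (right_vec, right_phrase) = encode(node[1], out)
        vec, phrase = compose(left_vec, right_vec), left_phrase + " " + right_phrase
    out.append((phrase, sentiment(vec)))
    return vec, phrase


tree = (("the", "movie"), ("was", ("not", "good")))
predictions = []
encode(tree, predictions)
```

`predictions` ends up with one entry per node: five words and four phrases, with the full sentence last. That per-phrase breakdown is what makes the approach interpretable.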
Another promising approach that has emerged recently in NLP is that of multi-task learning (MTL). Within this paradigm, a single model is trained jointly across multiple tasks with the goal of achieving state-of-the-art accuracy in as many domains as possible. The idea here is that a model’s performance on task x can be bolstered by its knowledge of related tasks y and z, along with their associated data. Being able to access a shared memory and set of weights across tasks allows for new state-of-the-art accuracies to be reached. Two popular MTL models that have achieved high performance on sentiment analysis tasks are the Dynamic Memory Network and the Neural Semantic Encoder.
Sentiment analysis and unsupervised models
One encouraging aspect of the sentiment analysis task is that it seems to be quite approachable even for unsupervised models that are trained without any labeled sentiment data, only unlabeled text. The key to training unsupervised models with high accuracy is using huge volumes of data.
One model developed by OpenAI trains on 82 million Amazon reviews, which take it over a month to process. It uses an advanced RNN architecture called a multiplicative LSTM to continually predict the next character in a sequence. In this way, the model learns not only token-level information, but also subword features, such as prefixes and suffixes. Ultimately, it incorporates some supervision into the model, but it is able to match or exceed the accuracy of other state-of-the-art models with 30-100x less labeled data. It also uncovers a single sentiment “neuron” (or feature) in the model, which turns out to be predictive of the sentiment of a piece of text.
Moving from sentiment to a nuanced spectrum of emotion
Sometimes understanding only the sentiment of a text is not enough. To acquire actionable business insights, it can be necessary to tease out further nuances in the emotion that the text conveys. A text having negative sentiment might be expressing any of anger, sadness, grief, fear, or disgust. Likewise, a text having positive sentiment could be communicating any of happiness, joy, surprise, satisfaction, or excitement. Obviously, there’s quite a bit of overlap in the way these different emotions are defined, and the differences between them can be quite subtle.
This makes the emotion analysis task much more difficult than that of sentiment analysis, but also much more informative. Luckily, more and more data with human annotations of emotional content is being compiled. Some common datasets include the SemEval 2007 Task 14, EmoBank, WASSA 2017, The Emotion in Text Dataset, and the Affect Dataset. Another approach to gathering even larger quantities of data is to use emojis as a proxy for an emotion label. 🙂
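The emoji-as-proxy idea can be sketched as a small labeling function. The emoji-to-emotion mapping and the rule of discarding texts with zero or conflicting emojis are our own assumptions.

```python
# Hypothetical mapping from emoji to emotion label.
EMOJI_TO_EMOTION = {"🙂": "joy", "😢": "sadness", "😡": "anger", "😱": "fear"}


def label_from_emoji(text):
    """Return (text_without_emojis, label), or None if zero or conflicting emojis appear."""
    found = {EMOJI_TO_EMOTION[ch] for ch in text if ch in EMOJI_TO_EMOTION}
    if len(found) != 1:
        return None
    # Remove the emoji so the model cannot simply memorize the label.
    cleaned = "".join(ch for ch in text if ch not in EMOJI_TO_EMOTION).strip()
    return cleaned, found.pop()


example = label_from_emoji("Finally Friday 🙂")
```

Stripping the emoji from the training text matters: otherwise the model learns the proxy itself rather than the language around it.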
When training on emotion analysis data, any of the aforementioned sentiment analysis models should work well. The only caveat is that they must be adapted to classify inputs into one of n emotional categories rather than a binary positive or negative.
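In practice, this adaptation usually means replacing a single sigmoid output with a softmax over n categories. A minimal sketch, assuming a hypothetical five-emotion label set and made-up logits from the final layer of a model:

```python
import math

EMOTIONS = ["anger", "sadness", "fear", "joy", "surprise"]  # hypothetical label set


def softmax(logits):
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits):
    """Turn one logit per emotion into (predicted_label, probabilities)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs


label, probs = classify([0.2, 0.1, -1.0, 2.5, 0.3])
```

The rest of the model (embeddings, convolutional or recurrent layers) can stay the same; only the output layer and the loss function change.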
AWS re:Invent is next month, and we are pleased to announce that Algorithmia CEO, Diego Oppenheimer, will be speaking on the new software development lifecycle (SDLC) for machine learning. Often we get variations on this question: how can we adapt our infrastructure, operations, staffing, and training to meet the challenges of ML without throwing away everything that already works? Diego is prepared with answers. His talk will cover how machine learning (ML) will fundamentally change the way we build and maintain applications.
Currently, many data science and ML deployment teams are struggling to fit an ML workflow into tools that don’t make sense for the job. This session will help clarify the differences between traditional and ML-driven SDLCs, cover common challenges that need to be overcome to derive value from ML, and provide answers to questions about current technological trends in ML software. Finally, Diego will outline how to build a process and tech stack to bring efficiency to your company’s ML development.
Diego’s talk will be on 4 December at 1:40pm in the Nuvola Theater in the Aria.
Coming soon: the 2020 State of Enterprise Machine Learning Report
Additionally, Diego will share insights from our upcoming 2020 State of Enterprise Machine Learning survey report, which will be an open-source guide for how the ML landscape is evolving. The report will focus on these findings:
- Shifts in the number of data scientists employed at companies in all industries and what that portends for the future of ML
- Use case complexity and customer-centric applications in smaller organizations
- ML operationalization (having a deployed ML lifecycle) capabilities (and struggles) across all industries
- Trends in ML challenges: scale, version-control, model reproducibility, and aligning a company for ML goals
- Time to model deployment and wasted time
- What determines ML success at the producer level (data scientist and engineer) and at the director and VP level
Pick up a copy of the report at Algorithmia’s booth.
Diego and his team will be available throughout the week to answer questions about infrastructure specifics, ML solutions, and new use cases at Booth 311.
Meet with our team
If you or your team will be in Las Vegas for re:Invent this year, we want to meet with you. Our sales engineers would love to cater a demo of Algorithmia’s product for your specific needs and demonstrate our latest features. Book some time with us!
Read the full press report here.
Algorithmia is fortunate to work with companies across many industries with varied use cases as they develop machine learning programs. We are delighted to showcase the great work one of our customers is doing and how the AI Layer is able to power their machine learning lifecycle.
Tevec is a Brazil-based company that hosts Tevec.AI, a supply chain recommendation platform that uses machine learning to forecast demand and suggest optimized replenishment/fulfillment orders for logistics chains. Put simply, Tevec ensures retailers and goods transport companies deliver their products to the right place at the right time.
In founder Bento Ribeiro’s own words, the “Tevec Platform is a pioneer in the application of machine learning for the recognition of demand behavior patterns, automating the whole process of forecasting and calculation of ideal product restocking lots at points of sale and distribution centers, allowing sales planning control, service level, and regulatory stocks.”
Tevec runs forecasting and inventory-optimization models and customizes user permissions so they can adjust the parameters of their inventory routine, such as lead times, delivery dates, minimum inventory, and service levels. Users can fine-tune the algorithms and adapt for specific uses or priorities.
The challenge: serving and managing at scale
Initially, Tevec was embedding ML models directly into its platform, causing several issues:
- Updating: models and applications were on drastically different update cycles, with models changing many times between application updates
- Versioning: model iterating and ensuring all apps were calling the most appropriate model was difficult to track and prone to error
- Data integrations: manual integrations and multi-team involvement made customization difficult
- Model management: models were interacting with myriad endpoints such as ERP, PoS systems, and internal platforms, which was cumbersome to manage
“Algorithmia provides the ability to not worry about infrastructure and guarantees that models we put in production will be versioned and production-quality.”
Luiz Andrade, CTO, Tevec
The solution: model hosting made simple with serverless microservices
Tevec decoupled model development from app development using the AI Layer so it can seamlessly integrate API endpoints, and users can maintain a callable library of every model version. Tevec’s architecture and data science teams now avoid costly and time-consuming DevOps tasks; that extra time can be spent on building valuable new models in Python, “the language of data science,” Andrade reasons. That said, with the AI Layer, Tevec can run models from any framework, programming language, or data connector—future-proofing Tevec’s ML program.
With Algorithmia in place, Tevec’s data scientists can test and iterate models with dependable product continuity, and can customize apps for customers without touching models, calling only the version needed for testing.
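The idea of a callable library of model versions can be illustrated with a small in-memory registry. This is a hypothetical sketch of the pattern only; it is not Algorithmia’s API, and the class, method names, and toy forecasting functions are all invented for the example.

```python
class ModelRegistry:
    """Hypothetical in-memory registry: every published version stays callable."""

    def __init__(self):
        self._versions = {}  # (name, version) -> callable model
        self._latest = {}    # name -> most recently published version

    def publish(self, name, version, model):
        key = (name, version)
        if key in self._versions:
            raise ValueError(f"{name} {version} already exists; versions are immutable")
        self._versions[key] = model
        self._latest[name] = version  # assumes versions are published in order

    def call(self, name, payload, version=None):
        # Apps can pin a version or default to the latest one.
        version = version or self._latest[name]
        return self._versions[(name, version)](payload)


registry = ModelRegistry()
registry.publish("demand_forecast", "1.0.0", lambda sales: sum(sales) / len(sales))
registry.publish("demand_forecast", "1.1.0", lambda sales: sales[-1])

pinned = registry.call("demand_forecast", [10, 20, 30], version="1.0.0")
latest = registry.call("demand_forecast", [10, 20, 30])
```

Because an app can pin `1.0.0` while data scientists test `1.1.0`, model and application release cycles no longer have to move together.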
Algorithmia’s serverless architecture ensures the scalability Tevec needs to meet its customers’ demands without the costs of other autoscaling systems, and Tevec only pays for compute resources it actually uses.
Tevec continues to enjoy 100-percent year-on-year growth, and as it scales so will its ML architecture deployed on Algorithmia’s AI Layer. Tevec is planning additional products beyond perfect order forecasts and it is evaluating new frameworks for specific ML use cases—perfect for the tool-agnostic AI Layer. Tevec will continue to respond to customer demands as it increases the scale and volume of its service so goods and products always arrive on time at their destinations.
“Algorithmia is the whole production system, and we really grabbed onto the concept of serverless microservices so we don’t have to wait for a whole chain of calls to receive a response.”
Luiz Andrade, CTO, Tevec
Read the full Tevec case study.
As companies begin developing use cases for machine learning, the infrastructure to support their plans must be able to adapt as data scientists experiment with new and better processes and solutions. Concurrently, organizations must connect a variety of systems into a platform that delivers consistent results.
Machine learning architecture consists of four main groups:
- Data and Data Management Systems
- Training Platforms and Frameworks
- Serving and Life Cycle Management
- External Systems
ML-focused projects generate value only after these functional areas connect into a workflow.
In part 3 of our Machine Learning Infrastructure whitepaper series, “Connectivity,” we discuss how those functional areas fit together to power the ML life cycle.
It all starts with data
Most data management systems include built-in authentication, role access controls, and data views. In more advanced cases, an organization will have a data-as-a-service engine that allows for querying data through a unified interface.
Even in the simplest cases, ML projects likely rely on a variety of data formats—different types of data stores from many different vendors. For example, one model might train on images from a cloud-based Amazon S3 bucket, while another pulls rows from on-premises PostgreSQL and SQL Server databases, while a third interprets streaming transactional data from a Kafka pipeline.
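One way to hide that variety behind a single interface is to dispatch on the URI scheme of each data source. The sketch below is hypothetical: the registry class, the stub readers, and the example URIs are invented for illustration, and real readers would call the S3, PostgreSQL, or Kafka client libraries.

```python
from urllib.parse import urlparse


class ConnectorRegistry:
    """Hypothetical unified data interface: one read() call, many backends."""

    def __init__(self):
        self._readers = {}  # URI scheme -> reader function

    def register(self, scheme, reader):
        self._readers[scheme] = reader

    def read(self, uri):
        scheme = urlparse(uri).scheme
        if scheme not in self._readers:
            raise KeyError(f"no connector registered for scheme '{scheme}'")
        return self._readers[scheme](uri)


registry = ConnectorRegistry()
# Stub readers that only report what they would do; real ones would fetch data.
registry.register("s3", lambda uri: f"object from {uri}")
registry.register("postgresql", lambda uri: f"rows from {uri}")
registry.register("kafka", lambda uri: f"stream from {uri}")

image_source = registry.read("s3://training-data/images/cat.png")
table_source = registry.read("postgresql://warehouse/sales")
```

Models then depend only on `read()`, so adding a new data store means registering one more reader rather than changing model code.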
Select a training platform
Training platforms and frameworks comprise a wide variety of tools used for model building and training. Different training platforms offer unique features. Libraries like TensorFlow, Caffe, and PyTorch offer toolsets to train models.
The freedom of choice is paramount, as each tool specializes in certain tasks. Models can be trained locally on a GPU and then deployed, or they can be trained directly in the cloud using Dataiku, Amazon SageMaker, Azure ML Studio, or other platforms or processors.
Life cycle management systems
Model serving encompasses all the services that allow data scientists to deliver trained models into production and maintain them. Such services include the abilities to ingest models, catalog them, integrate them into DevOps workflows, and manage the ML life cycle.
Fortunately, each ML architecture component is fairly self-contained, and the interactions between those components are fairly consistent:
- Data informs all systems through queries.
- Training systems export model files and dependencies.
- Serving and life cycle management systems return inferences to applications and model pipelines, and export logs to systems of record.
- External systems call models, trigger events, and capture and modify data.
It becomes easy to take in data and deploy ML models when these functions are grouped together.
External systems can consume model output and integrate it in other places. Based on the type of deployment, we can create different user interfaces. For example, the model output can integrate into a REST API or another web application. RESTful APIs let us call model output from any language and integrate it into new or existing projects.
Connectivity and machine learning sophistication
Data has made the jobs of business decision makers easier. But data is only useful after models interpret it, and model inference only generates value when external apps can integrate and consume it. That journey toward integration has two routes: horizontal integration and loosely coupled, tight integration.
The quickest way to develop a functioning ML platform is by supporting only a subset of solutions from each of the functional groups to more quickly integrate each into a horizontal platform. Doing so requires no additional workforce training and adds speed to workflows already in place.
Unfortunately, horizontal integration commits an organization to full-time software development rather than building and training models to add business value. An architecture that allows each system to evolve independently, however, can help organizations choose the right components for today without sacrificing the flexibility to rethink those choices tomorrow.
To enable a loosely coupled, tightly integrated approach, a deployment platform must support three kinds of connectivity:
- Publish/Subscribe
- Data Connectors
- RESTful APIs
Publish/subscribe (pub/sub) is an asynchronous, message-oriented notification pattern. In such a model, one system acts as a publisher, sending events to a message broker. Through the message broker, subscriber systems explicitly enroll in a channel, and the broker forwards publisher notifications and verifies their delivery. Subscribers can then use those notifications as event triggers.
Algorithmia’s AI Layer has configurable event listeners that allow users to trigger actions based on input from pub/sub systems.
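The publish/subscribe pattern described above can be sketched with an in-memory broker. This is a simplified illustration, not Algorithmia’s event listener API: the class and channel names are hypothetical, and delivery is synchronous here, whereas a real broker delivers asynchronously and handles retries.

```python
from collections import defaultdict


class MessageBroker:
    """Hypothetical in-memory broker; real brokers are asynchronous and durable."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        # Subscribers explicitly enroll in a channel.
        self._subscribers[channel].append(callback)

    def publish(self, channel, event):
        """Forward the event to every subscriber; return the number of deliveries."""
        delivered = 0
        for callback in self._subscribers.get(channel, []):
            callback(event)
            delivered += 1
        return delivered


broker = MessageBroker()
triggered = []

# A subscriber that treats each notification as a trigger to run a model.
broker.subscribe("new-orders", lambda event: triggered.append(("forecast", event)))

count = broker.publish("new-orders", {"sku": "A-100", "quantity": 3})
unheard = broker.publish("returns", {"sku": "B-200"})
```

The publisher never references the subscriber directly, which is what keeps the two systems loosely coupled.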
While the model is the engine of any machine learning system, data is both the fuel and the driver. Data feeds the model during training, influences the model in production, then retrains the model in response to drift.
As data changes, so does its interaction with the model, and to support that iterative process, an ML deployment and management system must integrate with every relevant data connector.
Because requesting platforms are varied and unpredictable, loose coupling is, again, the most elegant answer. RESTful APIs are a natural implementation because of these required REST constraints:
- Uniform interface: requests adhere to a standard format
- Client-server: the server only interacts with the client through requests
- Stateless: all necessary information must be included within a request
- Layered system: requests may pass through intermediary layers between the client and the server without the client needing to know about them
- Cacheable: developers can store certain responses
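The stateless and uniform-interface constraints can be illustrated by building a request that carries everything the server needs. The endpoint URL, path layout, model name, and API key below are hypothetical placeholders; the sketch constructs the request with the standard library but does not send it.

```python
import json
import urllib.request


def build_inference_request(base_url, model, version, payload, api_key):
    """Build a self-contained POST request: target, credentials, and input in one message."""
    url = f"{base_url}/models/{model}/versions/{version}/predict"  # hypothetical path layout
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


request = build_inference_request(
    "https://ml.example.com/v1",  # hypothetical host
    "sentiment",
    "1.2.0",
    {"text": "You're so smart!"},
    "HYPOTHETICAL_KEY",
)
```

Because no session state lives on the server, any client in any language that can form this request can call the model, and any server instance can answer it.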
To learn more about how connectivity feeds into the machine learning life cycle, download the full whitepaper.
And visit our website to read parts 1 and 2 of the Machine Learning Infrastructure whitepaper series.