# Time series data analysis advances DevOps

Time series data, data points that each carry a timestamp so they can be indexed in time order, are in most cases INSERT-intensive. They often call for specialized time series databases rather than the traditional relational databases typically queried with SQL.

Prior to advancements in machine learning, much of the time series data analysis completed by DevOps engineers was limited to simple averages of key metrics with associated timestamps. By setting thresholds on those metrics in conjunction with timestamps, simple alert systems were born. Now, DevOps engineers are using time series data in ways that benefit from enhancements in the field of artificial intelligence.

## DevOps strives for 100% uptime using historical time series data

While standard alerts are useful for determining if a service or system is close to failure, DevOps now has the ability to see valuable trends in time series data. Rather than being reactionary, engineers are adding methods to their tool belts to prevent system outages and prepare for events based on historical data.

This proactive approach is one of the key tenets of today’s ML DevOps methodology. Rather than focusing only on thresholds, DevOps can act on anomalies that machine learning models detect in time series data.
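To make the contrast concrete, here is a minimal sketch comparing a fixed threshold alert with a rolling z-score check. The z-score is only a simple statistical stand-in for the machine learning models mentioned above; the function names, window size, and CPU samples are illustrative assumptions.

```python
from statistics import mean, stdev

def threshold_alerts(values, limit):
    """Indices where the metric exceeds a fixed limit (classic alerting)."""
    return [i for i, v in enumerate(values) if v > limit]

def zscore_anomalies(values, window=5, z_limit=3.0):
    """Indices whose value deviates from the trailing window
    by more than z_limit standard deviations."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) / sigma > z_limit:
            flagged.append(i)
    return flagged

# Hypothetical CPU utilization samples (percent), one per minute.
cpu = [41, 43, 42, 44, 42, 43, 71, 44, 42, 43]
print(threshold_alerts(cpu, limit=90))  # []
print(zscore_anomalies(cpu))            # [6]
```

On this made-up series the fixed 90 percent threshold never fires, while the z-score check flags the spike at index 6 because it deviates sharply from the five samples before it.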

### Time series in action

Let’s look at an example of some time series data that DevOps engineers are already familiar with.

127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

Above, we see an HTTP log entry with a number of data points. Information like IP address, user information, request result, and above all, timestamp data are all collected.
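As a rough sketch, the fields of an entry like the one above can be pulled apart with a regular expression. This assumes the entry follows the Common Log Format with straight double quotes around the request; the pattern and the `parse_log_line` helper are illustrative, not part of any particular tool.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the timestamp into a timezone-aware datetime for time-ordered indexing.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry

line = ('127.0.0.1 user-identifier frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
entry = parse_log_line(line)
print(entry["time"].isoformat(), entry["status"], entry["request"])
```

Once the timestamp is a real `datetime`, entries can be sorted, bucketed, and compared across time zones.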

DevOps can use this information to identify application failures or broken links to various assets. Engineers can act on these insights and resolve issues they encounter.

### DevOps analytics and time series data

Beyond troubleshooting, engineers are identifying trends in this time series data that allow for proactive capacity planning. If the data shows an increase in the number of requests during a holiday or other event, DevOps engineers can use scaling techniques on the production resources to ensure a good user experience. Using this type of data for capacity planning minimizes outages caused by a lack of resources, which saves time and avoids the revenue that a preventable outage would cost.
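A minimal sketch of that kind of capacity planning might bucket request timestamps by hour and size the fleet for the busiest hour. The per-instance capacity, the 25 percent headroom, and the sample timestamps are all assumed values for illustration.

```python
import math
from collections import Counter
from datetime import datetime

def requests_per_hour(timestamps):
    """Count requests in each hourly bucket."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)

def instances_needed(peak_requests, capacity_per_instance, headroom=0.25):
    """Instances required to serve the peak hour with spare headroom."""
    return math.ceil(peak_requests * (1 + headroom) / capacity_per_instance)

# Hypothetical request timestamps taken from parsed log entries.
timestamps = [
    datetime(2000, 10, 10, 13, 5), datetime(2000, 10, 10, 13, 40),
    datetime(2000, 10, 10, 13, 55),
    datetime(2000, 10, 10, 14, 1), datetime(2000, 10, 10, 14, 2),
    datetime(2000, 10, 10, 14, 30), datetime(2000, 10, 10, 14, 45),
    datetime(2000, 10, 10, 14, 59),
]

hourly = requests_per_hour(timestamps)
peak_hour, peak_count = hourly.most_common(1)[0]
print(peak_hour, peak_count, instances_needed(peak_count, capacity_per_instance=4))
```

With real traffic the same counts could feed a forecast for an upcoming holiday instead of a simple peak lookup.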

## Artificial intelligence identifies trends in economic time series data

Just as DevOps engineers are able to take advantage of advancements in machine learning, economics is also benefiting from the new technology. A great example of this is to see how time series data identifies trends in the stock market. AI is creating new ways to conduct risk analysis so that investors have a clearer picture of historical trends for individual companies as well as the market as a whole.

Time series data can also provide more in-depth cost-benefit analysis, including forecasting based on data used during the training of various ML models. Ultimately, this gives insight into additional scenarios with feedback to support it when presenting to stakeholders. This type of data was rarely available prior to the introduction of artificial intelligence, making it quite valuable.

What makes today’s big data challenges more complicated, however, is the need for data scientists to have access to large datasets alongside the models they use. When working with data that involves transactions, it is critical to have an appropriate layer of security as well.

Algorithmia allows a team’s DevOps engineers to implement a solution based on proven DevOps processes. At the same time, Algorithmia allows data scientists to branch out and innovate with their own machine learning models, or those already deployed in the Algorithmia platform.

### Time series data benefits from specialized database formats

Due to the nature of time series datasets, the database chosen must be scalable and highly available. Typical databases do not provide the throughput or storage needed for the large amounts of data surrounding ecommerce.

Specialized database formats are available that take advantage of advancements in software engineering, making them perfect for the types of intense analysis needed to make sense of large amounts of data. By hosting time series data in appropriate formats, data scientists and DevOps engineers also benefit from a usability standpoint.

Many functions for data retention, aggregation based on time elements, and common query tasks are built in, eliminating the need for additional DevOps processes around maintenance. The result of having the right data in the right place is an increase in efficiencies across the board.
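To show what such built-in functions save you from writing, here is a hand-rolled sketch of two of them: time-bucketed aggregation and a retention policy. It is plain Python over `(timestamp, value)` pairs, not the API of any specific time series database; the bucket width and retention window are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def downsample(points, bucket=timedelta(minutes=5)):
    """Average (timestamp, value) pairs into fixed-width time buckets."""
    grouped = defaultdict(list)
    for ts, value in points:
        # Floor the timestamp to the start of its bucket.
        bucket_start = ts - (ts - datetime.min) % bucket
        grouped[bucket_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(grouped.items())}

def apply_retention(points, now, keep=timedelta(days=7)):
    """Drop points older than the retention window."""
    return [(ts, value) for ts, value in points if now - ts <= keep]

# Hypothetical metric samples.
points = [
    (datetime(2020, 1, 1, 12, 0), 10.0),
    (datetime(2020, 1, 1, 12, 2), 20.0),
    (datetime(2020, 1, 1, 12, 7), 30.0),
    (datetime(2020, 1, 10, 9, 0), 40.0),
]

averages = downsample(points)
recent = apply_retention(points, now=datetime(2020, 1, 10, 12, 0))
print(averages)
print(recent)
```

A purpose-built database typically performs this work continuously and at scale, which is the maintenance burden the paragraph above refers to.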

## Algorithmia provides a full solution for time series data analysis

Algorithmia recognizes the need for storing specialized datasets. Additionally, time series data is often stored with major cloud providers as dictated by business needs. Algorithmia lets you connect many major cloud-platform storage accounts for use in ML models, while providing a single point of integration that handles security and scalability.

Algorithmia’s public instance includes ML models that fully utilize the way today’s time series data is stored. Using these models in combination with your own and those of the Algorithmia community fosters the innovation needed to advance AI for today’s big data needs. Data scientists can focus on their jobs, and DevOps can ensure capacity and uptime remain at appropriate service levels.

# The best AI programming languages to use

Computer coding must be involved to implement any type of AI system, and there is a variety of programming languages that lend themselves to specific AI or machine learning tasks. Let’s look at which programming languages will be the most beneficial for your specific use cases.

We have composed a simple list showing which five programming languages are best to learn if you want to be successful in the artificial intelligence industry. Each has its own particular strengths and weaknesses for a given project, so consider your end goals before selecting a language.

These programming languages include:

• Python
• R
• Java
• Scala
• Rust

## Python

Python is by far the most popular programming language used in artificial intelligence today because it has an easy-to-learn syntax, a massive collection of libraries and frameworks, and applicability to a wide range of AI algorithms, and because it is relatively simple to write.

Python supports multiple programming styles, including functional, object-oriented, and procedural. In addition, its massive community helps to keep this language at the forefront of the computer science industry.
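As a small illustration of those styles, the sketch below computes the same quantity (the sum of the squares of the even numbers in a list) three ways. The function and class names are made up for the example.

```python
from functools import reduce

data = [1, 2, 3, 4, 5, 6]

# Procedural: explicit loop and mutable accumulator.
def sum_even_squares_procedural(numbers):
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total

# Object-oriented: state and behavior bundled in a class.
class EvenSquareSummer:
    def __init__(self, numbers):
        self.numbers = list(numbers)

    def total(self):
        return sum(n * n for n in self.numbers if n % 2 == 0)

# Functional: composition of filter and reduce, no mutation.
def sum_even_squares_functional(numbers):
    evens = filter(lambda n: n % 2 == 0, numbers)
    return reduce(lambda acc, n: acc + n * n, evens, 0)

print(sum_even_squares_procedural(data))   # 56
print(EvenSquareSummer(data).total())      # 56
print(sum_even_squares_functional(data))   # 56
```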

The disadvantages of Python include its lack of speed compared to some of the other languages, its less than optimal mobile coding capabilities, and the difficulty it has with memory-intensive tasks.

## R

R is another machine learning programming language that is relatively easy to understand. The most common uses of R are for data analysis, big data modeling, and data visualization. R’s abundance of package sets and variety of materials make it easy to work with on data-centric tasks.

The disadvantages of R include its excess use of memory, lack of basic security (unable to embed into web applications), and the fact that it is rooted in an older programming language, S.

## Java

Java is object-oriented, and its strengths include working well with search algorithms, a simplified framework that supports large-scale projects efficiently, and ease of debugging. In addition, it is supported by a well-established community and has a myriad of open-source libraries.

The disadvantages of Java include its lack of performance speed compared to other languages and the inefficient use of memory that comes with running on top of the Java Virtual Machine. These two shortcomings generally result in a third: the increased cost of hardware.

## Scala

Scala is a highly scalable programming language that can handle large amounts of data. Being multi-paradigm, Scala supports both object-oriented and functional styles of programming. Due to its concise code, Scala can be more readable and easier to write than other languages, similar to Java. Its speed and efficiency are what make this language stand out for machine learning and AI models, with relatively error-free coding that is easy to debug when necessary.

The disadvantages of Scala include side effects that come with fulfilling both object-oriented and functional styles. Since this language is a combination of both programming styles, it can make understanding type-information more difficult. In addition, the option to switch back to an object-oriented style can be seen as a downside, as you won’t be forced to think functionally while you code.

## Rust

Rust is a systems-level programming language. It was created with the intention of writing “safe” code, meaning that objects are managed in the program itself. This relieves the programmer of doing pointer arithmetic or having to independently manage memory. The inability to use excess memory often results in cleaner code, potentially making it easier to program.

The disadvantages of Rust include a slower compiler than other languages, no garbage collection, and code that cannot be developed as quickly as in other programming languages, such as Python.

## With Algorithmia, you can use multiple languages within one AI software

Algorithmia provides a machine learning architecture that invites programmers to pipeline models together, even if they’re written in different languages. This removes any need to translate algorithms into a certain language to be compatible with the rest of the algorithms in a monolithic architecture.

You can also reuse pieces of the software by calling them into the application whenever they’re needed, without copying and pasting them. In this way, Algorithmia helps organizations create better software, faster.

Watch our video demo to learn how Algorithmia can help your organization increase efficiency in the last-mile machine learning process: deploying, serving, and managing models.


# Linear regression for machine learning

Linear regression in machine learning is a supervised learning technique that comes from classical statistics. However, with the rapid rise of machine learning and deep learning, its use has surged as well, because neural networks with linear (multilayer perceptron) layers perform regression.

This regression is typically linear, but when non-linear activation functions are incorporated into these networks, they become capable of performing non-linear regression.

Non-linear regression models the relationship between input and output using some form of non-linear function, for example a polynomial or an exponential. Non-linear regressors can be used to model common relationships in science and economics, such as the exponential decay of a radioactive substance or the trend in stock market performance in accordance with the overall global economy.
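The exponential decay case can be sketched in a few lines. Because $N(t) = N_0 e^{-\lambda t}$ becomes linear after taking logarithms, $\log N = \log N_0 - \lambda t$, an ordinary straight-line fit recovers the parameters. The data below is synthetic and noise-free, and the values $N_0 = 100$ and $\lambda = 0.3$ are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic decay data: N(t) = N0 * exp(-lam * t), with assumed N0 = 100 and lam = 0.3.
t = np.arange(0, 10, dtype=float)
n = 100.0 * np.exp(-0.3 * t)

# Taking logs gives log N = log N0 - lam * t, which is linear in t,
# so a degree-1 polynomial fit recovers the slope and intercept.
slope, intercept = np.polyfit(t, np.log(n), deg=1)
lam_hat = -slope
n0_hat = np.exp(intercept)

print(lam_hat, n0_hat)
```

With noisy measurements, fitting in log space weights the errors differently from fitting the exponential directly, so this transform is a convenience rather than a universal recipe.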

## How does linear regression work?

Stepping back from the neural network view, we can specify linear regression models as a simple mathematical relationship. Succinctly put, linear regression models a linear dependency between an input and an output variable. Depending upon what context you’re working in, these inputs and outputs are referred to by different terms.

Most commonly, we have a training dataset with $k$ examples, each having $n$ input components, $x_1, \ldots, x_n$, called the regressors, covariates, or exogenous variables. The output vector $\mathbf{y}$ is called the response variable, output variable, or the dependent variable. In multivariate linear regression, there can be multiple such output variables. The parameters of the model, $w_0, w_1, \ldots, w_n$, are called the regression coefficients, or in the deep learning context, the weights. The model has the form

$$y = w_0 + w_1x_1 + \cdots + w_nx_n$$

for a single training example $\mathbf{x} = [x_1, \ldots, x_n]$. We can also make this notation compact by compressing the training data into a matrix $X \in \mathbb{R}^{k \times (n+1)}$,

$$X = \begin{pmatrix}\mathbf{x}_1^\top \\ \mathbf{x}_2^\top \\ \vdots \\ \mathbf{x}_k^\top\end{pmatrix} = \begin{pmatrix}1 & x_{11} & \cdots & x_{1n} \\ 1 & x_{21} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{k1} & \cdots & x_{kn}\end{pmatrix},$$

and the weights into a vector, $\mathbf{w} = [w_0, w_1, \ldots, w_n]^{\top}$. The weights form the core of the model. They encode the linear relationship between input and output, placing more emphasis on data features that are important and down-weighting those that are not. Note that we add a “hidden component” with value 1 to each of the rows of $X$. This allows us to compute a dot product with $\mathbf{w}$, which includes a bias term, $w_0$. The bias term allows the model to shift the linear hyperplane it computes off of the origin, permitting it to model relationships in data that are not zero-centered. The simplified model can then be expressed as

$$y = X\mathbf{w}$$
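In code, the compact form amounts to prepending a column of ones to the feature matrix and taking a matrix-vector product. The sketch below uses NumPy with made-up features and weights ($k = 3$ examples, $n = 2$ features).

```python
import numpy as np

# Hypothetical data: k = 3 examples, n = 2 features each.
features = np.array([[1.0, 2.0],
                     [3.0, 4.0],
                     [5.0, 6.0]])

# Weights [w_0, w_1, w_2]; w_0 is the bias term.
w = np.array([0.5, 2.0, -1.0])

# Prepend the constant-1 "hidden component" so X has shape (k, n + 1).
X = np.hstack([np.ones((features.shape[0], 1)), features])

y_hat = X @ w
print(X.shape)  # (3, 3)
print(y_hat)    # [0.5 2.5 4.5]
```

Each prediction is $w_0 + w_1 x_1 + w_2 x_2$ for the corresponding row, for instance $0.5 + 2 \cdot 1 - 1 \cdot 2 = 0.5$ for the first example.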

This is the basic model that underlies most implementations of linear regression; however, there are many variations that can exist on top of this basic structure, each conferring its own drawbacks and benefits. For example, there’s a version of linear regression called Bayesian linear regression, which introduces a Bayesian perspective by placing prior distributions on the weights of the model. This makes it easier to reason about what the model’s doing and subsequently makes its results more interpretable.

## Training a linear regression model

So how do we train a linear regression model? Well, the process is similar to what’s used with most machine learning models. We have a training set $$\mathcal{D} = \{(x^{(1)}, y^{(1)}),\ldots, (x^{(k)}, y^{(k)})\}$$ and our task is to model this relationship as closely as possible without harming the model’s ability to predict on new examples. To that end, we define a loss, or objective, function $J_{\mathbf{w}}(\hat{y}, y)$ which takes in the true output $y$ and the predicted output $\hat{y}$ and measures “how well” the model is doing at predicting $y$ given $\mathbf{x}$. We use the subscript $\mathbf{w}$ to indicate that the output of $J$ is dependent on and parameterized by the model’s weights, $\mathbf{w}$, via the prediction $\hat{y}$, even though those weight values don’t explicitly show up in the function’s calculation. For linear regression, we typically use the squared-error loss, which equals the mean-squared error (MSE) up to a constant factor. It is defined as

$$J_\mathbf{w}(\hat{y}, y) = \frac{1}{2} \sum_{i=1}^k (\hat{y}^{(i)} - y^{(i)})^2$$
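A short sketch of this loss in NumPy follows, next to the conventional mean-squared error for comparison. The formula above is half the sum of squared errors; it differs from the mean only by a constant factor, so both are minimized by the same weights. The sample predictions and targets are made up.

```python
import numpy as np

def half_sse(y_hat, y):
    """Half the sum of squared errors, matching the formula above."""
    return 0.5 * np.sum((y_hat - y) ** 2)

def mse(y_hat, y):
    """Mean-squared error: the same quantity rescaled by 2 / k."""
    return np.mean((y_hat - y) ** 2)

# Hypothetical predictions and targets for k = 3 examples.
y_hat = np.array([0.5, 2.5, 4.5])
y = np.array([1.0, 2.0, 5.0])

print(half_sse(y_hat, y))  # 0.375
print(mse(y_hat, y))       # 0.25
```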

We can then optimize this loss function using one of a variety of techniques. We could use something like gradient descent, the de facto standard for training neural networks, but this is not necessary for linear regression, because we can solve the optimization problem directly to find the optimum value for the weights, $\mathbf{w}^*$.

Since we want to optimize this for $\mathbf{w}$, we take the gradient with respect to $\mathbf{w}$, set the result to 0, then solve for $\mathbf{w}^*$, the optimal setting of $\mathbf{w}$. We have

\begin{align*} \nabla_{\mathbf{w}} J_\mathbf{w}(\hat{y}, y) &= \frac{1}{2} \nabla_{\mathbf{w}} (y - X\mathbf{w})^\top(y - X\mathbf{w}) \\ &= \frac{1}{2} \nabla_\mathbf{w} \left(y^\top y - y^\top X \mathbf{w} - \mathbf{w}^\top X^\top y + \mathbf{w}^\top X^\top X \mathbf{w}\right) \\ &= -y^\top X + \mathbf{w}^\top X^\top X \end{align*}

Now we set the gradient equal to 0, transpose both sides, and solve for $\mathbf{w}$:

\begin{align*} 0 &= -y^\top X + \mathbf{w}^\top X^\top X \\ \mathbf{w}^\top X^\top X &= y^\top X \\ X^\top X \mathbf{w} &= X^\top y \\ \mathbf{w}^* &= (X^\top X)^{-1} X^\top y \end{align*}

This is the setting of $\mathbf{w}$ that minimizes the loss on the training data. As you can see, it’s computed solely using products of $X$ and $y$. However, it requires a matrix inversion of $X^\top X$, which can be computationally difficult when $X$ is very large or poorly conditioned. In these cases, you could use an iterative optimization method like gradient descent or techniques designed to approximate the matrix inverse without actually computing it.
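The sketch below compares the two routes on synthetic data: solving the normal equations $X^\top X \mathbf{w} = X^\top y$ directly, and running plain gradient descent on the same loss. The data, the "true" weights, the learning rate, and the iteration count are all assumptions chosen so the example converges quickly; it uses `np.linalg.solve` rather than forming the inverse explicitly, which is the usual numerically safer choice.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 200, 3

# Synthetic data generated from assumed ground-truth weights [w_0, ..., w_3].
true_w = np.array([1.0, 2.0, -3.0, 0.5])
features = rng.normal(size=(k, n))
X = np.hstack([np.ones((k, 1)), features])
y = X @ true_w + 0.01 * rng.normal(size=k)

# Closed form: solve (X^T X) w = X^T y without computing the inverse.
w_closed = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the same squared-error loss,
# with the gradient divided by k to keep the step size stable.
w_gd = np.zeros(n + 1)
learning_rate = 0.1
for _ in range(2000):
    grad = X.T @ (X @ w_gd - y) / k
    w_gd -= learning_rate * grad

# Library least-squares solver for reference.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(w_closed)
print(w_gd)
```

On a well-conditioned problem like this one the three estimates agree closely; the iterative route matters mainly when $X$ is too large or too ill-conditioned for a direct solve.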

## Regularization

Probably the most commonly used variants of linear regression are those models which involve added regularization. Regularization refers to the process of penalizing model weights which are large in absolute value. Usually this is done by computing some norm of the weights as a penalty term added onto the cost function. The purpose of regularization is usually to mitigate overfitting, the tendency of a model to fit its training data too closely, noise included, which prevents it from generalizing well to unseen examples. There are two basic types of regularization for linear regression models: L1 and L2.

L1 regularization adds the L1 norm of the weight vector $\mathbf{w}$ as a penalty term to the objective function. The L1 norm is defined as

$$\|\mathbf{w}\|_1 = |w_0| + |w_1| + \cdots + |w_n|$$

Regression models which employ L1 regularization are said to perform lasso regression.

In contrast, L2 regularization adds the squared L2 norm of the weight vector $\mathbf{w}$ as a penalty term to the objective function. The squared L2 norm is defined as

$$\|\mathbf{w}\|_2^2 = w_0^2 + w_1^2 + \cdots + w_n^2$$

Regression models regularized using L2 regularization are said to perform ridge regression.

So how do these regularization penalties qualitatively affect the model’s results (outputs)? Well, it turns out that L2 regularization produces weight coefficients which are small but diffuse. That’s to say, it tends to produce models where each of the coefficients $w_0, \ldots, w_n$ is relatively small and relatively similar in magnitude.

In contrast, L1 regularization tends to be more specific about the way in which it penalizes coefficients. Certain of these coefficients tend to be penalized heavily and driven towards values of 0, whereas some remain relatively unchanged. The weights that L1 regularization produces are often said to be sparse.

In that vein, some also contend that L1 regularization actually performs a sort of soft feature selection, i.e. the selection of features (components in the data) which are the most important for producing the desired result. By driving certain weights to 0, the model is indicating that these variables are actually not particularly helpful or explanatory in its action.
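The difference between the two penalties can be seen in a small experiment. The sketch below fits ridge regression in closed form and lasso with a basic proximal gradient loop (ISTA, one of several ways to solve the lasso problem) on synthetic data where only two of five features matter. The data, the penalty strength `lam = 20`, and the helper names are assumptions for illustration, and the bias column is omitted to keep the example short.

```python
import numpy as np

def ridge(X, y, lam):
    """Closed-form ridge: minimizes 1/2 ||Xw - y||^2 + lam/2 * ||w||_2^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def soft_threshold(v, t):
    """Shrink each entry toward 0 by t, setting small entries exactly to 0."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, steps=5000):
    """Proximal gradient (ISTA) for 1/2 ||Xw - y||^2 + lam * ||w||_1."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ w - y)
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, 0.0, 0.0, -2.0, 0.0])  # only features 0 and 3 matter
y = X @ true_w + 0.1 * rng.normal(size=100)

w_ols = np.linalg.solve(X.T @ X, X.T @ y)
w_ridge = ridge(X, y, lam=20.0)
w_lasso = lasso_ista(X, y, lam=20.0)

print(np.round(w_ridge, 3))
print(np.round(w_lasso, 3))
```

In a setup like this, ridge shrinks every coefficient a little but leaves all of them non-zero, while lasso typically sets the coefficients of the irrelevant features to exactly 0, which is the sparsity described above.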

## Uses of linear regression

Linear regression can be used just about anywhere a suspected linear relationship in data exists. For businesses, this could come in the form of sales data. For example, a business might introduce a new product into the market but be unsure of the price point at which to sell it.

By testing customer response in the form of gross sales at a few selected price points, a business can extrapolate the relationship between price and sales using linear regression in order to determine the optimal point at which to sell their product.
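As a worked sketch of the pricing example, suppose "sales" is measured in units sold and the goal is to maximize revenue; both are assumptions, since the text does not pin them down. If a linear fit gives units $= a + b \cdot \text{price}$ with $b < 0$, then revenue $= \text{price} \cdot (a + b \cdot \text{price})$ is a downward parabola that peaks at $\text{price} = -a / (2b)$. The price points and unit counts below are made up.

```python
import numpy as np

# Hypothetical test results: units sold at four price points.
prices = np.array([10.0, 15.0, 20.0, 25.0])
units = np.array([200.0, 150.0, 100.0, 50.0])

# Fit units = a + b * price with ordinary least squares.
b, a = np.polyfit(prices, units, deg=1)

# Revenue = price * (a + b * price) is maximized at price = -a / (2b) when b < 0.
best_price = -a / (2 * b)
best_revenue = best_price * (a + b * best_price)

print(a, b)
print(best_price, best_revenue)
```

For this data the fit is units $= 300 - 10 \cdot \text{price}$, which puts the revenue-maximizing price at 15. Real demand is rarely this linear, so the estimate is only trustworthy near the tested price range.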

Similarly, linear regression can be employed at many stages in a product’s sourcing and production pipeline. A farmer, for example, might want to model how changes in certain environmental conditions such as rainfall and humidity affect overall crop yield. This can help her determine an optimized system for growing and rotating crops in order to maximize profits.

Ultimately, linear regression is an invaluable tool for modeling simple relationships in data. While it’s not as fancy or as complex as more modern machine learning methods, it’s often the right tool for many real-world datasets in which a straightforward relationship exists. Not to mention, the ease of setting up regression models and the quickness with which they can be trained make them the tool of choice for businesses that want to prototype quickly and efficiently.

# The 2020 state of enterprise machine learning experience: an interactive data visualization

Following the release of the 2020 State of Enterprise Machine Learning report, we created an interactive data visualization so anyone can explore the survey data, conduct analysis, and see how a company’s machine learning efforts compare to others like it.

The State of Enterprise Machine Learning (ML) experience shares eight questions that were posed in our survey and the associated results. After exploring the data, download the full report to read our assessments and predictions about where ML development is headed.

## Explore the data three different ways

Our report shares findings from nearly 750 survey respondents whom we polled in the fall of 2019. However, if you want to see how other companies of a similar size to yours are using machine learning, the interactive experience allows you to test your own hypotheses and arrive at findings tailored to you.

The interactive experience lets you filter by industry, company size, or job title to see “slices” of the data.

### Using the interactive experience

1. Scroll down the page to see the graphic visualizations and how the data break down generally.
2. Then apply a filter (specify it further, if desired). Scroll to see how the data changes.
3. Hover over graphs for specific percentages.
4. Refer to the report to glean more insight into the current and future state of ML.

We intend the interactive experience to provide details about how long it takes a specific job type or industry to deploy a machine learning model, what the state of ML maturity is for companies of a specific size, and whether or not certain industries are leading the ML charge.

Insights like these can be the impetus for new machine learning stories and journeys.

## Where is machine learning development headed?

The 2020 survey data and report confirm that ML in the enterprise is progressing at a lightning pace. Though the majority of companies are still in the early stages of developing ML maturity, it is incorrect to think there is time to delay ML development at your company.

Algorithmia is committed to adding to this interactive experience every year after we conduct our State of Enterprise Machine Learning survey. Read the full report for a year-over-year comparison with 2018; patterns are already starting to emerge. And stay tuned for next year’s data.

If your organization is not currently ML-oriented, know that your competitors are. Now is the time to future-proof your organization with AI/ML. Get ahead of the competition with Algorithmia.

# 2020 machine learning predictions and the shortage of data scientists

In the last year alone, there have been countless developments in machine learning (ML) tooling and applications. Facial recognition and other computer vision applications are more sophisticated, natural language processing applications like sentiment analysis are increasingly complex, and the number of ML models in development is staggering.

In 2019, we spoke with thousands of companies in various stages of machine learning maturity, and we developed hypotheses about the state of machine learning and the traction it’s gaining in the enterprise across all industries. In October, we undertook a massive survey effort, polling nearly 750 business decision makers from organizations thinking about, developing, and implementing robust machine learning efforts.

We analyzed the data we gathered, gleaning insight into various ML use cases, roadmaps, and the changes companies had seen in recent months in budget, R&D, and head count.

## Data science: modern-day gold rush

We put together seven key findings from our analysis and published them in our 2020 State of Enterprise Machine Learning report. The first finding is likely not at all surprising: the field of data science is undergoing tremendous flux as word of demand, potential salaries, quick bootcamps, and open positions bounce around the internet.

But let’s dig into what we found in our survey data to get a better picture of what’s happening in the field.

## The rise of the data science arsenal

One of the pieces of data we collected was the number of data scientists employed at the respondent’s place of work. We hear repeatedly from companies that management is prioritizing hiring for the data science role above many others, including traditional software engineering, IT, and DevOps.

Half of people polled said their companies employ between one and 10 data scientists. This is actually down from 2018 (we polled in 2018 as well) where 58 percent of respondents said their companies employ between one and 10 data scientists. Like us, you might wonder why. We would have expected more companies to have one to 10 data scientists because investment in AI and ML is known to be growing (Gartner).

### Movement in the data science realm

In 2018, 18 percent of companies employed 11 or more data scientists. This year, however, 39 percent of companies have 11 or more, suggesting that organizations are ramping up their hiring efforts to build data science arsenals of more than 10 people.

Another observation from 2018 was that barely 2 percent of companies had more than 1,000 data scientists; today that number is just over 3 percent, indicating small but significant growth. Companies in this data science bracket are likely the big FAANG tech giants—Facebook, Apple, Amazon, Netflix, and Google (Yahoo); their large data science teams are working hard to derive sophisticated insight from the vast amounts of data they store.

## Demand for data scientists

Between 2012 and 2017, the number of data scientist jobs on LinkedIn increased by more than 650 percent (KDnuggets). The talent deficit and high demand for data science skills mean hiring and maintaining data science teams will only become more difficult for small and mid-sized companies that cannot offer the same salary and benefits packages as the FAANG companies.

As demand for data scientists grows, we may see a trend of junior-level hires having less opportunity to structure data science and machine learning efforts within their teams, as much of the structuring and program scoping may have already been done by predecessors who overcame the initial hurdles.

## New roles, the same data science

We will likely also see the merging of traditional business intelligence and data science roles in order to fill immediate requirements in the latter talent pool since both domains use data modeling (BI work uses statistical methods to analyze past performance, and data science makes predictions about future events or performance).

Gartner predicts that the overall lack of data science resources will result in an increasing number of developers becoming involved in creating and managing machine learning models (Gartner CIO survey). This blending of roles will likely lead to another phenomenon related to this finding: more names and job titles for the same work. We are seeing an influx of new job titles in data science such as Machine Learning Engineer, ML Developer, ML Architect, Data Engineer, Machine Learning Operations (ML Ops), and AI Ops as the industry expands and companies attempt to distinguish themselves and their talent from the pack.

## The 2020 report and predicting an ML future

The strategic takeaway from the 2020 State of Enterprise Machine Learning survey for us was that a growing number of companies are entering the early stages of ML development, but those that have moved beyond the initial stages are encountering challenges in deployment, scaling, versioning, and other sophistication efforts. As a result, we will likely see a boom in the number of ML companies providing services to overcome these obstacles in the near term.

We will do a deeper dive into the other key findings in the coming weeks. In the meantime, we invite you to read the full report and to interact with our survey data in our 2020 State of Enterprise Machine Learning interactive experience.