
Improving Customer Retention Analytics With Machine Learning

ML allows companies to base their product and marketing retention strategies on predictive customer analytics

Customers have an abundance of options when it comes to products for purchase. This excess of options, however, increases the risk of poor customer retention. Since acquiring new customers costs much more than keeping current customers, a higher retention rate is always better.

Customer retention represents the number of customers who continue purchasing from a company after their first purchase. This is usually measured as the customer retention rate, which is the percentage of customers your company has retained over a certain time period. The opposite of retention rate is churn rate, which represents the percentage of customers a company has lost over a given time period.
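Both rates are simple to compute. A minimal sketch of the standard formulas (new customers are excluded from the period-end count so that acquisition growth doesn't mask churn):

```python
# Retention and churn rates over a period, using the standard formula:
# retention = (customers_at_end - new_customers) / customers_at_start
def retention_rate(start_count, end_count, new_customers):
    """Percentage of starting customers still active at period end."""
    return (end_count - new_customers) / start_count * 100

def churn_rate(start_count, end_count, new_customers):
    """Percentage of starting customers lost over the period."""
    return 100 - retention_rate(start_count, end_count, new_customers)

# Example: 200 customers at start, 180 at end, 20 of whom are new.
print(retention_rate(200, 180, 20))  # 80.0
print(churn_rate(200, 180, 20))      # 20.0
```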

Customer retention analytics can be done through machine learning, allowing companies to base their product and marketing strategies on predictive customer analytics rather than less reliable predictions made manually.

In a survey of more than 500 business decision-makers that Algorithmia conducted in the fall of 2018, 59 percent of large companies said that customer retention was their primary use case for machine learning technology. 

What Is Customer Retention Analysis?

Customer retention analysis is the application of statistics to understand how long customers are retained before churning and to identify trends in retention. This type of analysis discerns how long customers typically stick around, whether seasonality affects retention, and which behaviors and factors differentiate retained customers from churned ones.

Why Is Customer Retention Analysis Important For Your Company?

Customer retention analysis is important for your company because it helps you understand which personas have higher retention rates and discern which features impact retention. This provides actionable insights that can help you make more effective product and marketing decisions. 

It can be difficult for a product or sales team to know how well a product is actually performing with the target audience. They may think that features and messaging are on brand and clear because acquisition numbers are growing. However, just because new customers are purchasing a product does not necessarily mean customers like the product or service enough to stick around. 

That is where customer retention analytics comes in. Every company needs data in order to make effective business and marketing decisions. Machine learning makes this easier than it has ever been before, which is great news for companies that wish to leverage this data.

How Do You Analyze Customer Retention?

Use past customer data to predict future customer behavior

Machine learning for customer retention analytics uses past customer data to predict future customer behavior. In today’s data-driven world, companies can track hundreds of data points about thousands of customers, so the input data for a customer retention model could be any combination of the following:

  • Customer demographics
  • Membership/loyalty rewards
  • Transaction/purchase history
  • Email/phone call history
  • Any other relevant customer data

During the model training process, this data will be used to find correlations and patterns to create the final trained model to predict customer retention. Not only does this tell you the overall churn risk of your customer base, but it can determine churn risk down to the individual customer level. You could use this data to proactively market to those customers with higher churn risk or find ways to improve your product, customer service, messaging, etc. in order to lower your overall churn rate.
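As a rough illustration of that training process, here is a minimal sketch using scikit-learn on synthetic data. The feature names, the toy rule used to label the synthetic customers, and the model choice are all illustrative assumptions, not a prescription:

```python
# Minimal churn-model sketch: train a classifier on historical customer
# features, then score churn risk per customer. Features, labels, and
# the synthetic churn rule below are illustrative only.
import random
from sklearn.ensemble import RandomForestClassifier

random.seed(0)

# Synthetic history: [tenure_months, purchases_last_year, support_tickets]
X, y = [], []
for _ in range(500):
    tenure = random.randint(1, 60)
    purchases = random.randint(0, 30)
    tickets = random.randint(0, 10)
    churned = 1 if (tenure < 18 and purchases < 8) else 0  # toy label rule
    X.append([tenure, purchases, tickets])
    y.append(churned)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Churn risk down to the individual customer level.
customers = [[3, 1, 4], [48, 25, 0]]          # new vs. long-tenured customer
risks = model.predict_proba(customers)[:, 1]  # probability of the churn class
for features, risk in zip(customers, risks):
    print(features, f"-> churn risk {risk:.2f}")
```

The per-customer probabilities are what make the proactive marketing described above possible: sort customers by risk and target the top of the list.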

How Do You Improve Retention?

To improve retention, you have to first understand the cause of your retention issues. As discussed, machine learning models are a very efficient way to analyze customer retention to determine risks and solutions. 

Data science teams can build the machine learning models necessary for this type of predictive analytics, but there are challenges associated with developing machine learning processes. For example, deploying models written in different languages is not easy, to say the least. Algorithmia’s AI Layer solves these issues using a serverless microservice architecture, which allows each service to be deployed independently with options to pipeline them together. 

Another challenge is the cost of time lost to building, training, testing, deploying, and managing a model, let alone multiple models in a machine learning program. 

Improving customer retention is one of the main uses Algorithmia’s early adopters focused on because it is one of the simpler machine learning models to build and use, and it’s even easier with the serverless microservices framework provided by the AI Layer. Our platform has built-in tools for versioning, deployment, pipelining, and integrating with customers’ current workflows. The AI Layer integrates with any technology your organization is currently using, fitting in seamlessly to make machine learning easier, getting you from data collection to model deployment and analysis much faster. 

To learn more about how the AI Layer can benefit your company, watch a demo to see how much easier your machine learning projects can be.

Algorithmia is TensorFlow 2.0 Compatible

Algorithmia and TensorFlow are compatible for model deployment

TensorFlow 2.0 shipped today, 30 September 2019, with new features, such as faster debugging and iteration with Eager Execution, a TensorFlow-enhanced implementation of the Keras API, and simplification and compatibility improvements across its APIs. TensorFlow 2.0 is a major upgrade that should increase performance, streamline workflows, and provide more compatibility for new or updated models.
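A minimal sketch of the 2.0 behaviors mentioned above, assuming TensorFlow 2.x is installed: operations execute eagerly by default, and Keras is the recommended high-level API.

```python
# Eager execution is on by default in TensorFlow 2.0: ops run
# immediately and return concrete values, with no Session required.
import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)  # executes eagerly; result is available at once
print(y.numpy())     # [[ 7. 10.] [15. 22.]]

# The Keras API is the recommended high-level interface in 2.0.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
```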

We offer day 1 support

At Algorithmia, we believe data scientists should be able to deploy and serve models from any framework and keep up with the pace of tool development. To that end, we’re eager to announce that we support model deployments in the TensorFlow 2.0 framework—Google’s latest version that was released today. 

Effective immediately, our Cloud AI Layer customers can host, serve, and deploy pre-trained TensorFlow 2.0 models the same way they do with previous versions.

Our Enterprise customers will receive the same support in their next product update.

Accessing the TensorFlow 2.0 package set on Algorithmia

While TensorFlow 2.0 includes a conversion tool for existing 1.x models, those conversions will not be fully automatic. Rest assured that the AI Layer will remain fully backward-compatible with all previous versions of TensorFlow—and the more than 15 other frameworks we support.

What’s next

We won’t stop there. We want to provide users with the freedom to choose the best tool for every job, and that means immediate support for future versions of TensorFlow and other frameworks in development. If you have any questions about framework support or our rollout schedule, please contact your account manager or send us a message.

Happy model deployment!

Five Machine Learning Applications for the Back Office

While consumer-facing applications of machine learning (ML) have gotten a lot of attention (Netflix, Uber, and Amazon), the back office deserves some recognition. Enterprise-level systems that run the business—think finance, robo-advisors, accounting, operations, human resources, and procurement—tend to be large, complex, and process-centric. But they also use and produce large amounts of both structured and unstructured data that can be handled in new ways to save time and money. 

Machine learning combined with solution-specific software can dramatically improve the speed, accuracy, and effectiveness of back-office operations and help organizations reimagine how back-office work gets done. 

A current trend among mid and large organizations is to implement Robotic Process Automation (RPA) in the back office to minimize manual tasks and achieve efficiencies. While there are specific use cases that make RPA an appropriate technology, there are significant differences with a machine learning approach.

Robotic Process Automation and artificial intelligence

Robotic Process Automation is software that mimics human actions, while AI is built to simulate human intelligence. As an example, an RPA bot can be programmed to receive invoices via email (triggered on a specific subject line), download the invoice, and put it in a specific folder. An AI activity would be to “read” the invoice and extract the pertinent information, like the amount, invoice number, supplier name, and due date.

One of the more interesting downsides of RPA, as outlined by Gartner in its Magic Quadrant report on RPA vendors, is that RPA automations create long-term technical debt rather than overcoming it. 

As you overlay RPA onto current technology and tasks, you are locking yourself into those technologies instead of updating and evolving. 

Organizations must manually track the systems, screens, and fields that each automation touches in each third-party application and update the RPA scripts as those systems change. This is very challenging in a SaaS world, in which product updates happen much more regularly than they do on-prem.

As such, the shift toward AI, and specifically ML, is to improve process, not just speed. Here are five specific applications of ML that can be used to improve back-office operations:

Account reconciliation (finance)

Account reconciliations are painful and error-prone. They are also critical to every business to ensure the proper controls are in place to close the books accurately and on time. Many companies do this manually (which really means using Excel, macros, pivot tables, and Visual Basic), have invested in RPA, which doesn’t get you very far, or use a Boolean rules-based system, which is expensive to set up and not especially accurate. 

Challenges in Account Reconciliation

An ML approach is ideal for account reconciliations, specifically matching reconciliations, because you have ground-truth data—previous successful matched transactions and consistent fields in subsequent reconciliations. The challenge has been that for large and complex data sets, the combinatorial problem of matching is really hard. Companies like Sigma IQ have focused on this problem and solved it with a combination of machine learning and point-solution software as a hosted platform.
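To make the matching step concrete, here is a toy one-to-one reconciliation sketch in Python. Real systems learn match criteria from previously confirmed matches; this illustrates only the combinatorial pairing step, and the data is made up:

```python
# Toy matching-reconciliation sketch: pair ledger entries with bank
# transactions by exact amount and a date window. Data is illustrative.
from datetime import date

ledger = [
    {"id": "L1", "amount": 120.00, "date": date(2019, 9, 3)},
    {"id": "L2", "amount": 75.50,  "date": date(2019, 9, 5)},
    {"id": "L3", "amount": 310.00, "date": date(2019, 9, 9)},
]
bank = [
    {"id": "B1", "amount": 75.50,  "date": date(2019, 9, 6)},
    {"id": "B2", "amount": 120.00, "date": date(2019, 9, 3)},
    {"id": "B3", "amount": 99.99,  "date": date(2019, 9, 9)},
]

def reconcile(ledger, bank, max_days=3):
    """Greedy one-to-one matching on amount and date proximity."""
    matches, used = [], set()
    for entry in ledger:
        for txn in bank:
            if txn["id"] in used:
                continue
            if (txn["amount"] == entry["amount"]
                    and abs((txn["date"] - entry["date"]).days) <= max_days):
                matches.append((entry["id"], txn["id"]))
                used.add(txn["id"])
                break
    return matches

print(reconcile(ledger, bank))  # [('L1', 'B2'), ('L2', 'B1')]
```

Anything left unmatched (here, L3 and B3) is exactly what ML-based systems then score and suggest matches for.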

Invoice processing/accounts payable (accounting)

We introduced invoice processing earlier in this article as a way to understand the differences between RPA and ML in the back office. The reality is that every business deals with invoices at some level, and as natural language processing (NLP) and ML advance, these improvements will roll down from the enterprise level to small businesses. 

Aberdeen Group indicates that well-implemented accounts payable systems can reduce time and costs by 75 percent, decrease error rates by 85 percent, and improve process efficiency by 300 percent, so it makes sense to pursue, right?

Using ML to augment accounting

Machine learning models automate accounting

Companies like AODocs are extending their NLP and ML capabilities to take some of the pain out of invoice management by automatically capturing information from invoices and triggering the appropriate workflow. These types of solutions can greatly reduce or eliminate manual data entry, increase accuracy, and match invoices to purchase orders.

Employee attrition detection (HR)

There are many applications of AI in the HR function, including applicant tracking and recruiting (resume scanning and skills analysis), attracting talent before hiring, individual skills management/performance development (primarily via regular assessment analysis), and enterprise resource management.

Using ML to track employee satisfaction

One interesting use case from an ML/NLP perspective is employee attrition. Hiring is expensive, and retaining employees and keeping them happy is imperative to sustainable growth. Identifying attrition risk requires source data—like a consistently applied employee survey that uses unstructured data analysis for the open-field comments. Overlaying this data with factors such as tenure, time since last pay raise or promotion, sick days used, scores on performance reviews, skill set competitiveness with the market, and generally available employment-market data can help assess the likelihood that an employee will stay or leave.
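An illustrative scoring sketch combining those factors is below. The weights are entirely made up for demonstration; a real model would learn them from labeled attrition history, and the sentiment input would come from NLP analysis of the survey comments:

```python
# Illustrative attrition-risk score over the factors described above.
# Weights are invented for demonstration, not learned from real data.
import math

def attrition_risk(tenure_years, months_since_raise, sick_days,
                   review_score, survey_sentiment):
    """Logistic score in [0, 1]; higher means higher attrition risk.
    survey_sentiment: -1 (negative) .. 1 (positive), e.g. from NLP
    analysis of open-ended survey comments."""
    z = (0.08 * months_since_raise
         + 0.10 * sick_days
         - 0.30 * tenure_years
         - 0.50 * review_score
         - 1.20 * survey_sentiment)
    return 1 / (1 + math.exp(-z))

happy = attrition_risk(6, 4, 2, 4.5, 0.8)      # long tenure, recent raise
at_risk = attrition_risk(1, 18, 9, 2.0, -0.6)  # short tenure, no raise
print(f"{happy:.2f} vs {at_risk:.2f}")
```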

Predicting repairs and upkeep for machinery (operations)

The influx of sensors into all types of equipment, including trucks, oil rigs, assembly lines, and trains, means an explosion of data on the usage and wear of such equipment. Pairing this data with historical records on when certain types of equipment need preemptive maintenance means expensive machinery can be scheduled for downtime and repair based not just on the number of hours used or miles driven, but on actual usage.

Predix, General Electric’s industrial platform, powers industrial apps that process the historic performance data of equipment. Sensor data and signals can be used to discern a variety of operational outcomes, such as when machinery might fail, so that you can plan for—or even prevent—major malfunctions and downtime.

Predictive analytics for stock in transit (procurement)

For companies that spend a lot of money on hard goods that need to be moved for either input into manufacturing or delivery to a retail shelf, stock in transit is a major source of opportunity for applying ML models to predict when goods will arrive at a destination.

Machine learning models predict when goods will arrive at a destination

Item tracking has improved dramatically with sensors, but it is only a point-in-time solution that doesn’t predict when the goods will arrive or when they should arrive. Weather, traffic, type of transport, risk probabilities, and historical performance are all part of the data that can help operations nail the flow of goods for optimal process timing.
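A minimal arrival-time regression along those lines might look like the sketch below. The features and data are synthetic stand-ins for the weather, traffic, and transport-mode signals described above:

```python
# Sketch: predict transit time (hours) from shipment features with a
# simple regression. Features and observations are illustrative only.
from sklearn.linear_model import LinearRegression

# [distance_km, bad_weather (0/1), transport_mode (0=truck, 1=rail)]
X = [[100, 0, 0], [200, 0, 0], [200, 1, 0], [400, 0, 1],
     [300, 0, 0], [300, 1, 1], [500, 0, 1], [150, 1, 0]]
y = [2.0, 4.0, 5.5, 10.0, 6.0, 9.5, 12.5, 4.5]  # observed transit hours

model = LinearRegression().fit(X, y)

# Estimated arrival window for a new shipment: 250 km by truck in bad weather.
eta = model.predict([[250, 1, 0]])[0]
print(f"predicted transit time: {eta:.1f} hours")
```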

Stock in Transit 

SAP S/4HANA has an entire module dedicated to making trade-off predictions between different options for stock in transit solutions to meet customer order objectives. 

Further opportunities for back office machine learning

These are just five of the hundreds of use cases in which ML, paired with solution-specific software, can improve the way the back office functions. Whether it is cutting down on manual tasks, improving accuracy, reducing costs, or helping teams change their critical processes wholesale, machine learning can augment nearly every back-office process.

Accelerate MLOps: using CI/CD with machine learning models

CI/CD pipeline

Continuous Integration and Continuous Deployment (CI/CD) are key components of any mature software development environment. During CI, newly added code is merged into the codebase, kicking off builds and automated testing. If all tests succeed, then the CD phase begins, deploying the changes automatically to production. In this way, developers can immediately release changes to production by simply committing to, or merging into, the proper branch in their version control system.

Developers have a great deal of flexibility as to how they build this pipeline, due to the wide variety of open and interoperable platforms for version control and CI/CD. This is not, however, always true in the world of machine learning: it can be difficult to properly version, track, and productionize machine learning models.

Challenges of CI/CD in machine learning

Some existing services provide this functionality effectively, but lock data scientists into a black-box silo where their models must be trained, tracked, and deployed on a closed technology stack. Even when open systems are available, they do not always interoperate cleanly, forcing data scientists to undergo a steep learning curve or bring in specialists to build novel deployment tools.

At Algorithmia, we believe ML teams should have the freedom to use any training platform, language, and framework they prefer, then easily and quickly deploy their models to production. To enable this, we provide CI/CD workflows in Jenkins and GitHub Actions, which work out of the box, but can be easily modified to work with just about any CI/CD tool to continuously deploy your ML models into our scalable model-hosting platform, running in our public marketplace or in your own private-cloud or on-prem cluster.

This is made possible by our Algorithm Management API, which provides a code-only solution for deploying your model as a scalable, versioned API endpoint. Let’s take a look at a typical CI/CD pipeline for model deployment:

CI/CD workflow

The process kicks off when you train your initial model or modify the prediction code—or when an automatic process retrains the model as new data becomes available. The files are saved to network-available storage and/or a version control system such as Git. This triggers any tests that must be run, then kicks off the deployment process. In an ideal world, your new model will be live and running on your production servers within seconds or minutes as a readily available API endpoint that apps and services can call. In addition, your endpoint should support versioning, so dependent apps and services can access older versions of your model as easily as the latest copy.

Algorithmia’s CI/CD tools provide the latter stages of that workflow: detecting the change in your saved model or code, and deploying your new model to a scalable, versioned Algorithmia API endpoint (an “algorithm” in our terminology). These are drop-in configurations: the only changes you need to make are to a single Python script, which specifies the settings to use (e.g., endpoint name and execution language) and the files to deploy.
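The change-detection step at the heart of any such pipeline can be sketched generically. This is not our tools' actual implementation—just a minimal illustration of the idea, where `deploy_fn` is a stand-in for whatever deployment API you use:

```python
# Generic "detect change, then deploy" sketch: fingerprint the saved
# model artifacts and redeploy only when the fingerprint changes.
import hashlib

def artifact_hash(file_contents):
    """Stable fingerprint of the model files (as bytes) to deploy."""
    digest = hashlib.sha256()
    for content in file_contents:
        digest.update(content)
    return digest.hexdigest()

def maybe_deploy(file_contents, last_deployed_hash, deploy_fn):
    """Call deploy_fn only if the artifacts changed; return the live hash."""
    new_hash = artifact_hash(file_contents)
    if new_hash != last_deployed_hash:
        deploy_fn(new_hash)  # e.g., publish a new versioned endpoint
        return new_hash
    return last_deployed_hash

deployed = []
h = maybe_deploy([b"model-weights-v2"], "old-hash", deployed.append)
print(len(deployed))  # 1 -> a deploy was triggered
h2 = maybe_deploy([b"model-weights-v2"], h, deployed.append)
print(len(deployed))  # still 1 -> unchanged artifacts, no redeploy
```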

If you’re using Jenkins or GitHub Actions, simply clone and configure the appropriate configuration. If you prefer a simple, notebook-driven deploy, check out our Jupyter Notebook example. If you’re using another tool, it should be fairly simple to customize the examples, or you can contact us to request new ones!

The state of machine learning in financial services

Machine learning for finance

The financial services industry has often been at the forefront of using new technology to solve business problems. It’s no surprise that many firms in this sector are embracing machine learning, especially now that increased compute power, network connectivity, and cloud infrastructure are cheaper and more accessible. 

This post will detail five important machine learning use cases that are currently providing value within financial services organizations. 

Fraud detection 

The cost of financial fraud for financial services companies jumped 9 percent between 2017 and 2018; every dollar of fraud now costs a firm $2.92. We have previously discussed machine learning applications in fraud detection in detail, but it’s worth mentioning some additional reasons why this is one of the most important applications for machine learning in this sector. 

Most fraud prevention models are based on a set of human-created rules that result in a binary classification of “fraud” or “not fraud.” The problem with these models is that they can create a high number of false positives. It’s not good for business when customers receive an abnormally high number of unnecessary fraud notifications: trust is lost, and actual fraud may go undetected. 

Machine learning clustering and classification algorithms can help reduce the problem of false positives. They continually update a customer’s profile whenever the customer takes a new action. With these many points of data, the model can take a nuanced approach to distinguishing normal from abnormal behavior. 
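One concrete flavor of this profiling can be sketched with scikit-learn's IsolationForest—an unsupervised anomaly-detection method, used here as a stand-in for the clustering and classification approaches mentioned. The transaction features are synthetic:

```python
# Anomaly-detection sketch on transaction features. Training data is
# synthetic: mostly small daytime purchases, plus one 3 a.m. outlier.
import random
from sklearn.ensemble import IsolationForest

random.seed(1)

# [amount_usd, hour_of_day]
normal = [[random.uniform(5, 80), random.randint(8, 22)] for _ in range(200)]
model = IsolationForest(contamination=0.01, random_state=0).fit(normal)

outlier = [[4800.0, 3]]  # large purchase in the middle of the night
print(model.predict(outlier))  # -1 flags an anomaly; 1 means normal
```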


Determining creditworthiness

Creditworthiness is a natural and obvious use of machine learning. For decades, banks have used very rudimentary logistic regression models with inputs like income and 30-60-90-day payment histories to determine the likelihood of default, or the payment and interest terms of a loan. 

The logistic model can be problematic as it can penalize individuals with shorter credit histories or those who work outside of traditional banking systems. Banks also miss out on additional sources of revenue from rejected borrowers who would likely be able to pay.

With the growing number of alternative data points about individuals related to their financial histories (e.g., rent and utility bill payments or social media actions), lenders are able to use more advanced models to make more personalized decisions about creditworthiness. For example, a 2018 study suggests that a neural network machine learning model may be more accurate at predicting likelihood of default as compared to logistic regression or decision-tree modeling. 
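The logistic baseline described above can be sketched as follows. The feature names and the toy rule used to label the synthetic borrowers are illustrative only, not real lending criteria:

```python
# Baseline default model: logistic regression on synthetic credit
# features. Features and the labeling rule are illustrative only.
import random
from sklearn.linear_model import LogisticRegression

random.seed(2)

# [income_thousands, payments_30_days_late, payments_90_days_late]
X, y = [], []
for _ in range(400):
    income = random.uniform(20, 150)
    late_30 = random.randint(0, 6)
    late_90 = random.randint(0, 3)
    default = 1 if (late_90 >= 2 or (late_30 >= 4 and income < 40)) else 0
    X.append([income, late_30, late_90])
    y.append(default)

model = LogisticRegression(max_iter=1000).fit(X, y)

p_high = model.predict_proba([[30, 5, 3]])[0][1]  # thin income, many lates
p_low = model.predict_proba([[140, 0, 0]])[0][1]  # high income, clean record
print(f"default probability: {p_high:.2f} vs {p_low:.2f}")
```

A neural network approach would replace the model above but keep the same inputs and outputs, which is why explainability—rather than plumbing—is the main barrier to adoption.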

Despite the optimism around increased equitability for customers and a larger client base for banks, there is still some trepidation around using black box algorithms for making lending decisions. Regulations, including the Fair Credit Reporting Act, require creditors to give individuals specific reasons for an outcome. This has been a challenge for engineers working with neural networks. 

Credit bureau Equifax suggests that it has found a solution to this problem, releasing a “regulatory-compliant machine learning credit scoring system” in 2018. 

Algorithmic trading

Simply defined, algorithmic trading is automated trading using a defined set of rules. A basic example would be a trader setting up automatic buy and sell rules when a stock falls below or rises above a particular price point. More sophisticated algorithms exploit arbitrage opportunities or predict stock price fluctuations based on real-world events like mergers or regulatory approvals. 
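The basic threshold rule described above fits in a few lines; the price levels here are illustrative:

```python
# A basic algorithmic trading rule: buy below a floor price, sell
# above a ceiling, otherwise hold. Thresholds are illustrative.
def trade_signal(price, buy_below=95.0, sell_above=105.0):
    if price < buy_below:
        return "BUY"
    if price > sell_above:
        return "SELL"
    return "HOLD"

for p in (92.5, 100.0, 107.3):
    print(p, trade_signal(p))  # BUY, HOLD, SELL
```

The more sophisticated strategies replace these fixed thresholds with model-driven predictions, but the execution loop is the same.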

The previously mentioned models require thousands of lines of human-written code and have become increasingly unwieldy. Relying on machine learning makes trading more efficient and less prone to mistakes. It is particularly beneficial in high-frequency trading, where large volumes of orders need to be placed as quickly as possible. 

Automated trading has been around since the 1970s, but only recently have companies had access to the technological capabilities able to handle advanced algorithms. Many banks are investing heavily in machine learning-based trading. JPMorgan Chase recently launched a foreign exchange trading tool that bundles various algorithms including time-weighted average price and volume-weighted average price along with general market conditions to make predictions on currency values.


Robo-advisors

Robo-advisors have made investing and financial decision-making more accessible to the average person. Their investment strategies are derived from an algorithm based on a customer’s age, income, planned retirement date, financial goals, and risk tolerance. They typically follow traditional investment strategies and asset allocation based on that information. Because robo-advisors automate processes, they also eliminate the conflict of interest that can arise when human financial advisors do not act in a client’s best interest.
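A toy allocation sketch using the common "110 minus age" rule of thumb, adjusted by stated risk tolerance, shows the shape of such an algorithm. Both the rule and the adjustment are illustrative assumptions, not any provider's actual model:

```python
# Toy robo-advisor allocation: the "110 minus age" rule of thumb for
# stock weighting, tilted by risk tolerance. Illustrative only.
def allocation(age, risk_tolerance):
    """risk_tolerance: 0 (conservative) .. 1 (aggressive)."""
    stocks = max(0, min(100, 110 - age + 20 * (risk_tolerance - 0.5)))
    return {"stocks": round(stocks), "bonds": round(100 - stocks)}

print(allocation(30, 0.8))  # stock-heavy for a young, aggressive investor
print(allocation(65, 0.2))  # bond-heavier near retirement
```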

While robo-advisors are still a small portion of assets under management by financial services firms ($426 billion in 2018), this value is expected to more than triple by 2023. Customers are enticed by lower account minimums (sometimes $0), and wealth management companies save on the costs of employing human financial advisors. 

Cybersecurity and threat detection 

Although not unique to the financial services industry, robust cybersecurity protocols are absolutely necessary to demonstrate asset safety to customers. This is also a good use case to demonstrate how machine learning can play a role in assisting humans rather than attempting to replace them. Specific examples of how machine learning is used in cybersecurity include: 

Malware detection: Algorithms can detect malicious files by flagging never-before-seen software that attempts to run as potentially unsafe. 

Insider attacks: Algorithms can monitor network traffic throughout an organization, looking for anomalies like repeated attempts to access unauthorized applications or unusual keystroke behavior.

In both cases, the tedious task of constant monitoring is taken out of the hands of an employee and given to the computer. Analysts can then devote their time to conducting thorough investigations and determining the legitimacy of the threats.

It will be important to watch the financial sector closely because its use of machine learning and other nascent applications will play a large role in determining those technologies’ use and regulation across countless other industries.