All posts by Whitney Garrett

How machine learning helps improve fraud detection

Banner graphic, blue background with magnifying glass and lock

Technology companies have been emphasizing fraud detection for decades. Internet fraud first appeared in 1994, with the introduction of e-commerce businesses. Since then, companies have made large strides in fraud detection, but cyber criminals have improved their tactics along with these advancements. Today we will discuss how companies traditionally implement fraud detection systems, the difference that machine learning makes to these efforts, and the effects that these improvements have on your customer base.

Traditional methods of fraud detection

Before machine learning became the most effective way of detecting fraudulent activity, organizations would rely on rules. Rules provide a semi-reliable means of mitigating fraud risk, and can be used in a variety of ways. Some of these rules might include parameters such as not allowing purchases from “at risk” zip codes, flagging transactions from locations that are not near the billing address, or not allowing multiple purchases from the same credit card in a short period of time. But these rules come with their own limitations, especially when aiming for big data fraud detection.

Limitations of rules-based models

Fixed thresholds

Each fraud detection rule has a corresponding threshold. For example, if a company doesn’t allow more than three purchases in a half hour window, then that is the rule’s threshold. Although these thresholds are great for general parameters, they are not capable of adapting to individual situations.

Rules are absolute

This goes hand-in-hand with fixed thresholds. Rules are absolute, meaning that they can only be effective when responding to “yes or no” questions. Such questions would include: Is the purchase location within range of the billing address? Is the billing address located in a risky zip code? Has this user made more than three purchases in the past thirty minutes?
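The three yes-or-no questions above can be sketched as a small rules filter. This is only an illustration: the `Transaction` fields, the zip codes, and the distance threshold are hypothetical, and only the "three purchases in thirty minutes" limit comes from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical values for illustration; real thresholds are business decisions.
RISKY_ZIP_CODES = {"00000", "99999"}
MAX_DISTANCE_KM = 500.0
MAX_PURCHASES = 3                 # the example threshold from the text
WINDOW = timedelta(minutes=30)    # "half hour window"


@dataclass
class Transaction:
    card_id: str
    billing_zip: str
    distance_from_billing_km: float
    timestamp: datetime


def violated_rules(tx, recent):
    """Return the names of the rules that `tx` violates.

    `recent` holds earlier transactions; each rule is a fixed yes/no check.
    """
    violations = []
    if tx.billing_zip in RISKY_ZIP_CODES:
        violations.append("risky_zip_code")
    if tx.distance_from_billing_km > MAX_DISTANCE_KM:
        violations.append("far_from_billing_address")
    in_window = [
        r for r in recent
        if r.card_id == tx.card_id
        and timedelta(0) <= tx.timestamp - r.timestamp <= WINDOW
    ]
    # Count the current purchase together with the earlier ones in the window.
    if len(in_window) + 1 > MAX_PURCHASES:
        violations.append("too_many_purchases")
    return violations


now = datetime(2020, 1, 1, 12, 0)
history = [
    Transaction("card-1", "12345", 2.0, now - timedelta(minutes=m))
    for m in (5, 10, 20)
]
tx = Transaction("card-1", "12345", 2.0, now)
print(violated_rules(tx, history))  # ['too_many_purchases']
```

The fourth purchase is flagged and the third is not, regardless of who the customer is. That all-or-nothing behavior is the limitation the fixed threshold introduces.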

Rules are inefficient when used alone

Because rules cannot adapt to unique circumstances, they prove to be inefficient when acting alone to filter fraudulent transactions. Machine learning is used to help make up for these inefficiencies. 

Fraud detection + machine learning

Machine learning helps make fraud detection easier and more efficient. By implementing machine learning into your detection model, you can flag suspicious activity more frequently, and with far greater accuracy than with traditional rule-based methods alone. This allows for better pattern recognition among large amounts of data, instead of relying solely on “yes/no” factors to determine fraudulent users or transactions.

For machine learning to be effective in preventing fraud, it relies on classification. Classification is the process of grouping data together according to certain criteria. Common uses of classification include spam detection, predicting loan defaults, and implementing recommendation systems, among others. In fraud detection, the goal is to distinguish legitimate transactions from fraudulent ones based on features such as which merchant a customer is buying from, the location of both the merchant and buyer, the time of day or year of the transaction, and the amount spent.
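To make classification concrete, here is a minimal sketch of one common classifier, logistic regression, written in plain Python. The training rows are invented toy values (amount, distance from the billing address, and a night-time flag, each scaled to the range 0 to 1), and the learning rate and epoch count are arbitrary assumptions. A production system would use an established library and far more data.

```python
import math

# Toy, hand-made examples (hypothetical): (amount, distance, night flag) -> label.
# Label 1 means fraudulent, 0 means legitimate.
TRAINING_DATA = [
    ((0.05, 0.01, 0.0), 0),
    ((0.10, 0.02, 0.0), 0),
    ((0.20, 0.05, 1.0), 0),
    ((0.15, 0.00, 0.0), 0),
    ((0.90, 0.80, 1.0), 1),
    ((0.70, 0.95, 1.0), 1),
    ((0.85, 0.60, 0.0), 1),
    ((0.95, 0.90, 1.0), 1),
]


def predict_proba(weights, bias, features):
    """Estimated probability that a transaction is fraudulent."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(data, learning_rate=0.5, epochs=2000):
    """Fit logistic regression with full-batch gradient descent."""
    n_features = len(data[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in data:
            error = predict_proba(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [
            w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)
        ]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias


weights, bias = train(TRAINING_DATA)

# Score two unseen, made-up transactions.
print(predict_proba(weights, bias, (0.90, 0.85, 1.0)))  # large, distant, at night
print(predict_proba(weights, bias, (0.10, 0.02, 0.0)))  # small, local, daytime
```

Unlike a rule, the model returns a probability, so the cutoff for flagging a transaction can be tuned rather than hard-coded.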

Methods to improve fraud detection

There are several ways that you can group together customer data to improve fraud detection efforts. Some of these grouping methods include: 

Identity

Age of the customer’s account, number of characters in their email address, fraud rate of their IP address, number of devices they’ve used to access your site, etc.

Order history

How many orders were placed when the account was created, the dollar amount spent on each transaction, and how many failed orders were attempted.

Location

Whether the billing address matches the shipping address, whether the country of the customer’s IP address matches the shipping country, and whether the customer’s country, city, or zip code is known for fraudulent activity.

Method of payment

Whether the credit card and shipping address are from the same country, whether the names on the customer and shipping information match, and whether the credit card was issued by a bank with a reputation for fraudulent transactions by its customers.
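The four groups above can be turned into numeric inputs for a model. The sketch below assumes a hypothetical `Order` record; every field name is made up for illustration. Signals that need external data, such as IP or bank fraud rates, are left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Order:
    # Hypothetical fields; a real schema will differ.
    account_created: date
    order_date: date
    email: str
    device_count: int
    amount: float
    failed_attempts: int
    billing_address: str
    shipping_address: str
    ip_country: str
    shipping_country: str
    card_country: str
    customer_name: str
    shipping_name: str


def build_features(order):
    """Map one order to a flat dictionary of numeric features."""
    same_name = (
        order.customer_name.strip().lower() == order.shipping_name.strip().lower()
    )
    return {
        # Identity
        "account_age_days": (order.order_date - order.account_created).days,
        "email_length": len(order.email),
        "device_count": order.device_count,
        # Order history
        "amount": order.amount,
        "failed_attempts": order.failed_attempts,
        # Location
        "billing_matches_shipping": int(
            order.billing_address == order.shipping_address
        ),
        "ip_country_matches_shipping": int(
            order.ip_country == order.shipping_country
        ),
        # Method of payment
        "card_country_matches_shipping": int(
            order.card_country == order.shipping_country
        ),
        "name_matches": int(same_name),
    }


example = Order(
    account_created=date(2019, 12, 1),
    order_date=date(2020, 1, 15),
    email="pat@example.com",
    device_count=2,
    amount=59.99,
    failed_attempts=0,
    billing_address="1 Main St",
    shipping_address="1 Main St",
    ip_country="US",
    shipping_country="US",
    card_country="CA",
    customer_name="Pat Lee",
    shipping_name="pat lee",
)
features = build_features(example)
print(features)
```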

The effect on customers

Machine learning is beneficial not only to the companies that implement these models, but also to the customers who visit your site. By having a machine learning model in place, you can reduce the number of falsely flagged transactions, streamlining the purchase process for legitimate users. This system also helps to detect fraud that might otherwise be missed with rules-based models alone, improving inventory management and ensuring that available stock is always accurate and available for those who are ready to buy.
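Both effects can be measured. The sketch below computes the false positive rate (legitimate purchases that were wrongly flagged) and the recall (fraud that was caught) for two sets of flags. The labels and flags are invented purely to show the arithmetic and are not results from any real system.

```python
def false_positive_rate(actual, flagged):
    """Share of legitimate transactions that were wrongly flagged."""
    false_positives = sum(1 for a, f in zip(actual, flagged) if f and not a)
    legitimate = sum(1 for a in actual if not a)
    return false_positives / legitimate if legitimate else 0.0


def recall(actual, flagged):
    """Share of fraudulent transactions that were caught."""
    caught = sum(1 for a, f in zip(actual, flagged) if f and a)
    fraudulent = sum(1 for a in actual if a)
    return caught / fraudulent if fraudulent else 0.0


# Made-up data: 1 = fraud (actual) or flagged (predictions), 0 = otherwise.
actual      = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
rules_flags = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
model_flags = [0, 1, 0, 0, 0, 0, 0, 0, 1, 1]

print(false_positive_rate(actual, rules_flags))  # 0.25
print(false_positive_rate(actual, model_flags))  # 0.125
print(recall(actual, rules_flags))               # 0.5
print(recall(actual, model_flags))               # 1.0
```

A lower false positive rate means fewer legitimate customers are interrupted at checkout, and a higher recall means less fraud slips through.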

Algorithmia gets machine learning

Implementing machine learning into your fraud detection system might seem like a no-brainer, but Algorithmia understands that such a task can be easier said than done. Our expertise in machine learning allows you to feel confident in your ML implementation, while simultaneously solving your fraud detection problem along the way. We host a serverless microservices architecture that allows enterprises to easily deploy and manage machine learning models at scale, making the entire process simple and effortless for your organization. See how Algorithmia can help you build better software for your organization in our video demo.

See how Algorithmia works in a demo

Continue learning

How machine learning works

The importance of machine learning data

Webinar: 2020 state of enterprise machine learning

United Nations case study

Webinar: 2020 state of enterprise machine learning

webinar slide 1: 2020 state of enterprise machine learning

Last week, our CEO, Diego Oppenheimer, and CEO of ArthurAI, Adam Wenchel, hosted a webinar on the state of enterprise machine learning in 2020. The webinar was moderated by Algorithmia VP of Engineering, Ken Toole.

View recording

Diego and Adam leverage their knowledge of AI and machine learning and offer their enterprise experience making these technologies available for companies to automate their business operations. This is a great opportunity to learn from industry leaders what they see happening in the AI/ML space and what is likely ahead for companies deciding to incorporate AI and ML into their workflows.

Webinar Overview

The talk started with a look at our 2020 state of enterprise machine learning report, which was published in December. The report focused on seven key findings:

  • the role of the data scientist and the rise of data science arsenals at companies to prepare for data value extraction via machine learning models.
  • the most common challenges to developing mature machine learning programs are deployment, versioning, and aligning stakeholders within an organization.
  • investment in AI/ML in the enterprise is growing swiftly with several industries leading the charge.
  • most companies are spending more than 8 days, and sometimes up to a year, deploying a single model.
  • the majority of companies undertaking ML initiatives are in relatively early stages (i.e., developing use cases, building models, or working on deployment).
  • there is a discrepancy in determining what ML success looks like across industries and roles within an organization.
  • business use cases for machine learning vary, but the most common ones are for gaining customer insight and for reducing costs.

2020 report findings

One topic of discussion was how DevOps, engineering, and data science teams are organizing around machine learning. Diego and Adam both mentioned the blending of roles and the morphing of resources across business units. About this change, Adam said:

“Having to change the way groups are organized in order to be successful is something we see over and over again.” – Adam Wenchel, CEO ArthurAI

Model deployment challenges

A topic that Algorithmia cares about deeply is the time to deployment for machine learning models in the enterprise. We talk to a massive number of companies that say they spend between 8 and 90 days deploying, and an alarming number of companies who spend more than 90 days, and we think that’s unnecessary and a waste of valuable resources.

“Time to deployment is where we see a giant gap; the fact that it could potentially take 90 days, or even more in some cases, to deploy a single model is scary because the cost balloons during that time and it’s unacceptable to the C-suite.” – Diego Oppenheimer, CEO Algorithmia

Listen to the full story

The webinar covers many trends in the AI/ML space, and it’s a great opportunity to hear from three leaders in enterprise machine learning. Watch the full webinar here and if you’d like a copy of the slides, click below.

See slides

The best AI programming languages to use

Processor illustration with overlaid code snippet symbolizing the many AI programming languages available for developers

Implementing any type of AI system involves writing code, and a variety of programming languages lend themselves to specific AI or machine learning tasks. Let’s look at which programming languages will be the most beneficial for your specific use cases.

We have composed a simple list showing which five programming languages are best to learn if you want to be successful in the artificial intelligence industry. Each has its own particular strengths and weaknesses for a given project, so consider your end goals before selecting a language.

These programming languages include:

  • Python
  • R
  • Java
  • Scala 
  • Rust

Python

Python is by far the most popular programming language used in artificial intelligence today because it has an easy-to-learn syntax, massive libraries and frameworks, and dynamic applicability to a plethora of AI algorithms, and because it is relatively simple to write.

Python supports multiple programming styles, including functional, object-oriented, and procedural. In addition, its massive community helps to keep this language at the forefront of the computer science industry.
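As a small illustration of those three styles, the sketch below computes the same value, the sum of the squares of the even numbers in a list, in each one. The function and class names are made up for this example.

```python
def sum_even_squares_procedural(numbers):
    """Procedural style: an explicit loop that updates a running total."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


def sum_even_squares_functional(numbers):
    """Functional style: compose filter, map, and sum without mutation."""
    evens = filter(lambda n: n % 2 == 0, numbers)
    return sum(map(lambda n: n * n, evens))


class EvenSquareSummer:
    """Object-oriented style: state and behavior live together in a class."""

    def __init__(self):
        self.total = 0

    def add(self, n):
        if n % 2 == 0:
            self.total += n * n


numbers = [1, 2, 3, 4]

summer = EvenSquareSummer()
for n in numbers:
    summer.add(n)

print(sum_even_squares_procedural(numbers))  # 20
print(sum_even_squares_functional(numbers))  # 20
print(summer.total)                          # 20
```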

The disadvantages of Python include its lack of speed compared to some of the other languages, its less than optimal mobile coding capabilities, and the difficulty it has with memory-intensive tasks.

R

R is another machine learning programming language that is relatively easy to understand. The most common uses of R are for data analysis, big data modeling, and data visualization. R’s abundance of package sets and variety of materials make it easy to work with on data-centric tasks.

The disadvantages of R include its excessive use of memory, its lack of basic security (it cannot be embedded into web applications), and the fact that it is rooted in an older programming language, S.

Java

Java is object-oriented, and its strengths include working well with search algorithms, a simplified framework that supports large-scale projects efficiently, and ease of debugging. In addition, it is supported by a well-established community and has a myriad of open-source libraries.

The disadvantages of Java include its lack of performance speed compared to other languages and the inefficient use of memory that comes with running on top of the Java Virtual Machine. These two shortcomings generally result in a third: the increased cost of hardware.

Scala

Scala is a highly scalable programming language that can handle large amounts of data. Being multi-paradigm, Scala supports both object-oriented and functional styles of programming. Due to its concise code, Scala can be more readable and easier to write than similar languages such as Java. Its speed and efficiency are what make this language stand out for machine learning and AI models, with relatively error-free coding that is easy to debug when necessary.

The disadvantages of Scala include side effects that come with fulfilling both object-oriented and functional styles. Since this language is a combination of both programming styles, it can make understanding type-information more difficult. In addition, the option to switch back to an object-oriented style can be seen as a downside, as you won’t be forced to think functionally while you code.

Rust

Rust is a systems-level programming language. It was created with the intention of writing “safe” code, meaning that the compiler checks how objects are created, shared, and freed. This relieves the programmer of doing pointer arithmetic or manually managing memory. These restrictions often result in cleaner code, potentially making it easier to program.

The disadvantages of Rust include a slower compiler than other languages, no garbage collection, and code that typically cannot be developed at the same rate as in other programming languages, such as Python.

With Algorithmia, you can use multiple languages within one AI application

Algorithmia provides a machine learning architecture that invites programmers to pipeline models together, even if they’re written in different languages. This removes any need to translate algorithms into a certain language to be compatible with the rest of the algorithms in a monolithic architecture. 

You can also reuse pieces of the software by calling them into the application whenever they’re needed, without copying and pasting them. Algorithmia helps organizations create better software, faster in this way.
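The general idea of pipelining models across languages can be sketched without any particular platform. In the example below, each stage accepts and returns a JSON string, a format any language can read and write, so a stage could be replaced by a service written in another language. This is not Algorithmia's API; the stage functions and their names are hypothetical stand-ins.

```python
import json


def tokenize_service(payload):
    """Stand-in for a model that could be implemented in any language."""
    text = json.loads(payload)["text"]
    return json.dumps({"tokens": text.lower().split()})


def count_service(payload):
    """Stand-in for a second model that consumes the first one's output."""
    tokens = json.loads(payload)["tokens"]
    return json.dumps({"token_count": len(tokens)})


def run_pipeline(stages, payload):
    """Pass the payload through each stage in order."""
    for stage in stages:
        payload = stage(payload)
    return payload


request = json.dumps({"text": "Flag this suspicious order"})
result = run_pipeline([tokenize_service, count_service], request)
print(result)  # {"token_count": 4}
```

Because the stages only share a data format, each one can be reused in other pipelines by calling it rather than copying its code.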

Watch our video demo to learn how Algorithmia can help your organization increase efficiency in the last-mile machine learning process: deploying, serving, and managing models.

Recommended readings

How to add AI to your WordPress site

Six open-source machine learning tools you should know

Roadmap for ML: Navigating the Machine Learning Roadmap

Explanation of roles: machine learning engineers vs data scientists

2020 machine learning predictions and the shortage of data scientists

Graph showing the number of data scientists employed from 2018 to 2019

In the last year alone, there have been countless developments in machine learning (ML) tooling and applications. Facial recognition and other computer vision applications are more sophisticated, natural language processing applications like sentiment analysis are increasingly complex, and the number of ML models in development is staggering.

In 2019, we spoke with thousands of companies in various stages of machine learning maturity, and we developed hypotheses about the state of machine learning and the traction it’s gaining in the enterprise across all industries. In October, we undertook a massive survey effort, polling nearly 750 business decision makers from organizations thinking about, developing, and implementing robust machine learning efforts.

We analyzed the data we gathered, gleaning insight into various ML use cases, roadmaps, and the changes companies had seen in recent months in budget, R&D, and head count.

Data science: modern-day gold rush

We put together seven key findings from our analysis and published them in our 2020 State of Enterprise Machine Learning report. The first finding is likely not at all surprising: the field of data science is undergoing tremendous flux as word of demand, potential salaries, quick bootcamps, and open positions bounce around the internet. 

But let’s dig into what we found in our survey data to get a better picture of what’s happening in the field.

The rise of the data science arsenal

One of the pieces of data we collected was the number of data scientists employed at the respondent’s place of work. We hear repeatedly from companies that management is prioritizing hiring for the data science role above many others, including traditional software engineering, IT, and DevOps.

Half of the people polled said their companies employ between one and 10 data scientists. This is actually down from 2018 (we polled in 2018 as well), when 58 percent of respondents said their companies employ between one and 10 data scientists. Like us, you might wonder why. We would have expected more companies to have one to 10 data scientists because investment in AI and ML is known to be growing (Gartner).

Movement in the data science realm

In 2018, however, 18 percent of companies employed 11 or more data scientists. This year, 39 percent of companies have 11 or more, suggesting that organizations are ramping up their hiring efforts to build data science arsenals of more than 10 people.

Another observation from 2018 was that barely 2 percent of companies had more than 1,000 data scientists; today that number is just over 3 percent, indicating small but significant growth. Companies in this data science bracket are likely the big FAANG tech giants—Facebook, Apple, Amazon, Netflix, and Google (Yahoo); their large data science teams are working hard to derive sophisticated insight from the vast amounts of data they store.

Demand for data scientists

Between 2012 and 2017, the number of data scientist jobs on LinkedIn increased by more than 650 percent (KDnuggets). The talent deficit and high demand for data science skills mean hiring and maintaining data science teams will only become more difficult for small and mid-sized companies that cannot offer the same salary and benefits packages as the FAANG companies.

As demand for data scientists grows, we may see a trend of junior-level hires having less opportunity to structure data science and machine learning efforts within their teams, as much of the structuring and program scoping may have already been done by predecessors who overcame the initial hurdles.

New roles, the same data science

We will likely also see the merging of traditional business intelligence and data science roles in order to fill immediate requirements in the latter talent pool since both domains use data modeling (BI work uses statistical methods to analyze past performance, and data science makes predictions about future events or performance).

Gartner predicts that the overall lack of data science resources will result in an increasing number of developers becoming involved in creating and managing machine learning models (Gartner CIO survey). This blending of roles will likely lead to another phenomenon related to this finding: more names and job titles for the same work. We are seeing an influx of new job titles in data science such as Machine Learning Engineer, ML Developer, ML Architect, Data Engineer, Machine Learning Operations (ML Ops), and AI Ops as the industry expands and companies attempt to distinguish themselves and their talent from the pack.

The 2020 report and predicting an ML future

The strategic takeaway from the 2020 State of Enterprise Machine Learning survey for us was that a growing number of companies are entering the early stages of ML development, but those that have moved beyond the initial stages are encountering challenges in deployment, scaling, versioning, and other sophistication efforts. As a result, we will likely see a boom in the number of ML companies providing services to overcome these obstacles in the near term.

We will do a deeper dive into the other key findings in the coming weeks. In the meantime, we invite you to read the full report and to interact with our survey data in our 2020 State of Enterprise Machine Learning interactive experience.

Read the full report

Executive summary: the 2020 state of enterprise machine learning report

Read it

Reflects data only from survey Group B. Respondents were allowed to choose more than one answer.

In the last 12 months, there have been numerous developments in machine learning (ML) tools, applications, and hardware. Google’s TPUs are in their third generation, the AWS Inferentia chip is a year old, Intel’s Nervana Neural Network Processors are designed for deep learning, and Microsoft is reportedly developing its own custom AI hardware.

This year, Algorithmia has had conversations with thousands of companies in various stages of machine learning maturity. From them we developed hypotheses about the state of machine learning in the enterprise, and in October, we decided to test those hypotheses.

Following the State of Enterprise Machine Learning 2018 report, we conducted a new two-prong survey this year, polling nearly 750 business decision makers across all industries at companies that are actively developing machine learning lifecycles, just beginning their machine learning journeys, or somewhere in between. Sign up to receive the full 2020 report on 12 December 2019 when it publishes.

2020 key findings and report format

The forthcoming 2020 report focuses on seven key findings from the survey. In brief, they are:

  1. The rise of the data science arsenal for machine learning: almost all companies are building data science teams to develop ML use cases. There are discrepancies in team size and agility, however, that will affect how quickly and efficiently ML is applied to business problems.
  2. Cutting costs takes center stage as companies grow in size: the primary business use cases center on customer service and internal cost reduction. Company size is the differentiator.
  3. Overcrowding at early maturity levels and AI for AI’s sake: the pool of companies entering the ML arena is growing exponentially but that could bring about an increase in “snake-oil AI” solutions.
  4. An unreasonably long road to deployment: despite the rapid development in use cases, growth in AI/ML budgets, and data science job openings, there is still a long road to model deployment. We offer several hypotheses why.
  5. Innovation hubs and the trouble with scale: we anticipate the proliferation of internal AI centers (innovation hubs) within companies designed to quickly develop ML capabilities so the organization can stay current with its competition. Machine learning challenges still exist, however, stymying the last mile to sophisticated levels of ML maturity.
  6. Budget and ML maturity, an emerging disparity: AI/ML budgets are growing across all company sizes and industries, but several industries are investing more heavily.
  7. Determining machine learning success across the org chart: hierarchical levels within companies are determining ML success by two different metrics. The director level will likely play a large role in the future of ML adoption.

The report concludes with a section on the future of machine learning and what we expect in the short-term. 

What to expect in the 2020 report

Our findings are presented with our original hypotheses, as well as our analysis of the results. Where possible, we have provided a year-on-year comparison with data from 2018 and included predictions about what is likely to manifest in the ML space in the near term.

We have included graphics throughout to bring the data to life (the banner graphic of this post is a bubble chart depicting the use cases of machine learning and their frequency in the enterprise).

We will continue to conduct this annual survey to increase the breadth of our understanding of machine learning technology in the enterprise and share with the broader industry how ML is evolving. In doing so, we can track trends in ML development across industries over time, ideally making more informed predictions with higher degrees of confidence.

Following the report and future-proofing for machine learning

We will soon make our survey data available on an interactive webpage to foster transparency and a greater understanding of the ML landscape. We are committed to being good stewards of ML technology.

This year’s survey report should confirm for readers that machine learning in the enterprise is progressing at a lightning pace. Though the majority of companies are still in the early stages of ML maturity, it is incorrect to think there is time to delay ML efforts at your company. 

If your organization is not currently ML-oriented, know that your competitors are. Now is the time to future-proof your organization with AI/ML.

Sign up to receive the full 2020 State of Enterprise Machine Learning report when it publishes on 12 December.