Machine learning and fraud: how complex algorithms allow us to fight back


The phrase “machine learning” evokes images of Will Smith killing dozens of free-thinking robots while wearing super cool Chuck Taylors in I, Robot. It’s the type of misconception that’s been hammered home by years of Hollywood extrapolating the future to the extreme, often stirring up unnecessary hysteria. The reality is that machine learning and A.I. are two sides of the same coin, sharing various similarities but also distinct differences in the way they’re used.

If we think of A.I. as the broad concept of machines carrying out complex tasks, machine learning is the practical application of giving a machine access to data and allowing it to create industry-specific solutions based on that data. In our everyday lives, machine learning is our Netflix recommendations, self-driving cars, and credit card fraud detection systems.

When it comes to fraud, or more specifically catching fraud before it happens, machine learning is at the forefront of change, helping companies quickly understand fraudulent tendencies by analyzing large amounts of data in real time.

For this piece, we spoke with Padraig Stapleton of Argyle Data, a tech company focused on fighting fraud in the telecommunications industry with machine learning algorithms. We break down the types of fraud that Argyle Data is using machine learning to fight, and some of the ethical questions that arise when using artificial intelligence to catch illegal activity.

“Anyone can set up a rule or threshold so that, if the activity surpasses it, an investigation is triggered, but the problem with that is all other nefarious activity is left out. Utilizing machine learning lowers those thresholds for awareness and activity considerably,” Stapleton said. “We’re able to build up a history and a set of data to compare to, instead of specific levels to hit.”
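To make the contrast concrete, here is a minimal Python sketch of a fixed rule next to a comparison against each subscriber's own history. The subscriber names, the usage numbers, the 250-minute threshold and the z-score cutoff are all invented for illustration; this is not Argyle Data's method, just one simple way to compare new activity to a baseline.

```python
from statistics import mean, stdev

# Hypothetical daily call minutes per subscriber (illustrative numbers only).
history = {
    "alice": [12, 15, 11, 14, 13, 12, 16],
    "bob": [300, 320, 310, 295, 305, 315, 290],
}

FIXED_THRESHOLD = 250  # minutes per day; an arbitrary rule chosen for this example


def exceeds_fixed_rule(minutes):
    """Static rule: flag any day above one global threshold."""
    return minutes > FIXED_THRESHOLD


def deviates_from_baseline(subscriber, minutes, z_cutoff=3.0):
    """Flag usage that sits far from this subscriber's own historical average."""
    past = history[subscriber]
    mu, sigma = mean(past), stdev(past)
    if sigma == 0:
        return minutes != mu
    return abs(minutes - mu) / sigma > z_cutoff


for name, today in [("alice", 90), ("bob", 310)]:
    print(name, today, exceeds_fixed_rule(today), deviates_from_baseline(name, today))
```

In this toy data, a 90-minute day from "alice" slips under the fixed rule but stands out against her own history, while 310 minutes from "bob" trips the rule even though it is normal for him.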

What is machine learning?

To the general public, machine learning can seem extremely complicated once you start going down this very specific rabbit hole. But when it comes to telecommunications fraud, we’re dealing with only a few pieces of a larger puzzle. In terms of machines or “programs” iteratively learning over time, algorithms fall into two specific categories: supervised and unsupervised learning.

Supervised learning: With supervised learning, the algorithm uses an initial set of data to begin its iterative learning process. The desired data inputs and outputs are already known when analysis begins, hence “supervised” learning. The algorithm then compares the correct output with its actual output to find any errors, and the model is modified accordingly. Think of a teacher in school supervising a lesson in class, giving students a set of information to work with and apply to a problem, course correcting or “teaching” when errors need to be addressed. “It’s asking the question: How many data labels did we get? And what model of learning can we create off of it?” Stapleton said.
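The compare-and-correct loop described above can be sketched in a few lines of Python. This is a bare-bones logistic regression trained by gradient descent, chosen only because it is one of the simplest supervised models. The two features (a scaled call rate and the share of international calls), the labels and the learning settings are all hypothetical.

```python
import math

# Hypothetical labeled examples: (scaled call rate, share of international calls) -> label.
# Label 1 = known fraud, 0 = known legitimate. All values are invented for illustration.
training_data = [
    ((0.05, 0.0), 0),
    ((0.10, 0.1), 0),
    ((0.02, 0.0), 0),
    ((0.08, 0.2), 0),
    ((0.70, 0.9), 1),
    ((0.90, 0.8), 1),
    ((0.60, 1.0), 1),
    ((1.00, 0.7), 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Return the model's estimated probability that the example is fraud."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(data, epochs=2000, learning_rate=0.5):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in data:
            # Compare the actual output with the correct (labeled) output...
            error = predict(weights, bias, features) - label
            # ...and nudge the model in the direction that shrinks the error.
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(training_data)
print(predict(weights, bias, (0.06, 0.05)))  # low-activity example
print(predict(weights, bias, (0.80, 0.90)))  # high-activity example
```

The labels play the role of the teacher here: every update is driven by the gap between what the model predicted and what the label says the answer should have been.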

Unsupervised learning: This version of machine learning differs from its sibling concept in how it treats its data set. With unsupervised learning, the program isn’t given an expected output. Instead, it’s shown a set of data and the algorithm needs to figure out what it’s being shown. This, theoretically, is useful for exploring data and finding a suitable structure that can be put to practical use. Think of going to a party where you encounter a setting and people you’ve never met before. Here, you have no previous data set to help you understand how to interact in this situation. So you learn on the fly, interpreting data as it comes in and making decisions based on those interpretations. “Once you start analyzing data that’s coming in you build up separate databases which then feed this loop, allowing us to compare new data to an established baseline and feeding the algorithm’s process of learning,” Stapleton said.
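As a rough illustration of an algorithm figuring out what it is being shown, here is a small k-means clustering sketch in Python. No labels are supplied; the code only groups points that sit close together. The data points and the choice of two clusters are assumptions made for the example, and the centroids are seeded with evenly spaced points rather than the random picks a fuller implementation would normally use.

```python
import math


def kmeans(points, k, iterations=20):
    """Group points into k clusters by repeatedly moving centroids to cluster means."""
    # Simplification: seed centroids with evenly spaced points instead of random ones.
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical usage profiles: (average call length, calls per day), invented for illustration.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

Nothing tells the algorithm which group is "normal". An analyst, or a later supervised step, still has to interpret what each cluster means.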

So how does machine learning apply to fraud?

When it comes to telecommunications fraud, things again break down into two categories: wholesale and mobile phone carriers. We’ll discuss wholesale communications fraud briefly, but seeing as mobile communications fraud is more relevant from a Canadian perspective, we’ll stick there for the most part. When it comes to mobile carriers and the types of fraud they face, we’re looking at companies like Rogers, AT&T and Bell.

“What they (mobile carriers) typically see in day-to-day boils down to two types of fraud, network usage and subscription fraud, but there are dozens of different types of schemes out there,” Stapleton said. “We’re finding new types of fraud every week based off of our data analysis.”

According to Argyle Data, communication service providers (CSPs) lose $38 billion a year to fraud and network abuse. Giant phone companies are struggling to curb these types of fraud internally, so they’ve turned to platforms like Argyle Data to help them analyze massive amounts of information quickly. In its practical application in the CSP space, machine learning is very similar to credit card fraud detection: analyzing data sets for anomalies in specific data patterns.

One giant case of fraud that’s been plaguing the telecommunications industry is referred to as network usage fraud, or Wangiri, a Japanese word that translates literally to “one (ring) cut.” Fraudsters use automated dialing services to ring and immediately hang up on thousands of users at once, tricking them into calling back on premium-rate international lines.
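The calling pattern behind Wangiri, many very short calls from one number to many different people, is simple enough to express in code. The Python sketch below is a plain heuristic rather than a learned model. The record format, caller names and cutoffs are hypothetical, and in practice signals like these would more likely serve as inputs to a model than as hard rules.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, seconds connected or ringing).
calls = [
    ("premium-line-A", "user1", 1),
    ("premium-line-A", "user2", 1),
    ("premium-line-A", "user3", 2),
    ("premium-line-A", "user4", 1),
    ("friend", "user1", 180),
    ("friend", "user1", 95),
    ("office", "user2", 2),
    ("office", "user3", 240),
]


def wangiri_suspects(records, max_seconds=3, min_recipients=3, min_short_share=0.9):
    """Return callers whose calls are almost all very short and hit many distinct numbers."""
    totals = defaultdict(int)
    short_counts = defaultdict(int)
    short_recipients = defaultdict(set)
    for caller, callee, seconds in records:
        totals[caller] += 1
        if seconds <= max_seconds:
            short_counts[caller] += 1
            short_recipients[caller].add(callee)
    return sorted(
        caller
        for caller in totals
        if len(short_recipients[caller]) >= min_recipients
        and short_counts[caller] / totals[caller] >= min_short_share
    )


print(wangiri_suspects(calls))
```

Only the caller that rings four different users for a second or two is returned; the single short call from "office" is not enough to trip the check.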

In the other major type, subscription fraud, stolen credit card data and identity information are used to set up new phone plans. Fraudsters rack up exorbitant usage during the initial billing period, but the bill is never paid. According to Argyle Data, this is the most problematic type of fraud that telecommunications companies are facing.

“This turns out to be a huge problem for carriers, anywhere from three to five percent of customer requests for service turn out to be fraudulent, with devices being accepted and then sold on the black market with the intention of doing so from the beginning,” Stapleton said. “It’s a huge financial loss for these companies seeing as these devices run from six hundred to a thousand dollars a pop.”

According to Stapleton, wholesale carriers, the companies that connect customers in an international context, primarily face subscription fraud.

How machine learning can help

The algorithms we talked about earlier are applied here, taking massive amounts of cellular data and combing through it, looking for anomalies in what would be considered normal activity. There are many different algorithms using different mathematical methods, such as logistic regression and backpropagation neural networks for supervised learning, and the Apriori algorithm and k-means for unsupervised learning.

Efficiency also comes into play in stopping these types of fraud. Instead of taking days or hours, by which point sensitive data has already been distributed or charges rung up, machine learning algorithms now analyze huge chunks of data in minutes, allowing for a fast response in curbing this activity. The same efficiency plays out in things like traveling overseas and making purchases. There was a time when you had to call the credit card company to let them know not to be alarmed if they saw different types of purchases on your credit card bill. But now that credit card companies use machine learning algorithms, they can analyze your purchasing history from the minute you buy plane tickets to England.

“Of course nothing is ever perfect, but machine learning, when it comes to analyzing and stopping fraudulent activity, has put us leaps and bounds ahead of where we were five years ago,” Stapleton said. “I don’t think we’re as close to robots and emotive A.I. as we believe we are, but these algorithms are a great example of where we are with technology right now and it’s just growing.”

So when it comes to fighting fraud and stolen iPhones, machine learning has made strides in an industry that at one point didn’t know where to turn for help. And while Stapleton doesn’t necessarily believe that we’ll see the realistic application of robotic A.I. in his lifetime, he does think there’s still massive growth potential for machine learning, specifically in the rate at which machine learning algorithms will be able to break down even larger amounts of data: multiple terabits in a few minutes. You can find out more about the work Argyle Data is doing at their website.