November 16, 2017
[Editor’s Note: November is Machine Learning Month on CardNotPresent.com. This is the third article in a series from our sponsor exploring how, against increasingly sophisticated attacks, artificial intelligence and machine learning are being applied to online fraud prevention. Check back here throughout the month for more content that sheds light on the newest technology in fraud prevention and answers questions you might have on how it can impact your antifraud efforts.]
1. At Feedzai, we’re hearing fraud analysts ask: why do I need machine learning and not rules?
The best approach is to combine rules and models for more effective fraud detection and risk management. Using rules can produce great outcomes in fraud detection, but they have clear limitations that can be best addressed by combining them with machine learning models. Across our customers, we have found that combining rules and machine learning models produces a significant increase in fraud detection rates.
Known limitations of rules include:
- Rules don’t learn directly from the data. They depend on fraud analysts for ongoing oversight and intervention.
- Rules often miss interaction effects. For example, one could have rules based on total charge amount or transaction frequency. However, it might be best to block transactions with a high charge amount only if the card’s transaction frequency is also high. Models are ideally suited to exploring these interaction effects, but rules will only catch them if they are known in advance.
- Rules are often built with fixed thresholds. For example, a rule could block every transaction on a card that already has more than eight transactions in a one-day period. However, a fixed threshold of eight could block either too many or too few transactions. A model will vary this threshold automatically for optimal fraud detection.
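The interaction effect described in the list above can be sketched in a few lines of Python. This is an illustration only: the `Txn` record, the sample history, and both limits (500 dollars, eight transactions) are assumptions made up for the example, not values from a production system.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    amount: float        # charge amount in dollars
    txns_last_day: int   # the card's transaction count over the past day
    is_fraud: bool       # label from a past investigation (invented here)


# Hypothetical labeled history, invented for this sketch.
HISTORY = [
    Txn(900.0, 12, True),
    Txn(850.0, 10, True),
    Txn(700.0, 9, True),
    Txn(950.0, 1, False),   # large but infrequent: legitimate
    Txn(20.0, 11, False),   # frequent but small: legitimate
    Txn(35.0, 2, False),
]


def amount_rule(t, limit=500.0):
    """Single-feature rule: flag any large charge."""
    return t.amount > limit


def frequency_rule(t, limit=8):
    """Single-feature rule: flag any card with many recent transactions."""
    return t.txns_last_day > limit


def interaction_rule(t):
    """Flag only when the charge is large AND the card is unusually active."""
    return amount_rule(t) and frequency_rule(t)


def false_positives(rule, txns):
    """Count legitimate transactions that the rule would have blocked."""
    return sum(1 for t in txns if rule(t) and not t.is_fraud)


def missed_fraud(rule, txns):
    """Count fraudulent transactions that the rule would have let through."""
    return sum(1 for t in txns if t.is_fraud and not rule(t))


for rule in (amount_rule, frequency_rule, interaction_rule):
    print(rule.__name__, false_positives(rule, HISTORY), missed_fraud(rule, HISTORY))
```

On this toy data, each single-feature rule blocks one legitimate transaction, while the combined rule blocks none and still catches every fraud case. A person had to know to write the combined rule; a model can find such combinations from the labeled data.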
2. How does a model predict fraud?
A model uses a machine learning algorithm to explore a large amount of data quickly and accurately. These models enable end users at banks and merchants to make decisions, such as whether to decline a transaction or to send it for review.
The model takes data from known transactions and outputs a score for each new transaction that describes its similarity to known fraud. One way to use these scores is to label all transactions scored above a set threshold as fraud. Different score thresholds can be used in a variety of ways depending on the business needs.
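As a rough sketch of how score thresholds can map to business decisions, the function below turns a score into one of three actions. The 0-to-1 score range, the function name `decide`, and the two cutoffs are assumptions chosen for illustration; real thresholds depend on the business needs mentioned above.

```python
def decide(score, review_at=0.5, decline_at=0.9):
    """Map a fraud score to an action using two illustrative thresholds.

    Assumes scores lie in [0, 1], with higher meaning more similar to known fraud.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= decline_at:
        return "decline"
    if score >= review_at:
        return "review"
    return "approve"


# Example: three incoming transactions with hypothetical scores.
for s in (0.95, 0.60, 0.10):
    print(s, decide(s))
```

Lowering `review_at` sends more transactions to analysts and catches more fraud at the cost of more manual work; raising `decline_at` reduces the number of good customers who are declined outright.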
In addition, models never assume that there is just one type of fraud. Instead, they are able to evaluate patterns across all transactions. This hypervigilance makes machine learning well-suited to the fact that fraudsters are always changing tactics.
A lot of ingredients go into building a model. Some of these include event labels for transactions (e.g., fraudulent or not), features generated from the data (e.g., number of transactions in the past day), and a scoring algorithm for all incoming transactions.
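To make the "feature" ingredient concrete, here is a minimal sketch of the example feature mentioned above, the number of transactions in the past day. The function name and the representation of a card's history as a list of timestamps are assumptions for this illustration.

```python
from datetime import datetime, timedelta


def txns_in_past_day(card_history, now):
    """Count a card's earlier transactions in the 24 hours before `now`.

    `card_history` is assumed to be a list of datetime objects, one per
    past transaction on the same card.
    """
    window_start = now - timedelta(days=1)
    return sum(1 for ts in card_history if window_start <= ts < now)


# Hypothetical history for one card.
now = datetime(2017, 11, 16, 12, 0)
history = [
    now - timedelta(hours=2),
    now - timedelta(hours=23),
    now - timedelta(hours=25),   # just outside the one-day window
    now - timedelta(days=3),
]

# One training example pairs features like this with an event label.
example = {"txns_in_past_day": txns_in_past_day(history, now), "is_fraud": False}
print(example)
```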
3. Is supervised learning or unsupervised learning better?
There’s an axiom in machine learning: “More information is never worse.” Of course, some information doesn’t have predictive value. For example, rainfall in Nigeria probably won’t affect Google AdWords prices. However, any information likely to have predictive value will likely produce a better model.
In supervised learning, the data has labels, such as fraud or not fraud. These labels are used to understand what types of transactions are fraudulent and which features are most predictive of fraud. With labels, spurious data like the rainfall in the example above can easily be shown to have no statistically predictive value.
In contrast, unsupervised learning doesn’t use labeled data. It relies on looking for outliers: transactions that don’t fit into normal patterns. These outliers are treated as suspect and are flagged either as fraud or as worthy of investigation.
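One simple way to look for outliers without labels is a robust z-score built from the median and the median absolute deviation (MAD). The sketch below is only one of many possible unsupervised techniques and is not necessarily what any particular fraud system uses; the 0.6745 scaling constant and the 3.5 cutoff follow a common convention for this statistic, and the sample amounts are invented.

```python
from statistics import median


def find_outliers(amounts, cutoff=3.5):
    """Return the amounts whose modified z-score exceeds `cutoff`.

    Uses the median and the median absolute deviation (MAD), which are less
    distorted by extreme values than the mean and standard deviation.
    """
    med = median(amounts)
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        return []  # no spread around the median; this sketch flags nothing
    return [a for a in amounts if 0.6745 * abs(a - med) / mad > cutoff]


# Hypothetical charge amounts on one card; no fraud labels are used.
amounts = [20, 22, 19, 21, 23, 20, 500]
print(find_outliers(amounts))
```

Here the 500 charge stands out from the card's usual pattern and would be sent for investigation, even though nothing in the data says it is fraud.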
In general, unsupervised learning is not as accurate as supervised learning and is harder to use and interpret. That being said, we’ve found that unsupervised techniques can be valuable in certain cases. The best practice in the industry is to choose the approach based on the specific use case.
4. If models are trained on existing fraud, how do they detect new fraud?
This question has two components. The easier one to answer is: when we have a new event, how do we know if it’s fraud? This is what a model does. It assigns a score to every transaction, and those with higher scores are more likely to be fraud.
The harder part of the question is what happens when a model gets really good at detecting a certain type of fraud, and the fraudsters try something new that hasn’t been seen before. How can the modeling approach handle this? Well, any attack requires confirmation over time. As the new fraud pattern emerges, the model learns to identify it and label it as fraud.
However, in the initial stages of a new attack, before the data is labeled as fraud, the best practice is to monitor transactions for outlier behavior, review the transactions that stand out, and put new denial rules into place. Very quickly, the model learns to identify these new types of fraud incursion, and the rules are no longer necessary.
To enable the partnership between the human and the machine, it’s important for a system to have an effective case manager for fraud analysts to easily monitor fraud and quickly add new rules.
5. What are the best models in today’s machine learning systems for fraud?
Over time, we have found that the best approach is to be “model agnostic.” Every modeling approach has advantages and disadvantages, and it is important to stay abreast of new modeling approaches and compare them to what’s already in place.
There are a number of metrics that can be used to evaluate new modeling approaches. Beyond model accuracy, it is important to also consider implementation issues such as speed and scalability. Our customers also want to understand and interpret the model results, not just use the model scores in production.
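As one hedged example of an accuracy metric, the sketch below computes precision and recall for a set of model scores at a given threshold, which is one way to compare a candidate model against the one already in place. The function name, the sample scores, and the labels are invented for illustration.

```python
def precision_recall(scores, labels, threshold):
    """Compute precision and recall when scores >= threshold are flagged as fraud.

    `labels` holds True for confirmed fraud and False otherwise.
    """
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical scores from two models on the same five labeled transactions.
labels = [True, False, True, False, False]
current_model = [0.9, 0.8, 0.4, 0.3, 0.2]
candidate_model = [0.9, 0.3, 0.7, 0.4, 0.1]

print("current:", precision_recall(current_model, labels, 0.5))
print("candidate:", precision_recall(candidate_model, labels, 0.5))
```

Precision measures how many flagged transactions were really fraud, and recall measures how much of the fraud was caught. Neither says anything about speed, scalability, or interpretability, which have to be assessed separately.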
Our data science team has explored a number of proprietary algorithms for model interpretability and this is an active field of research. My colleague, Feedzai’s co-founder and CSO Pedro Bizarro, has recently mapped the evolution of AI systems in terms of explainability.
After a research career in Theoretical Physics (Ph.D. in Physics at Stanford University under Robert Laughlin, Nobel Laureate), Andy Tikofsky worked in finance for a number of premier U.S. hedge funds focusing on risk management, quantitative strategies, machine learning and portfolio management. For the past decade, Andy’s focus has been on big data, machine learning, and artificial intelligence applied to telecom, sales and marketing strategies and financial transactions. He leads the U.S. Data Science Team for Feedzai, with a focus on delivering data science client solutions and developing new risk management and data science systems to help meet client needs.