


As more and more businesses face credit card fraud and identity theft, interest in “fraud detection” is rising in Google Trends.



Companies are looking for credit card fraud detection software that will help eliminate this problem or at least reduce the possible dangers. Before looking at SPD Group’s credit card fraud detection project, let us answer the most common questions:


What is a Fraud Detection System?

It is a set of activities undertaken to prevent money or property from being obtained through false pretenses.


What are the Fraud Detection Predictive Models?

Models make predictions based on information about a transaction and some contextual (historical) information. To make the model more robust, we used only the most important features, which were selected using the χ² test (a chi-square test measures how expectations compare to actual observed data) and recursive feature elimination.
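To illustrate the χ² part of that selection step, here is a minimal sketch in plain Python. It assumes binary (0/1) features and a binary fraud label. The feature names and toy data are hypothetical, and the project's actual pipeline (including recursive feature elimination) is not shown.

```python
from collections import Counter

def chi_square_score(feature, label):
    """Chi-square statistic of independence between two 0/1 sequences."""
    n = len(feature)
    observed = Counter(zip(feature, label))   # counts per (feature, label) cell
    feature_totals = Counter(feature)
    label_totals = Counter(label)
    score = 0.0
    for f in (0, 1):
        for y in (0, 1):
            # Count we would expect in this cell if feature and label were independent
            expected = feature_totals[f] * label_totals[y] / n
            if expected > 0:
                score += (observed[(f, y)] - expected) ** 2 / expected
    return score

def rank_features(features, label):
    """Return feature names ordered from highest to lowest chi-square score."""
    scores = {name: chi_square_score(values, label) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: 8 transactions, 3 of them fraudulent
is_fraud = [1, 1, 1, 0, 0, 0, 0, 0]
candidate_features = {
    "weekend":    [1, 0, 1, 0, 1, 0, 1, 0],  # weakly related to the label
    "new_device": [1, 1, 1, 0, 0, 0, 0, 0],  # matches the label exactly
}
ranking = rank_features(candidate_features, is_fraud)
```

A higher score means the observed counts deviate more from what independence would predict, so the feature is a stronger candidate to keep.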


How can Neural Networks be used to Detect Fraud?

Neural Networks are highly effective when the data scientist has access to a large dataset (say, a hundred thousand data samples or more). They can find patterns and flag new behavior that deviates too far from the normal flow. However, in our case, we decided to rely on other Machine Learning models such as Classification trees, because the RNN did not reach the accuracy we expected, most likely because the dataset was not large enough.


Anomaly Detection Solution for E-Commerce Credit Card Transactions from SPD Group


Monitoring Fraudulent Activity


Development time – 3 months

Team size – 6 experts

Platform – Web


Overview of the Credit Card Fraud Detection Project


An E-commerce and Financial Service company approached SPD Group to make its platform a safer place for online transactions. The company offers products and services that can be paid for using Mobile Money or a bank card (Visa, MasterCard). As more customers reported money suddenly disappearing or being transferred to an unknown account, our client decided to implement a modern fraud prevention method on the platform and to rely on what Machine Learning can do here.





To dive into the challenges and obstacles of this project, we got a quote from a Machine Learning Engineer on the development team:


“The most complicated part of the solution was to achieve good metrics for users who have made only a few transactions. We could apply the regular model, which is good for users with a rich transaction history, but it would give worse scores when historical data is lacking (for example, for a new user). Another obvious solution is to treat such users as empty accounts that have only identity information without any transaction history. In this case, we lose the advantage of having at least some data about the users, but the results that such a model provides are quite stable (underfitting). After discussing the matter at a weekly stand-up, we decided to look into “few-shot learning” techniques, which could help us improve our metrics. We prepared a PoC, but it didn’t give us the drastic improvement we had expected. Nevertheless, we kept experimenting and diving into our client’s business domain, which allowed us to develop features that made a huge impact on our model based on “few-shot learning” techniques. Because of the domain features, our main score improved by more than 15% and it became the production solution.”




Our R&D team worked on the project for 3 months, using Classification rather than classical Anomaly Detection methods. After an intense feature generation phase (about 700 features in total), they moved on to feature selection to keep only the most relevant ones. Finally, a blend of Classification methods such as XGBoost, CatBoost, and LightGBM was used to get near the desired score.
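The text does not say how the models were blended. One common approach is a weighted average of each model's predicted fraud probability, sketched below in plain Python. The probability lists stand in for the outputs of already trained models and are hypothetical, as are the function name and the equal default weights.

```python
def blend_probabilities(model_outputs, weights=None):
    """Weighted average of per-model fraud probabilities, one value per transaction."""
    n_models = len(model_outputs)
    if weights is None:
        weights = [1.0] * n_models           # assumption: equal weight per model
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total_weight = sum(weights)
    n_rows = len(model_outputs[0])
    blended = []
    for i in range(n_rows):
        weighted_sum = sum(w * probs[i] for w, probs in zip(weights, model_outputs))
        blended.append(weighted_sum / total_weight)
    return blended

# Hypothetical probabilities for three transactions from three trained models
xgboost_probs = [0.10, 0.80, 0.40]
catboost_probs = [0.20, 0.90, 0.30]
lightgbm_probs = [0.30, 0.70, 0.50]
blended = blend_probabilities([xgboost_probs, catboost_probs, lightgbm_probs])
```

Averaging tends to smooth out the errors of individual models. In practice the weights would be tuned on a validation set rather than left equal.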


Credit Card Fraud Detection Dataset

The platform was an e-commerce and financial service app serving 12,000+ customers daily. The dataset included a sample of approximately 140,000 transactions that occurred between October 2018 and April 2019. One of the fraud detection challenges is that the data is highly imbalanced: around 130,000 transactions were normal, and only about 6% of the sample was fraudulent. We addressed the imbalance with various techniques such as data oversampling (augmenting the existing data samples) and generation of new data samples.
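As an illustration of the oversampling idea, the following sketch duplicates randomly chosen fraud rows until both classes are the same size. It is a simple stand-in, not the project's actual augmentation or sample generation code. The function name, the toy rows, and the choice to balance the classes fully are assumptions.

```python
import random

def random_oversample(rows, labels, minority_label=1, seed=0):
    """Duplicate random minority rows until minority and majority counts match."""
    rng = random.Random(seed)                # fixed seed keeps the sketch reproducible
    minority = [row for row, y in zip(rows, labels) if y == minority_label]
    majority_count = len(labels) - len(minority)
    extra_needed = max(0, majority_count - len(minority))
    extra = [rng.choice(minority) for _ in range(extra_needed)] if minority else []
    new_rows = list(rows) + extra
    new_labels = list(labels) + [minority_label] * len(extra)
    return new_rows, new_labels

# Hypothetical toy sample with a 6% fraud rate: 6 fraudulent, 94 normal
labels = [1] * 6 + [0] * 94
rows = [{"amount": 10.0 + i} for i in range(100)]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Oversampling should be applied only to the training split, so that duplicated fraud rows do not leak into the data used for evaluation.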


Credit Card Fraud Detection Algorithm

Once the machine learning-driven fraud protection module was integrated into the E-commerce platform, it started tracking transactions. Whenever a user requests a transaction, the module processes it and, depending on the predicted fraud probability, produces one of 3 possible outputs:


  • If the probability is less than 10%, the transaction is allowed
  • If the probability is between 10% and 80%, an additional authentication factor (one-time SMS code, fingerprint, secret question) is required
  • If the probability is more than 80%, the transaction is frozen and must be processed manually
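The three rules above can be sketched as a small routing function. The function and constant names are hypothetical, and treating exactly 10% and exactly 80% as part of the middle band is an assumption, since the text does not say which side the boundaries belong to.

```python
ALLOW_BELOW = 0.10    # below this probability the transaction is allowed
FREEZE_ABOVE = 0.80   # above this probability the transaction is frozen

def route_transaction(fraud_probability):
    """Map a predicted fraud probability (0.0 to 1.0) to one of three actions."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    if fraud_probability < ALLOW_BELOW:
        return "allow"
    if fraud_probability > FREEZE_ABOVE:
        return "freeze_for_manual_review"
    # Assumption: the boundary values 0.10 and 0.80 fall into the middle band
    return "require_additional_authentication"

decisions = [route_transaction(p) for p in (0.05, 0.10, 0.50, 0.80, 0.95)]
```

Keeping the two thresholds as named constants makes it easy to adjust how many transactions get an extra authentication step versus a manual review.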


The model estimates the probability of a fraudulent transaction based on three groups of data: transaction information (Date and Time, Product category, Amount, Provider (Seller)); client information (Agent information, Location, Client’s behavioral patterns); and contextual and aggregated data that an ML engineer derives from the previously mentioned data.
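To make "contextual and aggregated data" more concrete, here is a sketch of how a few such features might be derived from a client's earlier transactions. The specific features, field names, and the 24-hour window are hypothetical examples, not the features used in the project.

```python
from datetime import datetime, timedelta

def aggregate_features(history, current, window_hours=24):
    """Derive context features for `current` from the same client's earlier transactions."""
    window_start = current["time"] - timedelta(hours=window_hours)
    past = [t for t in history if t["time"] < current["time"]]
    recent = [t for t in past if t["time"] >= window_start]
    mean_amount = sum(t["amount"] for t in past) / len(past) if past else 0.0
    return {
        "past_count": len(past),              # size of the client's history
        "recent_count": len(recent),          # activity inside the time window
        "mean_past_amount": mean_amount,
        # How unusual the current amount is compared with the client's average
        "amount_to_mean_ratio": current["amount"] / mean_amount if mean_amount else 0.0,
    }

# Hypothetical client history and incoming transaction
history = [
    {"time": datetime(2019, 3, 1, 10, 0), "amount": 20.0},
    {"time": datetime(2019, 3, 9, 18, 0), "amount": 40.0},
]
current = {"time": datetime(2019, 3, 10, 9, 0), "amount": 300.0}
context = aggregate_features(history, current)
```

For a new client with no history, every value falls back to zero, which reflects the sparse-history difficulty described in the engineer's quote.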



Technology for Credit Card Fraud Detection Software


Classification models:

  • Decision Trees
  • Isolation trees
  • Random Forest
  • Bootstrap


Anomaly detection:

  • PCA
  • Mahalanobis distance
  • Local Outlier Factor




After this solution was implemented, the whole E-Commerce platform received tangible benefits. Within half a year of going to production, we can highlight the following areas:


  • Cost Reduction in resolving fraud cases: communication with Customer Support, filling in refund forms, refund verification by a manager, etc.
  • Customer Satisfaction because of simplified authentication for transactions with a low probability of fraud (most transactions have a low fraud probability).


What we achieved with the solution: 140+ thousand transactions analyzed, with 6% fraudulent data points a year. Fewer customers reported fraudulent transactions. Our client’s online card transaction platform became a safer service and gained more loyalty from its customers. We continued to support the project after release because it is very important to retrain the Fraud Detection model continuously whenever new data arrives, so that new fraud schemes and patterns can be learned and detected as early as possible.