How to Use AI and Machine Learning in Fraud Detection

The things people used to buy in shops years ago are now purchased online, whether furniture, food, or clothes. As a result, the global E-Commerce market is growing rapidly and is estimated to reach $4.9 trillion by 2021. This growth inevitably pushes criminals to look for paths to victims’ wallets through the Web.

Federal, state, and local law enforcement agencies, along with private organizations, reported 3 million cases of identity theft in 2019. Money was lost in about 25% of these cases. According to the IC3 (Internet Crime Complaint Center), financial losses caused by fraud in 2019 were at their highest ever, and the IC3 processed almost 500,000 complaints. The IC3 also reported that business and personal losses in 2019 exceeded $3.5 billion, up from about $2.7 billion in 2018.

Fraud Detection with Machine Learning

“Artificial intelligence would be the ultimate version of Google. The ultimate search engine that would understand everything on the web. It would understand exactly what you wanted, and it would give you the right thing. We’re nowhere near doing that now. However, we can get incrementally closer to that, and that is basically what we work on.” — Larry Page, the co-founder and developer of Google.

Fraud Detection Process with Machine Learning

Fraud detection with Machine Learning is possible because ML algorithms can learn from historical fraud patterns and recognize them in future transactions. ML algorithms process information far faster than humans, and they can pick up subtle fraud traits that a human analyst would likely miss. Compared with rule-based systems, they offer three main advantages:

Speed. Rule-based fraud prevention systems rely on explicitly written rules that “tell” the system which types of operations look normal and should be permitted, and which look suspicious and should be blocked. Writing such rules takes a lot of time, and the E-Commerce world is so dynamic that fraud patterns can change significantly within a few days. Machine Learning fraud detection methods come in handy here because they can learn new patterns from data.

Scale. ML methods perform better as the dataset they are fitted to grows: the more samples of fraudulent operations they are trained on, the better they recognize fraud. This does not apply to rule-based systems, because they never evolve on their own. A data science team should still be aware of the risks of fast model scaling: if the model misses a fraud case and it is recorded as legitimate, that mislabeled example can teach the model to miss similar cases, leading to more false negatives in the future.

Efficiency. Machines can take over the routine, repetitive work of manual fraud analysis, leaving specialists free to spend their time on higher-level decisions.
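To make the contrast with rule-based systems concrete, here is a minimal sketch of what a hand-written rule set can look like. The field names (`amount`, `country`, `home_country`, `attempts_last_hour`) and both thresholds are hypothetical, chosen only for illustration; they do not come from any real system.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float            # purchase amount
    country: str             # country the order was placed from
    home_country: str        # cardholder's usual country
    attempts_last_hour: int  # payment attempts made in the last hour


def rule_based_flags(tx: Transaction) -> list:
    """Return the names of the hand-written rules that this transaction trips."""
    flags = []
    if tx.amount > 1000:               # hypothetical threshold
        flags.append("large_amount")
    if tx.country != tx.home_country:
        flags.append("foreign_country")
    if tx.attempts_last_hour > 3:      # hypothetical threshold
        flags.append("too_many_attempts")
    return flags


tx = Transaction(amount=1500.0, country="FR", home_country="US", attempts_last_hour=1)
print(rule_based_flags(tx))  # ['large_amount', 'foreign_country']
```

Every new fraud pattern means another `if` written and maintained by hand, which is exactly the cost that a learned model is meant to reduce.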

Types of Internet Fraud and How to Prevent Them

Types of Internet Fraud:

Email phishing

Email phishing is a kind of cybercrime in which criminals send users fake messages and links to fake sites, then exploit the data the users share as a result. It has become a popular and fast way to steal confidential data. A person who is not aware of the possible consequences may enter sensitive data into a fake login window and put themselves at serious risk of being compromised. The easiest way to avoid this is to ignore messages coming from suspicious sources, although in some cases it is hard to distinguish a real sender from a fraudster, because some messages look quite legitimate.

Traditional phishing detection relies on filters of two kinds: authentication protection and network-level protection. Network-level protection comes in three types of filters: whitelist, blacklist, and pattern matching. These work by allowing or banning IP addresses and domains at the network level. Authentication protection includes email verification, that is, client-level verification that requires both the sender and the receiver to complete a confirmation step.
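The three network-level filters can be sketched in a few lines. This is only an illustration: the domain lists, the two regular expressions, and the `filter_email` function are hypothetical, and a real filter would work on far richer data than a sender address and a message body.

```python
import re

# Hypothetical lists; real deployments maintain these from threat feeds.
WHITELISTED_DOMAINS = {"bank.example"}
BLACKLISTED_DOMAINS = {"phish.example"}

# Hypothetical patterns: an urgent call to action and a link to a raw IP address.
SUSPICIOUS_PATTERNS = [
    re.compile(r"verify your account", re.IGNORECASE),
    re.compile(r"https?://\d{1,3}(\.\d{1,3}){3}"),
]


def filter_email(sender: str, body: str) -> str:
    """Apply whitelist, blacklist, then pattern matching, in that order."""
    domain = sender.rsplit("@", 1)[-1].lower()
    if domain in WHITELISTED_DOMAINS:
        return "allow"
    if domain in BLACKLISTED_DOMAINS:
        return "block"
    if any(pattern.search(body) for pattern in SUSPICIOUS_PATTERNS):
        return "quarantine"
    return "allow"


print(filter_email("support@phish.example", "Hello"))  # block
print(filter_email("news@shop.example", "Please verify your account"))  # quarantine
```

The weakness is visible in the code itself: a sender from a domain that is on neither list, using wording that matches no pattern, passes straight through.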

Apart from traditional methods, which are fading into the past, there are automated methods for Phishing Detection with Machine Learning. These methods are based on classical Machine Learning algorithms for classification and regression.

Payment Fraud (credit card and bank loan scams)

Payment fraud is the most common type of fraud tackled by Artificial Intelligence (AI). Its variations are as diverse as fraudsters’ imaginations, but the most common types involve lost cards, stolen cards, counterfeit cards, card ID theft, and card non-receipt. The advent of cards with a chip (EMV cards) helped reduce card-present fraud in Europe, but less so in the United States, where magnetic stripe cards are being phased out rather slowly.

Card-not-present fraud occurs in a variety of forms. Once a criminal has collected enough card details, for example through phishing, by contacting the victim’s mobile provider, or by breaking into an online account, the fraudster orders goods or takes out loans. Warning signs of a loan scam include the following: someone contacts you to offer a loan on unrealistically good terms, the lender does not provide a check that confirms the loan, the lender asks for bank details or an upfront payment, or the company claims to be from a certain country but calls from an international number.

Fraudsters can also make illegitimate charges through application fraud, meaning that they apply for a card in your name using stolen information. After obtaining confidential information online, they can call the credit card company, pretend to be the cardholder, and ask for a new credit card to be sent to a certain address. If your account was compromised, the address on file can be changed as well.

Identity theft

Information such as a victim’s name, bank details, email address, passwords, and passport or identification details, along with any other information that can be used to gain access to accounts, is under great threat once a professional identity thief comes into the game. Identity theft is a serious form of cybercrime that puts both individuals and enterprises at risk.

There are three types of identity theft: real name theft, account takeover, and synthetic theft.

Real name theft: The collected information is used to register new bank, credit card, and/or mobile phone accounts in the victim’s name.

Account takeover: This happens when stolen information is used to gain access to an existing account. The fraudster may also alter the email address and other details linked to the account without the rightful owner being aware of the changes.

Synthetic theft: This means merging real information with fabricated details to create a new identity, which is then used to make illegal purchases and open counterfeit accounts.

Identity theft can hit you when you least expect it: you are informed that your bank account has a zero balance, although you know you had money there yesterday, or you get bills for services or products you never ordered.

Identity theft is dangerous because the consequences, which may take years to rectify, can take any of the following forms:

  • you are rejected for credit because you now have a bad credit history;
  • you receive bills for services you never used;
  • your bank accounts are closed; and/or
  • you are held responsible for debt that you did not incur.

Unlike robbery, identity theft can go completely unnoticed until the victim suffers a dramatic loss. The “red flags” that you have become a victim are transactions you do not recognize or growing debt on a credit card from a source unknown to you. Statements about the money spent can be sent to another address of the criminal’s choice, so you will not learn about the situation right away.

Machine Learning for identity theft detection helps examine identity documents and check them against secure databases in real time, so that more fraud cases are caught. Documents valuable to identity thieves include passports, PAN cards, and driver’s licenses. To strengthen the security provided by Machine Learning, additional verification such as face recognition or other biometric checks can be required. These methods require a real individual to authenticate the operation and significantly lower the chance of successful fraud.

ID document forgery

Before, a person could only buy a fake ID for a lot of money on the black market. Now, with the boom of E-Commerce, various websites offer forging services for anywhere from $100 to $3,000 per document. The lower the price of a fake document, the poorer its quality. Expensive IDs are fabricated so masterfully that it becomes nearly impossible to verify their legitimacy and prevent fraud.

The fact that fake IDs have become easier to get puts at risk the many services that rely on automated identity verification, along with their owners and users. A person who has fooled the system and been verified can then use the service for their own ends and get away with it, leaving other users deceived and distressed.

There are ways Machine Learning can prevent fake ID-related fraud. Suppose a criminal uploads a forged document to prove his identity on an apartment rental site. If the site’s verification system uses Machine Learning, the photo is scanned by a pre-trained Neural Network. The fraud detection system looks for patterns it has seen in numerous fake documents before, classifies the document as fake or suspicious, and, if needed, requests additional verification.

Fake account identification

Identity verification problems also apply to social media accounts. Verifying such accounts includes checking the account registration details, the network used for access, and finally the IP and MAC addresses of the device, in order to spot one device creating several accounts with the same identity (for example, the same photo).

Fake account detection depends on the rate of engagement and on artificial activity. Fake accounts usually have a large number of friends or followers, while their profiles show hardly any sign of interaction from other users. At the same time, the number of likes, comments, and friend requests sent from a fake account is usually noticeably higher than the average for real users.

These factors relate to users of social media sites such as Twitter, Facebook, or Instagram, but it is also possible to identify users who register many inactive accounts on retail or other sites. Such users can be identified by features such as the date of registration, the amount of time spent on the site, and the IP and MAC address of the user’s device.

Credit Card Fraud Detection with Machine Learning

Credit Card Fraud Detection Statistics

Credit card and payment information stored online gives fraudsters the opportunity to use it illegally for their own advantage. The IC3 report for 2019 shows that victims lost almost $112 million to credit card fraud.

Credit card fraud is the most common type of payment fraud, because digitally stored details give the criminal a much higher chance of getting away with it, and online transactions are harder to verify.

The table below lists common types of credit card fraud activity:

Types of Credit Card Fraud Activity
#  Credit card fraud type
1  Stolen credit card
2  Formjacking
3  Account takeover
4  Intercepting mailed cards: cards taken from your mailbox
5  Fraudulent credit applications: using your information to apply for new credit in your name (identity theft)

Stolen Credit Cards

Online purchases are often the first step for someone who has stolen a credit card, as the criminal simply has to enter the card’s information into the required fields, and not all shops require additional verification. A criminal can also sell the credit card information to other criminals for as little as $45. The income from such trade can be especially large when you consider data breaches that expose millions of registered accounts, such as the Marriott data breach disclosed in 2018.

Formjacking

A cybercriminal who understands how a website’s security works can inject malicious JavaScript code into the site’s pages to intercept the card details a user enters while making an online purchase. This works mostly on poorly built sites with code vulnerabilities.

[Figure: Formjacking example]

Account Takeover

Account takeover happens when a criminal manages to access a victim’s account through phishing, malware, a data breach, or other methods. A criminal can take over a user’s bank account, or a social media page in order to ask the victim’s friends and family for money. Here are some other ways in which a criminal can take over an account:

  • If a criminal learns a user’s login and password for one account, they can try the same or a similar password on that user’s accounts on other websites.
  • Other cases of account takeover happen through phone scams, where the criminal pretends to be an official representative of a service that the victim uses.
  • If a criminal has access to your mobile phone or email account, they can gain access to valuable accounts through the “account recovery” option, because most services allow changes to be verified via phone message or email.

Intercepting Mailed Cards

Once a fraudster has gained access to your physical mailbox, he or she can take mailed cards or find letters with valuable information about the bank accounts and credit cards you use.

Fraudulent Credit Applications

Criminals can take out credit in your name if they obtain enough information. After a while, you will receive upsetting messages saying that you owe a debt you never incurred.

Widespread Fraud Scenarios

By knowing the principles of how a fraudster operates online, Machine Learning engineers can develop efficient techniques to detect fraudulent activity.

Here are five widespread fraud scenarios:

Advanced Privacy Software

Experienced fraudsters use special software that hides information about the user, such as the user’s location, from the websites they visit. Software like Anti-Detect and Kameleo is used to create several instances of virtual machines in browser windows.

Location Spoofing

By simulating the location typical for the card owner, the fraudster can get around the rule-based security system built into a site. He can determine the necessary location from the compromised card details.

Phone Number Spoofing

If a fraudster has obtained the victim’s card details, he can also buy the victim’s phone number online. Since he does not physically have the victim’s phone, he then calls the victim’s phone provider and asks for all purchase notifications for the card to be diverted to a new phone number.

Copying a Buyer’s Behavior

Back when criminals were less sophisticated in the “art” of credit card fraud, they used to charge large amounts of money and buy expensive goods immediately after compromising a card, which made the fraud easy to spot. Nowadays, they tend to simulate a “real customer’s” behavior, making smaller purchases before a big one or pretending to think a purchase over by adding and removing items from the online shopping basket.

Enhanced Customer Information

To appear more convincing while trying to compromise a user’s credit card, fraudsters buy and sell device IDs and driver’s licenses on the Dark Web. This allows criminals to combine valuable information about a certain person and build a new account based on the fake IDs.

Machine Learning Fraud Detection Models

Email Phishing Detection Models

Phishing emails are spam messages sent with fraudulent intent. Phishers make fake websites and their URLs highly similar, both visually and semantically, to the originals. They mostly threaten the Banking sector, multinational companies, and even medical establishments.

Logistic Regression is one of the classic Machine Learning algorithms for phishing detection. It applies a sigmoid function to a linear combination of the input features to predict a probability between 0 and 1, which is then thresholded to label a message as phishing or not.
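As an illustration of the idea, here is a minimal from-scratch sketch of Logistic Regression trained with batch gradient descent. The three features, the toy dataset, the learning rate, and the number of epochs are all made up for the example; a production system would use a tested library and far more data.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x) -> float:
    """Probability that sample x is phishing under weights w and bias b."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


# Hypothetical toy features: [URL has IP address, URL has "@", URL length scaled to 0..1]
X = [[1, 1, 0.9], [1, 0, 0.7], [0, 1, 0.8], [1, 1, 0.6],
     [0, 0, 0.1], [0, 0, 0.2], [0, 0, 0.3], [0, 1, 0.1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = phishing, 0 = legitimate

w, b = train_logistic_regression(X, y)
is_phishing = predict_proba(w, b, [1, 1, 0.9]) > 0.5  # threshold at 0.5
print(is_phishing)
```

The 0.5 threshold is a design choice: lowering it catches more phishing at the cost of flagging more legitimate messages.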

Another way is to extract features from a website and classify it as fake or not with traditional Machine Learning classification models such as SVM, Naive Bayes, and Extreme Learning Machine. The first stage before classification includes NLP to process the text from a website and provide a semantic analysis of the text.

Generally, phishing detection is tackled as a supervised Machine Learning problem: to train the model, we collect a number of phishing emails with fake URLs and an equal number of legitimate emails and websites from the original sources. Features that contribute to classifying an email as “phishing” include the use of the “at” symbol (@) in the URL, which makes the browser ignore everything that precedes it, and a favicon that is loaded from a domain other than the one shown in the address bar. The registration length of the domain is another signal, because trustworthy resources tend to register their domains for a long time, unlike phishing websites.

Indicators for phishing fraud detection with Machine Learning are shown in the table below:

Indicators for phishing fraud detection with Machine Learning
#   Feature                     Significance
1   Having IP address           High
2   URL length                  Medium
3   Having “at” symbol          Low
4   Double slash redirecting    Medium
5   Having subdomain            Medium
6   Domain registration length  Medium
7   Favicon                     Medium
8   HTTPS token                 High
9   Age of domain               High
10  Links pointing to page      High
11  Web traffic                 High
12  Page rank                   High
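Several of the indicators in the table can be computed from the URL string alone. The sketch below shows one possible way to do it; the function name, the dictionary keys, and the exact definition of each feature are assumptions made for illustration, and indicators such as domain age, web traffic, or page rank are left out because they need external lookups.

```python
import re
from urllib.parse import urlparse

IP_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_url_features(url: str) -> dict:
    """Compute a few URL-only phishing indicators (hypothetical definitions)."""
    host = urlparse(url).hostname or ""
    is_ip = bool(IP_HOST.match(host))
    after_scheme = url.split("://", 1)[-1]
    return {
        "having_ip_address": is_ip,
        "url_length": len(url),
        "having_at_symbol": "@" in url,
        # A second "//" after the scheme can hide a redirect.
        "double_slash_redirecting": "//" in after_scheme,
        # Crude count: dots in the host minus one; ignores multi-part TLDs.
        "subdomain_count": 0 if is_ip else max(host.count(".") - 1, 0),
        # "https" appearing inside the host name itself, not as the scheme.
        "https_token_in_host": "https" in host,
    }


features = extract_url_features("http://paypal.example@203.0.113.5//login")
print(features["having_ip_address"], features["having_at_symbol"])  # True True
```

Note how the URL in the usage line looks like it points to `paypal.example`, while the real host is the IP address after the “at” symbol.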

Identity Theft Detection Models

To prevent identity theft, pattern identification can significantly improve the accuracy of fraud detection. For example, an individual’s behavior patterns can be stored in a database, so that the patterns previously recorded for a user are constantly compared with the current activity in the account. If this activity differs greatly from the norm, fraud can be suspected. Each new transaction feeds the behavioral analytics done by the model, helping it to train better.
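The comparison of new activity against a stored behavior profile can be sketched with a simple z-score. This is a deliberately simplified stand-in for a real behavioral model: it looks at a single feature (purchase amount), and the history values and the threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, stdev


def deviation_score(history, new_value) -> float:
    """Number of standard deviations between new_value and the user's past mean.

    Needs at least two past values, because the sample standard deviation is used.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return 0.0 if new_value == mu else float("inf")
    return abs(new_value - mu) / sigma


def is_suspicious(history, new_value, threshold=3.0) -> bool:
    # Hypothetical threshold: flag anything more than 3 standard deviations away.
    return deviation_score(history, new_value) > threshold


past_amounts = [20.0, 35.0, 25.0, 30.0, 40.0]  # hypothetical purchase history
print(is_suspicious(past_amounts, 32.0))   # False
print(is_suspicious(past_amounts, 900.0))  # True
```

After a transaction is confirmed as genuine, appending it to `past_amounts` updates the profile, which mirrors the way each new transaction feeds back into the analysis.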

Identity theft detection is considered an anomaly detection challenge, so unsupervised Machine Learning algorithms such as LOF (Local Outlier Factor), PCA, one-class SVM, and Isolation Forest are used to find abnormal patterns in a user’s behavior and detect unauthorized actions. They work as a litmus test for anomalies against a background of normal behavior: data points that fall far from the dense clusters of normal behavior are flagged as abnormal.
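To show what such an algorithm computes, here is a compact from-scratch sketch of LOF. It is a simplified version: each point gets exactly k neighbours with ties broken by position, duplicate points are assumed absent, and the two-dimensional toy data is invented. A score close to 1 means the point is about as dense as its neighbours; a score well above 1 marks an outlier.

```python
import math


def local_outlier_factor(points, k=2):
    """Return one LOF score per point (simplified: exactly k neighbours each)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours, k_distance = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])  # distance to the k-th neighbour

    def local_reachability_density(i):
        # Reachability distance to o is max(k-distance of o, actual distance).
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(n)]
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical behavior vectors: five similar sessions and one very different one.
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 1.3), (5.0, 5.0)]
scores = local_outlier_factor(points, k=2)
print(scores.index(max(scores)))  # 5
```

No labels are needed here, which is why this family of methods suits cases where confirmed fraud examples are scarce.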

[Figure: Unsupervised Machine Learning for Anomaly Detection]

Credit Card Fraud Detection Models

Credit card fraud detection can be tackled with both supervised and unsupervised Machine Learning algorithms. In the first case, traditional classification algorithms are used; in the second, anomaly detection techniques. Neural networks are also effective, but they require a great deal of training data, ideally with a comparable number of data points for the two classes: abnormal and normal. In fraud detection, however, balanced datasets are rare, because fraudulent transactions are far less frequent than legitimate ones.
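One common way to cope with the imbalance is to weight each class inversely to its frequency during training, so that the few fraud examples count for more. The sketch below computes such weights with the widely used formula n_samples / (n_classes * class_count); the label counts are invented, and class weighting is only one option alongside techniques such as resampling.

```python
from collections import Counter


def balanced_class_weights(labels) -> dict:
    """Weight each class by n_samples / (n_classes * count of that class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical dataset: 98 legitimate transactions (0) and 2 fraudulent ones (1).
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)
print(weights[1])  # 25.0
```

With these weights, an error on a fraudulent transaction costs the model about 49 times more than an error on a legitimate one, which offsets the 98-to-2 imbalance.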

ID Document Forgery Detection Models

ID document forgery detection deals, in the first place, with image processing. Certain techniques are used to make sense of the visual information that an image carries. Convolutional Neural Network (CNN) models are usually trained for this task by minimizing a loss function on labeled examples. A CNN loosely imitates the work of the human visual cortex, the part of the brain that processes visual information. As in any supervised learning task, training requires a collected set of forged and real document images, with a sufficient number of photos from both classes.

Configuring the neural network to perform at its best involves testing different architectures with different numbers of layers and different filter sizes in the convolutional layers. A typical architecture here has four convolutional layers. This method has shown an accuracy of about 98% in detecting ink mismatch in forged documents with blue ink and 88% with black ink.

This forgery detection technique relies on hyperspectral image analysis (HSI). The method builds a map of the electromagnetic spectrum, obtaining the spectrum for each pixel in the image.

Another approach is transfer learning with models pre-trained on the ImageNet dataset, such as VGG16, VGG19, or ResNet50.

Fake Account Identification Models

Fake account identification is a classification problem, so we start by selecting the profile that needs to be classified as fake or real. The most important part of classification is feature selection: we rely on parameters such as rate of engagement, activity, the number of followers compared to the number of people the account is following, and the relevancy of comments. After the feature matrix is built, it is fed into the classification model, which may be any efficient binary classifier such as Naive Bayes, SVM, Decision Trees, or Logistic Regression. The classifier can be continually retrained with new data on fake and real accounts, which helps increase the accuracy of its predictions.
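Building the feature matrix can be sketched as follows. The `Profile` fields and the exact definitions of the two ratios are assumptions made for this example; they are one plausible reading of “rate of engagement” and “followers compared to following”, not the definitions used in our experiments.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    followers: int
    following: int
    posts: int
    likes_received: int
    comments_received: int


def profile_features(p: Profile) -> list:
    """Turn one profile into a numeric feature vector (hypothetical definitions)."""
    follower_ratio = p.followers / max(p.following, 1)
    interactions = p.likes_received + p.comments_received
    # Interactions per post per follower; max() guards against division by zero.
    engagement_rate = interactions / max(p.followers * max(p.posts, 1), 1)
    return [follower_ratio, engagement_rate, float(p.posts)]


profiles = [
    Profile(followers=5000, following=50, posts=2, likes_received=10, comments_received=0),
    Profile(followers=300, following=280, posts=120, likes_received=3600, comments_received=400),
]
feature_matrix = [profile_features(p) for p in profiles]
print(feature_matrix[0])  # [100.0, 0.001, 2.0]
```

The first profile shows the pattern described above: many followers, almost no interaction. Each row of `feature_matrix` is what the binary classifier receives as input.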

In the approach we used, Logistic Regression and Random Forest showed some of the highest accuracies, around 90% and 92% respectively. There’s still a lot to research and test about the problem of fake account identification. The main constraint is privacy laws, which interfere with efficient data collection.

Fraud Detection in Banking and E-Commerce

E-Commerce

E-Commerce businesses are the most vulnerable to online payment fraud, because no physical card is needed to make a transaction online. From small niche retailers to large providers, their websites are under threat of formjacking or a data breach. Even huge businesses such as British Airways, Newegg, and Ticketmaster are exposed to attacks on a daily basis.

Seventy percent of card fraud in Europe is card-not-present fraud. The number of fraud cases linked to E-Commerce is rising as the market grows; it is predicted to reach around $4.5 trillion in 2021.

An E-Commerce business can prevent fraud by constantly improving its internal network security, for example by setting up a more advanced system based on Machine Learning fraud detection. The main advantage of Machine Learning algorithms for fraud identification is a high detection rate in real time. The second is that Machine Learning models tend to spot fraudulent E-Commerce transactions faster without increasing the frequency at which genuine transactions are declined.
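Both qualities mentioned above can be measured. The detection rate is the share of fraudulent transactions that the model catches, and the false positive rate is the share of genuine transactions that it wrongly declines. The sketch below computes both from labels and predictions; the sample labels are invented for the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), where 1 = fraud and 0 = genuine."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def detection_rate(y_true, y_pred) -> float:
    """Share of fraudulent transactions that were flagged (also called recall)."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_positive_rate(y_true, y_pred) -> float:
    """Share of genuine transactions that were wrongly flagged."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical results: 3 fraud cases and 7 genuine transactions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(confusion_counts(y_true, y_pred))  # (2, 1, 6, 1)
```

Tracking the two numbers together matters, because a model can trivially raise its detection rate by declining more genuine customers.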

Banking

According to research, venture funding for fraud and cybersecurity AI-based web and mobile applications development will increase by 30% in 2020. Banks are usually interested in decreasing the amount of payment, loan, and customer onboarding fraud.

One example of how fraud detection software can work for banks is building risk profiles for bank customers and rating them based on granular data. A bank can either allocate its current software developers to work on such a tool or bring in outside data science professionals to build Machine Learning models that take widespread fraud schemes into account.

The Final Word

Because the modern world is flooded with card-not-present transactions online, the Banking and Retail sectors are under threat and face many fraud cases. Email phishing, payment fraud, identity theft, document forgery, and fake accounts contribute to the high level of criminal attacks on vulnerable users’ data and lead to data breaches. As old rule-based algorithms for fraud detection fade into the past, new methods based on Machine Learning are bringing greater value to businesses with their real-time work, speed, and efficiency.

Summary

Is it a “big deal”?

Innovations always take time and effort to implement before they bring the results the client expects. Fraud prevention techniques may demand changes in the way data is stored, organized, cleaned, and prepared for use, but the effort is definitely worth it. The first steps towards implementing ML techniques for fraud detection will be difficult, but their value will grow from year to year and show up as fewer user complaints and greater customer loyalty.

Why Machine Learning? How is it different from old-school methods like rule-based detection?

Unlike old-school rule-based methods, Machine Learning algorithms process raw data, such as emails or text, and learn from what they take as input, becoming smarter along the way. Rule-based methods, on the other hand, cannot detect new patterns in the data, as they only follow a pre-established scenario that does not cover even slightly changed patterns of fraudulent activity.

What types of fraudulent scenarios can we detect using ML?

Machine Learning can capture phishing emails, unauthorized transactions, and previously unseen suspicious behavior in a user’s account. These are only a few examples; in reality, ML algorithms can be adjusted to most online fraud scenarios.

Is it suitable for my industry?

Machine Learning has a broad range of uses in industries from Banking and E-Commerce to the military and healthcare. With a sufficient amount of historical data to learn from, you can find a way to implement ML methods efficiently in your industry or business domain, and for dealing with fraud in particular.

What are the Machine Learning methods to efficiently detect fraud?

Machine Learning methods for fraud detection can be divided into supervised and unsupervised models. Supervised methods demand a large amount of labeled data, ideally with fraudulent and non-fraudulent transactions in comparable amounts, to train the model. Unsupervised methods search for patterns and correlations in the raw data, and the prediction is built without additional labeling. Supervised ML methods are considered more accurate, while unsupervised methods take much less time to prepare.
