Anomaly Detection with Machine Learning Algorithms

To understand the scale of the problems that anomaly detection is meant to solve, just take a look at the statistics. According to the American Bankers Association (ABA), by 2016, banking industry losses from fraudulent activity had reached as much as $2.2 billion. In addition, the total value of fraudulent transactions made with cards issued within SEPA and acquired worldwide amounted to €1.8 billion in 2016. The same year, Yahoo disclosed a 2013 breach that was later found to have affected all 3 billion of its accounts, one of the biggest breaches of all time.

Now, let’s talk about what anomaly detection is as a concept. An anomaly is an event that occurs unexpectedly in the regular flow of things. Anomaly detection with Machine Learning, then, is the process of identifying unusual patterns, events, or observations that differ enough from the rest of the data to be suspicious.

Apart from fraud prevention, anomaly detection is applicable in a variety of domains, including medicine, manufacturing, and traffic systems. In medicine, for example, anomaly detection is used to detect damaged or malignant cells. In manufacturing, it can help identify structural defects, root causes of equipment malfunctions, and so on.

Condition monitoring and predictive maintenance

Every machine or device has an expected lifespan and certain health indicators. By aggregating the parameters of many similar devices, one can forecast when a machine is likely to break down, or when its health indicators are likely to degrade enough to impair its performance. To prevent an unexpected shutdown or failure, Machine Learning experts use predictive maintenance, a technique that relies on anomaly detection as one of its tools.
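To make the idea of tracking a health indicator more concrete, here is a minimal sketch that flags sensor readings deviating strongly from the recent past. It is only an illustration: the function name, the window size, the threshold, and the vibration values are all assumptions, and real condition monitoring usually relies on richer models.

```python
from collections import deque
from statistics import mean, stdev


def rolling_zscore_anomalies(readings, window=20, threshold=3.0):
    """Return indices of readings that deviate strongly from the recent past.

    Each reading is compared with the mean and standard deviation of the
    previous `window` readings. Both parameters are illustrative defaults.
    """
    history = deque(maxlen=window)
    anomalies = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Flag the reading if it lies more than `threshold` standard
            # deviations away from the recent average.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Hypothetical vibration readings: a stable signal with one sudden spike.
vibration = [1.0 + 0.01 * (i % 5) for i in range(50)]
vibration[40] = 2.5

print(rolling_zscore_anomalies(vibration))  # prints [40]
```

Comparing each reading only with a recent window lets the baseline follow slow drift in the signal, while a sudden jump still stands out.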

With the advent of Industry 4.0, a new way of ensuring the availability of machines came into being. Around 82% of companies experience unplanned downtime, which is costly (around $260,000 per hour), so it is clear that this problem must be tackled. In addition, around 64% of unplanned downtime is linked to equipment failures caused by improper maintenance and a lack of condition tracking.

The spectrum of anomaly detection use cases related to condition monitoring and predictive maintenance is quite broad:

Automotive industry
In this industry, tracking the condition of welding machines, spindles in milling machines, laser drilling machines, and some other equipment is essential. Moreover, Machine Learning solutions combined with IoT for the automotive industry help identify cracking, lubrication problems, misalignment of assembled parts, and other defects in real time.

Steel industry
Here, condition monitoring is used to track the state of cold rolling mills, which is especially important for the quality of steel. Timely detection of rolling mill defects makes it possible to take corrective action and minimize the negative impact.

Oil and Gas
Oil and Gas is another industry that benefits from predictive maintenance. Here, it is used to monitor offshore drilling in real time, again in combination with IoT. A potentially critical state of the equipment can be identified once the data it produces has been processed remotely.

Hacker attacks and fraud detection

The most popular application area for anomaly detection is fraudulent activity related to the Internet or banking.

Since 2015, bank card protection against fraud has improved with the arrival of chip card technology, which requires a PIN code every time a transaction is requested. Still, online credit card fraud is predicted to reach around $32 billion by 2020.

Since 2016, there have been numerous cyberattacks threatening Internet businesses and commercial websites. Even large corporations such as Yahoo and Uber have suffered from online breaches. Around 3 billion Yahoo accounts were hacked. Information on more than 57 million passengers and drivers was stolen from Uber. Globally, the WannaCry ransomware infected more than 350,000 machines in around 150 countries and caused about $4 billion in damages.

When it comes to detecting credit card fraud or cybersecurity breaches, Machine Learning experts can build models that classify transactions as legitimate or fraudulent based on the transaction details, e.g., merchant, amount, location, and time.

Our fraud detection algorithm for E-commerce transactions

When detecting anomalies with the help of Machine Learning, we can take one of two approaches: supervised or unsupervised.
The supervised approach works with data that has been labeled beforehand. For example, if there is a set of normal and anomalous logs that haven’t been marked as such, someone has to manually attach a “normal” or “anomalous” label to each of them so that the algorithm can learn to distinguish between them. The unsupervised approach does not require any labeling: the algorithm itself infers which data is suspicious. For example, most internet connections are normal and only a small number of them are fraudulent, so the rarer types of connections are treated as anomalous.
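The unsupervised intuition, that rare types of events are the suspicious ones, can be sketched in a few lines. This is a deliberately simplified illustration rather than a production detector: the function name, the 1% cut-off, and the connection log are all hypothetical.

```python
from collections import Counter


def rare_values(values, max_share=0.01):
    """Return the values whose share of all observations is below `max_share`.

    No labels are needed: a value is treated as anomalous only because
    it occurs rarely. The 1% default is an illustrative assumption.
    """
    counts = Counter(values)
    total = len(values)
    return {value for value, count in counts.items() if count / total < max_share}


# Hypothetical connection log, summarized as "protocol:port" strings.
connections = (
    ["https:443"] * 950
    + ["http:80"] * 45
    + ["telnet:23"] * 3
    + ["ftp:21"] * 2
)

# The two rare connection types are flagged; the common ones are not.
print(rare_values(connections))
```

A supervised model would instead need each of these connections to carry a "normal" or "anomalous" label before training.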

We have used anomaly detection to identify fraudulent transactions in a project for a U.S. financial services company. Our client offers various products and services that can be paid for using Mobile Money (Airtel Money, MTN Mobile Money), a bank card (Visa, Mastercard), a wallet, or an on-credit payment option (Pay Later). The problem was that a very small percentage of the transactions passing through the company were fraudulent. As input, we had data on 150,000 transactions that had occurred over several months.

Typically, a Machine Learning project includes three stages:

Pre-processing (data collection and preparation),
Processing (training the model) and
Model fine-tuning/retraining.

When the project was set up, we encountered the problem of an imbalanced dataset. This means that there was a significant difference between the two classes of the observations the dataset contained.

Imbalanced data can be handled in a number of ways. The three most popular methods are over-sampling, under-sampling, and SMOTE. After trying them out, we concluded that SMOTE worked best for the task at hand.

Development process

On average, only around 0.1% of transactions are fraudulent, i.e., roughly 1 in 1,000, which made the training data extremely imbalanced. We approached this problem with under-sampling (randomly removing normal transactions to bring their number closer to that of fraudulent ones), over-sampling (duplicating fraudulent samples until their number balances the normal ones), and synthetic sampling, or SMOTE (automatically generating synthetic data samples on the basis of the existing ones). The latter method turned out to be the most effective: it increased the accuracy of our algorithm by 5%, bringing it to 85%.
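The three balancing strategies can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the code used in the project: the function names and the two-feature data are hypothetical, and `smote_like` only mimics the core interpolation idea behind SMOTE.

```python
import random


def undersample(majority, minority, rng):
    """Randomly drop majority samples until both classes are the same size."""
    return rng.sample(majority, len(minority)), list(minority)


def oversample(majority, minority, rng):
    """Duplicate randomly chosen minority samples until the classes match."""
    extra = rng.choices(minority, k=len(majority) - len(minority))
    return list(majority), list(minority) + extra


def smote_like(majority, minority, rng, k=3):
    """Create synthetic minority points by interpolating between neighbours.

    For each new point: pick a minority sample, pick one of its k nearest
    minority neighbours, and take a random point on the segment between them.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(len(majority) - len(minority)):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return list(majority), list(minority) + synthetic


rng = random.Random(0)
# Hypothetical two-feature data: 1,000 normal points and 10 fraudulent ones.
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
fraud = [(rng.gauss(4, 0.5), rng.gauss(4, 0.5)) for _ in range(10)]

for name, method in [("under", undersample), ("over", oversample), ("smote", smote_like)]:
    maj, mino = method(normal, fraud, rng)
    print(name, len(maj), len(mino))
```

Under-sampling leaves 10 samples per class, while the other two produce 1,000 per class. Whichever method is used, resampling should be applied to the training split only, so that the model is evaluated on data with the real class ratio.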

After balancing the data in the data preparation step, the next step was to try different classification approaches. We used supervised techniques, namely Logistic Regression, KNN, SVM, and a Decision Tree Classifier, to classify each transaction as either fraudulent or normal.
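A comparison of these four classifiers might look like the sketch below, which uses scikit-learn. It is an illustration only: the data is synthetic (generated with `make_classification` as a stand-in for labeled transactions), and the hyperparameters are assumptions rather than the settings used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for labeled transactions: about 5% are class 1 ("fraud").
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)
# Stratify so that both splits keep the same share of fraudulent samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "fraud_recall": recall_score(y_test, y_pred),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.3f}, "
          f"fraud recall={results[name]['fraud_recall']:.3f}")
```

The sketch reports recall on the fraudulent class next to accuracy because, on imbalanced data, a model can score high accuracy while missing most of the fraud.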

Among the unsupervised learning algorithms, we used One-Class SVM, Isolation Forest, Fitting and Local Outlier Factor. We used these algorithms alongside the supervised ones to separate the transactions into two classes without relying on labels.
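For three of these detectors, scikit-learn provides ready-made implementations, and a minimal sketch of running them on unlabeled data could look like this. The data is synthetic, and the `nu`, `contamination`, and `n_neighbors` values are illustrative assumptions, not the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical unlabeled data: 500 typical points plus 10 far-away outliers.
typical = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
outliers = rng.uniform(low=6.0, high=8.0, size=(10, 2))
X = np.vstack([typical, outliers])

detectors = {
    "One-Class SVM": OneClassSVM(nu=0.02, gamma="scale"),
    "Isolation Forest": IsolationForest(contamination=0.02, random_state=0),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=0.02),
}

flagged = {}
for name, detector in detectors.items():
    # fit_predict returns -1 for points judged anomalous and 1 for normal ones.
    labels = detector.fit_predict(X)
    flagged[name] = np.flatnonzero(labels == -1)
    print(f"{name}: flagged {len(flagged[name])} of {len(X)} points")
```

None of the detectors sees a label. Each one is only told roughly what share of the data to expect as anomalous, which in practice has to be estimated from domain knowledge.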

Our project team also used a Neural Network approach that involved both supervised and unsupervised algorithms: LSTM and MLP (supervised), and Auto-Encoder (AE), Restricted Boltzmann Machine (RBM), and Generative Adversarial Networks (GAN) (unsupervised).

As a result, we deployed a model that helped our client automatically detect and block fraudulent transactions with an accuracy of 85%.

Conclusion

Anomaly detection with Machine Learning is widely used to solve problems such as cybersecurity breaches, online fraud detection and prevention, and predictive maintenance and condition monitoring in various industries, including Manufacturing, E-commerce, Banking, Retail, Oil and Gas, and Medicine.

Whether it is a matter of securing credit card transactions or eliminating problems in the operation of a device, the value of being able to detect anomalies in the regular flow of operations is hard to overestimate. This is especially true for predicting unexpected anomalies that can have a significant impact on the company’s income. Are you interested in Machine Learning development? Feel free to contact us!
