8 Unique Machine Learning Interview Questions about Anomaly Detection

Figure: Points in 3D space, with anomalous points shown in red.


Introduction


One of the biggest things holding machine learning models back today is the presence of errors and anomalies in the data they learn from. As the field has evolved, models have evolved with it: older models have either been optimized or replaced by newer ones. One of the biggest challenges in making models accurate is giving them the ability to detect anomalies.


What Does Anomaly Detection Mean?

Anomaly detection means being able to recognize outliers in a dataset. Outliers, or anomalies, are data points that do not follow the general trend of the rest of the dataset. In ML, anomaly detection refers to a model's ability to distinguish these data points from the rest, so that the model is neither trained on them nor misled by them in real-world scenarios.

Figure: Graphical example of anomaly detection on a line chart.

Why is Anomaly Detection Important?

Anomaly detection is significant because it helps identify and remove outliers from a dataset. ML models must be able to recognize outliers so that they are not trained on them, because outliers can skew a model's results. Decisions based on such a model could then rest on poor data analysis and, for example, hurt a company's sales or cause a robot to malfunction. Undetected anomalies can therefore be make-or-break for a business, or even cause physical harm.

Anomaly Detection ML Interview Questions/Answers

Now that we know what anomaly detection is and why it matters, let us look at some interview questions about it. Try to answer each question in your head before reading the answer.

What are the Differences Between Anomalies in Uniform and Normal Distributions for One-Dimensional Data?

In a uniform distribution, every value in the range is equally likely, so the mean and standard deviation merely characterize the range of values and do not single out any point as unusual. A possible indication of anomalous behavior is instead a small neighborhood that contains substantially fewer or more data points than a uniform distribution would predict.
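One way to make "fewer or more points than expected" concrete is to split the range into equal-width bins and compare each bin's count with what a uniform distribution predicts. This is a minimal sketch, not a standard recipe: the helper name `flag_uneven_bins`, the number of bins, and the cutoff of three binomial standard deviations are assumptions chosen for illustration.

```python
import math


def flag_uneven_bins(values, lo, hi, n_bins=10, threshold=3.0):
    """Return indices of bins whose counts deviate strongly from a uniform expectation.

    Hypothetical helper: under a uniform distribution on [lo, hi], each point
    lands in a given bin with probability p = 1 / n_bins, so a bin's count is
    approximately Binomial(n, p).
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp so that v == hi falls into the last bin instead of overflowing.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1

    n = len(values)
    p = 1.0 / n_bins
    expected = n * p
    std = math.sqrt(n * p * (1 - p))  # binomial standard deviation of one bin count
    return [i for i, c in enumerate(counts) if abs(c - expected) > threshold * std]


# 100 evenly spread points on [0, 1) plus a suspicious burst of 20 points at 0.55.
data = [(i + 0.5) / 100 for i in range(100)] + [0.55] * 20
print(flag_uneven_bins(data, 0.0, 1.0))  # [5]
```

Only the bin covering 0.5 to 0.6 is flagged, because it holds 30 points where about 12 were expected. The bin width is a judgment call: bins that are too wide can hide a small burst, and bins that are too narrow make every count noisy.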

Normally distributed data follows the empirical rule: roughly 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean. Hence, a threshold (such as 3 times the standard deviation) is chosen, and points farther than that distance from the mean are declared anomalous.
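The threshold rule above takes only a few lines. This sketch assumes the data is roughly normal and uses the population standard deviation from Python's `statistics` module; the function name `three_sigma_outliers` and the sample readings are made up for illustration.

```python
import statistics


def three_sigma_outliers(values, k=3.0):
    """Return values lying more than k population standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > k * std]


# Twenty readings clustered around 10, plus one suspicious reading of 25.
readings = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10] * 2 + [25]
print(three_sigma_outliers(readings))  # [25]
```

Because the mean and standard deviation are computed from data that includes the outliers, an extreme value also inflates σ, so in small samples an outlier can partly mask itself; median-based variants are a common workaround.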

What are Some Types of Anomalies? Give a use case for Anomaly Detection.

Anomalies are commonly grouped into the following types:

1) Point anomalies: also known as global anomalies, these are single data instances that lie far from the rest of the data.

Use case: detecting credit card fraud based on the amount spent.

2) Contextual anomalies: also known as conditional anomalies, these are values that are abnormal only in a specific context. They occur commonly in time-series data.

Use case: spending $100 on food every day may be normal during the holiday season but unusual at other times of the year.

3) Collective anomalies: a set of related data points that is anomalous with respect to the entire dataset, even though the individual points may not be anomalous on their own.

Use case: a sequence of actions that copies data from a remote machine to a local host in an unexpected way could be flagged as a potential cyber attack.
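A contextual anomaly can be sketched by scoring each value only against other values from the same context. The example below is a hypothetical illustration of the holiday-spending case: the data is invented, and it uses the modified z-score based on the median absolute deviation (MAD) with the commonly used cutoff of 3.5, a robust scoring rule swapped in here rather than anything prescribed above.

```python
import statistics
from collections import defaultdict


def contextual_outliers(records, threshold=3.5):
    """Flag (context, value) records that are unusual within their own context.

    Each value is scored with the modified z-score 0.6745 * (x - median) / MAD,
    where the median and MAD (median absolute deviation) are computed only
    from records that share the same context.
    """
    groups = defaultdict(list)
    for context, value in records:
        groups[context].append(value)

    # Per-context robust statistics: (median, MAD).
    stats = {}
    for context, values in groups.items():
        median = statistics.median(values)
        mad = statistics.median([abs(v - median) for v in values])
        stats[context] = (median, mad)

    flagged = []
    for context, value in records:
        median, mad = stats[context]
        if mad == 0:
            continue  # no spread in this context, so the score is undefined
        score = 0.6745 * (value - median) / mad
        if abs(score) > threshold:
            flagged.append((context, value))
    return flagged


# Daily food spending in dollars, tagged with a hypothetical "season" context.
spending = [
    ("holiday", 95), ("holiday", 100), ("holiday", 105), ("holiday", 98), ("holiday", 102),
    ("regular", 20), ("regular", 25), ("regular", 22), ("regular", 18),
    ("regular", 24), ("regular", 21), ("regular", 100),
]
print(contextual_outliers(spending))  # [('regular', 100)]

# Ignoring the context, the same $100 day blends in with the holiday spending.
print(contextual_outliers([("all", v) for _, v in spending]))  # []
```

Within the regular-season group the $100 day stands out sharply, but once the context is dropped it blends in with the holiday spending and nothing is flagged.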

Explain the Three Types of Outlier Detection.

The three types of outlier detection are:

1) Supervised: requires fully labeled training and test datasets. An ordinary classifier is trained first and then applied to new data.

2) Semi-supervised: uses both training and test datasets, where the training data consists only of normal data without any outliers. A model of the normal class is learned, and outliers are detected as points that deviate from that model.

3) Unsupervised: requires no labels and makes no distinction between training and test datasets. Data is scored solely on the intrinsic properties of the dataset.
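The semi-supervised setting boils down to a fit-on-normal, score-on-test workflow. In this minimal sketch the hypothetical class `NormalClassModel` uses the mean and standard deviation of clean training data as its "model of the normal class" and flags test points more than `k` standard deviations away; both the model choice and `k = 3` are assumptions for illustration, and real systems usually learn richer models.

```python
import statistics


class NormalClassModel:
    """Hypothetical semi-supervised detector that learns only the normal class.

    The "model" is just the mean and sample standard deviation of the clean
    training data; test points more than k standard deviations away are outliers.
    """

    def __init__(self, k=3.0):
        self.k = k
        self.mean = None
        self.std = None

    def fit(self, normal_train):
        # Training data is assumed to contain only normal points (no outliers).
        self.mean = statistics.mean(normal_train)
        self.std = statistics.stdev(normal_train)
        return self

    def predict(self, test):
        # True marks a test point that deviates from the learned normal model.
        return [abs(x - self.mean) > self.k * self.std for x in test]


train = [10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8, 10.1]
model = NormalClassModel().fit(train)
print(model.predict([10.3, 9.1, 25.0]))  # [False, False, True]
```

The key difference from the unsupervised setting is that the statistics come only from the clean training set, so outliers in the test data cannot distort the model they are judged against.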

Explain the Three Fundamental Approaches to Detect Anomalies.

The three approaches to detect anomalies are:

1) By Density – Normal points occur in dense regions, while anomalies occur in sparse regions.

2) By Distance – A normal point is close to its neighbors, while an anomaly is far from its neighbors.

3) By Isolation – The term isolation means ‘separating an instance from the rest of the instances.’ Because anomalies are ‘few and different,’ they are easier to isolate than normal points.
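The density and distance approaches can be illustrated with two simple scores on 2-D points: the number of other points within a fixed radius (density) and the distance to the k-th nearest neighbor (distance). This is a minimal sketch; the function names, `radius = 1.0`, and `k = 2` are illustrative assumptions, and the isolation approach is not shown. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math


def knn_distance_scores(points, k=2):
    """Distance-based score: distance from each point to its k-th nearest neighbor.

    A large score means the point is far from its neighbors.
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def density_scores(points, radius=1.0):
    """Density-based score: how many other points lie within `radius`.

    A low count means the point sits in a sparse region.
    """
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]


# A tight square of five points plus one far-away point at (6, 6).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (6, 6)]
print(density_scores(pts))  # [3, 3, 3, 3, 4, 0]
print([round(s, 2) for s in knn_distance_scores(pts)])  # [1.0, 1.0, 1.0, 1.0, 0.71, 7.78]
```

Both scores single out (6, 6): it has no neighbors within the radius and by far the largest k-th-neighbor distance. Each function compares every pair of points, which is fine for a sketch but quadratic in the number of points.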

Can K-means be Used to Find Outliers?

Not on its own, since it is not built for that purpose. K-means looks for clusters that minimize the total within-cluster sum of squares, and outliers will not necessarily form their own cluster; they can instead be absorbed into a nearby cluster and pull its centroid toward them.

What’s the difference between Normalization and Standardization?

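Normalization (min-max scaling) rescales values into a range of 0 to 1 using the minimum and maximum of the data. Outliers are not removed, but because an extreme value sets the minimum or maximum, it compresses the remaining values into a narrow band, which makes normalization sensitive to outliers.

Standardization rescales data to have a mean (μ) of 0 and a standard deviation (σ) of 1 (unit variance), using z = (x - μ) / σ. It does not bound values to a fixed range, so outliers are not squeezed into an interval and typically show up as large positive or negative z-scores. For this reason it is often preferred when the data contains outliers.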
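The contrast between the two rescalings can be seen on a small made-up sample with one extreme value. This sketch uses only the standard library; the function names are hypothetical, and `standardize` uses the population standard deviation (an assumption; the sample standard deviation is also common).

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly into [0, 1] using the observed minimum and maximum."""
    lo, hi = min(values), max(values)
    # Assumes at least two distinct values, otherwise hi - lo is zero.
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


data = [10, 11, 12, 13, 14, 100]
print([round(x, 3) for x in min_max_normalize(data)])
# [0.0, 0.011, 0.022, 0.033, 0.044, 1.0]
print([round(x, 2) for x in standardize(data)])
```

Both transforms are linear, so the outlier keeps its relative position in either case. The difference is that min-max scaling pins the value 100 to 1.0 and squeezes the five ordinary values into the bottom 5% of the range, while the standardized values are not confined to a fixed interval.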

What is the best algorithm for Anomaly Detection?

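There is no single best algorithm; the right choice depends on the data and the constraints. One commonly used option is the Support Vector Machine, particularly the One-Class SVM variant, which learns a boundary around normal data and flags points that fall outside it. SVMs tend to work well on high-dimensional data, but kernel SVMs can be slow to train on very large datasets, so arguments can be made for other algorithms, especially under different constraints.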

Name Other Algorithms for Anomaly Detection.

Some other algorithms that can be used for anomaly detection are:

1) Neural Networks

2) K-nearest neighbors (KNN)

3) Local Outlier Factor
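Local Outlier Factor compares the local density around each point with the density around its neighbors: a score near 1 means similar density, and a score well above 1 suggests an outlier. Below is a simplified from-scratch sketch using the standard LOF definitions (k-distance, reachability distance, local reachability density). It is not scikit-learn's implementation: ties at the k-th neighbor are broken by index, and the points are assumed to be distinct so that no density becomes infinite. In practice a library class such as scikit-learn's `sklearn.neighbors.LocalOutlierFactor` would usually be preferred.

```python
import math


def local_outlier_factor(points, k=2):
    """Simplified LOF scores; values well above 1 suggest outliers.

    Assumes distinct points. Ties at the k-th distance are broken by index,
    so every neighborhood has exactly k points.
    """
    n = len(points)
    neighbors = []   # indices of the k nearest neighbors of each point
    k_distance = []  # distance from each point to its k-th nearest neighbor
    for i in range(n):
        order = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        neighbors.append([j for _, j in order[:k]])
        k_distance.append(order[k - 1][0])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbor j.
        return max(k_distance[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in neighbors[i]) for i in range(n)]

    # LOF: average ratio of the neighbors' density to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (6, 6)]
print([round(s, 2) for s in local_outlier_factor(pts)])
# [0.93, 0.93, 0.93, 0.93, 1.17, 8.06]
```

The isolated point at (6, 6) scores about 8, while the clustered points stay close to 1.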

Wrap Up

Anomaly detection is a vital part of machine learning. It plays a role both when preparing the data a model is trained on and when the model is used in practice. Without ways to handle anomalies, ML models would struggle to reach a level of accuracy that is safe enough for real applications or reliable predictions.

Avi Arora

Avi is a Computer Science student at the Georgia Institute of Technology pursuing a Master's in Machine Learning. He is a software engineer at Capital One and the co-founder of Octtone, a company that creates software products in the Health & Wellness space.