8 Unique Machine Learning Interview Questions about Bias and Variance

Image showing the tradeoff between bias and variance using a dart board analogy

Introduction


Data shapes the way we learn. It shapes what we learn and the quality of our knowledge. As the world relies more and more on Artificial Intelligence, the data that goes into building AI-based models deserves more attention than ever before. It is therefore imperative to minimize discrepancies in that data. With so many important decisions resting on it, let us look at two aspects of that data: Bias and Variance.

Article Overview

What are Bias and Variance?
How do Bias and Variance work?
Why do Bias and Variance need to be fixed?
Bias and Variance ML Interview Questions & Answers
Wrap Up

What are Bias and Variance?

In simple terms, Bias is the difference between the average prediction of our model and the correct value that we are trying to predict.

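Variance is the counterpart of Bias. It is the variability of the model's prediction for a given data point: how much that prediction would change if the model were trained on a different sample of data. In that sense, it reflects how sensitive the model is to the spread of our data.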
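The two definitions above can also be written in symbols. The notation here is an assumption introduced for illustration, not something taken from the original article: f is the true function, f-hat is the model fitted on a randomly drawn training set, and the observed target is y = f(x) + noise, where the noise has mean zero and variance sigma squared. Under those assumptions, the expected squared error at a point x splits into three parts:

```latex
\begin{aligned}
\mathrm{Bias}\big[\hat f(x)\big] &= \mathbb{E}\big[\hat f(x)\big] - f(x) \\
\mathrm{Var}\big[\hat f(x)\big] &= \mathbb{E}\Big[\big(\hat f(x) - \mathbb{E}[\hat f(x)]\big)^2\Big] \\
\mathbb{E}\big[(y - \hat f(x))^2\big] &= \mathrm{Bias}\big[\hat f(x)\big]^2 + \mathrm{Var}\big[\hat f(x)\big] + \sigma^2
\end{aligned}
```

The expectations are taken over training sets and noise. The last term is the irreducible error, which no choice of model can remove.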

How do Bias and Variance Work?

How Bias arises is fairly simple, and the same goes for Variance.

Bias occurs when the training data contains points that are too heavily concentrated on one or more sets of features instead of being evenly distributed. The data is not spread out enough.

As for Variance, it occurs when the training data has too many points spread across varying features. The data is spread out too much in places where it shouldn't be, so the model has difficulty treating those data points as anomalies and ends up fitting them instead.

Why do Bias and Variance need to be fixed?

Models based on training data with high Bias pay little attention to the training data and oversimplify the problem. They typically show high error on both the training and test data.

Models based on training data with a high variance place too much emphasis on the training data and do not generalize well on the data they haven’t seen before. Therefore, such models perform very well on the training data yet have a high error rate on the test data.
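The contrast between the two failure modes can be sketched with a toy experiment. This is a minimal illustration under stated assumptions, not a benchmark: the quadratic ground truth `true_f`, the noise level, and the two deliberately extreme models (`constant_model`, which ignores the input, and `memorizer`, which looks up the nearest training point) are all hypothetical choices made for this example.

```python
import random

rng = random.Random(42)

def true_f(x):
    # Hypothetical ground truth, chosen only for illustration.
    return x * x

def make_data(n):
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [true_f(x) + rng.gauss(0, 0.1) for x in xs]
    return xs, ys

def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(30)
test_x, test_y = make_data(1000)

# High-bias model: ignores x and always predicts the training mean.
mean_y = sum(train_y) / len(train_y)

def constant_model(x):
    return mean_y

# High-variance model: memorizes the training set (nearest-neighbor lookup).
def memorizer(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

for name, model in [("constant", constant_model), ("memorizer", memorizer)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 4),
          "test MSE:", round(mse(model, test_x, test_y), 4))
```

The constant model has a similar, non-trivial error on both sets. The memorizer reproduces its training targets exactly, so its training error is zero, but it makes errors on unseen data. That gap between training and test error is the signature of high Variance.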

The above makes it imperative to reduce the Bias and Variance in the data through preprocessing. Otherwise, the results can be highly inaccurate and cause serious problems in the real world, where sensitive decisions are made by models built on such data.

Bias and Variance ML Interview Questions & Answers

We see from the above that Bias and Variance play a critical role in shaping the results of ML algorithms. Hence, it makes sense to look at a few interview questions about them. Try to answer each one in your head before reading the answer.

What impact do Bias and Variance in data have on Machine Learning models?

Bias usually causes ML algorithms to underfit the data. As a result, the trained model has high training and testing errors.

Variance usually results in ML algorithms overfitting the data. Therefore, the trained model exhibits a low error in training. However, it tends to have a high error in testing.

Can Machine Learning models overcome underfitting on biased data and overfitting on data with variance? Does this guarantee correct results?

Yes, they can. Underfitting can be overcome by using ML models with a greater emphasis on the features: increasing the number of features or placing greater weight on the features at play (for example, by using higher-degree polynomials).

As for overfitting, the reverse can be done to reduce it: use fewer features or a simpler model.

This does not guarantee plausible results in real life, since the models may still be based on data that was not collected with the proper technique.

How can you identify a High Bias (Low Variance) model?

A High Bias model is the result of a model that is too simple, and it can be identified when the model shows:

A high training error
A validation error or test error that is close to the training error

How can you fix a High Bias model?

To fix a High Bias model, a Data Scientist or ML Engineer can:

Add more input features
Add more complexity by introducing polynomial features
Decrease the regularization term
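The effect of adding polynomial features can be sketched in plain Python. This is a minimal illustration, assuming a quadratic ground truth and a small noise level that were picked for the example. The helpers `fit_polynomial`, `predict`, and `mse` are hypothetical names written for this sketch, not a library API.

```python
import random

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Normal equations A w = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution; w[i] is the coefficient of x^i.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - s) / A[i][i]
    return w

def predict(w, x):
    return sum(c * x ** i for i, c in enumerate(w))

def mse(w, xs, ys):
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)

def make_data(n):
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [x ** 2 + rng.gauss(0, 0.05) for x in xs]  # assumed quadratic truth
    return xs, ys

train_x, train_y = make_data(200)
val_x, val_y = make_data(200)

linear = fit_polynomial(train_x, train_y, degree=1)     # too simple: high bias
quadratic = fit_polynomial(train_x, train_y, degree=2)  # adds an x^2 feature

for name, w in [("degree 1", linear), ("degree 2", quadratic)]:
    print(name, "train MSE:", round(mse(w, train_x, train_y), 4),
          "val MSE:", round(mse(w, val_x, val_y), 4))
```

With the straight-line model, training and validation error are both high and close to each other, which is the High Bias pattern described above. Adding the squared feature lowers both. The normal-equation solver is fine for a sketch like this, but for real work a numerical library would be the more robust choice.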

How can you identify a High Variance (Low Bias) model?

A High Variance model is the result of a model that is too complex, and it can be identified when the model shows:

A low training error
A validation error or test error that is much higher than the training error
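The symptoms of High Bias and High Variance can be turned into a rough rule of thumb. The function below is only a sketch: `diagnose`, `target_error`, and `gap_tolerance` are hypothetical names, and sensible threshold values depend entirely on the problem at hand.

```python
def diagnose(train_error, val_error, target_error, gap_tolerance):
    """Classify a fit from its training and validation errors.

    target_error:  the training error considered acceptable (assumed, problem-specific)
    gap_tolerance: the largest acceptable gap between validation and training error
    """
    high_bias = train_error > target_error
    high_variance = (val_error - train_error) > gap_tolerance
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias (underfitting)"
    if high_variance:
        return "high variance (overfitting)"
    return "reasonable fit"

# Example calls with made-up error values.
print(diagnose(0.30, 0.32, target_error=0.10, gap_tolerance=0.05))
print(diagnose(0.02, 0.25, target_error=0.10, gap_tolerance=0.05))
```

The first call has a high training error with a small gap, so it is flagged as underfitting. The second has a low training error with a large gap, so it is flagged as overfitting.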

How can you fix a High Variance model?

To fix a High Variance model, a Data Scientist or ML Engineer can:

Reduce the input features
Reduce the complexity by getting rid of the polynomial features
Increase the regularization term
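The effect of the regularization term can be sketched with ridge (L2) regularization on the simplest possible model, a single slope with no intercept. Ridge is swapped in here as one concrete form of regularization; the article does not name a specific one. The true slope, noise level, and the helper names `ridge_slope` and `slope_bias_and_variance` are assumptions made for this example.

```python
import random

def ridge_slope(xs, ys, lam):
    """Closed-form ridge estimate for a one-feature model y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def slope_bias_and_variance(lam, true_w=2.0, n_trials=500, seed=0):
    """Refit on many noisy training sets and summarize the estimated slope."""
    rng = random.Random(seed)
    xs = [-1 + 2 * i / 19 for i in range(20)]  # fixed inputs on [-1, 1]
    estimates = []
    for _ in range(n_trials):
        # Fresh noise on every trial stands in for drawing a new training set.
        ys = [true_w * x + rng.gauss(0, 1.0) for x in xs]
        estimates.append(ridge_slope(xs, ys, lam))
    mean_w = sum(estimates) / n_trials
    variance = sum((w - mean_w) ** 2 for w in estimates) / n_trials
    return mean_w - true_w, variance

for lam in (0.0, 1.0, 10.0):
    bias, variance = slope_bias_and_variance(lam)
    print(f"lambda={lam:5.1f}  bias={bias:+.3f}  variance={variance:.4f}")
```

As the regularization strength grows, the estimated slope varies less from one training set to the next, but it is also pulled toward zero and away from the true value. Increasing the regularization term therefore lowers Variance at the cost of some Bias, which is why decreasing it appears in the list of fixes for a High Bias model.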

What is the Bias and Variance Tradeoff?

When our model is too simple and has very few parameters, it may exhibit high Bias and low Variance. Conversely, when our model has a large number of parameters, it tends to show high Variance and low Bias. As a result, we must find a good balance that neither overfits nor underfits the data.

This tradeoff in complexity is why there is a tradeoff between Bias and Variance. An algorithm can't be more complex and less complex at the same time, so reducing the Bias typically increases the Variance, and reducing the Variance typically increases the Bias.

As an example, in k-nearest neighbors, a small k results in predictions with high Variance and low Bias, whilst a large k results in predictions with a small Variance and a large Bias.
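The k-nearest neighbors example can be checked with a small simulation. This is a sketch under assumptions chosen for illustration: a quadratic ground truth `true_f`, Gaussian noise, a one-dimensional input, and a query point at x0 = 0. The helpers `knn_predict` and `bias_sq_and_variance` are hypothetical names written for this example.

```python
import random

def true_f(x):
    # Hypothetical ground truth, chosen only for illustration.
    return x * x

def knn_predict(train_x, train_y, x0, k):
    """Average the targets of the k training points closest to x0."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))[:k]
    return sum(train_y[i] for i in nearest) / k

def bias_sq_and_variance(k, x0=0.0, n_train=30, n_trials=300, seed=0):
    """Estimate squared bias and variance of the k-NN prediction at x0."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trials):
        # Each trial draws a fresh training set from the same distribution.
        xs = [rng.uniform(-1, 1) for _ in range(n_train)]
        ys = [true_f(x) + rng.gauss(0, 0.3) for x in xs]
        preds.append(knn_predict(xs, ys, x0, k))
    mean_pred = sum(preds) / n_trials
    variance = sum((p - mean_pred) ** 2 for p in preds) / n_trials
    bias_sq = (mean_pred - true_f(x0)) ** 2
    return bias_sq, variance

for k in (1, 5, 30):
    bias_sq, variance = bias_sq_and_variance(k)
    print(f"k={k:2d}  bias^2={bias_sq:.4f}  variance={variance:.4f}")
```

With k = 1 the prediction follows a single noisy neighbor, so it changes a lot between training sets but is centered near the true value. With k equal to the training set size, the prediction is just the overall mean: stable across training sets, but far from the true value at x0.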

Would it be better if an ML algorithm exhibits a greater amount of Bias or a greater amount of Variance?

Neither takes precedence over the other, since both lead to a model that gives inaccurate results, which could cause poor decision-making by the machines or humans involved.

Wrap Up

From all the above, we see that these topics deserve their importance in the Machine Learning world. It is a genuine problem that training data can contain Bias and Variance at face value. This then carries over to the model being trained, and although the model may be able to compensate for the issue, it is imperative that the data has been collected and preprocessed properly!