Naive Bayes — Machine Learning Interview Questions
Introduction
In preparing for your next Machine Learning interview, one of the topics you certainly need to be familiar with is Naive Bayes. The algorithm is incredibly useful for modeling probabilities and how distinct events relate to each other. Machine Learning is well on its way to taking over the world, so it is important to have a solid grasp of concepts such as this one.
This series of articles is meant to equip you with the knowledge you need to ace your ML interview and secure a top-tier job in the field. Once you join the hunt to master this craft, companies will come after you.
What is Naive Bayes?
As the name suggests, Naive Bayes is based on the mathematical concept of Bayes’ theorem. It is really a family of algorithms that share the common idea of applying Bayes’ theorem under the assumption that all the predictors are independent of each other and do not affect one another. Each feature contributes independently to the probability calculated for a class, which is why the method is called naive.
It is a supervised classification algorithm. Naive Bayes also assumes that all the features have an equal effect on the outcome.
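In formula terms, the independence assumption means that the class-conditional probability of the features factorizes into a product of per-feature probabilities:
P(x1, x2, ..., xn | class) = P(x1 | class) * P(x2 | class) * ... * P(xn | class)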
How does Naive Bayes work?
It calculates two kinds of probabilities: the prior probability of each class, and the conditional probability of each class given each feature value. All of these probabilities are estimated from the training data; after training, new data points can be classified using Bayes’ theorem. Naive Bayes can also be trained in a semi-supervised manner, using a mixture of labeled and unlabeled data.
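As a minimal sketch of that workflow, assuming scikit-learn is available (the tiny dataset below is made up purely for illustration):

```python
# Minimal sketch: fit a Naive Bayes classifier on labeled data, then predict new points.
# Assumes scikit-learn is installed; the toy dataset is made up for illustration.
from sklearn.naive_bayes import GaussianNB

# Training data: two numeric features per sample, with a binary class label.
X_train = [[1.0, 2.1], [0.9, 1.8], [3.2, 4.0], [3.0, 4.2]]
y_train = [0, 0, 1, 1]

model = GaussianNB()
model.fit(X_train, y_train)               # learns P(class) and per-feature Gaussians

print(model.predict([[1.1, 2.0]]))        # predicted class for a new point
print(model.predict_proba([[1.1, 2.0]]))  # posterior probability of each class
```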
Naive Bayes ML Interview Questions & Answers
Let’s go over some interview questions on Naive Bayes. Try to answer each one in your head before reading the answer.
Naive Bayes is based on Bayes’ theorem in statistics. For each class it independently calculates the unconditional (prior) probability and the conditional probabilities given the features, and then predicts the outcome from those probabilities.
Multinomial Naive Bayes
Bernoulli Naive Bayes
Gaussian Naive Bayes
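If scikit-learn is the library you are working with, these three variants map to separate classes; a rough guide is:

```python
# Rough guide to the three common Naive Bayes variants in scikit-learn
# (assumes scikit-learn is installed).
from sklearn.naive_bayes import MultinomialNB, BernoulliNB, GaussianNB

clf_counts  = MultinomialNB()  # discrete counts, e.g. word frequencies in a document
clf_binary  = BernoulliNB()    # binary features, e.g. word present / absent
clf_numeric = GaussianNB()     # continuous features assumed to be normally distributed
```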
It is a classification algorithm. Naive Bayes is a supervised learning algorithm, but it can also be trained as a semi-supervised learning algorithm.
It often performs on par with, or better than, simple algorithms such as logistic regression, and it works well with both categorical and numerical data. It is also very easy and fast to work with, and complex, high-dimensional data is well suited to a Naive Bayes classifier. Additionally, it can be trained using a small labeled dataset with semi-supervised learning.
In short:
• It performs well with both clean and noisy data.
• Training requires only a few samples, though the fundamental assumption is that the training dataset is a genuine representation of the population.
• Obtaining the probability of a prediction is simple.
Naive Bayes classifiers suffer from the “Zero Frequency” problem: if a category of a feature appears in the test data but was never present in the training set, the model assigns it a probability of zero.
Its biggest downside is that it treats features as independent of each other, when in real life truly independent features are almost impossible to find; all features are correlated with each other to some degree.
The Naive Bayes classifier is a very powerful technique. It is applied in various classification settings that require real-time prediction. The algorithm is also widely used in NLP tasks such as sentiment analysis of text, spam filtering, and text classification, as well as in recommendation systems and collaborative filtering.
Naive Bayes is a generative classifier. It learns from the actual distribution of the dataset rather than drawing a decision boundary to classify data.
Bayes’ theorem states that
P(A|B) = P(B|A) * P(A) / P(B)
where P(A) and P(B) are the independent probabilities of events A and B,
P(A|B) = probability of event A given that B is true,
P(B|A) = probability of event B given that A is true.
The probability of event A given that event B is true, P(A|B), is called the posterior probability, and the independent probability of event A, P(A), is called the prior probability, i.e.
P(A|B) = posterior probability
P(A) = prior probability
The probability of event B given that event A is true is called the likelihood, and the independent probability of event B is called the evidence.
P(B|A) = likelihood
P(B) = evidence
Probability of event A given that event B is true (Posterior) can be found using the expression:
Posterior = Likelihood * Prior / Evidence
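As a worked example with made-up numbers for a spam filter: suppose P(spam) = 0.2, P("free" appears | spam) = 0.5, and P("free" appears | not spam) = 0.05. Then:
Evidence: P("free") = 0.5 * 0.2 + 0.05 * 0.8 = 0.14
Posterior: P(spam | "free") = 0.5 * 0.2 / 0.14 ≈ 0.71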
We run into zero probabilities when a particular feature value never occurs with a class in the training data; that single zero in the numerator wipes out the whole product of probabilities. To mitigate this, we can use Laplace smoothing, which adds a small count to the numerator and a corresponding amount to the denominator of every estimate.
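A minimal sketch of the smoothed estimate (the counts below are illustrative; alpha = 1 is classic Laplace smoothing):

```python
# Laplace (additive) smoothing for a conditional probability estimate.
# count  = how often the feature value occurs with this class in the training data
# total  = how many training samples belong to this class
# n_vals = number of possible values the feature can take
def smoothed_prob(count, total, n_vals, alpha=1.0):
    return (count + alpha) / (total + alpha * n_vals)

# Without smoothing, an unseen value gets probability 0; with smoothing it stays small but non-zero.
print(smoothed_prob(0, 100, 2))   # 1 / 102 ≈ 0.0098 instead of 0.0
print(smoothed_prob(60, 100, 2))  # 61 / 102 ≈ 0.598
```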
This is a distribution that models a particular outcome as binary. For example, in the Bernoulli Naive Bayes classifier, a given word is either present in a message or it is not.
- For the categorical features, we can estimate our probability using a distribution such as multinomial or Bernoulli.
- For the numerical features, we can estimate our probability using a distribution such as Normal or Gaussian.
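For instance, a Gaussian Naive Bayes classifier estimates a mean and standard deviation per feature and class from the training data and plugs them into the normal density; a small sketch with made-up numbers:

```python
import math

# Normal (Gaussian) density, used as the likelihood P(feature value | class)
# for a numeric feature. The mean and std would be estimated from the training
# samples of that class; the numbers below are made up for illustration.
def gaussian_likelihood(x, mean, std):
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

print(gaussian_likelihood(1.1, mean=0.95, std=0.1))  # density of x under this class's Gaussian
```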
Naive Bayes works well when the training set is small, i.e., when the dataset has a low number of observations (samples) and a high number of features, because of its high-bias, low-variance trade-off.
A Generative Model explicitly models each class’s underlying distribution. It learns the joint probability distribution, i.e.
P(message, spam) = P(spam) * P(message|spam)
Where both P(spam) and P(message|spam) can be estimated from the dataset by computing class frequencies. An example of a generative model would be Naive Bayes.
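As a minimal sketch of how those two quantities could be estimated by counting class frequencies (the tiny labeled dataset and the word "free" below are made up for illustration):

```python
# Estimating P(spam) and P(word | spam) by counting class frequencies.
# The tiny labeled dataset is made up for illustration.
messages = [("win free money", 1), ("free offer today", 1),
            ("lunch at noon", 0), ("see you at noon", 0)]

spam = [text for text, label in messages if label == 1]
p_spam = len(spam) / len(messages)  # prior P(spam) = 0.5 on this toy data

word = "free"
p_word_given_spam = sum(word in text.split() for text in spam) / len(spam)  # P("free" | spam)

print(p_spam, p_word_given_spam)
```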
A Discriminative Model models the decision boundary between the classes by learning the conditional probability distribution P(spam|message) from the dataset. An example of a discriminative model would be Logistic Regression.
Wrap Up
Naive Bayes is often used as a baseline model against which the performance of more complex models is compared. It doesn’t produce a complex model, yet it performs well on high-dimensional datasets. It can also be trained in either a supervised or a semi-supervised fashion, which makes it very flexible to train and use.