17 Unique Machine Learning Interview Questions on Logistic Regression


Introduction


If you are preparing for your next Machine Learning interview, logistic regression is one of the topics you certainly need to know. Machine Learning is well on its way to taking over the world, and this series of articles is meant to equip you with the knowledge you need to ace your ML interview and land a top-tier job in the field. Once you commit to mastering this craft, companies will come looking for you!


What is Logistic Regression in the First Place?

Logistic regression is a supervised classification algorithm. In its basic form it solves binary classification problems, and it can be extended to multi-class classification. It originated in statistics and was later adopted as a Machine Learning algorithm. The model computes a linear combination of the input features and passes the result through the logistic function, a nonlinear “S”-shaped curve that maps any real number to a probability between 0 and 1. Training learns the weights of that linear combination so that the predicted probabilities match the training labels. Although the S-curve itself is nonlinear, the resulting decision boundary (the set of points where the predicted probability equals 0.5) is linear: a line, plane, or hyperplane in feature space.

How does Logistic Regression Work?

Logistic regression maps each data point to a value between 0 and 1, which is interpreted as the probability that the point belongs to class 1. With the usual threshold of 0.5, a point is assigned to class 1 if this value is at least 0.5 and to class 0 otherwise. The name comes from the logistic function at the heart of the model: the model first computes a logit (the log-odds, an unbounded real number), and the logistic function converts that logit into a probability.

Because the output is a probability, it also conveys confidence: the closer the predicted probability is to 1 (or to 0), the more confident the model is that the point belongs to class 1 (or to class 0).
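The two steps above (turn a linear score into a probability, then compare it with a threshold) can be sketched in plain Python as follows. The weights `w`, bias `b`, and feature values are hypothetical numbers standing in for a trained model, and 0.5 is the conventional default threshold rather than a fixed rule.

```python
import math

def sigmoid(z):
    """Logistic function; split on the sign of z so math.exp never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    """Probability of class 1: the sigmoid of the linear score w.x + b."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

def predict(x, w, b, threshold=0.5):
    """Class 1 if the probability reaches the threshold, else class 0."""
    return 1 if predict_proba(x, w, b) >= threshold else 0

# Hypothetical parameters of an already-trained two-feature model
w = [2.0, -1.0]
b = -0.5

print(predict_proba([1.0, 0.5], w, b))  # linear score z = 2.0 - 0.5 - 0.5 = 1.0
print(predict([1.0, 0.5], w, b))
print(predict([0.0, 1.0], w, b))        # linear score z = -1.5
```

Raising or lowering `threshold` trades false positives against false negatives, which is a common adjustment when one kind of error is costlier than the other.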

Why do We Need Logistic Regression?

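It is one of the most popular classification methods in Machine Learning: it is fast to train, easy to interpret, and outputs probabilities rather than just labels. Its core component, the logistic function, also appears in Deep Learning as the sigmoid activation function, where it adds nonlinearity to a neural network so that the network can learn more complex patterns than a purely linear model.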

Logistic Regression ML Interview Questions

Below are some logistic regression questions that often come up in interviews. Try to answer each one in your head before reading the answer.

Is Logistic regression a classification or regression model?

Logistic regression is a classification algorithm. Despite the word “regression” in its name, it is used to classify data points into classes. It is named so because it predicts a continuous value, the probability that a data point belongs to a particular class, which lies between 0 and 1. Under the hood, it fits a linear regression to the log-odds of that probability.

Why is logistic regression a classification algorithm despite having “Regression” in its name?

Its continuous output is only an intermediate step. The model produces a probability between 0 and 1 for each class, and the final prediction is the class with the highest probability (in the binary case, whichever side of the 0.5 threshold the probability falls on). Because it is trained on labeled examples and outputs discrete class labels, it is a supervised classification algorithm.

What function is used in Logistic Regression?

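Logistic regression uses the sigmoid (logistic) function. Its graph resembles the letter “S”: it maps any real-valued input to a value between 0 and 1, which is read as a probability, and the data point is classified by comparing that probability with a threshold. The sigmoid function is defined as:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

where $z = \mathbf{w}^\top \mathbf{x} + b$ is the linear combination of the input features $\mathbf{x}$ with the learned weights $\mathbf{w}$ and bias $b$.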
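A small sketch that implements the sigmoid and checks a few of its properties numerically: $\sigma(0) = 0.5$, the symmetry $\sigma(-z) = 1 - \sigma(z)$, and the derivative $\sigma(z)(1 - \sigma(z))$ that gradient-based training relies on. The function names are illustrative, and the sign-based split is one common way (assumed here, not taken from any library) to avoid overflow in `math.exp` for large $|z|$.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # z < 0, so e lies in (0, 1) and cannot overflow
    return e / (1.0 + e)

def sigmoid_grad(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

for z in [-1000.0, -2.0, 0.0, 2.0, 1000.0]:
    print(z, sigmoid(z), sigmoid_grad(z))
```

At the extremes the floating-point result rounds to exactly 0.0 or 1.0, even though the mathematical function never reaches either value.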

What loss function is used in Logistic Regression?

Loss functions used for linear regression, such as mean squared error, are a poor fit for logistic regression even though it also predicts a continuous value. When squared error is applied to the output of the sigmoid, the resulting loss is non-convex in the weights, so gradient-based training is not guaranteed to reach the global minimum. The targets are also discrete class labels (0 or 1) rather than arbitrary real numbers. Logistic regression therefore uses binary cross-entropy loss, which is designed for classification.

Cross-entropy loss, also known as log loss, is used in logistic regression because it is convex in the model's weights, which makes its global minimum easy to find.
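A minimal sketch of binary cross-entropy (log loss) averaged over a dataset, where `p_pred` holds predicted probabilities of class 1. The function name and the clipping constant `eps` are illustrative choices, not taken from any particular library; clipping simply keeps `log(0)` from occurring.

```python
import math

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean log loss: -(1/N) * sum(y*log(p) + (1-y)*log(1-p)).

    Probabilities are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

labels = [1, 0, 1, 0]
good = [0.9, 0.1, 0.8, 0.2]  # confident and correct
bad = [0.1, 0.9, 0.2, 0.8]   # equally confident but wrong
print(binary_cross_entropy(labels, good))
print(binary_cross_entropy(labels, bad))
```

The same confidence level produces a small loss when the predictions are right and a much larger one when they are wrong.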

What are the different types of Logistic regression?

The three main types of logistic regression are:

  • Binary (simple) logistic regression
  • Multinomial logistic regression
  • Ordinal logistic regression

What is Multinomial Logistic Regression?

Multinomial logistic regression handles three or more classes, whereas binary logistic regression handles only two. The classes are unordered: there is no natural ranking among them. For example, we can classify images as cat, dog, or bird, and the order in which these classes are listed carries no meaning.
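A minimal sketch of the multinomial decision rule, assuming the common formulation in which each class has its own weight vector and bias. All weights, biases, and inputs below are hypothetical. The model computes one linear score per class and predicts the class with the largest score, which is the same class that a softmax over those scores would give the highest probability.

```python
# Hypothetical weights for a 3-class model (cat, dog, bird) with 2 features.
# Each class has its own weight vector and bias; the classes have no ordering.
classes = ["cat", "dog", "bird"]
weights = {
    "cat": [1.5, -0.5],
    "dog": [0.5, 1.0],
    "bird": [-1.0, 0.2],
}
biases = {"cat": 0.0, "dog": -0.2, "bird": 0.3}

def class_scores(x):
    """One linear score (logit) per class."""
    return {
        c: sum(w * xi for w, xi in zip(weights[c], x)) + biases[c]
        for c in classes
    }

def predict_class(x):
    """Predicted class = class with the largest score (softmax preserves this ordering)."""
    scores = class_scores(x)
    return max(scores, key=scores.get)

print(predict_class([2.0, 0.0]))
print(predict_class([0.0, 2.0]))
print(predict_class([-2.0, 0.0]))
```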

What is Ordinal Logistic Regression?

Ordinal logistic regression also classifies data points into more than two classes, but here the order of the classes matters. Ratings for music or movies are a typical example: on a scale of 1 to 5, 1 is the worst experience, 5 is the best, and the values in between represent progressively better experiences.
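One common way to implement ordinal logistic regression is the proportional-odds (cumulative logit) model: a single weight vector is shared across all classes, and K - 1 increasing thresholds split the linear score into K ordered bins. The sketch below assumes that formulation; the weight and threshold values are made up for a 1-to-5 rating scale with one feature.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def ordinal_probs(x, w, thresholds):
    """Class probabilities for ratings 1..K under a proportional-odds model.

    P(rating <= k) = sigmoid(theta_k - w.x), with theta_1 < ... < theta_{K-1}.
    P(rating == k) is the difference of consecutive cumulative probabilities.
    """
    score = sum(wi * xi for wi, xi in zip(w, x))
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]  # P(rating <= K) = 1
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs

# Hypothetical model: one feature, ratings 1..5 need 4 increasing thresholds
w = [1.2]
thresholds = [-3.0, -1.0, 1.0, 3.0]

for x in ([-2.0], [0.0], [2.0]):
    probs = ordinal_probs(x, w, thresholds)
    best = probs.index(max(probs)) + 1
    print(x, [round(p, 3) for p in probs], "-> most likely rating", best)
```

Because one weight vector drives every threshold comparison, increasing the feature shifts probability mass toward higher ratings, which is how the model respects the ordering of the classes.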

How do outliers affect logistic regression?

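Outliers can affect logistic regression, although usually less than they affect linear regression. The sigmoid function squashes the linear score into the range 0 to 1, so an extreme feature value cannot push a prediction beyond those limits. However, a point that is misclassified far from the decision boundary incurs a large log loss, so a few such points can still pull the learned weights noticeably. It is therefore worth checking for strongly influential outliers before fitting.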

What is a Softmax function?

A softmax function is a generalization of the sigmoid function to K classes (typically K > 2): it converts K real-valued scores into K probabilities. Softmax ensures that the predicted probabilities are non-negative and sum to 1 (100%), so a prediction can be made simply by picking the class with the highest probability.
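A short sketch of softmax in plain Python. It also checks the "generalized sigmoid" claim: with two classes and scores `(z, 0)`, the first softmax output equals `sigmoid(z)`. The class scores are hypothetical, and subtracting the maximum score before exponentiating is a standard stability trick rather than part of the definition.

```python
import math

def softmax(scores):
    """Convert K real-valued scores into K probabilities that sum to 1.

    Subtracting the max score first avoids overflow in exp() and does not
    change the result, because softmax is invariant to adding a constant.
    """
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Scores for (cat, dog, bird) from some hypothetical model
probs = softmax([3.0, 0.8, -1.7])
print([round(p, 3) for p in probs], sum(probs))

# With K = 2 and scores (z, 0), softmax reduces to the sigmoid of z
z = 1.3
print(softmax([z, 0.0])[0], sigmoid(z))
```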

What are the benefits of Logistic regression?

It is easy to implement and understand, and it is computationally cheaper than many other methods. It works well when the classes are approximately linearly separable. Its biggest advantage is that it outputs a probability for each class rather than only a label, and this probability can be read as the model's confidence in its prediction. With a moderate number of features, it is also less prone to overfitting than more flexible models.

  • Logistic regression is simple to learn and requires little training time.
  • It works well on simple datasets whose classes are approximately linearly separable.
  • It is a discriminative model, so it makes no assumptions about how the classes are distributed in feature space.

What are the disadvantages of Logistic Regression?

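One disadvantage is that it can be sensitive to noise and can overfit, especially when there are many features relative to the number of training examples. It also assumes a linear relationship between the features and the log-odds, so it does not work well on data with nonlinear decision boundaries unless nonlinear features are engineered by hand. Finally, its coefficients are harder to interpret than those of linear regression because they act multiplicatively on the odds: increasing feature $x_j$ by one unit multiplies the odds of class 1 by $e^{w_j}$, while the corresponding change in probability depends on where you start.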
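The multiplicative effect on the odds can be checked numerically. In this sketch the coefficient `w = 0.7` and intercept `b = -1.0` are made-up values for a one-feature model. Raising the feature by one unit always multiplies the odds by `exp(w)`, while the change in probability differs depending on the starting value of `x`.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds of class 1 for probability p (0 < p < 1)."""
    return p / (1.0 - p)

# Hypothetical one-feature model: logit = w * x + b
w, b = 0.7, -1.0

for x in [0.0, 1.0, 3.0]:
    p_before = sigmoid(w * x + b)
    p_after = sigmoid(w * (x + 1.0) + b)  # raise x by one unit
    print(f"x={x}: prob {p_before:.3f} -> {p_after:.3f}, "
          f"odds ratio {odds(p_after) / odds(p_before):.3f}")

print("exp(w) =", math.exp(w))
```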

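Logistic regression also suffers from the problem of complete separation. If a feature (or a combination of features) perfectly separates the classes, the maximum likelihood estimate does not exist: the loss keeps decreasing as the coefficients grow, so an unregularized fit either fails to converge or returns extremely large coefficients.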
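A minimal sketch of complete separation, assuming plain gradient descent on the unregularized log loss and a made-up one-feature dataset in which the sign of `x` determines the class. Because every step can reduce the loss a little more by making the sigmoid steeper, the weight `w` keeps growing the longer training runs instead of settling at a finite value.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_1d(xs, ys, lr=0.5, steps=1000):
    """Gradient descent on the mean log loss for logit = w * x + b (no regularization)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        gw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        gb = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Perfectly separated toy data: every x < 0 is class 0, every x > 0 is class 1
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

for steps in (100, 1000, 10000):
    w, b = fit_1d(xs, ys, steps=steps)
    print(steps, round(w, 2))
```

Adding an L2 penalty on the weights is one common remedy: the penalty grows with the weights, so the penalized optimum stays finite.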

Because it is a linear model, it can underfit complex datasets.

Does scaling have an effect on logistic regression?

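For plain (unregularized) logistic regression, scaling the features does not change the model's predictions, so performance does not improve: multiplying a feature by a constant c simply divides its fitted coefficient by c. Scaling still matters in practice, though. Gradient-based solvers usually converge faster on standardized features, and with L1 or L2 regularization the penalty depends on the size of each coefficient, so features measured in different units are penalized unevenly unless they are scaled.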
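This sketch fits an unregularized one-feature model twice, once on a raw feature and once on the same feature multiplied by 100, and compares the results on a small made-up dataset. It uses Newton's method, a standard second-order solver for logistic regression chosen here because it converges in a few iterations; it is not one of the solvers listed in this article's optimization question.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_newton(xs, ys, iters=25):
    """Unregularized 1-D logistic regression (logit = w*x + b) fit with Newton's method."""
    w = b = 0.0
    for _ in range(iters):
        gw = gb = hww = hwb = hbb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            gw += (p - y) * x   # gradient of the negative log-likelihood
            gb += p - y
            s = p * (1.0 - p)   # weight for the Hessian entries
            hww += s * x * x
            hwb += s * x
            hbb += s
        det = hww * hbb - hwb * hwb
        # Solve the 2x2 system H @ [dw, db] = [gw, gb] and take the Newton step
        w -= (hbb * gw - hwb * gb) / det
        b -= (hww * gb - hwb * gw) / det
    return w, b

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]  # overlapping classes, so the MLE exists
scale = 100.0
xs_scaled = [scale * x for x in xs]

w1, b1 = fit_newton(xs, ys)
w2, b2 = fit_newton(xs_scaled, ys)
print("w on raw feature:", w1, " w on scaled feature * scale:", w2 * scale)
print("intercepts:", b1, b2)

preds1 = [sigmoid(w1 * x + b1) for x in xs]
preds2 = [sigmoid(w2 * x + b2) for x in xs_scaled]
print("largest prediction difference:", max(abs(p - q) for p, q in zip(preds1, preds2)))
```

The coefficient on the scaled feature comes out 100 times smaller, the intercept is unchanged, and the predicted probabilities match. Adding an L2 penalty to this loss would break that equivalence, because the penalty would then depend on the feature's units.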

What are the basic assumptions that must be met for logistic regression?

These assumptions include:

  • Independence of observations (errors)
  • Linearity in the logit for continuous variables 
  • Absence of multicollinearity
  • Lack of strongly influential outliers

What is the difference between the outputs of the Logistic model and the Logistic function?

The linear part of the logistic model outputs logits, i.e. log-odds, which can be any real number, whereas the logistic function converts those logits into probabilities between 0 and 1.
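The two outputs are related by a pair of inverse functions, which this short sketch illustrates. The logit value `z` is a hypothetical model output; `logit` maps a probability to log-odds, and `logistic` maps log-odds back to a probability.

```python
import math

def logit(p):
    """Log-odds: the inverse of the logistic function, defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def logistic(z):
    """Logistic (sigmoid) function: maps a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical linear model output (a logit) for one example
z = 0.85
p = logistic(z)
print("logit:", z, "-> probability:", round(p, 4))
print("round trip back to logit:", logit(p))
print("logit of p = 0.5:", logit(0.5))  # even odds correspond to a logit of 0
```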

What are the properties of the cost function for Logistic Regression?

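  • Confident wrong predictions are penalized heavily: the loss grows without bound as the predicted probability of the true class approaches 0.
  • Confident right predictions are rewarded less: the loss approaches 0 as the predicted probability of the true class approaches 1, so there is little to gain from becoming even more confident.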
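Both properties are visible when the log loss of a single example is printed across confidence levels. In this sketch the true label is 1 and the predicted probabilities are arbitrary illustrative values; `log_loss_single` is a hypothetical helper name.

```python
import math

def log_loss_single(y, p):
    """Log loss for one example with true label y in {0, 1} and predicted P(y = 1) = p."""
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

# True label is 1; vary how much probability the model puts on class 1
for p in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(f"p = {p:<5} loss = {log_loss_single(1, p):.3f}")
```

Going from p = 0.9 to p = 0.99 lowers the loss by less than 0.1, while a confident mistake at p = 0.01 costs about 4.6.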

What optimization algorithms can we use to find the optimum parameters in logistic regression?

  • Gradient Descent
  • Conjugate Gradient 
  • BFGS: Broyden-Fletcher-Goldfarb-Shanno algorithm
  • L-BFGS: Limited-memory BFGS
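Below is a minimal sketch of the first option, batch gradient descent, on the mean log loss. The dataset, learning rate `lr`, and number of epochs are illustrative assumptions (the data deliberately include overlapping classes). The gradient of the mean log loss with respect to each weight is the average of `(p - y) * x_j`, which is what the inner loop accumulates.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def mean_log_loss(X, y, w, b):
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(y)

def train_gradient_descent(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log loss; returns weights, bias, loss history."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    history = []
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # p - y
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
        history.append(mean_log_loss(X, y, w, b))
    return w, b, history

# Small made-up dataset with two features; the classes overlap along the diagonal
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [0.5, 1.5],
     [1.5, 0.5], [3.5, 2.5], [2.5, 3.5], [0.0, 0.5]]
y = [1, 0, 1, 0, 0, 1, 1, 0]

w, b, history = train_gradient_descent(X, y)
print("first loss:", round(history[0], 4), "last loss:", round(history[-1], 4))
```

In practice, libraries usually rely on the quasi-Newton methods from the list instead. For example, scikit-learn's `LogisticRegression` uses `solver="lbfgs"` by default, and SciPy's `scipy.optimize.minimize` accepts `method="CG"`, `"BFGS"`, or `"L-BFGS-B"` for a loss function you define yourself.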

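What is the preferred method for fitting a logistic regression model to data?

Maximum Likelihood Estimation (MLE). It finds the model coefficients that relate the predictors to the target by choosing the values that make the observed labels most probable. Maximizing the log-likelihood is equivalent to minimizing the binary cross-entropy loss.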
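The link between maximum likelihood and cross-entropy can be written out in a few lines. Here $p_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + b)$ is the predicted probability of class 1 for example $i$, $y_i \in \{0, 1\}$ is its label, and $N$ is the number of training examples:

$$
\begin{aligned}
L(\mathbf{w}, b) &= \prod_{i=1}^{N} p_i^{\,y_i} \, (1 - p_i)^{1 - y_i} \\
\log L(\mathbf{w}, b) &= \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right] \\
-\frac{1}{N} \log L(\mathbf{w}, b) &= -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right] = \text{binary cross-entropy}
\end{aligned}
$$

Multiplying by $-1/N$ turns the maximization into a minimization without moving the optimum, so the maximum likelihood estimate is exactly the set of weights that minimizes the mean binary cross-entropy. There is no closed-form solution, which is why the optimization algorithms listed in the previous question are used to find it numerically.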

Wrap Up

Logistic regression is often the first algorithm to try for a classification task. Despite its simplicity, it is widely used in industry as well as in academia, and it can be applied to a wide range of problems across many domains. This makes it a very popular Machine Learning approach and one you should definitely be familiar with!

 

Avi Arora

Avi is a Computer Science student at the Georgia Institute of Technology pursuing a Master's in Machine Learning. He is a software engineer at Capital One and the co-founder of Octtone, a company that creates software products in the Health & Wellness space.