15 Unique Gradient Boosted Decision Trees Interview Questions


Introduction


The decision trees we have all come to adore have their shortcomings, as any algorithm does. However, as Machine Learning/Artificial Intelligence Engineers and Data Scientists, we decided not to stop there: rather than giving up and replacing the algorithm entirely, we looked those shortcomings in the eye and built on top of it, sealing the cracks that caused the issues. In doing so, we arrived at the usual suspects, such as random forests, but today we shall discuss GBDTs (Gradient Boosted Decision Trees). The name itself implies an optimization or enhancement of something that came before, does it not?


What is a Gradient Boosted Decision Tree?

A gradient boosted decision tree is yet another algorithm based on ensemble learning. However, it utilizes boosting (obviously) rather than its counterpart, bagging: many weak learners are combined, one after another, into a single strong learner.

So, a GBDT is an ensemble built on the technique of producing an additive predictive model by combining decision trees that, on their own, are weak predictors. GBDTs can be used for classification or regression, depending on the type of problem.

Isn’t that interesting?
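If you would like to see one in action before digging into the mechanics, here is a minimal sketch using scikit-learn's GradientBoostingClassifier; the synthetic dataset and hyperparameter values are purely illustrative, not recommendations.

```python
# A minimal, illustrative example of a GBDT classifier with scikit-learn.
# The synthetic dataset and hyperparameter values are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each of the 100 shallow trees is a weak learner on its own; boosting adds
# them together into a single, much stronger additive model.
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)
print("Test accuracy:", model.score(X_test, y_test))
```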

How does a Gradient Boosted Decision Tree work?

Our beloved GBDTs work through the following steps, and it is worth studying each of them properly to understand how they function:

Classification

  1. Make an initial prediction (typically the log-odds of the positive class, shared by every sample)
  2. Calculate the residuals (observed label minus predicted probability)
  3. Build a decision tree that predicts the residuals
  4. Obtain the new probabilities from the updated model
  5. Obtain the new residuals
  6. Repeat steps 3 to 5 until the residuals converge towards 0 or the number of iterations reaches the chosen hyperparameter (the number of estimators/decision trees)
  7. Final computation: combine the initial prediction with every tree's scaled output and convert it to a probability (see the sketch after this list)
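To make the classification loop concrete, here is a from-scratch sketch in Python. It is a simplification under a few assumptions: scikit-learn's DecisionTreeRegressor stands in for the weak learner, the learning rate and tree depth are illustrative, and the leaf values come straight from the fitted tree rather than the exact leaf-output formula real GBDT libraries use.

```python
# Simplified sketch of GBDT binary classification (labels 0/1), following the steps above.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gbdt_classify_fit(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    # Step 1: initial prediction = log-odds of the positive class, same for every sample
    p = np.clip(np.mean(y), 1e-6, 1 - 1e-6)
    f0 = np.log(p / (1 - p))
    F = np.full(len(y), f0)
    trees = []
    for _ in range(n_estimators):
        prob = sigmoid(F)                      # Step 4: current predicted probabilities
        residuals = y - prob                   # Steps 2 and 5: what the model still gets wrong
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                 # Step 3: a small tree predicts the residuals
        F += learning_rate * tree.predict(X)   # add the tree's (scaled) contribution
        trees.append(tree)                     # Step 6: repeat for n_estimators rounds
    return f0, trees

def gbdt_classify_predict(X, f0, trees, learning_rate=0.1):
    # Step 7: initial log-odds plus every tree's scaled output, converted to a class label
    F = np.full(X.shape[0], f0)
    for tree in trees:
        F += learning_rate * tree.predict(X)
    return (sigmoid(F) >= 0.5).astype(int)
```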

Regression

  1. Calculate the average of the target label; this is the initial prediction
  2. Calculate the residuals (observed value minus current prediction)
  3. Build a decision tree that predicts the residuals
  4. Predict the target label using all the trees within the ensemble
  5. Compute the new residuals
  6. Repeat steps 3 to 5 until the residuals converge towards 0 or the number of iterations reaches the chosen hyperparameter (the number of estimators/decision trees)
  7. After training is done, use all the trees to make the final prediction of the target value (see the sketch after this list)
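The regression loop can be sketched the same way. Again, this is a minimal illustration under assumptions (scikit-learn's DecisionTreeRegressor as the weak learner, squared-error residuals, illustrative hyperparameters), not a reference implementation.

```python
# Simplified sketch of GBDT regression, following the steps above.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_regress_fit(X, y, n_estimators=100, learning_rate=0.1, max_depth=3):
    f0 = np.mean(y)                            # Step 1: start from the average target value
    F = np.full(len(y), f0)
    trees = []
    for _ in range(n_estimators):
        residuals = y - F                      # Steps 2 and 5: what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                 # Step 3: a small tree predicts the residuals
        F += learning_rate * tree.predict(X)   # Step 4: updated prediction from all trees so far
        trees.append(tree)                     # Step 6: repeat for n_estimators rounds
    return f0, trees

def gbdt_regress_predict(X, f0, trees, learning_rate=0.1):
    # Step 7: the initial average plus every tree's scaled contribution
    pred = np.full(X.shape[0], f0)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred
```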

Why do we need Gradient Boosted Decision Trees?

Well, we hearken back to the reasoning behind Random Forests, because the reason for using GBDTs is quite similar: overcoming the weaknesses of a single decision tree, such as overfitting. However, GBDTs go a few steps further. They usually achieve more accurate results than other models, and they handle unbalanced data well, making their usage a no-brainer!

Gradient Boosted Decision Trees ML Interview Questions/Answers

The machine learning questions on GBDTs are listed below. Try to answer each one in your head before clicking the arrow to reveal the answer.

Wrap Up

Alright then, yet another excellent approach to resolving everyday Machine Learning problems. The degree of control GBDTs offer, the way almost all of their disadvantages can be handled with various maneuvers, the way they deal with unbalanced data, and, not to forget, their excellent accuracy, which triumphs over most other models, all make their usage not only worthwhile but a definite necessity in many real-world problems. Don't you think?

Avi Arora

Avi is a Computer Science student at the Georgia Institute of Technology pursuing a Masters in Machine Learning. He is a software engineer working at Capital One, and the co-founder of the company Octtone. His company creates software products in the Health & Wellness space.