Quickly Master Locally Weighted Learning – ML Interview Q&A

Introduction

Moving forward with our study of Machine Learning algorithms, new and old alike, it is worth realizing that even the smallest change to an existing method can have an outsized impact. Think of how adding a penalty term to the cost minimized by gradient descent gives us regularization, a simple tweak that combats overfitting and excessive variance. In the same spirit, a small tweak to ordinary linear regression, weighting each training example by its distance from the query point, gives us Locally Weighted Learning (LWL). Let us have a look.

What is Locally Weighted Learning?

Locally Weighted Learning (LWL) is a class of function-approximation techniques in which a prediction is made by fitting an approximate local model around the current point of interest. In machine learning it is typically linked to regression; in fact, "LWL" is used almost interchangeably with "Locally Weighted Regression" (LWR). Locally weighted linear regression is a non-parametric, supervised learning approach. Notably, it has no training phase: all of the work is done during the testing phase, when predictions are made.
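To make this concrete, here is a minimal sketch of a single LWL prediction in Python with NumPy. The function name lwr_predict and the toy data are our own illustration rather than any particular library's API: for each query point, we weight every training example by its distance to the query, then solve a small weighted least-squares problem.

```python
import numpy as np

def lwr_predict(X, y, x_query, tau=0.5):
    """Predict y at x_query via locally weighted linear regression.

    X: (n, d) training inputs, y: (n,) targets, tau: kernel bandwidth.
    Note that all the work happens here, at prediction time -- there
    is no separate training phase.
    """
    # Gaussian weights: examples near the query dominate the local fit
    w = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2 * tau ** 2))

    # Add an intercept column so the local model is a full straight line
    Xb = np.hstack([np.ones((len(X), 1)), X])
    xq = np.concatenate(([1.0], np.atleast_1d(x_query)))

    # Solve the weighted normal equations: (X^T W X) theta = X^T W y
    W = np.diag(w)
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return xq @ theta

# Toy usage on a noisy non-linear curve
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 6.0, 100)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0.0, 0.1, 100)
print(lwr_predict(X, y, np.array([3.0])))  # close to sin(3.0) ≈ 0.141
```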

What is the Significance of the Locally Weighted Regression Algorithm?

Suppose we have a non-linear dataset, one that no single straight line can fit well. In this situation we need an algorithm that can still predict a value extremely close to the actual value at a given query point, and this is exactly where LWL shines and demonstrates its value. With locally weighted regression we can divide the given dataset into smaller neighborhoods and fit a separate short line to each of them. Combined, these local lines trace out a curve that fits the complete dataset.

Locally Weighted Learning ML Interview Questions/Answers

So now that we know what Locally Weighted Learning is and why it is significant, it is natural to progress to the interview questions about it. Try to answer each one in your head before reading the answer.

What Are The Advantages of Locally Weighted Learning?

Its advantages are the following:

  • Interestingly enough, LWL is a simple algorithm that rests on the same idea as ordinary linear regression: minimizing a least-squares error function (here, a weighted one).
  • It can give excellent results on non-linear data, particularly when the number of features is small (say, 2 or 3) and we want to incorporate all of them in our analysis.
  • One of its best parts is that, since it is a non-parametric algorithm, there is no training phase at all.
  • Another major advantage of LWL is that fitting non-linear datasets does not require manually engineering new features.

What Are The Disadvantages of Locally Weighted Learning?

Its disadvantages are the following:

  • Because it relies on finding meaningful local neighborhoods around each query point, LWL does not work well for high-dimensional data, where such neighborhoods become sparse (the curse of dimensionality).
  • Local fitting comes at a high computational cost: every single prediction requires solving a fresh weighted regression over the training data.
  • For a related reason, LWL is also quite sensitive to outliers: a few anomalous points near a query can dominate its local fit.

How Does Locally Weighted Linear Regression Work?

Locally weighted regression, as mentioned before, is a non-parametric learning algorithm. As a result, the amount of data you need to keep around to represent the hypothesis h(⋅) grows with the size of the training set; in the case of locally weighted regression, this growth is linear.

As appealing as a non-parametric algorithm may seem, keep in mind that it is not ideal if you have a massive dataset, because you are forced to keep all of the data in memory just to make predictions.
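To spell out why all the work happens at prediction time, here is the standard locally weighted least-squares objective (the textbook formulation, not anything specific to one implementation). For each query point x, LWR picks local parameters θ to minimize

$$\min_{\theta} \; \sum_{i=1}^{n} w^{(i)} \left( y^{(i)} - \theta^{T} x^{(i)} \right)^{2}$$

where the weight w(i) shrinks as x(i) gets farther from the query point x. Because this problem is re-solved from scratch for every new query, all n training examples must remain available at prediction time.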

When Should We Use Locally Weighted Learning?

Typically, consider linear regression first, because it is the most basic algorithm in every way. However, if linear regression underfits, for example because the data exhibits a non-linear relationship, it will not be a valid model in that situation. We can then move on to locally weighted regression, which can, in such cases, produce promising results.

For optimal performance, though, the dataset should preferably be low-dimensional and contain no more than a few thousand data points.

What Kernel Functions Can We Use In Locally Weighted Regression?

In most cases, the Gaussian kernel gives the best results. However, we can certainly also use other kernel functions, such as the tri-cube kernel.
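As a quick illustration, both kernels are only a couple of lines in Python (the function names here are our own; the formulas are the standard ones). The main practical difference is that the Gaussian kernel gives every training point some weight, while the tri-cube kernel assigns exactly zero weight beyond the bandwidth:

```python
import numpy as np

def gaussian_kernel(dist, tau):
    # Smooth and never exactly zero: every training point gets some weight
    return np.exp(-dist ** 2 / (2 * tau ** 2))

def tricube_kernel(dist, tau):
    # Compact support: points farther than tau get exactly zero weight
    u = np.clip(np.abs(dist) / tau, 0.0, 1.0)
    return (1.0 - u ** 3) ** 3

d = np.linspace(0.0, 3.0, 7)          # distances from the query point
print(gaussian_kernel(d, tau=1.0))
print(tricube_kernel(d, tau=1.0))
```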

What Does The Bandwidth Parameter τ Do in Locally Weighted Learning?

The bandwidth parameter τ dictates the width of the neighborhood you look at when fitting the local straight line: it controls how quickly the weight of a training example x(i) falls off with its distance from the query point x. Depending on τ, you get a wider or narrower bell-shaped weighting curve, which makes you look at a bigger or smaller window of data respectively. This in turn decides how many nearby examples effectively take part in fitting the straight line.
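For reference, with the commonly used Gaussian kernel the weights take the standard form

$$w^{(i)} = \exp\left( -\frac{\left( x^{(i)} - x \right)^{2}}{2\tau^{2}} \right)$$

so examples within roughly τ of the query carry substantial weight, while examples several multiples of τ away contribute essentially nothing.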

What Type of Parameter is the Bandwidth Parameter τ? 

It is a hyperparameter of the locally weighted regression algorithm.

What Effect Does τ Have On Overfitting and Underfitting A Dataset?

τ has a clearly defined effect on overfitting and underfitting:

  • If τ is too large, you over-smooth the data, which leads to underfitting.
  • If τ is too small, you fit a very jagged, high-variance curve to the data, which of course leads to overfitting. The short experiment below illustrates both failure modes.
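Here is a small sketch of that effect, reusing the hypothetical lwr_predict function from the earlier snippet; the exact numbers will vary, but the pattern (a huge τ over-smooths, a tiny τ chases the noise) should be visible in the training error:

```python
import numpy as np
# Assumes the lwr_predict function from the earlier sketch is in scope.

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 6.0, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0.0, 0.1, 80)

for tau in (5.0, 0.5, 0.05):  # too large, reasonable, too small
    preds = np.array([lwr_predict(X, y, x, tau) for x in X])
    print(f"tau={tau}: training MSE = {np.mean((preds - y) ** 2):.4f}")
# Expect a large MSE for tau=5.0 (the fit is nearly one global line,
# i.e. underfitting) and a near-zero MSE for tau=0.05 (the fit chases
# individual noisy points, i.e. overfitting).
```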

Wrap Up

As we have seen, Locally Weighted Learning (LWL) is genuinely useful in the world of Machine Learning, and its use cases make excellent practical sense. We may not see it used as often as we would like, but its ideas are incorporated into other ML algorithms if one looks closely enough. For example, the way it has no training phase also shows up in K-Nearest Neighbors, another non-parametric algorithm. It is therefore certainly good to have knowledge of it as well!

Avi Arora

Avi is a Computer Science student at the Georgia Institute of Technology pursuing a Master's in Machine Learning. He is a software engineer working at Capital One, and the co-founder of the company Octtone. His company creates software products in the Health & Wellness space.