Other Topics — Machine Learning Interview Questions
Introduction
Not interested in background on Support Vector Machines (SVMs)? Skip ahead to the questions below.
As the world of Machine Learning evolves and leaps toward ever newer and more complicated models, it sometimes becomes evident that progress can also be made in a different way. Support Vector Machines are one such way: something that looks basic can work wonders in the field of Artificial Intelligence, and the logic behind them can be understood even by someone without the deep technical background of today's AI scientists.
So let us dive into Support Vector Machines (SVMs) and the interview questions about them that may just secure your job as someone who loves ML!
Article Overview
- What are Support Vector Machines?
- How do Support Vector Machines Work?
- Why do we need to Use Support Vector Machines?
- Support Vector Machines ML Interview Questions/Answers
- Wrap Up
What are Support Vector Machines?
These are supervised ML algorithms that can work on both classification and regression problems. For classification problems, they find a hyperplane that maximizes the margin between the classes present in the training data, which results in a clear distinction between the data points of the respective classes. For regression problems, the exact same principle applies; the only difference is that the end result is a numeric value rather than a class label.
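As a minimal sketch of this idea, here is the same SVM principle applied to classification and regression. The library choice (scikit-learn with NumPy) and the synthetic data are assumptions for illustration; the article itself does not prescribe any implementation.

```python
# Sketch: the same SVM idea applied to classification (SVC) and regression (SVR).
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)

# Classification: two clusters of points, labels 0/1.
X_clf = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y_clf = np.array([0] * 50 + [1] * 50)
clf = SVC(kernel="linear").fit(X_clf, y_clf)
print(clf.predict([[0, 0], [4, 4]]))   # -> class labels

# Regression: same principle, but the output is a number.
X_reg = np.linspace(0, 10, 100).reshape(-1, 1)
y_reg = np.sin(X_reg).ravel()
reg = SVR(kernel="rbf").fit(X_reg, y_reg)
print(reg.predict([[5.0]]))            # -> a numeric estimate
```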
How do Support Vector Machines Work?
SVMs work using the steps below (a short sketch follows the list):
- They map the data to a high-dimensional feature space so that the data points can be categorized even when they are not linearly separable in their original space.
- A separator between the classes is found, and the data is transformed in such a way that the separator can be drawn as a hyperplane.
- Predictions for new data are then made based on which side of the hyperplane a data point falls, assigning it to a particular class.
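Here is a small sketch of that workflow, again assuming scikit-learn: two half-moon shaped classes that no straight line can separate in 2-D are handled much better by an SVM that implicitly works in a higher-dimensional feature space (RBF kernel) than by a purely linear one.

```python
# Data that is not linearly separable in 2-D, fit with a linear vs. an RBF-kernel SVM.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)

print("linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("rbf kernel accuracy:   ", rbf_svm.score(X_test, y_test))
```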
Why do we need to use Support Vector Machines?
We need SVMs because of the benefits we get from incorporating them in our ML models. One such benefit is that they can be used not only for linear classification and regression but also for non-linear problems. This lets SVMs capture much more complex relationships between the data points we give them, without the onus being on us to perform the complicated transformations ourselves. It also makes SVMs effective in high-dimensional spaces, even when the number of dimensions is higher than the number of samples in the data.
Support Vector Machines ML Interview Questions & Answers
The above should already have anyone excited about Support Vector Machines, their importance, and their uses. It therefore makes perfect sense to move on to interview questions about them. Try to answer each one in your head before reading the answer.
What are support vectors?
Support vectors are the instances located on the margin of the hyperplane in an SVM. For SVMs, the decision boundary is determined solely by the support vectors. As a result, any instance that is not a support vector (not on the margin boundaries) has no influence on the decision boundary.
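As a sketch (again assuming scikit-learn), a fitted SVM exposes exactly these instances, so you can inspect which points actually determine the boundary.

```python
# Inspecting the support vectors of a fitted linear SVM.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=6)
clf = SVC(kernel="linear", C=1000).fit(X, y)

print(clf.support_vectors_)   # coordinates of the support vectors
print(clf.support_)           # indices of those instances in X
print(clf.n_support_)         # number of support vectors per class
```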
What is the basic principle behind SVMs?
Their basic principle is finding the optimal hyperplane: the one that best distinguishes between the data points of the classes within the dataset.
What is the Kernel Trick?
The kernel itself is a function capable of computing the dot product of instances mapped into a higher-dimensional feature space without actually transforming all the instances into that space. The trick is that this is much less computationally expensive than performing the actual transformation and then calculating the dot product, which is what makes it so useful and advantageous.
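A tiny numeric sketch of the trick, using a degree-2 polynomial kernel in 2-D (the explicit feature map phi is written out only to show that the kernel gives the same answer without it):

```python
# Kernel trick: (a . b)^2 equals the dot product of the explicitly mapped vectors.
import numpy as np

def phi(x):
    # explicit degree-2 feature map for a 2-D point
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly_kernel(a, b):
    # same quantity computed directly in the original 2-D space
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

print(np.dot(phi(a), phi(b)))   # 121.0
print(poly_kernel(a, b))        # 121.0 -- same value, no explicit mapping needed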
Which instances affect the decision boundary?
Decision boundaries are affected only by the support vectors, i.e. the instances located on the margin of the SVM. Instances not lying on the margin of the hyperplane do not affect the decision boundary.
What is a Hard Margin SVM?
Hard Margin SVMs are those that work only if the data is linearly separable; they place a 'hard' constraint on the margin. Hence these types of SVMs are quite sensitive to outliers.
What is a Soft Margin SVM?
Soft Margin SVMs find a good balance between keeping the margin as large as possible and limiting margin violations, i.e. instances that end up in the middle of the margin or even on the wrong side. Their constraint is 'soft,' but that by no means makes them inaccurate or non-optimized.
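In practice the hard/soft distinction is usually controlled through the C hyperparameter; a very large C approximates a hard margin. A sketch (scikit-learn assumed):

```python
# Large C ~ hard margin (few violations tolerated); small C ~ soft margin.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=120, centers=2, cluster_std=1.5, random_state=0)

nearly_hard = SVC(kernel="linear", C=1e6).fit(X, y)   # almost no violations allowed
soft = SVC(kernel="linear", C=0.01).fit(X, y)         # many violations tolerated

print("support vectors (C=1e6): ", len(nearly_hard.support_vectors_))
print("support vectors (C=0.01):", len(soft.support_vectors_))
```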
What is Hinge Loss?
Hinge Loss is the function defined by max(0, 1 − t). It is a loss function that penalizes the SVM model for inaccurate predictions.
Its properties (illustrated in the sketch after this list) are:
- Hinge loss = 0 when t>=1
- Hinge loss derivative (slope) = –1 if t < 1 and 0 if t > 1
- The loss is not differentiable at t = 1
- The loss penalizes the model for wrongly classifying the instances
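A small sketch of the loss and its slope, matching the properties above (here t is the label times the model's raw output; NumPy assumed):

```python
# Hinge loss max(0, 1 - t) and its (sub)gradient.
import numpy as np

def hinge_loss(t):
    return np.maximum(0.0, 1.0 - t)

def hinge_slope(t):
    # derivative where it exists: -1 for t < 1, 0 for t > 1 (undefined at t = 1,
    # where we simply pick 0 as a valid sub-gradient)
    return np.where(t < 1.0, -1.0, 0.0)

t = np.array([-0.5, 0.5, 1.0, 2.0])
print(hinge_loss(t))    # [1.5 0.5 0.  0. ]
print(hinge_slope(t))   # [-1. -1.  0.  0.]
```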
Are SVMs sensitive to feature scaling?
Yes, SVMs are sensitive to feature scaling because they use the input data directly to find the margins around the hyperplane. Without scaling, they end up biased toward features with large values and high variance.
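For that reason, scaling is usually put in the same pipeline as the SVM. A sketch, assuming scikit-learn:

```python
# Standardizing features before the SVM so no single feature dominates the margin.
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X[:, 0] *= 1000   # one feature on a much larger scale than the others

model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X, y)
print(model.score(X, y))
```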
What is a Polynomial Kernel?
A Polynomial Kernel function represents the similarity of vectors (in our case, the training samples) in a feature space over polynomials of the original variables. This enables the learning of non-linear models.
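As a quick usage sketch (scikit-learn assumed), the degree and coef0 parameters control the polynomial:

```python
# A non-linear dataset fit with a polynomial-kernel SVM.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
poly_svm = SVC(kernel="poly", degree=3, coef0=1.0, C=5).fit(X, y)
print(poly_svm.score(X, y))
```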
How do we train an SVM when the Hinge Loss is not differentiable?
The problem with Hinge Loss is that it is not differentiable everywhere (it has a kink at t = 1), so we cannot always take its derivative and compute the gradient of the function. The technique that mitigates this is Sub-Gradient Descent; in particular, we can use PEGASOS (Primal Estimated sub-GrAdient SOlver for SVM).
Using the sub-gradient, we still get a guaranteed minimum, just as with classifiers such as logistic regression, and we can update the weights just as we would with gradient descent.
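A minimal Pegasos-style sketch (NumPy assumed; no bias term, toy data, purely illustrative):

```python
# Pegasos-style sub-gradient descent for a linear SVM.
import numpy as np

def pegasos(X, y, lam=0.01, n_iters=2000, seed=0):
    """X: (n, d) array, y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)                  # pick one example at random
        eta = 1.0 / (lam * t)                # decreasing step size
        if y[i] * np.dot(w, X[i]) < 1:       # margin violated: hinge sub-gradient is -y_i * x_i
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                                # only the regularizer contributes
            w = (1 - eta * lam) * w
    return w

# toy usage
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = pegasos(X, y)
print(np.sign(X @ w))   # should recover the labels
```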
What is the role of the C hyperparameter?
C lets us tune how much we want to penalize points lying inside our margin or misclassified completely. It is recommended to use cross-validation to find the best value of C.
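A sketch of that cross-validation, assuming scikit-learn's grid search:

```python
# Cross-validating C over a small grid of candidate values.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)
grid = GridSearchCV(SVC(kernel="rbf"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```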
What are the margin and the slack?
The margin is the space between the hyperplane and the support vectors. In the case of a soft margin SVM, the margin includes the slack. Slack is the relaxation of the constraint that every example must lie outside the margin, and it is what creates the soft-margin SVM.
How does the objective differ between SVM classification and regression?
For classification with an SVM, the goal is to maximize the distance (the margin) between the decision boundary and the support vectors. For regression, the goal is instead to keep all the points within the margin.
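In scikit-learn's SVR (assumed here), the width of that regression margin is set by the epsilon parameter; a narrower tube typically needs more support vectors:

```python
# SVM regression: epsilon sets the width of the tube the points should fall within.
import numpy as np
from sklearn.svm import SVR

X = np.linspace(0, 6, 80).reshape(-1, 1)
y = np.sin(X).ravel()

narrow = SVR(kernel="rbf", epsilon=0.01).fit(X, y)   # narrow tube -> more support vectors
wide = SVR(kernel="rbf", epsilon=0.5).fit(X, y)      # wide tube -> fewer support vectors

print(len(narrow.support_), len(wide.support_))
```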
What is the RBF kernel?
RBF, the Radial Basis Function, is a kernel function that effectively adds a separate dimension per data point we have. It allows our data to become linearly separable by assigning each data point a Gaussian distribution, tracing the line formed by the sum of these (now interconnected) Gaussian distributions, projecting the points onto that line (each still within its Gaussian distribution), and then separating the data before projecting it back to the original space.
RBF is the default kernel for many Support Vector Machine implementations due to its robustness and efficiency.
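A small sketch of the kernel itself and of it being the default in scikit-learn's SVC (assumed library): the similarity decays with squared distance, controlled by gamma.

```python
# The RBF kernel by hand, plus SVC's default RBF kernel on data that needs it.
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

def rbf_kernel(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

print(rbf_kernel(np.array([0.0, 0.0]), np.array([0.0, 0.0])))  # 1.0 (identical points)
print(rbf_kernel(np.array([0.0, 0.0]), np.array([3.0, 4.0])))  # ~0 (distant points)

# RBF is the default kernel of SVC, so no kernel argument is needed here.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
print(SVC().fit(X, y).score(X, y))   # concentric circles, separable with RBF
```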
How can SVMs handle more than two classes?
We can create one SVM per class. The idea is to classify between a particular class and every other class; this paradigm is known as one versus rest.
For classes x, y, z: SVM(x vs. {y, z}), SVM(y vs. {x, z}), SVM(z vs. {x, y})
To choose a prediction, we run the sample through every SVM and measure the margin each one produces. We predict the class whose SVM produces the largest margin between the sample and the rest.
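A sketch of one-versus-rest using scikit-learn's wrapper (library and dataset are assumptions): one binary SVM is trained per class, and the class with the largest decision score wins.

```python
# One-versus-rest: one binary SVM per class on a three-class dataset.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)   # three classes
ovr = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)

print(len(ovr.estimators_))         # 3 binary SVMs, one per class
print(ovr.predict(X[:5]))           # class with the largest decision score wins
```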
What are the main objectives when fitting an SVM?
To make the separator as robust as possible by fitting it with the greatest margin possible, and to use kernel methods to work effectively in much larger feature spaces.
Wrap Up
Support Vector Machines prove to be a Machine Learning topic that deserves attention due to their practicality, usefulness, and the benefits they provide. For example, implementing Logistic Regression instead of an SVM can prove much less efficient, as well as more costly, on some problems. SVMs are also quite simple to understand even if you are not the most tech-loving person. The logic behind how they work is what keeps them as relevant as they are in the world of Machine Learning!