Other Topics — Machine Learning Interview Questions
Introduction
Not interested in background on Linear Discriminant Analysis? Skip to the questions here.
In this article, you will learn the theory behind LDA and how it differs from PCA, explained in a simplified manner.
By the end of this read, you should:
- Understand the theory behind LDA
- Know the difference between LDA and PCA
- Know when and what to use LDA for
- Be more confident taking on LDA questions in interviews.
Let’s begin by understanding LDA.
Article Overview
- What is Linear Discriminant Analysis?
- Why is Linear Discriminant Analysis Important?
- How do LDA Algorithms work?
- Linear Discriminant Analysis ML Interview Q&A
- Wrap Up
What is Linear Discriminant Analysis (LDA)?
LDA is an analytical method that finds the linear combination of features that best separates the data into classes. In other words, it finds the linear equation that most clearly separates the classes. It is one of several discriminant analysis methods. LDA is also a dimensionality reduction method, but its simplicity and robustness make it well suited to classification problems.
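As a first taste, here is a minimal sketch of LDA used as a classifier; scikit-learn and the bundled Iris dataset are assumptions, since the article names no particular library or data.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Fit LDA as a classifier: it learns the linear combination of the
# four flower measurements that best separates the three species.
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

print(lda.predict(X[:5]))  # predicted class labels
print(lda.score(X, y))     # mean accuracy on the training data
```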
Why is Linear Discriminant Analysis important?
Some of the reasons LDA is important include:
- It is versatile and can handle a variety of scenarios
- It can be used as a multi-class linear classifier, unlike standard (binary) logistic regression
- It can be used for dimensionality reduction of features
- It can be used for extracting features, for example in face detection models.
How do LDAs work?
LDA works in a similar way to PCA. The aim of an LDA algorithm is to find the linear combination of features that gives the maximum separation between the groups present.
It calculates the discriminant scores from a linear combination of weights and centred data points. These weights come from eigenvectors. Unlike PCA, the eigenvectors are not computed from the overall covariance matrix but from the product of the inverse of the within-class scatter matrix and the between-class scatter matrix.
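To make those mechanics concrete, here is a from-scratch sketch in NumPy (the library choice and the toy data are assumptions): it builds the two scatter matrices, takes the eigenvectors of their product, and projects the centred data to get the discriminant scores.

```python
import numpy as np

rng = np.random.default_rng(0)
# Three toy Gaussian classes in four dimensions.
X = np.vstack([rng.normal(loc=m, size=(50, 4)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 50)

overall_mean = X.mean(axis=0)
S_W = np.zeros((4, 4))  # within-class scatter
S_B = np.zeros((4, 4))  # between-class scatter
for c in np.unique(y):
    Xc = X[y == c]
    mc = Xc.mean(axis=0)
    S_W += (Xc - mc).T @ (Xc - mc)
    diff = (mc - overall_mean).reshape(-1, 1)
    S_B += len(Xc) * (diff @ diff.T)

# Eigenvectors of S_W^{-1} S_B are the discriminant directions (the weights).
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs.real[:, order[:2]]  # keep the top two discriminants

# Discriminant scores: centred data combined with the weights.
scores = (X - overall_mean) @ W
print(scores.shape)  # (150, 2)
```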
Linear Discriminant Analysis ML Interview Questions/Answers
Try to answer each question in your head before reading the answer below it.
When should you use LDA?
LDA is suitable for linear data, i.e. datasets where a line (or hyperplane) can effectively separate the classes. You can also use it when you need a simple classifier that is easy to explain, or for dimensionality reduction.
How does LDA differ from PCA?
Unlike Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA) captures as much variation between the data points as possible, but it does not separate the classes very well.
PCA finds the linear combination that accounts for as much variability as possible. LDA, in contrast, maximizes the separation between two or more groups, i.e. it tries to increase the distance between the groups.
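To see the contrast in code, the sketch below projects the same data with both methods; scikit-learn and the Iris dataset are assumptions. PCA never looks at the labels, while LDA needs them.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)  # unsupervised: ignores y
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # supervised: uses y

print(X_pca[:3])  # axes of maximum overall variance
print(X_lda[:3])  # axes of maximum between-class separation
```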
What is the goal of LDA?
The goal is for there to be little variance within the classes and more variance between the classes. LDA tries to reduce the distance between data points in the same class while increasing the distance between data points in different classes. The ratio of the between-class variance to the within-class variance should be as large as possible to ensure separation. Simply put, the means of the classes should be far away from each other, and each data point should be close to its class mean.
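As a small worked example of that criterion, the snippet below computes the between-class and within-class scatter of a one-dimensional projection and their ratio; NumPy and the toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(loc=0.0, scale=1.0, size=100)  # class A, mean 0
b = rng.normal(loc=5.0, scale=1.0, size=100)  # class B, mean 5

grand_mean = np.concatenate([a, b]).mean()
# Between-class scatter: how far the class means sit from the grand mean.
between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in (a, b))
# Within-class scatter: how far the points sit from their own class mean.
within = sum(((g - g.mean()) ** 2).sum() for g in (a, b))

print(between / within)  # the larger this ratio, the better the separation
```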
How does LDA compute the discriminant scores?
Linear Discriminant Analysis computes its weights from the product of the inverse of the within-class scatter matrix and the between-class scatter matrix, and then combines those weights with the centred data points in a linear combination to get the discriminant scores.
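For two classes this product collapses to a closed form, w = S_W^{-1}(m1 - m0). Below is a NumPy sketch of that special case; the library and the toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
X0 = rng.normal(loc=[0, 0], size=(60, 2))  # class 0
X1 = rng.normal(loc=[3, 3], size=(60, 2))  # class 1

m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
# Within-class scatter matrix: summed scatter of each class around its mean.
S_W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)

# The two-class weight vector: w = S_W^{-1} (m1 - m0).
w = np.linalg.solve(S_W, m1 - m0)

# Discriminant scores: project the centred data onto w.
grand_mean = np.vstack([X0, X1]).mean(axis=0)
scores = (np.vstack([X0, X1]) - grand_mean) @ w
print(scores[:5])  # positive scores lean toward class 1, negative toward class 0
```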
Is LDA supervised or unsupervised?
Linear Discriminant Analysis is a supervised learning method because it requires labeled data, in contrast to PCA, which is unsupervised. It is a classifier, so it needs predefined labels.
How can you tell how much a variable contributes to the separation?
To estimate how much a variable contributes to the separation, inspect the standardized discriminant function coefficients. If a variable carries a relatively high weight, it separates the groups better than the others do.
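One common way to approximate this in code is to standardize the features first and then compare the absolute weights on each discriminant axis. The sketch below does that with scikit-learn and the Iris data; the library, the dataset, and this shortcut are all assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_std = StandardScaler().fit_transform(X)  # put all features on one scale

lda = LinearDiscriminantAnalysis().fit(X_std, y)

# scalings_ holds each feature's weight on each discriminant axis;
# larger absolute values indicate a larger contribution to separation.
for name, weights in zip(load_iris().feature_names, lda.scalings_):
    print(f"{name}: {weights}")
```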
Which metrics can you use to evaluate an LDA classifier?
LDA supports the standard classification metrics, such as:
- Sensitivity: The ratio of true positives to the sum of true positives and false negatives
- Specificity: The ratio of true negatives to the sum of true negatives and false positives
- Accuracy: The ratio of true positives and true negatives to the total number of predictions
- AUC: The area under the ROC curve, which summarizes the classifier’s performance over all possible thresholds.
What is the difference between sensitivity and specificity?
Sensitivity measures how often a positive class is correctly predicted. It is calculated as the total number of true positives divided by the sum of true positives and false negatives.
Specificity, on the other hand, measures how often a negative class is correctly predicted. It is calculated as the total number of true negatives divided by the sum of true negatives and false positives.
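As a sketch of how these metrics come together in practice, the example below computes all four for a binary LDA classifier; scikit-learn and the synthetic dataset are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
y_pred = lda.predict(X_te)

# Unpack the confusion matrix: true/false negatives and positives.
tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
print("sensitivity:", tp / (tp + fn))                # true positive rate
print("specificity:", tn / (tn + fp))                # true negative rate
print("accuracy:", (tp + tn) / (tp + tn + fp + fn))
print("AUC:", roc_auc_score(y_te, lda.predict_proba(X_te)[:, 1]))
```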
What are the steps to build an LDA model?
- Split the data into training and test sets
- Calculate the discriminant scores from the training data
- Determine an appropriate cut-off value
- Use the test data to calculate discriminant scores using the pre-computed weights from the training data
- Evaluate the model using the metrics above and make predictions, as sketched in the code below.
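The sketch below walks through those five steps with scikit-learn; the library, the synthetic dataset, and the zero cut-off are all assumptions, since the article prescribes no specifics.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=1)

# 1. Split the data into training and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# 2. Learn the weights and compute discriminant scores on the training data.
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
train_scores = lda.decision_function(X_tr)

# 3. Pick a cut-off; zero is the natural boundary for a binary problem.
cutoff = 0.0

# 4. Score the held-out data with the weights learned from training.
test_scores = lda.decision_function(X_te)
y_pred = (test_scores > cutoff).astype(int)

# 5. Evaluate.
print("test accuracy:", (y_pred == y_te).mean())
```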
Can LDA predict more than two classes?
LDA can be used to predict more than two groups, unlike some linear models. In the case of three groups, you’ll have two LDA equations, with the first being the most discriminating. If you have 4 groups, you’ll have 3 LDA equations, and so on.
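A quick check of this pattern with scikit-learn and the three-class Iris data (both assumptions): three groups yield two discriminant axes, and the first carries most of the separation.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # three classes, four features

lda = LinearDiscriminantAnalysis()
X_lda = lda.fit_transform(X, y)

# At most k - 1 discriminants for k classes: 3 classes -> 2 axes.
print(X_lda.shape)                    # (150, 2)
print(lda.explained_variance_ratio_)  # the first axis dominates
```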
Is LDA a classifier or a dimensionality reduction method?
LDA is most often used as a classifier, but it can also be used for dimensionality reduction and feature extraction.
Can LDA be used for clustering?
No, it cannot. Clustering is an unsupervised learning approach, while LDA is a supervised learning algorithm. Unlike clustering, LDA is used when the classes or labels are known.
What are the limitations of LDA?
- It requires linearly separable classes and is not applicable to nonlinear problems
- It assumes the data in each class follows a normal distribution
- It doesn’t perform well with imbalanced data
Wrap up
Explainability is something to look for if you’re concerned about how your model makes decisions, and LDA helps you there. LDA is a simple, explainable discriminant analysis algorithm that handles classification tasks, dimensionality reduction, and feature extraction. Its biggest limitation, however, is that it is not suitable for nonlinear data.