Other Topics — Machine Learning Interview Questions
Article Overview
- What are Association Rule Learning Algorithms?
- Why are Association Rule Learning Algorithms important?
- How do Association Rule Learning Algorithms work?
- Association Rule Learning ML Interview Q&A
What is an Association Rule Learning Algorithm?
Association rule learning is a rule-based machine learning approach that finds patterns among items that tend to occur together and maps the connections between them. It is typically applied to large databases to discover interesting relationships, such as which items are frequently bought together. Association rule learning algorithms have many real-life applications.
For instance, companies use them to understand consumer behavior and put tailored products in front of their customers. An association rule learning algorithm detects patterns in past sales, which are then used to recommend related products to customers.
Association rule learning algorithms are also called association rule mining algorithms.
Why are Association Rule Learning Algorithms important?
Association rule learning algorithms are particularly useful for predicting behaviors and uncovering relationships between variables in a dataset. They can support classification and pattern discovery, and they help find patterns that describe how features in a dataset co-occur.
How do Association Rule Learning Algorithms work?
Association rule learning algorithms work like conditional statements (for example, if A then B). Here, A is called the antecedent and B is called the consequent.
The antecedent A is an item (or set of items) found in your data, while the consequent B is what tends to occur together with the antecedent. B could take the form of an action, such as signaling that a customer is likely to be a repeat buyer, or B could be another item in the dataset. The algorithm distinguishes chance co-occurrences from important patterns by using metrics such as support, confidence and lift.
Support is the proportion (or count) of transactions that contain an item or itemset. Confidence is how often the consequent appears in transactions that already contain the antecedent. Lift compares how often a rule's items actually occur together with how often they would be expected to occur together if they were independent.
Association rule learning algorithms work in two steps. First, all frequent itemsets (sets of items whose support meets the minimum support threshold) are found. Then, association rules are generated from those frequent itemsets and kept only if they meet the minimum confidence (and, optionally, lift) threshold.
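The two steps can be illustrated with a deliberately naive sketch. It assumes transactions are lists of strings and that thresholds are fractions between 0 and 1; the function name `mine_rules` and the small grocery dataset are hypothetical. Step 1 enumerates every combination of items, which is only feasible for a handful of distinct items; real algorithms such as Apriori avoid this.

```python
from itertools import combinations

def mine_rules(transactions, min_support, min_confidence):
    """Naive two-step mining (hypothetical helper); practical only for tiny item sets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    items = sorted({item for basket in baskets for item in basket})

    def support(itemset):
        # fraction of transactions containing every item in `itemset`
        return sum(itemset <= basket for basket in baskets) / n

    # Step 1: keep every itemset whose support meets the minimum support threshold
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            s = support(itemset)
            if s >= min_support:
                frequent[itemset] = s

    # Step 2: split each frequent itemset into antecedent -> consequent rules
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                antecedent = frozenset(combo)
                consequent = itemset - antecedent
                # subsets of a frequent itemset are frequent, so these lookups succeed
                conf = s / frequent[antecedent]
                if conf >= min_confidence:
                    lift = conf / frequent[consequent]
                    rules.append((set(antecedent), set(consequent), s, conf, lift))
    return rules

# Hypothetical grocery transactions
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
for a, c, s, conf, lift in mine_rules(transactions, 0.6, 0.7):
    print(a, "->", c, f"support={s:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Raising `min_support` shrinks step 1, and raising `min_confidence` shrinks step 2, which is how the thresholds control how many rules come out.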
Association Rule Learning Algorithms ML Interview Question/Answer
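Apriori is an algorithm that finds all frequent itemsets in a dataset, that is, sets of items that are frequently transacted together and whose support is above the minimum threshold; rules derived from them are then kept if their confidence is also above the minimum threshold. It works level by level and relies on the fact that every subset of a frequent itemset must itself be frequent, so candidate itemsets with an infrequent subset can be discarded without counting them. This makes Apriori practical in scenarios with many items, where checking every possible combination would be infeasible.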
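A minimal sketch of Apriori's level-wise search follows, assuming transactions are lists of hashable items and `min_support` is a fraction. The name `apriori` here is a hypothetical helper, not a library API. The join step is simplified (it unions any two frequent (k-1)-itemsets that produce k items), and each support value is computed with its own pass over the data instead of counting all candidates in one scan.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # scans all baskets for one itemset; Apriori proper counts all candidates per scan
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items
    singles = {frozenset([item]) for basket in baskets for item in basket}
    level = {s for s in singles if support(s) >= min_support}
    frequent = {s: support(s) for s in level}

    k = 2
    while level:
        # Join step (simplified): union pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a frequent k-itemset must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Count the surviving candidates and keep those meeting the threshold
        counted = {c: support(c) for c in candidates}
        level = {c for c, s in counted.items() if s >= min_support}
        frequent.update((c, counted[c]) for c in level)
        k += 1
    return frequent
```

Calling `apriori(baskets, 0.6)` returns each frequent itemset mapped to its support; rules can then be derived from the result using confidence.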
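The frequent pattern (FP) growth algorithm finds frequent itemsets by compressing the transactions into a prefix tree (the FP-tree) and mining it depth-first, while Apriori uses a breadth-first, level-wise search that is often implemented with a hash tree for counting candidates. FP Growth is usually much faster than Apriori because it scans the dataset only twice and does not generate candidate itemsets, whereas Apriori scans the data once for every itemset size.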
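The following is a compact sketch of FP-growth, assuming a minimum absolute count (`min_count`) instead of a fractional support; `FPNode`, `build_fp_tree` and `fp_growth` are hypothetical names. It keeps the two data scans and the recursive mining of conditional pattern bases, but omits optimizations found in full implementations, such as the single-path shortcut.

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item = item          # None for the root
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, count) pairs; return frequent item counts and a header table."""
    # First scan: count each item
    counts = defaultdict(int)
    for items, c in weighted_transactions:
        for item in set(items):
            counts[item] += c
    frequent = {item: n for item, n in counts.items() if n >= min_count}

    # Second scan: insert each transaction's frequent items, most frequent first,
    # so transactions with common prefixes share a path in the tree
    root = FPNode(None, None)
    header = defaultdict(list)    # item -> every tree node holding that item
    for items, c in weighted_transactions:
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += c
    return frequent, header

def _mine(weighted_transactions, min_count, suffix):
    frequent, header = build_fp_tree(weighted_transactions, min_count)
    result = {}
    for item, count in frequent.items():
        itemset = suffix | {item}
        result[itemset] = count
        # Conditional pattern base: the prefix path above each node for `item`
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Depth-first recursion on the conditional tree
        result.update(_mine(base, min_count, itemset))
    return result

def fp_growth(transactions, min_count):
    """Return {frozenset(itemset): count} for itemsets occurring in >= min_count transactions."""
    return _mine([(t, 1) for t in transactions], min_count, frozenset())
```

No candidate itemsets are ever generated: each frequent itemset is grown from a suffix by recursing into the smaller tree built from its conditional pattern base.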
- Apriori algorithm: Generates frequent itemsets level by level (breadth-first), then derives association rules from them.
- Eclat (Equivalence Class Transformation): Stores, for each item, the set of transactions that contain it, and finds frequent itemsets depth-first by intersecting these sets.
- FP Growth: Frequent Pattern algorithms aim to extract frequent patterns from the dataset using a tree-based approach.
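Eclat's vertical, depth-first approach can be sketched as follows, assuming items are comparable strings and `min_count` is an absolute count; `eclat` is a hypothetical helper name. The support of an itemset is simply the size of the intersection of its items' transaction-id sets, so no further scans of the original data are needed after the vertical layout is built.

```python
def eclat(transactions, min_count):
    """Return {frozenset(itemset): count} using vertical tid-sets and depth-first search."""
    # Vertical format: item -> set of ids of the transactions containing it
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    result = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs, each already frequent when joined with `prefix`
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            result[itemset] = len(tids)
            # New equivalence class: intersect with the items that come after `item`
            next_class = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    next_class.append((other, common))
            if next_class:
                extend(itemset, next_class)

    start = [(item, tids) for item, tids in sorted(tidsets.items()) if len(tids) >= min_count]
    extend(frozenset(), start)
    return result
```

Because each itemset is only extended with items that sort after its last item, every frequent itemset is produced exactly once.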
- Support shows how frequently an itemset appears in a dataset, which tells you how common the items are relative to the whole dataset.
For a simple rule where A implies B, it is calculated as the number of transactions containing both A and B divided by the total number of transactions.
- Confidence differs from support in that it measures how an item connects to other items in the dataset: how often item B occurs in transactions where item A has occurred. It works like a conditional probability and is usually expressed as a percentage.
In a rule where A implies B, it is calculated as the number of transactions containing both A and B divided by the number of transactions containing A.
- Lift measures the strength of a rule compared with what random co-occurrence of the items would produce.
For a rule where A implies B, it is calculated as the support of A and B divided by the product of the support of A and the support of B. A lift of 1 means A and B are independent. A lift less than 1 means the items appear together less often than expected by chance, which can indicate that one item substitutes for the other (they can still be bought together, just less often). A lift greater than 1 means they appear together more often than expected, and the larger the value, the stronger the association.
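The three formulas above translate directly into code. This is a minimal sketch assuming transactions are lists of items; the function names and the five-basket grocery data are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(A and B) / support(A): how often B appears when A does."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """support(A and B) / (support(A) * support(B)), i.e. confidence / support(B)."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Hypothetical grocery transactions
transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
print(round(confidence(transactions, {"diapers"}, {"beer"}), 2))  # 0.75
print(round(lift(transactions, {"diapers"}, {"beer"}), 2))        # 1.25
print(round(lift(transactions, {"bread"}, {"beer"}), 2))          # 0.83
```

Here diapers and beer co-occur 25% more often than independence would predict (lift above 1), while bread and beer co-occur less often than expected (lift below 1) even though they still appear together in two baskets.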
Market basket analysis is a technique companies use to find associations between products so that they can generate more revenue by presenting related products to the customer.
A frequent itemset is a set of items whose support meets or exceeds the minimum support threshold.
Pruning removes candidate itemsets and rules that fall below the minimum thresholds you set (such as minimum support or minimum confidence). It helps you filter out irrelevant rules and reduces the work the algorithm has to do.
The algorithms are widely used in web data mining, intrusion detection, continuous production, and bioinformatics.
- Choosing the parameters and thresholds (such as minimum support and minimum confidence) is challenging.
- The algorithms can find too many rules, some of which are irrelevant, which can lower the performance of the model.
The difference between association rule mining and sequence mining is that association rule mining does not consider the order (or arrangement) of items, whereas in sequence mining order matters.
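A small sketch makes the contrast concrete. It assumes each record is a purchase history listed in buying order, and it simplifies sequence mining to sequences of single items (full sequential pattern mining handles sequences of itemsets). All function names and data are hypothetical.

```python
def itemset_support(sequences, itemset):
    """Association-rule view: each record is an unordered set of items."""
    return sum(set(itemset) <= set(seq) for seq in sequences) / len(sequences)

def is_subsequence(pattern, seq):
    """Sequence-mining view: pattern items must appear in this order (gaps allowed)."""
    remaining = iter(seq)
    # `item in remaining` consumes the iterator up to the first match
    return all(item in remaining for item in pattern)

def sequence_support(sequences, pattern):
    return sum(is_subsequence(pattern, seq) for seq in sequences) / len(sequences)

# Hypothetical purchase histories, listed in the order the items were bought
histories = [
    ["laptop", "mouse"],
    ["mouse", "laptop"],
    ["laptop", "bag", "mouse"],
]
print(itemset_support(histories, ["laptop", "mouse"]))   # 1.0
print(sequence_support(histories, ["laptop", "mouse"]))  # 0.6666666666666666
```

Treated as sets, all three histories contain the laptop and the mouse, but only two of them bought the laptop first, so the ordered pattern has lower support.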
Wrap Up
You’ve learnt about different association rule algorithms and how they differ from the Apriori algorithm in speed and approach. You’ve also seen the areas they can be applied to and their limitations. If you try out these algorithms on a large dataset, you will see which items are strongly associated with other items in the data.