Association Rule Learning | Unsupervised Learning

Introduction

Association Rule Learning is a key technique used in unsupervised machine learning to discover interesting relationships, patterns, and associations among a set of items in large datasets. It is widely used in market basket analysis, web usage mining, and bioinformatics.

Basic Concepts

Association Rule Learning involves the following basic concepts:

Itemset: A collection of one or more items.
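Support: The fraction of transactions in the dataset that contain the itemset.
Confidence: For a rule A -> B, the fraction of transactions containing A that also contain B, i.e. support(A and B together) / support(A).
Lift: The ratio of the observed support of A and B together to the support expected if A and B were independent, i.e. support(A and B together) / (support(A) × support(B)). A lift above 1 suggests a positive association, a lift below 1 a negative one.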
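The three measures can be sketched in a few lines of plain Python. This is a minimal illustration, not a library API: the function names and the toy tea/sugar transactions are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A and B together) / support(A) for the rule A -> B."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """confidence(A -> B) / support(B); 1.0 means A and B look independent."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


# Hypothetical toy data, not taken from the article.
toy_transactions = [
    {"Tea", "Sugar"},
    {"Tea", "Sugar", "Lemon"},
    {"Tea"},
    {"Sugar"},
    {"Tea", "Sugar"},
]

print(support({"Tea", "Sugar"}, toy_transactions))                  # 0.6
print(round(confidence({"Tea"}, {"Sugar"}, toy_transactions), 4))   # 0.75
print(round(lift({"Tea"}, {"Sugar"}, toy_transactions), 4))         # 0.9375
```

Here Tea and Sugar each appear in 4 of 5 transactions and together in 3, so the lift is slightly below 1 even though the confidence looks high.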

Apriori Algorithm

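One of the most popular algorithms for extracting association rules is the Apriori algorithm. It relies on the property that every subset of a frequent itemset must itself be frequent, so any candidate containing an infrequent subset can be discarded without counting it. The Apriori algorithm operates in two main steps: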

Identify frequent itemsets in the dataset using a minimum support threshold.
Generate association rules from these frequent itemsets using a minimum confidence threshold.
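The first step, level-wise frequent itemset mining, might look roughly like the sketch below. It is a simplified illustration rather than a tuned implementation: the function name `frequent_itemsets` and the A/B/C demo data are assumptions made for this example, and candidates are built by taking unions of pairs from the previous level.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    result = dict(current)
    k = 2
    while current:
        # Join: union pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune: drop candidates that have any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Count: keep only the candidates that meet the support threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


# Hypothetical demo data: every pair is frequent, the triple is not.
demo_transactions = [
    {"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"},
]
found = frequent_itemsets(demo_transactions, min_support=0.5)
print(len(found))
```

The loop stops as soon as a level produces no frequent itemsets, since no larger itemset could then be frequent.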

Example

Let's walk through a simple example of using the Apriori algorithm with a sample dataset.

Consider the following transactions:

Transaction 1: {Milk, Bread} 
Transaction 2: {Milk, Bread, Butter} 
Transaction 3: {Bread, Butter} 
Transaction 4: {Milk, Butter}

Assume a minimum support threshold of 50% and a minimum confidence threshold of 80%.

Step-by-Step Implementation

Step 1: Identify Frequent Itemsets

Calculate the support for each itemset:

{Milk} = 3/4 = 75%
{Bread} = 3/4 = 75%
{Butter} = 3/4 = 75%
{Milk, Bread} = 2/4 = 50%
{Milk, Butter} = 2/4 = 50%
{Bread, Butter} = 2/4 = 50%
{Milk, Bread, Butter} = 1/4 = 25%

Frequent itemsets with support ≥ 50% (the three-item set falls below the threshold):

{Milk}, {Bread}, {Butter}, {Milk, Bread}, {Milk, Butter}, {Bread, Butter}
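The hand-computed supports can be double-checked with a brute-force count over every subset of every transaction. This is only practical for tiny datasets like this one; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"Milk", "Bread"},
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Butter"},
]

# Count every non-empty subset of every transaction.
# Items are sorted so each itemset has one canonical tuple key.
counts = Counter()
for t in transactions:
    for size in range(1, len(t) + 1):
        for subset in combinations(sorted(t), size):
            counts[subset] += 1

min_support = 0.5
n = len(transactions)
frequent = []
for itemset, count in sorted(counts.items(), key=lambda kv: (len(kv[0]), kv[0])):
    s = count / n
    status = "frequent" if s >= min_support else "not frequent"
    if s >= min_support:
        frequent.append(itemset)
    print("{" + ", ".join(itemset) + "}" + f": {count}/{n} = {s:.0%} ({status})")
```

The only itemset that misses the 50% threshold is {Bread, Butter, Milk}, which appears in a single transaction (25%).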

Step 2: Generate Association Rules

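Generate rules from the frequent two-item sets and calculate their confidence. Each pair yields a rule in both directions; because every single item has the same support (75%), the reverse rules ({Bread} -> {Milk}, {Butter} -> {Milk}, {Butter} -> {Bread}) have the same confidence as the three shown here:

Rule: {Milk} -> {Bread}
Confidence: support({Milk, Bread}) / support({Milk}) = 50% / 75% = 66.7%

Rule: {Milk} -> {Butter}
Confidence: support({Milk, Butter}) / support({Milk}) = 50% / 75% = 66.7%

Rule: {Bread} -> {Butter}
Confidence: support({Bread, Butter}) / support({Bread}) = 50% / 75% = 66.7%

None of these rules reaches the 80% minimum confidence threshold, so with the chosen settings this dataset yields no association rules. Lowering the threshold to 60% would accept all six.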
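The rule-generation step can be sketched as follows, assuming the support counts from Step 1 are already available. The helper name `rules_from` and the dictionary layout are hypothetical choices for this illustration.

```python
from itertools import combinations

# Support counts (out of 4 transactions) copied from Step 1.
support_count = {
    frozenset({"Milk"}): 3,
    frozenset({"Bread"}): 3,
    frozenset({"Butter"}): 3,
    frozenset({"Milk", "Bread"}): 2,
    frozenset({"Milk", "Butter"}): 2,
    frozenset({"Bread", "Butter"}): 2,
}


def rules_from(support_count, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    found = []
    for itemset, count in support_count.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        # Every non-empty proper subset can serve as the antecedent.
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                consequent = itemset - antecedent
                # Counts share the same denominator, so the ratio of counts
                # equals the ratio of supports.
                conf = count / support_count[antecedent]
                if conf >= min_confidence:
                    found.append((antecedent, consequent, conf))
    return found


print(len(rules_from(support_count, 0.8)))  # 0
print(len(rules_from(support_count, 0.6)))  # 6
```

At the 80% threshold the list is empty; at 60% all six directional rules pass, each with a confidence of 2/3.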

Using Python for Association Rule Learning

We can implement the Apriori algorithm in Python using the mlxtend library. Here's a step-by-step guide:

Step 1: Install the Required Libraries

pip install mlxtend pandas

Step 2: Load and Prepare the Data

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# Each inner list is one transaction from the example above.
transactions = [['Milk', 'Bread'],
                ['Milk', 'Bread', 'Butter'],
                ['Bread', 'Butter'],
                ['Milk', 'Butter']]

# One-hot encode: one boolean column per item, one row per transaction.
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)
print(df)

   Bread  Butter   Milk
0   True   False   True
1   True    True   True
2   True    True  False
3  False    True   True

Step 3: Apply the Apriori Algorithm

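from mlxtend.frequent_patterns import apriori, association_rules

# Keep itemsets found in at least 50% of transactions; use_colnames=True
# reports item names instead of column indices.
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
print(frequent_itemsets)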

   support         itemsets
0     0.75          (Bread)
1     0.75         (Butter)
2     0.75           (Milk)
3     0.50  (Bread, Butter)
4     0.50   (Milk, Butter)
5     0.50    (Milk, Bread)

Step 4: Generate Association Rules

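# The 80% confidence threshold from the walkthrough would return no rules here,
# so a lower threshold of 0.6 is used to display all six candidate rules.
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.6)
print(rules)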

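  antecedents consequents  ...      lift  leverage  conviction
0     (Bread)    (Butter)  ...  0.888889   -0.0625        0.75
1    (Butter)     (Bread)  ...  0.888889   -0.0625        0.75
2      (Milk)    (Butter)  ...  0.888889   -0.0625        0.75
3    (Butter)      (Milk)  ...  0.888889   -0.0625        0.75
4      (Milk)     (Bread)  ...  0.888889   -0.0625        0.75
5     (Bread)      (Milk)  ...  0.888889   -0.0625        0.75

Every rule has a lift below 1 (66.7% / 75% ≈ 0.89), meaning each pair of items occurs together slightly less often than independence would predict. Row order and the exact set of metric columns can vary between mlxtend versions.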
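The lift, leverage, and conviction columns can be recomputed by hand from the three supports involved in a rule. The sketch below uses the standard definitions of these metrics; the helper name `rule_metrics` is hypothetical and is not part of mlxtend.

```python
def rule_metrics(support_a, support_b, support_ab):
    """Confidence, lift, leverage, and conviction for a rule A -> B."""
    confidence = support_ab / support_a
    lift = confidence / support_b
    # Leverage: observed joint support minus the support expected
    # under independence.
    leverage = support_ab - support_a * support_b
    # Conviction is infinite when the rule never fails (confidence == 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support_b) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Supports for {Milk} -> {Bread} from the worked example.
metrics = rule_metrics(support_a=0.75, support_b=0.75, support_ab=0.50)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because all single items share a support of 75% and all pairs a support of 50%, the same values (lift ≈ 0.889, leverage = -0.0625, conviction = 0.75) apply to each of the six rules.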

Conclusion

Association Rule Learning is a powerful technique for discovering interesting relationships in large datasets. By understanding the basic concepts and algorithms like Apriori, you can uncover valuable insights and patterns that can inform decision-making in various fields, from retail to healthcare.