Unsupervised Learning | Machine Learning | Project Integration Tutorial

Introduction to Unsupervised Learning

Unsupervised learning is a type of machine learning in which the model is given no labeled training data. Instead, it looks for patterns and structure in the data on its own. Its typical uses are clustering, association rule learning, and dimensionality reduction.

Clustering

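Clustering is the task of grouping a set of objects so that objects in the same group are more similar to each other than to those in other groups. One of the most commonly used clustering algorithms is K-means, which assigns each point to its nearest cluster center, moves each center to the mean of its assigned points, and repeats until the assignments stop changing.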

Example: K-means Clustering

import numpy as np
from sklearn.cluster import KMeans

# Sample data
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

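# K-means clustering with two clusters; random_state makes the run reproducible.
# n_init is set explicitly because its default differs between scikit-learn versions.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster index assigned to each sample
print(kmeans.cluster_centers_)  # coordinates of the two cluster centers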

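Output (cluster numbering is arbitrary, so the labels 0 and 1, and the order of the centers, may be swapped):
[0 0 0 1 1 1]
[[ 1.  2.]
 [10.  2.]]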
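
To make the assign-and-update loop behind K-means concrete, here is a minimal pure-Python sketch of the classic iteration (often called Lloyd's algorithm). It is not scikit-learn's implementation: the function names and the choice to pass the starting centers in by hand are assumptions made for illustration, whereas scikit-learn picks starting centers with k-means++ by default.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, initial_centers, max_iter=100):
    """Illustrative K-means loop; returns (labels, centers)."""
    centers = [tuple(float(x) for x in c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == j]
            if not members:
                new_centers.append(old)  # leave an empty cluster's center in place
                continue
            new_centers.append(
                tuple(sum(p[d] for p in members) / len(members) for d in range(len(old)))
            )
        if new_centers == centers:  # converged: nothing moved
            break
        centers = new_centers
    return labels, centers


points = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)]
# Hypothetical initialization: start from the first and the last point.
labels, centers = lloyd_kmeans(points, initial_centers=[points[0], points[-1]])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)  # [(1.0, 2.0), (10.0, 2.0)]
```

The result depends on the starting centers. Starting from the first two points, which lie in the same group, this loop settles on a worse split of the data, which is one reason library implementations use more careful initialization and several restarts.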

Dimensionality Reduction

Dimensionality reduction is the process of reducing the number of variables (features) under consideration while keeping as much of the relevant information as possible. Principal Component Analysis (PCA), a commonly used technique for this purpose, projects the data onto the orthogonal directions along which it varies the most.

Example: Principal Component Analysis (PCA)

import numpy as np
from sklearn.decomposition import PCA

# Sample data
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# PCA transformation: keep only the direction of largest variance
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X)  # centers the data, then projects it
print(X_reduced)

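Output (up to floating-point rounding; the overall sign may be flipped depending on the scikit-learn version):
[[-4.5]
 [-4.5]
 [-4.5]
 [ 4.5]
 [ 4.5]
 [ 4.5]]

The data varies most along the first feature, so the principal axis coincides with that feature's axis. After centering on its mean of 5.5, each point projects to -4.5 or 4.5.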
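
The projection can be checked by hand. The following is a rough pure-Python sketch of the steps behind PCA with one component: center the data, build the covariance matrix, find its leading eigenvector, and project. The function name is hypothetical, and power iteration is used here only as a simple stand-in for the matrix decomposition a library would perform; it assumes the starting vector is not orthogonal to the leading eigenvector.

```python
import math


def first_principal_component(rows, iterations=100):
    """Illustrative PCA with one component; returns (direction, scores)."""
    n = len(rows)
    dims = len(rows[0])
    # Center each feature on its mean.
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    centered = [[r[d] - means[d] for d in range(dims)] for r in rows]
    # Sample covariance matrix (divides by n - 1).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]
    # Power iteration: multiply by the covariance matrix and renormalize
    # until the vector lines up with the direction of largest variance.
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project the centered data onto that direction.
    scores = [sum(c[d] * v[d] for d in range(dims)) for c in centered]
    return v, scores


X_rows = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]
direction, scores = first_principal_component(X_rows)
print([round(x, 3) for x in direction])  # [1.0, 0.0]
print([round(s, 3) for s in scores])     # [-4.5, -4.5, -4.5, 4.5, 4.5, 4.5]
```

For this sample the two features are uncorrelated and the first has the larger variance (24.3 versus 3.2), so the principal direction is simply the first feature's axis.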

Association

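Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases. It identifies strong rules using measures of interestingness such as support (how often an itemset occurs), confidence (how often the rule holds when its antecedent occurs), and lift (how much more often the items occur together than expected if they were independent). The Apriori algorithm is one of the most popular algorithms for this purpose.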

Example: Apriori Algorithm

from mlxtend.frequent_patterns import apriori, association_rules
import pandas as pd

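# Sample data: one column per item, one row per transaction (1 = item present)
dataset = {'Milk': [1, 1, 0, 1, 0],
           'Bread': [1, 0, 0, 1, 1],
           'Butter': [0, 0, 1, 1, 1]}
# mlxtend works best with boolean columns
df = pd.DataFrame(dataset).astype(bool)

# Apriori algorithm: find itemsets present in at least 60% of transactions
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
# Derive rules from the frequent itemsets, keeping those with lift >= 1
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
print(rules)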

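Output (the exact column list depends on the mlxtend version):
Empty DataFrame
Columns: [antecedents, consequents, antecedent support, consequent support, support, confidence, lift, leverage, conviction]
Index: []

No rules are produced: at min_support=0.6 only the single items are frequent, and a rule needs a frequent itemset with at least two items.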
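
The measures behind that result can be computed directly. This is a small pure-Python sketch, independent of mlxtend, that evaluates support, confidence, and lift on the same toy transactions; the helper function names are assumptions chosen for illustration.

```python
# Same toy transactions as in the example above (1 = item present).
dataset = {
    "Milk":   [1, 1, 0, 1, 0],
    "Bread":  [1, 0, 0, 1, 1],
    "Butter": [0, 0, 1, 1, 1],
}
n_transactions = len(dataset["Milk"])


def support(items):
    """Fraction of transactions that contain every item in `items`."""
    hits = sum(
        all(dataset[item][row] for item in items) for row in range(n_transactions)
    )
    return hits / n_transactions


def confidence(antecedent, consequent):
    """How often the consequent appears when the antecedent does."""
    return support(antecedent + consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(antecedent, consequent) / support(consequent)


print(support(("Milk",)))                           # 0.6
print(support(("Milk", "Bread")))                   # 0.4
print(round(confidence(("Milk",), ("Bread",)), 3))  # 0.667
print(round(lift(("Milk",), ("Bread",)), 3))        # 1.111
```

Each single item appears in 3 of the 5 transactions, but the most frequent pairs, {Milk, Bread} and {Bread, Butter}, appear in only 2. Lowering min_support to 0.4 in the apriori call would make those pairs frequent, so rules between their items could then be generated.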

Conclusion

Unsupervised learning is a powerful tool for discovering hidden patterns and relationships in data. By leveraging techniques such as clustering, dimensionality reduction, and association rule learning, we can gain valuable insights and make informed decisions in various applications.
