Stacking - Ensemble Learning
Introduction
Stacking (also known as stacked generalization) is a powerful ensemble learning technique that improves predictive performance by combining multiple base models. Unlike other ensemble methods such as bagging and boosting, stacking trains a meta-model to aggregate the predictions of several base models. The meta-model learns to make better predictions by interpreting the outputs of the base models.
Concept of Stacking
Stacking consists of two levels:
- Level 0: multiple base learners, each trained on the training data.
- Level 1: a meta-learner trained to combine the predictions of the Level 0 base learners.
The goal of the meta-learner is to learn how to best combine the base models' predictions to minimize the overall error.
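To make the two levels concrete, here is a minimal illustrative sketch in scikit-learn (the particular base learners are an arbitrary choice; any classifiers would do). Note that it deliberately skips the cross-validation scheme described in the next section, so it leaks training information:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Level 0: train the base learners on the training data
base_learners = [KNeighborsClassifier(n_neighbors=3), SVC(kernel='linear')]
for model in base_learners:
    model.fit(X, y)

# Each base learner's prediction becomes one column of the meta-features
meta_features = np.column_stack([model.predict(X) for model in base_learners])

# Level 1: the meta-learner is trained on the base learners' outputs.
# CAUTION: predicting on the same data the base learners were trained on
# leaks information; the K-fold procedure in the next section avoids this.
meta_learner = LogisticRegression().fit(meta_features, y)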
How Stacking Works
The process of stacking can be summarized in the following steps (a hand-rolled sketch of the procedure follows the list):
1. Split the training data into K folds.
2. Train each base model on K-1 folds.
3. Collect each base model's predictions on the remaining (held-out) fold.
4. Repeat steps 2 and 3 for each fold, so that every training sample receives an out-of-fold prediction from each base model.
5. Train the meta-learner using the collected out-of-fold predictions as features.
6. Refit the base models on the full training data, then make predictions on the test data by passing the base models' outputs to the meta-learner.
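These steps can be implemented by hand with scikit-learn's cross_val_predict, which returns exactly the out-of-fold predictions described above. This is a minimal sketch to illustrate the mechanics; the StackingClassifier used in the next section automates all of it:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = [KNeighborsClassifier(n_neighbors=3), SVC(kernel='linear')]

# Steps 1-4: out-of-fold predictions for each base model (K=5 folds)
oof = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5) for m in base_models
])

# Step 5: train the meta-learner on the out-of-fold predictions
meta = LogisticRegression().fit(oof, y_train)

# Step 6: refit the base models on all training data, then build the
# meta-features for the test set and score the meta-learner on them
test_features = np.column_stack([
    m.fit(X_train, y_train).predict(X_test) for m in base_models
])
print('Manual stacking accuracy:', meta.score(test_features, y_test))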
Example: Stacking with Python
Let's see a practical example of stacking using Python and the popular machine learning library scikit-learn.
First, we need to import the necessary libraries:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
Next, we load the Iris dataset and split it into training and test sets:
# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Now, we define the base models and the meta-learner:
# Define base models
base_models = [
    ('knn', KNeighborsClassifier(n_neighbors=3)),
    ('svm', SVC(kernel='linear', probability=True))
]

# Define meta-learner
meta_learner = LogisticRegression()

# Create Stacking Classifier
stacking_clf = StackingClassifier(estimators=base_models, final_estimator=meta_learner)
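Two optional parameters of StackingClassifier are worth knowing about: cv controls the internal cross-validation used to generate the meta-features (5-fold by default), and passthrough decides whether the meta-learner also sees the original features alongside the base models' predictions. An equivalent construction with these defaults spelled out:

# Equivalent construction with the defaults made explicit
stacking_clf = StackingClassifier(
    estimators=base_models,
    final_estimator=meta_learner,
    cv=5,               # K folds used to build the out-of-fold meta-features
    passthrough=False,  # set True to also feed the raw features to the meta-learner
)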
Finally, we train the stacking classifier and evaluate its performance:
# Train the stacking classifier
stacking_clf.fit(X_train, y_train)

# Evaluate the model
accuracy = stacking_clf.score(X_test, y_test)
print(f'Accuracy: {accuracy:.2f}')
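Whether stacking actually outperforms its components depends on the data, so it is worth scoring the base models on their own for comparison (reusing the variables defined above):

# Compare the stacked model against each base model trained alone
for name, model in base_models:
    model.fit(X_train, y_train)
    print(f'{name} accuracy: {model.score(X_test, y_test):.2f}')

Stacking tends to help most when the base models are diverse, that is, when their errors are not strongly correlated.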
Conclusion
Stacking is an effective ensemble learning method that leverages the strengths of multiple models to improve predictive performance. By using a meta-learner to combine the predictions from base models, stacking can often outperform individual models. It's a versatile technique that can be applied to various machine learning tasks and is a valuable tool in a data scientist's toolkit.