Confusion Matrix Tutorial
Introduction
Evaluating a classification model is a core step in any machine learning workflow, and the confusion matrix is one of the most common and informative tools for doing so. In this tutorial, we will explore what a confusion matrix is, how to interpret it, and how to create one using Python.
What is a Confusion Matrix?
A confusion matrix is a table used to describe the performance of a classification model on a set of test data for which the true values are known. It allows visualization of the performance of an algorithm and helps in understanding the types of errors being made.
Structure of a Confusion Matrix
The confusion matrix is a 2x2 table for binary classification problems, with the following structure:
| | Predicted: Yes | Predicted: No |
|---|---|---|
| Actual: Yes | True Positive (TP) | False Negative (FN) |
| Actual: No | False Positive (FP) | True Negative (TN) |
Here:
- True Positive (TP): The model predicts the positive class, and the actual class is positive.
- False Negative (FN): The model predicts the negative class, but the actual class is positive.
- False Positive (FP): The model predicts the positive class, but the actual class is negative.
- True Negative (TN): The model predicts the negative class, and the actual class is negative.
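The four cells can be counted directly from paired arrays of actual and predicted labels. The following is a minimal sketch, assuming the labels are encoded as 1 (positive) and 0 (negative); the function name `count_outcomes` and the sample labels are hypothetical and used only for illustration.

```python
import numpy as np


def count_outcomes(y_true, y_pred):
    """Count (TP, FN, FP, TN) for binary labels.

    Assumption for this sketch: 1 marks the positive class, 0 the negative class.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # actual positive, predicted positive
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # actual positive, predicted negative
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # actual negative, predicted positive
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # actual negative, predicted negative
    return tp, fn, fp, tn


# Illustrative labels, made up for this sketch
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0]
tp, fn, fp, tn = count_outcomes(y_true, y_pred)
print(tp, fn, fp, tn)  # 2 1 1 2
```

Every sample falls into exactly one of the four cells, so the counts always sum to the number of samples.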
Metrics Derived from the Confusion Matrix
The confusion matrix can be used to calculate several important metrics:
- Accuracy: \( \frac{TP + TN}{TP + TN + FP + FN} \)
- Precision: \( \frac{TP}{TP + FP} \)
- Recall: \( \frac{TP}{TP + FN} \)
- F1 Score: \( \frac{2 \cdot (Precision \cdot Recall)}{Precision + Recall} \)
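As a worked example, the formulas above can be applied to a set of counts in a few lines of Python. This is a sketch under stated assumptions: the function name `classification_metrics` and the counts passed to it are hypothetical, and returning 0.0 when a denominator is zero is one common convention, not the only possible choice.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall, and F1 from the four cell counts.

    Assumption for this sketch: a metric with a zero denominator is reported as 0.0.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts chosen for illustration
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.700
# precision: 0.667
# recall: 0.800
# f1: 0.727
```

With these counts the model finds 80% of the actual positives (recall), but only two thirds of its positive predictions are correct (precision), a trade-off that the single accuracy figure of 0.7 does not reveal.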
Example: Creating a Confusion Matrix in Python
Let's see how to create a confusion matrix in Python using the scikit-learn library:
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix

    # Example data: true labels and model predictions
    y_true = np.array([0, 1, 0, 1, 0, 1, 0, 1])
    y_pred = np.array([0, 0, 0, 1, 0, 1, 1, 1])

    # Compute confusion matrix (rows: actual labels, columns: predicted labels)
    cm = confusion_matrix(y_true, y_pred)

    # Display confusion matrix
    disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[0, 1])
    disp.plot()
    plt.show()  # opens the figure when the code is run as a script
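The above code computes the following confusion matrix and plots it as a color-coded grid:
| | Predicted: 0 | Predicted: 1 |
|---|---|---|
| Actual: 0 | 3 | 1 |
| Actual: 1 | 1 | 3 |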
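In this example, scikit-learn orders the rows and columns by label value, so the matrix reads [[TN, FP], [FN, TP]]: 3 true negatives, 1 false positive, 1 false negative, and 3 true positives. This places the negative class first, unlike the structure table earlier in this tutorial, which lists the positive class first.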
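To work with the individual counts rather than the plot, the matrix can be unpacked with `ravel()` and cross-checked against scikit-learn's scoring functions. The sketch below reuses the example data from this tutorial; the variable name `cm_pos_first` is an illustrative choice, and label 1 is assumed to be the positive class.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

y_true = np.array([0, 1, 0, 1, 0, 1, 0, 1])
y_pred = np.array([0, 0, 0, 1, 0, 1, 1, 1])

# With labels 0 and 1, the flattened matrix is ordered TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3

# Passing labels=[1, 0] puts the positive class first: [[TP, FN], [FP, TN]]
cm_pos_first = confusion_matrix(y_true, y_pred, labels=[1, 0])

# scikit-learn's scoring functions treat label 1 as the positive class by default
accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(accuracy, precision, recall, f1)  # 0.75 0.75 0.75 0.75
```

Because this example has the same number of errors of each kind, all four metrics happen to coincide at 0.75; on imbalanced data they typically diverge.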
Conclusion
Understanding and using a confusion matrix is essential for evaluating the performance of classification models. It shows which types of errors your model is making, detail that a single accuracy figure hides, and so points to where the model needs improvement. With Python's scikit-learn, you can create and interpret confusion matrices in a few lines of code, leading to better model evaluation and refinement.