Model Evaluation Metrics
Introduction
Model evaluation metrics are essential for assessing the performance of machine learning models. These metrics provide insights into how well the model is making predictions and help in comparing different models to select the best one. In this tutorial, we will cover various metrics used to evaluate models in supervised learning.
Confusion Matrix
A confusion matrix is a table that is often used to describe the performance of a classification model. It shows the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
Example:
Consider a binary classification problem where we are predicting whether an email is spam or not. The confusion matrix might look like this:
| | Predicted: Spam | Predicted: Not Spam |
|---|---|---|
| Actual: Spam | TP | FN |
| Actual: Not Spam | FP | TN |
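As a minimal sketch (assuming scikit-learn is available), the four counts can be read off a computed confusion matrix; the labels below are made up purely for illustration:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth and predicted labels (1 = spam, 0 = not spam)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# Rows are actual classes, columns are predicted classes;
# for binary labels [0, 1] the flattened order is TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```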
Accuracy
Accuracy is the ratio of correctly predicted instances to the total instances. It is a useful metric when the class distribution is balanced.
Formula: (TP + TN) / (TP + TN + FP + FN)
Example:
If a model correctly predicts 90 out of 100 instances, the accuracy is:
Accuracy = (90) / (100) = 0.90 or 90%
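The same number can be reproduced with a short sketch (scikit-learn assumed; the 100 labels below are constructed so that exactly 90 predictions match):

```python
from sklearn.metrics import accuracy_score

# Illustrative data: 100 instances, 90 predicted correctly
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 45 + [0] * 5 + [0] * 45 + [1] * 5  # 90 match, 10 do not

print(accuracy_score(y_true, y_pred))  # 0.9
```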
Precision
Precision is the ratio of correctly predicted positive observations to the total predicted positives. It is useful when the cost of a false positive is high.
Formula: TP / (TP + FP)
Example:
If a model correctly predicts 50 spam emails out of 60 predicted spam emails, the precision is:
Precision = 50 / 60 = 0.83 or 83%
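A minimal sketch of the same calculation, assuming scikit-learn; the counts are taken from the example above and turned into illustrative label lists:

```python
from sklearn.metrics import precision_score

# Illustrative counts: 60 emails predicted as spam, 50 of them truly spam
tp, fp = 50, 10
y_true = [1] * tp + [0] * fp   # actual labels of the predicted-spam emails
y_pred = [1] * (tp + fp)       # all 60 were predicted as spam

print(precision_score(y_true, y_pred))  # 0.833...
```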
Recall
Recall (or Sensitivity) is the ratio of correctly predicted positive observations to all observations in the actual positive class. It is useful when the cost of a false negative is high.
Formula: TP / (TP + FN)
Example:
If a model correctly predicts 50 spam emails out of 70 actual spam emails, the recall is:
Recall = 50 / 70 = 0.71 or 71%
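A matching sketch (scikit-learn assumed), again building illustrative label lists from the counts in the example:

```python
from sklearn.metrics import recall_score

# Illustrative counts: 70 actual spam emails, 50 of them caught by the model
tp, fn = 50, 20
y_true = [1] * (tp + fn)        # all 70 emails are actually spam
y_pred = [1] * tp + [0] * fn    # the model flags only 50 of them

print(recall_score(y_true, y_pred))  # 0.714...
```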
F1 Score
The F1 Score is the harmonic mean of precision and recall. It is useful when you need a balance between precision and recall.
Formula: 2 * (Precision * Recall) / (Precision + Recall)
Example:
If a model has a precision of 0.83 and recall of 0.71, the F1 Score is:
F1 Score = 2 * (0.83 * 0.71) / (0.83 + 0.71) = 0.77 or 77%
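The same harmonic mean can be checked directly in plain Python, using the exact precision and recall fractions from the earlier examples:

```python
# Harmonic mean of the precision (50/60) and recall (50/70) from above
precision, recall = 50 / 60, 50 / 70
f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 2))  # 0.77
```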
ROC Curve and AUC
The Receiver Operating Characteristic (ROC) curve is a graphical representation of a classifier's performance. The Area Under the Curve (AUC) provides an aggregate measure of performance across all classification thresholds.
ROC Curve: Plots the TPR (True Positive Rate) against the FPR (False Positive Rate) at various classification thresholds.
AUC: An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation; the greater the AUC, the better the model performance.
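A minimal sketch of computing both, assuming scikit-learn; the labels and predicted probabilities below are toy values chosen only to illustrate the calls:

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical true labels and predicted probabilities of the positive class
y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.3]

# roc_curve returns the points of the curve; roc_auc_score the area under it
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))  # 0.9375 for this toy data
```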
Mean Absolute Error (MAE)
Mean Absolute Error measures the average magnitude of errors in a set of predictions, without considering their direction.
Formula: MAE = (1/n) Σ |y_i - y_i'|
Example:
If the actual values are [3, -0.5, 2, 7] and the predicted values are [2.5, 0.0, 2, 8], the MAE is:
MAE = (1/4) * (|3-2.5| + |-0.5-0.0| + |2-2| + |7-8|) = 0.5
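This worked example can be reproduced with scikit-learn (assumed available), using the same actual and predicted values:

```python
from sklearn.metrics import mean_absolute_error

# Values from the example above
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

print(mean_absolute_error(y_true, y_pred))  # 0.5
```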
Mean Squared Error (MSE)
Mean Squared Error measures the average squared difference between actual and predicted values.
Formula: MSE = (1/n) Σ (y_i - y_i')^2
Example:
If the actual values are [3, -0.5, 2, 7] and the predicted values are [2.5, 0.0, 2, 8], the MSE is:
MSE = (1/4) * ((3-2.5)^2 + (-0.5-0.0)^2 + (2-2)^2 + (7-8)^2) = 0.375
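And the same check for MSE, again a minimal sketch assuming scikit-learn and reusing the values from the example:

```python
from sklearn.metrics import mean_squared_error

# Values from the example above
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

print(mean_squared_error(y_true, y_pred))  # 0.375
```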