Supervised Learning - Comprehensive Tutorial

Introduction to Supervised Learning

Supervised learning is a type of machine learning where the algorithm is trained on labeled data. This means that each training example is paired with an output label. The goal of supervised learning is to learn a mapping from inputs to outputs that can be used to predict the labels of new, unseen examples.

Types of Supervised Learning

Supervised learning can be divided into two main types:

  • Classification: The task is to predict discrete labels. For example, classifying emails as spam or not spam.
  • Regression: The task is to predict continuous values. For example, predicting the price of a house based on its features.

Key Concepts

Here are some key concepts in supervised learning:

  • Training Data: The dataset used to train the model.
  • Test Data: A held-out dataset, not used during training, that is used to evaluate how well the model generalizes.
  • Features: The input variables used to make predictions.
  • Labels: The output variable that the model is trying to predict.
  • Model: The mathematical representation of the relationship between features and labels.
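
To make these terms concrete, here is a minimal sketch that maps each concept to a variable. The toy data, the names features and labels, and the noise-free linear rule used to generate the labels are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset: 20 examples with 2 features each
rng = np.random.default_rng(0)
features = rng.random((20, 2))                      # Features: the inputs
labels = features @ np.array([2.0, -1.0]) + 0.5     # Labels: the outputs to predict

# Training data is used to fit the model; test data is held out for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0
)

# Model: here, a linear relationship between features and labels
model = LinearRegression().fit(X_train, y_train)

# Evaluate on the held-out test data (R-squared for regressors)
test_score = model.score(X_test, y_test)
print(X_train.shape, X_test.shape, round(test_score, 3))
```

Because the labels in this sketch are an exact linear function of the features, the test score is essentially perfect; real data will contain noise and score lower.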

Example: Linear Regression

Linear regression is a simple algorithm used for regression tasks. It models the relationship between the input features and the continuous output label by fitting a linear equation to the observed data.

Here is an example of linear regression in Python using scikit-learn:

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

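# Generate synthetic data from y = 4 + 3x plus Gaussian noise
np.random.seed(42)  # fix the seed so repeated runs give the same data
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)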

# Create and train the model
model = LinearRegression()
model.fit(X, y)

# Make predictions
X_new = np.array([[0], [2]])
y_predict = model.predict(X_new)

# Plot the results
plt.scatter(X, y)
plt.plot(X_new, y_predict, color='red')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression')
plt.show()

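The code above generates synthetic data from the line y = 4 + 3x with added Gaussian noise, trains a linear regression model on it, and plots the result. The scattered points are the data, and the red line is the fitted model, drawn between its predictions at x = 0 and x = 2.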
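
One way to check the fit is to compare the learned parameters with the values used to generate the data (intercept 4, slope 3). The following is a self-contained sketch; the seed and the use of NumPy's default_rng generator are assumptions chosen for reproducibility, so the exact numbers will differ from a run of the listing above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Recreate the same kind of synthetic data: y = 4 + 3x + noise
rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.standard_normal(100)

model = LinearRegression().fit(X, y)

# intercept_ and coef_ hold the fitted parameters of the line
intercept = model.intercept_
slope = model.coef_[0]
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

The estimates should land near 4 and 3 but not exactly on them, since the noise in the data limits how precisely the parameters can be recovered from 100 points.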

Example: Classification with Decision Trees

Decision trees are a popular algorithm for classification tasks. They work by splitting the data into subsets based on the value of input features, aiming to create subsets that are as homogeneous as possible with respect to the output label.
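
Homogeneity is usually quantified with an impurity measure. The sketch below uses Gini impurity, which is the default criterion of scikit-learn's DecisionTreeClassifier. The helper names gini_impurity and weighted_gini and the spam/ham labels are hypothetical, written only to illustrate the idea; they are not the library's internal implementation.

```python
from collections import Counter

def gini_impurity(labels):
    """Return 1 - sum(p_k^2) over the class proportions p_k in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def weighted_gini(left, right):
    """Impurity of a split: each side's impurity weighted by its size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

parent = ["spam", "spam", "spam", "ham", "ham", "ham"]

# A split that separates the classes perfectly
pure_split = weighted_gini(["spam", "spam", "spam"], ["ham", "ham", "ham"])

# A split that leaves both sides mixed
mixed_split = weighted_gini(["spam", "ham", "spam"], ["ham", "spam", "ham"])

print(gini_impurity(parent), pure_split, round(mixed_split, 3))
```

A tree-building algorithm would prefer the first split, because it lowers the impurity from 0.5 to 0, while the second split barely improves on the parent node.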

Here is an example of classification using a decision tree in Python with scikit-learn:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the model (random_state makes the fitted tree reproducible)
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

The code above loads the iris dataset, splits it into training and test sets, trains a decision tree classifier on the training data, makes predictions on the test data, and calculates the accuracy of the model.
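
Once trained, the classifier can also label a single new measurement. The sketch below retrains the same kind of model and classifies one flower; the sample values are an assumption chosen for illustration (they match a typical setosa measurement), and the integer prediction is mapped back to a species name through iris.target_names.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# One hypothetical flower: sepal length, sepal width, petal length, petal width (cm)
sample = [[5.1, 3.5, 1.4, 0.2]]

# predict returns an array of class indices, one per input row
predicted_class = clf.predict(sample)[0]
predicted_name = iris.target_names[predicted_class]
print(predicted_name)
```

Note that predict expects a 2D input (a list of samples), which is why the single flower is wrapped in an outer list.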

Evaluation Metrics

A supervised learning model should be evaluated on data it did not see during training, using a metric that matches the task. Here are some common metrics used for evaluation:

  • Accuracy: The proportion of correctly classified instances (for classification tasks).
  • Precision: The proportion of true positive predictions among all positive predictions.
  • Recall: The proportion of true positive predictions among all actual positives.
  • F1 Score: The harmonic mean of precision and recall.
  • Mean Squared Error (MSE): The average of the squared differences between the predicted and actual values (for regression tasks).
  • R-squared: The proportion of variance in the dependent variable that is predictable from the independent variables (for regression tasks).
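
The metrics above are all available in sklearn.metrics. The following sketch computes each one on small hand-made arrays; the true and predicted values are invented for illustration and do not come from any model in this tutorial.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
)

# Hypothetical binary classification results (1 = positive class)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

accuracy = accuracy_score(y_true, y_pred)    # correct predictions / all predictions
precision = precision_score(y_true, y_pred)  # true positives / predicted positives
recall = recall_score(y_true, y_pred)        # true positives / actual positives
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

# Hypothetical regression results
y_true_reg = [3.0, 5.0, 7.0, 9.0]
y_pred_reg = [2.5, 5.0, 7.5, 9.0]

mse = mean_squared_error(y_true_reg, y_pred_reg)  # mean of squared errors
r2 = r2_score(y_true_reg, y_pred_reg)             # 1 - SS_res / SS_tot

print(accuracy, precision, recall, round(f1, 3), mse, r2)
```

In this example precision (0.6) is lower than recall (0.75) because the predictions contain two false positives but only one false negative, which shows why accuracy alone can hide the kind of errors a model makes.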

Conclusion

Supervised learning is a powerful and widely used approach in machine learning. By understanding its principles and practicing with real datasets and algorithms, you can build models that make accurate predictions and draw useful insights from data.