ARIMA Models Tutorial
Introduction to ARIMA Models
ARIMA stands for AutoRegressive Integrated Moving Average. It is a class of models that explains a time series using its own past values (the AR terms), differencing of the series to remove trends (the I part), and past forecast errors (the MA terms). ARIMA models are widely used in time series forecasting.
Understanding the Components of ARIMA
An ARIMA model is characterized by three parameters: (p, d, q).
- p: The number of lag observations included in the model (AR part).
- d: The number of times that the raw observations are differenced (I part).
- q: The number of lagged forecast errors included in the model (MA part).
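To make the three parameters concrete, the sketch below simulates an ARIMA(1, 1, 1) series by hand with NumPy. The function name `simulate_arima_111`, the coefficient values `phi` and `theta`, and the seed are illustrative assumptions, not values taken from any dataset in this tutorial.

```python
import numpy as np

def simulate_arima_111(n=200, phi=0.5, theta=0.3, seed=0):
    """Simulate an ARIMA(1, 1, 1) path (illustrative parameter values)."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(size=n)          # white-noise shocks (the forecast errors)
    w = np.zeros(n)                   # stationary ARMA(1, 1) part
    for t in range(1, n):
        # p = 1: one lag of w; q = 1: one lag of the shock
        w[t] = phi * w[t - 1] + eps[t] + theta * eps[t - 1]
    y = np.cumsum(w)                  # d = 1: integrate once to get the observed level
    return y, w

y, w = simulate_arima_111()
# Differencing y once recovers the stationary ARMA part (apart from the first value).
print(np.allclose(np.diff(y), w[1:]))
```

Read from the bottom up, this is what the model assumes: difference the observed series d times, and what remains behaves like an ARMA(p, q) process.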
Step-by-Step Guide to Building an ARIMA Model
Step 1: Importing Libraries
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
Step 2: Load and Visualize the Data
For this tutorial, we'll use a sample time series dataset stored in sample_time_series.csv, with a Date column (used as the index) and a Values column.
# Load dataset
data = pd.read_csv('sample_time_series.csv', index_col='Date', parse_dates=True)
# Visualize the data
plt.figure(figsize=(10, 6))
plt.plot(data)
plt.title('Sample Time Series Data')
plt.xlabel('Date')
plt.ylabel('Values')
plt.show()
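The CSV file itself is not included with this tutorial. If you do not have one at hand, a stand-in with the same layout can be generated. The sketch below is only an assumption about what such a file might contain (100 daily observations of a random walk with drift); the helper name `make_sample_series` is hypothetical.

```python
import numpy as np
import pandas as pd

def make_sample_series(n=100, start="2000-01-01", seed=0):
    """Build a stand-in dataset: a random walk with drift, indexed by date."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range(start, periods=n, freq="D", name="Date")
    values = 50 + np.cumsum(0.1 + rng.normal(size=n))
    return pd.DataFrame({"Values": values}, index=dates)

sample = make_sample_series()
print(sample.head())
# Uncomment to write a file in the layout that the pd.read_csv call in this step expects:
# sample.to_csv("sample_time_series.csv")
```

Results obtained with this synthetic series will not match the sample output shown in later steps.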
Step 3: Stationarity Check
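Before fitting an ARIMA model, we need to check whether the time series is stationary. We can use the Augmented Dickey-Fuller (ADF) test for this purpose. Its null hypothesis is that the series has a unit root (is non-stationary), so a p-value below a chosen threshold such as 0.05 suggests that the series is stationary.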
from statsmodels.tsa.stattools import adfuller
result = adfuller(data['Values'])
print('ADF Statistic:', result[0])
print('p-value:', result[1])
ADF Statistic: -3.123456
p-value: 0.012345
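Reading the result comes down to comparing the p-value with a significance level. A minimal sketch of that decision rule follows; the helper name `adf_decision` and the 0.05 default are assumptions made for illustration.

```python
def adf_decision(p_value, alpha=0.05):
    """Turn an ADF p-value into a plain-language decision.

    The ADF null hypothesis is that the series has a unit root
    (is non-stationary), so a small p-value is evidence of stationarity.
    """
    if p_value < alpha:
        return "reject unit root: treat the series as stationary"
    return "cannot reject unit root: consider differencing"

# p-value taken from the sample output shown in this step
print(adf_decision(0.012345))
# A hypothetical larger p-value, for contrast
print(adf_decision(0.40))
```

With the sample output above, the test rejects a unit root at the 5% level, so differencing would be needed only if the p-value had been larger.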
Step 4: Differencing the Data
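If the ADF test does not reject a unit root (for example, the p-value is above 0.05), we need to difference the series, replacing each value with its change from the previous one. Differencing can be done as follows: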
# Differencing the data
data_diff = data.diff().dropna()
# Visualize the differenced data
plt.figure(figsize=(10, 6))
plt.plot(data_diff)
plt.title('Differenced Time Series Data')
plt.xlabel('Date')
plt.ylabel('Differenced Values')
plt.show()
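What diff() computes is easiest to see on a handful of numbers. The toy values below are made up for illustration; the sketch shows first and second differences and how a running sum undoes the first difference.

```python
import pandas as pd

# Toy values chosen for illustration only
s = pd.Series([10.0, 12.0, 15.0, 19.0, 24.0])

first_diff = s.diff().dropna()          # y_t - y_{t-1}
second_diff = s.diff().diff().dropna()  # difference of the differences (d = 2)

print(first_diff.tolist())   # [2.0, 3.0, 4.0, 5.0]
print(second_diff.tolist())  # [1.0, 1.0, 1.0]

# Undo first-order differencing: start value plus the running sum of changes
restored = s.iloc[0] + first_diff.cumsum()
print(restored.tolist())     # [12.0, 15.0, 19.0, 24.0]
```

Each round of differencing shortens the series by one observation, which is why dropna() is applied after diff().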
Step 5: Fit the ARIMA Model
Now, we can fit the ARIMA model. For this example, we'll use ARIMA(1,1,1). Because d = 1, the model differences the series internally, so we pass the original data rather than data_diff.
# Fit the ARIMA model
model = ARIMA(data, order=(1, 1, 1))
model_fit = model.fit()
# Summary of the model
print(model_fit.summary())
                               SARIMAX Results
==============================================================================
Dep. Variable:                  Values   No. Observations:                 100
Model:                  ARIMA(1, 1, 1)   Log Likelihood               -120.000
Date:                 Tue, 01 Jan 2023   AIC                           246.000
Time:                         00:00:00   BIC                           254.000
Sample:                     01-01-2000   HQIC                          249.000
                          - 12-31-2000
Covariance Type:                   opg
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.1234      0.056      2.200      0.028       0.013       0.234
ar.L1         -0.4567      0.123     -3.713      0.000      -0.698      -0.215
ma.L1          0.5678      0.098      5.791      0.000       0.375       0.760
sigma2         1.2345      0.234      5.278      0.000       0.776       1.693
==============================================================================
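The columns of the coefficient table are related to each other: z is the coefficient divided by its standard error, P>|z| is the two-sided p-value of that z under a standard normal, and the last two columns are the coefficient plus or minus about 1.96 standard errors. The sketch below recomputes these for the ar.L1 row using only the standard library; the helper name `wald_summary` is hypothetical.

```python
import math

def wald_summary(coef, std_err, level_z=1.959964):
    """Recompute z, two-sided p-value and 95% interval from coef and std err."""
    z = coef / std_err
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    lower = coef - level_z * std_err
    upper = coef + level_z * std_err
    return z, p_value, lower, upper

# ar.L1 row from the summary above
z, p, lo, hi = wald_summary(-0.4567, 0.123)
print(f"z={z:.3f}  p={p:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Small last-digit differences from the table are expected, because the table prints rounded standard errors. An interval that excludes zero, as here, corresponds to a p-value below 0.05.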
Step 6: Diagnostic Plots
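We can use diagnostic plots to check that the residuals of the model are approximately normally distributed and uncorrelated. plot_diagnostics draws four panels: the standardized residuals over time, a histogram with a density estimate, a normal Q-Q plot, and a correlogram.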
# Diagnostic plots
model_fit.plot_diagnostics(figsize=(15, 12))
plt.show()
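The "uncorrelated" part of that check can also be done numerically. The sketch below computes sample autocorrelations with NumPy and flags lags that fall outside the approximate 95% band of plus or minus 1.96 / sqrt(n). It uses simulated white noise as stand-in residuals; with a fitted model, model_fit.resid could be passed instead. The helper name `sample_acf` is hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag=10):
    """Sample autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# Stand-in residuals (white noise), used here only so the example runs on its own
rng = np.random.default_rng(0)
resid = rng.normal(size=200)

acf = sample_acf(resid)
band = 1.96 / np.sqrt(len(resid))       # approximate 95% band under no autocorrelation
outside = np.flatnonzero(np.abs(acf[1:]) > band) + 1
print("lags outside the band:", outside.tolist())
```

An occasional lag outside the band is expected by chance; several, or a clear pattern across lags, suggests the chosen (p, d, q) leaves structure in the residuals.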
Step 7: Forecasting
Finally, we can use the fitted ARIMA model to make forecasts.
# Forecasting
forecast = model_fit.forecast(steps=10)
print(forecast)
# Plot the forecast
plt.figure(figsize=(10, 6))
plt.plot(data, label='Original')
plt.plot(forecast, label='Forecast')
plt.title('Forecasting using ARIMA Model')
plt.xlabel('Date')
plt.ylabel('Values')
plt.legend()
plt.show()
