Published: 2024-09-14
# Practical Exercise: Time Series ARIMA Model Implementation for Sales Forecasting
## 2.1 Principle and Steps of the ARIMA Model
### 2.1.1 Stationarity Test of Time Series
Before establishing an ARIMA model, it is necessary to conduct a stationarity test on the time series. Stationarity means that the series' mean, variance, and autocorrelation structure remain constant over time. Common stationarity test methods include:
- **Unit root test:** To determine if a time series contains a unit root, indicating non-stationarity, using the ADF (Augmented Dickey-Fuller) test or KPSS (Kwiatkowski-Phillips-Schmidt-Shin) test.
- **Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF):** Observing the decay rate of the autocorrelation coefficients in ACF and PACF plots; a slow decay suggests the series is likely non-stationary.
### 2.1.2 Estimation and Selection of Model Parameters
The parameters of the ARIMA model include:
- **p:** The order of autoregression, indicating the linear relationship between the current value and the past p values in the time series.
- **d:** The order of differencing, indicating the number of times the time series needs to be differenced to achieve stationarity.
- **q:** The order of the moving average, indicating the linear relationship between the current value and the past q residuals in the time series.
Parameter estimation typically employs maximum likelihood, which for Gaussian errors is equivalent to minimizing the sum of squared residuals. The optimal model can then be selected using information criteria such as the AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion).
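The two criteria mentioned above have simple closed forms, AIC = 2k − 2·ln(L̂) and BIC = k·ln(n) − 2·ln(L̂), where k is the number of estimated parameters, n the sample size, and L̂ the maximized likelihood. A minimal sketch (helper names are my own):

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k*ln(n) - 2*ln(L)."""
    return k * math.log(n) - 2 * log_likelihood

# Lower values indicate a better trade-off between fit and complexity.
print(aic(-100.0, 3))   # → 206.0
print(bic(-100.0, 3, 120))
```

Because BIC's penalty grows with ln(n), it tends to pick more parsimonious models than AIC on long series.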
# 2. ARIMA Model Theory and Practice
## 2.1 Principle and Steps of the ARIMA Model
### 2.1.1 Stationarity Test of Time Series
The stationarity of a time series refers to the constancy of the mean, variance, and autocorrelation coefficient over time. Stationarity testing is the foundation for establishing the ARIMA model, and common methods include:
- **ADF Test:** To test if a time series contains a unit root, indicating non-stationarity.
- **KPSS Test:** To test if a time series is stationary, indicating the absence of a unit root.
**Code Block:**
```python
import statsmodels.api as sm

# ADF Test (null hypothesis: the series has a unit root)
def adf_test(timeseries):
    print('ADF Test Results:')
    result = sm.tsa.stattools.adfuller(timeseries)
    print('ADF Statistic: {}'.format(result[0]))
    print('p-value: {}'.format(result[1]))
    print('Critical Values:')
    for key, value in result[4].items():
        print('\t{}: {}'.format(key, value))

# KPSS Test (null hypothesis: the series is stationary)
def kpss_test(timeseries):
    print('KPSS Test Results:')
    result = sm.tsa.stattools.kpss(timeseries)
    print('KPSS Statistic: {}'.format(result[0]))
    print('p-value: {}'.format(result[1]))
    print('Critical Values:')
    for key, value in result[3].items():
        print('\t{}: {}'.format(key, value))
```
**Logical Analysis:**
The ADF and KPSS tests have opposite null hypotheses: the ADF test assumes the time series has a unit root (non-stationarity), while the KPSS test assumes the series is stationary. If the p-value from the ADF test is less than 0.05, the unit-root hypothesis is rejected, suggesting stationarity; if the p-value from the KPSS test is less than 0.05, the stationarity hypothesis is rejected, suggesting non-stationarity. Applying both tests together guards against each one's blind spots.
### 2.1.2 Estimation and Selection of Model Parameters
The parameters of the ARIMA model include the autoregressive order (p), differencing order (d), and moving average order (q). Parameter estimation and selection typically follow these steps:
1. **Autocorrelation Analysis:** Analyze the autocorrelation coefficient plot and partial autocorrelation coefficient plot to determine the autoregressive order (p) and moving average order (q).
2. **Differencing Analysis:** Difference the time series to remove non-stationarity and determine the differencing order (d).
3. **Parameter Estimation:** Estimate model parameters using the maximum likelihood method.
4. **Model Selection:** Choose the optimal model based on the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC).
**Code Block:**
```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import pmdarima as pm

# Autocorrelation Analysis
def acf_pacf_plot(timeseries):
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 6))
    sm.graphics.tsa.plot_acf(timeseries, ax=ax1)
    ax1.set_title('Autocorrelation Function')
    sm.graphics.tsa.plot_pacf(timeseries, ax=ax2)
    ax2.set_title('Partial Autocorrelation Function')
    plt.show()

# Model Parameter Estimation: fit a single ARIMA(p, d, q) model
def arima_model(timeseries, p, d, q):
    model = pm.ARIMA(order=(p, d, q)).fit(timeseries)
    print('ARIMA Model Summary:')
    print(model.summary())
    return model

# Model Selection: grid-search (p, d, q) and keep the lowest AIC/BIC
def model_selection(timeseries):
    results = []
    for p in range(0, 5):
        for d in range(0, 3):
            for q in range(0, 5):
                try:
                    model = pm.ARIMA(order=(p, d, q)).fit(timeseries)
                    results.append(((p, d, q), model.aic(), model.bic()))
                except Exception:
                    continue  # skip orders that fail to converge
    best_aic_order = min(results, key=lambda r: r[1])[0]
    best_bic_order = min(results, key=lambda r: r[2])[0]
    print('Best AIC order: {}'.format(best_aic_order))
    print('Best BIC order: {}'.format(best_bic_order))
    return best_aic_order, best_bic_order
```