Analyzing and forecasting tertiary-industry GDP with an ARIMA model, with code
Date: 2023-12-25 10:04:02
First, import the required libraries, including pandas, matplotlib, and statsmodels:
```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
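# The legacy statsmodels.tsa.arima_model.ARIMA class no longer runs in current statsmodels releases.
from statsmodels.tsa.arima.model import ARIMA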
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
```
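Next, read the tertiary-industry GDP data with pandas, then preprocess and plot it. The code assumes the CSV has a `date` column and a `GDP` column.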
```
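# Load the data; the file is expected to contain 'date' and 'GDP' columns
df = pd.read_csv('third_industry_GDP.csv')
# Forward-fill missing values (fillna(method='ffill') is deprecated in recent pandas)
df = df.ffill()
# Parse the dates and use them as a sorted time index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df.sort_index(inplace=True)
df.plot(figsize=(12,8))
plt.show()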
```
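To try the workflow without the real file, a synthetic stand-in with the same layout can be generated. This is only a sketch: the values are made up (a random walk with drift, not real GDP figures), the monthly frequency is an assumption suggested by the 12-period windows used later, and `make_demo_gdp` is a hypothetical helper name.

```python
import io

import numpy as np
import pandas as pd


def make_demo_gdp(n_periods=120, seed=0):
    """Build a synthetic monthly table with 'date' and 'GDP' columns.

    The values are NOT real GDP figures; they are a random walk with drift.
    """
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2010-01-01", periods=n_periods, freq="MS")
    steps = rng.normal(loc=50.0, scale=20.0, size=n_periods)
    return pd.DataFrame({"date": dates, "GDP": 10_000.0 + np.cumsum(steps)})


demo = make_demo_gdp()

# Round-trip through CSV text to mimic reading a file from disk.
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)
df_demo = pd.read_csv(buffer, parse_dates=["date"], index_col="date")

print(df_demo.head())
```

Passing `parse_dates` and `index_col` to `read_csv` does the date parsing and indexing in one call, which is equivalent to the separate `to_datetime` and `set_index` steps.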
Next, test the series for stationarity. If it is not stationary, it needs to be differenced.
```
from statsmodels.tsa.stattools import adfuller
def test_stationarity(timeseries):
    # Determine rolling statistics over a 12-period window
    rolmean = timeseries.rolling(window=12).mean()
    rolstd = timeseries.rolling(window=12).std()
    # Plot rolling statistics
    plt.plot(timeseries, color='blue', label='Original')
    plt.plot(rolmean, color='red', label='Rolling Mean')
    plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show()
    # Perform the augmented Dickey-Fuller test
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)
test_stationarity(df['GDP'])
```
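If the series is non-stationary, differencing can make it stationary. Here we take the first difference (that is, difference the original series once):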
```
df_diff = df.diff().dropna()
test_stationarity(df_diff['GDP'])
```
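First differencing replaces each value with its change from the previous period, and a cumulative sum plus the starting level undoes it. The toy numbers below are invented purely to show the arithmetic:

```python
import pandas as pd

# Toy levels, not real GDP values.
levels = pd.Series([100.0, 104.0, 109.0, 115.0, 122.0], name="GDP")

# y_t - y_{t-1}; the first row has no predecessor, so it is NaN and gets dropped.
diff1 = levels.diff().dropna()

# Invert the differencing: accumulate the changes and add back the first level.
restored = diff1.cumsum() + levels.iloc[0]

print(diff1.tolist())     # [4.0, 5.0, 6.0, 7.0]
print(restored.tolist())  # [104.0, 109.0, 115.0, 122.0]
```

An ARIMA model with d=1 applies this differencing internally, which is why the undifferenced series is the one passed to the model.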
Next, use the ACF and PACF plots of the differenced series to choose the ARIMA parameters.
```
# Pass the GDP column as a one-dimensional series
plot_acf(df_diff['GDP'], lags=20)
plot_pacf(df_diff['GDP'], lags=20)
plt.show()
```
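Based on the ACF and PACF plots, we choose p=2 and q=1; since the series was differenced once, d=1, giving the order (2,1,1). Next, fit the ARIMA model to the data and compare the fitted values with the observations.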
```
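# Use the current ARIMA class; the legacy statsmodels.tsa.arima_model version no longer runs.
from statsmodels.tsa.arima.model import ARIMA
# Fit on the undifferenced series: d=1 makes the model difference it internally.
model = ARIMA(df['GDP'], order=(2,1,1))
results = model.fit()  # the current fit() has no disp argument
# Fitted values are on the original scale; skip the first point, which has no prior value.
fitted = results.fittedvalues.iloc[1:]
plt.plot(df['GDP'], label='Observed')
plt.plot(fitted, color='red', label='Fitted')
plt.legend(loc='best')
plt.title('RSS: %.4f' % ((fitted - df['GDP'].iloc[1:])**2).sum())
plt.show()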
```
Finally, use the fitted ARIMA model to forecast future tertiary-industry GDP.
```
# Forecast the next 12 periods; the result holds the point forecasts on the original scale
forecast = results.forecast(steps=12)
print(forecast)
```
The complete code is as follows:
```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# The legacy statsmodels.tsa.arima_model.ARIMA class no longer runs in current statsmodels releases.
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller

# Load the data; the file is expected to contain 'date' and 'GDP' columns
df = pd.read_csv('third_industry_GDP.csv')
df = df.ffill()  # forward-fill missing values
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df.sort_index(inplace=True)
df.plot(figsize=(12,8))
plt.show()

def test_stationarity(timeseries):
    # Determine rolling statistics over a 12-period window
    rolmean = timeseries.rolling(window=12).mean()
    rolstd = timeseries.rolling(window=12).std()
    # Plot rolling statistics
    plt.plot(timeseries, color='blue', label='Original')
    plt.plot(rolmean, color='red', label='Rolling Mean')
    plt.plot(rolstd, color='black', label='Rolling Std')
    plt.legend(loc='best')
    plt.title('Rolling Mean & Standard Deviation')
    plt.show()
    # Perform the augmented Dickey-Fuller test
    print('Results of Dickey-Fuller Test:')
    dftest = adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print(dfoutput)

# Test the original series, then its first difference
test_stationarity(df['GDP'])
df_diff = df.diff().dropna()
test_stationarity(df_diff['GDP'])

# ACF and PACF of the differenced series, used to choose p and q
plot_acf(df_diff['GDP'], lags=20)
plot_pacf(df_diff['GDP'], lags=20)
plt.show()

# Fit on the undifferenced series: d=1 makes the model difference it internally
model = ARIMA(df['GDP'], order=(2,1,1))
results = model.fit()  # the current fit() has no disp argument
# Fitted values are on the original scale; skip the first point, which has no prior value
fitted = results.fittedvalues.iloc[1:]
plt.plot(df['GDP'], label='Observed')
plt.plot(fitted, color='red', label='Fitted')
plt.legend(loc='best')
plt.title('RSS: %.4f' % ((fitted - df['GDP'].iloc[1:])**2).sum())
plt.show()

# Forecast the next 12 periods
forecast = results.forecast(steps=12)
print(forecast)
```