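Using the COVID-19 data provided by the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, write Python code that predicts and analyzes COVID-19 infection counts. It should use RBF, linear regression, Bayesian ridge regression, and ARIMA models, report accuracy, and visualize the results.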
Time: 2024-03-18 12:45:36 Views: 165
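Sure. Below is Python code for predicting and analyzing COVID-19 infection counts, using data from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. It fits Gaussian process regression with an RBF kernel, linear regression, Bayesian ridge regression, and an ARIMA model, then plots the results. The code targets Python 3.x and requires numpy, pandas, scikit-learn (sklearn), statsmodels, and matplotlib.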
```python
# Import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, BayesianRidge
from sklearn.metrics import mean_squared_error
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
# statsmodels.tsa.arima_model was removed in statsmodels 0.13; use the current module
from statsmodels.tsa.arima.model import ARIMA
# Read the data
confirmed_data = pd.read_csv('time_series_covid19_confirmed_global.csv')
deaths_data = pd.read_csv('time_series_covid19_deaths_global.csv')
recovered_data = pd.read_csv('time_series_covid19_recovered_global.csv')
# Tidy the data: drop the non-date columns and sum provinces per country
confirmed_data = confirmed_data.drop(columns=['Province/State', 'Lat', 'Long'])
deaths_data = deaths_data.drop(columns=['Province/State', 'Lat', 'Long'])
recovered_data = recovered_data.drop(columns=['Province/State', 'Lat', 'Long'])
confirmed_data = confirmed_data.groupby('Country/Region').sum()
deaths_data = deaths_data.groupby('Country/Region').sum()
recovered_data = recovered_data.groupby('Country/Region').sum()
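# Extract the data for the chosen country
country = 'China'
confirmed = confirmed_data.loc[country].values.astype(float)
deaths = deaths_data.loc[country].values.astype(float)
recovered = recovered_data.loc[country].values.astype(float)
active = confirmed - deaths - recovered
# Parse column labels such as '1/22/20' into datetimes so the x-axis is a real time axis
dates = pd.to_datetime(confirmed_data.columns, format='%m/%d/%y').values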
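# Split into training and test sets (chronological 80/20 split)
train_size = int(len(confirmed) * 0.8)
train_dates, test_dates = dates[:train_size], dates[train_size:]
train_confirmed, test_confirmed = confirmed[:train_size], confirmed[train_size:]
train_active, test_active = active[:train_size], active[train_size:]
# Define features and target: the feature is the day index counted from the first date.
# The test indices continue after the training indices instead of restarting at 0,
# and the targets stay 1-D, which is what BayesianRidge expects.
X_train, y_train = np.arange(train_size).reshape(-1, 1), train_confirmed
X_test, y_test = np.arange(train_size, len(confirmed)).reshape(-1, 1), test_confirmed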
# Linear regression model
lr = LinearRegression()
lr.fit(X_train, y_train)
y_lr = lr.predict(X_test)
mse_lr = mean_squared_error(y_test, y_lr)
print('Linear regression MSE:', mse_lr)
# Bayesian ridge regression model
br = BayesianRidge()
br.fit(X_train, y_train)
y_br = br.predict(X_test)
mse_br = mean_squared_error(y_test, y_br)
print('Bayesian ridge regression MSE:', mse_br)
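# Gaussian process regression with an RBF kernel
kernel = RBF(length_scale=1.0, length_scale_bounds=(1e-1, 10.0))
# normalize_y=True centers and scales the targets, since raw case counts are far from the zero-mean prior
gpr = GaussianProcessRegressor(kernel=kernel, alpha=0.1, n_restarts_optimizer=10, normalize_y=True)
gpr.fit(X_train, y_train)
y_gpr = gpr.predict(X_test)
mse_gpr = mean_squared_error(y_test, y_gpr)
print('RBF-kernel Gaussian process regression MSE:', mse_gpr)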
# ARIMA model
model = ARIMA(train_confirmed, order=(2, 1, 2))
results = model.fit()
# Forecast one value per test day
preds = results.forecast(steps=len(test_confirmed))
mse_arima = mean_squared_error(test_confirmed, preds)
print('ARIMA MSE:', mse_arima)
# Visualization
plt.figure(figsize=(12, 6))
plt.plot(train_dates, train_confirmed, label='Train Data')
plt.plot(test_dates, test_confirmed, label='Test Data')
plt.plot(test_dates, y_lr, label='Linear Regression')
plt.plot(test_dates, y_br, label='Bayesian Ridge Regression')
plt.plot(test_dates, y_gpr, label='Gaussian Process Regression')
plt.plot(test_dates, preds, label='ARIMA')
plt.legend(loc='best')
plt.title(f'{country} Confirmed Cases Prediction')
plt.xlabel('Date')
plt.ylabel('Confirmed Cases')
plt.show()
```
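The code first imports the required libraries, then reads and tidies the data, extracts the data for the chosen country, splits it into training and test sets, and defines the features and target. It then predicts with linear regression, Bayesian ridge regression, Gaussian process regression with an RBF kernel, and ARIMA, computing the mean squared error of each. Finally, it visualizes the predictions with matplotlib.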
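The question asks for accuracy, while the code above reports only mean squared error, which is hard to read on the scale of case counts. For a regression task, one common stand-in for accuracy is 100% minus the mean absolute percentage error (MAPE). The sketch below assumes that interpretation; it is not part of the original answer, and the helper name `regression_report` is hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return RMSE, MAPE (%), an accuracy-style score (100 - MAPE), and R^2."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    # MAPE is undefined where the true value is 0, so those points are skipped.
    nonzero = y_true != 0
    rel_err = np.abs((y_true[nonzero] - y_pred[nonzero]) / y_true[nonzero])
    mape = float(np.mean(rel_err) * 100)
    return {
        "rmse": rmse,
        "mape_pct": mape,
        "accuracy_pct": 100.0 - mape,  # assumed definition of "accuracy"
        "r2": float(r2_score(y_true, y_pred)),
    }


# Toy numbers for illustration only (not real case counts)
demo = regression_report([100, 200, 400], [110, 190, 400])
print(demo)
```

With the variables from the code above, calls such as `regression_report(y_test, y_lr)` or `regression_report(test_confirmed, preds)` would give one comparable report per model. Note that the accuracy-style score can go negative when a forecast is far off.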
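Note that the code expects the data files time_series_covid19_confirmed_global.csv, time_series_covid19_deaths_global.csv, and time_series_covid19_recovered_global.csv; change these to match your actual file names. The country name should also be changed to suit your case.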
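Because a misspelled country name only surfaces as a bare `KeyError` from `.loc`, it may help to wrap the tidy-and-extract step in a small function that checks the name first. The following is a minimal sketch under the assumption that the input has the same layout as the CSSE time-series files (`Province/State`, `Country/Region`, `Lat`, `Long`, then one column per date); the function name `country_series` and the sample numbers are hypothetical.

```python
import pandas as pd


def country_series(df, country):
    """Return one country's cumulative counts as a date-indexed Series."""
    # Drop the non-date columns if present, then sum provinces per country.
    totals = (
        df.drop(columns=["Province/State", "Lat", "Long"], errors="ignore")
        .groupby("Country/Region")
        .sum()
    )
    if country not in totals.index:
        raise KeyError(
            f"{country!r} not found; check the 'Country/Region' column for the exact spelling"
        )
    series = totals.loc[country].astype(float)
    # Column labels look like '1/22/20' (month/day/two-digit year).
    series.index = pd.to_datetime(series.index, format="%m/%d/%y")
    return series


# Tiny synthetic frame in the same layout (values are made up for illustration)
sample = pd.DataFrame({
    "Province/State": ["Hubei", "Beijing", None],
    "Country/Region": ["China", "China", "Italy"],
    "Lat": [30.9, 40.1, 41.8],
    "Long": [112.2, 116.4, 12.5],
    "1/22/20": [444, 14, 0],
    "1/23/20": [444, 22, 0],
})
china = country_series(sample, "China")
print(china)
```

In practice the frame would come from `pd.read_csv` on one of the files named above, for example `country_series(pd.read_csv('time_series_covid19_confirmed_global.csv'), 'China')`.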
Hope this helps!