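Write a piece of Python code that reads a CSV file with no header. The first column is a time index given as natural numbers 1, 2, 3, ..., and the second column is the charge amount. Use the first 90% of the data as the training set and the rest as the test set, predict the time series with an LSTM model improved with L1/L2 regularization, and print the RMSE as the evaluation metric. Plot the actual and predicted values for the training set and for the test set as two separate figures, with the x-axis labeled time/h and the y-axis labeled kwh.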
Time: 2023-12-01 20:04:02
Sure, here is the implementation. It requires pandas, numpy, matplotlib, scikit-learn, and tensorflow:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras import regularizers
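# Read the CSV file (no header); column 0 is the time index, column 1 the charge amount
data = pd.read_csv('data.csv', header=None)
values = data.values[:, 1].astype('float32')
# The first 90% of the rows form the training set
train_size = int(len(values) * 0.9)
# Normalize to [0, 1]; fit the scaler on the training portion only to avoid test-set leakage
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(values[:train_size].reshape(-1, 1))
scaled = scaler.transform(values.reshape(-1, 1))
# Convert the series into a supervised-learning table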
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = pd.DataFrame(data)
    cols, names = [], []
    # Input columns: lagged copies (t-n_in, ..., t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
        names += [('var%d(t-%d)' % (j + 1, i)) for j in range(n_vars)]
    # Output columns: (t, t+1, ..., t+n_out-1)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('var%d(t)' % (j + 1)) for j in range(n_vars)]
        else:
            names += [('var%d(t+%d)' % (j + 1, i)) for j in range(n_vars)]
    agg = pd.concat(cols, axis=1)
    agg.columns = names
    # Drop the rows made incomplete by shifting
    if dropnan:
        agg.dropna(inplace=True)
    return agg
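n_lag = 3       # number of past time steps used as input
n_features = 1  # univariate series
reframed = series_to_supervised(scaled, n_lag, 1)
# Split into inputs and targets. dropna removed the first n_lag rows,
# so shift the split point to keep every training target inside the first 90%
supervised = reframed.values
split = train_size - n_lag
train_X, train_y = supervised[:split, :-1], supervised[:split, -1]
test_X, test_y = supervised[split:, :-1], supervised[split:, -1]
# Reshape to the 3D layout the LSTM expects: [samples, timesteps, features]
train_X = train_X.reshape((train_X.shape[0], n_lag, n_features))
test_X = test_X.reshape((test_X.shape[0], n_lag, n_features))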
# Build the LSTM model with an L1/L2 penalty on the input kernel weights
model = Sequential()
model.add(LSTM(50, input_shape=(train_X.shape[1], train_X.shape[2]),
               kernel_regularizer=regularizers.l1_l2(l1=0.01, l2=0.01)))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
# Train the model (shuffle=False keeps the samples in time order)
history = model.fit(train_X, train_y, epochs=50, batch_size=72,
                    validation_data=(test_X, test_y), verbose=2, shuffle=False)
# Predict on the test set
yhat = model.predict(test_X)
# Undo the normalization to get values back in the original unit
inv_yhat = scaler.inverse_transform(yhat)
inv_y = scaler.inverse_transform(test_y.reshape(-1, 1))
# Compute the RMSE on the test set
rmse = np.sqrt(np.mean((inv_y - inv_yhat)**2))
print('Test RMSE: %.3f' % rmse)
# Plot actual vs. predicted values for the training set and the test set
train_predict = scaler.inverse_transform(model.predict(train_X))
train_actual = scaler.inverse_transform(train_y.reshape(-1, 1))
test_predict = scaler.inverse_transform(yhat)
test_actual = scaler.inverse_transform(test_y.reshape(-1, 1))
# Figure 1: training set
plt.figure(figsize=(10, 6))
plt.plot(train_actual.flatten(), label='Train Actual')
plt.plot(train_predict.flatten(), label='Train Predict')
plt.xlabel('time/h')
plt.ylabel('kwh')
plt.legend()
# Figure 2: test set
plt.figure(figsize=(10, 6))
plt.plot(test_actual.flatten(), label='Test Actual')
plt.plot(test_predict.flatten(), label='Test Predict')
plt.xlabel('time/h')
plt.ylabel('kwh')
plt.legend()
plt.show()
```
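Note: this code models a single data column (the second column of the file). If the file contains several feature columns, the code needs to be modified.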
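As a rough illustration of what such a modification could look like, the windowing step can be written directly in NumPy so that it handles any number of feature columns. This is only a sketch: `make_windows` and its `target_col` parameter are hypothetical names introduced here, not part of the answer's code, and it assumes the target is predicted one step ahead.

```python
import numpy as np

def make_windows(series, n_lag, target_col=0):
    """Build LSTM inputs of shape [samples, n_lag, features] and next-step targets.

    Hypothetical helper for illustration; `target_col` selects the column to predict.
    """
    series = np.asarray(series, dtype="float32")
    if series.ndim == 1:
        # Treat a 1D series as a single feature column
        series = series.reshape(-1, 1)
    n_samples = series.shape[0] - n_lag
    if n_samples <= 0:
        raise ValueError("series must contain at least n_lag + 1 rows")
    # Each sample is a window of n_lag consecutive rows
    X = np.stack([series[i:i + n_lag] for i in range(n_samples)])
    # The target is the value of target_col right after each window
    y = series[n_lag:, target_col]
    return X, y

# Toy input: 5 time steps, 2 feature columns
demo = np.arange(10, dtype="float32").reshape(5, 2)
X, y = make_windows(demo, n_lag=3)
print(X.shape, y.shape)  # (2, 3, 2) (2,)
```

With an input built this way, `n_features` would be `X.shape[2]` and no separate reshape is needed before passing `X` to the LSTM layer.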
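L1 and L2 regularization are applied together through `regularizers.l1_l2(l1=0.01, l2=0.01)`, where the parameters `l1` and `l2` are the weights of the L1 and L2 penalty terms. Because it is passed as `kernel_regularizer`, the penalty applies to the LSTM's input kernel weights. RMSE is the evaluation metric, computed as `np.sqrt(np.mean((inv_y - inv_yhat)**2))`, where `inv_y` and `inv_yhat` are the actual and predicted values after inverse normalization. The actual-vs-predicted figures for the training set and the test set are drawn with matplotlib.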
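To make the two quantities concrete, here is a small NumPy-only sketch. The helper names `rmse` and `l1_l2_penalty` are introduced for illustration. The penalty function mirrors the formula Keras documents for `l1_l2` (`l1 * sum(|w|) + l2 * sum(w^2)`) rather than calling Keras itself, and the weight values are made up.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two sequences of equal length."""
    y_true = np.asarray(y_true, dtype="float64").ravel()
    y_pred = np.asarray(y_pred, dtype="float64").ravel()
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def l1_l2_penalty(weights, l1=0.01, l2=0.01):
    """Combined penalty added to the loss: l1 * sum(|w|) + l2 * sum(w^2)."""
    w = np.asarray(weights, dtype="float64")
    return float(l1 * np.sum(np.abs(w)) + l2 * np.sum(w ** 2))

# Errors are 0, 0, -2, so RMSE = sqrt(4 / 3), about 1.1547
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))

# Made-up weight matrix: sum|w| = 3.5, sum w^2 = 5.25, penalty = 0.0875
print(l1_l2_penalty([[1.0, -2.0], [0.5, 0.0]]))
```

Larger `l1` values push more weights toward exactly zero, while larger `l2` values shrink all weights more evenly. Both coefficients are hyperparameters that usually need tuning for the dataset at hand.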