transformer-bilstm AirPassengers
### Time Series Forecasting on the AirPassengers Dataset with a Transformer-BiLSTM Model
#### Setup
To forecast the `AirPassengers` series with a Transformer combined with a bidirectional LSTM (BiLSTM), first load the required libraries and prepare the data.
```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Dense, Input, LSTM, Bidirectional, Dropout,
                                     LayerNormalization, MultiHeadAttention)
from tensorflow.keras.callbacks import EarlyStopping
```
#### Loading and Preprocessing the Data
Read the passenger-count column from `AirPassengers.csv` as the target variable, then scale it (and optionally smooth it) so that it is easier to model in the later steps[^1].
```python
data = pd.read_csv('AirPassengers.csv')
passenger_counts = data['# Passengers'].values.reshape(-1, 1)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(passenger_counts)
```
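The snippet above only rescales the series. If the smoothing mentioned in this step is also wanted, one simple option is a rolling mean applied before scaling. The sketch below is a minimal illustration; the centred 3-month window is an assumed choice, not something prescribed here.
```python
# Optional smoothing: centred 3-month rolling mean (window size is an illustrative assumption)
smoothed = (pd.Series(passenger_counts.flatten())
              .rolling(window=3, center=True, min_periods=1)
              .mean()
              .values.reshape(-1, 1))
scaled_data = scaler.fit_transform(smoothed)  # replaces the unsmoothed scaled_data if used
```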
#### Building Training Samples
Construct the input features X and corresponding labels y in a structure suitable for neural-network training. A rolling (sliding) window extracts past observations as the predictors for the next time step[^2].
```python
def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

time_step = 12  # use the past 12 monthly observations (one year) to predict the next month

train_size = int(len(scaled_data) * 0.8)
test_size = len(scaled_data) - train_size
train, test = scaled_data[0:train_size, :], scaled_data[train_size:len(scaled_data), :]

# Build the training and test sets
trainX, trainY = create_dataset(train, time_step)
testX, testY = create_dataset(test, time_step)

# Reshape to (samples, time steps, features) as expected by Keras
trainX = np.reshape(trainX, (trainX.shape[0], trainX.shape[1], 1))
testX = np.reshape(testX, (testX.shape[0], testX.shape[1], 1))
```
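A quick shape check helps confirm that the rolling windows were built as intended. With the standard 144-month AirPassengers series and the 80/20 split above, the values below are roughly what to expect; they will differ for other split ratios or window lengths.
```python
# Sanity check: inputs are (samples, time_step, 1), targets are (samples,)
print(trainX.shape, trainY.shape)  # roughly (102, 12, 1) and (102,) for the 144-month series
print(testX.shape, testY.shape)    # roughly (16, 12, 1) and (16,)
```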
#### Defining the Hybrid Model
Combine a Transformer encoder block with a BiLSTM layer in a custom forecasting architecture. The multi-head attention mechanism in the encoder strengthens the model's ability to learn long-range dependencies[^3].
```python
def get_angles(pos, i, d_model):
    # Standard sinusoidal frequency schedule for positional encoding
    angle_rates = 1 / np.power(10000, (2 * (i // 2)) / np.float32(d_model))
    return pos * angle_rates


class PositionalEncoding(tf.keras.layers.Layer):
    def __init__(self, position, d_model):
        super().__init__()
        angle_rads = get_angles(np.arange(position)[:, np.newaxis],
                                np.arange(d_model)[np.newaxis, :],
                                d_model)
        sines = np.sin(angle_rads[:, 0::2])
        cosines = np.cos(angle_rads[:, 1::2])
        pos_encoding = np.concatenate([sines, cosines], axis=-1)
        pos_encoding = pos_encoding[np.newaxis, ...]
        self.pos_encoding = tf.cast(pos_encoding, dtype=tf.float32)

    def call(self, inputs):
        return inputs + self.pos_encoding[:, :tf.shape(inputs)[1], :]


class PointWiseFeedForwardNetwork(tf.keras.Sequential):
    def __init__(self):
        super().__init__([
            Dense(2048, activation='relu'),
            Dense(64)])


class TransformerEncoderBlock(tf.keras.layers.Layer):
    def __init__(self, *, d_model, num_heads):
        super().__init__()
        self.mha = MultiHeadAttention(key_dim=d_model, num_heads=num_heads)
        self.ffn = PointWiseFeedForwardNetwork()
        self.layernorm1 = LayerNormalization(epsilon=1e-6)
        self.layernorm2 = LayerNormalization(epsilon=1e-6)
        self.dropout1 = Dropout(rate=0.1)
        self.dropout2 = Dropout(rate=0.1)

    def call(self, x, training=False):
        # Self-attention sub-layer with residual connection and layer normalization
        attn_output = self.mha(x, x, x)
        out1 = self.layernorm1(x + self.dropout1(attn_output, training=training))
        # Position-wise feed-forward sub-layer, again with residual and normalization
        ffn_output = self.ffn(out1)
        return self.layernorm2(out1 + self.dropout2(ffn_output, training=training))


d_model = 64

input_layer = Input(shape=(time_step, 1))
# Project the single feature to d_model so the positional encoding and attention
# operate on vectors of the expected width
projected = Dense(d_model)(input_layer)
pos_encoded = PositionalEncoding(time_step, d_model)(projected)
transformer_block = TransformerEncoderBlock(d_model=d_model, num_heads=8)(pos_encoded)
bi_lstm_output = Bidirectional(LSTM(units=50))(transformer_block)
output_layer = Dense(1)(bi_lstm_output)

model = Model(inputs=input_layer, outputs=output_layer)
model.compile(optimizer='adam', loss='mean_squared_error')
```
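Before training, a quick forward pass on a dummy batch together with `model.summary()` is a convenient way to verify that tensor shapes flow as intended from the projection through the encoder block to the BiLSTM output.
```python
# Verify the stack end to end: a dummy batch of 4 windows should map to 4 scalar forecasts
dummy = tf.random.uniform((4, time_step, 1))
print(model(dummy).shape)  # expected: (4, 1)
model.summary()
```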
#### Training the Model
Choose suitable hyperparameters and fit the model until it converges; early stopping can be used along the way to prevent overfitting and save compute[^4].
```python
history = model.fit(
    trainX,
    trainY,
    epochs=20,
    batch_size=32,
    validation_split=0.2,
    callbacks=[EarlyStopping(monitor='val_loss', patience=3)],
    verbose=1)
```
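To see whether early stopping kicked in before overfitting set in, it can help to plot the loss curves recorded in `history`; a minimal sketch:
```python
# Compare training and validation loss per epoch to judge convergence and overfitting
plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('MSE')
plt.legend()
plt.show()
```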
#### Prediction and Evaluation
Finally, use the trained model to forecast passenger traffic over the test period and visualize the gap between the actual and predicted values[^5].
```python
predictions = model.predict(testX)
predicted_values = scaler.inverse_transform(predictions)
actual_values = scaler.inverse_transform(testY.reshape(-1, 1))
plt.plot(actual_values.flatten(), label="Actual Values")
plt.plot(predicted_values.flatten(), label="Predictions", linestyle="--")
plt.legend()
plt.show()
```
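Beyond the visual comparison, numeric error metrics such as RMSE and MAE summarize forecast quality in the original passenger-count units; a minimal sketch using scikit-learn:
```python
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Errors are computed on the inverse-transformed (original-scale) values
rmse = np.sqrt(mean_squared_error(actual_values, predicted_values))
mae = mean_absolute_error(actual_values, predicted_values)
print(f"RMSE: {rmse:.2f}  MAE: {mae:.2f}")
```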