Training an LSTM Model in PyTorch on the NSL-KDD Dataset
Implementing an LSTM with PyTorch
First, import the required libraries and load the dataset. Here we use PyTorch and the NSL-KDD dataset.
```python
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
# Load the dataset (adjust the paths/filenames to match your local copy of NSL-KDD)
train_data = pd.read_csv('KDDTrain.csv')
test_data = pd.read_csv('KDDTest.csv')
```
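Before preprocessing, it is worth sanity-checking what was loaded (a minimal sketch; it assumes the CSV has a header row with a `label` column, as the preprocessing code below also does):
```python
# Quick sanity check: shapes and label distribution
print(train_data.shape, test_data.shape)
print(train_data['label'].value_counts())
```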
Next, we preprocess the data. Each NSL-KDD record has 41 features plus a label; three of the features (protocol_type, service, flag) are categorical and the rest are numeric. We one-hot encode the categorical features, binarize the label (normal vs. attack), and standardize the values. For simplicity, this example keeps only three of the numeric features (duration, src_bytes, dst_bytes) alongside the encoded categorical ones.
```python
# Convert categorical features to numeric form
def convert_to_numerical(data):
    # One-hot encode the categorical features
    protocol = pd.get_dummies(data['protocol_type'])
    service = pd.get_dummies(data['service'])
    flag = pd.get_dummies(data['flag'])
    # Concatenate the encoded features with a few numeric features
    numerical_data = pd.concat(
        [protocol, service, flag, data[['duration', 'src_bytes', 'dst_bytes']]],
        axis=1).astype(np.float32)
    # Binarize the label: 0 = normal, 1 = attack
    # (some copies of the dataset use 'normal.' with a trailing dot; adjust if needed)
    labels = (data['label'] != 'normal').astype(np.float32)
    return numerical_data, labels

# Preprocess the training and test sets
train_data_numerical, train_labels = convert_to_numerical(train_data)
test_data_numerical, test_labels = convert_to_numerical(test_data)

# Align the test columns with the training columns: encoding the two sets
# separately can produce different one-hot columns (e.g. rare service values)
test_data_numerical = test_data_numerical.reindex(
    columns=train_data_numerical.columns, fill_value=0.0)

# Standardize using training-set statistics (guard against zero variance)
mean = train_data_numerical.mean(axis=0)
std = train_data_numerical.std(axis=0).replace(0, 1)
train_data_numerical = (train_data_numerical - mean) / std
test_data_numerical = (test_data_numerical - mean) / std
```
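A quick check that the two feature matrices now agree (a sketch):
```python
# Both sets should now have exactly the same feature columns
assert list(train_data_numerical.columns) == list(test_data_numerical.columns)
print('Feature count after encoding:', train_data_numerical.shape[1])
```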
Next, we define an LSTM model. Here we use a single-layer LSTM whose input size equals the number of preprocessed feature columns, with a hidden state of size 128, followed by a linear layer that maps the final hidden state to a single logit (binary classification). Since each record is a single feature vector rather than a sequence, we treat it as a sequence of length one.
```python
class LSTM(nn.Module):
    def __init__(self, input_size, output_size, hidden_size):
        super(LSTM, self).__init__()
        self.input_size = input_size
        self.output_size = output_size
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Treat each record as a sequence of length 1:
        # (batch, features) -> (batch, 1, features)
        x = x.unsqueeze(1)
        h0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(1, x.size(0), self.hidden_size, device=x.device)
        out, _ = self.lstm(x, (h0, c0))
        # Classify from the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out
```
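A minimal shape check with a dummy batch (a sketch; the sizes are illustrative):
```python
# Dummy forward pass: a batch of 4 records with 10 features each
dummy_model = LSTM(input_size=10, output_size=1, hidden_size=128)
dummy_out = dummy_model(torch.randn(4, 10))
print(dummy_out.shape)  # expected: torch.Size([4, 1])
```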
Next, we define some hyperparameters: the learning rate, batch size, and number of training epochs.
```python
# Hyperparameters
learning_rate = 0.001
batch_size = 64
num_epochs = 10
```
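For reproducible runs, one can also fix the random seeds (a sketch; the seed value is arbitrary):
```python
# Fix seeds so weight initialization and shuffling are repeatable
torch.manual_seed(42)
np.random.seed(42)
```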
Next, we use PyTorch's DataLoader to wrap the dataset and split it into mini-batches for training.
```python
# Wrap the preprocessed data in TensorDatasets
train_dataset = torch.utils.data.TensorDataset(
    torch.tensor(train_data_numerical.to_numpy(dtype=np.float32)),
    torch.tensor(train_labels.to_numpy(dtype=np.float32)))
test_dataset = torch.utils.data.TensorDataset(
    torch.tensor(test_data_numerical.to_numpy(dtype=np.float32)),
    torch.tensor(test_labels.to_numpy(dtype=np.float32)))
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
```
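One batch from the loader should have shape (batch_size, num_features) for the inputs and (batch_size,) for the labels (a quick sketch):
```python
# Peek at one mini-batch to confirm the shapes
xb, yb = next(iter(train_loader))
print(xb.shape, yb.shape)
```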
Next, we instantiate the model, an optimizer, and a loss function. Here we use the Adam optimizer and binary cross-entropy with logits; the input size is taken from the preprocessed feature matrix, and the model emits a single logit per record.
```python
# Define the model, optimizer, and loss function
model = LSTM(input_size=train_data_numerical.shape[1], output_size=1, hidden_size=128)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.BCEWithLogitsLoss()
```
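An equivalent two-logit formulation would use `nn.CrossEntropyLoss` with integer class labels instead; a sketch of that variant (not used in the rest of this walkthrough):
```python
# Alternative: two output logits scored with cross-entropy loss
model_ce = LSTM(input_size=train_data_numerical.shape[1], output_size=2, hidden_size=128)
criterion_ce = nn.CrossEntropyLoss()  # expects labels as torch.long class indices
# loss = criterion_ce(model_ce(data), labels.long())
```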
Next, we train the model. In each epoch we iterate over the mini-batches from the DataLoader, feed them to the model, compute the loss, and update the parameters with the optimizer.
```python
# Train the model
model.train()
for epoch in range(num_epochs):
    for i, (data, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        # Squeeze the (batch, 1) logits to (batch,) to match the label shape
        outputs = model(data).squeeze(1)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        if (i + 1) % 100 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch + 1, num_epochs, i + 1, len(train_loader), loss.item()))
```
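After training, the weights can be saved for later reuse (a sketch; the filename is an arbitrary choice):
```python
# Save the trained weights (path is illustrative)
torch.save(model.state_dict(), 'lstm_nsl_kdd.pt')
```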
Finally, we evaluate the model on the test set, computing accuracy, precision, recall, and the F1 score.
```python
# Evaluate the model
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    tp = 0
    fp = 0
    fn = 0
    for data, labels in test_loader:
        outputs = model(data).squeeze(1)
        # Threshold the sigmoid probability at 0.5 to get the predicted class
        predicted = (torch.sigmoid(outputs) >= 0.5).float()
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        tp += ((predicted == 1) & (labels == 1)).sum().item()
        fp += ((predicted == 1) & (labels == 0)).sum().item()
        fn += ((predicted == 0) & (labels == 1)).sum().item()
    accuracy = correct / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1_score = 2 * (precision * recall) / (precision + recall)
    print('Accuracy: {:.2f}%'.format(100 * accuracy))
    print('Precision: {:.2f}%'.format(100 * precision))
    print('Recall: {:.2f}%'.format(100 * recall))
    print('F1 Score: {:.2f}%'.format(100 * f1_score))
```
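If scikit-learn is available, the same metrics can be cross-checked with `classification_report` (a sketch, assuming scikit-learn is installed):
```python
from sklearn.metrics import classification_report

# Collect all predictions and labels, then let scikit-learn compute the metrics
all_preds, all_labels = [], []
model.eval()
with torch.no_grad():
    for data, labels in test_loader:
        probs = torch.sigmoid(model(data).squeeze(1))
        all_preds.extend((probs >= 0.5).long().tolist())
        all_labels.extend(labels.long().tolist())
print(classification_report(all_labels, all_preds, target_names=['normal', 'attack']))
```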
That is the complete code for training an LSTM model on NSL-KDD with PyTorch.