Implementing an LSTM in PyTorch with the NSL-KDD dataset
Below is example code that implements an LSTM in PyTorch and trains it on the NSL-KDD dataset:
```python
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder, StandardScaler
from sklearn.model_selection import train_test_split
# Load the NSL-KDD training file (KDDTrain+.txt has 43 comma-separated columns:
# 41 features, the attack label, and a difficulty score)
data = pd.read_csv('KDDTrain+.txt', header=None)
data.columns = ['duration', 'protocol_type', 'service', 'flag', 'src_bytes', 'dst_bytes', 'land',
                'wrong_fragment', 'urgent', 'hot', 'num_failed_logins', 'logged_in', 'num_compromised',
                'root_shell', 'su_attempted', 'num_root', 'num_file_creations', 'num_shells', 'num_access_files',
                'num_outbound_cmds', 'is_host_login', 'is_guest_login', 'count', 'srv_count', 'serror_rate',
                'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate', 'diff_srv_rate',
                'srv_diff_host_rate', 'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate',
                'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate',
                'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'dst_host_rerror_rate',
                'dst_host_srv_rerror_rate', 'label', 'difficulty']
data = data.drop(columns=['difficulty'])  # the difficulty score is not used as a feature
# Collapse the attack label into a binary normal/attack target
# (NSL-KDD labels have no trailing dot, e.g. 'normal', 'neptune', ...)
data['label'] = data['label'].apply(lambda x: 'normal' if x == 'normal' else 'attack')
# Encode the categorical columns as integers
le = LabelEncoder()
data['protocol_type'] = le.fit_transform(data['protocol_type'])
data['service'] = le.fit_transform(data['service'])
data['flag'] = le.fit_transform(data['flag'])
data['label'] = le.fit_transform(data['label'])
# Standardize the 41 feature columns (the integer-encoded categorical columns are scaled as well)
scaler = StandardScaler()
data.iloc[:, :41] = scaler.fit_transform(data.iloc[:, :41])
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(data.iloc[:, :41], data['label'], test_size=0.2, random_state=42)
# Convert to PyTorch tensors
X_train = torch.tensor(X_train.values, dtype=torch.float32)
X_test = torch.tensor(X_test.values, dtype=torch.float32)
y_train = torch.tensor(y_train.values, dtype=torch.int64)
y_test = torch.tensor(y_test.values, dtype=torch.int64)
# Define the LSTM model
class LSTM(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # nn.LSTM expects (batch, seq_len, input_dim); treat each record as a length-1 sequence
        x = x.unsqueeze(1)
        h0 = torch.zeros(1, x.size(0), self.hidden_dim)
        c0 = torch.zeros(1, x.size(0), self.hidden_dim)
        out, (hn, cn) = self.lstm(x, (h0, c0))
        # Classify from the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out
# Hyperparameters
input_dim = 41
hidden_dim = 32
output_dim = 2
lr = 0.001
epochs = 10
# Instantiate the model, loss function and optimizer
model = LSTM(input_dim, hidden_dim, output_dim)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
# Train the model (full-batch gradient descent for simplicity)
for epoch in range(epochs):
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()
    if epoch % 2 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, epochs, loss.item()))
# Evaluate on the held-out test set
model.eval()
with torch.no_grad():
    outputs = model(X_test)
    _, predicted = torch.max(outputs, 1)
    accuracy = (predicted == y_test).sum().item() / y_test.size(0)
    print('Test Accuracy: {:.2f}%'.format(accuracy * 100))
```
In the code above, we first load the NSL-KDD training file with pandas, collapse the label column into a binary normal/attack target, encode the categorical features, and standardize the feature columns. We then split the data into training and test sets and convert them to PyTorch tensors. Next, we define an LSTM model consisting of one LSTM layer followed by a fully connected layer, set the hyperparameters, instantiate the model and the cross-entropy loss, and train with the Adam optimizer on the full training set. Finally, we evaluate on the held-out test split and report the accuracy.
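The example trains on the entire training set in a single batch, which is simple but can be slow and memory-hungry on the full KDDTrain+ file. If you want mini-batch training, a minimal sketch using `TensorDataset` and `DataLoader` is shown below; the `batch_size` value and the loop structure are assumptions for illustration, not part of the original answer, and the model, loss, and optimizer from above are reused unchanged.

```python
from torch.utils.data import TensorDataset, DataLoader

# Wrap the preprocessed training tensors from the example above in a DataLoader.
# batch_size=256 is an arbitrary choice for illustration.
train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=256, shuffle=True)

for epoch in range(epochs):
    model.train()
    for xb, yb in train_loader:           # xb: (batch, 41), yb: (batch,)
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)   # forward + loss on one mini-batch
        loss.backward()
        optimizer.step()
    print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, epochs, loss.item()))
```

Mini-batching keeps memory usage bounded and usually converges in fewer epochs than full-batch updates; the evaluation step stays exactly the same.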