PyTorch-based cost-sensitive decision tree model for balancing the CICIDS-2017 dataset, with code for training a CNN model
Time: 2024-01-12 18:05:20
The following PyTorch code balances the CICIDS-2017 dataset and trains a CNN model, first with cross-entropy and then with a cost-sensitive loss:
```
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix
from imblearn.under_sampling import RandomUnderSampler

# Load the dataset.
# imbalanced-learn's fetch_datasets() does not include CICIDS-2017, so the data
# is read from a local CSV file instead. CSV_PATH is a placeholder, and the
# "Label" column with "BENIGN" marking normal traffic is an assumption about
# the CSV layout; adjust both to your copy of the data.
CSV_PATH = "cicids2017.csv"
df = pd.read_csv(CSV_PATH)
df.columns = df.columns.str.strip()
df = df.replace([np.inf, -np.inf], np.nan).dropna()
X = df.drop(columns=["Label"]).select_dtypes(include="number").to_numpy(dtype=np.float32)
y = (df["Label"].str.strip() != "BENIGN").to_numpy(dtype=np.int64)  # 0 = benign, 1 = attack

# Balance the dataset
sampler = RandomUnderSampler(random_state=42)
X_resampled, y_resampled = sampler.fit_resample(X, y)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X_resampled, y_resampled, test_size=0.2, random_state=42, stratify=y_resampled)

# Standardize the features and reshape each row into a square one-channel "image"
scaler = StandardScaler().fit(X_train)
SIDE = int(np.ceil(np.sqrt(X_train.shape[1])))  # e.g. 78 features -> 9x9 grid

def to_images(features):
    """Scale, zero-pad to SIDE*SIDE values and reshape to (N, 1, SIDE, SIDE)."""
    scaled = scaler.transform(features).astype(np.float32)
    padded = np.zeros((scaled.shape[0], SIDE * SIDE), dtype=np.float32)
    padded[:, :scaled.shape[1]] = scaled
    return torch.from_numpy(padded).view(-1, 1, SIDE, SIDE)

train_loader = DataLoader(
    TensorDataset(to_images(X_train), torch.from_numpy(y_train)),
    batch_size=256, shuffle=True)
test_loader = DataLoader(TensorDataset(to_images(X_test)), batch_size=1024)

# Define the CNN model
class CNN(nn.Module):
    def __init__(self, side, num_classes=2):
        super().__init__()
        # padding=1 keeps the side length; each 2x2 max-pool halves it (floor)
        pooled = side // 2 // 2
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),
            nn.Flatten(),
            nn.Linear(32 * pooled * pooled, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        # x already has shape (N, 1, SIDE, SIDE); returns raw logits
        return self.net(x)

# Define the cost-sensitive loss function
class CostSensitiveDecisionTreeLoss(nn.Module):
    def __init__(self, cost_matrix):
        super().__init__()
        # cost_matrix[i][j] = cost of predicting class j when the true class is i
        self.register_buffer("cost_matrix", cost_matrix)

    def forward(self, logits, target):
        # Use probabilities rather than raw logits so the loss is bounded below by 0
        probs = torch.softmax(logits, dim=1)
        costs = self.cost_matrix[target]  # (N, C): cost row of each true class
        return (costs * probs).sum(dim=1).mean()

def train(model, loader, criterion, optimizer, epochs):
    model.train()
    for epoch in range(epochs):
        running_loss = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * xb.size(0)
        print('Epoch [%d], Loss: %.4f' % (epoch + 1, running_loss / len(loader.dataset)))

# Train the CNN model with cross-entropy loss
model = CNN(SIDE)
optimizer = optim.Adam(model.parameters(), lr=0.001)
train(model, train_loader, nn.CrossEntropyLoss(), optimizer, epochs=10)

# Continue training the CNN model with the cost-sensitive loss
cost_matrix = torch.tensor([[0, 1], [5, 0]], dtype=torch.float32)
train(model, train_loader, CostSensitiveDecisionTreeLoss(cost_matrix), optimizer, epochs=20)

# Test the CNN model
model.eval()
y_pred = []
with torch.no_grad():
    for (xb,) in test_loader:
        y_pred.extend(model(xb).argmax(dim=1).tolist())
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```
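One detail that is easy to get wrong in a model like this is the input size of the first `nn.Linear` layer, which has to equal the number of values left after the convolution and pooling layers. The sketch below checks that arithmetic in plain Python using the usual output-size formula floor((n + 2p - k) / s) + 1. The helper names `conv_out` and `flat_features` are illustrative, not PyTorch functions, and the input sizes are only examples.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Side length after a square convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

def flat_features(side, channels=32, padding=0):
    """Values reaching the first Linear layer for conv3x3 -> pool2 -> conv3x3 -> pool2."""
    for _ in range(2):
        side = conv_out(side, kernel=3, padding=padding)  # 3x3 convolution
        side = conv_out(side, kernel=2, stride=2)         # 2x2 max-pooling
    return channels * side * side

print(flat_features(28))            # unpadded convolutions, 28x28 input -> 800
print(flat_features(30))            # unpadded convolutions, 30x30 input -> 1152
print(flat_features(9, padding=1))  # padding=1 convolutions, 9x9 input -> 128
```

With unpadded 3x3 convolutions, a 28x28 input flattens to 800 values, while 1152 values correspond to a 30x30 input, so the `nn.Linear` size and the reshape size must be chosen together.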
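The script first loads the CICIDS-2017 data and balances it with the `RandomUnderSampler` class. Note that `fetch_datasets()` in imbalanced-learn does not include CICIDS-2017, so the data has to be read from a local copy such as the published CSV files. It then splits the data into training and test sets with `train_test_split` and defines a CNN model. Next, it defines a cost-sensitive loss as an `nn.Module` subclass that holds a 2x2 cost matrix: the first row gives the penalties for samples whose true class is negative, and the second row gives the penalties for samples whose true class is positive. Despite the class name, no decision tree is trained; the loss only weights the CNN's errors by their cost. The CNN is trained first with the Adam optimizer and cross-entropy loss, then further with the cost-sensitive loss. Finally, the trained model predicts on the test set, and `confusion_matrix` and `classification_report` print the evaluation results.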
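To make the role of the cost matrix concrete, here is a minimal plain-Python sketch of the arithmetic for a single sample. It assumes the two network outputs are first turned into probabilities with softmax, and that entry `[i][j]` of the matrix is the cost of predicting class `j` when the true class is `i`. The function names are illustrative and not part of PyTorch.

```python
import math

COST_MATRIX = [[0.0, 1.0],  # true class 0: cost of predicting 0, cost of predicting 1
               [5.0, 0.0]]  # true class 1: cost of predicting 0, cost of predicting 1

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_cost(logits, true_class, cost_matrix=COST_MATRIX):
    """Cost of each possible prediction, weighted by its predicted probability."""
    probs = softmax(logits)
    return sum(c * p for c, p in zip(cost_matrix[true_class], probs))

# The same undecided prediction (50/50) is penalized differently per true class.
print(expected_cost([0.0, 0.0], true_class=0))  # 0.5
print(expected_cost([0.0, 0.0], true_class=1))  # 2.5
```

Under these assumptions, a missed positive sample costs five times as much as a false alarm, which pushes the model toward predicting the positive class when it is unsure.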