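PyTorch training and test code for a CNN on the CIC-IDS2017 dataset, after balancing the data with a cost-sensitive decision tree
Date: 2024-01-13 21:02:58 Views: 74
The following PyTorch code balances the CIC-IDS2017 training data with a cost-sensitive decision tree, then trains and tests a CNN on it: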
```python
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


# Define the CNN model
class Net(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.conv1 = nn.Conv1d(1, 16, 3)
        self.pool1 = nn.MaxPool1d(2)
        self.conv2 = nn.Conv1d(16, 32, 3)
        self.pool2 = nn.MaxPool1d(2)
        # Sequence length left after two (conv k=3, pool 2) stages,
        # derived from the feature count instead of a hard-coded 32 * 39
        flat_len = ((n_features - 2) // 2 - 2) // 2
        self.fc1 = nn.Linear(32 * flat_len, 128)
        self.fc2 = nn.Linear(128, 1)

    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool2(F.relu(self.conv2(x)))
        x = x.flatten(1)
        x = F.relu(self.fc1(x))
        # One probability per sample, shape (batch,)
        return torch.sigmoid(self.fc2(x)).squeeze(1)


# Load the CIC-IDS2017 dataset. imbalanced-learn does not ship it, so this
# reads a CSV from the public release; the file name is a placeholder, and
# the "Label" column with a "BENIGN" class is assumed.
df = pd.read_csv("cic_ids_2017.csv")
df.columns = df.columns.str.strip()
df = df.replace([np.inf, -np.inf], np.nan).dropna()
data = df.drop(columns=["Label"]).select_dtypes("number")
target = (df["Label"] != "BENIGN").astype(int)  # 0 = benign, 1 = attack

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=42, stratify=target
)

# Fit a cost-sensitive decision tree: missing an attack costs 5, a false alarm costs 1
costs = {0: 1, 1: 5}
clf = DecisionTreeClassifier(random_state=0, min_samples_leaf=10, class_weight=costs)
clf.fit(X_train, y_train)

# scikit-learn trees cannot resample data, so the training set is rebalanced
# by drawing rows with probability proportional to the same class costs
rng = np.random.default_rng(0)
weights = y_train.map(costs).to_numpy(dtype=float)
idx = rng.choice(len(X_train), size=len(X_train), p=weights / weights.sum())
X_train, y_train = X_train.iloc[idx], y_train.iloc[idx]

# Scale the features and convert both sets to PyTorch tensors of shape (N, 1, n_features)
scaler = StandardScaler().fit(X_train)
X_train_tensor = torch.tensor(scaler.transform(X_train), dtype=torch.float32).unsqueeze(1)
y_train_tensor = torch.tensor(y_train.to_numpy(), dtype=torch.float32)
X_test_tensor = torch.tensor(scaler.transform(X_test), dtype=torch.float32).unsqueeze(1)
y_test_tensor = torch.tensor(y_test.to_numpy(), dtype=torch.float32)

# Train the CNN model on the balanced training set, one sample at a time
net = Net(n_features=X_train_tensor.shape[2])
criterion = nn.BCELoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)
for epoch in range(10):
    running_loss = 0.0
    for i in range(len(X_train_tensor)):
        optimizer.zero_grad()
        # Slicing with i:i + 1 keeps the batch dimension that Conv1d expects
        outputs = net(X_train_tensor[i:i + 1])
        loss = criterion(outputs, y_train_tensor[i:i + 1])
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    print(f"Epoch {epoch + 1} loss: {running_loss / len(X_train_tensor)}")

# Measure the performance of the CNN on the testing set (in chunks to limit memory use)
net.eval()
with torch.no_grad():
    probs = torch.cat([net(xb) for xb in X_test_tensor.split(4096)])
y_pred = (probs >= 0.5).int().numpy()
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
print(f"True Negatives: {tn}")
print(f"False Positives: {fp}")
print(f"False Negatives: {fn}")
print(f"True Positives: {tp}")
```
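This code builds on the earlier version by adding a CNN model and training it on the balanced training set. During training, each sample is passed to the model individually, which is simple but slow on a dataset of this size. Finally, the code prints the CNN's confusion-matrix counts (true and false positives and negatives) on the test set. Note that the CNN may need adjusting in practice, for example its layer sizes or the input size of the first fully connected layer, so that it works well on the balanced dataset.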
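Per-sample training is usually replaced by mini-batches, and the flattened size feeding the first fully connected layer can be measured rather than hard-coded. The following is a minimal sketch of both ideas, not part of the original answer: `BatchNet` and `train_batched` are hypothetical names, the 78-feature synthetic tensors merely stand in for the real balanced training set, and `nn.BCEWithLogitsLoss` on raw logits is swapped in for the sigmoid plus `nn.BCELoss` pair because it is numerically more stable.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class BatchNet(nn.Module):
    """Same layer layout as Net, with the flattened size inferred from n_features."""

    def __init__(self, n_features):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 16, 3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, 3), nn.ReLU(), nn.MaxPool1d(2),
        )
        # Push a dummy input through the conv stack to measure the flattened size.
        with torch.no_grad():
            flat = self.features(torch.zeros(1, 1, n_features)).numel()
        self.head = nn.Sequential(nn.Linear(flat, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, x):
        # Returns raw logits of shape (batch,); the loss applies the sigmoid.
        return self.head(self.features(x).flatten(1)).squeeze(1)


def train_batched(model, X, y, epochs=2, batch_size=64, lr=1e-3):
    """Train with shuffled mini-batches and return the mean loss of each epoch."""
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    criterion = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    model.train()
    for _ in range(epochs):
        total = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
            total += loss.item() * len(xb)  # weight by batch size for a true mean
        history.append(total / len(X))
    return history


# Synthetic stand-in for the scaled, balanced training tensors (assumed shape: N x 1 x 78).
torch.manual_seed(0)
X_demo = torch.randn(256, 1, 78)
y_demo = (X_demo[:, 0, 0] > 0).float()
model = BatchNet(n_features=78)
losses = train_batched(model, X_demo, y_demo)
print(losses)
```

With the real data, `X_train_tensor` and `y_train_tensor` would take the place of `X_demo` and `y_demo`, and predictions would be obtained as `torch.sigmoid(model(x)) >= 0.5`.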