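Randomly generate data for 6 classes with 249, 48, 23, 23, 16 and 11 samples respectively; use SMOTE over-sampling to generate additional data for some of the smaller classes; finally, use 3D plots to show the sample distribution before and after over-sampling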
Time: 2024-09-25 21:04:04 Views: 29
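First, let's create an imbalanced dataset with 6 classes, then use SMOTE to over-sample some of the classes, and finally show the data before and after over-sampling in 3D scatter plots.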
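For context, SMOTE builds each synthetic sample by interpolating between a minority-class point and one of its nearest same-class neighbours. The NumPy-only sketch below illustrates that idea under simplifying assumptions; `smote_like_sample` is a hypothetical helper written for this illustration, not imblearn's actual implementation, and the 11-sample input simply mirrors the smallest class in the question.

```python
import numpy as np

def smote_like_sample(X_minority, n_new, k=5, seed=0):
    """Hypothetical helper: interpolate between minority points and their neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)
    k = min(k, n - 1)  # cannot have more neighbours than other points

    # Pairwise Euclidean distances within the minority class
    diff = X_minority[:, None, :] - X_minority[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest neighbours of every point
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # For each new point: a random base sample and one of its neighbours
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the new point somewhere on the segment between the two
    gap = rng.random((n_new, 1))
    return X_minority[base] + gap * (X_minority[picked] - X_minority[base])

rng = np.random.default_rng(42)
X_small = rng.normal(size=(11, 3))  # stand-in for the 11-sample class
X_new = smote_like_sample(X_small, n_new=20)
print(X_new.shape)  # (20, 3)
```

Because every synthetic point lies on a segment between two existing points, the new samples stay inside the region the class already occupies, which is why over-sampled classes look denser rather than wider in a scatter plot.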
```python
# Import libraries
from collections import Counter
from sklearn.datasets import make_blobs
from imblearn.over_sampling import SMOTE
import matplotlib.pyplot as plt

# Number of samples in each class
class_counts = [249, 48, 23, 23, 16, 11]

# Create the dataset: one random 3-feature cluster per class.
# make_blobs is used here instead of make_classification because it accepts
# a per-class sample count and returns the matching labels.
X_all, y_all = make_blobs(n_samples=class_counts, n_features=3, random_state=42)

# SMOTE over-sampling, applied only to the 3rd, 4th and 5th classes (labels 2, 3, 4).
# The target size is an assumption: match the largest class, as SMOTE does by default.
classes_to_oversample = [2, 3, 4]
target = max(class_counts)
smote = SMOTE(sampling_strategy={c: target for c in classes_to_oversample}, random_state=42)
X_over, y_over = smote.fit_resample(X_all, y_all)

# Class sizes before and after over-sampling
print("Before:", sorted(Counter(y_all).items()))
print("After: ", sorted(Counter(y_over).items()))

# 3D visualization: one panel before SMOTE, one panel after
fig = plt.figure(figsize=(14, 6))
colors = ['blue', 'red', 'green', 'purple', 'gray', 'brown']
panels = [("Before SMOTE", X_all, y_all), ("After SMOTE", X_over, y_over)]
for k, (title, X, y) in enumerate(panels, start=1):
    ax = fig.add_subplot(1, 2, k, projection='3d')
    for i, color in enumerate(colors):
        pts = X[y == i]  # all samples of class i
        ax.scatter(pts[:, 0], pts[:, 1], pts[:, 2], s=10, c=color, label=f"Class_{i} (n={len(pts)})")
    ax.set_xlabel('Feature 1')
    ax.set_ylabel('Feature 2')
    ax.set_zlabel('Feature 3')
    ax.set_title(title)
    ax.legend()
fig.suptitle("Data Distribution Before and After SMOTE Over-sampling")
plt.show()
```