Practical example code for the Python imbalanced-learn library
Date: 2023-09-03 21:04:17
imbalanced-learn is a Python library for handling class-imbalanced datasets. The following is a practical example of using it:
```python
from imblearn.over_sampling import RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
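# Create an imbalanced binary classification dataset (roughly 95% class 0, 5% class 1)
X, y = make_classification(n_samples=1000, n_features=10, weights=[0.95, 0.05], random_state=42)
# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Show the class distribution in the training set
print("Total training samples:", len(y_train))
print("Training samples in class 1:", sum(y_train == 1))
print("Training samples in class 0:", sum(y_train == 0))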
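# Oversample with RandomOverSampler
# (fit_resample is the current method name; the older fit_sample has been removed)
over_sampler = RandomOverSampler(random_state=42)
X_train_over, y_train_over = over_sampler.fit_resample(X_train, y_train)
# Show the class distribution after oversampling
print("Total training samples after oversampling:", len(y_train_over))
print("Class 1 samples after oversampling:", sum(y_train_over == 1))
print("Class 0 samples after oversampling:", sum(y_train_over == 0))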
# Undersample with RandomUnderSampler
under_sampler = RandomUnderSampler(random_state=42)
X_train_under, y_train_under = under_sampler.fit_resample(X_train, y_train)
# Show the class distribution after undersampling
print("Total training samples after undersampling:", len(y_train_under))
print("Class 1 samples after undersampling:", sum(y_train_under == 1))
print("Class 0 samples after undersampling:", sum(y_train_under == 0))
# Train an SVM on the oversampled training set and predict on the test set
svm = SVC(random_state=42)
svm.fit(X_train_over, y_train_over)
y_pred = svm.predict(X_test)
# Compute the prediction accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Prediction accuracy:", accuracy)
```
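The code above first uses make_classification to generate an imbalanced binary classification dataset and splits it into training and test sets. It then applies RandomOverSampler and RandomUnderSampler to the training set, producing an oversampled and an undersampled version, and prints the class counts of each. Only the oversampled set is used afterwards: an SVM is trained on it, predictions are made on the untouched test set, and the accuracy is computed. The undersampled set is created only to show its class distribution. Note that accuracy alone can be misleading on imbalanced data, since a model that always predicts the majority class would already score about 95% here. The example shows that imbalanced-learn makes it straightforward to rebalance a training set with a few lines of code.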