ADASYN adaptive synthetic oversampling: a Python implementation
Date: 2023-11-10 12:03:48
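ADASYN (Adaptive Synthetic Sampling) is an oversampling method for imbalanced datasets. It decides how many synthetic samples to generate around each minority-class sample from the local class distribution: minority samples with more majority-class points among their nearest neighbors are harder to learn and therefore receive more synthetic samples. Compared with oversampling methods that treat every minority sample equally, ADASYN concentrates the new samples near the class boundary while staying within the feature space of the original data, which can improve classifier performance.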
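For reference, the allocation rule of the standard ADASYN formulation (He et al., 2008) can be written as below. The symbols are introduced here for illustration and do not come from this post: $m_s$ and $m_l$ are the minority and majority class sizes, $\beta \in [0, 1]$ is the desired balance level, $K$ is the number of neighbors, $\Delta_i$ is the number of majority-class points among the $K$ nearest neighbors of minority sample $x_i$, and $x_{zi}$ is a randomly chosen minority neighbor of $x_i$.

```latex
\begin{align}
G &= (m_l - m_s)\,\beta                                   && \text{total number of synthetic samples} \\
r_i &= \frac{\Delta_i}{K}, \qquad i = 1, \dots, m_s       && \text{local difficulty of } x_i \\
\hat{r}_i &= \frac{r_i}{\sum_{j=1}^{m_s} r_j}             && \text{normalized so that } \textstyle\sum_i \hat{r}_i = 1 \\
g_i &= \hat{r}_i \, G                                     && \text{synthetic samples generated from } x_i \\
s &= x_i + \lambda\,(x_{zi} - x_i), \quad \lambda \in [0, 1] && \text{one synthetic sample}
\end{align}
```

An implementation is free to choose $G$ differently (for example from a user-supplied ratio); the adaptive part is the weighting by $\hat{r}_i$.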
The following is a Python implementation of ADASYN:
```python
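from collections import Counter

import numpy as np
from sklearn.neighbors import NearestNeighbors


def ADASYN(X, y, k=5, ratio=0.5):
    """
    ADASYN adaptive synthetic oversampling (only the rarest class is oversampled).
    :param X: feature matrix, shape (n_samples, n_features)
    :param y: label vector, shape (n_samples,)
    :param k: number of neighbors, default 5
    :param ratio: share of synthetic samples in the minority class after
                  resampling, in [0, 1); default 0.5
    :return: resampled feature matrix and label vector
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    if not 0 <= ratio < 1:
        raise ValueError("ratio must be in [0, 1)")
    # Count the samples in each class
    counter = Counter(y)
    # The minority class is the one with the fewest samples
    minority_class = min(counter, key=counter.get)
    minority_num = counter[minority_class]
    # Number of synthetic samples to generate
    synthetic_num = int((minority_num * ratio) / (1 - ratio))
    # Nothing to do, or too few minority samples to interpolate between
    if synthetic_num == 0 or minority_num < 2:
        return X.copy(), y.copy()
    # Indices and features of the minority-class samples
    minority_indices = np.where(y == minority_class)[0]
    X_min = X[minority_indices]
    # Adaptive step: for every minority sample, compute the fraction of
    # majority-class points among its k nearest neighbors in the whole dataset
    k_all = min(k, len(X) - 1)
    knn_all = NearestNeighbors(n_neighbors=k_all + 1).fit(X)
    _, nn_all = knn_all.kneighbors(X_min)
    # Column 0 is normally the query point itself, so it is skipped
    r = (y[nn_all[:, 1:]] != minority_class).mean(axis=1)
    # Normalize to sampling weights; fall back to uniform weights when no
    # minority sample has a majority-class neighbor
    if r.sum() > 0:
        weights = r / r.sum()
    else:
        weights = np.full(minority_num, 1.0 / minority_num)
    # Draw the base sample of each synthetic point according to the weights,
    # so harder samples are chosen more often
    base = np.random.choice(minority_num, size=synthetic_num, p=weights)
    # Find the k nearest minority-class neighbors of each base sample
    k_min = min(k, minority_num - 1)
    knn_min = NearestNeighbors(n_neighbors=k_min + 1).fit(X_min)
    _, nn_min = knn_min.kneighbors(X_min[base])
    # Randomly pick one neighbor per base sample (column 0 is skipped again)
    cols = np.random.randint(1, k_min + 1, size=synthetic_num)
    neighbor = nn_min[np.arange(synthetic_num), cols]
    # Interpolation weight in [0, 1)
    weight = np.random.rand(synthetic_num, 1)
    # Generate new samples on the segment between base sample and neighbor
    synthetic_X = X_min[base] + weight * (X_min[neighbor] - X_min[base])
    synthetic_y = np.full(synthetic_num, minority_class, dtype=y.dtype)
    # Merge the original and synthetic samples
    X_resampled = np.vstack((X, synthetic_X))
    y_resampled = np.hstack((y, synthetic_y))
    return X_resampled, y_resampled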
```
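To see the adaptive step in isolation, here is a minimal sketch of how a budget of synthetic samples might be split across minority points. It uses only NumPy with brute-force distances. The function name `adaptive_counts`, its parameters, and the toy data are hypothetical and chosen purely for illustration. The sketch follows the general ADASYN idea of weighting each minority point by the share of majority-class points among its k nearest neighbors, and it is not meant as a definitive implementation.

```python
import numpy as np


def adaptive_counts(X, y, minority_label, k, n_synthetic):
    """Split n_synthetic among minority samples in proportion to the share
    of majority-class points among each sample's k nearest neighbors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    min_idx = np.where(y == minority_label)[0]
    # Euclidean distances from every minority sample to every sample
    dist = np.linalg.norm(X[min_idx, None, :] - X[None, :, :], axis=2)
    # A sample must not count as its own neighbor
    dist[np.arange(len(min_idx)), min_idx] = np.inf
    # Indices of the k nearest neighbors of each minority sample
    nn = np.argsort(dist, axis=1)[:, :k]
    # r[i]: fraction of majority-class points among the neighbors
    r = (y[nn] != minority_label).mean(axis=1)
    # Normalize; use uniform weights if no minority sample is "hard"
    if r.sum() > 0:
        weights = r / r.sum()
    else:
        weights = np.full(len(min_idx), 1.0 / len(min_idx))
    # Rounding means the counts may not add up to n_synthetic exactly
    counts = np.rint(weights * n_synthetic).astype(int)
    return r, counts


# Toy 1-D data (assumed for illustration): minority points 0.0 and 0.1 are
# isolated, 1.0 sits at the border, and 5.0 lies inside the majority class.
X = np.array([[0.0], [0.1], [1.0], [5.0], [1.2], [4.9], [5.1], [6.0], [7.0]])
y = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0])

r, counts = adaptive_counts(X, y, minority_label=1, k=2, n_synthetic=6)
print(r.tolist())       # [0.0, 0.0, 0.5, 1.0]
print(counts.tolist())  # [0, 0, 2, 4]
```

The two isolated minority points get no synthetic samples, the border point gets two, and the point surrounded by the majority class gets four. A uniform scheme would have spread the six samples evenly across all four points.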