Write a SMOTE algorithm in Python
Posted: 2023-05-23 11:04:45
Sure! Below is a basic Python implementation of the SMOTE algorithm:
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote(X, y, k=5, ratio=1):
    """
    Oversample the minority class using the SMOTE algorithm.
    :param X: 2D numpy array of shape (n_samples, n_features)
    :param y: 1D numpy array of shape (n_samples,), the corresponding class labels
    :param k: int, the number of nearest minority-class neighbors to interpolate toward
    :param ratio: float, the amount of SMOTE to apply (e.g. 0.5 means increase the minority class by 50%)
    :return: (new_X, new_y), the original plus synthetic samples, shuffled together
    """
    # Get the unique class labels and the number of samples in each class
    classes, counts = np.unique(y, return_counts=True)
    # Find the minority class and select its samples
    minority_class = classes[np.argmin(counts)]
    X_min = X[y == minority_class]
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    # Calculate the number of synthetic samples to generate
    n_to_generate = int(ratio * len(X_min))
    # Initialize an empty array to hold the synthetic samples
    synthetic_samples = np.zeros((n_to_generate, X.shape[1]))
    # Fit k-NN on the minority class only, so that interpolation stays between
    # minority samples. Request k + 1 neighbors because each query point is
    # normally returned as its own nearest neighbor, then drop that first column.
    k = min(k, len(X_min) - 1)
    knn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbors = knn.kneighbors(X_min, return_distance=False)[:, 1:]
    # Generate the synthetic samples
    for i in range(n_to_generate):
        # Choose a random minority class sample
        idx = np.random.randint(len(X_min))
        # Choose one of its k nearest minority neighbors randomly
        nn_idx = np.random.choice(neighbors[idx])
        # Calculate the difference between the minority sample and the neighbor
        diff = X_min[nn_idx] - X_min[idx]
        # Multiply this difference by a random value between 0 and 1
        gap = np.random.rand() * diff
        # Add this gap to the minority sample to create the synthetic sample
        synthetic_samples[i, :] = X_min[idx] + gap
    # Combine the original data with the synthetic data
    new_X = np.vstack((X, synthetic_samples))
    new_y = np.hstack((y, np.full(n_to_generate, minority_class)))
    # Shuffle the data and return it
    perm = np.random.permutation(new_X.shape[0])
    return new_X[perm], new_y[perm]
```
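The core step of SMOTE is the interpolation between a minority sample and one of its neighbors. The sketch below isolates that step on two made-up 2-D points (the values, the seed, and the variable names are assumptions for illustration, not part of the code above) and checks that the synthetic point falls on the segment between them:

```python
import numpy as np

# Hypothetical 2-D points chosen for illustration only.
x = np.array([1.0, 2.0])          # a minority-class sample
neighbor = np.array([3.0, 6.0])   # one of its minority-class neighbors

rng = np.random.default_rng(0)    # seeded so the run is repeatable
lam = rng.random()                # uniform random value in [0, 1)

# Move from x toward the neighbor by a fraction lam of the distance.
synthetic = x + lam * (neighbor - x)

# The synthetic point stays inside the box spanned by the two endpoints...
lo, hi = np.minimum(x, neighbor), np.maximum(x, neighbor)
inside_box = bool(np.all((synthetic >= lo) & (synthetic <= hi)))

# ...and lies on the line through them (the 2-D cross product is ~0).
d, s = neighbor - x, synthetic - x
on_line = bool(np.isclose(d[0] * s[1] - d[1] * s[0], 0.0))

print(synthetic, inside_box, on_line)
```

Because every synthetic point is a convex combination of two existing samples, SMOTE never extrapolates beyond the region the minority class already occupies.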
Usage:
```python
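# Load the data (load_data is a placeholder for your own loading code)
X, y = load_data()
# Use SMOTE to generate new samples (grow the minority class by 50%)
X_smote, y_smote = smote(X, y, k=5, ratio=0.5)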
```
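Since `ratio` is expressed relative to the current minority-class size, it can help to compute the value that would bring the minority class up to the size of the majority class. The helper below is a hypothetical addition (the name `ratio_to_balance` and the demo labels are assumptions, not part of the original answer), and the result is only approximate when the product is truncated with `int(...)`:

```python
import numpy as np


def ratio_to_balance(y):
    """Hypothetical helper: ratio that lifts the smallest class to the largest."""
    _, counts = np.unique(y, return_counts=True)
    # Number of samples missing from the minority class, relative to its size.
    return (counts.max() - counts.min()) / counts.min()


# Made-up labels: 90 majority samples (0) and 10 minority samples (1).
y_demo = np.array([0] * 90 + [1] * 10)
ratio = ratio_to_balance(y_demo)

# With 10 minority samples, this ratio corresponds to 80 synthetic samples.
n_new = int(ratio * 10)
print(ratio, n_new)
```

The returned value can then be passed as the `ratio` argument, for example `smote(X, y, k=5, ratio=ratio_to_balance(y))`. With more than two classes, only the smallest class is oversampled, so the other classes stay at their original sizes.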