smote python
时间: 2023-10-16 13:22:34 浏览: 55
SMOTE (Synthetic Minority Over-sampling Technique) is an oversampling technique used in machine learning to balance imbalanced datasets. It generates synthetic samples for the minority class by interpolating between existing samples. In Python, you can use the `imblearn` library to apply SMOTE to your dataset.
Here is an example code snippet:
```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
X, y = make_classification(n_classes=2, class_sep=2,
weights=[0.1, 0.9], n_informative=3,
n_redundant=1, flip_y=0, n_features=20,
n_clusters_per_class=1, n_samples=1000,
random_state=10)
print("Original dataset shape:", X.shape, y.shape)
# Original dataset shape: (1000, 20) (1000,)
sm = SMOTE(random_state=42)
X_res, y_res = sm.fit_resample(X, y)
print("Resampled dataset shape:", X_res.shape, y_res.shape)
# Resampled dataset shape: (1800, 20) (1800,)
```
In this example, we first generate an imbalanced dataset with 10% of the samples belonging to the minority class. We then use SMOTE to balance the dataset by generating synthetic samples for the minority class. The resulting dataset has the same number of samples for both classes.
阅读全文