Using GBDT in Python for binary classification and plotting the ROC curve
Date: 2024-06-05 16:01:34
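To run binary classification with GBDT and plot the ROC curve, follow these steps:
1. Import the required libraries and the dataset. We can use the make_classification function from sklearn to generate a binary classification dataset.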
```python
from sklearn.datasets import make_classification
import pandas as pd
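# n_features=4 so that X has exactly as many columns as the names below
# (make_classification defaults to 20 features, which would raise a ValueError here)
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)
df = pd.DataFrame(X, columns=['feature_1', 'feature_2', 'feature_3', 'feature_4'])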
df['target'] = y
```
2. Split the dataset into a training set and a test set.
```python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
3. Train the model with sklearn's GradientBoostingClassifier.
```python
from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
model.fit(X_train, y_train)
```
4. Use sklearn's roc_curve function to compute the false positive rate (FPR) and true positive rate (TPR) of the ROC curve.
```python
from sklearn.metrics import roc_curve
# Column 1 of predict_proba holds the predicted probability of the positive class
y_pred_proba = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
```
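To make it clearer what roc_curve computes for each threshold, here is a minimal pure-Python sketch of a single ROC point. The helper `roc_point` and the four-sample toy data are hypothetical and introduced only for illustration. The sketch assumes 0/1 labels and that a sample counts as predicted positive when its score is greater than or equal to the threshold.

```python
def roc_point(y_true, y_score, threshold):
    """Return (fpr, tpr) when samples with score >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1  # positive sample correctly flagged
        elif predicted_positive and label == 0:
            fp += 1  # negative sample wrongly flagged
        elif label == 1:
            fn += 1  # positive sample missed
        else:
            tn += 1  # negative sample correctly rejected
    fpr = fp / (fp + tn)  # share of negatives predicted positive
    tpr = tp / (tp + fn)  # share of positives predicted positive
    return fpr, tpr


# Hypothetical toy data: two negatives and two positives
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# Lowering the threshold moves the point from the lower left towards (1, 1)
for threshold in (0.8, 0.4, 0.35, 0.1):
    print(threshold, roc_point(y_true, y_score, threshold))
```

Sweeping the threshold over all observed scores and connecting the resulting (FPR, TPR) points gives the ROC curve.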
5. Plot the ROC curve with matplotlib.
```python
import matplotlib.pyplot as plt
plt.plot(fpr, tpr)
# Dashed diagonal: the ROC curve of a random classifier, drawn for reference
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()
```
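A ROC curve is often summarized by the area under it (AUC). The original steps do not compute it, so the following is only a sketch of one way to do so, using `roc_auc_score` and `auc` from sklearn.metrics. The data setup repeats the steps above, and `n_features=4` is an assumption chosen to match the four feature names.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Same synthetic setup as in the steps above (n_features=4 is an assumption)
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
model.fit(X_train, y_train)
y_pred_proba = model.predict_proba(X_test)[:, 1]

# AUC computed directly from the true labels and predicted probabilities
auc_value = roc_auc_score(y_test, y_pred_proba)

# AUC computed by trapezoidal integration over the ROC points
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
auc_from_curve = auc(fpr, tpr)

print(f"AUC (roc_auc_score): {auc_value:.3f}")
print(f"AUC (auc over the curve): {auc_from_curve:.3f}")
```

The value can be shown in the plot, for example with `plt.plot(fpr, tpr, label=f'AUC = {auc_value:.3f}')` followed by `plt.legend()`.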
The complete code is as follows:
```python
from sklearn.datasets import make_classification
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt
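# n_features=4 so that X has exactly as many columns as the names below
# (make_classification defaults to 20 features, which would raise a ValueError here)
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)
df = pd.DataFrame(X, columns=['feature_1', 'feature_2', 'feature_3', 'feature_4'])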
df['target'] = y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
model.fit(X_train, y_train)
# Column 1 of predict_proba holds the predicted probability of the positive class
y_pred_proba = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()
```
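The code above returns `thresholds` from roc_curve but never uses it. As a hedged illustration of what the array can be used for, the sketch below picks the threshold that maximizes TPR minus FPR (Youden's J statistic). This is one common heuristic, not something the original steps prescribe. The names `j_scores`, `best_idx`, and `best_threshold` are hypothetical, and `n_features=4` is again an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

# Same synthetic setup as the complete code (n_features=4 is an assumption)
X, y = make_classification(n_samples=1000, n_features=4, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)
model.fit(X_train, y_train)
y_pred_proba = model.predict_proba(X_test)[:, 1]

# fpr[i] and tpr[i] are the rates obtained when predicting positive for score >= thresholds[i].
# The first threshold is a sentinel above every score, so the curve starts at (0, 0).
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

# Youden's J: vertical distance between the ROC curve and the diagonal
j_scores = tpr - fpr
best_idx = int(np.argmax(j_scores))
best_threshold = thresholds[best_idx]

print(f"Threshold with the largest TPR - FPR: {best_threshold:.3f}")
print(f"FPR = {fpr[best_idx]:.3f}, TPR = {tpr[best_idx]:.3f}")

# Apply the chosen threshold instead of the default cut-off of 0.5
y_pred = (y_pred_proba >= best_threshold).astype(int)
```

Whether this threshold is appropriate depends on the relative cost of false positives and false negatives in the application.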