Build a LogisticRegression model and train it; compute common metrics and plot the PR and ROC curves; tune the model parameters (penalty, C, solver) and observe how the metrics change
Date: 2024-06-11 07:10:17
First, import the required packages and load the dataset:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_curve, auc, precision_recall_curve
# Load the dataset: every column except the last is a feature, the last column is the label
data = pd.read_csv('data.csv')
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
```
Next, split the dataset into a training set and a test set, build a LogisticRegression model with the default parameters, train it, and make predictions:
```python
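# Split the dataset into a training set (70%) and a test set (30%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Build the model with default parameters
model = LogisticRegression()
# Train the model
model.fit(X_train, y_train)
# Predict class labels for the test set
y_pred = model.predict(X_test)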
```
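For a binary problem, `model.predict` amounts to thresholding the predicted probability of the positive class at 0.5. The following is a minimal sketch of that rule in plain Python. The weights, the bias, and the helper names `positive_probability` and `predict_label` are made up for illustration and are not taken from a fitted model.

```python
import math

def positive_probability(x, weights, bias):
    """Logistic (sigmoid) of the linear score w.x + b."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def predict_label(x, weights, bias, threshold=0.5):
    """Return 1 when the positive-class probability exceeds the threshold, else 0."""
    return 1 if positive_probability(x, weights, bias) > threshold else 0

# Hypothetical coefficients, chosen only to make the example concrete
demo_weights = [1.5, -2.0]
demo_bias = 0.25

label_a = predict_label([1.0, 0.5], demo_weights, demo_bias)  # linear score 0.75
label_b = predict_label([0.0, 1.0], demo_weights, demo_bias)  # linear score -1.75
print(label_a, label_b)
```

A positive linear score maps to a probability above 0.5 and a negative score to one below it, so the two samples land in different classes.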
Compute the common metrics:
```python
# Compute the metrics on the test set
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print('Accuracy: {:.2f}'.format(accuracy))
print('Precision: {:.2f}'.format(precision))
print('Recall: {:.2f}'.format(recall))
print('F1-score: {:.2f}'.format(f1))
```
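To make the four metrics concrete, here is a small sketch that derives them from the confusion counts (TP, FP, FN, TN) in plain Python. The helper `binary_metrics` and the toy label lists are illustrative assumptions, not part of the original code; the function is meant to mirror what the scikit-learn scorers compute for binary labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels: 3 true positives, 1 false positive, 1 false negative, 3 true negatives
toy_true = [1, 0, 1, 1, 0, 0, 1, 0]
toy_pred = [1, 0, 0, 1, 0, 1, 1, 0]
toy_metrics = binary_metrics(toy_true, toy_pred)
print(toy_metrics)
```

With these toy labels every metric comes out to 0.75, because the false positives and false negatives are balanced.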
Plot the PR curve and the ROC curve:
```python
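# Both curves need continuous scores, not hard labels: with the 0/1 output of
# predict() each curve would collapse to a single operating point
y_score = model.predict_proba(X_test)[:, 1]
# Plot the PR curve (separate names so the scalar metrics above are not overwritten)
pr_precision, pr_recall, _ = precision_recall_curve(y_test, y_score)
plt.plot(pr_recall, pr_precision)
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('PR Curve')
plt.show()
# Plot the ROC curve
fpr, tpr, _ = roc_curve(y_test, y_score)
roc_auc = auc(fpr, tpr)
plt.plot(fpr, tpr, label='ROC curve (area = {:.2f})'.format(roc_auc))
plt.plot([0, 1], [0, 1], 'k--')  # diagonal: performance of a random classifier
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc='lower right')
plt.show()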
```
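An ROC curve is traced by sweeping a decision threshold over the scores and recording the false positive rate and true positive rate at each step. The sketch below does this by hand in plain Python, assuming 0/1 labels with at least one sample of each class. The helpers `roc_points` and `trapezoid_auc` and the toy data are illustrative assumptions; this is not how scikit-learn implements `roc_curve` internally.

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0)."""
    pos = sum(y_true)              # number of positive samples (labels must be 0/1)
    neg = len(y_true) - pos        # number of negative samples
    points = [(0.0, 0.0)]
    for thr in sorted(set(y_score), reverse=True):
        tp = sum(1 for t, s in zip(y_true, y_score) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, y_score) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

toy_true = [0, 0, 1, 1]
toy_score = [0.1, 0.4, 0.35, 0.8]           # continuous scores
toy_hard = [1 if s >= 0.5 else 0 for s in toy_score]  # labels after thresholding

score_curve = roc_points(toy_true, toy_score)
hard_curve = roc_points(toy_true, toy_hard)
print(len(score_curve), len(hard_curve))
print(trapezoid_auc(score_curve))
```

The score-based curve has one point per distinct score, while the curve built from hard labels has only a single point between the two corners, which is why probabilities are the better input for plotting.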
Tune the model parameters and observe how the metrics change:
```python
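# Try every combination of penalty, C and solver
# (both 'liblinear' and 'saga' support the 'l1' and 'l2' penalties)
for penalty in ['l1', 'l2']:
    for C in [0.001, 0.01, 0.1, 1, 10, 100]:
        for solver in ['liblinear', 'saga']:
            # Build the model; max_iter is raised because 'saga' may not
            # converge within the default 100 iterations on unscaled data
            model = LogisticRegression(penalty=penalty, C=C, solver=solver, max_iter=5000)
            # Train the model
            model.fit(X_train, y_train)
            # Predict
            y_pred = model.predict(X_test)
            # Compute the metrics; zero_division=0 avoids a warning when a
            # strongly regularized model predicts no positives at all
            accuracy = accuracy_score(y_test, y_pred)
            precision = precision_score(y_test, y_pred, zero_division=0)
            recall = recall_score(y_test, y_pred)
            f1 = f1_score(y_test, y_pred)
            print('Penalty: {}, C: {}, Solver: {}'.format(penalty, C, solver))
            print('Accuracy: {:.2f}'.format(accuracy))
            print('Precision: {:.2f}'.format(precision))
            print('Recall: {:.2f}'.format(recall))
            print('F1-score: {:.2f}'.format(f1))
            print('-' * 50)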
```
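Comparing the metrics across these combinations shows how penalty, C, and solver affect the model, and the combination with the best scores can then be selected.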
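Picking the best combination by its test-set score tends to give an optimistic estimate, so a common alternative is to select parameters by cross-validation on the training set and use the test set only once at the end. The following is a minimal sketch with `GridSearchCV`. It assumes synthetic data from `make_classification` in place of `data.csv`, and the names `X_demo`, `y_demo`, `param_grid`, and `search` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data standing in for the real dataset
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

# Same three parameters as the manual loop, with a smaller grid for C
param_grid = {
    "penalty": ["l1", "l2"],
    "C": [0.01, 0.1, 1, 10],
    "solver": ["liblinear", "saga"],
}

# 5-fold cross-validation on the training set, scored by F1
search = GridSearchCV(
    LogisticRegression(max_iter=5000), param_grid, scoring="f1", cv=5
)
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print("Best cross-validated F1: {:.2f}".format(search.best_score_))

# The refitted best model is evaluated once on the held-out test set
test_f1 = f1_score(y_te, search.predict(X_te))
print("Test F1: {:.2f}".format(test_f1))
```

`scoring="f1"` is one choice among several; swapping in `"roc_auc"` or `"average_precision"` would rank the combinations by the curve-based metrics instead.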