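Use Python and the sklearn library to classify the iris dataset with logistic regression, compute the following multiclass performance metrics (in two ways): confusion matrix, accuracy, precision, recall, F1 score, and classification report, and plot the PR curve and the ROC curve.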
Time: 2023-12-31 12:04:32
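Sure, I can help with this.
First, load the iris dataset and preprocess it (train/test split and feature standardization). The code is as follows: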
``` python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_iris()
X, y = data.data, data.target
# Split first, then fit the scaler on the training set only,
# so that test-set statistics do not leak into training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```
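If the scaler is fitted before splitting, the means and standard deviations it learns include the test samples. A scikit-learn `Pipeline` makes leakage-free scaling automatic, because `fit` only ever sees the training data and `score`/`predict` reuse the fitted statistics. The sketch below is an alternative to the manual steps, assuming default `LogisticRegression` settings and the same `test_size=0.2, random_state=42` split:

``` python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The pipeline fits StandardScaler on X_train only, then applies the same
# means/standard deviations to X_test before the classifier sees it.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print("Test accuracy: {:.2f}%".format(test_accuracy * 100))
```

The fitted scaler is available as `model[0]`, so its `mean_` can be inspected to confirm it was computed from the training set alone.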
Next, train a logistic regression model, predict on the test set, and compute the multiclass evaluation metrics. The code is as follows:
``` python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score, classification_report
from sklearn.metrics import roc_auc_score, roc_curve, precision_recall_curve
import matplotlib.pyplot as plt

# With the lbfgs solver, multiclass targets already use the multinomial (softmax) loss;
# the multi_class argument is deprecated in recent scikit-learn releases, so it is omitted
lr = LogisticRegression(solver='lbfgs')
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

# average='macro': compute the score per class, then take the unweighted mean
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='macro')
recall = recall_score(y_test, y_pred, average='macro')
f1 = f1_score(y_test, y_pred, average='macro')
print("Confusion Matrix:\n", cm)
print("Accuracy: {:.2f}%".format(accuracy * 100))
print("Precision: {:.2f}%".format(precision * 100))
print("Recall: {:.2f}%".format(recall * 100))
print("F1 Score: {:.2f}%".format(f1 * 100))
print("Classification Report:\n", classification_report(y_test, y_pred, target_names=data.target_names))
```
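Since the question asks for the metrics to be computed in two ways, one option is to derive them by hand from the confusion matrix and check them against sklearn's functions. In `confusion_matrix`, rows are true classes and columns are predicted classes, so per-class precision divides the diagonal by the column sums and recall divides it by the row sums. This is a minimal sketch that repeats the training steps so it runs on its own, assuming the same split and default `LogisticRegression` settings; it also assumes every class is predicted at least once (otherwise a column sum is zero and the hand-computed precision becomes NaN, while sklearn reports 0 with a warning).

``` python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
sc = StandardScaler().fit(X_train)
lr = LogisticRegression().fit(sc.transform(X_train), y_train)
y_pred = lr.predict(sc.transform(X_test))

# Rows = true classes, columns = predicted classes
cm = confusion_matrix(y_test, y_pred)
tp = np.diag(cm).astype(float)        # correctly classified samples per class
precision_k = tp / cm.sum(axis=0)     # TP / (TP + FP): divide by column sums
recall_k = tp / cm.sum(axis=1)        # TP / (TP + FN): divide by row sums
f1_k = 2 * precision_k * recall_k / (precision_k + recall_k)

manual = {
    "accuracy": tp.sum() / cm.sum(),  # trace / total
    "precision": precision_k.mean(),  # macro = unweighted mean over classes
    "recall": recall_k.mean(),
    "f1": f1_k.mean(),                # mean of per-class F1, as sklearn's macro F1
}
library = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, average="macro"),
    "recall": recall_score(y_test, y_pred, average="macro"),
    "f1": f1_score(y_test, y_pred, average="macro"),
}
for name in manual:
    print("{:<9} manual={:.4f}  sklearn={:.4f}".format(name, manual[name], library[name]))
```

Note that sklearn's macro F1 averages the per-class F1 values; it is not the F1 of the macro-averaged precision and recall, which is why `f1_k.mean()` is used above.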
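The metrics can also be computed another way: `multilabel_confusion_matrix` returns a separate one-vs-rest 2×2 confusion matrix for each class, and `fbeta_score` with `beta=1` gives the same value as the F1 score. The code is as follows: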
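``` python
from sklearn.metrics import multilabel_confusion_matrix, fbeta_score

# One 2x2 matrix per class (that class vs. the rest), laid out as [[TN, FP], [FN, TP]]
mcm = multilabel_confusion_matrix(y_test, y_pred)
# beta=1 weights precision and recall equally, so this equals the macro F1 score
f_beta = fbeta_score(y_test, y_pred, beta=1, average='macro')
print("Per-class (one-vs-rest) Confusion Matrices:\n", mcm)
print("F-beta Score (beta=1): {:.2f}%".format(f_beta * 100))
```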
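The per-class matrices can be unpacked into TP, FP, and FN counts, which also makes the different averaging modes concrete. Micro averaging pools the counts over all classes before dividing; for single-label multiclass data such as iris, every misclassified sample is one FP for the predicted class and one FN for the true class, so micro precision, micro recall, and accuracy coincide. A minimal, self-contained sketch, assuming the same split and default `LogisticRegression` settings:

``` python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, fbeta_score, multilabel_confusion_matrix, precision_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
sc = StandardScaler().fit(X_train)
lr = LogisticRegression().fit(sc.transform(X_train), y_train)
y_pred = lr.predict(sc.transform(X_test))

# mcm[k] = [[TN, FP], [FN, TP]] for class k versus all other classes
mcm = multilabel_confusion_matrix(y_test, y_pred)
tp, fp, fn = mcm[:, 1, 1], mcm[:, 0, 1], mcm[:, 1, 0]

# Micro averaging: sum the counts over classes, then divide once
micro_precision = tp.sum() / (tp.sum() + fp.sum())
micro_recall = tp.sum() / (tp.sum() + fn.sum())
micro_precision_sklearn = precision_score(y_test, y_pred, average="micro")

# F-beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 reduces to F1
f1_via_fbeta = fbeta_score(y_test, y_pred, beta=1, average="macro")
f1_direct = f1_score(y_test, y_pred, average="macro")

print("per-class TP:", tp, "FP:", fp, "FN:", fn)
print("micro precision={:.4f} (sklearn {:.4f}), micro recall={:.4f}, accuracy={:.4f}".format(
    micro_precision, micro_precision_sklearn, micro_recall, accuracy_score(y_test, y_pred)))
print("F-beta(beta=1)={:.4f}, F1={:.4f}".format(f1_via_fbeta, f1_direct))
```

Choosing `beta > 1` would weight recall more heavily than precision, and `beta < 1` the reverse.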
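Finally, plot the PR and ROC curves. Both curves are defined for binary problems, so each of the three iris classes is treated one-vs-rest (that class is positive, the other two negative), giving one curve per class. ROC and PR are drawn on separate axes because their axes mean different things. The code is as follows:
``` python
from sklearn.metrics import auc, average_precision_score
from sklearn.preprocessing import label_binarize
y_proba = lr.predict_proba(X_test)
print("ROC AUC (macro, one-vs-rest): {:.4f}".format(roc_auc_score(y_test, y_proba, multi_class='ovr')))
y_test_bin = label_binarize(y_test, classes=lr.classes_)  # one 0/1 column per class
fig, (ax_roc, ax_pr) = plt.subplots(1, 2, figsize=(12, 5))
for k, name in enumerate(data.target_names):  # one-vs-rest curve for each class
    fpr, tpr, _ = roc_curve(y_test_bin[:, k], y_proba[:, k])
    prec, rec, _ = precision_recall_curve(y_test_bin[:, k], y_proba[:, k])
    ax_roc.plot(fpr, tpr, label="{} (AUC={:.2f})".format(name, auc(fpr, tpr)))
    ax_pr.plot(rec, prec, label="{} (AP={:.2f})".format(name, average_precision_score(y_test_bin[:, k], y_proba[:, k])))
ax_roc.plot([0, 1], [0, 1], 'k--', label='chance')
ax_roc.set(xlabel='False positive rate', ylabel='True positive rate', title='ROC curves')
ax_pr.set(xlabel='Recall', ylabel='Precision', title='PR curves')
ax_roc.legend(); ax_pr.legend()
plt.tight_layout(); plt.show()
```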
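The single number from `roc_auc_score(..., multi_class='ovr')` summarizes the per-class curves. With its default `average='macro'` it is the unweighted mean of the one-vs-rest AUCs, which the sketch below checks by scoring one binarized class at a time. It is self-contained and assumes the same split and default `LogisticRegression` settings; it also assumes all three classes appear in the test set, since an AUC is undefined for a class with no positive samples.

``` python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, label_binarize

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
sc = StandardScaler().fit(X_train)
lr = LogisticRegression().fit(sc.transform(X_train), y_train)
y_proba = lr.predict_proba(sc.transform(X_test))

# One-vs-rest labels: column k is 1 where the true class is k, else 0
y_test_bin = label_binarize(y_test, classes=lr.classes_)
per_class_auc = [roc_auc_score(y_test_bin[:, k], y_proba[:, k]) for k in range(len(lr.classes_))]

# Macro averaging = plain mean of the per-class AUCs
macro_auc_manual = float(np.mean(per_class_auc))
macro_auc_sklearn = roc_auc_score(y_test, y_proba, multi_class="ovr", average="macro")
print("per-class ROC AUC:", np.round(per_class_auc, 4))
print("macro OvR AUC: manual={:.4f}, sklearn={:.4f}".format(macro_auc_manual, macro_auc_sklearn))
```

Passing `average="weighted"` instead would weight each class's AUC by its number of true samples.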
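This completes the classification of the iris data: the model has been trained, the multiclass evaluation metrics computed, and the PR and ROC curves plotted.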