Preprocess the iris dataset, implement binary classification, plot the results, and output the accuracy and predicted class probabilities
Posted: 2024-04-27 07:03:46 | Views: 144
Sure, I can help with this task. First, import the required libraries: numpy, pandas, sklearn, and matplotlib. You can use the following code:
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt
```
Next, load the iris dataset and convert it to a Pandas DataFrame whose last column holds the class label. You can use the following code:
```python
iris = load_iris()
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target'])
```
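Next, split the dataset into features and a target variable, and turn the target into a binary classification problem. Classes 0 and 1 (setosa and versicolor) are merged into one class with label 0, and class 2 (virginica) becomes the other class with label 1. You can use the following code: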
```python
X = df.iloc[:, :-1].values
y = np.where(df.iloc[:, -1]==2, 1, 0)
```
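Merging two of the three species into one label leaves the binary classes unbalanced, which is worth confirming before training. A minimal sketch, assuming the standard 150-sample iris data bundled with scikit-learn; the variable name `counts` is only illustrative:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Same relabeling as above: virginica (class 2) -> 1, setosa and versicolor -> 0
y = np.where(iris.target == 2, 1, 0)

# Number of samples in each binary class (index 0 = label 0, index 1 = label 1)
counts = np.bincount(y)
print("label 0 (setosa + versicolor):", int(counts[0]))
print("label 1 (virginica):          ", int(counts[1]))
```

With a 2:1 ratio like this, accuracy alone can look better than the model really is, so the confusion matrix computed later is a useful complement.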
Then split the dataset into a training set and a test set, using 80% of the data for training and 20% for testing. You can use the following code:
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
```
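Because the binary labels are unbalanced, one optional variation is a stratified split, which keeps the class ratio the same in both subsets. This is a sketch of that alternative, not what the answer above does; `stratify` is a real parameter of `train_test_split`, and everything else mirrors the code above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data
y = np.where(iris.target == 2, 1, 0)

# stratify=y preserves the 2:1 class ratio in both the training and the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

print("train positives:", int(y_train.sum()), "of", len(y_train))
print("test positives: ", int(y_test.sum()), "of", len(y_test))
```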
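Next, standardize the features so that each has zero mean and unit variance. The scaler is fitted on the training set only and then applied to the test set, so no test-set statistics leak into training. You can use the following code: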
```python
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```
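To make the standardization step concrete, the sketch below checks the training-set statistics after scaling and reproduces `transform` by hand from the fitted `mean_` and `scale_` attributes. The names `X_train_std`, `X_test_std`, and `manual` are illustrative and do not appear in the answer above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = load_iris().data
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)  # mean and std are learned from the training set only
X_test_std = sc.transform(X_test)        # the test set reuses the training statistics

# Manual equivalent of transform(): (x - training mean) / training std
manual = (X_test - sc.mean_) / sc.scale_

print("train mean per feature:", X_train_std.mean(axis=0).round(6))
print("train std per feature: ", X_train_std.std(axis=0).round(6))
print("manual matches transform:", np.allclose(manual, X_test_std))
```

The test set will generally not have exactly zero mean and unit variance after scaling, which is expected because its own statistics are never used.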
Then train a logistic regression model on the training set and predict labels for the test set. You can use the following code:
```python
lr = LogisticRegression(random_state=0)
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)
```
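For a binary problem, the probability that logistic regression assigns to class 1 is the sigmoid of a linear score, $p = 1 / (1 + e^{-(w \cdot x + b)})$. The sketch below recomputes that probability from the fitted `coef_` and `intercept_` and compares it with `predict_proba`. For brevity it fits on the whole standardized dataset rather than on a training split, which is an assumption of this example only:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)
y = np.where(iris.target == 2, 1, 0)

lr = LogisticRegression(random_state=0).fit(X, y)

# Linear score z = w . x + b, using the single row of binary coefficients
z = X @ lr.coef_.ravel() + lr.intercept_[0]

# The sigmoid maps the score to the probability of class 1
p_manual = 1.0 / (1.0 + np.exp(-z))
p_sklearn = lr.predict_proba(X)[:, 1]

print("manual sigmoid matches predict_proba:", np.allclose(p_manual, p_sklearn))
```

`predict` returns class 1 exactly when this score is positive, that is, when the class 1 probability exceeds 0.5.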
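Next, compute and print the confusion matrix to summarize the classification results. Rows correspond to true labels and columns to predicted labels. You can use the following code: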
```python
cm = confusion_matrix(y_test, y_pred)
print(cm)
```
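Since the question asks for a plot of the results, one option is to draw the confusion matrix with scikit-learn's `ConfusionMatrixDisplay`. This is a sketch under a few assumptions: the display labels "other" and "virginica", the output file name `confusion_matrix.png`, and the headless `Agg` backend are choices made for this example, not part of the answer above:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same data preparation and model as in the steps above
iris = load_iris()
X = iris.data
y = np.where(iris.target == 2, 1, 0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

sc = StandardScaler()
lr = LogisticRegression(random_state=0).fit(sc.fit_transform(X_train), y_train)
y_pred = lr.predict(sc.transform(X_test))

# Rows are true labels, columns are predicted labels
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

fig, ax = plt.subplots(figsize=(4, 4))
ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=["other", "virginica"]).plot(ax=ax, cmap="Blues")
ax.set_title("Confusion matrix (test set)")
fig.tight_layout()
fig.savefig("confusion_matrix.png", dpi=150)  # hypothetical output path
```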
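Finally, print the model's accuracy on the test set and the predicted class probabilities. `predict_proba` returns one row per test sample, with the probability of class 0 in the first column and class 1 in the second. You can use the following code: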
```python
print('Accuracy: ', accuracy_score(y_test, y_pred))
print('Class Probabilities: ')
print(lr.predict_proba(X_test))
```
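The raw probability array is easier to read next to the true and predicted labels. A possible way to present it, assuming the same pipeline as above; the DataFrame `results` and its column names are made up for this example:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data
y = np.where(iris.target == 2, 1, 0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

sc = StandardScaler()
lr = LogisticRegression(random_state=0).fit(sc.fit_transform(X_train), y_train)
X_test_std = sc.transform(X_test)
y_pred = lr.predict(X_test_std)
proba = lr.predict_proba(X_test_std)  # columns follow lr.classes_, here [0, 1]

# One row per test sample: labels plus the probability of each class
results = pd.DataFrame({
    "true": y_test,
    "predicted": y_pred,
    "p_other": proba[:, 0],
    "p_virginica": proba[:, 1],
})

print(results.round(3).head(10))
print("Accuracy:", accuracy_score(y_test, y_pred))
```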
The complete code is as follows:
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt

# Load the iris data into a DataFrame: four feature columns plus the target
iris = load_iris()
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target'])

# Features and binary target: class 2 -> 1, classes 0 and 1 -> 0
X = df.iloc[:, :-1].values
y = np.where(df.iloc[:, -1] == 2, 1, 0)

# 80% training data, 20% test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize using statistics from the training set only
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Train logistic regression and predict test labels
lr = LogisticRegression(random_state=0)
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)

# Confusion matrix: rows are true labels, columns are predicted labels
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Accuracy and per-sample class probabilities (columns: class 0, class 1)
print('Accuracy: ', accuracy_score(y_test, y_pred))
print('Class Probabilities: ')
print(lr.predict_proba(X_test))
```
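After running the code, you should see the confusion matrix, the accuracy, and the probability of each test sample belonging to each class. You can also use Matplotlib to plot the ROC curve and the precision-recall curve to evaluate the model further.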
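The ROC and precision-recall curves mentioned above can be sketched as follows. This assumes the same split, scaler, and model as the complete code; the output file name `roc_pr_curves.png` and the headless `Agg` backend are choices made for this example:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data
y = np.where(iris.target == 2, 1, 0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

sc = StandardScaler()
lr = LogisticRegression(random_state=0).fit(sc.fit_transform(X_train), y_train)

# Both curves are built from the predicted probability of the positive class
scores = lr.predict_proba(sc.transform(X_test))[:, 1]
fpr, tpr, _ = roc_curve(y_test, scores)
precision, recall, _ = precision_recall_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)

fig, (ax_roc, ax_pr) = plt.subplots(1, 2, figsize=(9, 4))

ax_roc.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
ax_roc.plot([0, 1], [0, 1], linestyle="--", color="gray")  # chance level
ax_roc.set_xlabel("False positive rate")
ax_roc.set_ylabel("True positive rate")
ax_roc.set_title("ROC curve")
ax_roc.legend(loc="lower right")

ax_pr.plot(recall, precision)
ax_pr.set_xlabel("Recall")
ax_pr.set_ylabel("Precision")
ax_pr.set_title("Precision-recall curve")

fig.tight_layout()
fig.savefig("roc_pr_curves.png", dpi=150)  # hypothetical output path
```

With only 30 test samples, both curves are coarse step functions, so small differences between runs should not be over-interpreted.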