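[Kingsoft Docs] ch6_diabetes https://kdocs.cn/l/ccCqfzTWNNqO is the dataset. It records nine variables (listed in the table below) for 768 Native American women. Build a logistic regression model to predict whether a patient has diabetes, report the model's confusion matrix on the test sample, and evaluate its classification performance.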
Posted: 2024-06-08 19:07:44
Sure. Below I walk through a logistic regression analysis in Python.
First, import the required libraries:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
```
Then read the dataset:
```python
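# The kdocs.cn link is a share page, not a raw CSV, so pd.read_csv cannot parse it directly.
# Assumption: the file was downloaded from this link and saved locally as 'ch6_diabetes.csv'.
url = 'https://kdocs.cn/l/ccCqfzTWNNqO'  # share link, for manual download only
data = pd.read_csv('ch6_diabetes.csv')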
```
Next, separate the features from the `Outcome` label and split the data into a training set (70%) and a test set (30%):
```python
X = data.drop('Outcome', axis=1)
y = data['Outcome']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
```
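If the two classes are not equally frequent, a plain random split can leave the training and test sets with different class ratios. A minimal sketch of one way to guard against that is to pass `stratify=y` to `train_test_split`. The frame below is a synthetic stand-in: the columns `Glucose` and `BMI`, the values, and the 65/35 class ratio are assumptions for illustration, not taken from the real file.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (hypothetical columns and values).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    'Glucose': rng.normal(120, 30, n),
    'BMI': rng.normal(32, 7, n),
    'Outcome': np.repeat([0, 1], [130, 70]),  # 65% negative, 35% positive
})

X_demo = demo.drop('Outcome', axis=1)
y_demo = demo['Outcome']

# stratify=y_demo keeps the positive/negative ratio the same in both splits.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0, stratify=y_demo
)

print('train positive rate:', ys_train.mean())
print('test positive rate:', ys_test.mean())
```

On the real data the same idea would amount to adding `stratify=y` to the existing call.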
Then create the logistic regression model and fit it to the training set:
```python
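# max_iter is raised from the default of 100 because unscaled features may need more iterations to converge.
model = LogisticRegression(max_iter=1000)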
model.fit(X_train, y_train)
```
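Logistic regression in scikit-learn can converge slowly when the features sit on very different scales, and a common remedy is to standardise them first. The sketch below is an optional variant, not part of the original answer. It uses `make_classification` to generate stand-in data with 768 rows and 8 features, so whatever it predicts says nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data that mirrors only the shape described in the question.
X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=0)
Xp_train, Xp_test, yp_train, yp_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# The scaler is fitted on the training split only, then reused on the test split.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(Xp_train, yp_train)

yp_pred = scaled_model.predict(Xp_test)
print(yp_pred.shape)
```

Wrapping the scaler and the model in one pipeline means the test set never influences the scaling statistics.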
Next, predict on the test set and compute the confusion matrix and the accuracy:
```python
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
print(cm)
print('Accuracy:', accuracy)
```
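Finally, use the printed confusion matrix and accuracy to evaluate how well the model classifies the test set. In the matrix, rows are the actual classes and columns are the predicted classes.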
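Accuracy alone can hide how the model treats each class, so it can help to derive per-class rates from the confusion matrix. The sketch below assumes the scikit-learn layout for binary labels 0/1, `[[TN, FP], [FN, TP]]`. The helper `binary_metrics` is a hypothetical name, and the counts are made up for illustration; they are not results from this dataset.

```python
def binary_metrics(cm):
    """Derive common metrics from a 2x2 confusion matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    return {
        'accuracy': (tp + tn) / total,
        # Share of predicted positives that are truly positive.
        'precision': tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of actual positives the model found (sensitivity).
        'recall': tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of actual negatives the model correctly rejected.
        'specificity': tn / (tn + fp) if (tn + fp) else 0.0,
    }

# Hypothetical counts, NOT output from the real model.
example_cm = [[80, 20], [10, 40]]
metrics = binary_metrics(example_cm)
for name, value in metrics.items():
    print(f'{name}: {value:.3f}')
```

Passing the `cm` array computed earlier to `binary_metrics` should work the same way, since a 2x2 NumPy array unpacks row by row.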
The complete code is as follows:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
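# The kdocs.cn link is a share page, not a raw CSV, so pd.read_csv cannot parse it directly.
# Assumption: the file was downloaded from this link and saved locally as 'ch6_diabetes.csv'.
url = 'https://kdocs.cn/l/ccCqfzTWNNqO'  # share link, for manual download only
data = pd.read_csv('ch6_diabetes.csv')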
X = data.drop('Outcome', axis=1)
y = data['Outcome']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
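# max_iter is raised from the default of 100 because unscaled features may need more iterations to converge.
model = LogisticRegression(max_iter=1000)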
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
print(cm)
print('Accuracy:', accuracy)
```
Hope this helps.