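Implement in Python:
1. Classify a dataset with KNN and logistic regression.
2. Download the dataset from UCI; do not use Iris.
3. Use at least 1000 samples and at least 20 attributes.
4. Include a description of the data.
Report the results: classification accuracy, precision, recall, F1, AUC, etc.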
Time: 2024-09-29 18:09:20
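To classify data with KNN (K-Nearest Neighbors) and logistic regression, we first prepare the dataset, preprocess it, and split it into a training set and a test set. We use Python's scikit-learn library, whose API is well suited to this kind of machine-learning task.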
**Step 1: Data acquisition and preprocessing**
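- Download a dataset other than Iris from the UCI Machine Learning Repository. Boston Housing and Wine Quality do not meet the requirements: Boston Housing has only 506 samples, 13 attributes and a continuous (regression) target, and its loader `load_boston` was removed in scikit-learn 1.2; Wine Quality has only 11 input attributes. A dataset that satisfies all requirements is Spambase: 4,601 emails, 57 numeric attributes (48 word frequencies, 6 character frequencies and 3 statistics on runs of capital letters) and a binary label (1 = spam, 0 = not spam).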
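```python
import pandas as pd
# Spambase from the UCI repository: 57 feature columns followed by the class label, no header row.
# Assumption: the classic UCI archive path below is still served; otherwise download the file manually.
URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"
df = pd.read_csv(URL, header=None)
data = df.iloc[:, :-1].to_numpy()   # 4601 x 57 feature matrix
target = df.iloc[:, -1].to_numpy()  # 0 = not spam, 1 = spam
```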
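The assignment also asks for a description of the data. A minimal sketch, using a hypothetical helper `describe_dataset` (not part of any library), that prints the sample and attribute counts, the class balance, missing values and per-attribute statistics, and checks the 1000-sample / 20-attribute requirement:

```python
import numpy as np
import pandas as pd

def describe_dataset(X, y, min_samples=1000, min_features=20):
    """Hypothetical helper: summarize X (samples x attributes) and labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    classes, counts = np.unique(y, return_counts=True)
    print(f"Samples: {n_samples}, attributes: {n_features}, classes: {len(classes)}")
    for cls, cnt in zip(classes, counts):
        print(f"  class {cls}: {cnt} samples ({cnt / n_samples:.1%})")
    print(f"Missing values: {int(np.isnan(X).sum())}")
    # Count, mean, std, min, quartiles and max for the first 5 attributes
    print(pd.DataFrame(X).describe().T.head())
    return {
        "n_samples": n_samples,
        "n_features": n_features,
        "class_counts": dict(zip(classes.tolist(), counts.tolist())),
        "meets_requirements": n_samples >= min_samples and n_features >= min_features,
    }
```

After Step 1, call it as `info = describe_dataset(data, target)`; the class proportions it prints also show whether accuracy alone would be misleading.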
**Step 2: Dataset split and feature scaling**
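```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# 80% training set, 20% test set; stratify keeps the spam/non-spam ratio equal in both
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2, random_state=42, stratify=target)
# KNN is distance-based, so standardize; fit on the training set only to avoid test-set leakage
scaler = StandardScaler()
X_train, X_test = scaler.fit_transform(X_train), scaler.transform(X_test)
```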
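A single 80/20 split gives one estimate that depends on which samples happen to land in the test set. A minimal sketch of an alternative, assuming synthetic data from `make_classification` as a stand-in for the real dataset: 5-fold stratified cross-validation, with the scaler inside a pipeline so it is refit on each training fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 1000 samples, 20 attributes, 2 classes
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)

# The scaler is part of each pipeline, so it only ever sees the training folds
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1_weighted")
    cv_results[name] = scores
    print(f"{name}: F1 = {scores.mean():.3f} +/- {scores.std():.3f}")
```

With the real data, pass `data, target` instead of `X, y`; reporting mean and standard deviation over the folds shows how stable each model's score is.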
**Step 3: Build and train the models**
- KNN classifier
```python
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=5)  # number of neighbors k; tune it, e.g. with cross-validation
knn.fit(X_train, y_train)
```
- Logistic regression classifier
```python
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression(max_iter=1000)  # more than the default 100 iterations so the lbfgs solver converges
logreg.fit(X_train, y_train)
```
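The value `n_neighbors=5` is only a starting point. One common way to choose k, sketched here on synthetic stand-in data (an assumption, not the Spambase results), is a grid search with cross-validation on the training set only, keeping the test set untouched for the final score:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=20, n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

pipe = Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier())])
# The "knn__" prefix routes each parameter to the pipeline step named "knn";
# odd k values avoid tied majority votes between two classes
param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9, 11, 15, 21], "knn__weights": ["uniform", "distance"]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="roc_auc")
search.fit(X_tr, y_tr)
best_k = search.best_params_["knn__n_neighbors"]
print("Best parameters:", search.best_params_, f"CV AUC = {search.best_score_:.3f}")
# search.score uses the same scorer as the search (AUC here), refit on the whole training set
print(f"Test AUC with tuned KNN: {search.score(X_te, y_te):.3f}")
```

Because the scaler sits inside the pipeline, this search can be run on the unscaled training data directly.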
**Step 4: Predict and evaluate performance**
```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Evaluate both models with the same metrics.
# average='weighted' averages per-class scores weighted by class size;
# the default average='binary' would report the spam class (label 1) only.
for name, model in [("KNN", knn), ("Logistic Regression", logreg)]:
    y_pred = model.predict(X_test)
    # AUC needs a score per sample, not hard labels: use P(class 1) from predict_proba
    y_score = model.predict_proba(X_test)[:, 1]
    print(f"{name}:")
    print(f"- Accuracy:  {accuracy_score(y_test, y_pred):.4f}")
    print(f"- Precision: {precision_score(y_test, y_pred, average='weighted'):.4f}")
    print(f"- Recall:    {recall_score(y_test, y_pred, average='weighted'):.4f}")
    print(f"- F1 Score:  {f1_score(y_test, y_pred, average='weighted'):.4f}")
    print(f"- AUC:       {roc_auc_score(y_test, y_score):.4f}\n")
```
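To make the reported numbers concrete, here is a small worked sketch on ten made-up labels (illustrative only) that derives accuracy, precision, recall and F1 from the confusion matrix and reproduces the `average='weighted'` precision used in the evaluation code:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, precision_score, recall_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 1, 1, 0])

# For binary labels {0, 1}, ravel() yields the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the samples predicted positive, the fraction that are positive
recall = tp / (tp + fn)     # of the truly positive samples, the fraction that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")

# average='weighted': per-class precision weighted by each class's support (true count)
prec_per_class = precision_score(y_true, y_pred, average=None)
support = np.bincount(y_true)
prec_weighted = (prec_per_class * support).sum() / support.sum()

checks = [np.isclose(accuracy, accuracy_score(y_true, y_pred)),
          np.isclose(precision, precision_score(y_true, y_pred)),
          np.isclose(recall, recall_score(y_true, y_pred)),
          np.isclose(f1, f1_score(y_true, y_pred)),
          np.isclose(prec_weighted, precision_score(y_true, y_pred, average="weighted"))]
print("Matches scikit-learn:", all(checks))
```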
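Note: `predict_proba(X_test)[:, 1]` takes the probability of the positive class, so the AUC computed this way is only valid for binary classification, which is the case for Spambase. The limitation comes from how the scores are passed, not from logistic regression itself; scikit-learn's `LogisticRegression` also handles multiclass problems. For a multiclass dataset, pass the full `predict_proba(X_test)` matrix to `roc_auc_score` together with `multi_class='ovr'` (one-vs-rest) or `multi_class='ovo'` (one-vs-one); the function then averages per-class AUCs and does not compute anything from a confusion matrix.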
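A minimal sketch of the multiclass case, assuming three-class synthetic data from `make_classification` as a stand-in for a multiclass UCI dataset. It checks that the one-vs-rest macro AUC is the plain mean of the per-class binary AUCs:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Three-class synthetic data (stand-in for a real multiclass dataset)
X, y = make_classification(n_samples=1500, n_features=20, n_informative=6, n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)  # shape (n_test, 3)

# One-vs-rest: each class in turn is "positive", all others "negative"; macro = unweighted mean
auc_ovr = roc_auc_score(y_te, proba, multi_class="ovr", average="macro")
per_class = [roc_auc_score((y_te == k).astype(int), proba[:, k]) for k in range(proba.shape[1])]
print(f"OvR macro AUC: {auc_ovr:.3f}", [f"{a:.3f}" for a in per_class])
```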