KNN classification on the Breast Cancer dataset, implemented in a Jupyter Notebook
Date: 2024-05-06 14:17:15
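Classifying data with KNN
Sure. Below is code that uses the KNN (k-nearest neighbors) algorithm to classify the Breast Cancer dataset in a Jupyter Notebook.
First, import the required libraries: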
```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
```
Load the Breast Cancer dataset that ships with scikit-learn:
```python
data = load_breast_cancer()
```
Convert the dataset to a DataFrame, with the label stored in a `target` column:
```python
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
```
Split the data into a training set (80%) and a test set (20%); `random_state=42` makes the split reproducible:
```python
X_train, X_test, y_train, y_test = train_test_split(df[data.feature_names], df['target'], test_size=0.2, random_state=42)
```
Initialize the KNN classifier with 5 neighbors:
```python
knn = KNeighborsClassifier(n_neighbors=5)
```
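To make it clearer what `n_neighbors=5` controls, here is a minimal sketch of the idea behind KNN: find the k training points closest to a query point and take a majority vote over their labels. The function `knn_predict` and the toy arrays are hypothetical and written only for illustration; this is not scikit-learn's implementation, which uses optimized neighbor search and its own tie-breaking rules.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k=5):
    """Predict the label of one point x by majority vote among its k nearest neighbors."""
    # Euclidean distance from x to every training point
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Toy data (made up for this sketch): two well-separated clusters
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

pred_a = knn_predict(X_toy, y_toy, np.array([0.05, 0.05]), k=3)
pred_b = knn_predict(X_toy, y_toy, np.array([5.0, 5.1]), k=3)
print(pred_a, pred_b)
```

A larger k smooths the decision boundary, while a smaller k follows the training data more closely.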
Fit the KNN classifier on the training set:
```python
knn.fit(X_train, y_train)
```
Predict on the test set:
```python
y_pred = knn.predict(X_test)
```
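Besides hard labels, `KNeighborsClassifier` also exposes `predict_proba`. With the default uniform weights, this is the fraction of the k neighbors that voted for each class. The sketch below rebuilds the same split from the raw arrays so it can run on its own; the names `proba` and `labels` are introduced here for illustration only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Vote fractions for the first three test samples; columns follow knn.classes_
proba = knn.predict_proba(X_test[:3])
# Map the predicted class indices to readable names (0 = malignant, 1 = benign)
labels = [str(data.target_names[i]) for i in knn.predict(X_test[:3])]

print(proba)
print(labels)
```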
Compute the model's accuracy:
```python
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
```
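KNN relies on distances, so features with large numeric ranges (such as the area measurements in this dataset) can dominate the result. A common variation, not part of the original code, is to standardize the features first. The sketch below assumes a `StandardScaler` plus `KNeighborsClassifier` pipeline on the same split; whether it improves accuracy depends on the data and the split, so compare the two numbers rather than assume a gain.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The scaler is fitted on the training data only, then applied to the test data
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_knn.fit(X_train, y_train)

scaled_accuracy = accuracy_score(y_test, scaled_knn.predict(X_test))
print(f"Accuracy with scaling: {scaled_accuracy:.4f}")
```

Wrapping both steps in a pipeline keeps test-set statistics out of the scaler, which avoids a subtle form of data leakage.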
The complete code is as follows:
```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load the Breast Cancer dataset
data = load_breast_cancer()
# Convert the dataset to a DataFrame
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(df[data.feature_names], df['target'], test_size=0.2, random_state=42)
# Initialize the KNN classifier
knn = KNeighborsClassifier(n_neighbors=5)
# Fit the KNN classifier on the training set
knn.fit(X_train, y_train)
# Predict on the test set
y_pred = knn.predict(X_test)
# Compute the model's accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
```
Running the code prints the model's accuracy on the test set.
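The value `n_neighbors=5` used above is only a default-style choice. One way to pick k more systematically, sketched here as an assumption rather than as part of the original answer, is a cross-validated grid search on the training set. The candidate list in `param_grid` is arbitrary and chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Candidate values of k (illustrative; odd values avoid ties in binary voting)
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11, 15]}

# 5-fold cross-validation on the training set only
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["n_neighbors"]
test_accuracy = search.score(X_test, y_test)
print(f"Best k: {best_k}, test accuracy: {test_accuracy:.4f}")
```

The test set is touched only once, after k has been selected, so the reported accuracy is not biased by the search.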