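Reproduce a KNN classifier on the breast cancer dataset, using cross-validation to search for better parameter values and a better-performing model.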
Posted: 2024-05-12 18:20:03
Diagnosing breast cancer with the KNN algorithm
First, load the breast cancer dataset and split it into a training set and a test set:
```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, random_state=42)
```
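Before going further, it can help to look at what the split produced. The sketch below is an optional check, not part of the original answer: it prints the class counts and the split sizes, and it passes `stratify=y` (an addition assumed here, not used in the code above) so that both sets keep roughly the same malignant/benign ratio.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

cancer = load_breast_cancer()
X, y = cancer.data, cancer.target

# Class counts for the full dataset (label 0 = malignant, label 1 = benign).
counts = np.bincount(y)
print({str(name): int(c) for name, c in zip(cancer.target_names, counts)})

# stratify=y is an assumption added for this sketch; it keeps the class
# ratio roughly equal in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# With the default test_size=0.25, about a quarter of the rows are held out.
print("Train shape:", X_train.shape, "Test shape:", X_test.shape)
print("Benign fraction (train/test): {:.3f} / {:.3f}".format(
    y_train.mean(), y_test.mean()))
```

The dataset has 569 samples with 30 features, 212 malignant and 357 benign, so the classes are moderately imbalanced and a stratified split is a reasonable precaution.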
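Next, standardize the features. KNN classifies by distance, so without scaling, the features with the largest numeric ranges would dominate the distance computation. The scaler is fit on the training set only and then applied to both sets: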
```python
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```
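To see why this step matters, the following sketch compares the spread of the raw features with the spread after scaling. It repeats the setup so that it runs on its own; the variable names `raw_std`, `train_mean`, and `train_std` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=42
)

# Per-feature standard deviation of the raw training data.
raw_std = X_train.std(axis=0)
print("Raw std, smallest vs largest: {:.4g} vs {:.4g}".format(
    raw_std.min(), raw_std.max()))

# Learn mean and std from the training set, then reuse them for the test set.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# After scaling, every training feature has mean ~0 and std ~1.
train_mean = X_train_scaled.mean(axis=0)
train_std = X_train_scaled.std(axis=0)
print("Largest |mean| after scaling: {:.2e}".format(np.abs(train_mean).max()))
print("Std range after scaling: {:.3f} to {:.3f}".format(
    train_std.min(), train_std.max()))
```

The test set is transformed with the training statistics, so its means and standard deviations will be close to, but not exactly, 0 and 1.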
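Then use cross-validation to choose K. `GridSearchCV` evaluates every candidate value of `n_neighbors` (here 1 to 10) with 5-fold cross-validation on the training set and reports the best-scoring parameter: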
```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
param_grid = {'n_neighbors': range(1, 11)}
grid = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid, cv=5)
grid.fit(X_train_scaled, y_train)
print("Best parameters: {}".format(grid.best_params_))
print("Best cross-validation score: {:.2f}".format(grid.best_score_))
```
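Since the question asks for a better value and a better model, one possible extension is sketched below. It is not the original answer's method. It assumes two changes: the scaler is placed inside a `Pipeline`, so it is refit on the training part of each fold rather than on the whole training set, and the grid also searches `weights` and the Minkowski exponent `p` over a wider range of K. The grid ranges and the shuffled `StratifiedKFold` are choices made for this sketch.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=42
)

# Scaling happens inside the pipeline, so each CV fold fits its own scaler.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Illustrative search space: K, vote weighting, and distance metric
# (p=1 is Manhattan distance, p=2 is Euclidean distance).
param_grid = {
    "knn__n_neighbors": list(range(1, 21)),
    "knn__weights": ["uniform", "distance"],
    "knn__p": [1, 2],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipe, param_grid=param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)  # raw features; the pipeline scales them

print("Best parameters:", search.best_params_)
print("Best cross-validation accuracy: {:.3f}".format(search.best_score_))
print("Test set accuracy: {:.3f}".format(search.score(X_test, y_test)))
```

A larger grid does not guarantee a higher test score; the cross-validation score is what the search optimizes, and the test set should be used only once, at the end.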
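Finally, train a KNN model with the best K and evaluate it on the test set. (With the default `refit=True`, `grid.best_estimator_` is already this model trained on the full training set; retraining here just makes the step explicit.)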
```python
knn = KNeighborsClassifier(n_neighbors=grid.best_params_['n_neighbors'])
knn.fit(X_train_scaled, y_train)
print("Test set score: {:.2f}".format(knn.score(X_test_scaled, y_test)))
```
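Accuracy alone hides which kind of mistake the model makes, and for a diagnosis task a malignant case predicted as benign is the costly one. The sketch below, which goes beyond the original answer, reuses the same setup and prints a confusion matrix and a per-class report with `sklearn.metrics`. The names `best_knn`, `y_pred`, and `cm` are introduced here for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={"n_neighbors": range(1, 11)}, cv=5)
grid.fit(X_train_scaled, y_train)

# refit=True (the default) means best_estimator_ is already trained
# on the whole training set with the best K.
best_knn = grid.best_estimator_
y_pred = best_knn.predict(X_test_scaled)

# Rows are true labels, columns are predictions (0 = malignant, 1 = benign).
# cm[0, 1] counts malignant cases that were predicted as benign.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=cancer.target_names))
```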
Here is the complete code:
```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Load the data and hold out a test set (default test_size=0.25).
cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(cancer.data, cancer.target, random_state=42)

# Fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 5-fold cross-validated search over K = 1..10.
param_grid = {'n_neighbors': range(1, 11)}
grid = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid, cv=5)
grid.fit(X_train_scaled, y_train)
print("Best parameters: {}".format(grid.best_params_))
print("Best cross-validation score: {:.2f}".format(grid.best_score_))

# Retrain with the best K and evaluate on the held-out test set.
knn = KNeighborsClassifier(n_neighbors=grid.best_params_['n_neighbors'])
knn.fit(X_train_scaled, y_train)
print("Test set score: {:.2f}".format(knn.score(X_test_scaled, y_test)))
```