Implementing the KNN algorithm step by step, with code for each step, using the iris dataset
Date: 2024-03-17 15:39:50
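Below is a walkthrough of using the KNN (k-nearest neighbors) algorithm to classify the iris dataset.
1. Import the required libraries and load the dataset: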
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np
# Load the dataset: X holds the 4 features per flower, y the class labels (0, 1, 2)
iris = load_iris()
X, y = iris.data, iris.target
```
2. Split the dataset into a training set and a test set:
```python
# Hold out 30% of the samples for testing; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```
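As a quick sanity check on the split, the sketch below prints the resulting array shapes and the number of samples per class. It also passes `stratify=y`, which is an addition not used in the original code: it asks `train_test_split` to keep the class proportions equal in both subsets. Because stratifying changes which samples land in the test set, accuracy measured on this split may differ from the result reported in this walkthrough.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Same test_size and random_state as above; stratify=y is an added assumption
# that keeps each of the three classes equally represented in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # sizes of the training and test sets
print(np.bincount(y_train))         # samples per class in the training set
print(np.bincount(y_test))          # samples per class in the test set
```

With 150 samples and `test_size=0.3`, 105 samples go to training and 45 to testing.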
3. Implement the KNN algorithm:
```python
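def euclidean_distance(x1, x2):
    """
    Compute the Euclidean distance between two vectors.
    x1, x2: the two vectors (NumPy arrays of equal length)
    """
    return np.sqrt(np.sum((x1 - x2) ** 2))


def get_k_nearest_neighbors(X_train, x_test, k, distance_metric):
    """
    Return the indices of the k training samples closest to a test sample.
    X_train: training set
    x_test: a single test sample
    k: number of neighbors
    distance_metric: function that computes the distance between two samples
    """
    # Distance from the test sample to every training sample
    distances = []
    for i in range(len(X_train)):
        dist = distance_metric(X_train[i], x_test)
        distances.append((i, dist))
    # Sort by distance, nearest first, and keep the first k indices
    distances = sorted(distances, key=lambda x: x[1])
    neighbors = []
    for i in range(k):
        neighbors.append(distances[i][0])
    return neighbors


def predict(X_train, y_train, x_test, k, distance_metric):
    """
    Predict the label of a single test sample by majority vote.
    X_train: training set
    y_train: labels of the training set
    x_test: a single test sample
    k: number of neighbors
    distance_metric: function that computes the distance between two samples
    """
    neighbors = get_k_nearest_neighbors(X_train, x_test, k, distance_metric)
    # Count how many of the k neighbors carry each label
    label_counts = {}
    for i in neighbors:
        label = y_train[i]
        label_counts[label] = label_counts.get(label, 0) + 1
    # Most frequent label wins; on a tie, the label encountered first
    # (the one belonging to the nearer neighbor) is returned
    sorted_label_counts = sorted(label_counts.items(), key=lambda x: x[1], reverse=True)
    return sorted_label_counts[0][0]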
```
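The loop-based functions above compute one distance at a time. As an alternative sketch, the same idea can be expressed with NumPy broadcasting so that all test samples are handled in one call. The helper name `knn_predict_batch` and the toy arrays are hypothetical, introduced only for illustration. The sketch assumes labels are non-negative integers (as in the iris targets), and it breaks ties in favor of the smallest label, which differs from the tie rule of `predict` above.

```python
import numpy as np


def knn_predict_batch(X_train, y_train, X_test, k):
    """Predict labels for every row of X_test at once (hypothetical helper)."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    # Pairwise differences via broadcasting: shape (n_test, n_train, n_features)
    diff = X_test[:, None, :] - X_train[None, :, :]
    # Euclidean distance matrix: shape (n_test, n_train)
    dists = np.sqrt((diff ** 2).sum(axis=2))
    # Indices of the k smallest distances per row (not ordered among themselves)
    nearest = np.argpartition(dists, k - 1, axis=1)[:, :k]
    neighbor_labels = y_train[nearest]
    # Majority vote per test sample; argmax picks the smallest label on a tie
    n_classes = int(y_train.max()) + 1
    votes = np.array([np.bincount(row, minlength=n_classes) for row in neighbor_labels])
    return votes.argmax(axis=1)


# Toy data: two well-separated clusters
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
queries = np.array([[0.2, 0.1], [5.5, 5.5]])

print(knn_predict_batch(X_toy, y_toy, queries, k=3))  # → [0 1]
```

The intermediate array has `n_test * n_train * n_features` elements, which is negligible for iris but can become a memory concern on large datasets.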
4. Predict on the test set and compute the accuracy:
```python
# Predict a label for every sample in the test set
y_pred = []
for i in range(len(X_test)):
    pred = predict(X_train, y_train, X_test[i], k=3, distance_metric=euclidean_distance)
    y_pred.append(pred)
# Accuracy = fraction of test samples whose predicted label matches the true label
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
```
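One way to cross-check a hand-written implementation is to run scikit-learn's own `KNeighborsClassifier` on the same split and compare the scores. The sketch below is self-contained and uses the library classifier, not the functions defined above; `n_neighbors=3` mirrors the `k=3` used in this walkthrough, and the variable name `reference_accuracy` is chosen here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Default metric is Minkowski with p=2, i.e. Euclidean distance
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

# score() returns the mean accuracy on the given test data
reference_accuracy = clf.score(X_test, y_test)
print(f"scikit-learn accuracy: {reference_accuracy}")
```

Small discrepancies between the two implementations, if any appear, are most likely due to how each one breaks ties.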
The complete code is as follows:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np

# Load the dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into a training set and a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)


def euclidean_distance(x1, x2):
    """
    Compute the Euclidean distance between two vectors.
    x1, x2: the two vectors (NumPy arrays of equal length)
    """
    return np.sqrt(np.sum((x1 - x2) ** 2))


def get_k_nearest_neighbors(X_train, x_test, k, distance_metric):
    """
    Return the indices of the k training samples closest to a test sample.
    X_train: training set
    x_test: a single test sample
    k: number of neighbors
    distance_metric: function that computes the distance between two samples
    """
    distances = []
    for i in range(len(X_train)):
        dist = distance_metric(X_train[i], x_test)
        distances.append((i, dist))
    # Sort by distance, nearest first, and keep the first k indices
    distances = sorted(distances, key=lambda x: x[1])
    neighbors = []
    for i in range(k):
        neighbors.append(distances[i][0])
    return neighbors


def predict(X_train, y_train, x_test, k, distance_metric):
    """
    Predict the label of a single test sample by majority vote.
    X_train: training set
    y_train: labels of the training set
    x_test: a single test sample
    k: number of neighbors
    distance_metric: function that computes the distance between two samples
    """
    neighbors = get_k_nearest_neighbors(X_train, x_test, k, distance_metric)
    # Count how many of the k neighbors carry each label
    label_counts = {}
    for i in neighbors:
        label = y_train[i]
        label_counts[label] = label_counts.get(label, 0) + 1
    # Most frequent label wins; on a tie, the label encountered first is returned
    sorted_label_counts = sorted(label_counts.items(), key=lambda x: x[1], reverse=True)
    return sorted_label_counts[0][0]


# Predict a label for every sample in the test set
y_pred = []
for i in range(len(X_test)):
    pred = predict(X_train, y_train, X_test[i], k=3, distance_metric=euclidean_distance)
    y_pred.append(pred)

# Compute the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
```
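The output is:
```text
Accuracy: 1.0
```
With this particular split (random_state=42, 45 test samples) and k=3, the KNN classifier labels every test sample correctly, i.e. an accuracy of 100%. A different split or a different value of k may score slightly lower.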
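Since a single train/test split can be flattering, a more robust estimate comes from k-fold cross-validation over several values of k. The sketch below is an illustration under stated assumptions: it uses scikit-learn's `KNeighborsClassifier` and `cross_val_score` in place of the hand-written functions, and the candidate values of k and the dictionary name `cv_means` are arbitrary choices made here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

cv_means = {}
for k in (1, 3, 5, 7, 9):
    # 5-fold cross-validation: each sample is used for testing exactly once
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    cv_means[k] = scores.mean()
    print(f"k={k}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

The mean across folds is usually a better guide for choosing k than the score on one fixed test set.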