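The dataset is a CSV file with 10 rows and 8 columns: the first column is the index, columns 2 to 7 are features, and the last column is the label. Using Euclidean distance and Manhattan distance in turn, run KNN classification with K=3 and K=5 on the first 8 samples and output the index of each classification result. Then, what are the predictions for samples 9 and 10 when K=3? And when K=5? Implement this in Python.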
Time: 2024-03-12 19:48:08 Views: 58
Sure. As requested, here is KNN classification code that lets you choose either Euclidean distance or Manhattan distance:
```python
import math


# Euclidean distance between two samples (features are columns 2-7, i.e. indices 1-6)
def euclidean_distance(instance1, instance2):
    distance = 0.0
    for i in range(1, 7):
        distance += (float(instance1[i]) - float(instance2[i])) ** 2
    return math.sqrt(distance)


# Manhattan distance between two samples
def manhattan_distance(instance1, instance2):
    distance = 0.0
    for i in range(1, 7):
        distance += abs(float(instance1[i]) - float(instance2[i]))
    return distance


# Find the k training samples closest to test_instance
def get_neighbors(training_set, test_instance, k, distance_func):
    distances = []
    for train_instance in training_set:
        dist = distance_func(test_instance, train_instance)
        distances.append((train_instance, dist))
    distances.sort(key=lambda x: x[1])
    return [instance for instance, _ in distances[:k]]


# Majority vote over the k neighbours; returns the predicted label
def get_response(neighbors):
    class_votes = {}
    for neighbor in neighbors:
        response = neighbor[-1]
        class_votes[response] = class_votes.get(response, 0) + 1
    sorted_votes = sorted(class_votes.items(), key=lambda x: x[1], reverse=True)
    return sorted_votes[0][0]


# Load the dataset (assumes the file has no header row; blank lines are skipped)
def load_dataset(filename):
    dataset = []
    with open(filename, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            dataset.append([field.strip() for field in line.split(',')])
    return dataset


# Run KNN on every test sample; returns (index, predicted label) pairs
def knn(training_set, test_set, k, distance_func):
    predictions = []
    for test_instance in test_set:
        neighbors = get_neighbors(training_set, test_instance, k, distance_func)
        result = get_response(neighbors)
        predictions.append((test_instance[0], result))
    return predictions


# Test code
filename = 'data.csv'
dataset = load_dataset(filename)

# Classify the first 8 samples. They are also the training set, so each
# sample counts as its own nearest neighbour (distance 0).
training_set = dataset[:8]
test_set = dataset[:8]
k = 3
predictions_euclidean_k3 = knn(training_set, test_set, k, euclidean_distance)
predictions_manhattan_k3 = knn(training_set, test_set, k, manhattan_distance)
print('K=3, samples 1-8, Euclidean distance, (index, predicted label):', predictions_euclidean_k3)
print('K=3, samples 1-8, Manhattan distance, (index, predicted label):', predictions_manhattan_k3)
k = 5
predictions_euclidean_k5 = knn(training_set, test_set, k, euclidean_distance)
predictions_manhattan_k5 = knn(training_set, test_set, k, manhattan_distance)
print('K=5, samples 1-8, Euclidean distance, (index, predicted label):', predictions_euclidean_k5)
print('K=5, samples 1-8, Manhattan distance, (index, predicted label):', predictions_manhattan_k5)

# Predict the labels of samples 9 and 10 from the first 8 samples
test_set = dataset[8:]
k = 3
predictions_euclidean_k3 = knn(training_set, test_set, k, euclidean_distance)
predictions_manhattan_k3 = knn(training_set, test_set, k, manhattan_distance)
print('K=3, samples 9-10, Euclidean distance, (index, predicted label):', predictions_euclidean_k3)
print('K=3, samples 9-10, Manhattan distance, (index, predicted label):', predictions_manhattan_k3)
k = 5
predictions_euclidean_k5 = knn(training_set, test_set, k, euclidean_distance)
predictions_manhattan_k5 = knn(training_set, test_set, k, manhattan_distance)
print('K=5, samples 9-10, Euclidean distance, (index, predicted label):', predictions_euclidean_k5)
print('K=5, samples 9-10, Manhattan distance, (index, predicted label):', predictions_manhattan_k5)
```
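Here we define two distance functions, Euclidean distance and Manhattan distance, and the `distance_func` parameter selects which one is used during KNN classification. The rest of the code is similar to the earlier code; the only difference is that the output lists the classification result indices separately for Euclidean distance and Manhattan distance.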
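To see why the choice of distance function can matter, here is a minimal, self-contained sketch on made-up data. The names `euclidean`, `manhattan`, `knn_predict` and `toy_train` are hypothetical and exist only for this illustration, and the two toy points are invented, not taken from `data.csv`. It assumes features are already numeric tuples and uses K=1 so the effect is easy to check by hand.

```python
import math
from collections import Counter


def euclidean(a, b):
    # Square root of the sum of squared coordinate differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    # Sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train, query, k, distance_func):
    # train is a list of (features, label) pairs (hypothetical layout)
    nearest = sorted(train, key=lambda item: distance_func(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy data: two labelled points and a query at the origin
toy_train = [
    ((3.0, 3.0), "A"),  # Euclidean distance ~4.24, Manhattan distance 6
    ((0.0, 5.0), "B"),  # Euclidean distance 5, Manhattan distance 5
]
query = (0.0, 0.0)

print(knn_predict(toy_train, query, 1, euclidean))  # A
print(knn_predict(toy_train, query, 1, manhattan))  # B
```

Under Euclidean distance the point labelled "A" is closer to the query, while under Manhattan distance the point labelled "B" is closer, so the same K gives different predictions. On a real dataset the two metrics will often agree, so it is worth comparing both outputs rather than assuming either outcome.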