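Classifying the dating dataset
Requirements:
(1) Use the KNN algorithm to classify the dating dataset (dating.txt).
(2) Test the trained model on the test set and print the accuracy.
Paste your code here:
Screenshot of the accuracy result: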
Time: 2023-12-14 15:39:08
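Sorry, as an AI language model I cannot run code locally or produce screenshots, but here is a Python implementation of KNN classification for your reference: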
```
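import operator

import numpy as np


# Load the dataset: assumes each row holds 3 numeric features followed by
# an integer class label, separated by whitespace (tabs or spaces)
def load_data(file_path):
    data = np.loadtxt(file_path)
    features = data[:, :3]
    labels = data[:, -1].astype(int)
    return features, labels


# Min-max normalization: scale each feature column to [0, 1]
def normalize(data):
    min_vals = data.min(0)
    max_vals = data.max(0)
    ranges = max_vals - min_vals
    ranges[ranges == 0] = 1  # avoid division by zero for constant columns
    norm_data = (data - min_vals) / ranges
    return norm_data, ranges, min_vals


# KNN algorithm: classify one sample by majority vote of its k nearest neighbors
def knn_classify(test_data, train_data, train_labels, k):
    # Euclidean distance from the test sample to every training sample
    diff_mat = np.tile(test_data, (train_data.shape[0], 1)) - train_data
    sq_diff_mat = diff_mat ** 2
    sq_distances = sq_diff_mat.sum(axis=1)
    distances = sq_distances ** 0.5
    # Take the k points with the smallest distances and count their labels
    sorted_dist_indices = distances.argsort()
    class_count = {}
    for i in range(k):
        vote_label = train_labels[sorted_dist_indices[i]]
        class_count[vote_label] = class_count.get(vote_label, 0) + 1
    # Return the class with the most votes
    sorted_class_count = sorted(class_count.items(), key=operator.itemgetter(1), reverse=True)
    return sorted_class_count[0][0]


# Test the KNN classifier on a held-out test set and print its accuracy
def test_knn(file_path, test_ratio=0.2, k=3):
    # Load the dataset
    features, labels = load_data(file_path)
    # Shuffle, then split into disjoint test and training sets
    m = features.shape[0]
    num_test = int(m * test_ratio)
    indices = np.random.permutation(m)
    test_idx, train_idx = indices[:num_test], indices[num_test:]
    # Fit the normalization on the training set, then apply the same scaling to the test set
    train_features, ranges, min_vals = normalize(features[train_idx])
    test_features = (features[test_idx] - min_vals) / ranges
    train_labels = labels[train_idx]
    test_labels = labels[test_idx]
    # Classify each test sample against the training set only and count correct predictions
    correct_count = 0
    for i in range(num_test):
        classifier_result = knn_classify(test_features[i], train_features, train_labels, k)
        print("the classifier came back with: %d, the real answer is: %d" % (classifier_result, test_labels[i]))
        if classifier_result == test_labels[i]:
            correct_count += 1
    accuracy = correct_count / float(num_test)
    print("the total accuracy is: %f" % accuracy)
    return accuracy


if __name__ == '__main__':
    test_knn('dating.txt', test_ratio=0.2, k=3)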
```
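Here, load_data() loads the dataset, normalize() applies min-max normalization, knn_classify() implements the KNN algorithm, and test_knn() tests the KNN classifier. To use it, just call test_knn() with the dataset file path, the test-set ratio, and the value of K.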
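Since dating.txt is not available here, below is a minimal sketch for checking the same pipeline (hold-out split, min-max scaling fitted on the training part, KNN vote, accuracy) without the file. It uses hypothetical synthetic data with three features on very different scales and integer labels 1 to 3 as an assumed stand-in for the real dataset, and swaps the `np.tile` loop for NumPy broadcasting plus `collections.Counter` for the vote. The names `knn_predict` and `holdout_accuracy` are illustrative and not part of the code above.

```python
from collections import Counter

import numpy as np


def knn_predict(test_x, train_x, train_y, k):
    """Predict a label for every row of test_x by majority vote of its k nearest training rows."""
    # Broadcasting yields an (n_test, n_train) matrix of Euclidean distances
    dists = np.sqrt(((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2))
    # Indices of the k nearest training samples for each test sample, nearest first
    nearest = np.argsort(dists, axis=1)[:, :k]
    preds = []
    for row in nearest:
        # most_common(1) returns [(label, count)]; ties go to the label seen first (the nearer neighbor)
        preds.append(Counter(train_y[row].tolist()).most_common(1)[0][0])
    return np.array(preds)


def holdout_accuracy(x, y, test_ratio=0.2, k=3, seed=0):
    """Shuffle, hold out test_ratio of the rows, and return KNN accuracy on them."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    num_test = int(len(x) * test_ratio)
    test_idx, train_idx = idx[:num_test], idx[num_test:]
    # Min-max scaling fitted on the training split only, then reused for the test split
    min_vals = x[train_idx].min(axis=0)
    ranges = x[train_idx].max(axis=0) - min_vals
    ranges[ranges == 0] = 1.0
    train_x = (x[train_idx] - min_vals) / ranges
    test_x = (x[test_idx] - min_vals) / ranges
    preds = knn_predict(test_x, train_x, y[train_idx], k)
    return float(np.mean(preds == y[test_idx]))


if __name__ == "__main__":
    # Hypothetical synthetic stand-in for dating.txt: 3 well-separated classes, 100 rows each
    rng = np.random.default_rng(42)
    centers = np.array([[10000.0, 2.0, 0.5], [40000.0, 8.0, 1.0], [70000.0, 14.0, 1.5]])
    labels = np.repeat([1, 2, 3], 100)
    features = centers[labels - 1] + rng.normal(scale=[3000.0, 1.0, 0.2], size=(300, 3))
    print("accuracy: %.3f" % holdout_accuracy(features, labels))
```

Broadcasting avoids building the tiled matrix for each test sample, at the cost of holding an n_test × n_train distance matrix in memory, which is small for datasets of a few thousand rows. To run it on the real file, pass the arrays returned by `load_data('dating.txt')` to `holdout_accuracy`.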