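Read the digit-symbol data files stored in the testData directory (each digit has one file, named by the same rule as the training data), classify each symbol with the model, and output the correctly and incorrectly classified digits separately. Hint: a prediction is correct if it matches the digit stored in the first part of the file name. The Hamming distance can be computed with the third-party Levenshtein module: after from Levenshtein import hamming, hamming('0010', '1111') returns 3.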
Time: 2024-02-28 10:53:05 Views: 80
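Sure. Based on your requirements, here is an implementation that reads every digit-symbol file in the test data directory, classifies each one with the K-nearest-neighbors model built earlier from the training data, and counts the correct and incorrect predictions: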
```python
import os

from Levenshtein import hamming

# Load the training data; the label is the first part of the file name
train_dir = 'trainingData'
train_data = []
train_labels = []
for filename in os.listdir(train_dir):
    label = int(filename.split('_')[0])
    train_labels.append(label)
    with open(os.path.join(train_dir, filename), 'r') as f:
        data = f.read().strip()
    train_data.append(data)

# Define K and the Hamming distance function
K = 5


def hamming_distance(a, b):
    return hamming(a, b)


# Classify every file in the test directory
test_dir = 'testData'
correct_count = 0
total_count = 0
for filename in os.listdir(test_dir):
    true_label = int(filename.split('_')[0])
    with open(os.path.join(test_dir, filename), 'r') as f:
        test_data = f.read().strip()
    # Distance from this test sample to every training sample
    distances = [hamming_distance(test_data, d) for d in train_data]
    # Indices of the K nearest training samples
    nearest_indices = sorted(range(len(distances)), key=lambda i: distances[i])[:K]
    nearest_labels = [train_labels[i] for i in nearest_indices]
    # Majority vote among the K nearest labels
    pred_label = max(set(nearest_labels), key=nearest_labels.count)
    if pred_label == true_label:
        print(f'Correct: {true_label}')
        correct_count += 1
    else:
        print(f'Error: true label is {true_label}, but predicted label is {pred_label}.')
    # Count every test file, whether or not the prediction was correct
    total_count += 1

# Print the classification accuracy (guard against an empty test directory)
accuracy = correct_count / total_count if total_count else 0.0
print(f'Accuracy: {accuracy:.2%} ({correct_count}/{total_count})')
```
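The code above is similar to the earlier code. The difference is that it iterates over every file in the test data directory and classifies each one. If the predicted label matches the true label, it prints "Correct: <digit>"; otherwise it prints "Error: true label is <true digit>, but predicted label is <predicted digit>."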
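The per-file prediction step described above can also be pulled out into a small function, which makes it easier to check on toy inputs. The following is only a minimal sketch: `knn_predict` and the pure-Python `hamming_distance` are illustrative names that are not part of the original answer, and the pure-Python distance is swapped in for `Levenshtein.hamming` so the example runs without third-party packages. It assumes all samples are strings of equal length.

```python
from collections import Counter


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


def knn_predict(sample, train_data, train_labels, k=5):
    """Return the majority label among the k training samples nearest to sample."""
    # Pair each distance with its label, then keep the k smallest distances.
    scored = zip((hamming_distance(sample, d) for d in train_data), train_labels)
    nearest = sorted(scored, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of the k nearest neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data for illustration only (hypothetical, not from trainingData)
toy_data = ['0000', '0001', '1111', '1110', '1101']
toy_labels = [0, 0, 1, 1, 1]
print(knn_predict('0011', toy_data, toy_labels, k=3))
```

With the toy data, the three nearest neighbours of '0011' carry the labels 0, 0 and 1, so the vote goes to 0.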
Finally, the script computes the classification accuracy and prints it.
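Since the task asks for the correctly and incorrectly classified digits to be output separately, one option is to collect them into two lists and print each list once at the end. This is a hedged sketch rather than part of the original answer: `summarize` is a hypothetical helper, and the `results` pairs below are made-up values for illustration, not output from a real run.

```python
def summarize(results):
    """Split (true_label, pred_label) pairs into correct and wrong digits."""
    correct = sorted(true for true, pred in results if true == pred)
    wrong = sorted(true for true, pred in results if true != pred)
    total = len(results)
    # Avoid dividing by zero when there are no test files.
    accuracy = len(correct) / total if total else 0.0
    return correct, wrong, accuracy


# Hypothetical (true_label, pred_label) pairs, for illustration only
results = [(0, 0), (1, 1), (2, 7), (3, 3)]
correct, wrong, accuracy = summarize(results)
print('Correct digits:', correct)
print('Wrong digits:', wrong)
print(f'Accuracy: {accuracy:.2%}')
```

In the main loop, this would mean appending `(true_label, pred_label)` to a list for each test file instead of printing immediately.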