KNN Algorithm Code for the Helen Dating Problem
Date: 2023-10-03 12:04:41
The following Python code solves the Helen dating problem with the KNN algorithm, using the `KNeighborsClassifier` class from the sklearn library.
```python
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
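# Load the dataset: three tab-separated feature columns followed by an integer class label
def file2matrix(filename):
    # Use a context manager so the file is closed after reading
    with open(filename) as fr:
        arrayOLines = fr.readlines()
    numberOfLines = len(arrayOLines)
    returnMat = np.zeros((numberOfLines, 3))
    classLabelVector = []
    for index, line in enumerate(arrayOLines):
        listFromLine = line.strip().split('\t')
        # Convert the feature strings to floats explicitly
        returnMat[index, :] = [float(x) for x in listFromLine[0:3]]
        classLabelVector.append(int(listFromLine[-1]))
    return returnMat, classLabelVector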
# Min-max normalization: rescale each feature column to [0, 1]
def autoNorm(dataSet):
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    # NumPy broadcasting applies the per-column values to every row,
    # so np.tile is not needed
    normDataSet = (dataSet - minVals) / ranges
    return normDataSet, ranges, minVals
# Fraction of the data held out as the test set
hoRatio = 0.10
# Load the dataset
datingDataMat, datingLabels = file2matrix('datingTestSet2.txt')
# Normalize the features
normMat, ranges, minVals = autoNorm(datingDataMat)
# Split the data: the first numTestVecs rows form the test set, the rest the training set
m = normMat.shape[0]
numTestVecs = int(m * hoRatio)
trainMat = normMat[numTestVecs:m, :]
trainLabels = datingLabels[numTestVecs:m]
testMat = normMat[0:numTestVecs, :]
testLabels = datingLabels[0:numTestVecs]
# Train the model
k = 3
clf = KNeighborsClassifier(n_neighbors=k)
clf.fit(trainMat, trainLabels)
# Evaluate: score() returns the mean accuracy on the test set
accuracy = clf.score(testMat, testLabels)
print("Prediction accuracy: %.2f%%" % (accuracy * 100))
```
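`autoNorm` returns `ranges` and `minVals` along with the normalized matrix, but the script only uses the matrix. One plausible use for the other two values is classifying a new, unlabeled sample: it has to be scaled with the same minimum and range as the training data before being passed to the classifier. The following is a minimal sketch of that step, not part of the original code. The toy data, the cluster centers and the names `X`, `y`, `new_sample` are all made up for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: three features on very different scales,
# 50 samples per class, class labels 1, 2 and 3 (all values are made up).
rng = np.random.default_rng(0)
centers = np.array([[40000.0, 8.0, 0.5],
                    [15000.0, 2.0, 1.0],
                    [60000.0, 12.0, 1.5]])
scales = np.array([3000.0, 1.0, 0.1])
X = np.vstack([c + rng.normal(size=(50, 3)) * scales for c in centers])
y = np.repeat([1, 2, 3], 50)

# Min-max normalization, the same formula as in autoNorm
min_vals = X.min(axis=0)
ranges = X.max(axis=0) - min_vals
X_norm = (X - min_vals) / ranges

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_norm, y)

# A new sample must be scaled with the training min/range, not its own
new_sample = np.array([[41000.0, 7.5, 0.55]])
new_norm = (new_sample - min_vals) / ranges
predicted = clf.predict(new_norm)[0]
print("Predicted label:", predicted)
```

Without this scaling, the first feature would dominate the Euclidean distance because its raw values are several orders of magnitude larger than the other two.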
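The dataset file `datingTestSet2.txt` must be downloaded separately. Other datasets can also be used for testing, provided each line holds three tab-separated numeric features followed by an integer class label, which is the layout `file2matrix` expects.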
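If the dataset is not at hand, a synthetic file in the same layout is enough to check that the code runs. The sketch below writes such a file and reads it back to check the format. The layout is inferred from `file2matrix`. The helper `write_synthetic_file`, the file name and all the values are hypothetical, so accuracy measured on this file says nothing about the real data.

```python
import os
import tempfile
import numpy as np

def write_synthetic_file(path, n_rows=100, seed=0):
    """Write n_rows lines of 'f1<TAB>f2<TAB>f3<TAB>label' with made-up values."""
    rng = np.random.default_rng(seed)
    with open(path, "w") as f:
        for _ in range(n_rows):
            label = int(rng.integers(1, 4))  # integer label in {1, 2, 3}
            # Features are drawn around a label-dependent center so the
            # classes are separable; the scales are arbitrary.
            f1 = rng.normal(20000.0 * label, 2000.0)
            f2 = rng.normal(4.0 * label, 0.5)
            f3 = rng.normal(0.5 * label, 0.05)
            f.write(f"{f1:.2f}\t{f2:.4f}\t{f3:.4f}\t{label}\n")

# Hypothetical output location in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "synthetic_dating.txt")
write_synthetic_file(path)

# Check the layout: four tab-separated columns, the last an integer label
data = np.loadtxt(path, delimiter="\t")
features, labels = data[:, :3], data[:, 3].astype(int)
print(features.shape, sorted(set(labels.tolist())))
```

Passing `path` to `file2matrix` in place of `'datingTestSet2.txt'` should then work without further changes.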