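Improving the matching results of a dating website: read the datingTestSet.txt file, whose last column is the degree of liking. Use train_test_split, train with KNN, and make predictions.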
Time: 2024-05-09 21:16:22 Views: 73
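Sure. This is a classification problem, so we can train and predict with the KNN algorithm. First, we read the data and split it into a training set and a test set. The code is as follows: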
```python
import numpy as np
from sklearn.model_selection import train_test_split

# Map the text label in the last column to an integer class
LABEL_TO_INT = {'didntLike': 1, 'smallDoses': 2, 'largeDoses': 3}


# Read the data
def file2matrix(filename):
    """Parse a tab-separated file into a feature matrix and a label list."""
    with open(filename) as fr:
        # Skip blank lines so that every row has a matching label
        lines = [line.strip() for line in fr if line.strip()]
    returnMat = np.zeros((len(lines), 3))
    classLabelVector = []
    for index, line in enumerate(lines):
        listFromLine = line.split('\t')
        # The first three columns are the numeric features
        returnMat[index, :] = [float(v) for v in listFromLine[0:3]]
        # An unknown label raises KeyError instead of being skipped silently
        classLabelVector.append(LABEL_TO_INT[listFromLine[-1]])
    return returnMat, classLabelVector


# Load the data and split it into training and test sets
dataMat, labels = file2matrix('datingTestSet.txt')
X_train, X_test, y_train, y_test = train_test_split(
    dataMat, labels, test_size=0.2, random_state=42)
```
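KNN classifies by distance, so a feature with a much larger numeric range can dominate the result. The following is a minimal sketch of min-max scaling fitted on the training rows only. The helper names `fit_min_max` and `apply_min_max` and the toy values are illustrative assumptions, not part of the original answer or of the dataset.

```python
import numpy as np


def fit_min_max(X_train):
    """Return the per-column minimum and range, computed from training data only."""
    col_min = X_train.min(axis=0)
    col_range = X_train.max(axis=0) - col_min
    # Guard against constant columns to avoid division by zero
    col_range[col_range == 0] = 1.0
    return col_min, col_range


def apply_min_max(X, col_min, col_range):
    """Rescale X with statistics that were fitted on the training set."""
    return (X - col_min) / col_range


# Toy rows standing in for the three feature columns (values are made up)
X_train_demo = np.array([[40000.0, 8.0, 0.9],
                         [15000.0, 7.0, 1.7],
                         [26000.0, 1.5, 0.8]])
X_test_demo = np.array([[35000.0, 12.0, 1.2]])

col_min, col_range = fit_min_max(X_train_demo)
X_train_scaled = apply_min_max(X_train_demo, col_min, col_range)
X_test_scaled = apply_min_max(X_test_demo, col_min, col_range)
print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
```

With the real data, the same two calls would be applied to `X_train` and `X_test`. Test rows can land slightly outside the range 0 to 1, which is expected because the statistics come from the training rows alone.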
Next, we can train and predict with the KNN algorithm. The code is as follows:
```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# KNN classification
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# Predict and compute the accuracy
y_pred = knn.predict(X_test)
acc = accuracy_score(y_test, y_pred)
print('Accuracy:', acc)
```
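A single accuracy number hides which of the three classes get confused with each other. The sketch below shows how a confusion matrix and a per-class report could be produced. The label lists are hypothetical values chosen for illustration, not results from datingTestSet.txt.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted labels
# (1 = didntLike, 2 = smallDoses, 3 = largeDoses)
y_true_demo = [1, 1, 2, 2, 3, 3, 3]
y_pred_demo = [1, 2, 2, 2, 3, 3, 1]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[1, 2, 3])
print(cm)

# Precision, recall and F1 for each class
print(classification_report(
    y_true_demo, y_pred_demo, labels=[1, 2, 3],
    target_names=['didntLike', 'smallDoses', 'largeDoses']))
```

For the real model, `y_test` and `y_pred` would take the place of the two demo lists.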
Here K is set to 3; it can be tuned for the data at hand. The accuracy reported for this run is `Accuracy: 0.95`.
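One way to tune K is cross-validated grid search. The sketch below runs on synthetic data so that it is self-contained. The candidate K values, the use of `MinMaxScaler` inside a `Pipeline`, and the generated points are all assumptions for illustration, not the original author's method.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the training split:
# three classes, three features on very different scales
rng = np.random.default_rng(0)
centers = np.array([[10000.0, 2.0, 0.5],
                    [30000.0, 6.0, 1.0],
                    [60000.0, 10.0, 1.5]])
X_demo = np.vstack([
    c + rng.normal(scale=[5000.0, 1.0, 0.2], size=(40, 3)) for c in centers
])
y_demo = np.repeat([1, 2, 3], 40)

# Scaling sits inside the pipeline, so each fold fits it on its own training part
pipe = Pipeline([("scale", MinMaxScaler()), ("knn", KNeighborsClassifier())])
param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9]}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_demo, y_demo)
print(search.best_params_, search.best_score_)
```

On the real data, `search.fit(X_train, y_train)` would replace the synthetic arrays, and `search.predict(X_test)` would then use the best K found.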