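Write code for Chinese text classification based on TF-IDF + KNN
Time: 2023-12-14 09:35:56 Views: 101
The following is a code example for Chinese text classification based on TF-IDF + KNN: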
```python
import os

import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier


# Data preprocessing
def load_data(root):
    """Read every file under `root`; return (labels, segmented texts)."""
    labels = []
    corpus = []
    for filename in os.listdir(root):
        # The label is the integer before the first underscore in the file name
        label = int(filename.split('_')[0])
        labels.append(label)
        with open(os.path.join(root, filename), 'r', encoding='utf-8') as f:
            text = f.read()
        # Segment the text with jieba and join the tokens with spaces,
        # so that TfidfVectorizer can split on whitespace
        words = jieba.cut(text)
        corpus.append(' '.join(words))
    return labels, corpus


# Load the datasets
train_labels, train_corpus = load_data('train')
test_labels, test_corpus = load_data('test')

# Build the TF-IDF feature vectors
# Note: the default token_pattern ignores single-character tokens
vectorizer = TfidfVectorizer()
train_features = vectorizer.fit_transform(train_corpus)
test_features = vectorizer.transform(test_corpus)

# Build the KNN classifier
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(train_features, train_labels)

# Evaluate the model
accuracy = knn.score(test_features, test_labels)
print(f'Accuracy: {accuracy}')

# Predict on new data
new_text = '这是一个测试样本'
new_words = jieba.cut(new_text)
new_feature = vectorizer.transform([' '.join(new_words)])
predicted_label = knn.predict(new_feature)[0]
print(f'Predicted label: {predicted_label}')
```
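Note that the code above uses the jieba library for Chinese word segmentation, so jieba must be installed first (for example with `pip install jieba`). Also, this example is only suitable for relatively small datasets: KNN stores the entire training set and compares each new sample against it at prediction time, so large-scale datasets call for optimizations such as distributed computing.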
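To make the two steps of the pipeline less of a black box, here is a minimal pure-Python sketch of TF-IDF weighting followed by a k-nearest-neighbour vote. It is an illustration under stated assumptions, not a reimplementation of scikit-learn: the function names (`tfidf_vectors`, `weigh`, `cosine`, `knn_predict`) and the toy corpus are hypothetical, the documents are assumed to be already segmented and space-joined (as jieba would produce), and neighbours are ranked by cosine similarity. On L2-normalised non-zero vectors this gives the same ranking as Euclidean distance, which is the default in `KNeighborsClassifier`.

```python
import math
from collections import Counter


def weigh(tokens, idf):
    """Build an L2-normalised TF-IDF vector (a dict) for one tokenised document."""
    # Terms never seen during fitting are ignored
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def tfidf_vectors(docs):
    """Fit IDF weights on space-separated documents and return (vectors, idf)."""
    tokenised = [doc.split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenised for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [weigh(tokens, idf) for tokens in tokenised], idf


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (a dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def knn_predict(query, train_vecs, train_labels, k=3):
    """Return the majority label among the k most similar training vectors."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, already segmented: label 0 = sports, 1 = finance
train_docs = [
    "球队 赢得 比赛 冠军",
    "球员 比赛 进球 精彩",
    "教练 球队 训练 比赛",
    "股票 市场 上涨 投资",
    "银行 利率 投资 理财",
    "公司 股票 财报 市场",
]
train_labels = [0, 0, 0, 1, 1, 1]

train_vecs, idf = tfidf_vectors(train_docs)

sports_query = weigh("球队 比赛 精彩".split(), idf)
finance_query = weigh("股票 投资 市场".split(), idf)
print(knn_predict(sports_query, train_vecs, train_labels))
print(knn_predict(finance_query, train_vecs, train_labels))
```

Every prediction sorts the whole training set by similarity, which shows why the approach becomes expensive as the number of training documents grows.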