Text classification with the KNN algorithm in Python
Posted: 2024-03-11 12:46:57
The following is an example of KNN-based text classification implemented in Python.
First, import the required libraries, pandas and scikit-learn (sklearn):
```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
```
Next, read the text dataset and preprocess it:
```python
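df = pd.read_csv('text_dataset.csv')  # Load the dataset; it needs 'text' and 'label' columns
# Convert the text into TF-IDF vectors, dropping English stop words
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf = tfidf_vectorizer.fit_transform(df['text'])
# Convert the class names into integer labels
labels = df['label'].astype('category').cat.codes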
```
Then split the data into training and test sets:
```python
X_train, X_test, y_train, y_test = train_test_split(tfidf, labels, test_size=0.2, random_state=42)
```
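One thing to be aware of: in the steps above the TF-IDF vectorizer is fitted on the whole dataset before the split, so the vocabulary and IDF weights are partly learned from the test documents. A possible alternative, sketched here under the assumption that you want the test set to stay fully unseen, is to split the raw text first and fit the vectorizer on the training part only. The `demo_*`, `text_train` and `text_test` names and the sample sentences are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for text_dataset.csv
demo_df = pd.DataFrame({
    'text': [
        'the cat chased the mouse',
        'my dog likes to chase the cat',
        'the kitten and the puppy played',
        'our dog barked at the cat',
        'the puppy chewed a toy',
        'stock markets fell sharply',
        'investors sold stock and bonds',
        'the bank raised interest rates',
        'the stock market rallied',
        'bond investors expect lower rates',
    ],
    'label': ['pets'] * 5 + ['finance'] * 5,
})
demo_labels = demo_df['label'].astype('category').cat.codes

# Split the raw text first, with the same settings as the example above
text_train, text_test, y_train, y_test = train_test_split(
    demo_df['text'], demo_labels, test_size=0.2, random_state=42
)

demo_vectorizer = TfidfVectorizer(stop_words='english')
X_train = demo_vectorizer.fit_transform(text_train)  # vocabulary and IDF come from training text only
X_test = demo_vectorizer.transform(text_test)        # test text is only transformed, never fitted

print(X_train.shape, X_test.shape)
```

Both matrices share the same columns, so the rest of the example (fitting and predicting with the KNN classifier) works unchanged on `X_train` and `X_test`.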
Next, build the KNN classifier, train it, and predict on the test set:
```python
k = 5  # Number of neighbors (K)
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)  # Train the model
y_pred = knn.predict(X_test)  # Predict labels for the test set
```
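To see what the classifier is doing when it predicts, the following sketch fits KNN on six made-up sentences and then lists the neighbors that vote for a new document. The sentences, the variable names and the choice of `metric='cosine'` are assumptions for illustration; the example above uses the default Euclidean metric. Because `TfidfVectorizer` L2-normalizes each row by default, both metrics rank neighbors in the same order.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training documents and their classes
docs = [
    'the cat chased the mouse',
    'my dog likes to chase the cat',
    'the kitten and the puppy played',
    'stock markets fell today',
    'investors sold stock and bonds',
    'the bank raised interest rates',
]
doc_labels = ['pets', 'pets', 'pets', 'finance', 'finance', 'finance']

vec = TfidfVectorizer(stop_words='english')
X = vec.fit_transform(docs)

# K = 3: each prediction is a majority vote among the 3 closest training documents
clf = KNeighborsClassifier(n_neighbors=3, metric='cosine')
clf.fit(X, doc_labels)

query = vec.transform(['the dog chased a cat'])
print(clf.predict(query))

# Inspect the neighbors behind the vote: distance, class, and text
distances, indices = clf.kneighbors(query)
for dist, idx in zip(distances[0], indices[0]):
    print(round(float(dist), 3), doc_labels[idx], docs[idx])
```

Only the first two training sentences share words with the query, so they are the two closest neighbors and decide the vote; the third neighbor has no words in common with the query.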
Finally, compute the classification accuracy:
```python
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
```
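Accuracy is a single number and can hide poor results on small classes. As a complement, the sketch below prints a confusion matrix and a per-class report using scikit-learn's `confusion_matrix` and `classification_report`. The two label lists are invented for illustration; with the example above you would pass `y_test` and `y_pred` instead.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true and predicted labels for three classes (0, 1, 2)
y_true_demo = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred_demo = [0, 0, 0, 1, 1, 1, 2, 0]

demo_accuracy = accuracy_score(y_true_demo, y_pred_demo)
demo_cm = confusion_matrix(y_true_demo, y_pred_demo)  # rows: true class, columns: predicted class

print('Accuracy:', demo_accuracy)
print(demo_cm)
# Precision, recall and F1 for each class
print(classification_report(y_true_demo, y_pred_demo, zero_division=0))
```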
The complete code:
```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv('text_dataset.csv')  # Load the dataset; it needs 'text' and 'label' columns

# Convert the text into TF-IDF vectors, dropping English stop words
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf = tfidf_vectorizer.fit_transform(df['text'])

# Convert the class names into integer labels
labels = df['label'].astype('category').cat.codes

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(tfidf, labels, test_size=0.2, random_state=42)

# Build the KNN classifier, train it, and predict
k = 5  # Number of neighbors (K)
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(X_train, y_train)  # Train the model
y_pred = knn.predict(X_test)  # Predict labels for the test set

# Compute the classification accuracy
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
```
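Note that the code above is only a basic KNN text classification example. In practice you still need to tune and optimize it for your data, for example by adjusting the value of K.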
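As one possible way to do that tuning, here is a sketch that wraps the vectorizer and the classifier in a scikit-learn `Pipeline` and searches over K and the neighbor weighting with `GridSearchCV`. The sample sentences, the grid values and `cv=3` are assumptions chosen to keep the example small, not recommended settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Hypothetical toy data; replace with df['text'] and the labels from your dataset
texts = [
    'the cat chased the mouse around the house',
    'my dog likes to chase the cat',
    'the kitten and the puppy played in the garden',
    'our dog barked at the neighbour cat',
    'the puppy chewed the cat toy',
    'a cat and a dog slept on the sofa',
    'stock markets fell sharply today',
    'investors sold stock and bonds',
    'the bank raised interest rates',
    'the stock market rallied after the bank report',
    'bond investors expect lower interest rates',
    'the central bank bought bonds and stock',
]
targets = ['pets'] * 6 + ['finance'] * 6

# The pipeline refits TF-IDF inside every cross-validation fold,
# so each validation fold stays unseen during vectorizer fitting
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words='english')),
    ('knn', KNeighborsClassifier()),
])

# Candidate settings; 'step__parameter' addresses a parameter of a pipeline step
param_grid = {
    'knn__n_neighbors': [1, 3, 5],
    'knn__weights': ['uniform', 'distance'],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring='accuracy')
search.fit(texts, targets)

print('Best parameters:', search.best_params_)
print('Best cross-validated accuracy:', search.best_score_)
```

After fitting, `search.best_estimator_` is a pipeline refitted on all the data with the best settings, and it can classify raw strings directly with `search.best_estimator_.predict([...])`.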