Text classification with word2vec: code
Date: 2024-01-22 17:03:50
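The following code classifies text with word2vec: each sentence is represented by the average of its word vectors, and an SVM is trained on those sentence vectors.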
```python
import numpy as np
from gensim.models import Word2Vec
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Load a pre-trained word2vec model
model = Word2Vec.load("word2vec_model.model")

# 2. Prepare the training data (tokenized sentences) and labels
sentences = [["I", "love", "this", "movie"],
             ["This", "is", "an", "awesome", "film"],
             ["Great", "acting", "in", "this", "movie"],
             ["The", "plot", "is", "exciting"],
             ["Bad", "acting", "ruined", "the", "film"]]
labels = [1, 1, 1, 1, 0]

# 3. Average the word vectors of each sentence to get its sentence vector
sentence_vectors = []
for sentence in sentences:
    # Keep only the words that are in the model's vocabulary
    vectors = [model.wv[word] for word in sentence if word in model.wv]
    if vectors:
        sentence_vector = np.mean(vectors, axis=0)
    else:
        # No known words: fall back to a zero vector so that
        # sentence_vectors stays aligned with labels
        sentence_vector = np.zeros(model.wv.vector_size, dtype=np.float32)
    sentence_vectors.append(sentence_vector)

# 4. Split into training and test sets
# (with only five samples, the test set holds a single sentence)
X_train, X_test, y_train, y_test = train_test_split(
    sentence_vectors, labels, test_size=0.2, random_state=42)

# 5. Train an SVM classifier
clf = SVC()
clf.fit(X_train, y_train)

# 6. Predict and compute the accuracy
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```