Code for applying word vectors to sentiment classification
Date: 2024-02-09 14:09:11
The following Python example uses word vectors for sentiment classification:
```
import numpy as np
import pandas as pd
import nltk
from nltk.corpus import stopwords
from gensim.models import Word2Vec
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the dataset (expects 'text' and 'sentiment' columns)
data = pd.read_csv('sentiment_data.csv')

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data['text'], data['sentiment'], test_size=0.2)

# Preprocess the text
nltk.download('stopwords')
# word_tokenize needs tokenizer data; newer NLTK releases may need 'punkt_tab' instead
nltk.download('punkt')
stop_words = set(stopwords.words('english'))

def process_text(text):
    # Lowercase, tokenize, then keep alphabetic tokens that are not stopwords
    words = nltk.word_tokenize(text.lower())
    words = [word for word in words if word.isalpha() and word not in stop_words]
    return words

# Train the word vector model on the training texts only
sentences = [process_text(text) for text in X_train]
# gensim >= 4.0 uses `vector_size`; older versions call this parameter `size`
model = Word2Vec(sentences, min_count=1, vector_size=100)

# Build sentence features by averaging word vectors
def get_word_vector(word):
    # Words missing from the Word2Vec vocabulary map to a zero vector
    if word in model.wv:
        return model.wv[word]
    else:
        return np.zeros(model.vector_size)

def get_sentence_vector(text):
    words = process_text(text)
    vectors = [get_word_vector(word) for word in words]
    if len(vectors) > 0:
        return np.mean(vectors, axis=0)
    else:
        # No usable tokens left after preprocessing
        return np.zeros(model.vector_size)

X_train = np.array([get_sentence_vector(text) for text in X_train])
X_test = np.array([get_sentence_vector(text) for text in X_test])

# Train the classifier
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Predict and evaluate the model
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
```
In this example, we load a sentiment classification dataset, train a Word2Vec model on the training texts, and represent each text as the mean of its word vectors. Words that Word2Vec did not see during training are mapped to a zero vector. We then train a LogisticRegression classifier on these sentence vectors and evaluate its accuracy on the test set.
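The sentence-vector step above averages word vectors and falls back to a zero vector for unknown words. The following is a minimal sketch of how that choice affects the result, compared with skipping unknown words entirely. The `toy_vectors` dictionary, its 3-dimensional values, and both function names are made up for illustration; a trained Word2Vec model would supply the real vectors through `model.wv`.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings, invented for this illustration only.
toy_vectors = {
    "good": np.array([1.0, 0.0, 2.0]),
    "movie": np.array([3.0, 2.0, 0.0]),
}
DIM = 3

def mean_with_zeros(tokens):
    """Average token vectors, substituting a zero vector for unknown tokens."""
    vecs = [toy_vectors.get(t, np.zeros(DIM)) for t in tokens]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)

def mean_known_only(tokens):
    """Average only the tokens that have a vector; return zeros if none do."""
    vecs = [toy_vectors[t] for t in tokens if t in toy_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)

tokens = ["good", "movie", "zzz"]   # "zzz" is out of vocabulary
print(mean_with_zeros(tokens))      # roughly [1.33, 0.67, 0.67]
print(mean_known_only(tokens))      # [2. 1. 1.]
```

With zero-filling, every unknown word pulls the sentence vector toward the origin, so test texts with many unseen words end up with smaller-magnitude features. Skipping unknown words avoids that shrinkage. Which variant works better depends on the dataset, so it is worth comparing both.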