How to perform sentiment analysis on English text with Python
Date: 2024-02-10 21:40:17
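In Python, sentiment analysis of English text can be done with the natural language processing library NLTK (Natural Language Toolkit) together with the machine learning library scikit-learn, which supplies the text classifier. Here is a simple sentiment analysis example: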
```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
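# Download the required NLTK data
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data that newer NLTK releases look for
nltk.download('stopwords')
nltk.download('wordnet')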
# Load the dataset: (review text, sentiment label) pairs
reviews = [
    ('This is an excellent movie', 'positive'),
    ('The movie was fantastic I like it', 'positive'),
    ("You should watch it, it's brilliant", 'positive'),
    ("It's a very boring movie", 'negative'),
    ('I did not like the movie', 'negative'),
    ('The movie was horrible', 'negative'),
]
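# Preprocess the data
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))


def preprocess(document):
    document = document.lower()  # convert to lowercase
    words = word_tokenize(document)  # tokenize
    # Remove stop words before lemmatizing: the lemmatizer treats words as
    # nouns by default and turns 'was' into 'wa', which would then slip
    # past the stop word filter.
    words = [word for word in words if word not in stop_words]
    words = [lemmatizer.lemmatize(word) for word in words]  # lemmatize
    return ' '.join(words)

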
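# Feature extraction: TF-IDF over the preprocessed reviews
corpus = [preprocess(text) for text, _ in reviews]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse matrix; MultinomialNB accepts it directly
# Train the classifier
y = [label for _, label in reviews]
classifier = MultinomialNB()
classifier.fit(X, y)
# Predict the sentiment of new text
# Caveat: 'not' is an NLTK stop word and 'good' is absent from the training
# vocabulary, so neither word can influence this prediction.
text = 'The movie was not good'
text = preprocess(text)
X_test = vectorizer.transform([text])
y_pred = classifier.predict(X_test)
print(y_pred)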
```
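In the code above, we first download the required NLTK data, then load the dataset and preprocess it. Next, we extract text features with a TF-IDF vectorizer and train a multinomial naive Bayes classifier on them. Finally, we use the trained model to predict the sentiment of a new piece of text. With only six training reviews, the result is illustrative rather than reliable.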
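To make the two modelling steps less of a black box, here is a minimal pure-Python sketch of roughly what `TfidfVectorizer` and `MultinomialNB` compute, assuming their default settings (smoothed IDF, L2 normalisation, additive smoothing with `alpha=1`). The function names `fit_idf`, `tfidf`, `train_nb`, `predict_nb` and `classify` are made up for illustration, and the toy reviews are hypothetical, already-preprocessed strings rather than the dataset above. It is not the libraries' actual implementation.

```python
import math
from collections import Counter, defaultdict


def fit_idf(token_lists):
    """Smoothed IDF per term: ln((1 + n_docs) / (1 + doc_freq)) + 1."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}


def tfidf(tokens, idf):
    """L2-normalised TF-IDF weights for one document; unseen terms are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def train_nb(vectors, labels, vocabulary, alpha=1.0):
    """Multinomial naive Bayes with additive (Laplace) smoothing."""
    sums = defaultdict(lambda: defaultdict(float))  # label -> term -> summed weight
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            sums[label][term] += weight
    class_counts = Counter(labels)
    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_like = {}
    for c in class_counts:
        total = sum(sums[c].values()) + alpha * len(vocabulary)
        log_like[c] = {
            t: math.log((sums[c].get(t, 0.0) + alpha) / total) for t in vocabulary
        }
    return log_prior, log_like


def predict_nb(vec, log_prior, log_like):
    """Pick the class with the highest log prior plus weighted log-likelihood."""
    scores = {
        c: log_prior[c] + sum(w * log_like[c][t] for t, w in vec.items())
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical, already-preprocessed toy reviews (assumption for illustration).
train = [
    ("excellent movie", "positive"),
    ("fantastic movie", "positive"),
    ("brilliant movie", "positive"),
    ("boring movie", "negative"),
    ("horrible movie", "negative"),
    ("terrible movie", "negative"),
]
token_lists = [text.split() for text, _ in train]
labels = [label for _, label in train]
idf = fit_idf(token_lists)
vectors = [tfidf(tokens, idf) for tokens in token_lists]
model = train_nb(vectors, labels, vocabulary=list(idf))


def classify(text):
    """Vectorise new text with the fitted IDF table and score it."""
    return predict_nb(tfidf(text.lower().split(), idf), *model)


print(classify("brilliant film"))
print(classify("horrible boring movie"))
```

On this toy data the sketch labels "brilliant film" as positive and "horrible boring movie" as negative. Note that "film" never appears in training, so it is simply dropped. The same thing happens to any unseen word in the scikit-learn version.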