Python code for sentiment-labeling a review corpus and extracting sentiment words

Posted: 2023-10-13 13:52:42 Views: 127

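To label the sentiment of a review corpus and extract its sentiment words, you can use Python's natural language processing libraries. The example below uses NLTK to score each word and keep the ones that carry sentiment:

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.sentiment import SentimentIntensityAnalyzer

# Download the required NLTK data
nltk.download('vader_lexicon')
nltk.download('stopwords')
nltk.download('punkt')
# Assumption: newer NLTK releases load the tokenizer data from 'punkt_tab'
nltk.download('punkt_tab')


def extract_sentiment_words(text):
    # Initialize the sentiment intensity analyzer
    sid = SentimentIntensityAnalyzer()

    # Tokenize and remove stopwords
    stop_words = set(stopwords.words('english'))
    word_tokens = word_tokenize(text.lower())
    filtered_tokens = [w for w in word_tokens if w not in stop_words]

    # Score each word on its own and keep those with a nonzero compound score
    sentiment_words = []
    for word in filtered_tokens:
        sentiment_score = sid.polarity_scores(word)
        if sentiment_score['compound'] != 0.0:
            sentiment_words.append(word)
    return sentiment_words


# Sample review
comment = "This movie is really great! The acting was amazing and the plot was engaging."

# Extract the sentiment words
sentiment_words = extract_sentiment_words(comment)
print(sentiment_words)
```

This code uses NLTK's `SentimentIntensityAnalyzer` for sentiment analysis, the `stopwords` corpus to remove stopwords, and the `word_tokenize` function for tokenization. The VADER lexicon and the stopword list used here are English, so a corpus in another language needs a different lexicon and tokenizer. You can adapt the code to your own task and data.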
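The code above extracts sentiment words but does not attach a label to each comment. As a rough sketch of the labeling step, the dependency-free example below sums word scores from a small lexicon and maps the total to positive, negative, or neutral. `TOY_LEXICON`, its scores, the `threshold` value, and the function names are all hypothetical and chosen only for illustration; in practice you would substitute a real sentiment lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> polarity score.
# The values are illustrative only and are not taken from VADER.
TOY_LEXICON = {
    "great": 3.0,
    "amazing": 2.5,
    "engaging": 1.5,
    "boring": -2.0,
    "terrible": -3.0,
}


def label_comment(text, lexicon=TOY_LEXICON, threshold=0.5):
    """Return (label, hits), where hits lists (word, score) pairs found in text."""
    # Simple regex tokenizer: lowercase runs of letters and apostrophes
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [(tok, lexicon[tok]) for tok in tokens if tok in lexicon]
    total = sum(score for _, score in hits)
    if total > threshold:
        label = "positive"
    elif total < -threshold:
        label = "negative"
    else:
        label = "neutral"
    return label, hits


def count_sentiment_words(comments, lexicon=TOY_LEXICON):
    """Count how often each lexicon word occurs across a list of comments."""
    counts = Counter()
    for comment in comments:
        _, hits = label_comment(comment, lexicon)
        counts.update(word for word, _ in hits)
    return counts


comments = [
    "This movie is really great! The acting was amazing.",
    "The plot was boring and the ending was terrible.",
    "I watched it yesterday.",
]
for comment in comments:
    label, hits = label_comment(comment)
    print(label, hits)
print(count_sentiment_words(comments).most_common(3))
```

Summing raw scores is the simplest possible rule: it ignores negation ("not great") and intensifiers ("really"), which a full analyzer such as VADER handles when given the whole sentence.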
