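I have a text file in which each line is one review, plus a stop-word file and a degree-adverb file. Both of those files have two columns: a word and its score. Please write Python machine learning code that computes a sentiment value for each review in the first file (Chinese sentiment analysis), without using the nltk library.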
Posted: 2024-03-13 20:47:10 | Views: 124
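Sure. One caveat first: the code below is a lexicon-based (dictionary) scorer rather than a trained model such as Naive Bayes. It does not use nltk; the only third-party dependency is jieba, for word segmentation.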
First, read the stop-word file and the degree-adverb file and store each one as a dictionary that maps a word to its score:
```python
def read_file(file_path):
    """Load a two-column "word score" file into a {word: score} dict."""
    table = {}
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:  # skip blank or malformed lines
                table[parts[0]] = float(parts[1])
    return table

stopwords = read_file('stopwords.txt')
degree_words = read_file('degree_words.txt')
```
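The loader assumes that each line holds a word and a numeric score separated by whitespace. The sketch below illustrates that format by parsing an in-memory sample instead of a real file. The sample entries and the name `parse_lexicon` are made up for illustration, and skipping non-numeric rows is an extra assumption about how a header line should be handled.

```python
import io

def parse_lexicon(lines):
    """Parse "word score" lines into a dict, skipping blank or malformed lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # blank line or a word without a score
        try:
            table[parts[0]] = float(parts[1])
        except ValueError:
            continue  # e.g. a header row whose second column is not a number
    return table

# Hypothetical sample content standing in for degree_words.txt
sample = io.StringIO("非常 1.5\n\n稍微 0.5\nword score\n")
lexicon = parse_lexicon(sample)
print(lexicon)
```

Any iterable of lines works here, so an open file object can be passed in place of the `StringIO` sample.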
Next, segment each review into words with jieba and drop the stop words:
```python
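import jieba

def tokenize(text):
    """Segment text with jieba and drop stop words and whitespace tokens."""
    # `stopwords` is the dict loaded by read_file; only its keys are used here
    return [w for w in jieba.cut(text) if w.strip() and w not in stopwords]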
```
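Then compute a score for each sentiment word: scale it by any preceding degree adverb, flip its sign after a negation word, and average over the sentiment words found. This step relies on a sentiment lexicon, `sentiment_dict`, loaded with `read_file` from `sentiment_dict.txt` (a third file that the question does not mention). Negation words and degree adverbs must not appear in the stop-word list, or `tokenize` will remove them before they can take effect:
```python
NEGATIONS = {'不', '没'}  # negation words flip the sign of the next sentiment word

def calculate_sentiment(words):
    """Average the lexicon scores of sentiment words, adjusted by preceding modifiers."""
    total = 0.0
    count = 0
    degree = 1.0  # product of degree adverbs seen since the last sentiment word
    sign = 1      # flipped by each negation word
    for word in words:
        if word in NEGATIONS:
            sign = -sign
        elif word in degree_words:
            degree *= degree_words[word]
        elif word in sentiment_dict:
            total += sign * degree * sentiment_dict[word]
            count += 1
            degree, sign = 1.0, 1  # modifiers apply only to the next sentiment word
    return total / count if count else 0.0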
```
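To make the arithmetic concrete, here is a small sketch that scores pre-tokenized reviews against tiny made-up lexicons. It follows one reasonable reading of the weighting step (a degree adverb scales, and a negation flips, the next sentiment word), which may differ from what a particular lexicon's authors intended. The name `score_tokens` and all lexicon values are assumptions chosen so the numbers are easy to check by hand.

```python
def score_tokens(tokens, sentiment_dict, degree_words, negations=frozenset({'不', '没'})):
    """Average the scores of sentiment words, each adjusted by the modifiers before it."""
    total, count = 0.0, 0
    degree, sign = 1.0, 1
    for tok in tokens:
        if tok in negations:
            sign = -sign
        elif tok in degree_words:
            degree *= degree_words[tok]
        elif tok in sentiment_dict:
            total += sign * degree * sentiment_dict[tok]
            count += 1
            degree, sign = 1.0, 1  # reset modifiers after each sentiment word
    return total / count if count else 0.0

# Hypothetical lexicon entries, not taken from any real dictionary
sentiment_dict = {'好吃': 0.5, '差': -0.5}
degree_words = {'很': 1.5, '有点': 0.5}

print(score_tokens(['很', '好吃'], sentiment_dict, degree_words))          # 1.5 * 0.5
print(score_tokens(['不', '好吃'], sentiment_dict, degree_words))          # -1 * 0.5
print(score_tokens(['很', '好吃', '服务', '有点', '差'], sentiment_dict, degree_words))  # (0.75 - 0.25) / 2
```

Words that appear in none of the lexicons, such as `服务` here, are ignored and do not reset the pending modifiers.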
Finally, combine these functions to score a single review:
```python
def predict_sentiment(text):
    """Tokenize one review and return its sentiment score."""
    words = tokenize(text)
    sentiment = calculate_sentiment(words)
    return sentiment
```
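The question asks for one score per line of a review file. A minimal sketch of that loop follows, assuming one review per line and skipping blank lines. The helper name `score_comments`, the file name `comments.txt`, and the `toy_scorer` stand-in are all hypothetical; the stand-in only exists so the sketch runs without jieba or the lexicon files.

```python
def score_comments(lines, scorer):
    """Return (comment, score) pairs for every non-blank line."""
    results = []
    for line in lines:
        comment = line.strip()
        if comment:
            results.append((comment, scorer(comment)))
    return results

def toy_scorer(text):
    # Stand-in for predict_sentiment, used only to keep this sketch self-contained
    if '好' in text:
        return 1.0
    if '差' in text:
        return -1.0
    return 0.0

sample_lines = ['味道很好\n', '\n', '服务太差\n', '一般般\n']
results = score_comments(sample_lines, toy_scorer)
for comment, score in results:
    print(f'{score:+.1f}\t{comment}')

# With real data (file name assumed), pass the file object and the real scorer:
# with open('comments.txt', encoding='utf-8') as f:
#     results = score_comments(f, predict_sentiment)
```

Passing the scorer as an argument keeps the file handling separate from the scoring rule, so either part can be swapped without touching the other.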
Complete code:
```python
import jieba

NEGATIONS = {'不', '没'}  # negation words flip the sign of the next sentiment word


def read_file(file_path):
    """Load a two-column "word score" file into a {word: score} dict."""
    table = {}
    with open(file_path, 'r', encoding='utf-8') as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:  # skip blank or malformed lines
                table[parts[0]] = float(parts[1])
    return table


def tokenize(text):
    """Segment text with jieba and drop stop words and whitespace tokens."""
    return [w for w in jieba.cut(text) if w.strip() and w not in stopwords]


def calculate_sentiment(words):
    """Average the lexicon scores of sentiment words, adjusted by preceding modifiers."""
    total = 0.0
    count = 0
    degree = 1.0  # product of degree adverbs seen since the last sentiment word
    sign = 1      # flipped by each negation word
    for word in words:
        if word in NEGATIONS:
            sign = -sign
        elif word in degree_words:
            degree *= degree_words[word]
        elif word in sentiment_dict:
            total += sign * degree * sentiment_dict[word]
            count += 1
            degree, sign = 1.0, 1  # modifiers apply only to the next sentiment word
    return total / count if count else 0.0


def predict_sentiment(text):
    """Tokenize one review and return its sentiment score."""
    return calculate_sentiment(tokenize(text))


stopwords = read_file('stopwords.txt')
degree_words = read_file('degree_words.txt')
sentiment_dict = read_file('sentiment_dict.txt')

text = '这家餐厅很好吃,服务也很好。'
print(predict_sentiment(text))  # the value depends on the contents of the lexicon files
```
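Note that the sign of the score carries the polarity: a negative value indicates negative sentiment, a positive value indicates positive sentiment, and 0 indicates neutral sentiment. The score stays within [-1, 1] only if the scores in the sentiment lexicon are in that range and the degree-adverb weights do not push it outside; clip or rescale the result if you need a bounded value.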
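If a bounded score or a discrete label is needed, a small post-processing step can be added. The sketch below is one possible way to do it; the helper names and the neutral band of ±0.1 are arbitrary assumptions, not values from the original answer.

```python
def clip_score(score, low=-1.0, high=1.0):
    """Clamp a raw score into [low, high]."""
    return max(low, min(high, score))

def to_label(score, threshold=0.1):
    """Map a score to a label; scores within +/- threshold count as neutral."""
    if score > threshold:
        return 'positive'
    if score < -threshold:
        return 'negative'
    return 'neutral'

for raw in (1.5, 0.05, -0.4):
    bounded = clip_score(raw)
    print(raw, bounded, to_label(bounded))
```

The threshold is worth tuning on a handful of hand-labeled reviews, since the right neutral band depends on the scale of the lexicon scores.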