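Word cloud analysis in Python with a stopword list and a synonym list
Posted: 2024-03-15 09:44:21
The following Python code builds a word cloud from a text file, using a stopword list and a synonym list: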
```python
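from collections import Counter

import jieba
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Read the text to analyze
with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Load the stopword list (one word per line, blank lines skipped)
stopwords = set()
with open('stopwords.txt', 'r', encoding='utf-8') as f:
    for line in f:
        word = line.strip()
        if word:
            stopwords.add(word)

# Load the synonym list: the first word on each line is the canonical form,
# and every other whitespace-separated word on that line is mapped to it
synonyms = {}
with open('synonyms.txt', 'r', encoding='utf-8') as f:
    for line in f:
        group = line.strip().split()
        for w in group[1:]:
            synonyms[w] = group[0]

# Segment the text, replace synonyms, and drop stopwords and whitespace tokens
words = []
for w in jieba.cut(text):
    w = w.strip()
    w = synonyms.get(w, w)
    if w and w not in stopwords:
        words.append(w)

# Count word frequencies
counter = Counter(words)

# Build the word cloud. font_path must point to a font with Chinese glyphs;
# STHeiti is a macOS font, so substitute a locally available one elsewhere.
wc = WordCloud(font_path='STHeiti Medium.ttc', background_color='white')
wc.generate_from_frequencies(counter)

# Display the word cloud
plt.imshow(wc)
plt.axis('off')
plt.show()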
```
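Here, `text.txt` is the text file to analyze, `stopwords.txt` is the stopword list (one word per line), and `synonyms.txt` is the synonym list (one group per line, separated by whitespace, with the canonical word first). The code first loads the stopword list and the synonym list, then segments the text with `jieba`, replaces each synonym with its canonical word, and removes stopwords. Finally it counts word frequencies, then generates and displays the word cloud. Note that the `jieba` and `wordcloud` libraries must be installed, along with `matplotlib` for the display step.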
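To make the file formats and the order of the processing steps concrete, here is a minimal sketch that runs the same synonym replacement, stopword removal, and counting on in-memory data. The file contents, the token list, and the helper names (`load_stopwords`, `load_synonyms`, `count_tokens`) are all assumptions made up for illustration; the token list stands in for the output of `jieba.cut`, so the sketch runs without `jieba` or `wordcloud` installed.

```python
from collections import Counter
import io

# Hypothetical file contents, used in place of stopwords.txt and synonyms.txt
STOPWORDS_TXT = "的\n了\n是\n"
SYNONYMS_TXT = "电脑 计算机 微机\n手机 移动电话\n"


def load_stopwords(f):
    """Return the set of non-empty lines in a stopword file."""
    return {line.strip() for line in f if line.strip()}


def load_synonyms(f):
    """Map every word after the first on a line to that line's first word."""
    mapping = {}
    for line in f:
        group = line.split()
        for variant in group[1:]:
            mapping[variant] = group[0]
    return mapping


def count_tokens(tokens, stopwords, synonyms):
    """Normalize synonyms, drop stopwords and blank tokens, count the rest."""
    counts = Counter()
    for token in tokens:
        token = token.strip()
        token = synonyms.get(token, token)
        if token and token not in stopwords:
            counts[token] += 1
    return counts


stopwords = load_stopwords(io.StringIO(STOPWORDS_TXT))
synonyms = load_synonyms(io.StringIO(SYNONYMS_TXT))

# A hand-written token list standing in for the output of jieba.cut(text)
tokens = ["计算机", "是", "电脑", "的", "微机", "移动电话", " ", "手机"]
freq = count_tokens(tokens, stopwords, synonyms)
print(freq.most_common())  # [('电脑', 3), ('手机', 2)]
```

Because synonyms are replaced before the stopword check, a stopword entry only needs to list the canonical form of a word. A frequency mapping like `freq` is what `WordCloud.generate_from_frequencies` takes as input.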