Python code for data preprocessing, jieba word segmentation, stopword removal, word frequency analysis, and word cloud generation
Time: 2023-04-02 16:03:55
The following Python code reads a text file, segments it with jieba, removes stopwords, counts word frequencies, and generates a word cloud:
```python
import jieba
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Read the text file
with open('text.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment the text into words (jieba.cut returns a generator)
words = jieba.cut(text)

# Remove stopwords, and skip whitespace-only tokens such as spaces and newlines
stopwords = ['的', '了', '是', '我', '你', '他', '她', '它', '我们', '你们', '他们', '她们', '它们']
filtered_words = []
for word in words:
    if word.strip() and word not in stopwords:
        filtered_words.append(word)

# Count word frequencies
word_freq = {}
for word in filtered_words:
    if word in word_freq:
        word_freq[word] += 1
    else:
        word_freq[word] = 1

# Generate the word cloud. font_path must point to a font with Chinese glyphs;
# msyh.ttc is Microsoft YaHei, which ships with Windows.
wordcloud = WordCloud(font_path='msyh.ttc', background_color='white', width=800, height=600).generate_from_frequencies(word_freq)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
```
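The stopword-removal and frequency-counting steps can also be written more compactly with `collections.Counter`. The snippet below is only a sketch using the standard library: `load_stopwords`, `count_words`, and the file name `stopwords.txt` are hypothetical names introduced for illustration, and the hard-coded `tokens` list stands in for the output of `jieba.cut(text)` so the snippet runs without jieba installed.

```python
from collections import Counter


def load_stopwords(path):
    """Read one stopword per line from a UTF-8 file, skipping blank lines.

    Hypothetical helper; assumes the file holds one stopword per line.
    """
    with open(path, encoding='utf-8') as f:
        return {line.strip() for line in f if line.strip()}


def count_words(tokens, stopwords):
    """Count tokens that are neither whitespace-only nor stopwords."""
    return Counter(t for t in tokens if t.strip() and t not in stopwords)


# Hypothetical tokens standing in for the output of jieba.cut(text)
tokens = ['我', '喜欢', '词云', '，', '词云', '很', '有趣', '\n']

# A small inline set; in practice this could be load_stopwords('stopwords.txt')
stopwords = {'我', '很', '，'}

word_freq = count_words(tokens, stopwords)
print(word_freq.most_common(1))  # [('词云', 2)]
```

Since `Counter` is a `dict` subclass, `word_freq` should be usable wherever the plain dictionary was used above, including as the argument to `generate_from_frequencies`. Storing the stopwords in a set rather than a list also keeps each membership check fast when the stopword list grows large.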
Hope this helps!