Open the 22012515 folder on the desktop, read the documents inside, and visualize the high-frequency words as a word cloud with wordcloud, using Python
Sure. First, install the required libraries: `jieba` for Chinese word segmentation and `wordcloud` for generating the word cloud (plus `matplotlib` for displaying it).
```python
!pip install jieba
!pip install wordcloud
!pip install matplotlib
```
Next, read the files, segment the text, and count the high-frequency words.
```python
import os
import jieba
from collections import Counter
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Read every document in the folder; the path below assumes a
# standard ~/Desktop layout and may need adjusting on your system
folder_path = os.path.join(os.path.expanduser('~'), 'Desktop', '22012515')
file_list = os.listdir(folder_path)
doc_list = []
for file_name in file_list:
    file_path = os.path.join(folder_path, file_name)
    with open(file_path, 'r', encoding='utf-8') as f:
        doc_list.append(f.read())

# Segment every document and count word frequencies
word_count = Counter()
for doc in doc_list:
    words = jieba.cut(doc)
    for word in words:
        if len(word) > 1:  # keep only words longer than one character
            word_count[word] += 1

# Take the 50 most frequent words
top50_words = word_count.most_common(50)
```
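Raw segmentation output still contains function words ("我们", "可以", and so on) that tend to dominate the counts. Below is a minimal sketch of stopword filtering; the inline stopword set is a tiny illustrative sample, and in practice you would load a full Chinese stopword list from a file.

```python
import jieba
from collections import Counter

# Hypothetical tiny stopword set for illustration only; real
# projects usually load a complete stopword list from disk
stopwords = {'我们', '可以', '这样', '一个', '没有'}

# Recount, skipping single characters and stopwords
filtered_count = Counter()
for doc in doc_list:
    for word in jieba.cut(doc):
        if len(word) > 1 and word not in stopwords:
            filtered_count[word] += 1

top50_words = filtered_count.most_common(50)
```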
Now we can visualize the high-frequency words as a word cloud.
```python
# Build the word cloud; font_path must point to a font that
# supports Chinese (SimHei here), or the words render as boxes
wordcloud = WordCloud(font_path='simhei.ttf', background_color='white', width=800, height=600)
wordcloud.generate_from_frequencies(dict(top50_words))

# Display the word cloud
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
```
That produces a word cloud of the top high-frequency words. You can adjust the font, background color, image size, and other parameters as needed.
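As an example of such adjustments, here is a minimal sketch of a more customized cloud. `max_words`, `colormap`, and `to_file` are standard `wordcloud` parameters/methods; the output filename `cloud.png` is just an illustrative choice.

```python
# A more customized cloud: cap the vocabulary, use a matplotlib
# colormap, and save the image to disk instead of only showing it
wc = WordCloud(
    font_path='simhei.ttf',      # any Chinese-capable font works
    background_color='black',
    width=1200, height=800,
    max_words=100,               # upper bound on rendered words
    colormap='viridis',          # matplotlib colormap name
)
wc.generate_from_frequencies(dict(word_count.most_common(100)))
wc.to_file('cloud.png')          # write a PNG next to the script
```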
The complete code is as follows:
```python
import os
import jieba
from collections import Counter
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Read every document in the folder (path assumes ~/Desktop/22012515)
folder_path = os.path.join(os.path.expanduser('~'), 'Desktop', '22012515')
file_list = os.listdir(folder_path)
doc_list = []
for file_name in file_list:
    file_path = os.path.join(folder_path, file_name)
    with open(file_path, 'r', encoding='utf-8') as f:
        doc_list.append(f.read())

# Segment every document and count word frequencies
word_count = Counter()
for doc in doc_list:
    words = jieba.cut(doc)
    for word in words:
        if len(word) > 1:  # keep only words longer than one character
            word_count[word] += 1

# Take the 50 most frequent words
top50_words = word_count.most_common(50)

# Build the word cloud (the font must support Chinese)
wordcloud = WordCloud(font_path='simhei.ttf', background_color='white', width=800, height=600)
wordcloud.generate_from_frequencies(dict(top50_words))

# Display the word cloud
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
```
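One caveat: if the folder also contains subdirectories or files that are not UTF-8 text, `open(..., encoding='utf-8')` will raise an error. A defensive variant of the reading loop, assuming the documents are plain `.txt` files, might look like this:

```python
import glob
import os

folder_path = os.path.join(os.path.expanduser('~'), 'Desktop', '22012515')

# Read only .txt files and ignore undecodable bytes, so a stray
# binary file or odd encoding does not abort the whole run
doc_list = []
for file_path in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
        doc_list.append(f.read())
```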