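Upgraded version: count how often each word occurs in a given file and display the result as a word cloud. You may use third-party packages or design your own display; show at least the 10 most frequent words.
Time: 2024-03-26 18:38:58
Runtime environment: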
- Python 3.6+
- jieba 0.42.1
- wordcloud 1.8.1
- matplotlib 3.2.2
The implementation steps are as follows:
1. Install the required third-party packages
```
pip install jieba
pip install wordcloud
pip install matplotlib
```
2. Read the file and segment the text into words
```python
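import jieba

def read_file(file_path):
    # Read the whole file as UTF-8 text
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    return content

def segment_words(content):
    # Segment the text with jieba and keep only tokens longer than one
    # character, which drops punctuation and single-character words
    words = jieba.cut(content)
    return [word for word in words if len(word) > 1]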
```
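To make the effect of the `len(word) > 1` filter concrete, here is a minimal sketch that applies the same condition to a hand-written token list. The token list and the helper name `keep_token` are assumptions for illustration; the tokens are not actual jieba output.

```python
def keep_token(token):
    """Same condition as in segment_words: keep tokens longer than one character."""
    return len(token) > 1

# Hypothetical tokens, written by hand to stand in for a segmenter's output
tokens = ['我', '喜欢', ',', '词云', '的', 'Python']
filtered = [t for t in tokens if keep_token(t)]
print(filtered)  # ['喜欢', '词云', 'Python']
```

The filter removes punctuation cheaply, but it also discards single-character words that may carry meaning, so it is a trade-off rather than a full stop-word list.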
3. Count word frequencies and take the 10 most frequent words
```python
def count_words(words):
    # Build a dict mapping each word to its number of occurrences
    word_count = {}
    for word in words:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    return word_count

def get_top10_words(word_count):
    # Sort by count in descending order and keep the 10 most frequent words
    top_words = sorted(word_count.items(), key=lambda x: x[1], reverse=True)[:10]
    return dict(top_words)
```
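As an alternative to the manual counting loop, the standard library's `collections.Counter` can do the counting and the top-N selection in one step. This is a sketch, not the approach used in the answer; the function name `top_n_words` and the sample list are assumptions for illustration.

```python
from collections import Counter

def top_n_words(words, n=10):
    """Return the n most frequent words as a dict of word -> count."""
    return dict(Counter(words).most_common(n))

# Hypothetical sample input standing in for a segmented word list
sample = ['词云', '频率', '词云', '单词', '频率', '词云']
print(top_n_words(sample, n=2))  # {'词云': 3, '频率': 2}
```

The returned dict has the same shape as the result of `get_top10_words`, so it could be passed to `generate_from_frequencies` in the same way.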
4. Generate and display the word cloud
```python
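import wordcloud
import matplotlib.pyplot as plt

def generate_wordcloud(top_words):
    # Build the word cloud from a dict of word -> frequency.
    # Note: for Chinese text, also pass font_path pointing to a font with
    # CJK glyphs; the default font renders such words as empty boxes.
    wc = wordcloud.WordCloud(width=400, height=300, background_color='white')
    wc.generate_from_frequencies(top_words)
    return wc

def show_wordcloud(wc):
    # Render the word cloud image with matplotlib, without axes
    plt.imshow(wc)
    plt.axis('off')
    plt.show()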
```
The complete code is as follows:
```python
import jieba
import wordcloud
import matplotlib.pyplot as plt

def read_file(file_path):
    # Read the whole file as UTF-8 text
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    return content

def segment_words(content):
    # Segment with jieba; keep only tokens longer than one character
    words = jieba.cut(content)
    return [word for word in words if len(word) > 1]

def count_words(words):
    # Build a dict mapping each word to its number of occurrences
    word_count = {}
    for word in words:
        if word in word_count:
            word_count[word] += 1
        else:
            word_count[word] = 1
    return word_count

def get_top10_words(word_count):
    # Sort by count in descending order and keep the 10 most frequent words
    top_words = sorted(word_count.items(), key=lambda x: x[1], reverse=True)[:10]
    return dict(top_words)

def generate_wordcloud(top_words):
    # For Chinese text, also pass font_path pointing to a font with CJK glyphs
    wc = wordcloud.WordCloud(width=400, height=300, background_color='white')
    wc.generate_from_frequencies(top_words)
    return wc

def show_wordcloud(wc):
    # Render the word cloud image with matplotlib, without axes
    plt.imshow(wc)
    plt.axis('off')
    plt.show()

if __name__ == '__main__':
    file_path = 'test.txt'  # replace with the path to your own file
    content = read_file(file_path)
    words = segment_words(content)
    word_count = count_words(words)
    top_words = get_top10_words(word_count)
    wc = generate_wordcloud(top_words)
    show_wordcloud(wc)
```
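Before running the code above, replace `file_path` with the path to your actual file; running the script then generates and displays the word cloud.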
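Instead of editing the path in the source each time, the file path could be taken from the command line. The following is a minimal sketch using the standard library's `argparse`; the function name `parse_args` and the file name `article.txt` are assumptions for illustration, and the default falls back to `test.txt` as in the script.

```python
import argparse

def parse_args(argv=None):
    """Parse the input file path; defaults to 'test.txt' when none is given."""
    parser = argparse.ArgumentParser(description='Word frequency word cloud')
    parser.add_argument('file_path', nargs='?', default='test.txt',
                        help='text file to analyse')
    return parser.parse_args(argv)

# Passing an explicit list here simulates running the script with one argument
args = parse_args(['article.txt'])
print(args.file_path)  # article.txt
```

In the script, calling `parse_args()` with no argument reads from `sys.argv`, and `args.file_path` would take the place of the hard-coded `file_path`.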