Use jieba to segment 三国演义.txt, remove single-character tokens, count the frequency of each word, and print the 20 most frequent words.
To segment the file 三国演义.txt with the jieba library, first install jieba, then follow the steps below (a complete end-to-end sketch is given after the steps):
1. **Install the package**:
```bash
pip install jieba
```
2. **Read the text**:
Use Python's built-in `open()` function to open the txt file and read its contents (the read call appears at the start of the next step's code).
3. **Segment the text**:
```python
import jieba
from collections import Counter
# Read the whole novel as a single string
with open('三国演义.txt', 'r', encoding='utf-8') as file:
    text = file.read()

# lcut returns the segmentation result as a list of tokens
words = jieba.lcut(text)
```
4. **Remove non-word elements (single characters)**:
```python
filtered_words = [word for word in words if len(word) > 1]
```
5. **Count word frequencies**:
```python
word_counts = Counter(filtered_words)
```
6. **Get the most frequent words**:
```python
top_20_words = word_counts.most_common(20)
```
7. **Print the results**:
```python
for word, freq in top_20_words:
    print(f'{word}: {freq}')
```
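Putting the steps together, a minimal end-to-end sketch might look like this (the filename 三国演义.txt and UTF-8 encoding are assumptions; adjust them to your local copy):

```python
import jieba
from collections import Counter

# Read the novel; filename and encoding are assumptions
with open('三国演义.txt', 'r', encoding='utf-8') as file:
    text = file.read()

# Segment the text and drop single-character tokens
words = [word for word in jieba.lcut(text) if len(word) > 1]

# Count frequencies and print the 20 most common words
for word, freq in Counter(words).most_common(20):
    print(f'{word}: {freq}')
```

Note that without a stopword list the top entries usually mix character names with narrative connectives such as 却说; whether to filter those out depends on what you want to measure.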