Segmenting Journey to the West (《西游记》) with Python: Tokenization, Stopword Filtering, and Word-Frequency Counting
Posted: 2024-03-16 16:44:37
Sure, here is the complete code:
```python
import jieba
from collections import Counter

# Read the full text of Journey to the West
with open('journey_to_the_west.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment the text into a list of words
words = jieba.lcut(text)

# Load the stopword list (one word per line); a set gives O(1) lookups
with open('stopwords.txt', 'r', encoding='utf-8') as f:
    stopwords = {line.strip() for line in f}

# Drop stopwords and whitespace-only tokens (newlines, spaces)
filtered_words = [w for w in words if w not in stopwords and w.strip()]

# Count word frequencies
word_count = Counter(filtered_words)

# Print the 20 most common words with their counts
print(word_count.most_common(20))
```
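The filtering and counting steps can be illustrated on a toy word list, without needing jieba or the novel's text (the words and the tiny stopword set below are made-up example data):

```python
from collections import Counter

# Hypothetical segmented output standing in for jieba.lcut(text)
words = ["行者", "道", "了", "行者", "八戒", "道", "的", "行者"]
stopwords = {"了", "的"}  # a tiny stand-in stopword set

# Same filtering logic as the main script: drop stopwords and blank tokens
filtered = [w for w in words if w not in stopwords and w.strip()]

# Count and rank the remaining words
counts = Counter(filtered)
print(counts.most_common(2))  # → [('行者', 3), ('道', 2)]
```

`Counter.most_common(n)` returns `(word, count)` pairs sorted by count in descending order, which is exactly the shape the main script prints.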
This code prints the 20 most common words and their counts. Note that you must first download a stopword list and save it as `stopwords.txt` (one word per line), and place the novel's text in `journey_to_the_west.txt`, both in the working directory.
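If you want to keep the results for later analysis, the top words can also be written to a CSV file using only the standard library. This is a sketch: the output filename `word_freq.csv` and the sample counts below are assumptions, standing in for the real `word_count` produced by the script above.

```python
import csv
from collections import Counter

# Hypothetical counts standing in for the real word_count from the script
word_count = Counter({"行者": 4000, "八戒": 1800, "师父": 1500})

# Write the most common words to a CSV file (filename is an assumption)
with open("word_freq.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["word", "count"])
    writer.writerows(word_count.most_common(20))
```

The resulting file can then be opened in a spreadsheet or loaded back with `csv.reader` for plotting.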