Requirement: using only the jieba library, implement Chinese word-frequency counting with file I/O. Perform word-frequency statistics on the Chinese text in test.txt, find the 10 most frequent words whose length is at least 3 characters, and write the words and their counts to result.txt, sorted by frequency in descending order. The code should be simple and easy to understand.
Posted: 2024-10-10 08:08:59 · Views: 48
You can use Python's jieba library for this task. Here is a simple example:
```python
import jieba
from collections import Counter

# Tokenize the text and keep only words of length >= 3
def tokenize_and_filter(text):
    words = [word for word in jieba.cut(text) if len(word) >= 3]
    return words

# Count word frequencies in the given file
def word_frequency(file_path):
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()
    words = tokenize_and_filter(text)
    counter = Counter(words)
    # Get the 10 most frequent words; most_common already
    # sorts by descending frequency
    top_10_words = counter.most_common(10)
    return top_10_words

# Write the words and their counts to result.txt
def write_to_file(top_10_words, output_path):
    with open(output_path, 'w', encoding='utf-8') as f:
        for word, freq in top_10_words:
            f.write(f'{word}: {freq}\n')

# Example input and output paths
input_file = 'test.txt'
output_file = 'result.txt'
top_10 = word_frequency(input_file)
write_to_file(top_10, output_file)
```
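For reference, the descending sort comes from `Counter.most_common(n)`, which the standard library guarantees returns (word, count) pairs ordered from most to least common. A minimal sketch with made-up sample words:

```python
from collections import Counter

# Hypothetical token list standing in for jieba's output on test.txt
words = ['自然语言', '自然语言', '自然语言', '机器学习', '机器学习', '深度学习']

# most_common(2) returns the two highest-frequency pairs,
# already sorted by descending count
top = Counter(words).most_common(2)
print(top)  # [('自然语言', 3), ('机器学习', 2)]
```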