Python code for Chinese word segmentation and word frequency counting with the jieba library

Time: 2024-02-06 21:01:59

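The following Python code uses the jieba library to segment Chinese text into words and count word frequencies:

```python
import collections

import jieba

# Read the input file
with open('example.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Segment the text into words (jieba.cut returns a generator)
words = jieba.cut(text)

# Count how many times each word occurs
word_counts = collections.Counter(words)

# Print the 10 most frequent words and their counts
for word, count in word_counts.most_common(10):
    print(word, count)
```

In this code, `jieba.cut` first splits the text into words, and `collections.Counter` then counts how many times each word appears. Finally, the `most_common` method returns the 10 most frequent words together with their counts, which the loop prints. Note that jieba must be installed before running the code, for example with:

```shell
pip install jieba
```

Also replace `example.txt` in the code with the path to your actual text file.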
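The script above counts every token that `jieba.cut` yields, and jieba also emits punctuation marks, spaces and newlines as tokens, so on real text the top-10 list is often dominated by `，`, `。` and single-character function words such as `的`. A common refinement is to filter tokens before counting. The sketch below shows one way to do that, assuming you want to drop whitespace, punctuation, stopwords and words shorter than two characters. `count_words`, `STOPWORDS`, `PUNCTUATION` and `min_len` are hypothetical names introduced here, and the five-word stopword set is only a placeholder for a real stopword list.

```python
import collections
import string

# Placeholder stopword set (hypothetical); real projects usually load a larger list from a file.
STOPWORDS = {"的", "了", "和", "是", "在"}

# ASCII punctuation plus common full-width Chinese punctuation.
PUNCTUATION = set(string.punctuation) | set("，。！？；：、“”‘’（）《》【】…")


def count_words(tokens, stopwords=STOPWORDS, min_len=2):
    """Count tokens, skipping whitespace, punctuation, stopwords and short tokens."""
    counts = collections.Counter()
    for token in tokens:
        token = token.strip()  # whitespace-only tokens such as "\n" become ""
        if not token or token in stopwords or len(token) < min_len:
            continue
        if all(ch in PUNCTUATION for ch in token):  # e.g. "……"
            continue
        counts[token] += 1
    return counts


if __name__ == "__main__":
    # Pre-segmented tokens shaped like jieba.cut output, so this demo runs without jieba.
    sample = ["我", "爱", "北京", "天安门", "。", "\n", "北京", "的", "天安门", "，", "北京"]
    for word, count in count_words(sample).most_common(10):
        print(word, count)
```

With the sample tokens this prints `北京 3` followed by `天安门 2`. In the original script you would pass the jieba output directly, e.g. `word_counts = count_words(jieba.cut(text))`, and keep the same `most_common(10)` loop; passing `min_len=1` keeps single-character words if they matter for your text.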