
Python code for counting word frequencies in a text file

Date: 2023-11-19 18:56:39

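The following Python code counts word frequencies in a text file:

```python
import re


# Takes a file name and the number of most frequent words to display
def word_count(file_name, num):
    # Open the file (assumed to be UTF-8 encoded)
    with open(file_name, 'r', encoding='utf-8') as f:
        # Read the whole file
        content = f.read()
    # Lowercase the text and extract words, dropping punctuation
    words = re.findall(r'\b[a-z]+\b', content.lower())
    # Count how many times each word appears
    word_dict = {}
    for word in words:
        if word not in word_dict:
            word_dict[word] = 1
        else:
            word_dict[word] += 1
    # Sort by count in descending order
    sorted_word_dict = sorted(word_dict.items(), key=lambda x: x[1], reverse=True)
    # Print the top num words; slicing avoids an IndexError
    # when the file has fewer than num distinct words
    for word, count in sorted_word_dict[:num]:
        print(word, count)


# Test call
word_count('test.txt', 10)
```

Explanation of the code:

1. `re.findall(r'\b[a-z]+\b', content.lower())` uses a regular expression to match the words in the file content: `\b` marks a word boundary and `[a-z]+` matches one or more lowercase letters. Because the text is lowercased first and the character class is `[a-z]`, only runs of ASCII letters count as words; punctuation and digits are skipped, and text in other scripts, such as Chinese, is not counted.
2. `word_dict` is a dictionary that stores how many times each word appears: a word not yet in the dictionary is added with a count of 1; otherwise its count is incremented by one.
3. `sorted_word_dict` sorts the dictionary's items by value: `key=lambda x: x[1]` sorts by the count, and `reverse=True` puts the items in descending order.
4. `for word, count in sorted_word_dict[:num]` prints only the first `num` words and their counts. Using a slice instead of `range(num)` means the loop simply prints fewer lines when the file contains fewer than `num` distinct words, instead of raising an `IndexError`.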
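The same counting can be written more compactly with the standard library's `collections.Counter`, whose `most_common(n)` method returns the `n` most frequent items sorted by count, highest first. The sketch below is an alternative to the loop above, not the original author's code. The helper names `top_words` and `word_count_file`, the UTF-8 encoding, and the choice to return a list of `(word, count)` pairs instead of printing are all assumptions, made so the result can be reused or tested without a file.

```python
import re
from collections import Counter


def top_words(text, num):
    """Return the num most frequent words in text as (word, count) pairs.

    Hypothetical helper: uses the same regex as word_count(), but works on a
    string and returns the result instead of printing it.
    """
    # Lowercase first, then extract runs of ASCII letters a-z
    words = re.findall(r'\b[a-z]+\b', text.lower())
    # Counter tallies each word; most_common() sorts by count, highest first
    # (words with equal counts keep the order in which they first appeared)
    return Counter(words).most_common(num)


def word_count_file(file_name, num):
    # Hypothetical file wrapper; assumes the file is UTF-8 encoded
    with open(file_name, 'r', encoding='utf-8') as f:
        return top_words(f.read(), num)


if __name__ == '__main__':
    sample = "The cat saw the dog. The dog ran!"
    for word, count in top_words(sample, 3):
        print(word, count)
```

One consequence of the `\b[a-z]+\b` pattern, shared by both versions: an apostrophe is not a word character, so a contraction such as `don't` is counted as two separate words, `don` and `t`. If that matters for your text, the regular expression needs to be adjusted.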
