
Python code for counting word frequency in text

Time: 2023-10-14 21:25:31 Views: 101

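The following is a Python code example that counts word frequency in a piece of text:

```python
import re  # regular expression module


# Count how many times each word occurs in the text
def word_count(text):
    # Replace punctuation (anything that is neither a word character
    # nor whitespace) with a space; digits and underscores are kept
    text = re.sub(r'[^\w\s]', ' ', text)
    # Convert uppercase letters to lowercase
    text = text.lower()
    # Split on any run of whitespace; split() with no argument also
    # handles newlines and never produces empty strings
    words = text.split()
    # Dictionary that stores the number of occurrences of each word
    counts = {}
    # Walk the word list and count each word
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
    # Return the dictionary of word counts
    return counts


# Test the function
text = 'The quick brown fox jumps over the lazy dog. The dog slept over the verandah.'
print(word_count(text))
```

The output is:

```
{'the': 4, 'quick': 1, 'brown': 1, 'fox': 1, 'jumps': 1, 'over': 2, 'lazy': 1, 'dog': 2, 'slept': 1, 'verandah': 1}
```

In the dictionary, each key is a word and each value is the number of times that word occurs. Splitting with `text.split()` rather than `text.split(' ')` matters here: the sentence ends with a period that becomes a trailing space, and splitting on a single space would add an empty string `''` to the result.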
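The same counting can also be sketched with `collections.Counter` from the standard library, which additionally makes it easy to list the most frequent words. This is only a minimal sketch, not part of the original answer: the function name `top_words` and its parameter `n` are hypothetical, and it assumes that a "word" is any run of `\w` characters.

```python
import re
from collections import Counter


# Hypothetical helper (name and parameter chosen for illustration only)
def top_words(text, n=3):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase the text, then extract every run of word characters;
    # punctuation and whitespace are skipped, so no empty strings appear
    words = re.findall(r'\w+', text.lower())
    # Counter tallies the words; most_common(n) returns the n highest counts
    return Counter(words).most_common(n)


text = 'The quick brown fox jumps over the lazy dog. The dog slept over the verandah.'
print(top_words(text, 3))
```

For the sample sentence, "the" comes first with a count of 4, followed by "over" and "dog" with 2 each. `Counter` is a `dict` subclass, so `Counter(words)` can be used anywhere a plain word-count dictionary is expected.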
