
Counting English word frequencies without downloading NLTK, and printing the ten most frequent words

Posted: 2024-11-27 18:23:31

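If you would rather not download NLTK, you can do simple English word-frequency counting with Python's built-in `collections.Counter` class together with the `re` module. Here is an example:

```python
import re
from collections import Counter


def count_words(text):
    # Lowercase the text, then extract word tokens with a regular expression;
    # punctuation and whitespace are skipped because \w does not match them
    words = re.findall(r'\b\w+\b', text.lower())

    # Count how many times each word occurs
    word_counts = Counter(words)

    # Take the ten most frequent words as (word, count) pairs
    top_10_words = word_counts.most_common(10)
    return top_10_words


# Sample text
text = "This is a sample sentence for word frequency counting. It will demonstrate how to do it without nltk."

# Count and print the results
top_10_words = count_words(text)
for word, freq in top_10_words:
    print(f"{word}: {freq}")
```

In this example, the text is first converted to lowercase and a regular expression extracts the words from it; `Counter` then counts how many times each word occurs. Finally, `most_common(10)` returns the ten most frequent words together with their counts, or all of them if the text contains fewer than ten distinct words. Note that `\w` also matches digits and underscores, so tokens such as `2024` are counted as words, and a contraction such as `don't` is split into `don` and `t`.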
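If digits should be ignored and contractions kept whole, the `\b\w+\b` pattern above can be swapped for a stricter one. The following is only a minimal sketch under stated assumptions: the pattern `WORD_PATTERN`, the helper name `top_words`, and the sample sentence are hypothetical and not part of the original answer, and the pattern handles only ASCII letters and the straight apostrophe `'`.

```python
import re
from collections import Counter

# Assumed pattern (not from the original answer): a run of ASCII letters,
# optionally followed by apostrophe parts such as "'t" or "'s".
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)*")


def top_words(text, n=10):
    """Return up to n (word, count) pairs, most frequent first."""
    words = WORD_PATTERN.findall(text.lower())
    return Counter(words).most_common(n)


# Hypothetical sample text containing contractions and a number
sample = "Don't stop. Don't panic in 2024; the cat's hat is the cat's."
for word, freq in top_words(sample, 3):
    print(f"{word}: {freq}")
```

With this pattern, `don't` and `cat's` are each counted as a single word and `2024` is dropped. Whether that is the right behavior depends on the text being analyzed; for text with typographic apostrophes or non-English letters the pattern would need to be extended.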
