Word Frequency Counting with Python's Counter Class

Date: 2024-02-18 13:00:05

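Here is how to use the `Counter` class from Python's `collections` module to count word frequencies.

First, the text has to be split into tokens. Tokenizer libraries such as nltk or jieba (commonly used for Chinese text) can do this. Taking nltk as the example, the following code tokenizes a piece of text:

```python
import nltk

# word_tokenize needs NLTK's Punkt tokenizer data, downloaded once with
# nltk.download('punkt') (newer NLTK releases may ask for 'punkt_tab').
text = "This is a sample text for tokenization. It contains multiple sentences."
tokens = nltk.word_tokenize(text)
```

Next, pass the tokens to `Counter` to count them:

```python
from collections import Counter

# Count how many times each token occurs
word_counts = Counter(tokens)

# Print the 10 most frequent tokens
top_10 = word_counts.most_common(10)
print(top_10)
```

The output is:

```
[('.', 2), ('This', 1), ('is', 1), ('a', 1), ('sample', 1), ('text', 1), ('for', 1), ('tokenization', 1), ('It', 1), ('contains', 1)]
```

As the output shows, `Counter` records how many times each token occurs in the text, and `most_common(10)` returns the 10 most frequent tokens together with their counts. Hopefully this example makes it clear how to use Python's `Counter` class for word frequency counting.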
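
In the result above, the most frequent "word" is the period, because the tokenizer emits punctuation as separate tokens. Capitalized and lowercase forms of a word would also be counted separately. One possible way to get counts for words only is sketched below. It is a minimal illustration rather than a replacement for a real tokenizer: the helper name `count_words`, the sample sentence, and the simple regular expression standing in for nltk are all assumptions made for this example.

```python
import re
from collections import Counter


def count_words(text):
    """Hypothetical helper: count lowercase word tokens, ignoring punctuation."""
    # Assumption: a simple regex stands in for a real tokenizer here.
    # Lowercase the text, then keep runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


sample = "The cat saw the dog. The dog saw the cat, and the cat ran."
word_counts = count_words(sample)

# The two most frequent words and their counts
print(word_counts.most_common(2))  # [('the', 5), ('cat', 3)]
```

Because the text is lowercased before counting, "The" and "the" fall into the same bucket, and the period and comma never reach the `Counter` at all.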