
Counting high-frequency words in Python

Time: 2023-06-06 16:06:27 Views: 142

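Sure. For this problem, I can offer the following ways to count high-frequency words:

1. Use `Counter` from Python's built-in `collections` module to count how often each word appears in the text, then list the words from most to least frequent. Note that `text.split()` only splits on whitespace, so case and trailing punctuation are kept (`text.` and `The` are counted as they appear). For example:

```python
from collections import Counter

text = 'These are some words in a piece of text. The frequency of each word is counted and ranked.'
# Split on whitespace and count each token
word_freq = Counter(text.split())
# most_common() returns (word, count) pairs sorted by count, highest first
for word, freq in word_freq.most_common():
    print(word, freq)
```

Output:

```
of 2
These 1
are 1
some 1
words 1
in 1
a 1
piece 1
text. 1
The 1
frequency 1
each 1
word 1
is 1
counted 1
and 1
ranked. 1
```

2. Use the NLTK (Natural Language Toolkit) library for text processing and counting. Here the text is lowercased, tokenized, and filtered against the English stopword list. Punctuation tokens such as `.` are not stopwords, so they are still counted. For example:

```python
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Requires the tokenizer and stopword data, e.g. nltk.download('punkt') and
# nltk.download('stopwords'); newer NLTK releases may ask for 'punkt_tab' instead.
text = 'These are some words in a piece of text. The frequency of each word is counted and ranked.'
tokens = word_tokenize(text.lower())
# Build the stopword set once instead of reloading the list for every token
stop_words = set(stopwords.words('english'))
filtered_tokens = [token for token in tokens if token not in stop_words]
word_freq = nltk.FreqDist(filtered_tokens)
for word, freq in word_freq.most_common():
    print(word, freq)
```

Output:

```
. 2
words 1
piece 1
text 1
frequency 1
word 1
counted 1
ranked 1
```

3. Use a third-party library such as jieba to segment and count Chinese text. For example:

```python
import jieba
from collections import Counter

text = '这是一段中文文本，我们需要统计其中高频词汇。'
# jieba.cut() yields the segmented words one by one
word_freq = Counter(jieba.cut(text))
for word, freq in word_freq.most_common():
    print(word, freq)
```

Output (every token appears once, so they are listed in order of first appearance; the exact segmentation may vary with the jieba version and dictionary):

```
这是 1
一段 1
中文 1
文本 1
， 1
我们 1
需要 1
统计 1
其中 1
高频 1
词汇 1
。 1
```

I hope these methods are helpful. If you have any other questions, feel free to ask.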
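Since splitting on whitespace alone treats tokens like `text.` and `The` as distinct words, it can help to normalize the text before counting. The following is a minimal sketch using only the standard library, assuming that "word" means a run of letters, digits, or underscores; the helper name `top_words` and its parameter `n` are hypothetical and chosen just for illustration.

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent words, ignoring case and punctuation.

    Hypothetical helper for illustration; not part of any library.
    """
    # \w+ matches runs of letters, digits and underscores, so trailing
    # punctuation such as the period in "text." is left out of the token.
    words = re.findall(r"\w+", text.lower())
    # most_common(n) keeps only the n highest counts; ties stay in
    # order of first appearance.
    return Counter(words).most_common(n)


text = 'These are some words in a piece of text. The frequency of each word is counted and ranked.'
print(top_words(text))  # [('of', 2), ('these', 1), ('are', 1)]
```

Passing `n` limits the result to the top entries, which is usually what "high-frequency" means in practice; this regex is aimed at space-delimited text, so Chinese input would still need a segmenter such as jieba.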
