Counting high-frequency words in Python
Time: 2023-06-06 16:06:27 Views: 142
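Here are a few ways to count high-frequency words in Python:
1. Use the Counter class from the collections module in Python's standard library to count how often each word occurs in the text, then list the words from most to least frequent. Note that text.split() splits on whitespace only, so punctuation stays attached to words (text.) and case is preserved (The). For example: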
```python
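from collections import Counter

text = 'These are some words in a piece of text. The frequency of each word is counted and ranked.'
# split() separates on whitespace only, so punctuation stays attached ("text.")
word_freq = Counter(text.split())
# most_common() yields (word, count) pairs from most to least frequent
for word, freq in word_freq.most_common():
    print(word, freq)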
```
Output (words with equal counts keep their order of first appearance):
```
of 2
These 1
are 1
some 1
words 1
in 1
a 1
piece 1
text. 1
The 1
frequency 1
each 1
word 1
is 1
counted 1
and 1
ranked. 1
```
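Because text.split() leaves punctuation attached and keeps case, tokens such as text. and The are counted as separate words. The following is a minimal sketch of one way to normalize the text first, assuming that lowercasing and keeping only runs of ASCII letters and apostrophes suits your input. The helper name top_words and the regular expression are illustrative choices, not part of the original answer:

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent words, ignoring case and punctuation."""
    # Lowercase first, then keep runs of ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


text = ('These are some words in a piece of text. '
        'The frequency of each word is counted and ranked.')
for word, freq in top_words(text):
    print(word, freq)
```

Passing n limits the result to the top n entries, which is usually what you want when looking for high-frequency words in a longer text.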
2. Use the NLTK (Natural Language Toolkit) library for text processing and counting, for example:
```python
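import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# One-time setup: nltk.download('punkt') (or 'punkt_tab' on newer NLTK
# releases) and nltk.download('stopwords')
text = 'These are some words in a piece of text. The frequency of each word is counted and ranked.'
tokens = word_tokenize(text.lower())
# Build the stop-word set once instead of reloading the list for every token
stop_words = set(stopwords.words('english'))
filtered_tokens = [token for token in tokens if token not in stop_words]
word_freq = nltk.FreqDist(filtered_tokens)
for word, freq in word_freq.most_common():
    print(word, freq)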
```
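Output (with NLTK's default English stop-word list; punctuation is not filtered, so the period is counted as a token):
```
. 2
words 1
piece 1
text 1
frequency 1
word 1
counted 1
ranked 1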
```
3. Use a third-party library such as jieba to segment and count Chinese text, for example:
```python
import jieba
from collections import Counter
text = '这是一段中文文本,我们需要统计其中高频词汇。'
# jieba.cut() returns a generator of tokens, which Counter can consume directly
word_freq = Counter(jieba.cut(text))
for word, freq in word_freq.most_common():
    print(word, freq)
```
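Output (every token occurs once, so tokens appear in text order; the exact segmentation may vary with the jieba version and dictionary):
```
这是 1
一段 1
中文 1
文本 1
, 1
我们 1
需要 1
统计 1
其中 1
高频 1
词汇 1
。 1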
```
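The jieba result above still counts punctuation such as , and 。 as tokens. The following is a minimal sketch of one way to filter them out, assuming that dropping non-alphanumeric tokens and tokens shorter than two characters is acceptable. The helper name count_tokens and the min_len parameter are hypothetical, and the token list is copied from the output above so that the block runs without jieba installed:

```python
from collections import Counter


def count_tokens(tokens, min_len=2):
    """Count tokens, skipping punctuation, whitespace and very short tokens."""
    kept = [
        tok for tok in tokens
        # str.isalnum() is False for punctuation such as ',' or '。'
        if len(tok) >= min_len and tok.isalnum()
    ]
    return Counter(kept)


# Tokens copied from the output listed above; with jieba installed,
# jieba.cut(text) could be passed in directly instead.
tokens = ['这是', '一段', '中文', '文本', ',', '我们',
          '需要', '统计', '其中', '高频', '词汇', '。']
for word, freq in count_tokens(tokens).most_common(3):
    print(word, freq)
```

The length threshold also drops single-character function words, which tend to dominate frequency counts in Chinese text. Set min_len=1 to keep them.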
I hope these methods help. If you have further questions, feel free to ask.