Word frequency counting in Python
Date: 2023-11-07 21:04:14
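Word frequency counting means counting how often each word occurs in a piece of text. Python offers several ways to do it; three common ones follow:
1. Counting by hand with a plain dictionary: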
```python
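# Sample input (assumed for illustration); replace with your own list of words.
all_words = "the cat saw the dog and the bird".split()
wordcount = {}
for word in all_words:
    # get() returns 0 for a word that has not been seen yet
    wordcount[word] = wordcount.get(word, 0) + 1
# Sort (word, count) pairs by count, highest first
sorted_wordcount = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)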
```
2. Counting Chinese words with the third-party library jieba:
```python
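import jieba  # third-party: pip install jieba
from collections import Counter

# Sample inputs (assumed for illustration); replace with your own text and stop-word set.
text = "我喜欢自然语言处理,自然语言处理很有趣"
stop_words = {"喜欢"}

wordcount = Counter()
for word in jieba.cut(text):
    # Skip single-character tokens (mostly particles and punctuation) and stop words
    if len(word) > 1 and word not in stop_words:
        wordcount[word] += 1
# The 10 most frequent words as (word, count) pairs
sorted_wordcount = wordcount.most_common(10)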
```
3. Counting English words with built-in string methods and a dictionary:
```python
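# Sample input (assumed for illustration); replace with your own text.
speech_text = "We choose to go to the moon and we choose to do the other things"
# Lowercase so "We" and "we" count as one word, then split on whitespace
speech = speech_text.lower().split()
wordcount = {}
for word in speech:
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
# Keep the 10 most frequent (word, count) pairs
sorted_wordcount = sorted(wordcount.items(), key=lambda x: x[1], reverse=True)[:10]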
```
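One caveat with method 3: `split()` only breaks on whitespace, so punctuation stays attached and "dog." is counted separately from "dog". A minimal sketch of one way around this, assuming a simple regular expression is good enough for the text at hand, combines `re.findall` with `Counter`. The function name `top_words` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Lowercase, then keep only runs of letters and apostrophes,
    # so "dog." and "DOG" are both counted as "dog".
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


sample = "The dog barked. The DOG ran, and the cat watched the dog."
print(top_words(sample, 2))  # [('the', 4), ('dog', 3)]
```

The pattern here only matches ASCII letters; text with accented or non-Latin characters would need a different pattern or a proper tokenizer.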