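English Word Frequency Counting in Python
Date: 2024-06-14 10:02:18 Views: 180
Word frequency counting for English text in Python usually relies on the `Counter` class from the `collections` module, a dict subclass that counts how many times each hashable object (such as a string) occurs in a list or other iterable. The steps below count word frequencies in an English text:
1. First, split the text into words. You can use the string method `split()`, which by default splits on any run of whitespace, or the `word_tokenize` function from the `nltk` library.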
```python
from collections import Counter
import nltk
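nltk.download('punkt')  # one-time download of the punkt tokenizer data (newer NLTK releases may ask for 'punkt_tab' instead)
text = "This is a sample text for word frequency analysis."
tokens = nltk.word_tokenize(text.lower())  # lowercase first so "This" and "this" are counted together
words = [t for t in tokens if any(ch.isalnum() for ch in t)]  # drop pure-punctuation tokens such as "."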
```
2. Next, count the words with the `Counter` class:
```python
word_counts = Counter(words)
```
3. Finally, iterate over `word_counts` to get each word and its frequency:
```python
for word, count in word_counts.most_common():  # sorted by count; use word_counts.items() for insertion order
    print(f"{word}: {count}")
```
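When installing `nltk` is not an option, the same three steps can be sketched with the standard library alone. This is a minimal sketch, not a replacement for a real tokenizer: the function name `count_words`, the sample sentence, and the rule that a word is a run of ASCII letters with optional inner apostrophes are all assumptions made for illustration.

```python
import re
from collections import Counter

def count_words(text):
    """Return a Counter of the lowercase words found in text."""
    # Assumed word rule: ASCII letters, optionally joined by apostrophes ("isn't").
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(tokens)

sample = "This is a sample text. This text is short, isn't it?"
counts = count_words(sample)

# Print the three most frequent words.
for word, count in counts.most_common(3):
    print(f"{word}: {count}")

# A Counter returns 0 for words it has never seen instead of raising KeyError.
print(counts["python"])
```

Because the regular expression only matches letters and apostrophes, punctuation and digits never become tokens, so no separate filtering step is needed.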
Related questions
English word frequency counting in Python
To perform word frequency analysis in Python, you can use the `Counter` class from the `collections` module. Here is a sample:
```python
from collections import Counter
# read in your text file (substitute 'filename.txt' with your own file)
with open('filename.txt', 'r') as f:
    text = f.read()
# lowercase the text and replace newline and carriage-return characters with spaces
text = text.lower().replace('\n', ' ').replace('\r', ' ')
# split on any run of whitespace; split(' ') would yield empty strings for repeated spaces
words = text.split()
# count the frequency of each word
freq = Counter(words)
# display the most common words and their frequencies
for word, count in freq.most_common(10):
    print(f'{word}: {count}')
```
This code reads a text file, converts the text to lowercase, and replaces newline and carriage-return characters with spaces. It then splits the text into individual words and uses `Counter` to count how often each one occurs. Finally, it prints the 10 most common words with their counts. Punctuation is left attached to the words it touches.
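Splitting on whitespace alone leaves punctuation attached, so `hello,` and `hello` end up as different keys. One possible cleanup, sketched here with an assumed helper name `clean_split` and made-up sample text, deletes ASCII punctuation with `str.translate` before splitting:

```python
import string
from collections import Counter

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def clean_split(text):
    """Lowercase text, delete ASCII punctuation, then split on whitespace."""
    return text.lower().translate(STRIP_PUNCT).split()

raw = "Hello, world! Hello again.\nWorld of words."
freq = Counter(clean_split(raw))

for word, count in freq.most_common(2):
    print(f"{word}: {count}")
```

The trade-off is that apostrophes and hyphens are deleted too, so `isn't` becomes `isnt`; whether that is acceptable depends on the text being analysed.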
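Python code for English word frequency counting
The following Python code counts word frequencies in an English article and prints the ten most frequent words, from highest to lowest frequency:
Quote: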
```python
def words_list():
    with open(r'file_path', 'r') as f:  # replace file_path with the path to the article
        words = f.read().split()
    return words

def word_dic(words):
    word_dict = {}
    for word in words:
        if word_dict.get(word):  # word already in the dict: increment its count
            word_dict[word] += 1
        else:
            word_dict[word] = 1  # first occurrence: start the count at 1
    return word_dict

def word_fre(word_dict):
    fre_dict = {}
    for key, value in word_dict.items():
        if value in fre_dict:
            fre_dict[value].append(key)  # count already seen: add the word to that count's list
        else:
            fre_dict[value] = [key]  # new count: create a list keyed by the count
    return fre_dict

def word_sort(fre_dict):
    fre_list = list(fre_dict.keys())
    fre_list.sort(reverse=True)  # highest count first
    word_sort_list = []
    for fre in fre_list:
        # fre_dict[fre] is always a list, so one loop handles single and multiple words
        for word in fre_dict[fre]:
            word_sort_list.append(word + ':' + str(fre))
    return word_sort_list

print(word_sort(word_fre(word_dic(words_list())))[:10])  # the ten most frequent words
```
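The manual dictionary version above can be condensed by sorting the `(word, count)` pairs directly instead of grouping words by count first. The sketch below is one way to do that; the function name `top_words` and the sample sentence are illustrative and do not come from the quoted code.

```python
def top_words(words, n=10):
    """Return the n most frequent (word, count) pairs, highest count first."""
    counts = {}
    for word in words:
        # dict.get supplies 0 the first time a word is seen.
        counts[word] = counts.get(word, 0) + 1
    # The sort is stable, so words with equal counts keep their first-seen order.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

sample_words = "the cat and the dog and the bird".split()
print(top_words(sample_words, 3))
```

Slicing with `[:n]` after a descending sort takes the most frequent entries; slicing from the end of the list would return the least frequent ones.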
Quote:
```python
import collections

def words_list():
    with open(r'file_path', 'r') as f:  # replace file_path with the path to the article
        words = f.read().split()
    return words

words = words_list()
w = collections.Counter(words)
for word, cnt in w.most_common(10):
    print(word + ':' + str(cnt))
```
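Calling `f.read()` loads the whole article into memory at once. For larger files, a possible variation is to feed a `Counter` one line at a time with `Counter.update`. The sketch below assumes a helper named `count_lines` and uses `io.StringIO` as a stand-in for an open file so that it runs without a real path.

```python
import io
from collections import Counter

def count_lines(lines):
    """Count whitespace-separated words from any iterable of text lines."""
    counts = Counter()
    for line in lines:
        # update() adds one to the count of each word in the list.
        counts.update(line.split())
    return counts

# StringIO stands in for an open file; a real file object iterates the same way.
fake_file = io.StringIO("one two two\nthree three three\n")
counts = count_lines(fake_file)
for word, cnt in counts.most_common(2):
    print(word + ':' + str(cnt))
```

With a real file, the call would sit inside a `with open(...) as f:` block and pass `f` in place of `fake_file`.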