Top ten English word frequencies in Python
Time: 2023-10-30 08:01:35
To perform word frequency analysis on English text using Python, you can follow these steps:
1. Tokenize the text: Split the text into individual words or tokens.
2. Remove punctuation and convert to lowercase: Strip punctuation marks from the words and lowercase them, so that "The" and "the" are counted as the same word.
3. Create a frequency dictionary: Count the frequency of each word in the text and store it in a dictionary.
4. Sort the dictionary: Sort the entries by word frequency in descending order.
5. Extract the top ten words: Retrieve the top ten words from the sorted dictionary.
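As a rough illustration of steps 1 to 3 done by hand, the sketch below uses only string methods and a plain dictionary. The function name `count_words_by_hand` and the sample sentence are made up for this example, and splitting on whitespace is an assumption that suits simple text only.

```python
import string


def count_words_by_hand(text):
    """Count words using only str methods and a plain dict."""
    counts = {}
    # Step 1: split on whitespace to get raw tokens
    for token in text.split():
        # Step 2: strip leading/trailing punctuation and lowercase the token
        word = token.strip(string.punctuation).lower()
        if word:
            # Step 3: increment the count, starting from 0 for new words
            counts[word] = counts.get(word, 0) + 1
    return counts


counts = count_words_by_hand("The cat saw the dog. The dog ran!")
print(counts)
# {'the': 3, 'cat': 1, 'saw': 1, 'dog': 2, 'ran': 1}
```

Note that `str.strip` only removes punctuation at the ends of a token, so a word such as "don't" keeps its apostrophe.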
Here's example code that accomplishes all five steps:
```python
import re
from collections import Counter


def get_word_frequency(text):
    # Lowercase the text, then tokenize it; \w+ skips punctuation
    words = re.findall(r'\w+', text.lower())
    # Count word frequencies
    frequency_dict = Counter(words)
    # Sort by frequency, highest first; ties keep their first-appearance order
    sorted_dict = dict(sorted(frequency_dict.items(), key=lambda x: x[1], reverse=True))
    return sorted_dict


# Example text
text = "This is a sample sentence. It contains several words, some of which may repeat."
# Get word frequencies
word_frequency = get_word_frequency(text)
# Extract top ten words
top_ten_words = list(word_frequency.keys())[:10]
print(top_ten_words)
```
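Every word in the example text occurs exactly once, so the stable sort keeps the words in order of first appearance. The output will be:
```
['this', 'is', 'a', 'sample', 'sentence', 'it', 'contains', 'several', 'words', 'some']
```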
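If the counts are wanted alongside the words, `Counter.most_common` can do the sorting and the slicing in one call. The following is a minimal sketch under that assumption; the helper name `top_words` and the sample string are hypothetical and chosen so that some words actually repeat.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent words as (word, count) pairs."""
    # Same tokenization as above: lowercase, then match runs of word characters
    words = re.findall(r"\w+", text.lower())
    # most_common(n) returns pairs sorted by count, highest first
    return Counter(words).most_common(n)


sample = "the cat and the dog and the bird"
print(top_words(sample, 3))
# [('the', 3), ('and', 2), ('cat', 1)]
```

Words with equal counts are returned in the order they were first encountered, which is why 'cat' comes before 'dog' and 'bird' here.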