Write a small Python program that takes any English text as input and reports the three most frequent phrases
Posted: 2024-09-07 19:04:40
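In Python, we can use the `Counter` class from the `collections` module to count frequencies, and the `NLTK` library to preprocess the text, including tokenization and stop-word removal. Here is a simple example program: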
```python
from collections import Counter

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Download the tokenizer models and the stop-word list (skipped if already present).
# Newer NLTK releases look for "punkt_tab" instead of "punkt", so fetch both.
for resource in ("punkt", "punkt_tab", "stopwords"):
    nltk.download(resource, quiet=True)


def get_top_phrases(text, n=2, top_k=3):
    # Preprocessing: lowercase, tokenize, drop punctuation and stop words
    stop_words = set(stopwords.words('english'))
    words = word_tokenize(text.lower())
    filtered_words = [word for word in words if word.isalnum() and word not in stop_words]
    # Build phrases (assumption: a "phrase" is n consecutive words that remain
    # after filtering, so words separated only by stop words count as adjacent)
    phrases = [' '.join(filtered_words[i:i + n]) for i in range(len(filtered_words) - n + 1)]
    # Count the phrases and return the top_k most frequent
    return Counter(phrases).most_common(top_k)


if __name__ == "__main__":
    input_text = input("Enter some English text: ")
    top_phrases_result = get_top_phrases(input_text)
    print("The three most frequent phrases and their counts:")
    for phrase, freq in top_phrases_result:
        print(f"{phrase}: {freq} times")
```
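If installing NLTK is not an option, the same idea can be sketched with the standard library alone. This is a minimal sketch rather than the method used in the answer: it assumes a "phrase" means `n` adjacent words, it swaps `word_tokenize` for a simple regular expression, and it keeps stop words. The names `tokenize` and `top_ngrams` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a word is a run of ASCII letters, optionally with
    # internal apostrophes (so "don't" stays one token).
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def top_ngrams(text, n=2, k=3):
    # Slide a window of n words over the token list and count each window.
    words = tokenize(text)
    ngrams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(ngrams).most_common(k)


if __name__ == "__main__":
    sample = "the quick brown fox and the quick brown dog saw the quick cat"
    for phrase, count in top_ngrams(sample):
        print(f"{phrase}: {count}")
```

Passing `n=3` counts three-word phrases instead. Because stop words are kept here, the most frequent results will often contain words such as "the" or "of"; whether that is desirable depends on what should count as a phrase.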