Count the frequency of each word in the English sentence 'The city was covered by the green trees I like the green city'


Date: 2024-10-19 11:12:44

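First, split the sentence into words and drop any punctuation. Then use the `Counter` class from Python's `collections` module to count how often each word occurs. The steps are:

1. Install the `nltk` library for English tokenization if you don't have it yet by running `pip install nltk` on the command line. Then download the `punkt` tokenizer data that `word_tokenize` relies on:

```python
import nltk
from nltk.tokenize import word_tokenize
from collections import Counter

# Download NLTK's tokenizer data; the first run may print download messages.
nltk.download('punkt')
# Recent NLTK releases look for the 'punkt_tab' resource instead.
nltk.download('punkt_tab')
```

2. Lowercase the sentence, tokenize it, and drop punctuation:

```python
sentence = "The city was covered by the green trees I like the green city"
tokens = word_tokenize(sentence.lower())  # lowercase, then split into tokens
tokens = [token for token in tokens if token.isalpha()]  # keep alphabetic tokens only, dropping punctuation
```

3. Count the word frequencies:

```python
word_counts = Counter(tokens)
```

4. Print the frequencies:

```python
# Counter keeps words in the order they first appear
for word, frequency in word_counts.items():
    print(f"'{word}': {frequency}")
```

For this sentence you get the following counts:

```
'the': 3
'city': 2
'was': 1
'covered': 1
'by': 1
'green': 2
'trees': 1
'i': 1
'like': 1
```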
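For a sentence this simple, NLTK is not strictly required. The sketch below is an alternative that uses only the standard library. It assumes a "word" is any run of the letters a to z after lowercasing, extracts words with `re.findall` instead of `word_tokenize`, and uses `Counter.most_common()` to list words from most to least frequent. The helper name `word_frequencies` is hypothetical. Because the regex treats apostrophes and hyphens as separators, its results can differ from NLTK's on contractions or hyphenated words.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Hypothetical helper: count words case-insensitively.

    Assumption: a word is a run of ASCII letters a-z after lowercasing;
    digits, punctuation, apostrophes and hyphens act as separators.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


sentence = "The city was covered by the green trees I like the green city"
counts = word_frequencies(sentence)

# most_common() sorts from most to least frequent;
# words with equal counts keep the order in which they first appeared.
for word, frequency in counts.most_common():
    print(f"'{word}': {frequency}")
```

Because `most_common()` keeps first-seen order among ties, `'the': 3` is printed first, followed by `'city': 2` and `'green': 2`, then the words that occur once.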
