Count the frequency of each word in the English sentence 'The city was covered by the green trees I like the green city'
Date: 2024-10-19 10:12:44
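First, split the given English sentence into tokens and remove any punctuation. Then use the `Counter` class from Python's `collections` module to count how often each word occurs. The steps are:
1. Install the `nltk` library, which is used here for English tokenization. If it is not installed yet, run `pip install nltk` on the command line. Then import it and download the `punkt` tokenizer data (a data package, not a module) that `word_tokenize` relies on: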
```python
import nltk
from nltk.tokenize import word_tokenize
from collections import Counter
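# Download the NLTK tokenizer data; the first run may print a progress message.
# Newer NLTK releases may ask for 'punkt_tab' instead of 'punkt'.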
nltk.download('punkt')
```
2. Tokenize and remove punctuation:
```python
sentence = "The city was covered by the green trees I like the green city"
tokens = word_tokenize(sentence.lower())  # lowercase, then tokenize
tokens = [token for token in tokens if token.isalpha()]  # keep alphabetic tokens only, dropping punctuation
```
3. Count the word frequencies:
```python
word_counts = Counter(tokens)
```
4. Print the word frequencies:
```python
for word, frequency in word_counts.items():
    print(f"'{word}': {frequency}")
```
For this sentence, the output is:
```
'the': 3
'city': 2
'was': 1
'covered': 1
'by': 1
'green': 2
'trees': 1
'i': 1
'like': 1
```
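For a sentence this simple, the same counts can probably be obtained without `nltk` at all. The following is a minimal sketch using only the standard library. It assumes that a "word" is just a run of ASCII letters, which is a simplification and not what `word_tokenize` does for contractions or hyphenated words. The regular expression is an illustrative choice, not part of the original answer.

```python
import re
from collections import Counter

sentence = "The city was covered by the green trees I like the green city"

# Assumption: a word is a run of ASCII letters, so lowercase first
# and let the regex skip spaces and punctuation.
words = re.findall(r"[a-z]+", sentence.lower())
word_counts = Counter(words)

# most_common() sorts by count, highest first;
# words with equal counts stay in first-seen order.
for word, frequency in word_counts.most_common():
    print(f"'{word}': {frequency}")
```

Using `most_common()` instead of `items()` puts the most frequent words at the top, which tends to be easier to read when the text is longer.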