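Create a new project that, given a passage of text, builds a word list and a dictionary of occurrence counts. Output the 10 most frequent words, excluding function words such as articles, pronouns, and prepositions. You may define several functions that each implement an independent piece of functionality, then chain them together in the main module to achieve the overall behavior.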
Time: 2024-09-12 22:04:35
To accomplish this, you can build the word list and the word-frequency dictionary in the following steps:
1. **Preprocess the text**[^1]:
```python
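import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time setup: the tokenizer models and the stop-word list must be downloaded,
# e.g. nltk.download('punkt') (or 'punkt_tab' on newer NLTK releases)
# and nltk.download('stopwords').

def preprocess_text(text):
    # Tokenize the text into words
    words = word_tokenize(text)
    # Remove stop words (articles, pronouns, prepositions, and so on).
    # Lowercase before the lookup, because NLTK's stop-word list is lowercase.
    stop_words = set(stopwords.words('english'))
    filtered_words = [word.lower() for word in words
                      if word.isalpha() and word.lower() not in stop_words]
    return filtered_words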
```
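If installing NLTK and downloading its data is not an option, the same preprocessing idea can be approximated with the standard library alone. The sketch below is only an illustration: `preprocess_text_simple` and `BASIC_STOP_WORDS` are hypothetical names, and the stop-word set is a deliberately small sample, not a complete list.

```python
import re

# Hypothetical, deliberately small stop-word set for illustration only;
# a real project would use a fuller list such as the one shipped with NLTK.
BASIC_STOP_WORDS = {
    "a", "an", "the",                                              # articles
    "i", "you", "he", "she", "it", "we", "they", "this", "that",   # pronouns
    "in", "on", "at", "of", "to", "for", "with", "by", "from",     # prepositions
    "and", "or", "but", "is", "are", "was", "were",                # other common function words
}

def preprocess_text_simple(text):
    """Lowercase the text, extract alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in BASIC_STOP_WORDS]

print(preprocess_text_simple("The cat sat on the mat, and the cat slept."))
# ['cat', 'sat', 'mat', 'cat', 'slept']
```

Note that the pattern `[a-z]+` splits contractions such as "don't" into separate fragments, so a proper tokenizer is preferable for real text.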
2. **Create the word list**:
```python
def create_word_list(filtered_words):
    # De-duplicate the words; sort so the resulting list has a stable order
    unique_words = sorted(set(filtered_words))
    return unique_words
```
3. **Count word frequencies**:
```python
def count_word_frequency(words):
    # Map each word to the number of times it occurs in `words`
    word_freq = {}
    for word in words:
        if word in word_freq:
            word_freq[word] += 1
        else:
            word_freq[word] = 1
    return word_freq
```
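The manual counting loop can also be expressed with `collections.Counter` from the standard library. The following is a minimal sketch of that alternative; the function name `count_word_frequency_counter` and the sample words are made up for illustration.

```python
from collections import Counter

def count_word_frequency_counter(words):
    """Return a Counter (a dict subclass) mapping each word to its occurrence count."""
    return Counter(words)

freq = count_word_frequency_counter(["data", "model", "data", "test", "data"])
print(freq["data"])     # 3
print(freq["missing"])  # 0, a Counter returns 0 for keys it has not seen
```

Because `Counter` is a `dict` subclass, the result can be passed to any function that expects an ordinary word-frequency dictionary.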
4. **Find the most common words**:
```python
def top_n_words(word_freq, n=10):
    # Sort (word, count) pairs by count, highest first, and keep the first n
    sorted_freq = sorted(word_freq.items(), key=lambda x: x[1], reverse=True)
    return sorted_freq[:n]
```
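When several words share the same count, sorting by count alone leaves tied words in whatever order the dictionary holds them. If a predictable order is wanted, one option is to break ties alphabetically. This is a sketch of that variant, with a hypothetical function name and made-up sample data:

```python
def top_n_words_alphabetical_ties(word_freq, n=10):
    """Rank by descending count; words with equal counts are ordered alphabetically."""
    ranked = sorted(word_freq.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]

sample = {"pear": 2, "apple": 2, "fig": 5, "kiwi": 1}
print(top_n_words_alphabetical_ties(sample, n=3))
# [('fig', 5), ('apple', 2), ('pear', 2)]
```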
5. **Combine everything into a main program**:
```python
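def main(text):
    preprocessed = preprocess_text(text)
    # Vocabulary of unique words (built for reference; not used for counting)
    word_list = create_word_list(preprocessed)
    # Count over the full filtered token list, not the de-duplicated word list,
    # otherwise every word would have a frequency of 1
    word_freq = count_word_frequency(preprocessed)
    top_10_words = top_n_words(word_freq)
    return top_10_words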
```
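One detail worth checking in the pipeline is which list the frequencies are computed from: they need to come from the full filtered token list, since a de-duplicated word list contains each word exactly once. A minimal sketch with a made-up token list shows the difference:

```python
from collections import Counter

tokens = ["data", "model", "data", "test", "data", "model"]

# Counting over the de-duplicated vocabulary: every word appears exactly once.
vocab_counts = Counter(set(tokens))

# Counting over the full token list: the real occurrence counts.
token_counts = Counter(tokens)

print(sorted(vocab_counts.values()))  # [1, 1, 1]
print(token_counts.most_common(2))    # [('data', 3), ('model', 2)]
```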
You can now call `main` with a passage of text to get the result. For example:
```python
text = "Your text goes here."
top_10 = main(text)
for word, freq in top_10:
    print(f"{word}: {freq}")
```