In nlp_lab, write Python code to count the three most frequent words (length ≥ 5) in a given text file, without using jieba
Date: 2024-09-06 10:03:19
To count the three most common words of length ≥ 5 in a text file within the `nlp_lab` environment, without using the `jieba` library, you can follow these steps:
1. Import the required modules:
```python
from collections import Counter
import re
```
2. Read the text file (assuming it is named `text_file.txt`):
```python
with open('text_file.txt', 'r', encoding='utf-8') as file:
text = file.read()
```
3. Clean the text, keeping only English words of length ≥ 5:
```python
cleaned_text = re.findall(r'\b[a-zA-Z]{5,}\b', text)
```
Here the regular expression `\b[a-zA-Z]{5,}\b` matches English words of at least 5 letters; the `\b` word boundaries prevent matching fragments inside longer alphanumeric tokens.
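To see what the pattern extracts, here is a quick check on a made-up sentence (the sample text is purely illustrative):

```python
import re

sample = "The quick brown foxes jumped over three lazy hounds again and again"
# Only runs of 5 or more letters survive; "The", "over", "lazy", "and" are dropped
print(re.findall(r'\b[a-zA-Z]{5,}\b', sample))
# → ['quick', 'brown', 'foxes', 'jumped', 'three', 'hounds', 'again', 'again']
```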
4. Count word frequencies with `Counter`:
```python
word_counts = Counter(cleaned_text)
```
5. Get the three most common words:
```python
most_common_words = word_counts.most_common(3)
```
6. Print the results:
```python
for word, freq in most_common_words:
    print(f"{word}: {freq}")
```
Complete code example:
```python
from collections import Counter
import re

with open('text_file.txt', 'r', encoding='utf-8') as file:
text = file.read()
cleaned_text = re.findall(r'\b[a-zA-Z]{5,}\b', text)
word_counts = Counter(cleaned_text)
most_common_words = word_counts.most_common(3)
for word, freq in most_common_words:
    print(f"{word}: {freq}")
```
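Note that the counting above is case-sensitive, so `Python` and `python` are tallied separately. If you want capitalization variants merged, one option is to lowercase the text before matching. A self-contained sketch (the file name `sample.txt` and its contents are hypothetical, created here only so the example runs end to end):

```python
from collections import Counter
import re

# Write a small demo file (hypothetical name and contents)
with open('sample.txt', 'w', encoding='utf-8') as f:
    f.write("Python python PYTHON regex Regex counting words words words short")

with open('sample.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Lowercase first so "Python"/"PYTHON"/"python" all count as one word
words = re.findall(r'\b[a-zA-Z]{5,}\b', text.lower())
for word, freq in Counter(words).most_common(3):
    print(f"{word}: {freq}")
```

Whether folding case is appropriate depends on the task; for plain frequency counts it usually gives more intuitive results.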