Hamlet word-frequency counting: a basic program and an improved version
Date: 2024-05-12 10:14:15
Basic program:
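```
# Open the file (UTF-8 encoding assumed)
with open('hamlet.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# Lowercase everything and split on whitespace (split() already handles \n and \r\n)
words = text.lower().split()
# Strip punctuation and digits from each token, keeping only letters
for i in range(len(words)):
    words[i] = ''.join(e for e in words[i] if e.isalpha())
# Count word frequencies
word_count = {}
for word in words:
    # Tokens such as "--" or "1601" become empty strings; skip them
    if not word:
        continue
    if word in word_count:
        word_count[word] += 1
    else:
        word_count[word] = 1
# Print the results
for word, count in word_count.items():
    print(f"{word}: {count}")
```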
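To try the basic approach without `hamlet.txt`, the same logic can be wrapped in a function that takes a string. This is a minimal sketch: the function name `count_words_basic` and the sample sentence are illustrative and not part of the original post. It uses `dict.get` in place of the if/else branch and skips tokens that end up empty after the letters-only filter.

```python
def count_words_basic(text):
    """Basic approach: lowercase, split on whitespace, keep only letters."""
    word_count = {}
    for token in text.lower().split():
        # Drop every non-letter character (punctuation, digits) from the token
        word = ''.join(ch for ch in token if ch.isalpha())
        if not word:
            # Tokens such as "--" or "1601" become empty; skip them
            continue
        # dict.get supplies 0 for a word that has not been seen yet
        word_count[word] = word_count.get(word, 0) + 1
    return word_count


sample = "To be, or not to be: that is the question. -- 1601"
counts = count_words_basic(sample)
print(counts)
```

Here "be," and "be:" both collapse to "be", while "--" and "1601" are discarded instead of being counted as an empty word.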
Improved program:
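```
import re
from collections import Counter
# Open the file (UTF-8 encoding assumed)
with open('hamlet.txt', 'r', encoding='utf-8') as f:
    text = f.read()
# Lowercase everything, then use a regular expression to keep only runs of letters,
# which drops punctuation and digits in one step
words = re.findall(r'\b[a-z]+\b', text.lower())
# Count word frequencies
word_count = Counter(words)
# Print the results
for word, count in word_count.items():
    print(f"{word}: {count}")
```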
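Both programs print words in the order they first appear in the text. To see the most frequent words instead, `Counter.most_common` returns `(word, count)` pairs sorted by count. A minimal sketch, assuming a hypothetical helper `top_words` and a made-up sample sentence in place of `hamlet.txt`:

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent words as (word, count) pairs."""
    words = re.findall(r'\b[a-z]+\b', text.lower())
    # most_common sorts by count, highest first; equal counts keep first-seen order
    return Counter(words).most_common(n)


sample = "The king, the queen, and the ghost. The king is dead."
for word, count in top_words(sample):
    print(f"{word}: {count}")
```

With the real file, `top_words(text, 10)` would list the ten most frequent words; common function words such as "the" will dominate unless they are filtered out separately.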
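The improved program extracts words with a regular expression, which discards punctuation and digits in a single step, and counts them with `Counter` from the standard-library `collections` module instead of a hand-written dictionary loop, so the code is shorter.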
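The two programs do not tokenize text identically, so their counts can differ slightly. The sketch below compares them on a made-up line; the helper names `tokens_basic` and `tokens_regex` are hypothetical. The basic version deletes the apostrophe and joins "o'er" into "oer", while the regex splits it into "o" and "er". For "2nd", the basic version keeps "nd", but the regex drops the token entirely because `\b` requires a word boundary before the first letter, and there is none between "2" and "n".

```python
import re


def tokens_basic(text):
    # Basic program: split on whitespace, then delete non-letters inside each token
    result = []
    for token in text.lower().split():
        word = ''.join(ch for ch in token if ch.isalpha())
        if word:
            result.append(word)
    return result


def tokens_regex(text):
    # Improved program: keep runs of letters bounded by word boundaries
    return re.findall(r'\b[a-z]+\b', text.lower())


line = "O'er the 2nd wall"
print(tokens_basic(line))
print(tokens_regex(line))
```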