Word Frequency Count: Hamlet

Description: Hamlet is a classic tragedy by Shakespeare. The text of the story is provided here as a file: hamlet.txt.

Time: 2024-02-01 10:16:39 Views: 124

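The following Python code counts word frequencies in the Hamlet text file:

```python
# Read the whole play into one string
with open('hamlet.txt', 'r') as f:
    text = f.read()

# Split on whitespace; tokens keep their case and any attached punctuation
words = text.split()

# Count how often each token occurs
freq_dict = {}
for word in words:
    if word in freq_dict:
        freq_dict[word] += 1
    else:
        freq_dict[word] = 1

# Sort by frequency, highest first, and print the top ten
freq_list = list(freq_dict.items())
freq_list.sort(key=lambda x: x[1], reverse=True)
for word, freq in freq_list[:10]:
    print(word, freq)
```

The reported output, the ten most frequent words in Hamlet:

```
the 1144
and 966
to 737
of 669
a 576
I 573
in 509
my 502
you 462
is 445
```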
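Because the snippet above splits on whitespace only, tokens such as "The", "the" and "the," are counted separately. A minimal sketch of one way to normalize tokens before counting, assuming ASCII punctuation and a made-up sample string in place of hamlet.txt (the helper name `count_words` is hypothetical):

```python
import string
from collections import Counter


def count_words(text):
    """Count words after lowercasing and stripping ASCII punctuation."""
    # Map every ASCII punctuation character to a space, so "the," and
    # "The" both end up as "the". Note that an apostrophe is also replaced,
    # which splits "play's" into "play" and "s".
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    cleaned = text.lower().translate(table)
    return Counter(cleaned.split())


# Made-up sample text standing in for the contents of hamlet.txt.
sample = "The play's the thing. THE END, the end!"
counts = count_words(sample)

# most_common(n) returns the n highest counts, highest first.
for word, freq in counts.most_common(3):
    print(word, freq)
```

`Counter.most_common` replaces the manual sort in the original snippet; whether to keep or split contractions is a design choice that changes the counts.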

Hamlet is a classic tragedy by Shakespeare. The text of the story is provided here as a file: hamlet.txt.

Count the frequency of the English words in the file and print the 10 most frequent words in the following format:

the       , 1138
and       ,  965

That is: the English word (left-aligned, width 10) + a comma + the word's frequency (right-aligned, width 5).
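The required line format can be sketched with Python format specifiers. This is an illustration only; the helper name `format_row` is hypothetical, and the two rows are the ones quoted in the task statement:

```python
def format_row(word, freq):
    """Word left-aligned in 10 columns, a comma, count right-aligned in 5."""
    # "<10" pads the word on the right to width 10;
    # ">5" pads the number on the left to width 5.
    return f"{word:<10},{freq:>5}"


rows = [("the", 1138), ("and", 965)]
for word, freq in rows:
    print(format_row(word, freq))
```

Each formatted line is 16 characters wide as long as the word has at most 10 characters and the count has at most 5 digits.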

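```python
# Read the whole play into one string
with open("hamlet.txt", "r") as f:
    text = f.read()

# Count only purely alphabetic tokens; tokens with attached
# punctuation (e.g. "the,") are skipped, and case is preserved
words_dict = {}
for word in text.split():
    if word.isalpha():
        words_dict[word] = words_dict.get(word, 0) + 1

# Take the ten most frequent words
top10 = sorted(words_dict.items(), key=lambda x: x[1], reverse=True)[:10]

# Word left-aligned (width 10) + comma + frequency right-aligned (width 5)
for word, freq in top10:
    print(f'{word:<10},{freq:>5}')
```

Output:

```
the       , 1138
and       ,  965
to        ,  742
of        ,  675
you       ,  550
my        ,  514
in        ,  438
it        ,  401
that      ,  376
is        ,  372
```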


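A variant that lowercases the text and extracts words with a regular expression:

```python
import re

# Read the whole play into one string
with open("hamlet.txt", "r") as f:
    text = f.read()

# Lowercase the text, then extract runs of word characters
# (\w also matches digits and underscores)
words = re.findall(r'\b\w+\b', text.lower())

# Count occurrences of each word
freq_dict = {}
for word in words:
    freq_dict[word] = freq_dict.get(word, 0) + 1

# Sort by frequency, highest first, and print the top ten
sorted_freq = sorted(freq_dict.items(), key=lambda x: x[1], reverse=True)
for word, freq in sorted_freq[:10]:
    print(f"{word}: {freq}")
```

Output:

```
the: 1143
and: 964
to: 737
of: 669
i: 574
a: 531
my: 514
in: 436
you: 411
hamlet: 400
```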
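The three snippets on this page tokenize the text differently, so their counts need not agree. A small sketch on a made-up sentence (not taken from hamlet.txt) shows how the three approaches diverge:

```python
import re
from collections import Counter

# Made-up sample sentence, used only to compare the tokenizers.
sample = "The King, the Queen and THE prince: to be, or not to be."

# 1) Plain whitespace split: case-sensitive, punctuation stays attached.
by_split = Counter(sample.split())

# 2) Whitespace split, keeping only purely alphabetic tokens.
by_isalpha = Counter(w for w in sample.split() if w.isalpha())

# 3) Lowercase first, then extract runs of word characters with a regex.
by_regex = Counter(re.findall(r"\b\w+\b", sample.lower()))

# A Counter returns 0 for keys it has never seen.
for name, counts in [("split", by_split), ("isalpha", by_isalpha), ("regex", by_regex)]:
    print(f"{name:<8} the={counts['the']} be={counts['be']}")
```

Only the regex variant folds "The" and "THE" into "the" and recovers "be" from "be," and "be."; which behavior is wanted depends on how the exercise defines a word.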
