
Count how many times each English word appears in the file mylove.txt, print the 10 most frequent words, and save them to the file “词频统计txt”

Date: 2024-10-20 07:05:57 Views: 38

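We can write a Python script for this task. It uses `collections.Counter` from the standard library to count word frequencies, and the `pandas` library to sort the counts and take the top 10 rows. The script assumes the file contains only English text.

```python
import re
from collections import Counter

import pandas as pd

# Read the file; lowercase everything so "Love" and "love" count as one word
with open("mylove.txt", "r", encoding="utf-8") as file:
    text = file.read().lower()

# Extract English words with a regular expression.
# Letters only (plus an inner apostrophe, as in "don't"), so digits and
# underscores are not counted as words.
words = re.findall(r"[a-z]+(?:'[a-z]+)?", text)

# Count how often each word occurs
word_counts = Counter(words)

# Convert the Counter to a DataFrame so it is easy to sort
df_word_counts = pd.DataFrame.from_dict(word_counts, orient='index').reset_index()
df_word_counts.columns = ['Word', 'Frequency']

# Sort by frequency in descending order
sorted_df = df_word_counts.sort_values(by='Frequency', ascending=False)

# Print the 10 most frequent words
top_10_words = sorted_df.head(10)
print(top_10_words)

# Save them to a file (CSV format with a header row)
with open("词频统计.txt", "w", newline='', encoding="utf-8") as output_file:
    top_10_words.to_csv(output_file, index=False, header=True)
```

Running this code prints the 10 most frequent words with their frequencies and writes the result to "词频统计.txt" as comma-separated text.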
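If installing `pandas` is not an option, the same counting and ranking can probably be done with the standard library alone, since `Counter.most_common` already returns the entries sorted by count. The following is a minimal sketch, not the method from the answer above: the function names `top_words`, `save_counts`, and `report` are hypothetical, and the tab-separated output format is an assumption.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent English words in text as (word, count) pairs."""
    # Lowercase first so "Love" and "love" are counted together
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # most_common sorts by count, descending; ties keep first-seen order
    return Counter(words).most_common(n)


def save_counts(pairs, path):
    """Write one 'word<TAB>count' line per pair (an assumed output format)."""
    with open(path, "w", encoding="utf-8") as out:
        for word, count in pairs:
            out.write(f"{word}\t{count}\n")


def report(in_path, out_path, n=10):
    """Read in_path, print the top n words, and save them to out_path."""
    with open(in_path, "r", encoding="utf-8") as f:
        pairs = top_words(f.read(), n)
    for word, count in pairs:
        print(word, count)
    save_counts(pairs, out_path)
```

Calling `report("mylove.txt", "词频统计.txt")` would then print the top 10 words and write them to the output file.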
