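Count how many characters xiyouji5.txt contains and obtain the frequency distribution of the text (hint: the FreqDist() function can be used). Based on that distribution, plot the frequency distribution of the 1000 most frequent words. I need the complete final code. The plot must not contain garbled characters. D:\xiyouji5.txt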


Posted: 2024-10-16 10:15:55

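To count the total number of characters in "xiyouji5.txt", obtain its character frequency distribution, and plot the 1000 most frequent characters, you can use the `Counter` class from Python's `collections` module together with the `matplotlib` library. `Counter` stands in here for NLTK's `FreqDist`, which is itself a subclass of `Counter`. Note that the code counts single characters, not segmented words. The complete code example follows:

```python
from collections import Counter

import matplotlib.pyplot as plt

# Select a font that contains Chinese glyphs so the labels are not garbled.
# SimHei ships with Chinese editions of Windows; substitute any installed CJK font.
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

# File path
filename = "D:\\xiyouji5.txt"

# Open and read the text
with open(filename, 'r', encoding='utf-8') as file:
    text_content = file.read().replace('\n', '')  # drop newlines so they are not counted

# Count character frequencies
char_freq = Counter(text_content)

# Take the 1000 most frequent characters
most_common_chars = char_freq.most_common(1000)
chars = [char for char, _ in most_common_chars]
freqs = [freq for _, freq in most_common_chars]

# Plot the frequency distribution
plt.figure(figsize=(10, 5))
plt.title('前1000个高频字符频率分布')  # "Frequency distribution of the top 1000 characters"
plt.xlabel('字符')  # "Character"
plt.ylabel('频率')  # "Frequency"
plt.bar(chars, freqs)  # one call for all bars instead of one call per character
plt.xticks(rotation=90)  # rotate the x-axis labels to reduce crowding
plt.tight_layout()
# savefig has no encoding parameter; garbled labels are a font issue, handled above
plt.savefig('character_frequency_distribution.png')

# Print the total character count and the most common characters
total_chars = sum(char_freq.values())
print(f"File '{filename}' contains {total_chars} characters in total.")
print("The frequency distribution plot of the top 1000 characters was saved as 'character_frequency_distribution.png'.")
print("Most common characters:", most_common_chars)
```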
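A `Counter` built from the raw text counts every character, including punctuation and spaces. If "how many characters" is meant to cover Chinese characters only, the count can be filtered first. The following is a minimal sketch under that assumption: the helper name `count_chinese_chars`, the sample string, and the choice of the basic CJK Unified Ideographs range (U+4E00 to U+9FFF) are illustrative and not part of the original answer.

```python
import re
from collections import Counter

# Assumption: a "character" is a CJK Unified Ideograph in U+4E00..U+9FFF.
# Punctuation, digits, Latin letters and whitespace are ignored.
CJK_PATTERN = re.compile(r'[\u4e00-\u9fff]')


def count_chinese_chars(text):
    """Return (total, Counter) for the Chinese characters found in text."""
    freq = Counter(CJK_PATTERN.findall(text))
    return sum(freq.values()), freq


# Hypothetical sample standing in for the contents of xiyouji5.txt.
sample = "话说唐僧师徒，行者开路。行者道：“师父！”"
total, freq = count_chinese_chars(sample)
print(total)                 # number of Chinese characters, punctuation excluded
print(freq.most_common(3))   # the three most frequent characters with their counts
```

To apply this to the real file, pass the string read from `D:\xiyouji5.txt` in place of `sample`. The returned `Counter` supports `most_common(1000)` in the same way as the unfiltered one.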
