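Take 《三国演义》 (Romance of the Three Kingdoms): count the total number of characters (excluding punctuation), the number of commonly used Chinese characters that appear, and the 20 most frequently used characters.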


Time: 2024-02-27 21:51:27

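Sure, here is one way to do it. The task asks for statistics per character, so the text is counted character by character rather than split into words. The task does not define "commonly used", so the code treats the 500 most frequent characters as the common ones (that count is therefore at most 500) and also reports the number of distinct characters:

```python
import re
from collections import Counter

# Read the novel from a local UTF-8 text file
with open('三国演义.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Keep only Chinese characters (U+4E00 to U+9FA5); this removes
# punctuation as well as whitespace, digits and Latin letters
text = re.sub(r'[^\u4e00-\u9fa5]', '', text)

# Total number of characters, punctuation excluded
total_chars = len(text)

# Frequency of each individual character
char_counts = Counter(text)

# Number of distinct characters that appear in the text
distinct_chars_count = len(char_counts)

# "Common" characters: the 500 most frequent ones (a cutoff chosen here,
# not given by the task), so this count can never exceed 500
common_chars = [char for char, _ in char_counts.most_common(500)]
common_chars_count = len(common_chars)

# The 20 most frequent characters with their counts
top_20_chars = char_counts.most_common(20)

print("Total characters:", total_chars)
print("Distinct characters:", distinct_chars_count)
print("Common characters (top 500 by frequency):", common_chars_count)
print("Top 20 most frequent characters:")
for char, count in top_20_chars:
    print(char, count)
```

The code first reads the local 《三国演义》 text file, then uses a regular expression to strip everything that is not a Chinese character and takes the length of the result as the total character count. It then builds a `Counter` over the individual characters, derives the distinct and common character counts from it, and finally prints the 20 most frequent characters using `Counter.most_common`.

Character-level counting needs nothing beyond the standard library. If word-level frequencies are wanted as well, the `jieba` library can segment the text into words before counting; it has to be installed first, for example from a notebook cell:

```python
!pip install jieba
```

The 《三国演义》 text file must be placed in the same directory as the script and named `三国演义.txt`.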
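To check the character-level counting without the full novel on disk, the same steps can be wrapped in a function and run on a short inline sentence. This is only a sketch: `char_stats`, its `top_n` parameter, and the optional `common_chars` reference set are names made up for this illustration. No list of commonly used characters ships with Python, so one would have to be supplied by the user.

```python
import re
from collections import Counter

# Matches every character outside the basic CJK range U+4E00 to U+9FA5
NON_HANZI = re.compile(r'[^\u4e00-\u9fa5]')


def char_stats(text, top_n=20, common_chars=None):
    """Return total, distinct and top-N character counts for `text`.

    `common_chars` is an optional, user-supplied collection of characters
    regarded as "commonly used" (a hypothetical input for this sketch).
    """
    hanzi = NON_HANZI.sub('', text)   # drop punctuation and other non-Hanzi
    counts = Counter(hanzi)           # frequency of each single character
    stats = {
        'total': len(hanzi),
        'distinct': len(counts),
        'top': counts.most_common(top_n),
    }
    if common_chars is not None:
        # Distinct characters in the text that also occur in the reference set
        stats['common_used'] = len(counts.keys() & set(common_chars))
    return stats


sample = "话说天下大势，分久必合，合久必分。"
result = char_stats(sample, top_n=3, common_chars="天下分合久")
print(result['total'], result['distinct'], result['common_used'])
print(result['top'])
```

With a real reference list passed as `common_chars`, `common_used` gives the number of commonly used characters that actually appear in the text, which is one possible reading of the task's second requirement.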
