
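Can you help me write a Python program that computes the entropy of English letters, English words, Chinese characters, and Chinese words in a text file?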

Time: 2023-05-23 21:01:36 Views: 139

Certainly. Below is a simple Python program that computes the entropy of English letters, English words, Chinese characters, and Chinese words in a text file:

```
import math
import re
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)  # single pass instead of list.count() per item
    total = sum(counts.values())
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


# Read the whole file
with open('filename.txt', 'r', encoding='utf-8') as f:
    content = f.read()

# English letter entropy (case-sensitive, as matched by the pattern)
english_letters = re.findall('[a-zA-Z]', content)
english_letter_entropy = entropy(english_letters)

# English word entropy: a word is a run of ASCII letters
english_words = re.findall('[a-zA-Z]+', content)
english_word_entropy = entropy(english_words)

# Chinese character entropy
chinese_chars = re.findall('[\u4e00-\u9fa5]', content)
chinese_char_entropy = entropy(chinese_chars)

# Chinese word entropy. Note: this pattern matches runs of consecutive Han
# characters (the text between punctuation or other non-Han characters),
# so it approximates phrases rather than words. Counting real words
# requires a word segmenter.
chinese_words = re.findall('[\u4e00-\u9fa5]+', content)
chinese_word_entropy = entropy(chinese_words)

# Print the results
print("English letter entropy:", english_letter_entropy)
print("English word entropy:", english_word_entropy)
print("Chinese character entropy:", chinese_char_entropy)
print("Chinese word entropy:", chinese_word_entropy)
```
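The pattern `[\u4e00-\u9fa5]+` in the program above returns whole runs of Han characters, not linguistic words, so the "Chinese word" figure depends on where punctuation happens to fall. As a rough sketch of how a segmentation step could change the result, the example below splits each run with greedy forward maximum matching against a vocabulary. The names `shannon_entropy`, `forward_max_match`, and `TOY_VOCAB` are hypothetical and not part of the original answer, and the four-word vocabulary is made up for illustration; a real program would more likely use a dedicated segmentation library with a full dictionary.

```python
import math
import re
from collections import Counter


def shannon_entropy(tokens):
    """Return the Shannon entropy (in bits) of the empirical token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def forward_max_match(run, vocab, max_len=4):
    """Greedily split a run of Han characters using the longest vocabulary match."""
    words, i = [], 0
    while i < len(run):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(run) - i), 0, -1):
            piece = run[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"我们", "喜欢", "学习", "语言"}

text = "我们喜欢学习，我们喜欢语言。"

# Runs of Han characters, exactly as the original pattern finds them.
runs = re.findall('[\u4e00-\u9fa5]+', text)

# The same runs, split into vocabulary words.
words = [w for run in runs for w in forward_max_match(run, TOY_VOCAB)]

print("runs: ", runs, shannon_entropy(runs))
print("words:", words, shannon_entropy(words))
```

In this sample the two runs each occur once, so the run-based entropy is exactly 1 bit, while the segmented words (two occurring twice and two occurring once) give about 1.92 bits. The gap shows how strongly the "word" entropy depends on the segmentation method.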
