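Can you help me write a Python program that computes the entropy of English letters, English words, Chinese characters, and Chinese words in a text file?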
Time: 2023-05-23 21:01:36 Views: 139
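Sure. Below is a simple Python program that computes the entropy of the English letters, English words, Chinese characters, and Chinese words in a text file. Note that Chinese words are only approximated as unbroken runs of Chinese characters, since the script does no word segmentation: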
```
import math
import re
from collections import Counter


def entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of items."""
    total = len(items)
    counts = Counter(items)  # single pass instead of list.count() per symbol
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


# Read the whole file as UTF-8 text
with open('filename.txt', 'r', encoding='utf-8') as f:
    content = f.read()

# English letters (case-sensitive: 'A' and 'a' count as different symbols)
english_letters = re.findall(r'[a-zA-Z]', content)
english_letter_entropy = entropy(english_letters)

# English words: maximal runs of ASCII letters (also case-sensitive)
english_words = re.findall(r'[a-zA-Z]+', content)
english_word_entropy = entropy(english_words)

# Chinese characters in the range U+4E00..U+9FA5
chinese_chars = re.findall(r'[\u4e00-\u9fa5]', content)
chinese_char_entropy = entropy(chinese_chars)

# Chinese "words": maximal runs of Chinese characters. This is only a rough
# proxy, because Chinese text has no spaces between words; a true word
# entropy requires a word segmenter.
chinese_words = re.findall(r'[\u4e00-\u9fa5]+', content)
chinese_word_entropy = entropy(chinese_words)

# Print the results
print("English letter entropy:", english_letter_entropy)
print("English word entropy:", english_word_entropy)
print("Chinese character entropy:", chinese_char_entropy)
print("Chinese word entropy (character runs):", chinese_word_entropy)
```
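The program above treats every unbroken run of Chinese characters as one "word", so a whole clause between two punctuation marks is counted as a single item. For a word-level entropy, the text probably needs to be segmented first. The following is a minimal sketch of one way to do that, using forward maximum matching against a dictionary. The function `fmm_segment`, the `max_len` parameter, and the toy vocabulary `VOCAB` are hypothetical names introduced for illustration and are not part of the original answer. A real run would need a much larger dictionary, or a dedicated segmentation library.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy, in bits: H = sum of -p * log2(p) over distinct items."""
    total = len(items)
    counts = Counter(items)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def fmm_segment(text, vocab, max_len=4):
    """Forward maximum matching (illustrative, hypothetical helper).

    At each position, take the longest vocabulary entry that matches,
    trying lengths from max_len down to 2. If nothing matches, emit a
    single character as its own word.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words


# Toy vocabulary, assumed for this example only
VOCAB = {"我们", "喜欢", "学习", "信息", "信息论"}

words = fmm_segment("我们喜欢学习信息论我们学习", VOCAB)
print(words)           # the segmented word list
print(entropy(words))  # word-level entropy in bits
```

To plug this into the program above, one option is to apply `fmm_segment` to each run returned by `re.findall(r'[\u4e00-\u9fa5]+', content)` and pass the concatenated word list to `entropy`. The resulting entropy depends heavily on the dictionary, so results from different segmenters are not directly comparable.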