
Python Word Frequency Counting: Hamlet

Time: 2023-07-03 12:31:29

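Sure. Below is a simple Python program that counts word frequencies in Hamlet.

```python
import string
from collections import Counter

# Read the whole text file
with open('hamlet.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Stopwords to exclude (a set gives fast membership tests)
stopwords = {'the', 'and', 'to', 'of', 'that', 'in', 'you', 'my', 'is', 'it',
             'not', 'with', 'this', 'but', 'for', 'your', 'be', 'as', 'have',
             'him', 'so', 'will', 'on', 'me'}

# Remove ASCII punctuation (string.punctuation does not cover curly quotes)
text = text.translate(str.maketrans('', '', string.punctuation))
# Lowercase every word and drop the stopwords
words = [word.lower() for word in text.split() if word.lower() not in stopwords]

# Count word frequencies
word_counts = Counter(words)

# Print the 10 most common words as (word, count) pairs
print(word_counts.most_common(10))
```

The program first uses a `with open()` statement to open the file `hamlet.txt` and read the entire text. Next, it strips punctuation with the `str.translate()` method, then uses a list comprehension to convert every word to lowercase and drop the stopwords. Finally, it passes the word list to `collections.Counter`, which counts how many times each word occurs, and calls its `most_common()` method to print the 10 most frequent words. Note that this is only a simple example. In practice, word frequencies can be computed with more sophisticated techniques, for example with natural language processing tools such as NLTK and spaCy.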
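One limitation of the `translate()`/`split()` approach is that deleting punctuation can merge or alter words. If the text joins words with dashes, "be--that" becomes "bethat", and "play's" becomes "plays". The following is a minimal sketch of one alternative, not the original author's method. It wraps the same counting logic in a function that takes a string, which also makes it easy to test without a file, and it replaces the punctuation-stripping step with a regular-expression tokenizer. The function name `top_words` and the regex pattern are illustrative choices, not part of the original program.

```python
import re
from collections import Counter

# Same stopword list as the program above, stored as a set for fast lookups
STOPWORDS = {'the', 'and', 'to', 'of', 'that', 'in', 'you', 'my', 'is', 'it',
             'not', 'with', 'this', 'but', 'for', 'your', 'be', 'as', 'have',
             'him', 'so', 'will', 'on', 'me'}


def top_words(text, n=10, stopwords=STOPWORDS):
    """Return the n most common non-stopword words as (word, count) pairs.

    top_words is a hypothetical helper. Its regex tokenizer stands in for
    the original translate()/split() step.
    """
    # Lowercase first, then extract runs of letters. Internal apostrophes
    # are kept ("play's"), and dashes or other punctuation act as
    # separators instead of gluing neighbouring words together.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(w for w in words if w not in stopwords).most_common(n)


if __name__ == '__main__':
    sample = "Words, words, words. The play's the thing! Words--and more words."
    print(top_words(sample, 2))
```

To apply it to the play, read `hamlet.txt` as before and call `top_words(text)`. The result is a list in the same `(word, count)` format that the original program prints.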
