
Using Python to count character appearances in Dream of the Red Chamber (红楼梦), rank the top 20, and draw a word cloud

Time: 2023-08-18 14:55:11 Views: 331

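You can do this in Python with the text-processing library NLTK and the word-cloud library WordCloud. First, install both libraries:

```
pip install nltk
pip install wordcloud
```

Next, get the text of Dream of the Red Chamber. You can find it online; save it as a plain-text file (for example, red_chamber.txt).

One caveat: NLTK's `word_tokenize` is built for space-delimited languages and does not segment Chinese, so counting its tokens will not give per-character counts. The example below therefore counts occurrences of a list of candidate names directly in the text and uses NLTK's `FreqDist` only for ranking. The candidate list is illustrative rather than exhaustive, and the font file name is an assumption: WordCloud needs a font that contains Chinese glyphs, so point `font_path` at one installed on your system.

```python
import nltk
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# Read the text file
with open('red_chamber.txt', 'r', encoding='utf-8') as file:
    text = file.read()

# Candidate names as they appear in the text
# (illustrative, not exhaustive; extend this list as needed)
candidates = [
    '宝玉', '黛玉', '宝钗', '凤姐', '贾母', '王夫人', '袭人', '贾琏', '贾政', '探春',
    '湘云', '晴雯', '平儿', '紫鹃', '薛姨妈', '贾珍', '李纨', '尤氏', '鸳鸯', '薛蟠',
    '香菱', '惜春', '迎春', '贾蓉', '刘姥姥', '邢夫人', '妙玉', '麝月', '贾赦', '贾芸',
]

# Count how many times each name occurs in the text
character_counts = nltk.FreqDist({name: text.count(name) for name in candidates})

# Take the 20 most frequent names
top_characters = character_counts.most_common(20)

# Print the top 20 names and their counts
for character, count in top_characters:
    print(character, count)

# Draw the word cloud from the top-20 counts.
# 'simhei.ttf' is an assumed font file; replace it with a Chinese font you have.
wordcloud = WordCloud(font_path='simhei.ttf', width=800, height=400,
                      background_color='white').generate_from_frequencies(dict(top_characters))
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
```

This code prints the 20 most frequently mentioned names from the candidate list together with their counts, and draws a word cloud in which each name is sized by its frequency. You can adjust the word cloud's size, colors, and other parameters as needed.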
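Characters in the novel are often referred to by more than one name, so counting a single written form can undercount them. The following is a minimal sketch, using only the standard library, of merging several forms into one tally before ranking. The alias table, the helper name `count_characters`, and the sample sentence are all illustrative assumptions; for real use you would pass in the full text and a longer table. The forms listed for one character should not contain one another, otherwise a mention is counted twice.

```python
from collections import Counter

# Hypothetical alias table: canonical name -> written forms to look for.
# Illustrative only; extend it for real use.
ALIASES = {
    '王熙凤': ['熙凤', '凤姐', '凤丫头'],
    '林黛玉': ['黛玉', '林姑娘', '颦儿'],
    '贾宝玉': ['宝玉', '宝二爷'],
}


def count_characters(text, aliases):
    """Return a Counter mapping each canonical name to its total mentions."""
    counts = Counter()
    for canonical, forms in aliases.items():
        # Sum the occurrences of every written form of this character
        counts[canonical] = sum(text.count(form) for form in forms)
    return counts


# Made-up sample sentence standing in for the novel's text
sample = '凤姐笑道，宝玉来了。黛玉听了，便问宝玉。凤丫头又说，宝二爷和林姑娘都在。'

counts = count_characters(sample, ALIASES)
for name, n in counts.most_common(20):
    print(name, n)
```

The resulting `Counter` is a plain mapping of name to count, so it can be passed to `most_common(20)` for the ranking or used as the frequency table for the word cloud.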
