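Task 2: Word cloud of character names in Romance of the Three Kingdoms. Design a program that reads the full text of the novel from the file "Romance of the Three Kingdoms.txt", merges the different forms of common character names (deduplication), lists the 10-20 most frequent names, and generates a word cloud (which may take different shapes). For example, '玄德', '刘备', '玄德曰', '刘皇叔', and '皇叔' all refer to the same person.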
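
The alias rule in the task can be illustrated on an already-segmented token list. This is a minimal sketch, assuming the text has been split into words beforehand (e.g. with jieba); the alias table `ALIASES` and the helper `merge_name_counts` are hypothetical and cover only a few characters.

```python
from collections import Counter

# Hypothetical, partial alias table: alias -> canonical name
ALIASES = {
    '玄德': '刘备', '玄德曰': '刘备', '刘皇叔': '刘备', '皇叔': '刘备',
    '孔明': '诸葛亮', '孔明曰': '诸葛亮',
    '云长': '关羽', '关公': '关羽',
}

def merge_name_counts(tokens, aliases):
    """Count person names, folding every alias into its canonical name.

    Tokens that are neither an alias nor a canonical name are ignored.
    """
    canonical = set(aliases.values())
    counts = Counter()
    for tok in tokens:
        if tok in aliases:
            counts[aliases[tok]] += 1
        elif tok in canonical:
            counts[tok] += 1
    return counts

# Toy token list standing in for the segmented novel
tokens = ['玄德', '曰', '刘备', '刘皇叔', '皇叔', '孔明', '诸葛亮', '却说', '关公', '玄德曰']
counts = merge_name_counts(tokens, ALIASES)
print(counts.most_common(3))  # [('刘备', 5), ('诸葛亮', 2), ('关羽', 1)]
```

In a full program, `tokens` would come from segmenting the whole novel, and the alias table would need many more entries.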

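Below is Python code that implements this task:

```python
import jieba
import jieba.posseg as pseg
from collections import Counter
from wordcloud import WordCloud
from PIL import Image
import numpy as np

# Read the full text of the novel
with open('Romance of the Three Kingdoms.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Alias table: every alias is mapped to one canonical name (partial; extend as needed)
ALIASES = {
    '玄德': '刘备', '玄德曰': '刘备', '刘皇叔': '刘备', '皇叔': '刘备',
    '孔明': '诸葛亮', '孔明曰': '诸葛亮',
    '云长': '关羽', '关公': '关羽',
    '孟德': '曹操', '翼德': '张飞',
}
# Register the aliases so jieba keeps them as single tokens
for alias in ALIASES:
    jieba.add_word(alias)

# Segment with POS tagging; keep known aliases and words tagged 'nr' (person name),
# folding every alias into its canonical name
name_count = Counter()
for word, flag in pseg.cut(text):
    if word in ALIASES:
        name_count[ALIASES[word]] += 1
    elif flag == 'nr' and len(word) >= 2:
        name_count[word] += 1

# Print the 20 most frequent names with their counts
for name, count in name_count.most_common(20):
    print(name, count)

# Generate the word cloud; mask.png defines its shape
mask = np.array(Image.open('mask.png'))
wc = WordCloud(font_path='msyh.ttc', background_color='white', mask=mask, max_words=2000)
wc.generate_from_frequencies(name_count)
wc.to_file('wordcloud.png')
```

The code first segments the text with jieba's part-of-speech tagger (`jieba.posseg`). A word is kept if it is in the alias table, or if it is tagged `nr` (person name) and has at least two characters; every alias is folded into its canonical name, so '玄德', '玄德曰', '刘皇叔' and '皇叔' are all counted as '刘备'. The counts are accumulated in a `Counter`, the 20 most frequent names are printed with their frequencies, and finally the WordCloud library generates the word cloud, using the image mask.png to define its shape. The result is saved to wordcloud.png.

Note that the alias table covers only a few characters and should be extended, and that the `nr` tag is a heuristic, so some non-name words may still appear and need to be filtered out. Chinese text also requires a font with CJK glyphs: the code uses msyh.ttc (Microsoft YaHei), which must be obtained in advance and placed in the script's directory (on Windows it is usually found in C:\Windows\Fonts). Likewise, prepare an image mask.png to define the shape of the word cloud; any shape can be used, since WordCloud draws words only where the mask is not pure white.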
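
If no suitable mask.png is at hand, a shape mask can also be built directly with NumPy. This is a sketch assuming a simple circle is wanted; the function name `circle_mask` and its size parameters are illustrative. WordCloud treats pure-white (255) pixels as masked out, so the code sets everything outside the circle to 255 and leaves the inside at 0.

```python
import numpy as np

def circle_mask(height=800, width=800, radius=None):
    """Return a uint8 mask: 0 inside the circle (drawable), 255 outside (masked out)."""
    if radius is None:
        radius = min(height, width) // 2 - 10
    cy, cx = height / 2, width / 2
    # Open grids broadcast to a (height, width) array of squared distances
    y, x = np.ogrid[:height, :width]
    outside = (x - cx) ** 2 + (y - cy) ** 2 > radius ** 2
    return (outside * 255).astype(np.uint8)

mask = circle_mask()
print(mask.shape, mask[400, 400], mask[0, 0])
```

The returned array can be passed as `mask=` to `WordCloud(...)` in place of `np.array(Image.open('mask.png'))`; other shapes only require a different boolean condition for `outside`.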
