Explain this code line by line:

    import jieba
    file = open(r"C:\Users\yikandan\Desktop\红楼梦.txt", "r", encoding='utf-8')
    txt = file.read()
    wordsList = jieba.lcut(txt)
    actors = [('贾宝玉', "宝玉"), ("林黛玉", "黛玉"), ("薛宝钗", "宝钗"), ("王熙凤", "凤姐"), ("贾母", "老太太"), ("袭人",), ("探春",), ('贾琏',), ('王夫人', '夫人')]
    dictActors = {}
    for actor in actors:
        if len(actor) == 2:
            count1 = wordsList.count(actor[0])
            count2 = wordsList.count(actor[1])
            dictActors[actor[0]] = count1 + count2
        else:
            count1 = wordsList.count(actor[0])
            dictActors[actor[0]] = count1 + count2
    items = list(dictActors.items())
    items.sort(key=lambda x: x[1], reverse=True)
    for i in range(len(items)):
        word, count = items[i]
        print("{}\t{}".format(word, count))

Time: 2023-11-02 16:06:23 Views: 79

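This code reads the novel 红楼梦 (Dream of the Red Chamber) and counts how often each of a fixed list of characters is mentioned. Statement by statement:

`import jieba` imports jieba, a Chinese word-segmentation library. `file = open(...)` opens the novel's text file for reading as UTF-8, and `txt = file.read()` reads the whole text into the string `txt`. The file is never closed; a `with open(...) as file:` block would handle that. `wordsList = jieba.lcut(txt)` segments the text and stores the resulting tokens in the list `wordsList`.

`actors` is a list of tuples, one per character. A tuple holds the character's primary name and, where there is one, an alias, for example `('贾宝玉', "宝玉")`. Characters with a single name use a one-element tuple such as `("袭人",)`. `dictActors = {}` creates an empty dictionary that will map each primary name to its total count.

The `for actor in actors:` loop visits each tuple. If the tuple has two names, `wordsList.count(...)` counts each name separately, and the sum is stored in `dictActors` under the primary name. Note that `count` matches whole tokens only, so a name is counted only where jieba emitted it as a single token.

The `else` branch is meant to handle one-name tuples by storing just `count1`, but as written it stores `count1 + count2`. At that point `count2` still holds the value left over from the most recent two-name tuple (for `("袭人",)`, `("探春",)` and `('贾琏',)` that is the count of "老太太"), so those three totals are inflated. The line should be `dictActors[actor[0]] = count1`. If a one-name tuple came first in the list, the same line would raise a `NameError` instead.

`items = list(dictActors.items())` converts the dictionary into a list of `(name, count)` pairs, and `items.sort(key=lambda x: x[1], reverse=True)` sorts that list by count in descending order. The final loop unpacks each pair into `word` and `count` and prints them separated by a tab.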
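The counting step can be written so that one-name and two-name tuples go through the same code path, which avoids relying on a `count2` left over from an earlier iteration. The following is a minimal sketch, not the asker's code: the helper name `count_characters` and the short sample token list are hypothetical, and the tokens stand in for the output of `jieba.lcut(txt)`.

```python
from collections import Counter


def count_characters(tokens, actors):
    """Sum token counts over every alias of each character."""
    freq = Counter(tokens)  # count every token once, up front
    totals = {}
    for aliases in actors:
        # The first alias is the display name; the tuple may hold one or more names.
        totals[aliases[0]] = sum(freq[name] for name in aliases)
    # Sort by count, highest first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical token list standing in for jieba.lcut(txt).
sample_tokens = ["宝玉", "笑", "道", "黛玉", "宝玉", "袭人", "老太太", "贾母"]
sample_actors = [("贾宝玉", "宝玉"), ("林黛玉", "黛玉"), ("贾母", "老太太"), ("袭人",)]

for name, count in count_characters(sample_tokens, sample_actors):
    print("{}\t{}".format(name, count))
```

Using a `Counter` also means the token list is scanned once, rather than once per name as with repeated `list.count` calls.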

What is wrong with this code? Correct it and write out the fixed version:

    import jieba
    f=open('荷塘月色.txt')
    article_text=f.read()
    f.close()
    article=jieba.lcut(article_text)
    dic={}
    for word in article:
        if word not in dic:
            dic[word]=1
        else:
            dic[word]+=1
    swd=sorted(list(dic.items()),key=lambda lst:lst[1],reverse=True)
    f1=open('filename.txt', encoding='utf-8')
    text = f.read()
    stop_wds=f1.read()
    f1.close()
    for kword,times in swd:
        if kword not in stop_wds:
            print(kword,times)

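The error is in the line `text = f.read()`. By that point `f` has already been closed, so the call raises `ValueError: I/O operation on closed file`. The stop-word file was opened as `f1`, so the read should use `f1`:
```
text = f1.read()
```
In addition, `stop_wds` should be a list of words rather than one long string. With a string, `kword not in stop_wds` is a substring test, so a word is treated as a stop word whenever it appears inside any stop word. Splitting the text on newlines (assuming one stop word per line) fixes this:
```
stop_wds = text.split('\n')
```
The corrected code is as follows:
```
import jieba

# Read the article and segment it into words
f = open('荷塘月色.txt')
article_text = f.read()
f.close()
article = jieba.lcut(article_text)

# Count how often each word occurs
dic = {}
for word in article:
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] += 1

# Sort the (word, count) pairs by count, highest first
swd = sorted(list(dic.items()), key=lambda lst: lst[1], reverse=True)

# Read the stop-word file (one word per line) from f1, not from the closed f
f1 = open('filename.txt', encoding='utf-8')
text = f1.read()
stop_wds = text.split('\n')
f1.close()

# Print every word that is not a stop word
for kword, times in swd:
    if kword not in stop_wds:
        print(kword, times)
```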
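The difference between testing membership in the raw file contents and in a collection of words is easy to miss, so here is a small illustration. It is a sketch under stated assumptions: the helper names `load_stopwords` and `filtered_frequencies` are hypothetical, the stop-word text is made-up sample data in a one-word-per-line layout, and the token list stands in for the output of `jieba.lcut`.

```python
from collections import Counter


def load_stopwords(text):
    """Turn the contents of a one-word-per-line stop-word file into a set."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def filtered_frequencies(tokens, stopwords):
    """Count tokens, skipping stop words and whitespace-only tokens."""
    counts = Counter(t for t in tokens if t.strip() and t not in stopwords)
    return counts.most_common()  # (word, count) pairs, most frequent first


# Made-up sample data; real code would read these from files.
stop_text = "的\n了\n我们\n"
tokens = ["我", "的", "荷塘", "了", "荷塘", "月色", "\n"]

# On the raw string, `in` is a substring test: "我" is found inside "我们".
print("我" in stop_text)                  # True
# On the set, `in` compares whole words.
print("我" in load_stopwords(stop_text))  # False

for word, times in filtered_frequencies(tokens, load_stopwords(stop_text)):
    print(word, times)
```

A set is used here instead of a list because membership tests on a set do not slow down as the stop-word file grows.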

Add code so that the word cloud shows only the fifty most frequent words:

    import jieba
    import wordcloud
    import imageio
    mask = imageio.imread('C:/Users/Febird/Desktop/1.png')  # set a background shape for the word cloud
    with open('C:/Users/Febird/Desktop/傲世.txt', encoding='utf-8') as f:
        t = f.read()  # open the file the word cloud is built from
    ls = jieba.lcut(t)
    txt = " ".join(ls)  # extract the words from the text
    stopwords_file = open('C:/Users/Febird/Desktop/stopWord.txt', 'r', encoding='utf-8')
    stopwords = [(words.strip()) for words in stopwords_file.readlines()]
    w = wordcloud.WordCloud(width=4000, height=4000, stopwords=stopwords, font_path="msyh.ttc", colormap='hot', background_color='pink', mask=mask)  # define a word cloud
    w.generate(txt)  # generate the word cloud
    w.to_file(r'C:/Users/Febird/Desktop/test_3.png')  # feed the text in txt to the word cloud object w and save it

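Adding the following to the code produces a word cloud of the fifty most frequent words. The stop words are filtered out before counting; if the top fifty were chosen first and the stop words removed afterwards by `generate`, the cloud could end up with fewer than fifty words. The counts are then passed straight to `generate_from_frequencies`, which does not apply the `stopwords` argument itself. Setting `max_words=50` on `WordCloud` is a simpler alternative when exact control over the counting is not needed.
```
from collections import Counter

# Count word frequencies, skipping stop words and whitespace-only tokens
words_count = Counter(word for word in ls if word.strip() and word not in stopwords)
# Take the 50 most frequent words and their counts
top50_words = words_count.most_common(50)
# Build the word cloud directly from the word -> count mapping
w.generate_from_frequencies(dict(top50_words))
```
The complete code is as follows:
```
import jieba
import wordcloud
import imageio
from collections import Counter

# Image that defines the shape of the word cloud
mask = imageio.imread('C:/Users/Febird/Desktop/1.png')

# Read the text and segment it into words
with open('C:/Users/Febird/Desktop/傲世.txt', encoding='utf-8') as f:
    t = f.read()
ls = jieba.lcut(t)

# Load the stop words, one per line
with open('C:/Users/Febird/Desktop/stopWord.txt', 'r', encoding='utf-8') as stopwords_file:
    stopwords = {words.strip() for words in stopwords_file}

w = wordcloud.WordCloud(width=4000, height=4000, font_path="msyh.ttc",
                        colormap='hot', background_color='pink', mask=mask)

# Count word frequencies, skipping stop words and whitespace-only tokens
words_count = Counter(word for word in ls if word.strip() and word not in stopwords)
# Take the 50 most frequent words and their counts
top50_words = words_count.most_common(50)

# Generate the word cloud from the frequencies and save it
w.generate_from_frequencies(dict(top50_words))
w.to_file(r'C:/Users/Febird/Desktop/test_3.png')
```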
