brand = ' '.join([i for i in df['品牌']]).strip().split()
c = dict(Counter(brand))
cdf = pd.DataFrame(c.values(), c.keys(), ["数量"])

Explain what this code means.

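This code counts how many times each brand appears in the "品牌" (brand) column of the DataFrame df and stores the result in a new DataFrame, cdf. The steps are as follows. The first line joins all brand names in the "品牌" column with spaces, strips leading and trailing whitespace, and splits the string on whitespace into a list of brand names. The second line uses Counter() to count how often each name occurs in that list and stores the result in the dictionary c. The third line builds the new DataFrame cdf from c, with the brand names as the row index and the counts as the values of the column "数量" (count).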
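The three steps can be traced on a small example. This is a minimal sketch, assuming pandas is installed; the sample brands are made up for illustration and are not from the asker's data. One thing worth noting: because the values are joined and then split on whitespace, a brand name that itself contains a space would be counted as several separate tokens. `value_counts()` is shown as one possible alternative that counts whole cell values instead.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data; the real df comes from the asker's dataset.
df = pd.DataFrame({"品牌": ["Apple", "Huawei", "Apple", "Xiaomi", "Apple"]})

# Step 1: join all brand names with spaces, then split back into tokens.
brand = " ".join([i for i in df["品牌"]]).strip().split()

# Step 2: count how many times each token occurs.
c = dict(Counter(brand))

# Step 3: brand names become the row index, counts fill the "数量" column.
cdf = pd.DataFrame(list(c.values()), index=list(c.keys()), columns=["数量"])
print(cdf)

# Alternative: count whole cell values without splitting on spaces.
alt = df["品牌"].value_counts()
print(alt)
```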

def _load(self):
    with open(self.txt_filelist, "r") as f:
        self.relpaths = f.read().splitlines()
        l1 = len(self.relpaths)
        self.relpaths = self._filter_relpaths(self.relpaths)
        print("Removed {} files from filelist during filtering.".format(l1 - len(self.relpaths)))

    self.synsets = [p.split("/")[0] for p in self.relpaths]
    self.abspaths = [os.path.join(self.datadir, p) for p in self.relpaths]

    unique_synsets = np.unique(self.synsets)
    class_dict = dict((synset, i) for i, synset in enumerate(unique_synsets))
    if not self.keep_orig_class_label:
        self.class_labels = [class_dict[s] for s in self.synsets]
    else:
        self.class_labels = [self.synset2idx[s] for s in self.synsets]

    with open(self.human_dict, "r") as f:
        human_dict = f.read().splitlines()
        human_dict = dict(line.split(maxsplit=1) for line in human_dict)

    self.human_labels = [human_dict[s] for s in self.synsets]

    labels = {
        "relpath": np.array(self.relpaths),
        "synsets": np.array(self.synsets),
        "class_label": np.array(self.class_labels),
        "human_label": np.array(self.human_labels),
    }

    if self.process_images:
        self.size = retrieve(self.config, "size", default=256)
        self.data = ImagePaths(self.abspaths,
                               labels=labels,
                               size=self.size,
                               random_crop=self.random_crop,
                               )
    else:
        self.data = self.abspaths

Analyze this code.

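This code loads the dataset. It does the following:
1. Reads the relative file paths from the file list, filters them with self._filter_relpaths, and prints how many entries were removed.
2. Extracts each file's class identifier (the synset, which is the first component of the relative path) and stores the result in self.synsets.
3. Converts the relative paths to absolute paths under self.datadir and stores them in self.abspaths.
4. If keep_orig_class_label is False, maps each synset to a numeric label given by its position among the sorted unique synsets; otherwise it takes the numeric label from self.synset2idx.
5. Reads the human-readable label for each synset from the human_dict file and stores the labels in self.human_labels.
6. Collects the relative paths, synsets, numeric class labels, and human-readable labels as NumPy arrays in the labels dictionary.
7. If process_images is True, reads size from the config (default 256) and creates an ImagePaths object that handles the images, stored in self.data.
8. If process_images is False, stores the list of absolute paths directly in self.data.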
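The label-building steps above can be sketched outside the class. This is a minimal illustration, not the original dataset code: the paths, the `datadir` value, and the contents of `human_lines` are hypothetical, and the file reads are replaced by in-memory lists so the snippet runs on its own.

```python
import os

import numpy as np

# Hypothetical relative paths in "<synset>/<filename>" form.
relpaths = ["n02084071/a.JPEG", "n01440764/b.JPEG", "n02084071/c.JPEG"]
datadir = "/data/imagenet"  # assumed root directory

# The first path component is the synset (class) ID.
synsets = [p.split("/")[0] for p in relpaths]
abspaths = [os.path.join(datadir, p) for p in relpaths]

# np.unique returns the sorted unique IDs, so indices follow sorted order.
unique_synsets = np.unique(synsets)
class_dict = {synset: i for i, synset in enumerate(unique_synsets)}
class_labels = [class_dict[s] for s in synsets]

# Each line is "<synset> <description>"; maxsplit=1 keeps descriptions
# that contain spaces in one piece.
human_lines = ["n01440764 tench, Tinca tinca", "n02084071 dog, domestic dog"]
human_dict = dict(line.split(maxsplit=1) for line in human_lines)
human_labels = [human_dict[s] for s in synsets]

print(class_labels)
print(human_labels)
```

Because the numeric labels come from the sorted unique synsets present after filtering, the same synset can receive a different index if the file list changes, which is presumably why the original code offers keep_orig_class_label as an option.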

Add to the following code the total word density of each row and the density of each digitalization term's frequency.

import pandas as pd
import re
from collections import Counter

# Read the dictionary txt file
with open('词典.txt', 'r', encoding='utf-8') as f:
    list_dict = [line.strip() for line in f]

# Read the data in column F
df = pd.read_excel('实验体.xlsx')
list_f = df['Answer'].tolist()

# Count word frequencies in each row of text
dict_count_list = []
for text in list_f:
    # Match the dictionary words that occur in the text
    text = str(text)
    words = re.findall('|'.join(list_dict), text)
    # Count how many times each word occurs in this row
    dict_count = Counter(words)
    dict_count_list.append(dict_count)

# Merge the per-row frequency counts into one DataFrame
df_count = pd.DataFrame(dict_count_list)
df_count.index = df.index

# Write the result to an Excel file
writer = pd.ExcelWriter('数实验体100.xlsx')
df_count.to_excel(writer, sheet_name='Sheet1')
writer._save()

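import pandas as pd
import re
from collections import Counter

# Read the dictionary txt file
with open('词典.txt', 'r', encoding='utf-8') as f:
    list_dict = [line.strip() for line in f]

# Read the data in column F
df = pd.read_excel('实验体.xlsx')
list_f = df['Answer'].tolist()

# Count word frequencies in each row of text
dict_count_list = []
for text in list_f:
    # Match the dictionary words that occur in the text
    text = str(text)
    words = re.findall('|'.join(list_dict), text)
    # Count how many times each word occurs in this row
    dict_count = Counter(words)
    dict_count_list.append(dict_count)

# Merge the per-row frequency counts into one DataFrame
df_count = pd.DataFrame(dict_count_list)
df_count.index = df.index

# Write the result to an Excel file
writer = pd.ExcelWriter('数实验体100.xlsx')
df_count.to_excel(writer, sheet_name='Sheet1')
writer._save()

Total word density per row: in this code, the total word density refers to the total number of dictionary words matched in each row of text. That total differs from row to row, so the density differs as well.

Density of each digitalization term (数字化词): in this code, it refers to the number of times each term appears in that row of text. Each term appears a different number of times, so its density differs as well.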
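The answer describes the densities only as counts, so the following is a minimal sketch of one way to turn them into ratios. It rests on an assumption that is not in the original: density is taken to be the match count divided by the number of characters in the row. The dictionary terms and the sample rows are hypothetical stand-ins for 词典.txt and the 'Answer' column, and the column names `total_count`, `total_density`, and `<term>_density` are chosen here for illustration. Unlike the original, the sketch escapes the terms with `re.escape`, skips empty dictionary lines, and treats missing cells as empty text instead of the string "nan".

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical stand-ins for 词典.txt and the 'Answer' column.
list_dict = ["数据", "模型", "算法"]
df = pd.DataFrame({"Answer": ["数据驱动模型，数据很重要", "算法", None]})

# Escape each term and try longer terms first so a short term
# does not shadow a longer one that starts with it.
terms = sorted((w for w in list_dict if w), key=len, reverse=True)
pattern = re.compile("|".join(re.escape(w) for w in terms))

rows = []
for text in df["Answer"]:
    text = "" if pd.isna(text) else str(text)
    counts = Counter(pattern.findall(text))
    length = len(text)  # assumed denominator: characters in the row
    row = dict(counts)
    row["total_count"] = sum(counts.values())
    row["total_density"] = row["total_count"] / length if length else 0.0
    for word, n in counts.items():
        # counts is non-empty only when the text is non-empty
        row[f"{word}_density"] = n / length
    rows.append(row)

# Terms that never occur in a row are filled with 0.
df_count = pd.DataFrame(rows, index=df.index).fillna(0)
print(df_count)
```

To write the result to Excel, `df_count.to_excel(writer, sheet_name='Sheet1')` inside a `with pd.ExcelWriter('数实验体100.xlsx') as writer:` block saves and closes the file without calling the private `_save()` method. If density should instead be relative to the number of words, the denominator would need a tokenizer for the text, which this sketch does not include.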


