import pandas as pd
from rdkit import Chem

# Load the molecule library and parse each SMILES string
df = pd.read_excel(r"D:\HBVdata\HBV_P.xlsx")
smile_list = df['Smiles'].tolist()
mols = [Chem.MolFromSmiles(smile) for smile in smile_list]
fingerprints = [Chem.RDKFingerprint(mol) for mol in mols]

# Load the fragments and parse them as SMARTS substructure queries
dg = pd.read_csv(r"D:\HBVdata\hbvfrag_recap_delete_duplicate.csv")
smi_list = dg['mol'].tolist()
submols = [Chem.MolFromSmarts(smi) for smi in smi_list]

# Count how many molecules contain each fragment
count_dict = {}
for submol_idx, submol in enumerate(submols):
    count = 0
    for mol in mols:
        if mol.HasSubstructMatch(submol):
            count += 1
    smi = smi_list[submol_idx]
    count_dict[smi] = count

total_count = sum(count_dict.values())
print(total_count)
counts = [count_dict.values()]
print(counts)
freq = [ct / total_count for ct in counts]

For this code, how do I divide the values of the dictionary by total_count?
Posted: 2024-03-01 13:49:16
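This code computes how frequently each substructure occurs in the molecule library. count_dict is a dictionary whose keys are the substructure strings read from the CSV file (parsed as SMARTS queries) and whose values are the number of molecules in the library that contain that substructure. total_count is the sum of all of these counts, so dividing each value by total_count gives that substructure's share of all matches.

The division in the posted code does not work as written, though. counts = [count_dict.values()] does not produce a list of counts; it produces a list with a single element, the dict_values view itself. The list comprehension then evaluates dict_values / int, which raises a TypeError. To get the counts as a list, use list(count_dict.values()), and then freq = [ct / total_count for ct in counts] yields one frequency per substructure, in the same order as the dictionary. If the frequencies should stay attached to their substructures, build a dictionary instead: freq = {smi: ct / total_count for smi, ct in count_dict.items()}. In either case, check that total_count is not zero before dividing, which happens when no substructure matches any molecule.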
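As a minimal sketch of dividing the dictionary values by their total, the snippet below uses a small hand-written dictionary in place of the real count_dict, so it runs without RDKit or the data files. The function name normalize_counts and the example counts are hypothetical, introduced only for illustration; returning zeros when the total is zero is one possible choice, not the only one.

```python
def normalize_counts(count_dict):
    """Return {key: count / total} for every entry of count_dict."""
    total_count = sum(count_dict.values())
    if total_count == 0:
        # No matches at all: return zeros instead of dividing by zero.
        return {key: 0.0 for key in count_dict}
    return {key: count / total_count for key, count in count_dict.items()}


# Hypothetical counts standing in for the count_dict built from the fragments.
example_counts = {"c1ccccc1": 6, "C(=O)N": 3, "C#N": 1}

freq = normalize_counts(example_counts)

# A plain list of frequencies, in the same order as the dictionary keys.
freq_list = list(freq.values())

print(freq)
print(freq_list)
```

In the original script the same call would be freq = normalize_counts(count_dict), placed after the counting loop.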