Please write code that batch-fetches CIDs and Isomeric SMILES from the PubChem database by compound name


Date: 2024-06-10 13:08:10

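Sure, I can write this for you. You can use the PubChemPy library (install it with `pip install pubchempy pandas`). Search PubChem for each compound by name, then take the CID and Isomeric SMILES from the search result. Here is an example:

```python
import time

import pandas as pd
import pubchempy as pcp

# List of compound names to look up. PubChem's name search matches against its
# (mostly English) synonym list, so use English names rather than Chinese ones.
compounds_list = ['water', 'sodium hydroxide', 'sodium chloride']

results = []
for compound in compounds_list:
    cid, isomeric_smiles = None, None
    try:
        # Search PubChem by name; returns a (possibly empty) list of Compound objects
        search_results = pcp.get_compounds(compound, 'name')
        if search_results:
            # Several records can match one name; take the first hit
            cid = search_results[0].cid
            # Newer PubChemPy releases may expose this as `.smiles` instead
            # (an assumption; check the version you have installed)
            isomeric_smiles = search_results[0].isomeric_smiles
    except Exception as e:
        # Any error (e.g. network or server problems): record it instead of crashing
        print(f'Lookup failed for {compound!r}: {e}')
    results.append({'Compound': compound, 'CID': cid, 'Isomeric SMILES': isomeric_smiles})
    # PubChem asks clients to send no more than 5 requests per second
    time.sleep(0.2)

# Convert the results to a DataFrame and save them to a CSV file
results_df = pd.DataFrame(results)
results_df.to_csv('compounds_info.csv', index=False)
```

This code searches PubChem for each compound name you provide and takes the CID and Isomeric SMILES of the first matching record. It then saves the results to a CSV file with three columns: compound name, CID and Isomeric SMILES. Some names may not be found in PubChem, and a request may also fail. In either case the script does not crash. It stores `None` for the CID and Isomeric SMILES, and these appear as empty cells in the CSV.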
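
With long name lists, individual requests can fail transiently, for example when PubChem is busy. Below is a minimal sketch of a retry helper with exponential backoff. `fetch_with_retry` and its parameters are hypothetical names introduced here and are not part of PubChemPy. The helper assumes you pass in a function that performs the actual lookup.

```python
import time


def fetch_with_retry(fetch, name, retries=3, base_delay=1.0):
    """Call fetch(name), retrying with exponential backoff on any exception.

    `fetch` is any function that takes a compound name (hypothetically, a
    wrapper around pcp.get_compounds). Returns fetch's result, or re-raises
    the last exception once all attempts are used up.
    """
    if retries < 1:
        raise ValueError('retries must be at least 1')
    for attempt in range(retries):
        try:
            return fetch(name)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: let the caller handle the failure
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts
            time.sleep(base_delay * 2 ** attempt)
```

In the loop above, you could replace the direct call with `search_results = fetch_with_retry(lambda n: pcp.get_compounds(n, 'name'), compound)`. The sketch retries on every exception to stay short. In practice you might retry only on network or server errors.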
