Use a web scraper to fetch the table under "Activity Against Target Species" on https://dbaasp.org/peptide-card?id=DBAASPR_1 and export it to Excel

Time: 2024-10-20 13:04:23 Views: 35

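You can scrape data from a web page in Python with an HTTP library such as `requests` and an HTML parsing library such as `BeautifulSoup`. First install the libraries if you do not have them yet (`pandas` and `openpyxl` are needed for the Excel export):

```bash
pip install requests beautifulsoup4
pip install pandas openpyxl
```

Then write a short script that fetches the page and saves the data to an Excel file. Here is a basic example, assuming only the activity table is needed:

```python
import requests
from bs4 import BeautifulSoup
import pandas as pd

# Page URL
url = "https://dbaasp.org/peptide-card?id=DBAASPR_1"

# Send the GET request (the timeout keeps the script from hanging forever)
response = requests.get(url, timeout=30)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the HTML content with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Look for the table in the "Activity Against Target Species" section.
    # The class string is an assumption about the page's markup.
    table = soup.find('table', {'class': 'table table-striped table-hover'})

    if table:
        # Collect the table as a list of rows, one list of cell texts per <tr>
        data = []
        for row in table.find_all('tr'):
            cols = [col.text.strip() for col in row.find_all(['th', 'td'])]
            if cols:  # skip rows that contain no cells
                data.append(cols)

        # Build a DataFrame and write it to an Excel file
        df = pd.DataFrame(data[1:], columns=data[0])  # the first row is usually the header
        df.to_excel('activity_against_target_species.xlsx', index=False)
    else:
        print("No matching table found in the returned HTML")
else:
    print(f"Failed to fetch the page with status code {response.status_code}")
```

This script tries to download the HTML from the given URL, extract the part that contains the "Activity Against Target Species" table, and store it as an Excel file. Note that it may need adjusting if the site's structure changes. If the table is filled in by JavaScript after the page loads, it will not be present in `response.text`, and the script will report that no table was found.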
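Because matching a table by its CSS classes is fragile, one alternative is to locate the table by the section heading text instead. The following is only a minimal sketch of that idea: the helper name `extract_table_after_heading` is hypothetical, and the sample HTML (including its column names and values) is made up for illustration and is not the real markup or data of the DBAASP page.

```python
from bs4 import BeautifulSoup


def extract_table_after_heading(html, heading_text):
    """Return the rows of the first <table> that follows the given heading text.

    Each row is a list of cell strings. Returns [] if the heading or a
    following table cannot be found.
    """
    soup = BeautifulSoup(html, "html.parser")
    # Find the first text node that contains the heading text (case-insensitive).
    marker = soup.find(string=lambda s: s and heading_text.lower() in s.lower())
    if marker is None:
        return []
    # Take the next <table> that appears after that text in document order.
    table = marker.find_next("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
        if cells:  # skip rows without any cells
            rows.append(cells)
    return rows


# Made-up sample markup, used only to demonstrate the helper (assumption).
SAMPLE_HTML = """
<h3>Other Section</h3>
<table><tr><th>A</th></tr><tr><td>1</td></tr></table>
<h3>Activity Against Target Species</h3>
<table>
  <tr><th>Target Species</th><th>Activity Measure</th></tr>
  <tr><td>Escherichia coli</td><td>MIC</td></tr>
</table>
"""

rows = extract_table_after_heading(SAMPLE_HTML, "Activity Against Target Species")
for row in rows:
    print(row)
```

With real input you would pass `response.text` instead of `SAMPLE_HTML`, then hand the result to `pd.DataFrame(rows[1:], columns=rows[0])` after checking that `rows` is not empty.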
