
https://www.cnipa.gov.cn/tjxx/jianbao/year2008/a/a1.html Scrape the table from this web page

Date: 2023-08-09 08:05:49

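You can use the `requests` and `BeautifulSoup` libraries to fetch the page and `pandas` to turn its table into a DataFrame. Here is an example:

```python
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Set a browser-like User-Agent header
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

# Send the request and stop early on an HTTP error
url = 'https://www.cnipa.gov.cn/tjxx/jianbao/year2008/a/a1.html'
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
# Use the detected encoding so that Chinese text decodes correctly
response.encoding = response.apparent_encoding

# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')

# Find the first table on the page
table = soup.find('table')
if table is None:
    raise ValueError('no <table> element found on the page')

# Convert the table to a DataFrame; StringIO avoids the
# literal-HTML deprecation warning in pandas 2.1 and later
df = pd.read_html(StringIO(str(table)))[0]

# Print the DataFrame
print(df)
```

The example first sets the request headers, sends the request with `requests.get()`, and parses the returned HTML with `BeautifulSoup`. It then locates the table with `soup.find()` and converts it to a DataFrame with `pd.read_html()`, which needs an HTML parser such as `lxml` installed. Finally, it prints the DataFrame.

Note that `soup.find('table')` returns only the first `<table>` element. This page contains several tables, so to scrape a different one, change the arguments passed to `soup.find()` so that they match the table you want.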
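To illustrate picking one table out of several, here is a minimal sketch that selects a table by position with `soup.find_all()`. The `SAMPLE_HTML` markup, its `id` values, and the `table_rows` helper are hypothetical stand-ins so that the example runs without a network request; they are not taken from the actual page, whose structure should be checked in the browser first.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a page with two tables.
SAMPLE_HTML = """
<html><body>
<table id="summary">
  <tr><th>Type</th><th>Count</th></tr>
  <tr><td>A</td><td>10</td></tr>
</table>
<table id="detail">
  <tr><th>Region</th><th>Count</th></tr>
  <tr><td>North</td><td>4</td></tr>
  <tr><td>South</td><td>6</td></tr>
</table>
</body></html>
"""


def table_rows(html, index=0):
    """Return the rows of the index-th <table> as lists of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")
    if index >= len(tables):
        raise IndexError(f"page has only {len(tables)} table(s)")
    rows = []
    for tr in tables[index].find_all("tr"):
        # Collect header and data cells in document order
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


# Select the second table (index 1) instead of the first
second = table_rows(SAMPLE_HTML, index=1)
print(second)
```

If the target table carries a distinguishing attribute, `soup.find('table', id='detail')` is an alternative to indexing, assuming the real page actually provides such an attribute.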