
Write a Python program that scrapes data from http://www.letpub.com.cn/index.php?page=grant and supports pagination

Posted: 2023-06-14 22:08:01 Views: 154


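Below is a simple Python crawler that scrapes data from http://www.letpub.com.cn/index.php?page=grant and walks through multiple pages. It uses the requests and BeautifulSoup libraries.

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "http://www.letpub.com.cn/index.php"
start_page_num = 1  # first page to fetch
max_page_num = 10   # last page to fetch

for page_num in range(start_page_num, max_page_num + 1):
    # Query string for the current page
    params = {"page": "grant", "sort": "", "currentpage": page_num}
    res = requests.get(BASE_URL, params=params, timeout=10)
    res.raise_for_status()  # fail on HTTP errors instead of parsing an error page
    res.encoding = res.apparent_encoding  # guess the encoding if the header omits it
    soup = BeautifulSoup(res.text, "html.parser")

    # Select every table row, skipping the first (header) row
    rows = soup.select("table tr")[1:]
    for row in rows:
        # Get the cells of this row
        cells = row.select("td")
        if len(cells) < 5:
            continue  # skip rows without the five expected columns

        # Extract the fields (this column order is assumed; check it against the page)
        grant_title = cells[0].text.strip()
        grant_number = cells[1].text.strip()
        grant_agency = cells[2].text.strip()
        grant_amount = cells[3].text.strip()
        grant_duration = cells[4].text.strip()

        # Print the result
        print(grant_title, grant_number, grant_agency, grant_amount, grant_duration)
```

The program starts at the first page and continues until it has fetched page `max_page_num`. On each page it uses BeautifulSoup to parse the HTML and find every row in the table. For each row it extracts the fields and prints them to the console, skipping rows that have fewer than five cells. The `for` loop advances the page number, which is what turns the page. The `currentpage` query parameter and the column order are taken as given here and should be checked against the live site.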
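A fixed `max_page_num` can overshoot the real number of result pages. The following is a minimal sketch of a pagination loop that also stops at the first empty page and pauses between requests. The names `crawl_pages`, `fetch_rows`, `delay_seconds`, and `fake_fetch_rows` are hypothetical helpers introduced for illustration, and treating an empty page as "no more results" is an assumption about how the site behaves. The fetcher is passed in as a function so the loop can be tried without network access.

```python
import time
from typing import Callable, Iterator, List

Row = List[str]


def crawl_pages(
    fetch_rows: Callable[[int], List[Row]],
    max_page_num: int,
    delay_seconds: float = 1.0,
) -> Iterator[Row]:
    """Yield rows page by page, stopping at max_page_num or the first empty page."""
    for page_num in range(1, max_page_num + 1):
        rows = fetch_rows(page_num)
        if not rows:
            break  # assumption: an empty page means there are no more results
        yield from rows
        if page_num < max_page_num:
            time.sleep(delay_seconds)  # pause between requests to limit server load


# Stand-in fetcher with made-up data; a real one would request and parse the page
def fake_fetch_rows(page_num: int) -> List[Row]:
    pages = {1: [["A", "1"], ["B", "2"]], 2: [["C", "3"]]}
    return pages.get(page_num, [])


results = list(crawl_pages(fake_fetch_rows, max_page_num=10, delay_seconds=0))
print(results)
```

In a real run, `fetch_rows` would wrap the requests and BeautifulSoup calls from the program above and return the cell texts for one page.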
