Scraping Douban Top 250 book information with a Python loop (10 pages)
Date: 2024-10-19 11:13:38
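A Python crawler can use a loop to walk through a paginated listing. This example uses requests together with the BeautifulSoup library to fetch the Douban Top 250 book list. The list starts at https://book.douban.com/top250 and shows 25 books per page, so the 250 entries span 10 pages. First, install the requests and beautifulsoup4 libraries: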
```bash
pip install requests beautifulsoup4
```
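The pagination can be worked out before any request is sent. The sketch below assumes the list lives at https://book.douban.com/top250, shows 25 books per page, and accepts a `start` query parameter holding the index of the first book on the page. `page_urls`, `BASE_URL`, and `PAGE_SIZE` are illustrative names, not part of any library.

```python
BASE_URL = "https://book.douban.com/top250"  # assumed list URL
PAGE_SIZE = 25  # assumed number of books per page


def page_urls(pages=10, page_size=PAGE_SIZE):
    """Return the URL of each list page, in order."""
    return [f"{BASE_URL}?start={n * page_size}" for n in range(pages)]


urls = page_urls()
print(urls[0])   # first page, start=0
print(urls[-1])  # tenth page, start=225
```

Counting from 0 matters here: a loop that starts at 1 would skip the first 25 books and request an eleventh page that does not exist.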
Then write a simple crawler script:
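```python
import requests
from bs4 import BeautifulSoup

# Base URL of the book list and the number of pages to crawl
start_url = 'https://book.douban.com/top250'
page_num = 10
# Douban tends to reject the default requests User-Agent, so send a browser-like one
headers = {'User-Agent': 'Mozilla/5.0'}

for i in range(page_num):
    # Each page lists 25 books, so start takes the values 0, 25, ..., 225
    url = f'{start_url}?start={i * 25}'
    # Send the GET request
    response = requests.get(url, headers=headers, timeout=10)
    # Check whether the request succeeded
    if response.status_code == 200:
        # Parse the HTML content
        soup = BeautifulSoup(response.text, 'html.parser')
        # Each book sits in a <tr class="item"> row (selectors assume the current page markup)
        books = soup.find_all('tr', class_='item')
        for book in books:
            title = book.find('div', class_='pl2').a['title']  # book title
            rating_tag = book.find('span', class_='rating_nums')
            rating = rating_tag.get_text(strip=True) if rating_tag else 'N/A'  # rating
            print(f'Page {i + 1} - Title: {title}, Rating: {rating}')
    else:
        print(f'Request for page {i + 1} failed, status code: {response.status_code}')
```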
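The parsing step can be checked without touching the network. The following is a minimal sketch, assuming each book row is a `<tr class="item">` holding a `div.pl2` link with a `title` attribute and an optional `span.rating_nums`. `SAMPLE_HTML` is a made-up stand-in for a real page, and `parse_books` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily trimmed stand-in for one list page
SAMPLE_HTML = """
<table><tr class="item"><td>
  <div class="pl2"><a href="#" title="Book A">Book A</a></div>
  <span class="rating_nums">9.1</span>
</td></tr></table>
<table><tr class="item"><td>
  <div class="pl2"><a href="#" title="Book B">Book B</a></div>
</td></tr></table>
"""


def parse_books(html):
    """Return (title, rating) pairs; rating is None when the row has no rating tag."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.find_all("tr", class_="item"):
        cell = row.find("div", class_="pl2")
        link = cell.a if cell else None
        if link is None:
            continue  # skip rows that do not match the assumed layout
        rating_tag = row.find("span", class_="rating_nums")
        rating = rating_tag.get_text(strip=True) if rating_tag else None
        results.append((link.get("title"), rating))
    return results


books = parse_books(SAMPLE_HTML)
print(books)
```

Keeping the parsing in its own function means a markup change on the site only requires updating the selectors in one place, and the function can be rerun against a saved page to confirm the fix.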