Help me write code to scrape the Douban Top 250, with a controllable page range (from which page to which page)

Posted: 2023-10-07 14:13:29

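Sure, here is a Python implementation:

```python
import time

import requests
from bs4 import BeautifulSoup

# Douban typically rejects requests that lack a browser-like User-Agent header.
HEADERS = {'User-Agent': 'Mozilla/5.0'}


def get_movie_info(url):
    """Fetch one Top 250 page and print the details of each movie on it."""
    res = requests.get(url, headers=HEADERS, timeout=10)
    res.raise_for_status()  # fail loudly on a non-2xx response
    soup = BeautifulSoup(res.text, 'html.parser')
    items = soup.find_all(class_='item')
    for item in items:
        rank = item.find(class_='pic').em.text
        # The first element with class "title" is the <span> holding the main title.
        title = item.find(class_='title').text
        # The last line of the info paragraph reads "year / country / genre".
        info = item.find(class_='bd').p.text.strip().split('\n')[-1]
        year = info.split('/')[0].strip()
        rating = item.find(class_='rating_num').text
        quote = item.find(class_='inq')  # some movies have no one-line quote
        quote = quote.text if quote else ''
        print(f'{rank} {title} {year} {rating} {quote}')


def get_top250(start, end):
    """Scrape pages start through end (1-based, inclusive), 25 movies per page."""
    for i in range(start, end + 1):
        url = f'https://movie.douban.com/top250?start={(i - 1) * 25}&filter='
        get_movie_info(url)
        time.sleep(1)  # pause between pages to keep the request rate low


get_top250(1, 3)  # scrape pages 1 through 3
```

An explanation of the code above:

- `get_movie_info` scrapes the movie information on a single page; its `url` parameter is the link to that page.
- `get_top250` scrapes the movie information for a given page range; `start` and `end` are the first and last page numbers, both inclusive.
- `get_top250` loops over the pages with `range`, builds each page's link by converting the page number into the `start` offset `(i - 1) * 25`, and calls `get_movie_info` on it.
- `get_movie_info` fetches the page with the `requests` library, parses it with `BeautifulSoup`, looks up each movie's rank, title, year, rating, and quote, and prints them.

Note: the site may block crawlers, so do not overuse this script.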
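The page-to-offset arithmetic behind the `start` query parameter can be checked on its own, without sending any requests. The following is a minimal sketch, not part of the original answer: the helper `page_urls` and the constants `PAGE_SIZE`, `TOTAL_PAGES`, and `BASE_URL` are hypothetical names introduced here, and the limit of 10 pages assumes the list holds exactly 250 movies at 25 per page.

```python
PAGE_SIZE = 25    # movies per page, matching the (i - 1) * 25 offset in the URL
TOTAL_PAGES = 10  # assumption: 250 movies / 25 per page

BASE_URL = 'https://movie.douban.com/top250'


def page_urls(start, end):
    """Return the URLs for pages start..end (1-based, inclusive).

    Hypothetical helper: it validates the page range before any
    request is made, then applies the same offset formula.
    """
    if not 1 <= start <= end <= TOTAL_PAGES:
        raise ValueError(
            f'expected 1 <= start <= end <= {TOTAL_PAGES}, got {start}..{end}'
        )
    return [
        f'{BASE_URL}?start={(page - 1) * PAGE_SIZE}&filter='
        for page in range(start, end + 1)
    ]


if __name__ == '__main__':
    for url in page_urls(1, 3):
        print(url)
```

For pages 1 through 3 this yields the offsets `start=0`, `start=25`, and `start=50`. Validating the range up front means a call such as `page_urls(3, 1)` or `page_urls(0, 2)` raises a `ValueError` instead of silently scraping nothing or requesting a negative offset.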