
Help me generate Python code to scrape the Douban Books TOP250

Time: 2024-05-12 07:19:11 Views: 141

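Below is a simple Python crawler that scrapes the title, author, rating, and link of each book in the Douban Books TOP250:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://book.douban.com/top250'
# Browser-like User-Agent, sent with every request to reduce the chance of being blocked.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}


def get_book_info(url):
    """Fetch one list page and print the details of every book on it."""
    res = requests.get(url, headers=headers, timeout=10)
    res.raise_for_status()  # stop early on 4xx/5xx instead of parsing an error page
    soup = BeautifulSoup(res.text, 'html.parser')
    book_list = soup.find('div', {'class': 'article'}).find_all('tr', {'class': 'item'})
    for book in book_list:
        link_tag = book.find('div', {'class': 'pl2'}).find('a')
        title = link_tag['title']
        author = book.find('p', {'class': 'pl'}).text.strip()
        # Guard against an entry without a rating element.
        rating_tag = book.find('span', {'class': 'rating_nums'})
        rating = rating_tag.text.strip() if rating_tag else 'N/A'
        link = link_tag['href']
        print('Title:', title)
        print('Author:', author)
        print('Rating:', rating)
        print('Link:', link)
        print('--------------------------------------')


if __name__ == '__main__':
    # Ten pages of 25 books each: start=0, 25, ..., 225
    for i in range(0, 250, 25):
        get_book_info(url + '?start=' + str(i))
```

The code uses the requests and BeautifulSoup libraries: requests sends the HTTP request, and BeautifulSoup parses the returned HTML to extract the title, author, rating, and link. A loop walks through every page of the TOP250 list (ten pages of 25 books, selected with the `start` query parameter) and prints the results. Note that a User-Agent is added to the request headers to reduce the chance of being blocked by Douban.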
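One thing worth checking in the output is the "author" field: the text of the `p.pl` element may carry more than the author's name. The sketch below assumes (this is not confirmed by the answer above) that the text looks like `author / translator / publisher / date / price`, with the last three fields always present. The helper name `split_pub_info` and the sample string are hypothetical and only illustrate the idea.

```python
def split_pub_info(text):
    """Split a slash-separated publication line into named fields.

    Assumption: the last three fields are publisher, publication date
    and price; everything before them is authors and translators.
    """
    parts = [p.strip() for p in text.split('/') if p.strip()]
    if len(parts) < 4:
        # Not enough fields to apply the assumption; keep what we have.
        return {'people': parts, 'publisher': '', 'pub_date': '', 'price': ''}
    return {
        'people': parts[:-3],
        'publisher': parts[-3],
        'pub_date': parts[-2],
        'price': parts[-1],
    }


if __name__ == '__main__':
    # Hypothetical sample line, not real scraped data.
    sample = 'Author A / Translator B / Example Press / 2006-5 / 29.00元'
    print(split_pub_info(sample))
```

If the assumption holds for the pages you scrape, you could call `split_pub_info(author)` inside the loop and print `people` instead of the raw string. If the layout differs, the fallback branch returns the fields without labeling them.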
