Help me generate Python code that scrapes the Douban Books Top 250
Time: 2024-05-12 07:19:11 Views: 141
Below is a simple Python crawler that scrapes the title, author, rating, and link of each book in the Douban Books Top 250:
```python
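import requests
from bs4 import BeautifulSoup

url = 'https://book.douban.com/top250'
# Browser-like User-Agent so the request is less likely to be rejected.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}


def get_book_info(url):
    """Fetch one list page and print each book's title, author, rating, and link."""
    res = requests.get(url, headers=headers, timeout=10)
    res.raise_for_status()  # stop early on an HTTP error, e.g. a blocked request
    soup = BeautifulSoup(res.text, 'html.parser')
    # Each book is a <tr class="item"> row inside <div class="article">.
    book_list = soup.find('div', {'class': 'article'}).find_all('tr', {'class': 'item'})
    for book in book_list:
        title_link = book.find('div', {'class': 'pl2'}).find('a')
        title = title_link['title']
        # The <p class="pl"> line usually reads "author / publisher / date / price",
        # so keep only the first field as the author.
        author = book.find('p', {'class': 'pl'}).text.split('/')[0].strip()
        rating = book.find('span', {'class': 'rating_nums'}).text.strip()
        link = title_link['href']
        print('Title:', title)
        print('Author:', author)
        print('Rating:', rating)
        print('Link:', link)
        print('--------------------------------------')


if __name__ == '__main__':
    # 250 books at 25 per page: start = 0, 25, ..., 225.
    for i in range(0, 250, 25):
        get_book_info(url + '?start=' + str(i))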
```
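The code uses the requests and BeautifulSoup libraries: requests sends the HTTP request, and BeautifulSoup parses the returned HTML to extract each book's title, author, rating, and link. A loop steps through all of the Top 250 pages (10 pages of 25 books each, selected with the `start` query parameter) and prints the results. Note that a User-Agent is set in the request headers to reduce the chance of being blocked by Douban.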
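The paging step can also be looked at on its own. The following is a minimal sketch, using only the standard library, of how the `start` offsets map to page URLs. The helper name `page_urls` and its `total` and `per_page` parameters are hypothetical and introduced here for illustration; they are not part of the answer's code.

```python
from urllib.parse import urlencode

BASE_URL = 'https://book.douban.com/top250'


def page_urls(base_url=BASE_URL, total=250, per_page=25):
    """Build one list-page URL per block of `per_page` books (hypothetical helper)."""
    urls = []
    # The offset advances 0, 25, 50, ... and stops before `total`.
    for start in range(0, total, per_page):
        query = urlencode({'start': start})
        urls.append(f'{base_url}?{query}')
    return urls


if __name__ == '__main__':
    # Print the URLs without sending any request.
    for page_url in page_urls():
        print(page_url)
```

Each URL produced this way could be passed to `get_book_info` in place of the string concatenation in the main loop. Building the URLs separately makes it easy to check the page list before any request is sent.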