Python code for scraping the Douban Books Top 250
Time: 2023-07-23 09:52:20 Views: 91
The following Python code scrapes the Douban Books Top 250 list:
```python
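import requests
from bs4 import BeautifulSoup

url = 'https://book.douban.com/top250'
# Assumption: Douban tends to reject requests without a browser-like User-Agent.
headers = {'User-Agent': 'Mozilla/5.0'}


def get_html(url):
    """Fetch one page and return its HTML text."""
    r = requests.get(url, headers=headers, timeout=10)
    r.raise_for_status()  # fail early on a non-2xx response
    r.encoding = 'utf-8'
    return r.text


def parse_html(html):
    """Print the title, author, and rating of every book on one page."""
    soup = BeautifulSoup(html, 'html.parser')
    books = soup.find_all('tr', class_='item')
    for book in books:
        title = book.find('div', class_='pl2').find('a')['title']
        # The info line is '/'-separated; the first field is taken as the author.
        author = book.find('p', class_='pl').text.strip().split('/')[0].strip()
        score = book.find('span', class_='rating_nums').text
        print('Title: {}\nAuthor: {}\nRating: {}\n'.format(title, author, score))


if __name__ == '__main__':
    # 10 pages of 25 books each; `start` is the offset of the first book on a page.
    for i in range(0, 250, 25):
        html = get_html(url + '?start=' + str(i))
        parse_html(html)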
```
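This code uses the Requests library to send HTTP requests and fetch each page's HTML source, then uses BeautifulSoup to parse the HTML and extract the data. It loops over the 10 pages of the list (25 books per page, selected with the `start` query parameter) and prints the title, author, and rating of each book in turn.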
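The paging and the author extraction can be looked at separately from the network code. The following is a minimal sketch using only the standard library; the helper names `page_urls` and `first_field` are hypothetical and not part of the original code, and the sample info line in the usage part is an assumed example of the '/'-separated format, not data taken from the site.

```python
from urllib.parse import urlencode

BASE_URL = 'https://book.douban.com/top250'


def page_urls(total=250, page_size=25):
    """Yield the URL of each list page, one per `start` offset."""
    for start in range(0, total, page_size):
        yield BASE_URL + '?' + urlencode({'start': start})


def first_field(info):
    """Return the first '/'-separated field of an info line, stripped."""
    return info.split('/')[0].strip()


if __name__ == '__main__':
    # Print the page URLs the scraper would request.
    for page_url in page_urls():
        print(page_url)
    # Assumed sample info line, for illustration only.
    print(first_field(' Some Author / Some Publisher / 2000-1 / 20.00 '))
```

Building the query string with `urlencode` gives the same URLs as the string concatenation in the original loop, and keeping the offsets in a generator makes it easy to check the page count without sending any requests.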