Write a Python program that scrapes the Douban Books ranking and saves it as a CSV file
Posted: 2024-01-30 17:04:11
Here is a Python program that scrapes the Douban Books Top 250 list and saves it to a CSV file:
```python
import csv
import re

import requests
from bs4 import BeautifulSoup

url = 'https://book.douban.com/top250?start='
# Send a browser-like User-Agent; the site may reject the default one.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

# Create the CSV file once, write the header row, and keep it open for all pages.
with open('douban_books.csv', 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Title', 'Author', 'Publisher', 'Publication date', 'Rating', 'Rating count'])
    # The list is paginated 25 books per page: start=0, 25, ..., 225.
    for i in range(0, 250, 25):
        res = requests.get(url + str(i), headers=headers, timeout=10)
        res.raise_for_status()  # stop early on an HTTP error
        soup = BeautifulSoup(res.text, 'html.parser')
        # Assumption: each book is a <tr class="item"> row in its own table,
        # so rows are matched by class rather than taken from the first table.
        # Adjust the selectors below if the page markup differs.
        for book in soup.select('tr.item'):
            name = book.select_one('div.pl2 a')['title']
            # Publication line: "author / [translator /] publisher / date / price".
            info = [part.strip() for part in book.select_one('p.pl').text.split('/')]
            author = info[0]
            publisher = info[-3]
            date = info[-2]
            score = book.select_one('span.rating_nums').text.strip()
            # The rating count looks like "( 123456人评价 )"; keep only the digits.
            num = re.search(r'\d+', book.select_one('span.pl').text).group()
            writer.writerow([name, author, publisher, date, score, num])
```
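This code uses the requests and BeautifulSoup libraries to scrape the Douban Books Top 250 list and store the results in a CSV file. It first creates the CSV file and writes the header row. It then loops over the ten pages of the list (25 books per page), parses each book's fields with BeautifulSoup, and writes one row per book. The result is a CSV file holding the title, author, publisher, publication date, rating, and rating count for the 250 books on the list.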
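The field extraction described above is the most fragile step, since it depends on splitting a free-form text line. The sketch below isolates it into two small helpers that can be checked without any network access. The function names `parse_pub_info` and `parse_rating_count` are hypothetical, the sample strings are made-up illustrations rather than scraped data, and the assumed formats ("author / [translator /] publisher / date / price" and "( N人评价 )") may not hold for every entry.

```python
import re


def parse_pub_info(text):
    """Split 'author / [translator /] publisher / date / price' into fields.

    Returns (author, publisher, date); a field that cannot be located
    comes back as an empty string instead of raising IndexError.
    """
    parts = [p.strip() for p in text.split('/')]
    author = parts[0]
    # Publisher and date are counted from the right, because the number of
    # names listed before them (authors, translators) varies per book.
    publisher = parts[-3] if len(parts) >= 4 else ''
    date = parts[-2] if len(parts) >= 3 else ''
    return author, publisher, date


def parse_rating_count(text):
    """Pull the integer out of text such as '( 123456人评价 )'; 0 if absent."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else 0


# Made-up sample inputs in the assumed formats.
print(parse_pub_info('Author A / Translator B / Publisher C / 2006-5 / 29.00元'))
# → ('Author A', 'Publisher C', '2006-5')
print(parse_rating_count('( 123456人评价 )'))
# → 123456
```

Returning empty strings for missing fields keeps one malformed entry from aborting the whole crawl, at the cost of leaving some blank cells in the CSV.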