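Using XPath to scrape the title, country, author, publisher, publication date, price, rating, number of ratings, and comment for the Douban Books Top 250 and save them to CSV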
Date: 2023-09-12 17:10:49 Views: 207
Sure. Below is a Python example that uses XPath to scrape book information from the Douban Books Top 250 and save it to a CSV file:
```python
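import csv
import re

import requests
from lxml import etree

url = 'https://book.douban.com/top250'
# Douban may reject the default requests User-Agent, so send a browser-like one
headers = {'User-Agent': 'Mozilla/5.0'}

# Send the HTTP request and fail early on an error status
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()

# Parse the HTML document
html = etree.HTML(response.text)

# Each book sits in its own <tr class="item"> row
book_items = html.xpath('//tr[@class="item"]')


def first(nodes, default=''):
    """Return the first XPath result, stripped, or a default when nothing matched."""
    return nodes[0].strip() if nodes else default


# Create the CSV file and write the header row
with open('douban_book_top250.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(['Title', 'Country', 'Author', 'Publisher', 'Publication date', 'Price', 'Rating', 'Number of ratings', 'Comment'])

    # Iterate over the books
    for item in book_items:
        # Title
        book_name = first(item.xpath('.//div[@class="pl2"]/a/@title'))

        # Info line, e.g. "[美] 卡勒德·胡赛尼 / 李继宏 / 上海人民出版社 / 2006-5 / 29.00元"
        book_info_list = [i.strip() for i in first(item.xpath('.//p[@class="pl"]/text()')).split('/')]

        # Author is the first field; an optional leading [..] or (..) tag holds the country
        book_author = book_info_list[0]
        match = re.match(r'[\[(【(](.+?)[\])】)]\s*(.*)', book_author)
        book_country, book_author = match.groups() if match else ('', book_author)

        # Publisher, publication date, and price are the last three fields;
        # a translator may sit between the author and the publisher
        book_publisher, book_pubdate, book_price = (book_info_list[-3:] if len(book_info_list) >= 4 else ['', '', ''])

        # Rating
        book_rating = first(item.xpath('.//span[@class="rating_nums"]/text()'))

        # Number of ratings: drop the surrounding parentheses and whitespace
        book_votes = first(item.xpath('.//span[@class="pl"]/text()'))
        book_votes = book_votes.strip().strip('(').strip().strip(')').strip()

        # Comment (one-line quote); some books have none, so fall back to ''
        book_comment = first(item.xpath('.//p[@class="quote"]/span/text()'))

        # Write the book's fields to the CSV file
        writer.writerow([book_name, book_country, book_author, book_publisher, book_pubdate, book_price, book_rating, book_votes, book_comment])

print('Douban Books Top 250 data saved to douban_book_top250.csv.')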
```
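The info line under each title is a slash-separated string whose number of fields varies from book to book (a translator may or may not be present, and the country tag is optional), so reading fields by fixed position from the left is fragile. The following is a minimal sketch of parsing it from both ends, assuming the layout `[country] author / translator (optional) / publisher / date / price`. The helper name `parse_book_info`, the bracket styles it accepts, and the sample strings are illustrative assumptions, not something guaranteed by the page.

```python
import re

# Optional leading country tag such as "[美]", "(英)" or "【日】" (assumed formats)
COUNTRY_TAG = re.compile(r'^[\[(【(](.+?)[\])】)]\s*')


def parse_book_info(info):
    """Split an info line into named fields (hypothetical helper)."""
    parts = [p.strip() for p in info.split('/')]
    result = {'country': '', 'author': '', 'translator': '',
              'publisher': '', 'pubdate': '', 'price': ''}
    if len(parts) < 4:
        # Too few fields to map reliably; keep the raw text as the author
        result['author'] = info.strip()
        return result
    # The last three fields are assumed to be publisher, date, price
    result['publisher'], result['pubdate'], result['price'] = parts[-3:]
    author = parts[0]
    match = COUNTRY_TAG.match(author)
    if match:
        result['country'] = match.group(1)
        author = author[match.end():]
    result['author'] = author
    # Anything between the author and the publisher is treated as translator(s)
    result['translator'] = ' / '.join(parts[1:-3])
    return result


sample = '[美] 卡勒德·胡赛尼 / 李继宏 / 上海人民出版社 / 2006-5 / 29.00元'
print(parse_book_info(sample))
```

Keeping the parsing in a separate function makes it possible to check it against sample strings without sending any requests.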
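The code above scrapes the book information from the Douban Books Top 250 page and saves it to a CSV file named douban_book_top250.csv. As written, it requests only the first page of the list, so the file will not contain all 250 books. You can extend the code as needed, for example by adding pagination or extracting more fields for each book.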
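For pagination, one possible approach is to generate the URL of every list page up front and then run the same request and parsing steps on each. The sketch below assumes the list is split into pages of 25 books selected with a `start` query parameter (0, 25, ..., 225); the names `page_urls`, `PAGE_SIZE`, and `TOTAL` are hypothetical and introduced only for illustration.

```python
BASE_URL = 'https://book.douban.com/top250'
PAGE_SIZE = 25  # assumed number of books per page
TOTAL = 250     # total number of books in the list


def page_urls(base_url=BASE_URL, total=TOTAL, page_size=PAGE_SIZE):
    """Build one URL per list page using the assumed `start` offset parameter."""
    return [f'{base_url}?start={offset}' for offset in range(0, total, page_size)]


for page_url in page_urls():
    print(page_url)
```

Each generated URL can then be fetched and parsed in a loop; pausing briefly between requests (for example with `time.sleep`) keeps the load on the site low.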