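How to use Python and XPath to extract the author, publisher, publication year, page count, price, binding, series, and ISBN from the page "https://book.douban.com/subject/24531956/", represent them as a dictionary, and save them to a JSON file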
Time: 2023-05-21 17:02:18 Views: 124
You can do this with Python's requests and lxml libraries. Here is a code example:
```python
import json

import requests
from lxml import etree

url = 'https://book.douban.com/subject/24531956/'
# Douban usually rejects requests that lack a browser-like User-Agent.
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
html = etree.HTML(response.text)

# Output key -> label text shown in the page's <div id="info"> block.
labels = {
    'author': '作者',
    'publisher': '出版社',
    'pub_year': '出版年',
    'page_num': '页数',
    'price': '定价',
    'binding': '装帧',
    'series': '丛书',
    'isbn': 'ISBN',
}


def extract(label):
    # Assumes each field is a <span class="pl"> label followed by its value,
    # either as plain text or inside an <a>. Returns None if nothing matches.
    spans = html.xpath(
        f'//div[@id="info"]//span[@class="pl"][contains(., "{label}")]'
    )
    if not spans:
        return None
    # Try the text node right after the label, minus colons and whitespace.
    text = spans[0].xpath('string(following-sibling::text()[1])')
    text = text.strip(' \t\r\n\xa0::')
    if text:
        return text
    # Otherwise fall back to the first link after the label.
    links = spans[0].xpath('following-sibling::a[1]/text()')
    return links[0].strip() if links else None


book_info = {key: extract(label) for key, label in labels.items()}

with open('book_info.json', 'w', encoding='utf-8') as f:
    json.dump(book_info, f, ensure_ascii=False, indent=2)
```
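This code extracts the book's author, publisher, publication year, page count, price, binding, series, and ISBN from the given page and saves them, as a dictionary, to a JSON file named book_info.json.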
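The saving step can be checked without touching the network. The sketch below is only an illustration: `sample_info` holds made-up placeholder values (not data from the Douban page) and `save_and_reload` is a hypothetical helper name. It writes a dictionary with `ensure_ascii=False`, reads it back, and compares the result with the default escaping behaviour of `json.dumps`.

```python
import json
import os
import tempfile

# Placeholder values for illustration only; real values come from the page.
sample_info = {
    'publisher': '示例出版社',
    'binding': '平装',
    'isbn': '0000000000000',
}


def save_and_reload(info, path):
    """Write info to path as UTF-8 JSON, then load it back."""
    with open(path, 'w', encoding='utf-8') as f:
        # ensure_ascii=False keeps Chinese characters readable in the file.
        json.dump(info, f, ensure_ascii=False, indent=2)
    with open(path, encoding='utf-8') as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'book_info.json')
    reloaded = save_and_reload(sample_info, path)
    with open(path, encoding='utf-8') as f:
        raw_text = f.read()

# With the default ensure_ascii=True, non-ASCII text is written as \uXXXX.
escaped = json.dumps(sample_info)

print(reloaded == sample_info)  # the round trip preserves the dictionary
print('平装' in raw_text)        # characters are stored as-is in the file
print('平装' in escaped)         # but are escaped with the default setting
```

Both forms load back to the same dictionary; `ensure_ascii=False` only makes the file easier to read, which is why it is paired with `encoding='utf-8'` when opening the file.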