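Use regular expressions to scrape the following information for every book on the JD book sales ranking (https://book.jd.com/booktop/0-0-0.html): (1) book title: bookName (2) author: author (3) list price: definePrice (4) discount: discount (5) sale price: sellPrice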


Posted: 2023-11-13 19:05:17

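Sure. This is a fairly typical scraping task, and it can be done with Python's `requests` library and regular expressions. Here is a code example:

```python
import re

import requests

url = 'https://book.jd.com/booktop/0-0-0.html'
response = requests.get(url, timeout=10)
response.raise_for_status()
# Let requests detect the encoding from the body instead of assuming UTF-8.
response.encoding = response.apparent_encoding
html = response.text

# Scrape title, author, list price, discount and sale price.
# The pattern assumes this exact markup; check it against the page source.
pattern = r'<div class="p-img"><a href="//(.*?)" target="_blank" title="(.*?)">'
pattern += r'</a></div><div class="p-detail"><a href="//(.*?)" target="_blank" title="(.*?)">'
pattern += r'</a>(.*?)￥(.*?)'
pattern += r'￥(.*?)'
pattern += r'<a href="//(.*?)" target="_blank">￥(.*?)</a></div>'

# re.S lets "." match newlines, in case a record spans several lines.
result = re.findall(pattern, html, re.S)
for item in result:
    bookName = item[1]
    author = item[4].strip()
    definePrice = item[5]
    sellPrice = item[6]  # second price captured; item[7] is a link, not a price
    # Discount as the ratio of sale price to list price.
    discount = str(round(float(sellPrice) / float(definePrice), 2))
    print(bookName, author, definePrice, discount, sellPrice)
```

A quick explanation of the code:

1. Use the `requests` library to send a GET request and fetch the HTML of the JD book sales ranking page.
2. Use a regular expression to pull the book title, author, list price, and sale price out of the HTML; the discount is computed from the two prices.
3. Loop over the matches and print the information for each book.

The regular expression is long but simple, and it can be read piece by piece. The `r` prefix marks a raw string, so backslashes in the pattern need no extra escaping. `.*?` is a non-greedy match: it consumes as few characters as possible and stops as soon as the rest of the pattern can match. By default `.` does not match a newline, which is why `re.S` is passed to `re.findall`. The pattern only works if the page's markup looks exactly as written, so compare it with the actual page source before relying on the output.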
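Because the live page's markup may not match the pattern, it can help to try the same technique offline first. The following is a minimal sketch, not the page's real structure: `SAMPLE_HTML`, its tag and class names, `BOOK_RE`, and `parse_books` are all hypothetical names invented for illustration. It shows non-greedy matching with `re.S` and uses named groups so that fields are read by name rather than by position.

```python
import re

# Hypothetical markup, invented for this example; the real page will differ.
SAMPLE_HTML = """
<li>
  <a class="p-name" title="Book A">Book A</a>
  <span class="author">Alice</span>
  <del>￥50.00</del>
  <em>￥40.00</em>
</li>
<li>
  <a class="p-name" title="Book B">Book B</a>
  <span class="author">Bob</span>
  <del>￥80.00</del>
  <em>￥60.00</em>
</li>
"""

# Named groups avoid index mistakes; re.S lets ".*?" cross line breaks.
BOOK_RE = re.compile(
    r'<a class="p-name" title="(?P<bookName>.*?)">.*?'
    r'<span class="author">(?P<author>.*?)</span>.*?'
    r'<del>￥(?P<definePrice>[\d.]+)</del>.*?'
    r'<em>￥(?P<sellPrice>[\d.]+)</em>',
    re.S,
)


def parse_books(html):
    """Return one dict per book found in html, with a computed discount."""
    books = []
    for match in BOOK_RE.finditer(html):
        book = match.groupdict()
        define_price = float(book["definePrice"])
        sell_price = float(book["sellPrice"])
        # Discount as the ratio of sale price to list price.
        book["discount"] = round(sell_price / define_price, 2)
        books.append(book)
    return books


books = parse_books(SAMPLE_HTML)
for book in books:
    print(book["bookName"], book["author"], book["definePrice"],
          book["discount"], book["sellPrice"])
```

Once the pattern works on a saved copy of the page, the same `parse_books` function can be given the HTML returned by `requests`.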