
Python crawler: scraping Taobao search results page by page

Time: 2023-12-20 12:05:21 Views: 194

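The following is example code for a Python crawler that pages through Taobao search results:

```python
import time

from lxml import etree
from selenium import webdriver
from selenium.webdriver.common.by import By

USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3')


def get_datas(url, page_num):
    # The original snippet used `browser` without defining it; a Selenium
    # Chrome driver is assumed here and also replaces the requests call,
    # so that the parsed HTML is the page the pager actually navigated to.
    options = webdriver.ChromeOptions()
    options.add_argument('--user-agent=' + USER_AGENT)
    browser = webdriver.Chrome(options=options)
    try:
        browser.get(url)
        for j in range(1, page_num + 1):
            # Parse the page currently shown in the browser.
            # The XPaths are kept from the original and may no longer match.
            html = etree.HTML(browser.page_source)
            items = html.xpath('//div[@class="item J_MouserOnverReq "]')
            for item in items:
                title = item.xpath('.//div[@class="title"]/a/text()')
                price = item.xpath('.//div[@class="price g_price g_price-highlight"]/strong/text()')
                if title and price:  # skip items missing either field
                    print(title[0].strip(), price[0])
            if j == page_num:
                break
            try:
                # Type the next page number into the pager and confirm.
                num = browser.find_element(By.XPATH, '//*[@id="mainsrp-pager"]/div/div/div/div[2]/input')
                num.clear()
                num.send_keys(str(j + 1))
                browser.find_element(By.XPATH, '//*[@id="mainsrp-pager"]/div/div/div/div[2]/span[3]').click()
                time.sleep(5)
                print("Scraped page {}, sleeping for {}s".format(j, 5))
            except Exception:
                break  # pager not found: stop instead of re-reading the same page
    finally:
        browser.quit()


if __name__ == '__main__':
    url = 'https://s.taobao.com/search?q=%E5%B0%8F%E7%B1%B3%E6%89%8B%E6%9C%BA&imgfile=&js=1&stats_click=search_radio_all%3A1&initiative_id=staobaoz_20211028&ie=utf8'
    page_num = 3
    get_datas(url, page_num)
```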
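As an alternative to clicking the pager, the page URLs can sometimes be built directly. The sketch below is not part of the original answer: it assumes the search endpoint accepts an `s` query parameter holding the item offset, with 44 items per page, as older versions of the Taobao search page did. The helper name `build_page_urls` and the constant `ITEMS_PER_PAGE` are hypothetical, and the assumption should be checked against the live site before relying on it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: 44 results per page, addressed through an `s` offset parameter.
ITEMS_PER_PAGE = 44


def build_page_urls(url, page_num, page_size=ITEMS_PER_PAGE):
    """Return one search URL per page by setting the assumed `s` offset."""
    parts = urlsplit(url)
    # Keep the existing query parameters, including blank ones such as imgfile=.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    urls = []
    for page in range(1, page_num + 1):
        query['s'] = str((page - 1) * page_size)
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


if __name__ == '__main__':
    for page_url in build_page_urls('https://s.taobao.com/search?q=phone&ie=utf8', 3):
        print(page_url)
```

Each generated URL could then be loaded in turn instead of typing page numbers into the pager, which removes the dependency on the pager's XPath.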
