Python crawler: paging through Taobao search results
Date: 2023-12-20 10:05:21 Views: 173
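The following is example code for a Python crawler that pages through Taobao search results and prints each product's title and price: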
```python
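import time

from lxml import etree
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By

USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3')
# Pager elements: the page-number input box and the button that confirms it.
PAGE_INPUT = '//*[@id="mainsrp-pager"]/div/div/div/div[2]/input'
PAGE_SUBMIT = '//*[@id="mainsrp-pager"]/div/div/div/div[2]/span[3]'


def print_items(page_source):
    """Print the title and price of each product on the current results page."""
    html = etree.HTML(page_source)
    for item in html.xpath('//div[@class="item J_MouserOnverReq "]'):
        # Join the text nodes instead of indexing [0], which raises IndexError when empty.
        title = ''.join(item.xpath('.//div[@class="title"]/a//text()')).strip()
        price = ''.join(item.xpath('.//div[@class="price g_price g_price-highlight"]/strong/text()'))
        print(title, price)


def get_datas(url, page_num, delay=5):
    options = webdriver.ChromeOptions()
    options.add_argument('--user-agent=' + USER_AGENT)
    browser = webdriver.Chrome(options=options)
    try:
        browser.get(url)
        for j in range(1, page_num + 1):
            # Parse what the browser shows; re-requesting `url` would always return page 1.
            print_items(browser.page_source)
            if j == page_num:
                break
            # Type the next page number into the pager and confirm it.
            num = browser.find_element(By.XPATH, PAGE_INPUT)
            num.clear()
            num.send_keys(str(j + 1))
            browser.find_element(By.XPATH, PAGE_SUBMIT).click()
            print("Scraped page {}; pausing for {}s".format(j, delay))
            time.sleep(delay)
    except WebDriverException as exc:
        print("Stopped early:", exc)
    finally:
        browser.quit()


if __name__ == '__main__':
    url = 'https://s.taobao.com/search?q=%E5%B0%8F%E7%B1%B3%E6%89%8B%E6%9C%BA&imgfile=&js=1&stats_click=search_radio_all%3A1&initiative_id=staobaoz_20211028&ie=utf8'
    page_num = 3
    get_datas(url, page_num)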
```
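The search URL in the example carries the keyword in its `q` parameter, percent-encoded as UTF-8; `%E5%B0%8F%E7%B1%B3%E6%89%8B%E6%9C%BA` decodes to 小米手机. The sketch below shows one way such a URL could be built or decoded with the standard library. `build_search_url` and `search_keyword` are hypothetical helper names, not part of the original code, and it is an assumption that the site accepts a URL carrying only the `q` and `ie` parameters.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Host and path taken from the example URL above.
SEARCH_BASE = 'https://s.taobao.com/search'


def build_search_url(keyword):
    """Return a search URL for `keyword` (hypothetical helper)."""
    # urlencode percent-encodes the keyword as UTF-8, matching ie=utf8.
    return SEARCH_BASE + '?' + urlencode({'q': keyword, 'ie': 'utf8'})


def search_keyword(url):
    """Decode the q parameter of an existing search URL."""
    return parse_qs(urlparse(url).query)['q'][0]


if __name__ == '__main__':
    url = build_search_url('小米手机')
    print(url)
    print(search_keyword(url))
```

A URL built this way could be passed to `get_datas` in place of the hard-coded string, which makes it easier to switch the search keyword.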