Scraping all tablet PC fields from the ZOL (Zhongguancun Online) website with XPath in Python 3.9
Date: 2023-07-12 11:54:15
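First, use the third-party Python libraries requests and lxml to fetch the ZOL tablet listing page and parse it into an element tree. The code is as follows: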
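```python
import requests
from lxml import etree

url = "http://detail.zol.com.cn/tablepc/"
# Send a browser-like User-Agent; some sites reject the default requests UA
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # fail early on HTTP errors such as 403 or 404
# Detect the page's real encoding instead of assuming UTF-8 (the page may be GBK)
response.encoding = response.apparent_encoding
html = etree.HTML(response.text)
```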
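Hard-coding an encoding such as UTF-8 only works if the page really uses it; many Chinese sites serve GBK, and a mismatch turns product names into mojibake. One alternative to guessing is to read the charset the page declares in its `<meta>` tag and decode the raw bytes yourself. The sketch below uses only the standard library; the helper name `decode_html` and its UTF-8 fallback are assumptions for illustration, not part of the original code.

```python
import re

# Finds the charset in either <meta charset="gbk"> or
# <meta http-equiv="Content-Type" content="text/html; charset=gbk">
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def decode_html(raw: bytes, default: str = "utf-8") -> str:
    """Decode page bytes using the charset declared in a <meta> tag (hypothetical helper)."""
    match = _META_CHARSET.search(raw[:4096])  # the declaration sits near the top
    encoding = match.group(1).decode("ascii") if match else default
    if encoding.lower() == "gb2312":
        encoding = "gbk"  # GBK is a superset of GB2312, so it decodes more pages cleanly
    try:
        return raw.decode(encoding, errors="replace")
    except LookupError:
        # The page named a codec Python does not know; fall back to the default
        return raw.decode(default, errors="replace")
```

With a `response` object like the one above, parsing would then look like `html = etree.HTML(decode_html(response.content))`.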
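Next, use XPath to extract the fields for each tablet. First inspect the page source and the element paths with the browser's developer tools. For example, the name, price, and link of each tablet on the ZOL listing page correspond to the following XPath expressions (they reflect the page layout at the time of writing and need updating if the site changes):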
```python
name_xpath = '//*[@id="J_PicMode"]/li/div/a/h3/text()'
price_xpath = '//*[@id="J_PicMode"]/li/div/div[2]/span[1]/b/text()'
link_xpath = '//*[@id="J_PicMode"]/li/div/a/@href'
```
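Because the page layout can change at any time, it helps to check the expressions against a small hand-written HTML fragment before running them on the live site. The sketch below assumes the listing has the nesting the expressions imply (`ul#J_PicMode > li > div`, with `a > h3` for the name and the second inner `div` holding `span > b` for the price); the markup, product names, prices, and links are made up for illustration.

```python
from lxml import etree

# XPath expressions from the article (they encode an assumed page layout)
name_xpath = '//*[@id="J_PicMode"]/li/div/a/h3/text()'
price_xpath = '//*[@id="J_PicMode"]/li/div/div[2]/span[1]/b/text()'
link_xpath = '//*[@id="J_PicMode"]/li/div/a/@href'

# Hypothetical markup mimicking that layout: each <li> holds one outer <div>
# whose children are <a><h3>name</h3></a>, a first <div>, and a second <div>
# containing <span><b>price</b></span>.
SAMPLE_HTML = """
<ul id="J_PicMode">
  <li><div>
    <a href="/tablepc/sample_1.shtml"><h3>Tablet A</h3></a>
    <div>spec summary</div>
    <div><span><b>1999</b></span><span>other</span></div>
  </div></li>
  <li><div>
    <a href="/tablepc/sample_2.shtml"><h3>Tablet B</h3></a>
    <div>spec summary</div>
    <div><span><b>2999</b></span></div>
  </div></li>
</ul>
"""

sample = etree.HTML(SAMPLE_HTML)
names = sample.xpath(name_xpath)
prices = sample.xpath(price_xpath)
links = sample.xpath(link_xpath)
print(names, prices, links)
```

If any of the three lists comes back empty on the real page, the corresponding expression no longer matches the site's markup.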
Then call the `xpath()` method of the parsed lxml element to retrieve the matching values. The code is as follows:
```python
names = html.xpath(name_xpath)
prices = html.xpath(price_xpath)
links = html.xpath(link_xpath)
```
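`xpath()` returns plain lists: strings for `text()` and `@href` steps, or an empty list when nothing matches. Two small clean-ups are usually worth doing before using the values: stripping surrounding whitespace from text nodes, and turning relative `href` values into absolute URLs with the standard library's `urllib.parse.urljoin`. A minimal sketch; the helper name `normalize_fields` and the sample values are hypothetical.

```python
from urllib.parse import urljoin

BASE_URL = "http://detail.zol.com.cn/tablepc/"


def normalize_fields(names, prices, links, base_url=BASE_URL):
    """Strip whitespace from text results and resolve relative links (hypothetical helper)."""
    clean_names = [n.strip() for n in names]
    clean_prices = [p.strip() for p in prices]
    # urljoin leaves absolute URLs unchanged and resolves "/path" or "//host/path"
    clean_links = [urljoin(base_url, href.strip()) for href in links]
    return clean_names, clean_prices, clean_links


names, prices, links = normalize_fields(
    ["  Tablet A\n"], [" 1999 "], ["/tablet/index1.shtml"]
)
print(names, prices, links)
```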
Finally, combine these fields into a list with one dictionary per tablet for later processing. The complete code is as follows:
```python
import requests
from lxml import etree

url = "http://detail.zol.com.cn/tablepc/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
# Detect the page's real encoding instead of assuming UTF-8
response.encoding = response.apparent_encoding
html = etree.HTML(response.text)

# XPath expressions for the product name, price, and detail-page link
name_xpath = '//*[@id="J_PicMode"]/li/div/a/h3/text()'
price_xpath = '//*[@id="J_PicMode"]/li/div/div[2]/span[1]/b/text()'
link_xpath = '//*[@id="J_PicMode"]/li/div/a/@href'

names = html.xpath(name_xpath)
prices = html.xpath(price_xpath)
links = html.xpath(link_xpath)

# Combine the three lists into one dict per product.
# zip() stops at the shortest list, avoiding the IndexError that
# indexing with range(len(names)) raises when the lists differ in length.
data = []
for name, price, link in zip(names, prices, links):
    data.append({"name": name, "price": price, "link": link})

print(data)
```
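Pairing three independently collected lists by position (whether with an index loop or `zip`) assumes every product has a name, a price, and a link. If one entry lacks a price, every later price shifts onto the wrong product. A more robust variant, sketched below under the same assumed page layout, selects each `<li>` once and evaluates relative XPath expressions inside it, so a missing field becomes `None` instead of misaligning the rows. The helpers `first` and `extract_items`, the sample markup, and the product names are hypothetical.

```python
from lxml import etree


def first(node, expr):
    """Return the first XPath result under node, stripped, or None if nothing matches."""
    result = node.xpath(expr)
    return result[0].strip() if result else None


def extract_items(html):
    items = []
    # One <li> per product under the assumed J_PicMode list
    for li in html.xpath('//*[@id="J_PicMode"]/li'):
        items.append({
            # Relative paths (starting with ./) are evaluated from this <li>
            "name": first(li, "./div/a/h3/text()"),
            "price": first(li, "./div/div[2]/span[1]/b/text()"),
            "link": first(li, "./div/a/@href"),
        })
    return items


# Hypothetical markup: the second product has no <b> price element
SAMPLE_HTML = """
<ul id="J_PicMode">
  <li><div>
    <a href="/tablepc/sample_1.shtml"><h3>Tablet A</h3></a>
    <div>spec summary</div>
    <div><span><b>1999</b></span></div>
  </div></li>
  <li><div>
    <a href="/tablepc/sample_2.shtml"><h3>Tablet B</h3></a>
    <div>spec summary</div>
    <div><span>price pending</span></div>
  </div></li>
</ul>
"""

data = extract_items(etree.HTML(SAMPLE_HTML))
print(data)
```

On the live site, pass the `html` element parsed from the response to `extract_items` instead of the sample fragment.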