Getting the first li tag with XPath in Python
Date: 2023-06-04 19:02:40 Views: 574
You can use the following code:
```python
from lxml import etree
html = """
<body>
<ul class="list">
<li>First item</li>
<li>Second item</li>
<li>Third item</li>
</ul>
</body>
"""
selector = etree.HTML(html)
li_text = selector.xpath('//ul[@class="list"]/li[1]/text()')[0]
print(li_text)
```
The output is `First item`.
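One detail worth seeing in code: `li[1]` is evaluated relative to each parent, so with more than one matching `ul` it returns one item per list. The sketch below uses hypothetical markup with two lists (not from the original question) to contrast that with a parenthesised path, and shows a guard for the empty-result case.

```python
from lxml import etree

# Hypothetical markup with two lists, to show how the position predicate binds
html = """
<body>
<ul class="list"><li>A1</li><li>A2</li></ul>
<ul class="list"><li>B1</li><li>B2</li></ul>
</body>
"""
selector = etree.HTML(html)

# li[1] is evaluated per parent, so every matching ul contributes its first li
first_per_list = selector.xpath('//ul[@class="list"]/li[1]/text()')

# Parentheses apply [1] to the whole node set, in document order
first_overall = selector.xpath('(//ul[@class="list"]/li)[1]/text()')

# xpath() returns a list, so check for an empty result before indexing
missing = selector.xpath('//ol/li[1]/text()')
first_missing = missing[0] if missing else None

print(first_per_list)  # ['A1', 'B1']
print(first_overall)   # ['A1']
print(first_missing)   # None
```

With a single list, as in the answer above, both forms return the same item.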
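Related questions
Getting all tablet fields from the ZOL (Zhongguancun Online) website with XPath in Python 3.9
First, use the third-party Python libraries requests and lxml to fetch and parse the ZOL page. The code is as follows: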
```python
import requests
from lxml import etree
url = "http://detail.zol.com.cn/tablepc/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
}
response = requests.get(url, headers=headers)
response.encoding = "utf-8"
html = etree.HTML(response.text)
```
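The request above has no timeout or status check and hard-codes the encoding. A minimal sketch of a more defensive fetch follows, under the assumption that guessing the encoding from the response body is acceptable; `fetch_tree` is a hypothetical helper name, not part of requests or lxml.

```python
import requests
from lxml import etree


def fetch_tree(url, headers=None, timeout=10):
    """Download a page and return the parsed lxml tree (hypothetical helper)."""
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page
    response.raise_for_status()
    # Use the encoding requests guesses from the body rather than a fixed one
    response.encoding = response.apparent_encoding
    return etree.HTML(response.text)


# Usage (performs a network request, so it is left commented out here):
# html = fetch_tree("http://detail.zol.com.cn/tablepc/", headers={"User-Agent": "Mozilla/5.0"})
```

If the site's encoding is known for certain, setting `response.encoding` explicitly, as the original code does, is equally valid.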
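Next, use XPath to extract the tablet fields. Start by inspecting the page source and element paths with the browser's developer tools. For example, the tablet name, price, and link on the ZOL listing page correspond to the XPath expressions below; they depend on the page's markup and will need updating if the site changes: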
```python
name_xpath = '//*[@id="J_PicMode"]/li/div/a/h3/text()'
price_xpath = '//*[@id="J_PicMode"]/li/div/div[2]/span[1]/b/text()'
link_xpath = '//*[@id="J_PicMode"]/li/div/a/@href'
```
Call the `xpath()` method on the parsed tree with each expression to get the matching values:
```python
names = html.xpath(name_xpath)
prices = html.xpath(price_xpath)
links = html.xpath(link_xpath)
```
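What `xpath()` returns depends on how the expression ends: `text()` and `@attr` yield strings, a path ending at a tag yields Element objects, and an XPath function such as `string()` yields a single value. The sketch below illustrates this on a small hypothetical document rather than the ZOL page.

```python
from lxml import etree

# Hypothetical markup, used only to show the different return types
html = '<html><body><h1>Title <em>here</em></h1><a href="/x">X</a><a>no link</a></body></html>'
selector = etree.HTML(html)

# A path ending at a tag returns Element objects
h1 = selector.xpath('//h1')[0]
print(h1.tag)

# text() returns only the text nodes that are direct children of h1
direct_text = selector.xpath('//h1/text()')

# string() concatenates all descendant text of the first match
full_text = selector.xpath('string(//h1)')

# Element.get() returns None when the attribute is absent
hrefs = [a.get('href') for a in selector.xpath('//a')]

print(direct_text)  # ['Title ']
print(full_text)    # Title here
print(hrefs)        # ['/x', None]
```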
Finally, collect these fields into a list for later processing. The complete code is as follows:
```python
import requests
from lxml import etree
url = "http://detail.zol.com.cn/tablepc/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
}
response = requests.get(url, headers=headers)
response.encoding = "utf-8"
html = etree.HTML(response.text)
name_xpath = '//*[@id="J_PicMode"]/li/div/a/h3/text()'
price_xpath = '//*[@id="J_PicMode"]/li/div/div[2]/span[1]/b/text()'
link_xpath = '//*[@id="J_PicMode"]/li/div/a/@href'
names = html.xpath(name_xpath)
prices = html.xpath(price_xpath)
links = html.xpath(link_xpath)
data = []
# zip stops at the shortest list, so a missing price or link cannot raise IndexError
for name, price, link in zip(names, prices, links):
    data.append({"name": name, "price": price, "link": link})
print(data)
```
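Pairing three separately extracted lists assumes every item has all three fields. If one item lacks a price, the later prices shift onto the wrong names. A sketch of an alternative is to select each `li` first and run relative XPath expressions inside it. The markup, the `items` id, and the `first_or_none` helper below are hypothetical and much simpler than the real ZOL page.

```python
from lxml import etree

# Hypothetical listing in which the second item has no price
sample = """
<ul id="items">
  <li><a href="/a.html">Tablet A</a><b class="price">1999</b></li>
  <li><a href="/b.html">Tablet B</a></li>
  <li><a href="/c.html">Tablet C</a><b class="price">2999</b></li>
</ul>
"""
tree = etree.HTML(sample)


def first_or_none(nodes):
    """Return the first XPath result, or None when nothing matched."""
    return nodes[0] if nodes else None


data = []
for li in tree.xpath('//ul[@id="items"]/li'):
    # A leading "./" makes each expression relative to the current li
    data.append({
        "name": first_or_none(li.xpath('./a/text()')),
        "price": first_or_none(li.xpath('./b[@class="price"]/text()')),
        "link": first_or_none(li.xpath('./a/@href')),
    })
print(data)
```

Here the second record gets `None` as its price, and the third keeps its own price instead of inheriting a shifted one.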
Python XPath usage
Basic usage of XPath in Python:
1. Import the lxml and requests libraries
```python
import requests
from lxml import etree
```
2. Send a request and get the HTML text
```python
url = 'http://example.com'
response = requests.get(url)
html = response.text
```
3. Parse the HTML text and locate elements with XPath
```python
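# Convert the HTML text into an Element object
selector = etree.HTML(html)
# Locate elements with XPath
# Get the href attribute of every a tag
links = selector.xpath('//a/@href')
# Get the text content of the first h1 tag (IndexError if the page has no h1)
title = selector.xpath('//h1/text()')[0]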
```
4. Applied examples of XPath syntax
```python
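# Get the text content of every p tag
p_list = selector.xpath('//p/text()')
# Get the class attribute of the first div that has one
div_class = selector.xpath('//div/@class')[0]
# Get the li text under every ul that is the second ul child of its parent
li_list = selector.xpath('//ul[2]/li/text()')
# Get the href of every a tag inside a div whose class attribute is exactly 'item'
a_list = selector.xpath('//div[@class="item"]//a/@href')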
```
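Two of these expressions behave differently from what a quick reading suggests. `//ul[2]` counts position among siblings, not across the document, and `@class="item"` compares against the whole attribute value. The sketch below shows both cases on hypothetical markup, with the usual XPath 1.0 idiom for matching one class token.

```python
from lxml import etree

# Hypothetical markup: each ul is the only ul in its parent, and the last div has two classes
html = """
<div><ul><li>a</li></ul></div>
<div><ul><li>b</li></ul></div>
<div class="item featured"><a href="/1">one</a></div>
"""
selector = etree.HTML(html)

# No ul is the second ul child of its parent, so this matches nothing
second_sibling = selector.xpath('//ul[2]/li/text()')

# Parentheses select the second ul in the whole document
second_in_doc = selector.xpath('(//ul)[2]/li/text()')

# Exact comparison fails because the attribute value is "item featured"
exact = selector.xpath('//div[@class="item"]//a/@href')

# Token match: pad the class list with spaces and look for " item "
token = selector.xpath(
    '//div[contains(concat(" ", normalize-space(@class), " "), " item ")]//a/@href'
)

print(second_sibling)  # []
print(second_in_doc)   # ['b']
print(exact)           # []
print(token)           # ['/1']
```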