How to get page elements using XPath
Date: 2024-05-01 10:19:35
You can use the following code to get page elements with XPath:
```
from lxml import etree
html = '''
<html>
<head>
<title>Test Page</title>
</head>
<body>
<div class="test">
<h1>Heading 1</h1>
<p class="first">Paragraph 1</p>
<p>Paragraph 2</p>
</div>
<div class="test">
<h1>Heading 2</h1>
<p class="first">Paragraph 3</p>
<p>Paragraph 4</p>
</div>
</body>
</html>
'''
tree = etree.HTML(html)
# Get all the div elements with class "test"
div_elements = tree.xpath('//div[@class="test"]')
# Get the first h1 element within each div element
headings = []
for div in div_elements:
    # xpath() returns a list; [0] takes the first match
    h1_element = div.xpath('.//h1')[0]
    headings.append(h1_element.text)
# Get all the p elements with class "first"
first_paragraphs = tree.xpath('//p[@class="first"]')
# Get the text content of each p element
paragraphs = []
for p in first_paragraphs:
    paragraphs.append(p.text)
print(headings)  # Output: ['Heading 1', 'Heading 2']
print(paragraphs)  # Output: ['Paragraph 1', 'Paragraph 3']
```
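This code uses the lxml library to parse the HTML into an element tree, then queries that tree with XPath expressions. In this example, `//div[@class="test"]` selects every `<div>` element whose class attribute is exactly `test`; `.//h1` selects all `<h1>` descendants of the current node, and the `[0]` index takes the first one; `//p[@class="first"]` selects every `<p>` element whose class attribute is `first`. The text of each `<p>` is then read through the element's `.text` attribute, which is an lxml property rather than an XPath expression.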
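As a possible variation, the text can also be pulled out directly in the XPath expression with `text()`, which avoids the explicit loops. The sketch below assumes a trimmed-down copy of the same HTML; the variable names (`heading_texts`, `second_paragraph_texts`, and so on) are illustrative only and not part of the original answer.

```python
from lxml import etree

# Trimmed-down copy of the sample HTML (assumed for illustration)
html = """
<div class="test"><h1>Heading 1</h1><p class="first">Paragraph 1</p><p>Paragraph 2</p></div>
<div class="test"><h1>Heading 2</h1><p class="first">Paragraph 3</p><p>Paragraph 4</p></div>
"""
tree = etree.HTML(html)

# text() selects the text nodes themselves, so xpath() returns a list of strings
heading_texts = tree.xpath('//div[@class="test"]/h1/text()')
first_paragraph_texts = tree.xpath('//p[@class="first"]/text()')

# p[2] is a positional predicate: the second <p> child of each matching div
# (XPath positions start at 1, unlike Python list indexes)
second_paragraph_texts = tree.xpath('//div[@class="test"]/p[2]/text()')

print(heading_texts)
print(first_paragraph_texts)
print(second_paragraph_texts)
```

Note that `@class="test"` matches only when the attribute is exactly `test`; an element written as `class="test big"` would need something like `contains(@class, "test")` instead.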