Data Collection and Web Crawling, first stage test

The contents of the file hello.html are as follows.

    <html>
    <body>
    <div>
    <ul>
    <li class="item-0"><a href="link1.html">first item </a></li>
    <li class="item-1"><a href="link2.html">second item </a></li>
    <li class="item-inactive"><a href="link3.html">third item </a></li>
    <li class="item-0"><a href="link4.html">fourth item </a></li>
    <li class="item-0"><a href="link5.html">fifth item </a></li>
    </ul>
    </div>
    </body>
    </html>

Use the lxml library to find the specified nodes in hello.html according to the following requirements:
(1) Write a program that finds all nodes named li and prints the results.
(2) Write a program that finds all nodes whose class attribute is item-0 and prints the results.
(3) Write a program that finds the child node named a under <li> whose href attribute is link1.html, and prints the result.
Posted: 2023-07-16 09:16:48
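The answers to the three parts follow. Each program embeds the contents of hello.html as a string so that it runs on its own.
(1) Find all nodes named li and print the results: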
```python
from lxml import etree
html = '''
<html>
<body>
<div>
<ul>
<li class="item-0"><a href="link1.html">first item </a></li>
<li class="item-1"><a href="link2.html">second item </a></li>
<li class="item-inactive"><a href="link3.html">third item </a></li>
<li class="item-0"><a href="link4.html">fourth item </a></li>
<li class="item-0"><a href="link5.html">fifth item </a></li>
</ul>
</div>
</body>
</html>
'''
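# Parse the HTML string with lxml
html_tree = etree.HTML(html)
# Find all nodes named li
li_nodes = html_tree.xpath('//li')
# Print the results; with_tail=False drops the newline that follows each </li>
for li_node in li_nodes:
    print(etree.tostring(li_node, encoding='utf-8', with_tail=False).decode('utf-8'))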
```
Output:
```
<li class="item-0"><a href="link1.html">first item </a></li>
<li class="item-1"><a href="link2.html">second item </a></li>
<li class="item-inactive"><a href="link3.html">third item </a></li>
<li class="item-0"><a href="link4.html">fourth item </a></li>
<li class="item-0"><a href="link5.html">fifth item </a></li>
```
(2) Find all nodes whose class attribute is item-0 and print the results:
```python
from lxml import etree
html = '''
<html>
<body>
<div>
<ul>
<li class="item-0"><a href="link1.html">first item </a></li>
<li class="item-1"><a href="link2.html">second item </a></li>
<li class="item-inactive"><a href="link3.html">third item </a></li>
<li class="item-0"><a href="link4.html">fourth item </a></li>
<li class="item-0"><a href="link5.html">fifth item </a></li>
</ul>
</div>
</body>
</html>
'''
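# Parse the HTML string with lxml
html_tree = etree.HTML(html)
# Find all li nodes whose class attribute is item-0
item_0_nodes = html_tree.xpath('//li[@class="item-0"]')
# Print the results; with_tail=False drops the newline that follows each </li>
for item_0_node in item_0_nodes:
    print(etree.tostring(item_0_node, encoding='utf-8', with_tail=False).decode('utf-8'))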
```
Output:
```
<li class="item-0"><a href="link1.html">first item </a></li>
<li class="item-0"><a href="link4.html">fourth item </a></li>
<li class="item-0"><a href="link5.html">fifth item </a></li>
```
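The expression `//li[@class="item-0"]` only considers li elements and only matches when the class attribute is exactly `item-0`. Since the question asks for all nodes with that class, it may help to compare it with two alternatives: `//*[@class="item-0"]`, which accepts any tag, and a token-based match that also accepts elements carrying several classes. The sketch below uses a hypothetical fragment (not the hello.html from the question) that adds a `p` element and a multi-class `li` purely to make the difference visible.

```python
from lxml import etree

# Hypothetical fragment, invented for illustration only.
fragment = '''
<div>
<ul>
<li class="item-0"><a href="link1.html">first item </a></li>
<li class="item-1"><a href="link2.html">second item </a></li>
<li class="item-0 active"><a href="link4.html">fourth item </a></li>
</ul>
<p class="item-0">note</p>
</div>
'''

tree = etree.HTML(fragment)

# Exact match, li elements only (the expression used in the answer above).
exact_li = tree.xpath('//li[@class="item-0"]')

# Exact match on any element name.
exact_any = tree.xpath('//*[@class="item-0"]')

# Token match: item-0 may be one of several space-separated classes.
token_any = tree.xpath(
    '//*[contains(concat(" ", normalize-space(@class), " "), " item-0 ")]'
)

print([node.tag for node in exact_li])
print([node.tag for node in exact_any])
print([node.tag for node in token_any])
```

For the hello.html in the question all three expressions select the same three li elements, so the choice only matters once other tags or multi-class attributes appear.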
(3) Find the child node named a under <li> whose href attribute is link1.html, and print the result:
```python
from lxml import etree
html = '''
<html>
<body>
<div>
<ul>
<li class="item-0"><a href="link1.html">first item </a></li>
<li class="item-1"><a href="link2.html">second item </a></li>
<li class="item-inactive"><a href="link3.html">third item </a></li>
<li class="item-0"><a href="link4.html">fourth item </a></li>
<li class="item-0"><a href="link5.html">fifth item </a></li>
</ul>
</div>
</body>
</html>
'''
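# Parse the HTML string with lxml
html_tree = etree.HTML(html)
# Find the a node that is a child of li and whose href attribute is link1.html
a_node = html_tree.xpath('//li/a[@href="link1.html"]')
# Print the result; xpath() returns a list, so take the first match
print(etree.tostring(a_node[0], encoding='utf-8').decode('utf-8'))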
```
Output:
```
<a href="link1.html">first item </a>
```
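Because `xpath()` returns a list, indexing with `[0]` raises an IndexError when nothing matches. The sketch below shows one way to guard against that, and also how the attribute value and link text might be read directly instead of serializing the whole node. The variable names are illustrative, not part of the original answer.

```python
from lxml import etree

html = '''
<html>
<body>
<div>
<ul>
<li class="item-0"><a href="link1.html">first item </a></li>
<li class="item-1"><a href="link2.html">second item </a></li>
<li class="item-inactive"><a href="link3.html">third item </a></li>
<li class="item-0"><a href="link4.html">fourth item </a></li>
<li class="item-0"><a href="link5.html">fifth item </a></li>
</ul>
</div>
</body>
</html>
'''

tree = etree.HTML(html)

# "li/a" selects only a elements that are direct children of an li.
matches = tree.xpath('//li/a[@href="link1.html"]')

if matches:
    a_node = matches[0]
    # Read the attribute and the text content of the matched node.
    print(a_node.get('href'), a_node.text.strip())
else:
    print('no matching <a> node found')

# XPath can also return attribute values and text nodes directly.
hrefs = tree.xpath('//li/a/@href')
texts = [text.strip() for text in tree.xpath('//li/a/text()')]
print(hrefs)
print(texts)
```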