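An Analysis of Basic Usage of the Python lxml Module
This article demonstrates the basic usage of the Python lxml module through examples, shared here for reference. The details are as follows: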
1 Installing lxml
Install with pip: pip install lxml
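As a quick sanity check after installing, the sketch below imports etree and prints the version tuples that lxml exposes. It assumes lxml is installed in the active environment; the variable names are illustrative only.

```python
from lxml import etree

# LXML_VERSION is the version of lxml itself; LIBXML_VERSION is the
# version of the underlying libxml2 C library in use.
lxml_version = etree.LXML_VERSION
libxml_version = etree.LIBXML_VERSION

print("lxml:", ".".join(str(part) for part in lxml_version))
print("libxml2:", ".".join(str(part) for part in libxml_version))
```

If the import fails with ModuleNotFoundError, pip most likely installed lxml into a different interpreter than the one running the script.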
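2 Using lxml
2.1 Getting started with the lxml module
Import the etree library from lxml (the lack of an auto-completion hint for the import does not mean it cannot be used):
from lxml import etree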
Use etree.HTML to convert a string into an Element object; it accepts both bytes and str input. The Element object has an xpath method, which returns a list of results.
html = etree.HTML(text)
ret_list = html.xpath("xpath string")
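A minimal sketch of the two calls above, assuming a small made-up fragment (the `sample` string and the variable names are hypothetical, not from the article). It parses the same markup once as str and once as bytes, then runs a few XPath queries:

```python
from lxml import etree

# Hypothetical two-item fragment, used only for this illustration.
sample = '<ul><li><a href="a.html">first</a></li><li><a href="b.html">second</a></li></ul>'

# etree.HTML accepts either str or bytes input.
root_from_str = etree.HTML(sample)
root_from_bytes = etree.HTML(sample.encode("utf-8"))

# xpath() with a path expression returns a list, here of attribute values
# and of text nodes.
hrefs_str = root_from_str.xpath("//a/@href")
hrefs_bytes = root_from_bytes.xpath("//a/@href")
texts = root_from_str.xpath("//a/text()")

print(hrefs_str)    # ['a.html', 'b.html']
print(texts)        # ['first', 'second']

# A path that matches nothing yields an empty list rather than raising.
print(root_from_str.xpath("//table"))  # []
```

Because a non-matching path gives an empty list, indexing the result directly (for example `ret_list[0]`) is worth guarding when the page structure is not guaranteed.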
To convert an Element object back into a string, call etree.tostring(element), which returns a bytes result.
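The following sketch shows the bytes return type and two ways to get a str from it. The one-paragraph input is a made-up example; `encoding="unicode"` is a standard option of etree.tostring that the article itself does not mention.

```python
from lxml import etree

# Hypothetical one-paragraph fragment for illustration.
root = etree.HTML("<p>hello</p>")

# By default tostring() returns bytes.
raw = etree.tostring(root)
print(type(raw))

# Decode the bytes to get a str ...
as_text = raw.decode()

# ... or ask lxml for a str directly with encoding="unicode".
as_unicode = etree.tostring(root, encoding="unicode")
print(as_unicode)  # <html><body><p>hello</p></body></html>
```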
Suppose we have the following HTML string and want to operate on it:
<div> <ul>
<li class="item-1"><a href="link1.html" rel="external nofollow">first item</a></li>
<li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>
<li class="item-inactive"><a href="link3.html" rel="external nofollow">third item</a></li>
<li class="item-1"><a href="link4.html" rel="external nofollow">fourth item</a></li>
<li class="item-0"><a href="link5.html" rel="external nofollow">fifth item</a> # Note: the closing </li> tag is missing here
</ul> </div>
from lxml import etree
text = ''' <div> <ul>
<li class="item-1"><a href="link1.html" rel="external nofollow">first item</a></li>
<li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>
<li class="item-inactive"><a href="link3.html" rel="external nofollow">third item</a></li>
<li class="item-1"><a href="link4.html" rel="external nofollow">fourth item</a></li>
<li class="item-0"><a href="link5.html" rel="external nofollow">fifth item</a>
</ul> </div> '''
# Parse the string into an Element object
html = etree.HTML(text)
print(type(html))
# Serialize the tree to bytes, then decode to str for printing
handled_html_str = etree.tostring(html).decode()
print(handled_html_str)
Output:
<class 'lxml.etree._Element'>
<html><body><div> <ul>
<li class="item-1"><a href="link1.html" rel="external nofollow">first item</a></li>
<li class="item-1"><a href="link2.html" rel="external nofollow">second item</a></li>
<li class="item-inactive"><a href="link3.html" rel="external nofollow">third item</a></li>
<li class="item-1"><a href="link4.html" rel="external nofollow">fourth item</a></li>
<li class="item-0"><a href="link5.html" rel="external nofollow">fifth item</a>
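To check how the parser handles the unclosed list item, here is a small sketch on a shortened, hypothetical version of the list (two items, the last one missing its closing tag). It inspects the serialized result programmatically and then queries the tree with XPath; the XPath expressions are illustrative choices, not taken from the article.

```python
from lxml import etree

# Shortened, hypothetical version of the list; the last <li> is left
# unclosed on purpose.
text = '''<div> <ul>
<li class="item-1"><a href="link1.html">first item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul> </div>'''

html = etree.HTML(text)
repaired = etree.tostring(html).decode()

# The parser wraps the fragment in <html><body> and closes the open <li>,
# so the number of closing tags matches the number of list items.
print(repaired.startswith("<html><body>"))
print(len(html.xpath("//li")), repaired.count("</li>"))

# Once parsed, the tree can be queried with XPath.
item1_links = html.xpath('//li[@class="item-1"]/a/@href')
all_texts = html.xpath("//li/a/text()")
print(item1_links)
print(all_texts)
```

Since etree.HTML may alter the markup while repairing it, it can help to print the etree.tostring result once and write XPath expressions against that repaired structure rather than against the raw source.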