Analysis of the Basic Usage of the Python lxml Module
This article uses examples to describe the basic usage of the Python lxml module, shared here for your reference. Details follow:
1 Installing lxml
Install it with: pip install lxml
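To confirm the installation worked, a quick sanity check such as the following can be run. It assumes only that lxml was installed with the pip command above; `etree.LXML_VERSION` and `etree.LIBXML_VERSION` are version tuples that lxml exposes.

```python
# Sanity check after `pip install lxml`: import etree and report versions.
from lxml import etree

# LXML_VERSION is a tuple of ints for the lxml release;
# LIBXML_VERSION is the libxml2 version lxml was built against.
print("lxml version:", ".".join(map(str, etree.LXML_VERSION)))
print("libxml2 version:", ".".join(map(str, etree.LIBXML_VERSION)))
```

If the import raises `ImportError`, the package is not installed in the interpreter being used.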
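2 Using lxml
2.1 Getting started with the lxml module
Import the etree library from lxml (if your IDE shows no autocomplete hint for this import, that does not mean it cannot be used):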
from lxml import etree
Use etree.HTML to turn a string into an Element object. Element objects have an xpath method that returns its results as a list. etree.HTML accepts both bytes and str data.
html = etree.HTML(text)
ret_list = html.xpath("xpath expression")
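The claims above (str and bytes input both work, xpath returns a list) can be checked with a small sketch. The fragment `snippet` and its class names are made up purely for illustration.

```python
from lxml import etree

# A small HTML fragment, used here only for illustration.
snippet = '<ul><li class="a">one</li><li class="b">two</li></ul>'

# etree.HTML accepts both str and bytes and wraps the fragment in <html><body>.
root_from_str = etree.HTML(snippet)
root_from_bytes = etree.HTML(snippet.encode("utf-8"))

# xpath() returns a list; text() steps yield strings.
texts_str = root_from_str.xpath("//li/text()")
texts_bytes = root_from_bytes.xpath("//li/text()")
print(texts_str)                 # ['one', 'two']
print(texts_str == texts_bytes)  # True

# Selecting elements also returns a list, this time of Element objects.
items = root_from_str.xpath('//li[@class="b"]')
print(len(items), items[0].text)  # 1 two
```

Even when exactly one node matches, the result is still a list, so index it (`items[0]`) before reading `.text`.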
To convert an Element object back into a string, use etree.tostring(element); the result is of type bytes.
Suppose we have the following HTML string and want to work with it:
<div> <ul>
<li class="item-1"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-inactive"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a> <!-- note: the closing </li> tag is missing here -->
</ul> </div>
from lxml import etree
text = ''' <div> <ul>
<li class="item-1"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-inactive"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul> </div> '''
html = etree.HTML(text)
print(type(html))
# tostring() returns bytes; decode() turns them into a str for printing
handled_html_str = etree.tostring(html).decode()
print(handled_html_str)
The output is:
<class 'lxml.etree._Element'>
<html><body><div> <ul>
<li class="item-1"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-inactive"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
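The sample string deliberately leaves the fifth `<li>` unclosed, and lxml's HTML parser repairs it. The sketch below reuses the same sample to check that repair and to run a few typical XPath queries. The queries (`//li/a/@href` and so on) are illustrative choices, not part of the original article.

```python
from lxml import etree

text = ''' <div> <ul>
<li class="item-1"><a href="link1.html">first item</a></li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-inactive"><a href="link3.html">third item</a></li>
<li class="item-1"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a>
</ul> </div> '''

html = etree.HTML(text)

# The unclosed fifth <li> is still parsed as a proper element holding its <a>.
lis = html.xpath("//ul/li")
print(len(lis))                     # 5
print(lis[-1].xpath("./a/text()"))  # ['fifth item']

# Typical queries: attribute values, link text, and filtering by class.
hrefs = html.xpath("//li/a/@href")
labels = html.xpath("//li/a/text()")
item1_links = html.xpath('//li[@class="item-1"]/a/text()')
print(hrefs)        # ['link1.html', 'link2.html', 'link3.html', 'link4.html', 'link5.html']
print(item1_links)  # ['first item', 'second item', 'fourth item']

# Pair each link with its label.
for href, label in zip(hrefs, labels):
    print(href, label)
```

Zipping two parallel lists works here because every `<li>` holds exactly one `<a>`. For less regular markup, iterating over the `<li>` elements and querying each one relative to itself (`./a/@href`) is safer.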