<span class="comment-info"> <a class="" href="https://www.douban.com/people/duduxiongzhifu/">嘟嘟熊之父</a> <span>看过</span> <span class="allstar20 rating" title="较差"></span> <span class="comment-time" title="2023-06-02 14:46:24"> 2023-06-02 14:46:24 </span> <span class="comment-location">北京</span>
How can I extract the information from the a tag in this snippet?
Posted: 2023-07-19 20:23:34
You can extract the information in the a tag with either a regular expression or the BeautifulSoup library.
Using a regular expression:
```python
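import re

# The HTML fragment from the question
html = '<span class="comment-info"> <a class="" href="https://www.douban.com/people/duduxiongzhifu/">嘟嘟熊之父</a> <span>看过</span> <span class="allstar20 rating" title="较差"></span> <span class="comment-time" title="2023-06-02 14:46:24"> 2023-06-02 14:46:24 </span> <span class="comment-location">北京</span>'
# \b stops the pattern from matching other tags that start with "a" (e.g. <abbr>);
# re.DOTALL lets the captured link text span several lines
pattern = r'<a\b[^>]*>(.*?)</a>'
result = re.findall(pattern, html, re.DOTALL)
print(result)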
```
Output: ['嘟嘟熊之父']
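The pattern above captures only the link text. If the link target is needed as well, a variation with named groups can capture both. This is a minimal sketch, assuming the `href` value is wrapped in double quotes as it is in the snippet; the group names `href` and `text` and the shortened `html` string are illustrative choices, not part of the original answer.

```python
import re

# Shortened copy of the fragment from the question (only the a tag matters here)
html = '<span class="comment-info"> <a class="" href="https://www.douban.com/people/duduxiongzhifu/">嘟嘟熊之父</a> <span>看过</span></span>'

# Named groups: "href" for the link target, "text" for the visible link text.
# Assumption: the href value is double-quoted.
pattern = re.compile(
    r'<a\b[^>]*?\bhref="(?P<href>[^"]*)"[^>]*>(?P<text>.*?)</a>',
    re.DOTALL,
)

# One (href, text) tuple per a tag found in the fragment
links = [(m.group('href'), m.group('text')) for m in pattern.finditer(html)]
print(links)
```

Regular expressions like this are fine for a small, predictable fragment, but they tend to break on single-quoted attributes or nested markup, which is where a real parser is the safer choice.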
Using the BeautifulSoup library:
```python
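from bs4 import BeautifulSoup

# The HTML fragment from the question
html = '<span class="comment-info"> <a class="" href="https://www.douban.com/people/duduxiongzhifu/">嘟嘟熊之父</a> <span>看过</span> <span class="allstar20 rating" title="较差"></span> <span class="comment-time" title="2023-06-02 14:46:24"> 2023-06-02 14:46:24 </span> <span class="comment-location">北京</span>'
soup = BeautifulSoup(html, 'html.parser')
# find() returns the first a tag, or None when the fragment contains no link
a_tag = soup.find('a')
# get_text() still works when the link holds several child nodes, where .string returns None
print(a_tag.get_text(strip=True))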
```
Output: 嘟嘟熊之父
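The link target is usually wanted along with the text. With BeautifulSoup, the attribute can be read from the tag itself, for example `a_tag.get('href')`. If adding a third-party dependency is not an option, the standard library's `html.parser` can probably cover a simple case like this one. The sketch below is one possible approach rather than the original answer's method; the class name `LinkCollector` and its attributes are hypothetical, and the `html` string is a shortened copy of the fragment.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects an (href, text) pair for every a tag in the input."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished (href, text) pairs
        self._href = None     # href of the a tag currently open
        self._chunks = []     # text pieces seen inside that tag
        self._inside = False  # True while between <a> and </a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._inside = True
            self._href = dict(attrs).get('href')
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._inside:
            text = ''.join(self._chunks).strip()
            self.links.append((self._href, text))
            self._inside = False


# Shortened copy of the fragment from the question
html = '<span class="comment-info"> <a class="" href="https://www.douban.com/people/duduxiongzhifu/">嘟嘟熊之父</a> <span>看过</span></span>'

parser = LinkCollector()
parser.feed(html)
parser.close()
print(parser.links)
```

This version needs more code than the BeautifulSoup one, so it mainly makes sense when the script has to run without extra packages installed.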