<span class="comment-info"> <a class="" href="https://www.douban.com/people/duduxiongzhifu/">嘟嘟熊之父</a> <span>看过</span> <span class="allstar20 rating" title="较差"></span> <span class="comment-time" title="2023-06-02 14:46:24"> 2023-06-02 14:46:24 </span> <span class="comment-location">北京</span> How can I extract the information in the a tag from this snippet?
Posted: 2023-07-19 18:23:34 Views: 171
You can extract the information in the a tag with either a regular expression or the BeautifulSoup library.
Using a regular expression:
```python
import re

# HTML fragment from the question
html = '<span class="comment-info"> <a class="" href="https://www.douban.com/people/duduxiongzhifu/">嘟嘟熊之父</a> <span>看过</span> <span class="allstar20 rating" title="较差"></span> <span class="comment-time" title="2023-06-02 14:46:24"> 2023-06-02 14:46:24 </span> <span class="comment-location">北京</span>'
# Match an <a ...> opening tag and capture the text up to the closing </a>.
# \b stops the pattern from also matching tags such as <abbr> or <article>.
pattern = r'<a\b[^>]*>(.*?)</a>'
result = re.findall(pattern, html)
print(result)
```
Output: ['嘟嘟熊之父']
Using the BeautifulSoup library:
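The question asks for the information in the a tag, which may include the link target as well as the visible text. As a rough sketch, a second capture group can pull out the `href` value in the same pass. The name `link_pattern` is just illustrative, the HTML is a shortened copy of the fragment above, and the pattern assumes the attribute is written with double quotes; regular expressions remain fragile on real-world HTML.

```python
import re

# Shortened copy of the fragment from the question
html = '<span class="comment-info"> <a class="" href="https://www.douban.com/people/duduxiongzhifu/">嘟嘟熊之父</a> <span>看过</span></span>'

# Group 1: value of a double-quoted href attribute (assumption: double quotes are used)
# Group 2: text between the opening and closing a tags
link_pattern = re.compile(r'<a\b[^>]*\bhref="([^"]*)"[^>]*>(.*?)</a>', re.S)

# findall returns one (href, text) tuple per matched link
links = link_pattern.findall(html)
for href, text in links:
    print(href, text.strip())
```

Each match comes back as an `(href, text)` tuple, so the same loop works unchanged when the page contains several links.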
```python
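from bs4 import BeautifulSoup

# HTML fragment from the question
html = '<span class="comment-info"> <a class="" href="https://www.douban.com/people/duduxiongzhifu/">嘟嘟熊之父</a> <span>看过</span> <span class="allstar20 rating" title="较差"></span> <span class="comment-time" title="2023-06-02 14:46:24"> 2023-06-02 14:46:24 </span> <span class="comment-location">北京</span>'
# Parse the fragment with Python's built-in html.parser backend
soup = BeautifulSoup(html, 'html.parser')
# find() returns the first <a> tag, or None if the document has none
a_tag = soup.find('a')
# .string is the tag's text when it has exactly one text child
print(a_tag.string)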
```
Output: 嘟嘟熊之父
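If installing BeautifulSoup is not an option, the standard library's `html.parser` module (the backend named in the `BeautifulSoup(...)` call above) can also be used directly. The following is only a minimal sketch of that alternative, not part of the original answer: `LinkCollector` is a hypothetical helper name, the HTML is a shortened copy of the fragment from the question, and nested a tags are not handled.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (href, text) for every a tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []        # collected (href, text) tuples
        self._in_a = False     # True while inside an open a tag
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._in_a = True
            self._href = dict(attrs).get('href')  # None if there is no href
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an a tag
        if self._in_a:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._in_a:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._in_a = False


# Shortened copy of the fragment from the question
html = '<span class="comment-info"> <a class="" href="https://www.douban.com/people/duduxiongzhifu/">嘟嘟熊之父</a> <span>看过</span></span>'

collector = LinkCollector()
collector.feed(html)
collector.close()
print(collector.links)
```

This takes more code than the BeautifulSoup version, but it has no third-party dependency and returns the link target together with the text.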