How to use soup.find_all (often miswritten as soup.findall)

Posted: 2024-06-09 13:09:42

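`soup.find_all()` is a method in the BeautifulSoup library (`bs4`) that searches an HTML or XML document for every tag or text node matching the given filters and returns the matches as a list (a `ResultSet`). Note the underscore: the method is `find_all`, not `findall`. BeautifulSoup treats an unknown attribute such as `soup.findall` as a lookup for a tag named `findall`, which returns `None`, so calling `soup.findall('a')` fails with `TypeError: 'NoneType' object is not callable`. The older camelCase spelling `findAll` is a legacy alias for the same method. For example, to find all `<a>` tags:

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
<body>
<a href="https://www.google.com">Google</a>
<a href="https://www.baidu.com">Baidu</a>
<a href="https://www.bing.com">Bing</a>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
links = soup.find_all('a')  # find every <a> tag
for link in links:
    print(link['href'])  # print each link's href attribute
```

Output:

```
https://www.google.com
https://www.baidu.com
https://www.bing.com
```

The first argument of `soup.find_all()` filters on the tag name and can be a string, a compiled regular expression, a list, a function, or `True`. Attribute filters go in the second argument (`attrs`) or in keyword arguments. For example, to find every `<div>` tag whose `class` attribute is `content` (the sample document above contains no `<div>`, so there the result is an empty list):

```python
divs = soup.find_all('div', {'class': 'content'})
```

Or use a regular expression to find every tag whose name starts with `h` (in the sample document, only `<html>`):

```python
import re

tags = soup.find_all(re.compile('^h'))
```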
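The filters described above can be tried together on one small document. The following is a minimal sketch, not a definitive reference: `sample_html` and all variable names are made up for illustration, and it assumes the `bs4` package is installed. It also shows the `class_` and `limit` keyword arguments of `find_all()`, which the text above does not cover.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample document, invented for this illustration.
sample_html = """
<html><body>
<h1>Search engines</h1>
<div class="content"><a href="https://www.google.com">Google</a></div>
<div class="content extra"><a href="https://www.baidu.com">Baidu</a></div>
<div class="footer"><a href="https://www.bing.com">Bing</a></div>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Attribute filter as a dict: <div> tags whose class list includes "content".
content_divs = soup.find_all("div", {"class": "content"})

# The same filter via the class_ keyword ("class" is reserved in Python).
content_divs_kw = soup.find_all("div", class_="content")

# Regular expression on the tag name: names starting with "h".
h_tags = [tag.name for tag in soup.find_all(re.compile("^h"))]

# limit stops the search after the given number of matches.
first_two_links = [a["href"] for a in soup.find_all("a", limit=2)]

print(len(content_divs), len(content_divs_kw))
print(h_tags)
print(first_two_links)
```

Because `class` is a multi-valued attribute, filtering on `content` also matches the `<div>` with `class="content extra"`, so both class-based searches should return two tags here. The regular expression matches `html` and `h1`, and `limit=2` returns only the first two links in document order.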
