
BeautifulSoup's find_all

Date: 2023-11-30 20:02:36 Views: 94

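`find_all()` is a BeautifulSoup method that finds every tag in an HTML document matching the given conditions. Its signature is:

```python
find_all(name=None, attrs={}, recursive=True, string=None, limit=None, **kwargs)
```

The parameters mean the following:

- `name`: a tag name or a list of tag names, for example `'a'` or `['a', 'div']`. A compiled regular expression, a function, or `True` (match every tag) is also accepted.
- `attrs`: a dictionary of attribute names and values, for example `{'class': 'item', 'id': 'title'}`.
- `recursive`: whether to search all descendants. The default is `True`; with `False`, only direct children are examined.
- `string`: the text content to match, given as a string such as `'hello'` or `'world'`, or as a compiled regular expression such as `re.compile(r'^\d+$')`. A plain string like `'^\d+$'` is compared literally, not as a pattern. Older versions of Beautiful Soup call this parameter `text`.
- `limit`: the maximum number of results to return. For example, `10` returns only the first 10 matches.
- `**kwargs`: any other keyword argument is treated as an attribute filter, for example `id='title'` or `class_='item'` (with a trailing underscore, because `class` is a reserved word in Python).

`find_all()` returns a list-like `ResultSet` containing every matching tag. For example, the following code finds all `<a>` tags and prints their text:

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
<head><title>Test document</title></head>
<body>
<a href="https://www.baidu.com">Baidu Search</a>
<a href="https://www.google.com">Google Search</a>
</body>
</html>
"""

# Parse the document with Python's built-in HTML parser
soup = BeautifulSoup(html_doc, 'html.parser')

# Collect every <a> tag in the document
links = soup.find_all('a')
for link in links:
    print(link.text)
```

The output is:

```
Baidu Search
Google Search
```

Here `find_all('a')` finds all `<a>` tags and returns a list containing two of them. The `for` loop then walks that list and prints each tag's `text` attribute, which is the tag's text content.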
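The remaining parameters can be tried out in the same way. The following is only a minimal sketch: the HTML snippet, the class names, and the helper `links_to_google` are made up for illustration, and it assumes Beautiful Soup 4.4.0 or later, where the text filter is spelled `string`.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample document, invented for this sketch
html_doc = """
<html>
<body>
<div id="nav">
  <a class="item" href="https://www.baidu.com">Baidu Search</a>
  <a class="item external" href="https://www.google.com">Google Search</a>
  <span>2023</span>
</div>
<p>hello</p>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# attrs dictionary and keyword argument are two ways to filter on attributes.
# A tag matches class "item" if "item" is any one of its classes.
by_attrs = soup.find_all("a", attrs={"class": "item"})
by_kwarg = soup.find_all("a", class_="item")

# string with a compiled regex: with no tag name given, the matching
# text nodes themselves are returned rather than tags.
digits = soup.find_all(string=re.compile(r"^\d+$"))

# limit stops the search after the first N matches.
first_link = soup.find_all("a", limit=1)

# recursive=False only looks at direct children of the tag it is called on.
direct_links = soup.body.find_all("a", recursive=False)
direct_children = [tag.name for tag in soup.body.find_all(True, recursive=False)]

# A function can serve as the filter: it receives each tag and returns True/False.
def links_to_google(tag):
    return tag.name == "a" and "google" in tag.get("href", "")

google_links = soup.find_all(links_to_google)

print(len(by_attrs), len(by_kwarg))
print([str(s) for s in digits])
print(first_link[0]["href"])
print(len(direct_links), direct_children)
print([tag["href"] for tag in google_links])
```

Because the `<a>` tags sit inside the `<div>`, the `recursive=False` search on `<body>` finds none of them, while the default recursive search finds both.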
