What does list1 = soup.find_all mean?
Posted: 2024-06-03 12:10:27
`list1 = soup.find_all()` calls Beautiful Soup's `find_all()` method, which searches the parsed document for HTML tags matching the given criteria and returns a list of the matching elements. The most common arguments are the tag name, a dict of attributes, and the text/string to match; called with no arguments, `find_all()` returns every tag in the document. The result is a list-like `ResultSet` that you can iterate over.
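A minimal sketch of those calling styles (the HTML snippet here is just a made-up example):

```python
from bs4 import BeautifulSoup

html = "<div><a href='/a'>One</a><p>text</p><a href='/b'>Two</a></div>"
soup = BeautifulSoup(html, "html.parser")

links = soup.find_all("a")            # by tag name: both <a> tags
texts = [a.text for a in links]       # iterate the ResultSet like a list
two_links = soup.find_all("a", string="Two")  # tag name plus text to match
print(len(links), texts, len(two_links))
```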
Related question
data = soup.find('div', {'class': 'report-list report-list__item'}).find_all('div')[1].find_all('table')[0].find_all('tr')[1:-1] AttributeError: 'NoneType' object has no attribute 'find_all'
This error usually means that `find()` (or `find_all()`) did not find the specified tag or attribute and returned `None`, and a `None` object has no `find_all()` method.
You can check whether the object is `None` before calling `find()` or `find_all()` on it, which avoids this error. For example:
```
data_div = soup.find('div', {'class': 'report-list report-list__item'})
if data_div:
    data_table = data_div.find_all('div')[1].find_all('table')[0]
    if data_table:
        data = data_table.find_all('tr')[1:-1]
```
This way you never call `find_all()` on a `None` object.
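An alternative to nesting `if` checks is to catch the `AttributeError` directly; a minimal sketch (the HTML and class name here are placeholders, not the original page):

```python
from bs4 import BeautifulSoup

html = "<div class='other'></div>"
soup = BeautifulSoup(html, "html.parser")

try:
    # find() returns None here, so .find_all() raises AttributeError
    rows = soup.find('div', {'class': 'report-list'}).find_all('tr')
except AttributeError:
    rows = []  # fall back to an empty result when the <div> is missing
print(rows)
```

This keeps the happy path flat, at the cost of also swallowing `AttributeError`s raised for other reasons inside the `try` block.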
Modify this code and write it out:
```
import requests
from bs4 import BeautifulSoup

# Number of pages to scrape
num_pages = 5

# Fetch each page and get its HTML
for page_num in range(1, num_pages + 1):
    url = f"https://www.bernama.com/bm/index.php/archive.php?page={page_num}"
    response = requests.get(url)
    html_content = response.text

    # Parse the HTML and collect news titles and links
    soup = BeautifulSoup(html_content, "html.parser")
    news_list = soup.find_all("div", class_="row news-row")
    for news in news_list:
        title = news.find("a").text.strip()
        link = news.find("a")["href"]
        news_response = requests.get(link)
        news_html = news_response.text
        news_soup = BeautifulSoup(news_html, "html.parser")
        content_list = news_soup.find_all("div", class_="col-md-12 news-body")
        content = "\n".join([p.text for p in content_list[0].find_all("p")])
        print(content)
        # Save the title and content to a txt file
        with open(f"page{page_num}_{title}.txt", "w") as f:
            f.write(f"Title: {title}\n")
            f.write(f"Link: {link}\n")
            f.write(f"Content:\n{content}\n")
```
Here is the corrected code, with the indentation restored and the output files written as UTF-8 so non-ASCII titles don't fail on platforms with a different default encoding:
```
import requests
from bs4 import BeautifulSoup

# Number of pages to scrape
num_pages = 5

# Fetch each archive page and get its HTML
for page_num in range(1, num_pages + 1):
    url = f"https://www.bernama.com/bm/index.php/archive.php?page={page_num}"
    response = requests.get(url)
    html_content = response.text

    # Parse the HTML and collect news titles and links
    soup = BeautifulSoup(html_content, "html.parser")
    news_list = soup.find_all("div", class_="row news-row")
    for news in news_list:
        title = news.find("a").text.strip()
        link = news.find("a")["href"]

        # Fetch each article page
        news_response = requests.get(link)
        news_html = news_response.text

        # Parse the article and extract its paragraphs
        news_soup = BeautifulSoup(news_html, "html.parser")
        content_list = news_soup.find_all("div", class_="col-md-12 news-body")
        content = "\n".join([p.text for p in content_list[0].find_all("p")])

        # Save the title and content to a txt file
        with open(f"page{page_num}_{title}.txt", "w", encoding="utf-8") as f:
            f.write(f"Title: {title}\n")
            f.write(f"Link: {link}\n")
            f.write(f"Content:\n{content}\n")
```