
Scraping and saving web page images with Python

Date: 2023-10-29 12:07:43 Views: 79

Scraping web page images with Python

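You can use Python's requests and BeautifulSoup libraries to scrape the images on a web page and save them locally. First, use requests to send an HTTP request and fetch the page's HTML text. Then use BeautifulSoup to parse the HTML and find every image tag. Finally, for each image, send another HTTP request with requests, fetch the image's binary data, and write it to disk. Here is a sample:

```python
import os
from urllib.parse import urljoin, urlsplit

import requests
from bs4 import BeautifulSoup

url = 'https://www.example.com'
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# Create the folder that will hold the images
os.makedirs('images', exist_ok=True)

# Iterate over every image tag
for img in soup.find_all('img'):
    src = img.get('src')
    if not src:
        continue  # skip <img> tags that have no src attribute
    # Resolve relative URLs against the page URL
    img_url = urljoin(url, src)
    if not img_url.startswith('http'):
        continue  # skip data: URIs and other non-HTTP schemes

    # Send an HTTP request to fetch the image's binary data
    img_response = requests.get(img_url, timeout=10)
    content_type = img_response.headers.get('Content-Type', '')
    if not img_response.ok or 'image' not in content_type:
        continue

    # Save the image locally, named after the basename of the URL path
    name = os.path.basename(urlsplit(img_url).path)
    if not name:
        continue
    filename = os.path.join('images', name)
    with open(filename, 'wb') as f:
        f.write(img_response.content)
```

In the code above, we first define the URL of the page to scrape, then use requests to send an HTTP request and fetch the page's HTML text. Next, BeautifulSoup parses the HTML and finds all the image tags. We iterate over those tags and read each image's URL; if it is not a complete URL, it is resolved against the page URL. Tags with no `src`, non-HTTP URLs, failed requests, and responses whose `Content-Type` is not an image are skipped. For the rest, we fetch the binary data with requests and save it in a folder named `images`, using the basename of the image URL's path (that is, the image's file name) as the local file name.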
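The two steps that are easiest to get wrong are turning an `src` value into a complete URL and turning that URL into a local file name. The sketch below isolates both so they can be checked without any network access. It assumes the standard-library `urllib.parse` module is acceptable here; the helper names `resolve_image_url` and `filename_from_url`, the `fallback` parameter, and the sample `src` values are all hypothetical and not part of the original answer.

```python
import os
from urllib.parse import urljoin, urlsplit


def resolve_image_url(page_url, src):
    """Turn an <img> src value into an absolute URL relative to page_url."""
    return urljoin(page_url, src)


def filename_from_url(img_url, fallback="image"):
    """Derive a local file name from the URL path, ignoring query and fragment."""
    name = os.path.basename(urlsplit(img_url).path)
    # Fall back to a placeholder when the path ends in "/" or is empty
    return name or fallback


# Hypothetical page URL and src values, for illustration only
page = "https://www.example.com/gallery/index.html"
samples = [
    "logo.png",                        # relative to the page's directory
    "/static/a.jpg?v=3",               # root-relative, with a query string
    "//cdn.example.com/b.gif",         # protocol-relative
    "https://img.example.org/c.webp",  # already absolute
]

for src in samples:
    absolute = resolve_image_url(page, src)
    print(absolute, "->", filename_from_url(absolute))
```

Joining with `urljoin` handles root-relative and protocol-relative paths, which plain string concatenation of the page URL and `src` would get wrong. Taking the basename of the URL path alone keeps query strings such as `?v=3` out of the saved file name.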
