Please write a capable Python crawler that scrapes the images from https://pic.netbian.com/index.html and displays the crawl results.
Posted: 2024-10-12 17:13:11
In Python, we can use the requests library to fetch the page, BeautifulSoup to parse the HTML, and then collect the image links from the `img` tags. Below is a simple example showing how to crawl and save the images with BeautifulSoup:
```python
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Fetch the target page (a browser-like User-Agent helps avoid being blocked)
url = "https://pic.netbian.com/index.html"
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers)
response.raise_for_status()
content = response.text

# Parse the HTML with BeautifulSoup
soup = BeautifulSoup(content, 'html.parser')

# Find all img tags and collect their src attributes
img_tags = soup.find_all('img')
image_urls = [img['src'] for img in img_tags if img.get('src')]

# Create the output directory if it does not exist
os.makedirs('downloaded_images', exist_ok=True)

# Download and save each image
for i, src in enumerate(image_urls):
    # Resolve relative paths (e.g. /uploads/...) against the page URL
    full_url = urljoin(url, src)
    try:
        with requests.get(full_url, headers=headers, stream=True) as img_request:
            img_request.raise_for_status()  # abort on HTTP errors
            filename = f'downloaded_images/image_{i+1}.jpg'
            with open(filename, 'wb') as f:
                # Download in chunks to avoid loading large files into memory
                for chunk in img_request.iter_content(chunk_size=8192):
                    f.write(chunk)
    except requests.RequestException as e:
        print(f"Error downloading image {full_url}: {e}")

print("Images saved to the downloaded_images folder.")
```
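The question also asks to display the crawl results. A minimal sketch of one way to do that, assuming the `downloaded_images` folder name from the answer above (the `summarize_downloads` helper is hypothetical, not part of any library):

```python
import os

def summarize_downloads(folder: str) -> list[tuple[str, int]]:
    """Collect (filename, size in bytes) for each file in the download folder."""
    results = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            results.append((name, os.path.getsize(path)))
    return results

if __name__ == "__main__":
    folder = "downloaded_images"  # assumed output folder from the crawler above
    if os.path.isdir(folder):
        entries = summarize_downloads(folder)
        for name, size in entries:
            print(f"{name}: {size} bytes")
        print(f"Total: {len(entries)} images")
    else:
        print("No downloads found; run the crawler first.")
```

Run it after the crawler finishes to get a per-file size listing and a total count.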