Write Python code that uses XPath to parse the image paths, then crawls and saves all images on this page: https://pic.netbian.com/4kdongman/

Time: 2024-11-03 20:22:22 Views: 29

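In Python, the requests library can fetch the page and the lxml library, which has built-in XPath support, can parse it. BeautifulSoup can use lxml as its parser, but a BeautifulSoup object has no `xpath()` method, so the XPath query is run on an lxml tree instead. The example below extracts the image paths from the given URL and downloads the images to a local directory:

```python
import os
from urllib.parse import urljoin

import requests
from lxml import etree

# Target listing page
url = "https://pic.netbian.com/4kdongman/"

# Fetch the page. The browser-like User-Agent is an assumption:
# some sites reject the default requests User-Agent.
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()

# Parse the HTML with lxml, which supports XPath
tree = etree.HTML(response.content)

# XPath expression selecting the src attribute of every <img> tag
xpath_expression = "//img/@src"
image_paths = tree.xpath(xpath_expression)

# Create the output directory if it does not exist
os.makedirs("images", exist_ok=True)

downloaded = 0
for src in image_paths:
    # Resolve relative paths against the page URL
    img_url = urljoin(url, src)
    file_name = os.path.join("images", img_url.split("/")[-1])
    # Stream the image to disk in chunks
    with requests.get(img_url, headers=headers, stream=True, timeout=10) as img_request:
        if img_request.status_code != 200:
            continue
        with open(file_name, "wb") as f:
            for chunk in img_request.iter_content(chunk_size=1024):
                if chunk:  # filter out keep-alive chunks
                    f.write(chunk)
    downloaded += 1

# Result
print(f"Downloaded {downloaded} images to the 'images' directory.")
```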
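
The `src` attribute of an `<img>` tag is often a relative or site-relative path, so simply concatenating it onto the page URL can produce a wrong address. As a minimal sketch, the helper below (`resolve_image` is a hypothetical name, and the sample `src` values are made up for illustration, not taken from the real page) uses only the standard library to turn a `src` value into an absolute URL and a local file name:

```python
import posixpath
from urllib.parse import urljoin, urlsplit

PAGE_URL = "https://pic.netbian.com/4kdongman/"


def resolve_image(page_url, src):
    """Return (absolute_url, file_name) for an <img> src found on page_url."""
    # urljoin handles site-relative, page-relative, and already absolute values
    absolute = urljoin(page_url, src)
    # Take the file name from the URL path only, so a query string is ignored
    file_name = posixpath.basename(urlsplit(absolute).path)
    return absolute, file_name


# Hypothetical src values, for illustration only
for src in ["/uploads/sample/a.jpg", "thumbs/b.jpg?v=2", "https://example.com/c.png"]:
    print(resolve_image(PAGE_URL, src))
```

A site-relative path such as `/uploads/sample/a.jpg` resolves against the host rather than the `/4kdongman/` directory, which plain string concatenation would get wrong.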
