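Use the urllib library to crawl all images from https://www.jb51.net/article/220663.htm and download them directly into the zhangsanxx folder under the current project, where zhangsan is your name and xx is your class student number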

Time: 2024-11-03 19:14:14 Views: 49

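In Python, the images on a web page can be crawled and saved with the `urllib` library together with `BeautifulSoup`. `urllib` ships with the standard library, so only `BeautifulSoup` needs to be installed; if it is missing, run `pip install beautifulsoup4`. The steps are:

1. Import the necessary libraries:

```python
import os
import urllib.request as req
from urllib.parse import urljoin  # resolves relative image links
from bs4 import BeautifulSoup
```

2. Define the URL to crawl and the target folder path:

```python
url = "https://www.jb51.net/article/220663.htm"
image_folder = "./zhangsan" + str(123)  # 123 stands in for your class student number; example only
```

3. Make sure the folder exists, creating it if it does not:

```python
if not os.path.exists(image_folder):
    os.makedirs(image_folder)
```

4. Fetch the page content with `urllib` and parse the HTML:

```python
response = req.urlopen(url)
soup = BeautifulSoup(response.read(), 'html.parser')
```

5. Find all `img` tags and extract the `src` attribute (the image link). Tags without a `src` are skipped, and relative links are resolved against the page URL:

```python
img_tags = soup.find_all('img')
image_urls = [urljoin(url, img['src']) for img in img_tags if img.get('src')]
```

6. Download each image link and save it into the target folder:

```python
for image_url in image_urls:
    file_name = os.path.join(image_folder, os.path.basename(image_url))
    with open(file_name, 'wb') as f:
        f.write(req.urlopen(image_url).read())
```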
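The steps can also be combined into one script that depends only on the standard library. The following is a minimal sketch, not a definitive implementation: it swaps `BeautifulSoup` for the built-in `html.parser.HTMLParser`, and the names `ImgSrcParser`, `extract_image_urls`, `fetch`, and `download_images` are hypothetical helpers introduced here for illustration. It also assumes the page is UTF-8 encoded and that the site may want a browser-like `User-Agent` header; neither assumption is confirmed for this site.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ImgSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_urls(html, base_url):
    """Return absolute http(s) image URLs found in html, without duplicates."""
    parser = ImgSrcParser()
    parser.feed(html)
    urls = []
    for src in parser.sources:
        absolute = urljoin(base_url, src)  # handles relative and //host links
        if urlparse(absolute).scheme in ("http", "https") and absolute not in urls:
            urls.append(absolute)
    return urls


def fetch(url):
    """Download url and return raw bytes (the User-Agent value is an assumption)."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read()


def download_images(page_url, folder):
    """Save every image referenced by page_url into folder."""
    os.makedirs(folder, exist_ok=True)
    html = fetch(page_url).decode("utf-8", errors="replace")  # assumes UTF-8
    for index, image_url in enumerate(extract_image_urls(html, page_url)):
        # Use only the path part so query strings do not end up in file names.
        name = os.path.basename(urlparse(image_url).path) or f"image_{index}"
        try:
            data = fetch(image_url)
        except OSError as error:  # URLError and timeouts are OSError subclasses
            print(f"skipped {image_url}: {error}")
            continue
        with open(os.path.join(folder, name), "wb") as f:
            f.write(data)
```

Calling `download_images("https://www.jb51.net/article/220663.htm", "./zhangsan123")` would then run the whole pipeline, with `zhangsan123` replaced by your own name and student number. Keeping link extraction in its own function means it can be checked on a small HTML string without touching the network.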
