Level 2: Scraping and Downloading Training Images from a Website
Posted: 2024-06-09 16:09:44 · Views: 13
This is an interesting task. First, we need to decide which website to scrape and which images to collect. Suppose we want landscape photos from Unsplash: we can use Python's requests and Beautiful Soup modules to fetch and parse the page, and the urllib module to download the images.
The steps are as follows:
1. Import the required modules:
```python
import requests
from bs4 import BeautifulSoup
import os
import urllib.request
```
2. Fetch the page content with requests:
```python
url = 'https://unsplash.com/search/photos/landscape'
headers = {'User-Agent': 'Mozilla/5.0'}  # some sites reject Python's default User-Agent
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # fail fast on HTTP errors
```
3. Parse the page with Beautiful Soup and collect the image links:
```python
soup = BeautifulSoup(response.text, 'html.parser')
img_tags = soup.find_all('img')
urls = []
for img in img_tags:
    src = img.get('src')  # .get() returns None instead of raising KeyError
    if src and src.startswith('http'):  # skip data: URIs and relative paths
        urls.append(src)
```
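The default `src` on a page like this often points at a small thumbnail, while the `srcset` attribute (when present) lists several resolutions. If you prefer the widest variant, a small parser can be sketched; the function name `largest_from_srcset` is my own, not part of the exercise:

```python
def largest_from_srcset(srcset):
    """Return the URL with the largest width descriptor from a srcset string.

    Entries look like 'https://... 400w, https://... 800w'; entries without
    a width descriptor are treated as width 0.
    """
    best_url, best_w = None, -1
    for entry in srcset.split(','):
        parts = entry.split()
        if not parts:
            continue
        url = parts[0]
        w = 0
        if len(parts) > 1 and parts[1].endswith('w'):
            try:
                w = int(parts[1][:-1])
            except ValueError:
                pass
        if w > best_w:
            best_url, best_w = url, w
    return best_url
```

You would call it as `largest_from_srcset(img.get('srcset'))` when `img.get('srcset')` is not None, falling back to `src` otherwise.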
4. Create a folder to save the images:
```python
if not os.path.exists('landscape'):
    os.makedirs('landscape')
```
5. Download the images with urllib:
```python
for url in urls:
    # strip any query string so it does not end up in the filename
    filename = os.path.join('landscape', url.split('/')[-1].split('?')[0])
    try:
        urllib.request.urlretrieve(url, filename)
        print('Success:', url)
    except OSError as e:  # a bare except would also swallow KeyboardInterrupt
        print('Failed:', url, e)
```
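Note that `urlretrieve` sends Python's default User-Agent, which some image CDNs reject. A minimal alternative sketch that sets an explicit header via `urllib.request.Request` and derives a clean filename; `safe_filename` and `download_image` are illustrative names of my own, not part of the exercise:

```python
import os
import urllib.request
from urllib.parse import urlparse

def safe_filename(url, default_ext='.jpg'):
    """Derive a filesystem-friendly name from a URL, ignoring any query string."""
    name = os.path.basename(urlparse(url).path)
    if not name:
        name = 'image'
    if '.' not in name:
        name += default_ext  # assume JPEG when the URL carries no extension
    return name

def download_image(url, folder='landscape'):
    """Download one image with a browser-like User-Agent; return the saved path."""
    os.makedirs(folder, exist_ok=True)
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    path = os.path.join(folder, safe_filename(url))
    with urllib.request.urlopen(req, timeout=10) as resp, open(path, 'wb') as f:
        f.write(resp.read())
    return path
```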
The complete code:
```python
import requests
from bs4 import BeautifulSoup
import os
import urllib.request

url = 'https://unsplash.com/search/photos/landscape'
headers = {'User-Agent': 'Mozilla/5.0'}  # some sites reject Python's default User-Agent
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, 'html.parser')
urls = []
for img in soup.find_all('img'):
    src = img.get('src')
    if src and src.startswith('http'):
        urls.append(src)

if not os.path.exists('landscape'):
    os.makedirs('landscape')

for url in urls:
    filename = os.path.join('landscape', url.split('/')[-1].split('?')[0])
    try:
        urllib.request.urlretrieve(url, filename)
        print('Success:', url)
    except OSError as e:
        print('Failed:', url, e)
```
After running the code, the program downloads the landscape images it finds in the page's static HTML and saves them to a folder named "landscape".
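One caveat: Unsplash renders its search results largely with JavaScript, so the static HTML fetched above may only expose a subset of image tags, often thumbnails. Unsplash also publishes an official JSON API at `https://api.unsplash.com/search/photos`, authenticated with a Client-ID access key. A minimal sketch, assuming you have registered an app at unsplash.com/developers and hold a key (`ACCESS_KEY` below is a placeholder):

```python
import json
import urllib.parse
import urllib.request

ACCESS_KEY = 'YOUR_ACCESS_KEY'  # placeholder: obtain one by registering an app with Unsplash

def build_search_url(query, page=1, per_page=10):
    """Construct the search endpoint URL with percent-encoded query parameters."""
    params = urllib.parse.urlencode({'query': query, 'page': page, 'per_page': per_page})
    return 'https://api.unsplash.com/search/photos?' + params

def search_photos(query):
    """Call the Unsplash search API; the response JSON carries a 'results' list."""
    req = urllib.request.Request(
        build_search_url(query),
        headers={'Authorization': 'Client-ID ' + ACCESS_KEY},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

Each photo object in `results` includes download URLs at several sizes, which avoids HTML parsing entirely.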