Scraping 2,000 Husky Images with Python
Date: 2023-10-29 08:07:49
You can scrape the images with Python's requests and BeautifulSoup libraries. The steps are as follows:
1. Import the requests and BeautifulSoup libraries
```python
import requests
from bs4 import BeautifulSoup
```
2. Define a function that collects image URLs
```python
def get_image_urls(keyword):
    urls = []
    # Google Images returns 20 results per page; step through the pages
    # until 2000 results have been requested.
    for i in range(0, 2000, 20):
        url = 'https://www.google.com/search?q={}&rlz=1C1GCEU_zh-CNCN832CN832&source=lnms&tbm=isch&sa=X&ved=0ahUKEwiJ6KHJwM_iAhWJyDgGHQICD3QQ_AUIDigB&biw=1366&bih=657&start={}'.format(keyword, i)
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.text, 'html.parser')
        for img in soup.find_all('img'):
            src = img.get('src')
            if src:  # skip <img> tags that have no src attribute
                urls.append(src)
    return urls
```
This function fetches image links for a keyword, 20 per results page, paging until 2,000 images have been requested. Note that Google renders much of its image-search page with JavaScript, so a plain requests fetch may only surface low-resolution thumbnail URLs.
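In practice, the scraped `src` values often include base64 `data:` URIs (inline thumbnails) that requests cannot download. A minimal sketch of a cleanup step, assuming we only want direct HTTP(S) links (the helper name `filter_valid_urls` is my own, not part of the original answer):

```python
def filter_valid_urls(urls):
    """Keep only entries that look like downloadable HTTP(S) links.

    Google result pages often embed thumbnails as base64 'data:' URIs
    and may yield None entries; both are dropped here.
    """
    return [u for u in urls if u and u.startswith(('http://', 'https://'))]
```

This could be applied between the two steps, e.g. `urls = filter_valid_urls(get_image_urls(keyword))`.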
3. Define a function that downloads the images
```python
def download_images(urls):
    for i, url in enumerate(urls):
        response = requests.get(url)
        # Save each image as image1.jpg, image2.jpg, ...
        with open('image{}.jpg'.format(i + 1), 'wb') as f:
            f.write(response.content)
```
This function downloads each image from its link and saves it locally as 'image&lt;number&gt;.jpg'.
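Saving every file as `.jpg` can mislabel PNGs or GIFs. One possible refinement, sketched here under my own assumptions (the `EXTENSIONS` mapping and `extension_for` helper are not from the original answer), is to choose the extension from the response's `Content-Type` header:

```python
# Partial map of common image Content-Type values to file extensions.
EXTENSIONS = {
    'image/jpeg': '.jpg',
    'image/png': '.png',
    'image/gif': '.gif',
    'image/webp': '.webp',
}

def extension_for(content_type):
    # Strip any parameters such as '; charset=...' before the lookup.
    base = content_type.split(';')[0].strip().lower()
    return EXTENSIONS.get(base, '.jpg')  # fall back to .jpg when unknown
```

Inside `download_images`, one could then use `ext = extension_for(response.headers.get('Content-Type', ''))` when building the filename.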
4. Call the functions to scrape and download the images
```python
if __name__ == '__main__':
    keyword = '哈士奇'  # 'husky' in Chinese, used as the search term
    urls = get_image_urls(keyword)
    download_images(urls)
```
The code above scrapes and downloads up to 2,000 husky images into the current directory.
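Because consecutive result pages can repeat the same thumbnails, the collected list may contain duplicates and more (or fewer) than 2,000 entries. A small sketch of a deduplication-and-cap step (the helper name `unique_capped` is my own; it relies on Python 3.7+ dicts preserving insertion order):

```python
def unique_capped(urls, limit=2000):
    # dict.fromkeys removes duplicates while preserving first-seen order
    # (guaranteed for dicts since Python 3.7), then the list is capped.
    return list(dict.fromkeys(urls))[:limit]
```

This could be applied before downloading, e.g. `download_images(unique_capped(urls))`.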