Python code to scrape images from Baidu by keyword
Time: 2023-07-22 17:03:39 Views: 109
Below is a sample script written in Python that scrapes images from Baidu image search for a given keyword:
```python
import os

import requests


def download_images(keyword, num_images):
    # Assumption: this endpoint answers with JSON that has a top-level 'data' list.
    # If it serves an HTML page instead, resp.json() below raises an error.
    url = 'http://image.baidu.com/search/flip'
    params = {
        'tn': 'baiduimage',
        'word': keyword,
        'pn': 0,   # result offset, updated for each page
        'rn': 30,  # results per page
        'gsm': '3c',
    }
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
        'Referer': 'http://image.baidu.com',
    }
    # Create the output folder; no error if it already exists.
    os.makedirs(keyword, exist_ok=True)

    for i in range(0, num_images, 30):
        params['pn'] = i
        resp = requests.get(url, params=params, headers=headers, timeout=10)
        resp.raise_for_status()
        data = resp.json()
        for j, img in enumerate(data['data']):
            if i + j >= num_images:
                break  # stop once the requested number of images is reached
            try:
                img_url = img['objURL']
                img_resp = requests.get(img_url, timeout=10)
                img_resp.raise_for_status()
                with open(os.path.join(keyword, f'{i + j}.jpg'), 'wb') as f:
                    f.write(img_resp.content)
                print(f'Successfully downloaded {i + j}.jpg')
            except Exception as e:
                # Skip entries that are missing 'objURL' or fail to download.
                print(f'Error occurred while downloading {i + j}.jpg: {e}')


if __name__ == '__main__':
    keyword = input('Enter keyword: ')
    num_images = int(input('Enter number of images to download: '))
    download_images(keyword, num_images)
```
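Note that this code relies on the third-party requests library to send HTTP requests, so install it before running the script (for example with `pip install requests`). The os module ships with Python's standard library and needs no installation, and the JSON response is parsed by `resp.json()`, so the json module is never imported.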
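If you want to confirm the dependency is present before running the script, a small check along these lines may help. This is only a sketch: `missing_modules` is a hypothetical helper name, not part of the original script, and it only handles top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found.

    Hypothetical helper for illustration; it does not import the
    modules, it only asks the import system whether they exist.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_modules(['requests', 'os'])
    if missing:
        print('Please install:', ', '.join(missing))
    else:
        print('All required modules are available.')
```

Because `os` is part of the standard library, only `requests` can realistically show up as missing here.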
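To use the script, enter the search keyword and the number of images to download; it then fetches images from Baidu image search and saves them in a folder named after the keyword. Scraping images may raise copyright issues, so make sure you comply with the applicable laws and regulations.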
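The two inputs map onto paging offsets and a folder name. The sketch below shows one way that mapping could be made explicit, assuming the page size of 30 used by the `rn` parameter. The helper names `page_offsets` and `safe_dir_name` are hypothetical and not part of the original script; the character filter is an assumption about what common file systems reject.

```python
import re

PAGE_SIZE = 30  # assumed to match the 'rn' value used in the script


def page_offsets(num_images, page_size=PAGE_SIZE):
    """Return the 'pn' offsets needed to cover num_images results."""
    return list(range(0, max(num_images, 0), page_size))


def safe_dir_name(keyword):
    """Replace characters that many file systems reject in folder names."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', '_', keyword).strip(' .')
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or 'images'


if __name__ == '__main__':
    print(page_offsets(70))           # offsets for three pages of 30
    print(safe_dir_name('cat/dog?'))  # slash and question mark replaced
```

Using the keyword directly as the folder name, as the script does, works for plain words but can fail when the keyword contains path separators or other reserved characters, which is why a cleaning step like this may be worth adding.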