Python code to scrape 100 images each of cats and dogs from Baidu Images and save them into two folders, cat and dog
Date: 2024-05-04 07:15:55
Here is Python code that scrapes cat and dog images from Baidu Images and stores them in the cat and dog folders respectively. Make sure the required libraries (requests and beautifulsoup4) are installed before running it.
```python
import os
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

def save_image(image_url, folder):
    """Download one image and save it into the given folder."""
    try:
        os.makedirs(folder, exist_ok=True)
        response = requests.get(image_url, headers=headers, timeout=10)
        response.raise_for_status()
        # Derive a file name from the URL path (strip any query string first)
        filename = os.path.basename(image_url.split('?')[0]) or 'image.jpg'
        filepath = os.path.join(folder, filename)
        with open(filepath, 'wb') as f:
            f.write(response.content)
        print(f'Saved image: {filepath}')
    except Exception as e:
        print(f'Failed to save image: {image_url}')
        print(e)

def get_image_urls(keyword, num_images):
    """Return up to num_images image URLs from a Baidu image search page."""
    try:
        url = (f'https://image.baidu.com/search/index'
               f'?tn=baiduimage&word={keyword}&pn=0&rn={num_images}')
        response = requests.get(url, headers=headers, timeout=10)
        soup = BeautifulSoup(response.content, 'html.parser')
        results = soup.find_all('img', {'class': 'main_img'})
        image_urls = []
        for result in results:
            img_url = result.get('data-imgurl')
            if img_url:
                image_urls.append(img_url)
        return image_urls[:num_images]
    except Exception as e:
        print(f'Failed to get image URLs for {keyword}')
        print(e)
        return []  # return an empty list so the download loops below still work

cat_image_urls = get_image_urls('猫', 100)
dog_image_urls = get_image_urls('狗', 100)
for url in cat_image_urls:
    save_image(url, 'cat')
for url in dog_image_urls:
    save_image(url, 'dog')
```
This code uses BeautifulSoup to parse the returned HTML, extract the image URLs, and save the images into the two folders. Note that Baidu's results page is rendered largely by JavaScript, so the static HTML may contain fewer than the requested 100 images. Make sure your network connection is working, and that your scraping does not violate the site's terms of use or any applicable data-use regulations.
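The parsing step can be tried offline on a small HTML snippet without hitting Baidu at all. The sketch below uses made-up `data-imgurl` values to show how `find_all` with a class filter picks out only the matching `<img>` tags:

```python
from bs4 import BeautifulSoup

# A minimal stand-in for a Baidu results page; these URLs are made up.
html = '''
<div>
  <img class="main_img" data-imgurl="https://example.com/cat1.jpg">
  <img class="main_img" data-imgurl="https://example.com/cat2.jpg">
  <img class="other_img" data-imgurl="https://example.com/skip.jpg">
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
# Only tags whose class includes "main_img" are kept; other_img is skipped.
urls = [img['data-imgurl'] for img in soup.find_all('img', {'class': 'main_img'})]
print(urls)  # ['https://example.com/cat1.jpg', 'https://example.com/cat2.jpg']
```

Testing the selector on a fixed snippet like this is a quick way to confirm the extraction logic before pointing the scraper at the live site.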