Can you help me write a program that scrapes Baidu Images, with some anti-anti-crawling measures added?
Date: 2024-05-13 19:13:48
Sure. Below is a simple Python program that uses the requests and BeautifulSoup libraries to scrape Baidu Images, with some basic anti-anti-crawling measures (a browser-like User-Agent, request timeouts, and random delays between downloads):
```python
import os
import random
import time

import requests
from bs4 import BeautifulSoup

base_url = 'https://image.baidu.com/search/index?tn=baiduimage&word='
keyword = 'puppy'

# A browser-like User-Agent is the most basic anti-anti-crawling measure.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

def get_page_urls(url):
    """Fetch the search page and collect URLs from <img class="main_img"> tags."""
    html = requests.get(url, headers=headers, timeout=10).text
    soup = BeautifulSoup(html, 'lxml')
    img_urls = []
    for img in soup.find_all('img', class_='main_img'):
        if img.get('src'):
            img_urls.append(img['src'])
    return img_urls

def download_imgs(img_urls, folder):
    """Download each image into `folder`, pausing randomly between requests."""
    if not os.path.exists(folder):
        os.makedirs(folder)
    for i, img_url in enumerate(img_urls):
        print('Downloading {}/{}...'.format(i + 1, len(img_urls)))
        try:
            response = requests.get(img_url, headers=headers, timeout=10)
            response.raise_for_status()
            with open(os.path.join(folder, '{}.jpg'.format(i)), 'wb') as f:
                f.write(response.content)
        except requests.RequestException:
            print('Failed to download {}...'.format(img_url))
        # Random delay so the request rhythm does not look machine-generated.
        time.sleep(random.uniform(0.5, 2.0))

def main():
    url = base_url + keyword
    img_urls = get_page_urls(url)
    download_imgs(img_urls, keyword)

if __name__ == '__main__':
    main()
```
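A fixed User-Agent is easy for a server to fingerprint. A common additional measure is to rotate the User-Agent per request and randomize the delay before each one. Here is a minimal sketch; the `USER_AGENTS` pool and the `polite_get` helper are illustrative names, not part of any library, and the UA strings are just examples you would replace with current ones:

```python
import random
import time

import requests

# Example pool of browser User-Agent strings (substitute your own, up-to-date ones).
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15',
    'Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0',
]

def pick_headers():
    """Return request headers with a randomly chosen User-Agent."""
    return {'User-Agent': random.choice(USER_AGENTS)}

def polite_get(url, min_delay=1.0, max_delay=3.0, **kwargs):
    """GET with a rotated User-Agent and a random pre-request delay."""
    time.sleep(random.uniform(min_delay, max_delay))
    return requests.get(url, headers=pick_headers(), timeout=10, **kwargs)
```

You could then replace the `requests.get(...)` calls in the scraper above with `polite_get(...)`.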
Note: this program is for learning and reference only. In real-world use you would need stronger anti-anti-crawling measures (proxy rotation, session handling) and further code cleanup, and Baidu's image results are partly rendered by JavaScript, so a static-HTML scrape like this may pick up only some of them.
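One such hardening step is to retry transient failures instead of giving up on the first error. requests supports this through urllib3's `Retry` mounted on a `Session`; the sketch below (the `make_session` name is illustrative) retries throttling and server errors with exponential backoff, assuming a reasonably recent urllib3 (the `allowed_methods` parameter):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(total_retries=3, backoff_factor=0.5):
    """Build a requests.Session that retries transient failures with backoff."""
    session = requests.Session()
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,               # exponential backoff between attempts
        status_forcelist=(429, 500, 502, 503, 504),  # throttling and server errors
        allowed_methods=frozenset(['GET']),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('https://', adapter)
    session.mount('http://', adapter)
    return session
```

The scraper above could then call `session.get(...)` on a session built this way instead of the module-level `requests.get(...)`.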