Can you help me write a program that crawls images from Baidu Image Search?
Of course! You can use the Python third-party libraries requests and BeautifulSoup to crawl Baidu Images. Here is a simple code snippet:
```python
import requests
from bs4 import BeautifulSoup

# Search Baidu Images for the keyword "美女"
url = 'https://image.baidu.com/search/index?tn=baiduimage&word=美女'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Collect every <img> tag on the result page and print its src attribute
imgs = soup.find_all('img')
for img in imgs:
    src = img.get('src')  # use .get() so tags without a src do not raise KeyError
    if src:
        print(src)
```
The code above searches Baidu Images for the keyword "美女" and prints the image links found on the result page. You can extend it further by downloading the collected links to local files, or by using other third-party libraries for storage and processing; a minimal download sketch follows below.
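As a rough illustration of that download step, here is a minimal sketch. The function name `save_images` and the folder name `baidu_imgs` are just placeholders, and it assumes the collected `src` values are directly fetchable image URLs (Baidu often serves placeholders or data URIs for lazily loaded images, so some links may not resolve to real pictures):

```python
import os
import requests

def save_images(img_urls, folder='baidu_imgs'):
    """Download each URL in img_urls into folder, numbering the files."""
    os.makedirs(folder, exist_ok=True)
    for i, img_url in enumerate(img_urls):
        try:
            resp = requests.get(img_url, timeout=10)
            resp.raise_for_status()
            with open(os.path.join(folder, f'{i}.jpg'), 'wb') as f:
                f.write(resp.content)
        except requests.RequestException as e:
            print(f'Failed to download {img_url}: {e}')
```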
Related questions
Can you help me write a program that crawls Baidu Images and adds some measures to get around anti-crawling?
Of course. Below is a simple Python program that uses the BeautifulSoup and requests libraries to crawl Baidu Images, with a few basic measures against anti-crawling added:
```python
import requests
from bs4 import BeautifulSoup
import os

base_url = 'https://image.baidu.com/search/index?tn=baiduimage&word='
keyword = 'puppy'

# A browser-like User-Agent helps get past the most basic bot filtering
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}

def get_page_urls(url):
    """Fetch the search page and collect the src of every img with class main_img."""
    html = requests.get(url, headers=headers).text
    soup = BeautifulSoup(html, 'lxml')  # requires the lxml package; 'html.parser' also works
    img_urls = []
    for img in soup.find_all('img', class_='main_img'):
        img_urls.append(img['src'])
    return img_urls

def download_imgs(img_urls, folder):
    """Download each image URL into folder, naming files by index."""
    if not os.path.exists(folder):
        os.makedirs(folder)
    for i, img_url in enumerate(img_urls):
        print('Downloading {}/{}...'.format(i + 1, len(img_urls)))
        try:
            response = requests.get(img_url, headers=headers, timeout=10)
            with open(folder + '/{}.jpg'.format(i), 'wb') as f:
                f.write(response.content)
        except Exception:  # avoid a bare except, but still skip any failed download
            print('Failed to download {}...'.format(img_url))

def main():
    url = base_url + keyword
    img_urls = get_page_urls(url)
    download_imgs(img_urls, keyword)

if __name__ == '__main__':
    main()
```
Note: this program is for learning purposes only; real-world use calls for stricter anti-crawling countermeasures and further code optimization. A sketch of a couple of such measures follows below.
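For example, two generic measures are to spread requests out with random delays and to rotate the User-Agent header across a persistent session. This is a minimal sketch, not specific to Baidu; the User-Agent strings and the function name `polite_get` are just illustrative placeholders:

```python
import random
import time
import requests

# Small pool of example User-Agent strings (placeholders; substitute real ones as needed)
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/120.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) '
    'Version/17.0 Safari/605.1.15',
]

session = requests.Session()  # reuse one session so cookies persist across requests

def polite_get(url, min_delay=1.0, max_delay=3.0):
    """GET a URL with a random User-Agent and a random delay between requests."""
    time.sleep(random.uniform(min_delay, max_delay))
    headers = {'User-Agent': random.choice(USER_AGENTS)}
    return session.get(url, headers=headers, timeout=10)
```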
Help me write a Python crawler that scrapes Baidu Images
Below is a basic Python crawler for Baidu Images:
```python
import requests
import os

# Browser-like headers; Baidu may reject requests that carry no User-Agent
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/90.0 Safari/537.36'}

def download_pic(pic_url, save_path):
    response = requests.get(pic_url, headers=HEADERS, timeout=10)
    with open(save_path, 'wb') as f:
        f.write(response.content)

def search_and_download(keyword, download_path):
    # The regular search page returns HTML, not JSON, so this uses Baidu's
    # unofficial acjson interface instead; its parameters and response fields
    # are undocumented and may change at any time.
    url = ('https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj'
           '&word={}&pn=0&rn=30'.format(keyword))
    response = requests.get(url, headers=HEADERS, timeout=10)
    pic_infos = response.json().get('data', [])
    for i, pic_info in enumerate(pic_infos):
        pic_url = pic_info.get('thumbURL')  # field name observed in the JSON; may change
        if not pic_url:
            continue
        save_path = os.path.join(download_path, '{}.jpg'.format(i))
        download_pic(pic_url, save_path)

if __name__ == '__main__':
    keyword = '猫'
    download_path = './download'
    if not os.path.exists(download_path):
        os.mkdir(download_path)
    search_and_download(keyword, download_path)
```
This crawler first builds a Baidu image-search request (here against Baidu's unofficial JSON interface), sends it to fetch the search results, parses out each picture's URL, and downloads the pictures to local files. Note that because the interface and its field names are undocumented and subject to change, the crawler is not guaranteed to keep working indefinitely.
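If more than one page of results is needed, the same request can be repeated while stepping the offset. This is a minimal sketch reusing `HEADERS` and `download_pic` from the code above, and it assumes the unofficial interface's `pn` (offset) and `rn` (page size) parameters keep their usual meaning, which is not guaranteed:

```python
def search_and_download_pages(keyword, download_path, pages=3, per_page=30):
    """Fetch several result pages by stepping the pn offset of the acjson interface."""
    for page in range(pages):
        offset = page * per_page
        url = ('https://image.baidu.com/search/acjson?tn=resultjson_com&ipn=rj'
               '&word={}&pn={}&rn={}'.format(keyword, offset, per_page))
        response = requests.get(url, headers=HEADERS, timeout=10)
        for i, pic_info in enumerate(response.json().get('data', [])):
            pic_url = pic_info.get('thumbURL')
            if pic_url:
                download_pic(pic_url, os.path.join(
                    download_path, '{}_{}.jpg'.format(page, i)))
```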