Python Crawler: Scraping Images from Baidu
Date: 2023-07-02 08:03:52 Views: 148
Scraping images from Baidu with a Python crawler can be done in the following steps:
1. Import the required libraries: requests, re, os
```python
import requests
import re
import os
```
2. Define a function that fetches the page source
```python
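def get_html(url):
    """Return the text of the page at url, or "" if the request fails."""
    try:
        # A timeout keeps the call from hanging on an unresponsive server
        r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=30)
        r.raise_for_status()
        # Guess the encoding from the response body
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return ""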
```
3. Define a function that extracts the image links from the page source
```python
def get_img(html):
    # Collect every value that appears as "thumbURL":"..." in the source
    img_list = re.findall(r'"thumbURL":"(.*?)",', html)
    return img_list
```
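To see what the pattern in get_img matches, here is a minimal sketch that runs it on a made-up fragment of page source. The `sample_html` string and the `example.com` URLs are assumptions for illustration, not real Baidu output. The helper `extract_thumb_urls` is hypothetical; it uses the same regular expression and additionally drops duplicate links while keeping their order.

```python
import re

# Hypothetical fragment imitating the data the regex expects (not real output).
sample_html = (
    '{"thumbURL":"https://example.com/a.jpg","width":100},'
    '{"thumbURL":"https://example.com/b.jpg","width":200},'
    '{"thumbURL":"https://example.com/a.jpg","width":100}'
)


def extract_thumb_urls(html):
    """Return thumbURL values in order of first appearance, without duplicates."""
    urls = re.findall(r'"thumbURL":"(.*?)",', html)
    # dict.fromkeys preserves insertion order, so only repeats are dropped.
    return list(dict.fromkeys(urls))


thumb_urls = extract_thumb_urls(sample_html)
print(thumb_urls)
```

If the list comes back empty for a real page, the page source probably does not contain any `"thumbURL"` fields in this form, and the pattern would need to be adapted.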
4. Define a function that downloads and saves the images
```python
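def download_img(img_list, path):
    # Create the target directory if it does not exist yet
    os.makedirs(path, exist_ok=True)
    for i, img_url in enumerate(img_list):
        try:
            r = requests.get(img_url, timeout=30)
            # Treat HTTP errors as failures instead of saving an error page
            r.raise_for_status()
            with open(os.path.join(path, '{0}.jpg'.format(i)), 'wb') as f:
                f.write(r.content)
            print('Image {0} downloaded'.format(i))
        except (requests.RequestException, OSError):
            print('Image {0} failed to download'.format(i))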
```
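download_img saves every file with a .jpg extension, whatever format the server returns. One possible refinement, sketched below, is to choose the extension from the response's Content-Type header. The `EXTENSIONS` table and the `pick_extension` helper are hypothetical names introduced for illustration, and the list of MIME types is an assumption that can be extended as needed.

```python
# Hypothetical mapping from common image MIME types to file extensions.
EXTENSIONS = {
    'image/jpeg': '.jpg',
    'image/png': '.png',
    'image/gif': '.gif',
    'image/webp': '.webp',
}


def pick_extension(content_type, default='.jpg'):
    """Map a Content-Type header value to a file extension."""
    # Drop parameters such as "; charset=binary" and normalise the case.
    mime = (content_type or '').split(';')[0].strip().lower()
    return EXTENSIONS.get(mime, default)


print(pick_extension('image/png'))
print(pick_extension('IMAGE/JPEG; charset=binary'))
print(pick_extension(None))
```

Inside download_img, the value could come from `r.headers.get('Content-Type')`; unknown or missing types fall back to the default, which matches the original behaviour.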
5. Call the functions above to run the crawler
```python
if __name__ == '__main__':
    url = 'https://image.baidu.com/search/index?tn=baiduimage&word=%E6%98%A5%E6%99%9A%E9%9B%AA%E6%99%AF'
    html = get_html(url)
    img_list = get_img(html)
    path = './images'
    download_img(img_list, path)
```
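Here, url is the address of the page to crawl, get_html fetches the page source, get_img extracts the image links from that source, download_img downloads and saves the images, and path is the directory where the images are saved.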
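The word parameter in the URL above is the search keyword 春晚雪景, percent-encoded as UTF-8. To search for a different keyword, the URL can be built rather than typed by hand. The following is a minimal sketch using only the standard library; `BASE_URL` and `build_search_url` are hypothetical names, and the query parameters are just the two that appear in the URL above.

```python
from urllib.parse import urlencode

# Base address taken from the URL used in step 5.
BASE_URL = 'https://image.baidu.com/search/index'


def build_search_url(keyword):
    """Build the search URL for keyword, percent-encoding it as UTF-8."""
    query = urlencode({'tn': 'baiduimage', 'word': keyword})
    return BASE_URL + '?' + query


print(build_search_url('春晚雪景'))
```

For the keyword 春晚雪景 this reproduces the URL from step 5, so the result can be passed straight to get_html.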