How do you scrape images with Python?
Date: 2024-04-27 14:18:07
Here are two ways to scrape images with Python:
1. Using the urllib library and a regular expression to scrape images[^1]:
```python
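import os
import re
import urllib.request
from urllib.parse import urlparse


def download_image(url, save_path):
    """Download the image at `url` and write its bytes to `save_path`."""
    with urllib.request.urlopen(url) as response:
        image_data = response.read()
    with open(save_path, 'wb') as f:
        f.write(image_data)


def crawl_images(url):
    """Download every image referenced by an <img src="..."> tag on the page."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode('utf-8', errors='replace')
    # [^>]*? keeps the match inside a single <img> tag.
    img_urls = re.findall(r'<img[^>]*?src="(.*?)"', html)
    for img_url in img_urls:
        # Only absolute http(s) URLs are handled; relative paths are skipped.
        if img_url.startswith('http'):
            # Name the file after the last path segment, ignoring any query string.
            filename = os.path.basename(urlparse(img_url).path) or 'image'
            download_image(img_url, filename)


# Call the function to start scraping
crawl_images('http://example.com')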
```
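The regular expression above only picks up double-quoted `src` attributes and keeps only URLs that already start with `http`. As a rough sketch of one alternative (not part of the original answer), the standard library's `html.parser` can collect `src` values and `urllib.parse.urljoin` can resolve relative paths. The names `ImgSrcParser` and `extract_image_urls`, and the sample HTML, are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            for name, value in attrs:
                if name == 'src' and value:
                    self.sources.append(value)


def extract_image_urls(html, base_url):
    """Return absolute image URLs found in `html`, resolved against `base_url`."""
    parser = ImgSrcParser()
    parser.feed(html)
    return [urljoin(base_url, src) for src in parser.sources]


# Hypothetical page fragment: one relative and one absolute image URL.
sample = '<p><img src="/a.png"><img src=\'http://cdn.example.com/b.jpg\' alt="b"></p>'
print(extract_image_urls(sample, 'http://example.com/gallery/'))
```

The returned list could be passed to `download_image` one URL at a time in place of the regex matches.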
2. Using the requests library and JSON parsing to scrape images[^2]:
```python
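import json

import requests


class BaiduSpider:
    def __init__(self):
        # Paginated endpoint; the two placeholders are filled in by crawl().
        self.url = 'http://example.com?page={}&number={}'
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
        }

    def parse_html(self, url):
        """Fetch one page of results and parse the response body as JSON."""
        response = requests.get(url=url, headers=self.headers, timeout=10)
        response.raise_for_status()
        img_json = json.loads(response.text)
        print(img_json)

    def crawl(self):
        # The original hard-codes 185 items at 30 per page; round up to 7 requests.
        total, per_page = 185, 30
        pages = (total + per_page - 1) // per_page
        for number in range(pages):
            pn = number * per_page  # offset of the first item on this page
            self.parse_html(self.url.format(pn, number))


# Call the function to start scraping
baidu = BaiduSpider()
baidu.crawl()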
```
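The class above prints the parsed JSON but stops short of pulling image URLs out of it, and the answer does not say what the response looks like. The sketch below therefore makes no assumption about field names: it walks whatever `json.loads` returns and keeps strings that look like http(s) links to image files. The helper name `find_image_urls`, the suffix list, and the sample payload are all hypothetical.

```python
import json

IMAGE_SUFFIXES = ('.jpg', '.jpeg', '.png', '.gif', '.webp')


def find_image_urls(data):
    """Recursively collect strings in parsed JSON that look like image URLs."""
    found = []
    if isinstance(data, dict):
        for value in data.values():
            found.extend(find_image_urls(value))
    elif isinstance(data, list):
        for item in data:
            found.extend(find_image_urls(item))
    elif isinstance(data, str):
        # Ignore any query string when checking the file extension.
        path = data.split('?', 1)[0].lower()
        if data.startswith(('http://', 'https://')) and path.endswith(IMAGE_SUFFIXES):
            found.append(data)
    return found


# Hypothetical response body; a real API will use its own field names.
sample = json.loads('{"data": [{"thumb": "http://example.com/a.jpg?w=100"}, {"title": "no image"}]}')
print(find_image_urls(sample))
```

If the target API documents its response format, reading the known field directly is simpler and less error-prone than this generic walk.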