How to write a Python program that scrapes images from a web page
Posted: 2024-12-24 21:41:06
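To write a simple web image scraper in Python, you can use the requests library to fetch the page content, parse the HTML with BeautifulSoup to locate the image links, and then download each image with requests, using the os module to manage the save directory and file paths. The basic steps are:
1. First, install the required libraries: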
```
pip install requests beautifulsoup4
```
2. Fetch the page and parse the HTML:
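```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def fetch_images(url):
    # Fetch the page; fail early on HTTP errors instead of parsing an error page
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # Collect the image tags; <img> is the most common one
    img_tags = soup.find_all('img')
    # Extract each src and resolve relative paths against the page URL
    image_urls = [urljoin(url, img['src']) for img in img_tags if img.get('src')]
    return image_urls

# Call the function
url_to_crawl = "http://example.com"  # replace with the site you want to crawl
image_links = fetch_images(url_to_crawl)
print(image_links)
```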
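The `src` values collected in step 2 are not always ready to download: they may be relative paths, inline `data:` images, duplicates, or URLs with a query string that makes a poor file name. The following is a minimal sketch of cleaning them up with the standard library only. The helper names `normalize_image_urls` and `filename_from_url`, the sample `src` values, and the `.jpg` fallback extension are assumptions made for illustration, not part of the original answer.

```python
import os
from urllib.parse import urljoin, urlparse

def normalize_image_urls(page_url, srcs):
    """Resolve src values against page_url, dropping inline data and duplicates."""
    seen = set()
    result = []
    for src in srcs:
        src = src.strip()
        if not src or src.startswith("data:"):
            continue  # empty value or inline base64 image, nothing to download
        absolute = urljoin(page_url, src)  # absolute URLs pass through unchanged
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

def filename_from_url(url, index):
    """Derive a local file name from the URL path, ignoring the query string."""
    name = os.path.basename(urlparse(url).path)
    # Fall back to a numbered name (assumed .jpg) when the path has no file part
    return name or f"image_{index}.jpg"

# Hypothetical src values as they might appear in a page's <img> tags
srcs = [
    "/static/logo.png",
    "photos/cat.jpg?size=large",
    "data:image/gif;base64,R0lGOD",
    "/static/logo.png",
]
urls = normalize_image_urls("http://example.com/gallery/", srcs)
for i, u in enumerate(urls):
    print(u, "->", filename_from_url(u, i))
```

Resolving against the page URL keeps the download step simple, since every entry can then be passed to `requests.get` directly.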
3. Download the images (the os library handles the directory and file paths here):
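```python
import os
import requests
from urllib.parse import urlparse

def download_images(url_list, save_path):
    # Create the target directory if it does not exist yet
    os.makedirs(save_path, exist_ok=True)
    for url in url_list:
        # Take the file name from the URL path, ignoring any query string
        name = os.path.basename(urlparse(url).path)
        if not name:
            continue  # no usable file name in this URL
        filename = os.path.join(save_path, name)
        # Request first, so a failed download does not leave an empty file
        response = requests.get(url, stream=True, timeout=10)
        if response.status_code != 200:
            continue
        with open(filename, 'wb') as f:
            for chunk in response.iter_content(1024):  # download in chunks
                if chunk:
                    f.write(chunk)

# Set the save path
save_dir = "./images"
download_images(image_links, save_dir)
```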