Write a Python web crawler that scrapes images based on user input and saves them to the corresponding folder, without using "beautifulsoup"
Date: 2024-09-24 07:26:38 Views: 48
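To write a simple Python web crawler that fetches images and saves them to a given directory, we can use the `requests` library to fetch the page and download the images, the `re` module to pull image URLs out of the HTML, `urllib.parse` to resolve relative URLs, and `os` to manage the output folder. Here is a basic example: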
```python
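import os
import re
import requests
from urllib.parse import urljoin, urlparse

# Folder where downloaded images are saved
image_folder = 'downloaded_images'

# Create the target folder if it does not exist yet
os.makedirs(image_folder, exist_ok=True)


def download_image(url, save_path):
    """Download one image and write its bytes to save_path."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # do not save error pages as images
    with open(save_path, 'wb') as f:
        f.write(response.content)


def find_image_urls(html_content, base_url):
    # Assumes every image URL sits in the src attribute of an <img> tag
    img_pattern = r'<img[^>]*\ssrc=["\']([^"\']+)["\']'
    # Resolve relative paths against the page that was actually fetched
    return [urljoin(base_url, match.group(1))
            for match in re.finditer(img_pattern, html_content, re.IGNORECASE)]


def main():
    target_url = input("Enter the URL of the page to crawl: ")
    page_response = requests.get(target_url, timeout=10)
    if page_response.status_code == 200:
        images_on_page = find_image_urls(page_response.text, target_url)
        for image_url in images_on_page:
            # Take the file name from the URL path, ignoring any query string
            filename = os.path.basename(urlparse(image_url).path)
            if not filename:
                continue
            save_path = os.path.join(image_folder, filename)
            try:
                download_image(image_url, save_path)
                print(f"Saved image: {save_path}")
            except requests.RequestException as exc:
                print(f"Failed to download {image_url}: {exc}")
    else:
        print(f"Could not access the page: {page_response.status_code}")


if __name__ == '__main__':
    main()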
```
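The extraction step can be checked without any network access. The sketch below runs an `<img>` regex over a small hand-written HTML string and resolves relative paths with `urljoin`. It also shows one possible way to derive a folder name from the user's input, since the question asks for images to be saved in a matching folder. The sample HTML, the `example.com` URLs, and the helper names `extract_image_urls` and `folder_for` are assumptions made for illustration and are not part of the original answer.

```python
import re
from urllib.parse import urljoin, urlparse

# Matches the src attribute of <img> tags, with single or double quotes
IMG_SRC = re.compile(r'<img[^>]*\ssrc=["\']([^"\']+)["\']', re.IGNORECASE)


def extract_image_urls(html, base_url):
    """Return absolute image URLs found in html, in order, without duplicates."""
    seen = []
    for match in IMG_SRC.finditer(html):
        absolute = urljoin(base_url, match.group(1))
        if absolute not in seen:
            seen.append(absolute)
    return seen


def folder_for(page_url):
    """Hypothetical helper: build a folder name from the page's host name."""
    host = urlparse(page_url).netloc or "unknown_site"
    # Replace characters that are awkward in directory names (e.g. the port colon)
    return re.sub(r'[^A-Za-z0-9._-]', '_', host)


# Hand-written sample page: one script tag (ignored), three img tags, one duplicate
sample_html = '''
<script src="/static/app.js"></script>
<img src="/images/cat.png" alt="cat">
<IMG class="photo" SRC='https://cdn.example.com/dog.jpg?size=large'>
<img src="/images/cat.png">
'''

urls = extract_image_urls(sample_html, "https://example.com/gallery/index.html")
for u in urls:
    print(u)
print(folder_for("https://example.com:8080/gallery/index.html"))
```

A regex like this only covers plain `<img src="...">` markup. Images loaded through `srcset`, CSS backgrounds, or JavaScript would need separate handling. In the crawler above, a value such as `folder_for(target_url)` could stand in for the fixed `image_folder` so that each site gets its own directory.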