Write a simple crawler that downloads 10 images from the page http://www.netbian.com/ and saves them to a local folder

Date: 2024-12-04 11:26:36

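To write a simple crawler that downloads the images on a given web page, we can use Python's `requests` and `BeautifulSoup` libraries. First make sure both are installed; if either is missing, install it with pip:

```shell
pip install requests beautifulsoup4
```

Then build the script step by step:

1. Import the required libraries[^1]:

```python
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
```

2. Define the URL to crawl[^2]:

```python
url = "http://www.netbian.com/"
```

3. Send the HTTP request with `requests.get` and parse the HTML:

```python
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on an HTTP error
soup = BeautifulSoup(response.text, 'html.parser')
```

4. Find the image tags (normally `<img>`) that carry a `src` attribute:

```python
images = soup.find_all('img', src=True)
```

5. Create the folder the images are saved to, if it does not exist yet:

```python
image_dir = 'images_from_netbian'
os.makedirs(image_dir, exist_ok=True)
```

6. Download the images and save them to the local folder:

```python
for img in images[:10]:  # take the first 10 images
    # src may be a relative path, so resolve it against the page URL
    img_url = urljoin(url, img['src'])
    response_img = requests.get(img_url, timeout=10)
    response_img.raise_for_status()
    file_name = img_url.split('/')[-1]
    with open(os.path.join(image_dir, file_name), 'wb') as f:
        f.write(response_img.content)
```

Running the script saves the first 10 images found on `http://www.netbian.com/` into the `images_from_netbian` folder. The selection is simply the first 10 `<img>` tags in document order, so it can include logos or icons as well as wallpapers.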
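Step 6 takes the file name from the last segment of each image URL, which can go wrong when a `src` value is relative, repeated, a `data:` URI, or carries a query string. The following is a minimal sketch of how the `src` values could be cleaned up before downloading. The helper names `pick_image_urls` and `local_filename` are hypothetical and not part of the original answer, and the `.jpg` fallback extension is an assumption. Only the standard library is used.

```python
import os
from urllib.parse import urljoin, urlparse


def pick_image_urls(page_url, srcs, limit=10):
    """Resolve raw src values against page_url; return up to `limit` unique http(s) URLs."""
    seen = set()
    picked = []
    for src in srcs:
        if len(picked) >= limit:
            break
        absolute = urljoin(page_url, src.strip())
        # Skip data: URIs and anything else that is not plain http(s).
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        # Skip images that appear more than once on the page.
        if absolute in seen:
            continue
        seen.add(absolute)
        picked.append(absolute)
    return picked


def local_filename(image_url, index):
    """Build a file name from the URL path, ignoring any query string."""
    name = os.path.basename(urlparse(image_url).path)
    # Fall back to a numbered name (assumed .jpg) when the path has no file name.
    return name or f"image_{index}.jpg"
```

With the variables from the steps above, the list of raw values would be `[img['src'] for img in images]`, and each resulting URL can be passed to `requests.get` and saved under `os.path.join(image_dir, local_filename(img_url, i))`.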
