Write Python code to fetch the images from the page https://pixabay.com/zh/editors_choice/?media_type=photo&pagi=1, where the URL's pagi parameter can range from
Date: 2023-12-08 13:07:02 Views: 105
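You can use Python's requests and BeautifulSoup libraries to scrape images from a web page. The example below fetches the images from https://pixabay.com/zh/editors_choice/?media_type=photo&pagi=1 and the pages that follow it: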
```python
import requests
from bs4 import BeautifulSoup


def download_image(url, filename):
    """Download a single image and save it under filename."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop on HTTP errors instead of saving an error page
    with open(filename, 'wb') as file:
        file.write(response.content)
    print(f"Downloaded {filename}")


def scrape_images(page_url):
    """Parse one listing page and download every image served from the Pixabay CDN."""
    response = requests.get(page_url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')
    image_elements = soup.find_all('img')
    for image_element in image_elements:
        # Use .get() because some <img> tags have no src attribute
        image_url = image_element.get('src', '')
        if image_url.startswith('https://cdn.pixabay.com'):
            filename = image_url.split('/')[-1]
            download_image(image_url, filename)


# Build the page URLs
base_url = 'https://pixabay.com/zh/editors_choice/?media_type=photo&pagi='
start_page = 1
end_page = 5  # last page to scrape (inclusive)

for page_num in range(start_page, end_page + 1):
    page_url = base_url + str(page_num)
    scrape_images(page_url)
```
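The code above defines two functions: `download_image` downloads a single image, and `scrape_images` parses a page and collects the URLs of all images on it. The main program builds each page URL and calls `scrape_images` to scrape the images on every page.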
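To see what the parsing step in `scrape_images` extracts without touching the network, it can be exercised on an inline HTML string. The sketch below is only an illustration under stated assumptions: it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so that it runs with no third-party packages, and both the `ImgSrcCollector` class and the sample markup are hypothetical, not taken from Pixabay.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; an <img> may lack src entirely
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Made-up markup standing in for a downloaded page
sample_html = """
<div>
  <img src="https://cdn.pixabay.com/photo/sample-1.jpg">
  <img alt="no source here">
  <img src="/static/img/logo.png">
</div>
"""

collector = ImgSrcCollector()
collector.feed(sample_html)
print(collector.sources)
```

The second `<img>` has no `src`, which is the case where indexing the attribute directly would raise an error; looking it up with `.get` skips such tags instead.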
Note that, to avoid downloading invalid or unrelated images, the code includes a check so that only images whose URL starts with `https://cdn.pixabay.com` are downloaded.
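A plain prefix test also accepts a host such as `cdn.pixabay.com.example.org`, because that string begins with the same characters. As a possible stricter variant, the sketch below compares the parsed host name instead. The helper name `is_pixabay_cdn_url` and the sample URLs are hypothetical, introduced here only for illustration.

```python
from urllib.parse import urlparse

ALLOWED_HOST = "cdn.pixabay.com"  # host taken from the prefix check in the answer


def is_pixabay_cdn_url(url):
    """Return True only for https URLs whose host is exactly ALLOWED_HOST."""
    parts = urlparse(url)
    return parts.scheme == "https" and parts.hostname == ALLOWED_HOST


# Made-up candidates: a CDN image, a look-alike host, and a relative path
candidates = [
    "https://cdn.pixabay.com/photo/sample.jpg",
    "https://cdn.pixabay.com.example.org/photo/sample.jpg",
    "/static/img/blank.gif",
]
for url in candidates:
    print(url, is_pixabay_cdn_url(url))
```

Only the first candidate passes; the look-alike host and the relative path are both rejected.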
Adjust the `start_page` and `end_page` variables to set the range of pages you want to scrape.
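The page range can also be turned into a list of URLs up front, which makes it easy to check the `pagi` values before any request is sent. This is a minimal sketch, assuming the query parameters stay `media_type` and `pagi` as in the URL from the question; the function name `build_page_urls` is hypothetical.

```python
from urllib.parse import urlencode

BASE = "https://pixabay.com/zh/editors_choice/"


def build_page_urls(start_page, end_page, media_type="photo"):
    """Return the listing URLs for start_page..end_page inclusive."""
    if start_page < 1 or end_page < start_page:
        raise ValueError("expected 1 <= start_page <= end_page")
    return [
        f"{BASE}?{urlencode({'media_type': media_type, 'pagi': page})}"
        for page in range(start_page, end_page + 1)
    ]


for url in build_page_urls(1, 3):
    print(url)
```

Each returned URL can then be passed to `scrape_images` in a loop, in place of the string concatenation used in the main program.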