A classic Python scraping project: the 360 wallpaper site. Wallpaper URL: https://image.so.com/c?ch=wallpaper#/; Requirement: scrape 5 pages of wallpapers, with roughly 30 images per page;
Time: 2024-10-09 10:04:46 Views: 15
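To build a scraper for the classic 360 wallpaper project in Python, you can use the requests library to fetch the page content and the BeautifulSoup library to parse the HTML and extract the image links. Here is a basic example, assuming you want to scrape the first 5 pages with 30 images per page: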
```python
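import os
import time

import requests
from bs4 import BeautifulSoup

# Total number of pages to crawl and number of images to keep per page
total_pages = 5
images_per_page = 30


def get_image_links(url):
    """Return the image URLs found on one listing page."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # Collect the data-src attribute of every <a class="img"> tag.
    # The selector is an assumption about the page's markup; if the gallery
    # is rendered by JavaScript, the static HTML may contain no matches.
    image_links = []
    for a_tag in soup.find_all('a', class_='img'):
        if 'data-src' in a_tag.attrs:
            image_links.append(a_tag['data-src'])
    return image_links


def save_images(image_links, page_num):
    """Download up to images_per_page images into downloads/page_<n>/."""
    # Create the target directory if it does not exist yet
    page_dir = f'downloads/page_{page_num}'
    os.makedirs(page_dir, exist_ok=True)
    for i, link in enumerate(image_links[:images_per_page]):
        response = requests.get(link, timeout=10)
        filename = f'{page_dir}/{i + 1}.jpg'  # assumes the images are JPEGs
        with open(filename, 'wb') as f:
            f.write(response.content)


for page in range(1, total_pages + 1):
    start_time = time.time()
    print(f"Crawling page {page}...")
    # Caution: everything after '#' is a URL fragment, which is not sent to
    # the server, so pn may have no effect on the response.
    url = f"https://image.so.com/c?ch=wallpaper#/&pn={page}"
    image_links = get_image_links(url)
    save_images(image_links, page)
    saved = min(len(image_links), images_per_page)
    print(f"Downloaded {saved} images on page {page}, took {time.time() - start_time:.2f} seconds.")

print("Finished crawling and saving all images.")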
```
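This script downloads up to the first 30 images from each page, at most 150 in total (5 pages × 30 images), and saves them under a folder named `downloads`, with one subfolder per page.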
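One thing worth checking before relying on the pagination: in the URL used by the script, `&pn={page}` sits after the `#`, which makes it part of the URL fragment. HTTP clients do not transmit the fragment, so every page request may look identical to the server. The following is a small sketch using only the standard library to show what would actually be sent; `sent_to_server` is a hypothetical helper name introduced here for illustration, not part of the original script.

```python
from urllib.parse import parse_qs, urlsplit


def sent_to_server(url):
    """Return the (path, query parameters) an HTTP client would transmit.

    The fragment (everything after '#') is left out, because it is
    handled on the client side and is not included in the request.
    """
    parts = urlsplit(url)
    return parts.path, parse_qs(parts.query)


for page in (1, 2):
    url = f"https://image.so.com/c?ch=wallpaper#/&pn={page}"
    path, query = sent_to_server(url)
    # The page number only shows up in the fragment, not in the query.
    print(page, path, query, "fragment:", urlsplit(url).fragment)
```

If the path and query are the same for every page, as they are here, the loop is likely to fetch the same content five times. In that case the page number would need to go into a real query parameter or into whatever data request the site's own JavaScript makes, which has to be confirmed by inspecting the site rather than assumed.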