
Scraping the animal images from https://sc.chinaz.com/tupian/xiaogouxiaomaotupian.html

Posted: 2024-01-02 22:04:56

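You can fetch and parse the page with Python's requests and beautifulsoup4 libraries. First, use requests to get the page content. A timeout and a status check keep the script from hanging or silently working on an error page:

```python
import requests

url = "https://sc.chinaz.com/tupian/xiaogouxiaomaotupian.html"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
response.encoding = response.apparent_encoding  # avoid garbled Chinese text
html = response.text
```

Next, use beautifulsoup4 to parse the page and extract the image links. This assumes the page marks its lazy-loaded images with the class `lazy` and stores the real image address in the `data-original` attribute; if the list comes back empty, inspect the page markup and adjust the selector:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
img_list = soup.find_all("img", class_="lazy")
# Skip tags without data-original; urljoin also resolves relative and "//host/path" links
img_urls = [urljoin(url, img["data-original"]) for img in img_list if img.get("data-original")]
```

Finally, use requests to download each image to the local directory. The loop variable is named `img_url` so that it does not overwrite the page `url`:

```python
for i, img_url in enumerate(img_urls):
    img_response = requests.get(img_url, timeout=10)
    img_response.raise_for_status()
    with open(f"animal_{i}.jpg", "wb") as f:
        f.write(img_response.content)
```

The complete code:

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

url = "https://sc.chinaz.com/tupian/xiaogouxiaomaotupian.html"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on HTTP errors
response.encoding = response.apparent_encoding  # avoid garbled Chinese text
html = response.text

soup = BeautifulSoup(html, "html.parser")
img_list = soup.find_all("img", class_="lazy")
# Skip tags without data-original; urljoin also resolves relative and "//host/path" links
img_urls = [urljoin(url, img["data-original"]) for img in img_list if img.get("data-original")]

for i, img_url in enumerate(img_urls):
    img_response = requests.get(img_url, timeout=10)
    img_response.raise_for_status()
    with open(f"animal_{i}.jpg", "wb") as f:
        f.write(img_response.content)
```

Note: when scraping, comply with the applicable laws and regulations and with the site's own rules. Also pay attention to image copyright, and make sure your use of the downloaded images does not infringe anyone else's rights.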
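Two details of the download step can be checked without touching the network: turning raw attribute values into absolute URLs, and choosing a file extension that matches the image instead of always writing `.jpg`. The following is a minimal sketch using only the standard library. The helper names `resolve_image_urls` and `filename_for` are hypothetical, introduced here for illustration, and the idea that the attribute values may be relative or protocol-relative is an assumption about the page, not something the answer above confirms.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse

PAGE_URL = "https://sc.chinaz.com/tupian/xiaogouxiaomaotupian.html"


def resolve_image_urls(page_url, raw_srcs):
    """Return absolute URLs in original order, skipping empty values and duplicates."""
    seen = set()
    resolved = []
    for src in raw_srcs:
        src = (src or "").strip()
        if not src:
            continue  # attribute missing or blank
        absolute = urljoin(page_url, src)  # handles "//host/x", "/x", and full URLs
        if absolute not in seen:
            seen.add(absolute)
            resolved.append(absolute)
    return resolved


def filename_for(image_url, index, default_ext=".jpg"):
    """Build a local file name, keeping the URL's extension when it has one."""
    ext = PurePosixPath(urlparse(image_url).path).suffix or default_ext
    return f"animal_{index}{ext}"
```

In the scraper, `raw_srcs` would be the list of `img.get("data-original")` values, and `filename_for(img_url, i)` would replace the hard-coded `f"animal_{i}.jpg"`. Any hosts you pass in while experimenting (for example `img.example.com`) are placeholders, not the site's real image servers.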