url="https://www.woyaogexing.com/tupian/weimei/2023/213014.html"
headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"}
res=requests.get(url,headers=headers)
soup=bs4.BeautifulSoup(res.text,'html.parser')
img_soup=soup.find_all('img',class_="lazy")
for ench in img_soup:
    img_url=ench["src"]
    print(type(img_url))
    imgname=os.path.basename(img_url)
    imgpath=os.path.join('oooo',imgname)
    with open (imgpath,'w') as f:
        img_data=requests.get(img_url).content
        print(img_data)
        f.write(img_data)
Time: 2024-04-04 19:30:15 Views: 147
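This code scrapes the page https://www.woyaogexing.com/tupian/weimei/2023/213014.html, collects every img tag with class="lazy", and saves the images to the local folder oooo.
However, the code has a bug: it opens the output file in 'w' mode. That mode opens the file as text, but image data is binary, and writing the bytes returned by requests.get(img_url).content to a text-mode file raises a TypeError. The file should be opened in 'wb' mode instead. It is also advisable to use a try-except statement to handle exceptions that may occur while requesting an image or saving the file.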
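To make the difference between the two modes concrete, here is a minimal offline sketch. It does not touch the network: `fake_image` is a made-up stand-in for downloaded image bytes, and `try_write` is a hypothetical helper written only for this illustration.

```python
import os
import tempfile

# Stand-in for image bytes (assumption: any bytes object behaves the same way).
# The first eight bytes are the PNG file signature.
fake_image = b"\x89PNG\r\n\x1a\n" + bytes(range(16))


def try_write(path, mode, data):
    """Write data to path using the given mode; return None on success, else the error."""
    try:
        with open(path, mode) as f:
            f.write(data)
        return None
    except TypeError as exc:
        return exc


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.png")
    text_error = try_write(path, "w", fake_image)     # text mode rejects bytes
    binary_error = try_write(path, "wb", fake_image)  # binary mode accepts bytes
    with open(path, "rb") as f:
        round_trip = f.read()

print("text mode error:", text_error)
print("binary mode ok:", binary_error is None and round_trip == fake_image)
```

Note that opening a file in 'w' mode still creates it before the write fails, so the original script would leave an empty file behind for each image.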
The following code can be used as a reference:
```
import os
import requests
import bs4

url = "https://www.woyaogexing.com/tupian/weimei/2023/213014.html"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
}

# Create the output folder up front; open() does not create missing directories.
os.makedirs('oooo', exist_ok=True)

res = requests.get(url, headers=headers)
soup = bs4.BeautifulSoup(res.text, 'html.parser')
img_soup = soup.find_all('img', class_="lazy")
for ench in img_soup:
    img_url = ench["src"]
    print(type(img_url))
    img_name = os.path.basename(img_url)
    img_path = os.path.join('oooo', img_name)
    try:
        # Download first so that a failed request does not leave an empty file behind.
        img_data = requests.get(img_url, headers=headers).content
        # 'wb' opens the file in binary mode, which is required for image bytes.
        with open(img_path, 'wb') as f:
            f.write(img_data)
        print(f"Saved {img_name} successfully!")
    except Exception as e:
        print(f"Failed to save {img_name} due to {e}")
```
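The script above passes `ench["src"]` straight to `requests.get` and `os.path.basename`. That works only if the attribute holds a complete URL with no query string. The sketch below shows one way to make that step more forgiving, using only the standard library. The helper names `pick_image_url` and `local_name` are hypothetical. The attribute names `data-src` and `data-original` are an assumption about how lazy-loading pages often store the real address, not something confirmed for this site.

```python
import os
from urllib.parse import urljoin, urlsplit


def pick_image_url(attrs, page_url):
    """Return an absolute image URL built from a tag's attributes, or None."""
    # Assumption: a lazy-loaded <img> may keep its real address in a data-*
    # attribute and only a placeholder in src, so those are checked first.
    for key in ("data-src", "data-original", "src"):
        value = attrs.get(key)
        if value:
            # urljoin resolves relative and protocol-relative ("//host/...") URLs.
            return urljoin(page_url, value.strip())
    return None


def local_name(img_url, fallback="image"):
    """Derive a file name from the URL path, ignoring any query string."""
    name = os.path.basename(urlsplit(img_url).path)
    return name or fallback


page = "https://www.woyaogexing.com/tupian/weimei/2023/213014.html"
example = pick_image_url({"src": "//img.example.com/a/b.jpeg?w=200"}, page)
print(example)
print(local_name(example))
```

Inside the loop, this would replace the direct lookup with `img_url = pick_image_url(ench.attrs, url)`, skipping the tag when the result is `None`, and `img_name = local_name(img_url)`.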