The following code raises an error:

```python
import requests
from bs4 import BeautifulSoup

url = "https://www.dpm.org.cn/lights/royal/p/1.html"
# code starts here
r=requests.get(url)
r.encoding='utf-8'
soup=BeautifulSoup(r.text,'html.parser')
pics=soup.find_all("div",class_="pic")
xh=1
for pic in pics:
    x=pic.find("img")
    imglj=x.attrs["src"]
    imgmz=x.attrs["title"].strip()
    # code starts here
    r=requests.get(imglj)
    cpmc="image//"+imgmz+".jpg"
    f1=open(cpmc,"bw")
    f1.write(r.content)
    f1.close()
    xh+=1
```

Error: `No such file or directory: 'image//清沈全沈世杰沈世儒合笔婴戏图贴落.jpg'`

Time: 2024-03-04 12:50:32 Views: 145

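This error occurs because the code tries to create a file inside a directory that does not exist: `open()` creates the file itself, but not any missing parent directories. Create the `image` directory before the loop (`os.makedirs` with `exist_ok=True` does nothing if it already exists):

```python
import os

import requests
from bs4 import BeautifulSoup

url = "https://www.dpm.org.cn/lights/royal/p/1.html"
r = requests.get(url)
r.encoding = 'utf-8'
soup = BeautifulSoup(r.text, 'html.parser')
pics = soup.find_all("div", class_="pic")
os.makedirs("image", exist_ok=True)  # create the directory; no error if it already exists
xh = 1
for pic in pics:
    x = pic.find("img")
    imglj = x.attrs["src"]
    imgmz = x.attrs["title"].strip()
    r = requests.get(imglj)
    cpmc = os.path.join("image", imgmz + ".jpg")  # build the file path portably
    with open(cpmc, "wb") as f1:  # the file is closed automatically when the block ends
        f1.write(r.content)
    xh += 1
```

This creates a directory named `image` in the current working directory (which is not necessarily the directory that contains the script) and saves the downloaded images into it.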
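Even with the directory in place, the download can still fail for two reasons the answer does not cover: the `title` attribute may contain characters that file systems reject (Windows rejects `\ / : * ? " < > |`, POSIX rejects `/`), and `src` may be a relative URL, which `requests.get` cannot fetch on its own. A minimal sketch under those assumptions; `safe_filename`, `image_path` and `resolve_src` are hypothetical helper names, not part of any library:

```python
import os
import re
from typing import Optional
from urllib.parse import urljoin

# Characters that Windows rejects in file names ("/" is also rejected on POSIX),
# plus ASCII control characters.
_ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')


def safe_filename(title: str, fallback: str = "untitled") -> str:
    """Replace illegal characters with '_'; strip surrounding whitespace and trailing dots."""
    name = _ILLEGAL.sub("_", title).strip().rstrip(".")
    return name or fallback


def image_path(folder: str, title: str, index: Optional[int] = None) -> str:
    """Return folder/<title>.jpg, or folder/<index>.jpg when an index is given."""
    stem = str(index) if index is not None else safe_filename(title)
    return os.path.join(folder, stem + ".jpg")


def resolve_src(page_url: str, src: str) -> str:
    """Turn a possibly relative <img src> into an absolute URL based on the page URL."""
    return urljoin(page_url, src)


# Sketch of how the loop body in the answer could use them:
#   imglj = resolve_src(url, x.attrs["src"])
#   cpmc = image_path("image", imgmz)        # or image_path("image", imgmz, xh) -> 1.jpg, 2.jpg, ...
```

The numbered variant also gives the otherwise unused counter `xh` a purpose and sidesteps problematic titles entirely.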

Modify the code so that the encoding of `li_list` is UTF-8:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.icbc.com.cn/page/827855918799994880.html'
response = requests.get(url=url)
page_response = response.text
soup = BeautifulSoup(page_response, 'html.parser',from_encoding='utf-8')
li_list = soup.select('#mypagehtmlcontent p')
```

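Decode the raw response bytes as UTF-8 yourself and pass the resulting string to BeautifulSoup. In the original code, `response.text` is already a decoded `str`, so BeautifulSoup ignores `from_encoding` (it only applies when the markup is passed as bytes), and the text may already have been decoded with the wrong charset.

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.icbc.com.cn/page/827855918799994880.html'
response = requests.get(url=url)
page_response = response.content.decode('utf-8')  # decode the raw bytes as UTF-8
soup = BeautifulSoup(page_response, 'html.parser')
li_list = soup.select('#mypagehtmlcontent p')

# Test code below; you can ignore it
for li in li_list:
    print(li.text)
```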
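Hard-coding `'utf-8'` works only if the page really is UTF-8. Garbled output (mojibake) usually means the bytes were decoded with the wrong charset; a common cause is that `requests` falls back to ISO-8859-1 for `text/html` responses whose `Content-Type` header declares no charset. With `requests` you can also set `response.encoding` (for example to `response.apparent_encoding`) before reading `response.text`. The standard-library sketch below shows the idea of choosing a charset instead of hard-coding one; `decode_html` is a hypothetical helper, and the order header, then `<meta charset>`, then UTF-8 is a simplifying assumption, not a full implementation of the HTML encoding-sniffing rules:

```python
import re
from typing import Optional

# Finds a charset in <meta charset="..."> or <meta ... content="text/html; charset=...">.
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', re.I)


def decode_html(raw: bytes, header_charset: Optional[str] = None) -> str:
    """Decode HTML bytes, trying the HTTP header charset, then <meta charset>, then UTF-8."""
    candidates = []
    if header_charset:
        candidates.append(header_charset)
    match = _META_CHARSET.search(raw[:2048])  # charset declarations normally sit near the top
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.append("utf-8")
    for charset in candidates:
        try:
            return raw.decode(charset)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown codec name or wrong guess: try the next candidate
    return raw.decode("utf-8", errors="replace")  # last resort: replace undecodable bytes
```

Usage with the code from the question: `soup = BeautifulSoup(decode_html(response.content), 'html.parser')`. Passing `header_charset` is optional; if you do, take it only when the `Content-Type` header actually contains `charset=`, otherwise you reintroduce the ISO-8859-1 guess.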

Optimize this code:

```python
import requests
from bs4 import BeautifulSoup

url = "https://www.gupang.com/202305/64619.html"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
title = soup.find("h1", class_="title").text
content = soup.find("div", class_="content").text
print(title)
print(content)
```

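The code can be improved in the following ways:

1. Add exception handling so that a failed request or a failed parse does not crash the program with a traceback:

```python
import sys

import requests
from bs4 import BeautifulSoup

url = "https://www.gupang.com/202305/64619.html"
try:
    response = requests.get(url, timeout=10)  # without a timeout the call can hang indefinitely
    response.raise_for_status()  # raise an exception if the request failed (4xx/5xx)
except requests.exceptions.RequestException as e:
    print(e)
    sys.exit(1)

soup = BeautifulSoup(response.content, "html.parser")
try:
    # find() returns None when nothing matches, so .text then raises AttributeError
    title = soup.find("h1", class_="title").text
    content = soup.find("div", class_="content").text
except AttributeError as e:
    print(e)
    sys.exit(1)

print(title)
print(content)
```

2. Use a `with` statement to manage resources, so that the network connection is released automatically when you are done with it:

```python
import sys

import requests
from bs4 import BeautifulSoup

url = "https://www.gupang.com/202305/64619.html"
try:
    with requests.get(url, timeout=10) as response:
        response.raise_for_status()
        soup = BeautifulSoup(response.content, "html.parser")
        title = soup.find("h1", class_="title").text
        content = soup.find("div", class_="content").text
except (requests.exceptions.RequestException, AttributeError) as e:
    print(e)
    sys.exit(1)

print(title)
print(content)
```

This avoids closing the connection by hand and makes the code more concise. For a normal (non-streamed) request, `requests` has already read the whole body, so the gain is small; the `with` form matters more when you use `stream=True`.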
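A further robustness option that the answer does not cover is retrying transient failures a few times with a growing delay instead of exiting on the first error. The sketch below is library-agnostic; `with_retries` and its parameters are hypothetical names, and which exceptions count as transient is an assumption you should choose yourself (for `requests`, typically `requests.exceptions.RequestException`):

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call func(); on an exception listed in retry_on, wait and try again.

    The delay doubles after each failure (1s, 2s, 4s, ... with the defaults).
    The last exception is re-raised once all attempts are used up; exceptions
    not listed in retry_on propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: propagate the original error
            time.sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

For example: `response = with_retries(lambda: requests.get(url, timeout=10), retry_on=(requests.exceptions.RequestException,))`. `requests.get` does not raise for 4xx/5xx status codes, so if those should be retried as well, wrap a small function that also calls `response.raise_for_status()`.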

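Related recommendations

import re import requests from bs4 import BeautifulSoup import t

import sys import os import urllib from bs4 import BeautifulSoup

BS4_BeautifulSoup.docx

url=https://www.dpm.org.cn/lights/royal/p/81.html 1. Get the page source and save it to the file gugong index.html; 2. Download any 5 images to local disk and name them 1.jpg, 2.jpg, 3.jpg. Do it in Python.

Why is the content scraped by the following code garbled? `from bs4 import BeautifulSoup import requests if __name__ == '__main__': url = 'https://www.pincai.com/article/2320333.htm' response = requests.get(url).text soup = BeautifulSoup(response, 'lxml')`. Please fix the code for me.

Task for this level: write a program that gets the information of the first image on the Palace Museum wallpaper page. Learning video: "Scraping Palace Museum wallpaper images with Python" (致远工作室). The Palace Museum wallpaper page is: https://www.dpm.org.cn/lights/royal/p/1.html

Use the requests library and BeautifulSoup to scrape https://top.baidu.com/board?tab=realtime

TAIEX data: the raw JSON data can be obtained from https://www.twse.com.tw

Use Python to send HTTP requests with the requests library and parse the HTML page with BeautifulSoup to scrape https://www.taobao.com/

Code to scrape https://www.shicimingju.com/book/xiyouji/1.html

Scrape the text of https://www.yanyunxiaoshuo.com/xs/228367/92679328.html

Collect the news headlines of the following page in 3 ways: https://www.solidot.org/ Method 1: urllib + regular expressions. Complete the corresponding Python code.

Scrape the gold price from https://quote.cngold.org/gjs/swhj_zghj.html

Scrape data from https://www.baidu.com/ with Python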
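The related question above about collecting the Solidot headlines with `urllib` plus regular expressions (method 1) can be sketched with the standard library alone. The pattern `TITLE_RE` is a hypothetical assumption about the page markup (a link placed directly inside an `<h2>`), not taken from the real site; inspect the page source and adapt it before use. Regular expressions are brittle against markup changes, which is one reason the other methods typically use an HTML parser instead:

```python
import re
import urllib.request
from typing import List

# Hypothetical pattern: assumes each headline is the text of an <a> directly inside an <h2>.
TITLE_RE = re.compile(r"<h2[^>]*>\s*<a[^>]*>(.*?)</a>", re.S | re.I)
TAG_RE = re.compile(r"<[^>]+>")


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with urllib and decode it with the charset the server reports."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_titles(html: str) -> List[str]:
    """Return the headline texts matched by TITLE_RE, with nested tags removed."""
    return [TAG_RE.sub("", m).strip() for m in TITLE_RE.findall(html)]


# Usage (makes a network request):
#   for title in extract_titles(fetch_html("https://www.solidot.org/")):
#       print(title)
```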

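What others are reading

Simulation study of the point spread function for neutron pinhole imaging

Summary of the Huawei multicast PIM-SM process

HCNP-WLAN-CEWA (H12-321) question bank.pdf

Automotive electronics communication protocol SAE J2284

Exception handling: introduction to the MIPS CPU

Latest recommendations

Mastering web application development with HTML/CSS/JS and Node.js

Files for managing modeling and simulation

Overview of computer architecture: basic concepts and development trends

Print the array `int a[][3]={{1,2},{4}}`

Lema algorithm seminar project: online store simulation and Qt interface implementation

"Interactive learning: diversity in action and the experience of pursuing a thesis"

[Principles of Computer Organization, in depth]: understanding computer hardware from scratch

Loading AMap (Gaode Maps) in Vue 2

Edge syntax innovation: building a new WPF interface experience

Relational data representation learning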
