```python
import requests
from lxml import etree

url = 'http://www.ehaizhu.com/sf_8421D63EEBFF41B6A6238831B049D024_304_37B2B541446.html'
r = requests.get(url)
# BeautifulSoup cannot evaluate XPath expressions, so lxml is used for the positional path
tree = etree.HTML(r.content)
content = tree.xpath('/html/body/div[2]/div/div[3]/div[2]/div[3]/div[1]/div/div[2]/div/div/div/div[2]/p[1]')  # return value is a list
text = content[0].text if content else ''
```
Time: 2023-06-06 12:07:20  Views: 51
This code is written in Python and is meant to fetch the content of the page http://www.ehaizhu.com/sf_8421D63EEBFF41B6A6238831B049D024_304_37B2B541446.html. It first requests the page with the requests library, then extracts the text of the node at a specific position (/html/body/div[2]/div/div[3]/div[2]/div[3]/div[1]/div/div[2]/div/div/div/div[2]/p[1]) and stores it in the content variable. Note that BeautifulSoup's .contents is a list attribute, not a callable, and BeautifulSoup has no XPath engine; a positional path like this has to be evaluated with lxml, or rewritten as a CSS selector.
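Since BeautifulSoup cannot evaluate XPath, a positional path is usually translated into a CSS selector and passed to `select_one`. A minimal, self-contained sketch of that approach (the HTML snippet and the shortened selector are made-up stand-ins, not the real page):

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the fetched page
html = """
<html><body>
  <div>menu</div>
  <div><p>article text</p><p>footer</p></div>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')
# CSS equivalent of the XPath /html/body/div[2]/p[1]
node = soup.select_one('body > div:nth-of-type(2) > p:nth-of-type(1)')
text = node.get_text() if node else ''
print(text)  # article text
```

`select_one` returns a single tag (or None), so no list indexing is needed before calling `get_text()`.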
Related questions
Modify this code so that li_list is decoded as UTF-8:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.icbc.com.cn/page/827855918799994880.html'
response = requests.get(url=url)
page_response = response.text
soup = BeautifulSoup(page_response, 'html.parser', from_encoding='utf-8')
li_list = soup.select('#mypagehtmlcontent p')
```
```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.icbc.com.cn/page/827855918799994880.html'
response = requests.get(url=url)
# Decode the raw bytes as UTF-8 explicitly; from_encoding is ignored
# when the markup passed to BeautifulSoup is already a str
page_response = response.content.decode('utf-8')
soup = BeautifulSoup(page_response, 'html.parser')
li_list = soup.select('#mypagehtmlcontent p')

# Test code below; safe to ignore
for li in li_list:
    print(li.text)
```
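The difference between `response.text` and `response.content.decode('utf-8')` is that `text` uses the encoding requests guessed from the response headers, which can be wrong when the server declares none. A small sketch of the explicit decode (the byte string below is illustrative, not fetched from the real page):

```python
# Simulated response body: UTF-8 bytes, as requests exposes via response.content
raw = '中文内容'.encode('utf-8')

# Decoding explicitly avoids relying on the encoding requests guessed
# from the Content-Type header (often ISO-8859-1 when none is declared)
decoded = raw.decode('utf-8')
print(decoded)
```

Setting `response.encoding = 'utf-8'` before reading `response.text` achieves the same effect.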
Optimize this code:

```python
import requests
from bs4 import BeautifulSoup

url = "https://www.gupang.com/202305/64619.html"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
title = soup.find("h1", class_="title").text
content = soup.find("div", class_="content").text
print(title)
print(content)
```
The code can be optimized in the following ways:
1. Add exception handling so that a failed request or a failed parse does not crash the program:
```python
import sys

import requests
from bs4 import BeautifulSoup

url = "https://www.gupang.com/202305/64619.html"
try:
    response = requests.get(url)
    response.raise_for_status()  # raise an exception if the request failed
except requests.exceptions.RequestException as e:
    print(e)
    sys.exit(1)

soup = BeautifulSoup(response.content, "html.parser")
try:
    title = soup.find("h1", class_="title").text
    content = soup.find("div", class_="content").text
except AttributeError as e:
    print(e)
    sys.exit(1)

print(title)
print(content)
```
2. Use a `with` statement to manage resources, so the network connection is released automatically once it is no longer needed:
```python
import sys

import requests
from bs4 import BeautifulSoup

url = "https://www.gupang.com/202305/64619.html"
try:
    with requests.get(url) as response:
        response.raise_for_status()
        soup = BeautifulSoup(response.content, "html.parser")
        title = soup.find("h1", class_="title").text
        content = soup.find("div", class_="content").text
except (requests.exceptions.RequestException, AttributeError) as e:
    print(e)
    sys.exit(1)

print(title)
print(content)
```
This approach avoids having to close the connection manually, and the code is more concise.
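Building on both suggestions, a further refactor is to split fetching from parsing, which also makes the parsing testable without a live request. A sketch (the function names, the timeout value, and the sample markup are choices of this example, not part of the original answer):

```python
import requests
from bs4 import BeautifulSoup


def parse_article(html):
    """Extract (title, content) text from the page markup."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1", class_="title")
    content = soup.find("div", class_="content")
    if title is None or content is None:
        raise ValueError("expected elements not found")
    return title.text, content.text


def scrape(url, timeout=10.0):
    """Fetch a page and parse it; raises on network or parse failure."""
    with requests.get(url, timeout=timeout) as response:
        response.raise_for_status()
        return parse_article(response.content)


# Exercise the parsing path with made-up markup instead of a live request
sample = '<h1 class="title">Hi</h1><div class="content">Body</div>'
title, content = parse_article(sample)
print(title, content)
```

Passing an explicit `timeout` to `requests.get` is also worth doing in the original code: without it, a stalled server can hang the program indefinitely.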