```python
import requests
from lxml import etree

url = 'http://www.ehaizhu.com/sf_8421D63EEBFF41B6A6238831B049D024_304_37B2B541446.html'
r = requests.get(url)
# BeautifulSoup cannot evaluate XPath expressions, so lxml is used for the positional path
tree = etree.HTML(r.content)
content = tree.xpath('/html/body/div[2]/div/div[3]/div[2]/div[3]/div[1]/div/div[2]/div/div/div/div[2]/p[1]')  # return value is a list
text = content[0].text if content else ''
```
Time: 2023-06-06 12:07:20  Views: 51
This code is written in Python and is meant to fetch the content of the page http://www.ehaizhu.com/sf_8421D63EEBFF41B6A6238831B049D024_304_37B2B541446.html. It first requests the page with the requests library, then extracts the text of the node at a specific position (/html/body/div[2]/div/div[3]/div[2]/div[3]/div[1]/div/div[2]/div/div/div/div[2]/p[1]) and stores it in the content variable. Note that BeautifulSoup's .contents is a list attribute, not a callable, and BeautifulSoup has no XPath engine; a positional path like this has to be evaluated with lxml, or rewritten as a CSS selector.
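Since BeautifulSoup cannot evaluate XPath, a positional path is usually translated into a CSS selector and passed to `select_one`. A minimal, self-contained sketch of that approach (the HTML snippet and the shortened selector are made-up stand-ins, not the real page):

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the fetched page
html = """
<html><body>
  <div>menu</div>
  <div><p>article text</p><p>footer</p></div>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')
# CSS equivalent of the XPath /html/body/div[2]/p[1]
node = soup.select_one('body > div:nth-of-type(2) > p:nth-of-type(1)')
text = node.get_text() if node else ''
print(text)  # article text
```

`select_one` returns a single tag (or None), so no list indexing is needed before calling `get_text()`.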
Related questions
Modify this code so that li_list is decoded as UTF-8:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.icbc.com.cn/page/827855918799994880.html'
response = requests.get(url=url)
page_response = response.text
soup = BeautifulSoup(page_response, 'html.parser', from_encoding='utf-8')
li_list = soup.select('#mypagehtmlcontent p')
```
```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.icbc.com.cn/page/827855918799994880.html'
response = requests.get(url=url)
# Decode the raw bytes as UTF-8 explicitly; from_encoding is ignored
# when the markup passed to BeautifulSoup is already a str
page_response = response.content.decode('utf-8')
soup = BeautifulSoup(page_response, 'html.parser')
li_list = soup.select('#mypagehtmlcontent p')

# Test code below; safe to ignore
for li in li_list:
    print(li.text)
```
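The difference between `response.text` and `response.content.decode('utf-8')` is that `text` uses the encoding requests guessed from the response headers, which can be wrong when the server declares none. A small sketch of the explicit decode (the byte string below is illustrative, not fetched from the real page):

```python
# Simulated response body: UTF-8 bytes, as requests exposes via response.content
raw = '中文内容'.encode('utf-8')

# Decoding explicitly avoids relying on the encoding requests guessed
# from the Content-Type header (often ISO-8859-1 when none is declared)
decoded = raw.decode('utf-8')
print(decoded)
```

Setting `response.encoding = 'utf-8'` before reading `response.text` achieves the same effect.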
Optimize this code:

```python
import requests
from bs4 import BeautifulSoup

url = "https://www.gupang.com/202305/64619.html"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
title = soup.find("h1", class_="title").text
content = soup.find("div", class_="content").text
print(title)
print(content)
```
The code can be optimized in the following ways:
1. Add exception handling so that a failed request or a failed parse does not crash the program:
```python
import sys

import requests
from bs4 import BeautifulSoup

url = "https://www.gupang.com/202305/64619.html"
try:
    response = requests.get(url)
    response.raise_for_status()  # raise an exception if the request failed
except requests.exceptions.RequestException as e:
    print(e)
    sys.exit(1)

soup = BeautifulSoup(response.content, "html.parser")
try:
    title = soup.find("h1", class_="title").text
    content = soup.find("div", class_="content").text
except AttributeError as e:
    print(e)
    sys.exit(1)

print(title)
print(content)
```
2. Use a `with` statement to manage resources, so the network connection is released automatically once it is no longer needed:
```python
import sys

import requests
from bs4 import BeautifulSoup

url = "https://www.gupang.com/202305/64619.html"
try:
    with requests.get(url) as response:
        response.raise_for_status()
        soup = BeautifulSoup(response.content, "html.parser")
        title = soup.find("h1", class_="title").text
        content = soup.find("div", class_="content").text
except (requests.exceptions.RequestException, AttributeError) as e:
    print(e)
    sys.exit(1)

print(title)
print(content)
```
This approach avoids having to close the connection manually, and the code is more concise.
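Building on both suggestions, a further refactor is to split fetching from parsing, which also makes the parsing testable without a live request. A sketch (the function names, the timeout value, and the sample markup are choices of this example, not part of the original answer):

```python
import requests
from bs4 import BeautifulSoup


def parse_article(html):
    """Extract (title, content) text from the page markup."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1", class_="title")
    content = soup.find("div", class_="content")
    if title is None or content is None:
        raise ValueError("expected elements not found")
    return title.text, content.text


def scrape(url, timeout=10.0):
    """Fetch a page and parse it; raises on network or parse failure."""
    with requests.get(url, timeout=timeout) as response:
        response.raise_for_status()
        return parse_article(response.content)


# Exercise the parsing path with made-up markup instead of a live request
sample = '<h1 class="title">Hi</h1><div class="content">Body</div>'
title, content = parse_article(sample)
print(title, content)
```

Passing an explicit `timeout` to `requests.get` is also worth doing in the original code: without it, a stalled server can hang the program indefinitely.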