Please optimize the code above so that its printed output is not garbled
Posted: 2024-11-23 21:49:43
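If the printed output is garbled, the most likely cause is an encoding mismatch. When handling text in Python, the codec used to decode the incoming bytes must match the codec the data was actually written in. If the page returns data that is not UTF-8 encoded, you need to choose the decoder yourself. Below is the modified code, with an explicit decoding step added: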
```python
import requests
from bs4 import BeautifulSoup
import chardet  # Third-party charset detection library: pip install chardet


def detect_encoding(response):
    """Guess the encoding from the raw response bytes, falling back to UTF-8."""
    encoding = chardet.detect(response.content)['encoding']
    return encoding or 'utf-8'  # chardet may return None if it cannot decide


url = "https://www.ctbu.edu.cn/index/xxxw.htm"
response = requests.get(url)

if response.status_code == 200:
    encoding = detect_encoding(response)  # Detect the page encoding
    try:
        # Decode the raw bytes (response.content) with the detected encoding.
        # response.text is already a str and has no decode() method.
        response_text = response.content.decode(encoding)
        soup = BeautifulSoup(response_text, 'html.parser')
        h4_nodes = soup.find_all('h4')
        p_nodes = soup.find_all('p')
        for node in h4_nodes + p_nodes:
            print(f"Tag: {node.name}, content: {node.text.strip()}")
    except UnicodeDecodeError as e:
        print("Encoding error:", str(e))
else:
    print(f"Failed to load the page, status code: {response.status_code}")
```
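To see why the choice of decoder matters, here is a minimal offline sketch of how garbled output arises. It makes no network request. The sample string and the choice of GBK and ISO-8859-1 are assumptions made for illustration, not facts about the site above.

```python
# Hypothetical sample text standing in for a page body (not taken from the real site)
original = "学校新闻"

# Assumption: the server sends the page as GBK-encoded bytes
raw_bytes = original.encode("gbk")

# Decoding with the wrong codec does not raise an error here;
# it silently produces garbled characters (mojibake)
garbled = raw_bytes.decode("iso-8859-1")

# Decoding with the codec that was actually used restores the text
restored = raw_bytes.decode("gbk")

print(garbled == original)   # False
print(restored == original)  # True
```

The wrong decode above fails silently, so a `try`/`except UnicodeDecodeError` block will not catch every bad guess. If the output still looks wrong, printing the detected encoding is a quick first check.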