import requests from bs4 import BeautifulSoup url = "https://bj.zu.anjuke.com/fangyuan/p1/" headers = { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3", } response = requests.get(url, headers=headers) soup = BeautifulSoup(response.text, "html.parser") house_list = soup.select(".zu-itemmod") for house in house_list: title = house.select(".house-title > a")[0].text.strip() price = house.select(".zu-side > p > strong")[0].text.strip() area = house.select(".details-item > span:nth-child(1)")[0].text.strip() print(title, price, area)这段代码报错：list index out of range

时间: 2023-06-09 15:02:10 浏览: 106

这个问题可能是因为 house_list 列表为空导致的。你可以检查一下网页源代码是否正确，以及是否选择出了正确的 class，以确保 house_list 中包含了需要的信息。你也可以在循环之前先判断一下 house_list 是否为空，如果不为空再执行循环。例如： if house_list: for house in house_list: # do something else: print("No house found on this page.")

优化这段代码import requests from bs4 import BeautifulSoup url = "https://www.gupang.com/202305/64619.html" response = requests.get(url) soup = BeautifulSoup(response.content, "html.parser") title = soup.find("h1", class_="title").text content = soup.find("div", class_="content").text print(title) print(content)

可以使用以下方法进行优化： 1. 添加异常处理，以防止请求失败或解析失败导致程序崩溃： ```python import requests from bs4 import BeautifulSoup url = "https://www.gupang.com/202305/64619.html" try: response = requests.get(url) response.raise_for_status() # 抛出异常，如果请求失败 except requests.exceptions.RequestException as e: print(e) sys.exit(1) soup = BeautifulSoup(response.content, "html.parser") try: title = soup.find("h1", class_="title").text content = soup.find("div", class_="content").text except AttributeError as e: print(e) sys.exit(1) print(title) print(content) ``` 2. 使用`with`语句来管理资源，以确保在使用完文件或网络连接后自动释放资源： ```python import requests from bs4 import BeautifulSoup url = "https://www.gupang.com/202305/64619.html" try: with requests.get(url) as response: response.raise_for_status() soup = BeautifulSoup(response.content, "html.parser") title = soup.find("h1", class_="title").text content = soup.find("div", class_="content").text except (requests.exceptions.RequestException, AttributeError) as e: print(e) sys.exit(1) print(title) print(content) ``` 这种方法可以避免手动关闭连接或文件，而且代码更加简洁。

修改代码，使得li_list的编码格式是utf-8import requests from bs4 import BeautifulSoup url = 'https://www.icbc.com.cn/page/827855918799994880.html' response = requests.get(url=url) page_response = response.text soup = BeautifulSoup(page_response, 'html.parser',from_encoding='utf-8') li_list = soup.select('#mypagehtmlcontent p')

import requests from bs4 import BeautifulSoup url = 'https://www.icbc.com.cn/page/827855918799994880.html' response = requests.get(url=url) page_response = response.content.decode('utf-8') soup = BeautifulSoup(page_response, 'html.parser') li_list = soup.select('#mypagehtmlcontent p') # 以下为测试代码，可以不用理会 for li in li_list: print(li.text)

相关推荐

java坑爹的笔试题-gitee-bullshit-codes:从https://gitee.com/oschina/bullshit-cod

http://python-requests.org/库的透明持久缓存-Python开发

ofborg：@ofborg工具自动化https：//monitoring.nix.cidashboarddbofborg

使用requests库和BeautifulSoup爬取该网站https://top.baidu.com/board?tab=realtime

以下代码爬取的内容是乱码，什么原因？from bs4 import BeautifulSoup import requests if name == 'main': url = 'https://www.pincai.com/article/2320333.htm' response = requests.get(url).text soup = BeautifulSoup(response, 'lxml')。帮我修改好代码

url = 'https://weibo.com/ajax/statuses/searchProfile'爬取微博python

用python爬取https://www.jsszfhcxjstzhfwpt.com:12443/?ticket=cffd8ab8ec439f56b838b10582917787&gotoUrl=http://www.jszwfw.gov.cn/jsjis/front/login.do?uuid=3Pjiipr99xV0#/dashboard所有内容的代码

最新推荐

RTL8188FU-Linux-v5.7.4.2-36687.20200602.tar(20765).gz

管理建模和仿真的文件

：YOLOv1目标检测算法：实时目标检测的先驱，开启计算机视觉新篇章

ActionContext.getContext().get()代码含义

c++校园超市商品信息管理系统课程设计说明书(含源代码) (2).pdf

"互动学习：行动中的多样性与论文攻读经历"

：YOLO目标检测算法的挑战与机遇：数据质量、计算资源与算法优化，探索未来发展方向

设计一个算法，输出在顺序表｛3，6，2，10，1，8，5，7，4，9｝中采用顺序方法查找关键字5的过程。

建筑供配电系统相关课件.pptx

关系数据表示学习