Python implementation: when response.status_code == 404, continue

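When `response.status_code == 404`, the code skips the rest of the current loop iteration and moves on to the next one. HTTP status code 404 means the requested resource does not exist, so a `continue` statement lets the loop skip that missing resource and carry on with the remaining ones.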
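A minimal sketch of that pattern follows. The function name `fetch_skipping_404` and the injected `get` parameter are assumptions made for illustration; `get` can be any callable that returns an object with `status_code` and `content` attributes, such as `requests.get`.

```python
from typing import Callable, Dict, Iterable


def fetch_skipping_404(urls: Iterable[str], get: Callable) -> Dict[str, bytes]:
    """Fetch each URL with `get` and collect the bodies, skipping any 404."""
    bodies = {}
    for url in urls:
        response = get(url)
        if response.status_code == 404:
            # Resource does not exist: skip it and move on to the next URL
            continue
        bodies[url] = response.content
    return bodies
```

Only 404 is skipped here; other non-200 responses are still stored, so a stricter check such as `response.status_code != 200` may suit a downloader better.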

    from requests_html import HTMLSession
    import os


    class Spider:
        def __init__(self):
            self.base_url = 'https://s3-ap-northeast-1.amazonaws.com/data.binance.vision/data/spot/daily/klines'
            self.pair = '1INCHBTC'
            self.interval = '1d'
            self.session = HTMLSession()

        def get_urls(self):
            urls = []
            # Index page
            response = self.session.get(f'{self.base_url}/{self.pair}/{self.interval}/')
            if response.status_code != 200:
                return urls
            # html.links is a set, so sort it into a list before indexing
            links = sorted(response.html.links)
            urls.extend(link for link in links if link.endswith('.zip'))
            # Pagination: follow the last link until a CHECKSUM link is reached
            while links:
                response = self.session.get(links[-1])
                if response.status_code != 200:  # request failed
                    break
                links = sorted(response.html.links)
                urls.extend(link for link in links if link.endswith('.zip'))
                if links and 'CHECKSUM' in links[-1]:
                    break
            return urls

        def download_files(self):
            urls = self.get_urls()
            if not urls:
                print('Download failed')
                return
            if not os.path.exists('download_files'):
                os.mkdir('download_files')
            for url in urls:
                file_name = url.split('/')[-1]
                file_path = f'download_files/{file_name}'
                if os.path.exists(file_path):  # file already exists
                    print(f'{file_name} already exists')
                    continue
                response = self.session.get(url)
                if response.status_code != 200:  # request failed
                    print(f'{file_name} download failed')
                    continue
                with open(file_path, 'wb') as f:
                    f.write(response.content)
                print(f'{file_name} downloaded')

        def run(self):
            self.download_files()

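This is a Python crawler that downloads daily (1d interval) candlestick (K-line) data for the 1INCHBTC pair from Binance's public data archive. It uses the requests_html library to send HTTP requests and parse HTML. In the Spider class, the constructor __init__ sets the basic parameters the program needs: the base URL of the Binance data listing, the trading pair, the interval, and the HTTP session. The get_urls method collects the list of download links for the data files, and download_files downloads them, skipping any file that already exists locally. Finally, the run method calls download_files to run the whole program.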
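The skip-if-already-downloaded behaviour described above can be sketched on its own, without a network connection. The function name `download_missing` and the injected `get` callable are assumptions for illustration, not part of the original program; `get` stands in for `self.session.get`.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def download_missing(urls: Iterable[str], directory: str, get: Callable) -> List[str]:
    """Download each URL into `directory`, skipping files that already exist."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for url in urls:
        file_name = url.split('/')[-1]
        target = target_dir / file_name
        if target.exists():
            # Already downloaded: skip without issuing a request
            continue
        response = get(url)
        if response.status_code != 200:
            # Request failed: skip this file and keep going
            continue
        target.write_bytes(response.content)
        written.append(file_name)
    return written
```

Checking for the file before calling `get` means a second run only issues requests for files that are still missing.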

Take the same Spider code as above, with the save path in download_files changed to `file_path = f'/Users/huyang/Desktop/2023/Grach/{file_name}'`, and change the address of the website the data is downloaded from to 'http://www.greenfinance.org.cn/'.

You need to change the following attributes and methods of the Spider class:

```python
import os

from requests_html import HTMLSession


class Spider:
    def __init__(self):
        self.base_url = 'http://www.greenfinance.org.cn/'
        self.session = HTMLSession()

    def get_urls(self):
        urls = []
        response = self.session.get(self.base_url)
        if response.status_code == 200:
            # absolute_links resolves relative hrefs against the page URL
            for link in response.html.absolute_links:
                if link.endswith(('.xls', '.xlsx')):
                    urls.append(link)
        return urls

    def download_files(self):
        urls = self.get_urls()
        if not urls:
            print('Download failed')
            return
        # Change save_dir to wherever you want the files saved
        save_dir = '/Users/huyang/Desktop/2023/Grach'
        os.makedirs(save_dir, exist_ok=True)
        for url in urls:
            file_name = url.split('/')[-1]
            file_path = os.path.join(save_dir, file_name)
            if os.path.exists(file_path):  # file already exists
                print(f'{file_name} already exists')
                continue
            response = self.session.get(url)
            if response.status_code != 200:  # request failed
                print(f'{file_name} download failed')
                continue
            with open(file_path, 'wb') as f:
                f.write(response.content)
            print(f'{file_name} downloaded')

    def run(self):
        self.download_files()
```

Here `get_urls()` collects every link on the http://www.greenfinance.org.cn/ page that ends in `.xls` or `.xlsx`, and `download_files()` downloads the files those links point to. Remember to change `save_dir`, which determines `file_path`, so that it points to where you want the files saved. Finally, call `run()` on a Spider instance to run the class and download the files.
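The link filtering that `get_urls()` performs can also be written as a pure function, which is easier to check without touching the network. This is only a sketch: the helper name `spreadsheet_links` is hypothetical, and it makes two assumptions not stated in the answer, namely that hrefs on the page may be relative (so they are resolved with `urljoin`) and that extensions should match case-insensitively.

```python
from typing import Iterable, List
from urllib.parse import urljoin


def spreadsheet_links(base_url: str, hrefs: Iterable[str]) -> List[str]:
    """Return sorted absolute URLs for hrefs ending in .xls or .xlsx."""
    matches = set()
    for href in hrefs:
        # Assumption: match extensions case-insensitively
        if href.lower().endswith(('.xls', '.xlsx')):
            # Resolve relative hrefs against the page they were found on
            matches.add(urljoin(base_url, href))
    return sorted(matches)
```

It could be called as `spreadsheet_links(self.base_url, response.html.links)` inside `get_urls()`.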
