Improve the source code below and implement more features:

    import requests

    url = "https://www.example.com"  # Replace with the URL of the site you want to crawl
    response = requests.get(url)
    with open("output.html", "w", encoding="utf-8") as f:
        f.write(response.text)


Posted: 2024-01-10 15:04:34 | Views: 140

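Sure. Below is a somewhat more complete version of the Python crawler. It fetches the HTML at the given URL and saves it to a local file, and it also sets request headers, routes traffic through a proxy, and handles exceptions:

```python
import requests

url = "https://www.example.com"  # Replace with the URL of the site you want to crawl

# Request headers that mimic a browser
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 Edge/16.16299",
    "Referer": "https://www.google.com",
}

# Proxy used for debugging. This assumes a local proxy is listening on
# 127.0.0.1:8888; drop the proxies argument if nothing is running there.
# Both entries use the http:// scheme because the scheme describes how to
# reach the proxy itself, and a local debugging proxy normally speaks plain HTTP.
proxies = {
    "http": "http://127.0.0.1:8888",
    "https": "http://127.0.0.1:8888",
}

try:
    response = requests.get(url, headers=headers, proxies=proxies, timeout=10)
    response.raise_for_status()  # Raise HTTPError for 4xx and 5xx status codes
except requests.exceptions.RequestException as e:
    print(e)  # Report the failure instead of crashing
else:
    with open("output.html", "w", encoding="utf-8") as f:
        f.write(response.text)
```

Compared with the original, this version adds request headers, a proxy, a timeout, and exception handling. A browser-like `User-Agent` makes the request look like ordinary browser traffic, which lowers the chance of being rejected by sites that filter on default client headers, although it does not guarantee that you will not be blocked. The proxy here points at a local address, which is convenient for inspecting traffic while debugging; pointing it at a remote proxy instead changes the outgoing IP address, which matters for sites that restrict access by IP. The exception handling catches connection errors, timeouts, and HTTP error statuses, so the program prints the error instead of crashing. Note that this code is only an example, and the right implementation depends on your requirements.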
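One possible way to extend the error handling further is to retry transient failures instead of giving up on the first one. The sketch below is a minimal illustration, not part of the original answer: the names `build_session`, `save_page`, and `DEFAULT_HEADERS`, along with the retry count, backoff factor, and list of status codes, are assumptions chosen for the example. It relies on `requests.Session`, `HTTPAdapter`, and urllib3's `Retry` class.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Hypothetical default headers for illustration; adjust for your target site.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
}


def build_session(retries=3, backoff=0.5):
    """Return a Session that retries transient failures with backoff."""
    retry = Retry(
        total=retries,                          # maximum number of retries
        backoff_factor=backoff,                 # growing delay between attempts
        status_forcelist=(500, 502, 503, 504),  # also retry these HTTP statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    session.headers.update(DEFAULT_HEADERS)
    return session


def save_page(session, url, path, timeout=10):
    """Fetch url and write its body to path; return True on success."""
    try:
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        print(f"Request failed: {exc}")
        return False
    with open(path, "w", encoding="utf-8") as f:
        f.write(response.text)
    return True
```

Calling `save_page(build_session(), "https://www.example.com", "output.html")` would then behave like the script above, except that connection errors and the listed 5xx responses are retried before the failure is reported. Reusing one session for several URLs also reuses the underlying connections, which tends to help when crawling many pages from the same host.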
