How do I fix html = requests.get('http://www.kanunu8.com/book3/6879').content.decode() raising UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 276: invalid continuation byte

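This error occurs because `.decode()` with no argument assumes UTF-8, while the page's bytes are not valid UTF-8. The fix is to decode with the encoding the page actually uses.

First, you can try `requests.get().text` instead of `.content.decode()`. requests then decodes the body using the charset declared in the response's `Content-Type` header. Note that if the header declares no charset, requests falls back to ISO-8859-1 for text responses, so Chinese text can still come out garbled even though no exception is raised.

```
html = requests.get('http://www.kanunu8.com/book3/6879').text
```

If that does not solve the problem, decode the raw bytes with the correct encoding yourself. For example, if you know the page uses GBK, use `.content.decode('gbk')`.

```
html = requests.get('http://www.kanunu8.com/book3/6879').content.decode('gbk')
```

Try these approaches and pick the encoding that matches the page to get rid of the UnicodeDecodeError.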
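When the encoding is not known in advance, one option is to try a short list of candidates in order. The sketch below is only an illustration: `decode_html` and its candidate list are hypothetical names chosen here, not part of requests, and the sample bytes stand in for `response.content` so the snippet runs without network access.

```python
def decode_html(raw: bytes, candidates=("utf-8", "gb18030")) -> str:
    """Try each candidate encoding in order and return the first clean decode."""
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode anyway and mark undecodable bytes with U+FFFD.
    return raw.decode(candidates[0], errors="replace")


# Stand-in for response.content from a GBK-encoded page (assumption for the demo).
sample = "动物农场".encode("gbk")
print(decode_html(sample))
```

GB18030 is a superset of GBK, so it is a safe second candidate for Simplified Chinese pages. With a real response you would call `decode_html(response.content)`; `response.apparent_encoding` is another way to get a guess from requests itself.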

    def get_html():
        # Fetch the source of the index page
        html = requests.get('https://www.kanunu8.com/book3/6879/').content.decode(encoding='gbk')
        # print(html)
        # Collect all chapter links
        herf = re.findall('<a href="(13.*?)">', html, re.S)
        return herf

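This code defines a function, `get_html()`, that downloads the source of a page and extracts every chapter link matching a given pattern. It works as follows:

1. It sends a GET request with the `requests` library to 'https://www.kanunu8.com/book3/6879/' and decodes the returned bytes as GBK.
2. It calls `re.findall()` to extract the chapter links. The pattern `<a href="(13.*?)">` captures the `href` value of every link whose target starts with `13`.
3. It returns the list of captured links.

Before running the code, make sure the `requests` library is installed and that `requests` and `re` are imported at the top of the file.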
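To see what the pattern captures without hitting the site, the extraction step can be run on a small HTML fragment. This is a minimal sketch: `extract_chapter_links` is a hypothetical helper name, and the fragment and its file names are made up for the demonstration rather than copied from the real page.

```python
import re


def extract_chapter_links(html: str) -> list:
    """Return every href value that starts with "13", in page order."""
    return re.findall(r'<a href="(13.*?)">', html, re.S)


# Hypothetical index-page fragment, only for demonstration.
sample_html = (
    '<a href="/index.html">Home</a>'
    '<a href="131779.html">Chapter 1</a>'
    '<a href="131780.html">Chapter 2</a>'
)
print(extract_chapter_links(sample_html))
```

The pattern relies on `href` being the only attribute of the `<a>` tag. If a tag carried further attributes after `href`, the lazy `.*?` would run on until the next `">` and capture them too, so an HTML parser is usually the sturdier choice for pages with less regular markup.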

    import requests
    import re
    import time

    # Fetch the page source with requests
    # Fetch the source of the index page
    html = requests.get('https://www.kanunu8.com/book3/6879/').content.decode(encoding='gbk')
    # print(html)
    # Collect all chapter links
    herf = re.findall('<a href="(13.*?)">', html, re.S)
    print(herf)
    start = time.time()
    for i in herf:
        # Fetch the source of each chapter through its link
        chapter_html = requests.get('https://www.kanunu8.com/book3/6879/' + i).content.decode(encoding='gbk')
        # print(chapter_html)
        title = re.search('size="4">(.*?)<', chapter_html, re.S).group(1)  # chapter title
        content = re.findall('<p>(.*?)</p>', chapter_html, re.S)  # text inside each <p> tag, returned as a list
        content_str = "\n".join(content).replace(" ", "")  # join the list into one string and strip extra characters
        with open('动物农场/' + title + '.txt', 'w', encoding='utf-8') as f:
            f.write(title)
            f.write(content_str)
    end = time.time()
    print(f'单线程耗时{end-start}')  # prints the single-threaded elapsed time

Please explain this code in detail.

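This Python program saves every chapter of the novel Animal Farm (《动物农场》) to local files. It first uses the requests library to fetch the source of the novel's index page, then uses a regular expression to pull out all chapter links. It then loops over the links, fetches each chapter's source, extracts the chapter title and text, and writes them to a local file named after the title. The main steps are:

1. Import the requests, re and time libraries.

2. Fetch the source of the index page with requests and decode it as GBK.

```python
html=requests.get('https://www.kanunu8.com/book3/6879/').content.decode(encoding='gbk')
```

3. Extract all chapter links with a regular expression.

```python
herf=re.findall('<a href="(13.*?)">',html,re.S)
```

4. Loop over the links and fetch the source of each chapter.

```python
for i in herf:
    chapter_html=requests.get('https://www.kanunu8.com/book3/6879/'+i).content.decode(encoding='gbk')
```

5. Inside the loop, extract the chapter title and the contents of the `<p>` tags with regular expressions, then write them to a local file.

```python
    title=re.search('size="4">(.*?)<',chapter_html,re.S).group(1)
    content=re.findall('<p>(.*?)</p>',chapter_html,re.S)
    content_str="\n".join(content).replace(" ","")
    with open('动物农场/'+title+'.txt','w',encoding='utf-8') as f:
        f.write(title)
        f.write(content_str)
```

6. After the loop finishes, print the elapsed single-threaded time.

```python
end=time.time()
print(f'单线程耗时{end-start}')
```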
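Two details of the save step are easy to trip over: `re.search(...)` returns `None` when the title pattern does not match, and `open(..., 'w')` does not create a missing output folder. The sketch below shows one possible way to guard against both. `parse_chapter` and `save_chapter` are hypothetical helper names, the sample markup is invented, and the `<p>...</p>` pattern assumes the chapter text is wrapped in paragraph tags.

```python
import re
from pathlib import Path


def parse_chapter(chapter_html: str):
    """Return (title, body_text) for one chapter page."""
    match = re.search(r'size="4">(.*?)<', chapter_html, re.S)
    # re.search returns None when nothing matches, so guard before .group().
    title = match.group(1).strip() if match else "untitled"
    paragraphs = re.findall(r"<p>(.*?)</p>", chapter_html, re.S)
    body = "\n".join(p.strip() for p in paragraphs)
    return title, body


def save_chapter(out_dir, title: str, body: str) -> Path:
    """Write one chapter to out_dir/<title>.txt, creating out_dir if needed."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Replace characters that are not allowed in Windows file names.
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title)
    path = folder / f"{safe_title}.txt"
    path.write_text(f"{title}\n{body}", encoding="utf-8")
    return path


# Hypothetical chapter markup, only for demonstration.
sample = '<font size="4">第一章</font><p>第一段</p><p>第二段</p>'
title, body = parse_chapter(sample)
print(title)
print(body)
```

In the original loop, these helpers would be called as `save_chapter('动物农场', *parse_chapter(chapter_html))` for each downloaded chapter.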
