'gb2312' codec can't decode byte 0xa8 in position 0: illegal multibyte sequence
Date: 2023-08-12 18:09:41
Problem description: After scraping a web page with Python, calling decode on the response raises UnicodeDecodeError: 'gb2312' codec can't decode byte 0x8f in position 6018: illegal multibyte sequence. The same kind of error also appears when reading a YAML file: UnicodeDecodeError: 'gbk' codec can't decode byte 0xa7 in position 265: illegal multibyte sequence. What causes this, and how can it be fixed?
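Answer: The error occurs because the bytes are decoded with the wrong encoding. GB2312 is an older Chinese encoding standard. The newer GB18030 covers a much larger range of characters and is backward compatible with GBK, which itself extends GB2312. When a web page or file contains characters or special symbols outside GB2312's repertoire, decoding with gb2312 fails with an "illegal multibyte sequence" error. The fix is to decode with gb18030 instead of gb2312.\[2\]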
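The failure can be reproduced with a short, hypothetical snippet. It assumes the page contains "镕" (U+9555), a character that GB2312 does not include but GBK and GB18030 do. The sample text and the helper name try_decode are illustrative only, not taken from the cited posts.

```python
# Hypothetical reproduction: the bytes below stand in for a fetched page.
# Assumption: the page contains "镕" (U+9555), which is not in GB2312
# but is included in GBK and GB18030.
text = "朱镕基"
data = text.encode("gb18030")


def try_decode(raw: bytes, encoding: str) -> str:
    """Return the decoded text, or a short description of the decode error."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason} (byte offset {exc.start})"


for enc in ("gb2312", "gbk", "gb18030"):
    # gb2312 rejects the bytes for "镕"; gbk and gb18030 decode them.
    print(f"{enc:<8} -> {try_decode(data, enc)}")
```

The gb2312 attempt fails with the same "illegal multibyte sequence" reason reported in the question; only the byte value and offset differ.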
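Concretely, for the first problem, change html.decode("gb2312") to html.decode("gb18030"). For the second problem, change the encoding from 'gbk' to 'gb18030'.\[1\]\[3\] Note that when open() is called without an encoding argument, Python falls back to the platform's default encoding, which is GBK (cp936) on Chinese-locale Windows systems. If the YAML file is actually saved as UTF-8, open it with encoding="utf-8" instead.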
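Below is a minimal sketch of both fixes. It assumes the page bytes are already in hand (fetching is not shown) and that the YAML file is UTF-8. The names decode_page, read_text, WIDER_ENCODING, and config.yaml are hypothetical. yaml.safe_load refers to the third-party PyYAML package, which is deliberately not imported here.

```python
import tempfile
from pathlib import Path

# GB18030 is backward compatible with GBK, which extends GB2312, so text that a
# source labels as gb2312 or gbk can normally be decoded with gb18030 instead.
WIDER_ENCODING = {"gb2312": "gb18030", "gbk": "gb18030"}


def decode_page(raw: bytes, declared: str = "gb2312") -> str:
    """Decode page bytes, widening a declared gb2312/gbk encoding to gb18030.

    Hypothetical helper: `declared` is whatever encoding the page claims.
    """
    encoding = WIDER_ENCODING.get(declared.lower(), declared)
    return raw.decode(encoding)


def read_text(path: Path, encoding: str) -> str:
    """Read a text file with an explicit encoding instead of the platform default."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Case 1: page bytes containing a character outside GB2312 ("镕").
html = "<p>朱镕基</p>".encode("gb18030")
print(decode_page(html))  # html.decode("gb2312") would raise here

# Case 2: a YAML file saved as UTF-8 (assumed); pass the encoding explicitly.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.yaml"  # hypothetical file name
    cfg.write_text("name: 朱镕基\n", encoding="utf-8")
    content = read_text(cfg, "utf-8")
    print(content)
    # With PyYAML installed, yaml.safe_load(content) would parse this string.
```

Mapping the declared label to a wider codec, rather than hard-coding gb18030 everywhere, leaves other declared encodings such as utf-8 untouched.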
#### References
- *1* [decode error UnicodeDecodeError: 'gb2312' codec can't decode byte 0x8f in position 6018: illegal ...](https://blog.csdn.net/k53247l2/article/details/79340900)
- *2* [requests: garbled characters in request responses](https://blog.csdn.net/qq_27056805/article/details/86444095)
- *3* [UnicodeDecodeError: 'gbk' codec can't decode byte 0xa7 in position 265: illegal multibyte sequence](https://blog.csdn.net/qq_44368508/article/details/126831706)