```
import requests
import re
import os

list_url='https://www.biquge66.net/book/29689/'
headers={
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
html_data=requests.get(url=list_url, headers=headers).text
name=re.findall('<meta property="og:novel:book_name" content="(.*?)" />',html_data)[0]
file = f'{name}'
if not os.path.exists(file):
    os.mkdir(file)
url_list=re.findall('<a id="haitung" href="(.*?)"',html_data)
for url in url_list:
    url='https://www.biquge66.net'+url
    headers={
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
    }
    response=requests.get(url=url, headers=headers)
    title=re.findall('<span class="divcss5">(.*?)</span>',response.text)[0]
    content=re.findall('<div id="booktxt"><p>(.*?)<div class="content">',response.text)[0].replace('</p>','\n')
    content=content.replace('<p>','')
    print(title,'\n')
    print(content)
    print(url)
    with open(file + name + '.txt',mode='a',encoding='utf-8')as f:
        title=title.center(60)
        f.write(title)
        f.write('\n\n')
        f.write(content)
        f.write('\n')
```
Is there anything wrong with my code?

Posted: 2024-04-03 09:33:53

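Your code may have the following problems:

1. The novel name, chapter links, titles and chapter text are all extracted with regular expressions that depend on the exact page markup, so they break as soon as the site changes its HTML. BeautifulSoup is a more robust way to parse HTML.
2. None of the `re.findall` calls pass the `re.S` flag. Without it `.` does not match a newline, so if a chapter body spans several lines, `re.findall` returns an empty list and the trailing `[0]` raises `IndexError`.
3. `file + name + '.txt'` joins the folder name and the file name without a path separator, so the chapters are appended to a file named after the novel twice (plus `.txt`) in the current directory rather than to a file inside the folder you created. Use `os.path.join(file, name + '.txt')` instead.
4. The novel name is used as-is for the folder and file name. Characters such as `?`, `:` and `/` are not allowed in file names on some systems, so replace them with an underscore first.
5. `headers` is redefined with the same value on every loop iteration; defining it once before the loop is enough.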
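A minimal sketch of the file-handling fixes, assuming the output layout from the question (centred title, blank line, chapter text). `safe_filename`, `novel_path` and `append_chapter` are hypothetical helper names, and the set of forbidden characters is an assumption based on the characters Windows rejects in file names. The demo writes into a temporary directory so it leaves nothing behind.

```python
import os
import re
import tempfile

# Characters Windows rejects in file names; this set is an assumption, adjust as needed
INVALID_CHARS = r'[\\/:*?"<>|]'


def safe_filename(name: str) -> str:
    """Replace characters that cannot appear in a file name with '_'."""
    return re.sub(INVALID_CHARS, '_', name).strip()


def novel_path(folder: str, name: str) -> str:
    """Put <name>.txt inside folder; os.path.join adds the separator that '+' left out."""
    return os.path.join(folder, safe_filename(name) + '.txt')


def append_chapter(path: str, title: str, content: str) -> None:
    """Append one chapter in the question's layout: centred title, blank line, body."""
    with open(path, mode='a', encoding='utf-8') as f:
        f.write(title.center(60))
        f.write('\n\n')
        f.write(content)
        f.write('\n')


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as base:
        name = 'Example: Novel?'  # hypothetical book name
        folder = os.path.join(base, safe_filename(name))
        os.makedirs(folder, exist_ok=True)  # replaces the separate exists()/mkdir() pair
        path = novel_path(folder, name)
        append_chapter(path, 'Chapter 1', 'First paragraph\nSecond paragraph')
        print(os.path.relpath(path, base))
```

In the real crawler, `folder` would be created once before the chapter loop and `append_chapter` called once per chapter.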

```
import requests
import re

url='https://www.biquge66.net/book/29689/20013112.html'
headers={
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
response=requests.get(url=url, headers=headers)
title=re.findall('<span class="divcss5">(.*?)</span>',response.text)[0]
content = re.findall('<div id="booktxt">(.*?)<div class="content">',response.text)[0].replace('</p><p>','\n')
print(content)
```
What's wrong with it?

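This code has no indentation problem: every statement sits at module level, so none of it needs to be indented. The problem is more likely in the regular expressions. The capture groups should be the non-greedy `(.*?)`, and because `.` does not match a newline unless the `re.S` flag is passed, `re.findall` returns an empty list when the chapter body spans several lines; the trailing `[0]` then raises `IndexError`. In addition, if the body is a run of `<p>...</p>` paragraphs, replacing only `</p><p>` leaves the first `<p>` and the last `</p>` in the text. A corrected version:

```
import re

import requests

url = 'https://www.biquge66.net/book/29689/20013112.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
# re.S lets "." also match newlines, so a chapter body spread over several lines is captured
title = re.findall('<span class="divcss5">(.*?)</span>', response.text, re.S)[0]
content = re.findall('<div id="booktxt">(.*?)<div class="content">', response.text, re.S)[0]
# Turn paragraph boundaries into newlines and drop the remaining <p> / </p> tags
content = content.replace('</p><p>', '\n').replace('<p>', '').replace('</p>', '')
print(title)
print(content)
```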
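To see why the flag matters, the snippet below runs the same pattern with and without `re.S` on a made-up fragment that only imitates the chapter markup from the question; the real page may be structured differently.

```python
import re

# Made-up fragment that imitates the chapter page; the real markup may differ
html = '<div id="booktxt"><p>Line one</p>\n<p>Line two</p><div class="content">'
pattern = '<div id="booktxt">(.*?)<div class="content">'

# Without re.S, "." stops at the newline, so nothing matches
without_flag = re.findall(pattern, html)
# With re.S (also spelled re.DOTALL), "." matches "\n" too and the whole body is captured
with_flag = re.findall(pattern, html, re.S)

print(without_flag)  # []
print(with_flag)     # ['<p>Line one</p>\n<p>Line two</p>']

# Safer than indexing [0] directly: check that something matched first
content = with_flag[0] if with_flag else ''
text = content.replace('</p><p>', '\n').replace('<p>', '').replace('</p>', '')
print(text)
```

Checking the list before indexing turns a silent markup change into an empty result you can detect, instead of an `IndexError` in the middle of a crawl.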

```
import requests
from bs4 import BeautifulSoup

headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.50'}
url='http://www.biquge5200.cc/191_191776/'
response=requests.get(url,headers=headers,timeout=50)
# Name the parser explicitly; without it BeautifulSoup guesses one and emits a warning
html=BeautifulSoup(response.text, 'html.parser')
print(html)
```

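This Python code uses the requests and BeautifulSoup libraries to fetch the HTML of http://www.biquge5200.cc/191_191776/ and print it. The `headers` argument sends a browser `User-Agent` so the request looks like it comes from a browser, and `timeout=50` makes requests give up if the server does not respond within 50 seconds. If the request succeeds, `requests.get` returns a `Response` object; passing its `text` to `BeautifulSoup` creates a `BeautifulSoup` object whose methods and attributes (for example `find` and `find_all`) can then be used to parse the HTML.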
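Once the page is in a BeautifulSoup object, `find` and `find_all` pull out specific elements. The sketch below parses a small made-up chapter list with the built-in `html.parser`; the `<div id="list">`/`<dd>` structure and the `/chapter/...` links are assumptions for illustration, not the actual markup of biquge5200.cc. It requires the `beautifulsoup4` package.

```python
from bs4 import BeautifulSoup

# Made-up fragment standing in for a chapter-list page; the real site's markup may differ
html = """
<div id="list">
  <dl>
    <dd><a href="/chapter/1.html">Chapter 1</a></dd>
    <dd><a href="/chapter/2.html">Chapter 2</a></dd>
  </dl>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# find() returns the first matching tag (or None); find_all() returns a list of every match
chapter_list = soup.find('div', id='list')
links = [(a.get_text(strip=True), a['href']) for a in chapter_list.find_all('a')]

for title, href in links:
    print(title, href)
```

Because `find` returns `None` when nothing matches, check its result before calling methods on it when the page layout is not guaranteed.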


