```
import requests
import re

url='https://www.biquge66.net/book/29689/20013112.html'
headers={
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
response=requests.get(url=url, headers=headers)
title=re.findall('<span class="divcss5">(.?)</span>',response.text)[0]
content = re.findall('<div id="booktxt">(.?)<div class="content">',response.text)[0].replace('</p><p>','\n')
print(content)
```
What's wrong with this?

Posted: 2024-03-19 11:42:12  Views: 298

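Indentation itself is not the problem: top-level Python code needs none. As pasted, however, every statement sits on a single line, so each one must go on its own line. The real bug is in the regular expressions. `(.?)` matches at most one character, so a longer title or chapter body finds nothing and `[0]` raises `IndexError`; use the non-greedy `(.*?)` instead. The chapter body also likely spans several lines, and `.` does not match a newline by default, so pass the `re.S` flag as well. Corrected code:
```
import requests
import re

url = 'https://www.biquge66.net/book/29689/20013112.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
response = requests.get(url=url, headers=headers)
# (.*?) matches non-greedily up to the closing tag
title = re.findall('<span class="divcss5">(.*?)</span>', response.text)[0]
# re.S lets '.' also match newlines inside the chapter body
content = re.findall('<div id="booktxt">(.*?)<div class="content">', response.text, re.S)[0].replace('</p><p>', '\n')
print(content)
```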
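To see the difference without hitting the site, here is a minimal offline sketch. The HTML string is made up and only imitates the tags the patterns target; the real chapter page may be structured differently. It shows that `(.?)`, and a body pattern without `re.S`, both come back empty, while `(.*?)` with `re.S` captures the title and body. `parse_chapter` is a hypothetical helper name, and the final `re.sub` that strips leftover tags is an addition not present in the original code.

```python
import re

# Made-up sample markup standing in for the real chapter page,
# whose actual HTML may differ.
SAMPLE_HTML = """<span class="divcss5">第一章 开始</span>
<div id="booktxt"><p>第一段。</p><p>第二段。</p>
</div><div class="content">...</div>"""


def parse_chapter(html):
    """Return (title, content) using a variant of the answer's patterns."""
    # (.*?) is non-greedy: it stops at the first closing </span>.
    title = re.findall(r'<span class="divcss5">(.*?)</span>', html)[0]
    # re.S lets '.' match newlines, so a multi-line chapter body is captured.
    body = re.findall(r'<div id="booktxt">(.*?)<div class="content">', html, re.S)[0]
    # Turn paragraph breaks into newlines, then drop any leftover tags.
    content = re.sub(r'<[^>]+>', '', body.replace('</p><p>', '\n'))
    return title, content.strip()


# (.?) can match at most one character, so a longer title yields [].
print(re.findall(r'<span class="divcss5">(.?)</span>', SAMPLE_HTML))
# Without re.S the body pattern cannot cross the newline and also yields [].
print(re.findall(r'<div id="booktxt">(.*?)<div class="content">', SAMPLE_HTML))

title, content = parse_chapter(SAMPLE_HTML)
print(title)
print(content)
```

Because `re.findall(...)[0]` raises `IndexError` on an empty result, checking the list (or using `re.search`) before indexing gives a clearer error when the page layout changes.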

```
import requests
import re
import os

list_url='https://www.biquge66.net/book/29689/'
headers={
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
html_data=requests.get(url=list_url, headers=headers).text
name=re.findall('<meta property="og:novel:book_name" content="(.?)" />',html_data)[0]
file = f'{name}'
if not os.path.exists(file):
    os.mkdir(file)
url_list=re.findall('<a id="haitung" href="(.?)"',html_data)
for url in url_list:
    url='https://www.biquge66.net'+url
    headers={
        'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
    }
    response=requests.get(url=url, headers=headers)
    title=re.findall('<span class="divcss5">(.?)</span>',response.text)[0]
    content=re.findall('<div id="booktxt"><p>(.?)<div class="content">',response.text)[0].replace('</p>','\n')
    content=content.replace('<p>','')
    print(title,'\n')
    print(content)
    print(url)
    with open(file + name + '.txt',mode='a',encoding='utf-8')as f:
        title=title.center(60)
        f.write(title)
        f.write('\n\n')
        f.write(content)
        f.write('\n')
```
Is there anything wrong with my code?

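Your code may have the following problems:
1. Every pattern uses `(.?)`, which matches at most one character, so nothing longer is captured: the book-name lookup likely raises `IndexError` at `[0]`, and `url_list` comes back empty. Use the non-greedy `(.*?)`, and pass `re.S` when a match spans several lines, as the chapter body does.
2. Regular expressions break as soon as the page structure changes. For extracting the book name and the chapter text, consider parsing the HTML with the BeautifulSoup library instead.
3. `file + name + '.txt'` has no path separator, so the text file is written next to the folder (named `<name><name>.txt`) rather than inside it. Use `os.path.join(file, name + '.txt')`.
4. If the book name contains characters such as `?` or `:`, which Windows does not allow in file names, creating the folder or file fails. Replace such characters with underscores first.
5. `headers` is redefined on every loop iteration; the definition at the top is enough.
6. Only a single `'\n'` is written after each chapter's content, so consecutive chapters may run together. Write an extra blank line after each chapter.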
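The file-handling side of the script (building the output path, cleaning the book name, and separating chapters) can be sketched with the standard library alone. `safe_filename` and `append_chapter` are hypothetical helpers, the book and chapter strings are made up, and the demo writes into a temporary directory rather than the current folder.

```python
import os
import re
import tempfile


def safe_filename(name):
    """Replace characters Windows forbids in file names with '_'."""
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip()


def append_chapter(folder, book_name, title, content):
    """Append one chapter to <folder>/<book_name>.txt and return the path."""
    os.makedirs(folder, exist_ok=True)
    # os.path.join adds the separator that `file + name + '.txt'` is missing.
    path = os.path.join(folder, safe_filename(book_name) + '.txt')
    with open(path, mode='a', encoding='utf-8') as f:
        f.write(title.center(60) + '\n\n')
        # Trailing blank line keeps consecutive chapters apart.
        f.write(content.rstrip('\n') + '\n\n')
    return path


# Demo in a throwaway directory; the book and chapter data are made up.
with tempfile.TemporaryDirectory() as tmp:
    book = '示例:小说?'
    folder = os.path.join(tmp, safe_filename(book))
    path = append_chapter(folder, book, '第一章', '正文一\n')
    append_chapter(folder, book, '第二章', '正文二')
    with open(path, encoding='utf-8') as f:
        text = f.read()
    file_name = os.path.basename(path)
    in_folder = os.path.dirname(path) == folder

print(file_name)  # 示例_小说_.txt
print(text)
```

Opening the file once per chapter in append mode mirrors the original loop; opening it once before the loop would also work and avoids reopening the file for every chapter.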

```
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36 Edg/113.0.1774.50'}
url = 'http://www.biquge5200.cc/191_191776/'
response = requests.get(url, headers=headers, timeout=50)
# Name the parser explicitly so bs4 does not have to guess one
html = BeautifulSoup(response.text, 'html.parser')
print(html)
```

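This Python code uses the requests and BeautifulSoup libraries to fetch the HTML of http://www.biquge5200.cc/191_191776/ and print it. The `headers` argument sends a browser User-Agent so the request looks like it comes from a browser, and `timeout=50` makes requests raise an exception if connecting to the server, or waiting for its data, takes longer than 50 seconds. If the request succeeds, `BeautifulSoup` parses the response text into an object whose methods and attributes (such as `find()` and `find_all()`) can be used to extract content from the HTML. Naming the parser explicitly, e.g. `BeautifulSoup(text, 'html.parser')`, avoids the warning bs4 emits when it has to guess one.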
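A typical next step after fetching the table of contents is collecting the chapter links. The sketch below uses the standard library's `html.parser.HTMLParser` instead of BeautifulSoup so it runs without extra packages; with BeautifulSoup the equivalent is roughly `[(a['href'], a.get_text(strip=True)) for a in html.find_all('a', href=True)]`. The markup is made up, since the real page's structure is not shown here, and `LinkCollector` is a hypothetical class name. `urljoin` turns the relative `href` values into absolute URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # attrs is a list of (name, value) pairs
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a link
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._href = None


# Made-up chapter-list markup; the real page's structure is not known here.
sample = ('<dl><dd><a href="/191_191776/1.html">第一章</a></dd>'
          '<dd><a href="/191_191776/2.html">第二章</a></dd></dl>')

parser = LinkCollector()
parser.feed(sample)
parser.close()

base_url = 'http://www.biquge5200.cc/191_191776/'
chapter_urls = [urljoin(base_url, href) for href, _ in parser.links]
for (href, text), full_url in zip(parser.links, chapter_urls):
    print(text, full_url)
```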
