tlt = re.findall(r'data-title\=\".*?\"',html)

This line uses Python's regular expression module `re` to find every occurrence of a pattern in the string `html`. The pattern `data-title\=\".*?\"` matches text that begins with `data-title="` and ends at the next double quote (`"`); the non-greedy `.*?` consumes as few characters as possible, so each match stops at the first closing quote rather than the last one on the line. The backslashes before `=` and `"` are harmless but unnecessary. Because the pattern contains no capture group, `re.findall` returns a list of the full matched strings, including the `data-title=` prefix, and that list is assigned to `tlt`.
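To make the matching behaviour concrete, here is a minimal sketch that runs the pattern on a made-up HTML fragment. `sample_html` is an assumption for illustration, not real Douban markup. It compares the non-greedy pattern with a greedy variant and with a capture group.

```python
import re

# Hypothetical HTML fragment, made up for illustration only.
sample_html = '<li data-title="Movie A" data-score="8.1"></li><li data-title="Movie B"></li>'

# Non-greedy: each match stops at the first closing quote.
whole_matches = re.findall(r'data-title=".*?"', sample_html)

# Greedy: a single match runs on to the last quote in the string.
greedy_matches = re.findall(r'data-title=".*"', sample_html)

# With a capture group, findall returns only the text inside the quotes.
titles = re.findall(r'data-title="(.*?)"', sample_html)

print(whole_matches)
print(greedy_matches)
print(titles)
```

The capture-group form is usually the most convenient when only the attribute value is needed, because no further string splitting is required.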

import requests
import re

def getHTMLText(url):
    try:
        headers={'User - Agent': 'Mozilla/5.0 (Window NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML,like Gecko) Chrome/77.0.3865.120 Safari/537.36 chrome-extension'}
        r=requests.get(url,headers=headers)
        r.raise_for_status()
        r.encoding=r.apparent_encoding
        return r.text
    except:
        print("Erro_get")

# Extract the required movie information
def parsePage(ilt,html):
    tlt=re.findall(r'data-title\=\".* ?\"',html)
    for i in range(len(tlt)):
        plt=eval(tlt[i].split('=')[1])
        if plt in ilt:
            pass
        else:
            ilt.append(plt)

def printInfo(ilt):
    print("上海热映")  # "Now showing in Shanghai"
    for i in ilt:
        print(i)

def main():
    url='https://movie.douban.com/cinema/nowplaying/shanghai/'
    list=[]
    html=getHTMLText(url)
    parsePage(list,html)
    printInfo(list)

main()

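This code has no syntax errors, but a few points need attention:

1. In `headers`, the header name is written `'User - Agent'` with spaces around the hyphen. It should be `'User-Agent'`, otherwise the server does not see the intended header.
2. In `parsePage`, avoid `eval`: it executes whatever code the string contains, which is a security risk. Extract the movie title with a regular expression capture group instead, which also removes the stray space in `.* ?`.
3. In `printInfo`, a movie title may carry leading or trailing whitespace, which the string method `strip` removes.

The corrected code is as follows:

```
import requests
import re

def getHTMLText(url):
    try:
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36 chrome-extension'}
        r = requests.get(url, headers=headers)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        print("Erro_get")
        return ""  # return an empty string so parsePage still receives a str

# Extract the required movie information
def parsePage(ilt, html):
    # The capture group returns only the text between the quotes
    tlt = re.findall(r'data-title="(.*?)"', html)
    for plt in tlt:
        if plt not in ilt:
            ilt.append(plt)

def printInfo(ilt):
    print("上海热映")  # "Now showing in Shanghai"
    for i in ilt:
        print(i.strip())  # remove whitespace around the movie title

def main():
    url = 'https://movie.douban.com/cinema/nowplaying/shanghai/'
    movie_list = []
    html = getHTMLText(url)
    parsePage(movie_list, html)
    printInfo(movie_list)

main()
```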
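The extraction and de-duplication steps can also be tried offline. The following is only a sketch under stated assumptions: `parse_titles` and `sample_html` are hypothetical names introduced here, and the fragment is made up rather than taken from the real page. It stores the titles already stripped and uses `dict.fromkeys` in place of the `if ... in` membership check.

```python
import re


def parse_titles(html):
    """Return the unique data-title values in first-seen order."""
    titles = re.findall(r'data-title="(.*?)"', html)
    # dict.fromkeys keeps the first occurrence of each key and preserves
    # insertion order (guaranteed since Python 3.7).
    return list(dict.fromkeys(title.strip() for title in titles))


# Hypothetical markup; the real page structure may differ.
sample_html = (
    '<li data-title="Movie A"></li>'
    '<li data-title=" Movie B "></li>'
    '<li data-title="Movie A"></li>'
)

print(parse_titles(sample_html))
```

Returning a new list instead of mutating an argument makes the function easier to test, and stripping before de-duplicating means that `" Movie B "` and `"Movie B"` count as the same title.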

import requests
import re

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return ""

# The key part of the program:
# regular expressions are used instead of BeautifulSoup
def parsePage(ilt, html):
    try:
        plt = re.findall(r'\"view_price\"\:\"[\d\.]*\"', html)
        tlt = re.findall(r'\"raw_title\"\:\".*?\"', html)
        for i in range(len(plt)):
            # eval strips the outermost single or double quotes from the string
            price = eval(plt[i].split(':')[1])
            title = eval(tlt[i].split(':')[1])
            ilt.append([price, title])
    except:
        print("")

def printGoodsList(ilt):
    tplt = "{:4}\t{:8}\t{:16}"
    # Column headers: number, price, product name
    print(tplt.format('序号', '价格', '商品名称'))
    count = 0
    for g in ilt:
        count = count + 1
        print(tplt.format(count, g[0], g[1]))

def main():
    goods = '书包'  # "backpack"
    depth = 2
    start_url = 'https://s.taobao.com/search?q=' + goods
    infoList = []
    for i in range(depth):
        try:
            url = start_url + '&s=' + str(44 * i)
            html = getHTMLText(url)
            parsePage(infoList, html)
        except:
            # If parsing one page fails, continue with the next page
            # so the rest of the program is not affected
            continue
    printGoodsList(infoList)

main()

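This Python code uses the `requests` library to fetch a given URL and return the page content, and the `re` library to process that content. `getHTMLText()` takes one argument, `url`, the address of the page to fetch. Inside the function, `requests.get()` is called with `timeout=30`, so an exception is raised if the server does not respond within 30 seconds, and `r.raise_for_status()` raises an exception for an HTTP error status. If the request succeeds, `r.encoding` is set to `r.apparent_encoding`, the encoding guessed from the response body, and `r.text` is returned. If anything fails, the function returns an empty string.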
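The `re` processing step can be sketched without network access and without `eval`. Everything below is an assumption made for illustration: `sample_text` imitates the kind of embedded JSON the patterns target and is not real Taobao output, and `parse_goods` is a hypothetical helper name.

```python
import re

# Hypothetical fragment imitating JSON embedded in a search result page.
sample_text = (
    '{"raw_title":"Canvas backpack","view_price":"59.00"},'
    '{"raw_title":"Leather backpack","view_price":"128.50"}'
)


def parse_goods(text):
    """Return [price, title] pairs extracted with capture groups."""
    prices = re.findall(r'"view_price":"([\d.]*)"', text)
    titles = re.findall(r'"raw_title":"(.*?)"', text)
    # zip stops at the shorter list, so a missing field cannot raise IndexError.
    return [[price, title] for price, title in zip(prices, titles)]


row_format = "{:4}\t{:8}\t{:16}"
print(row_format.format("No.", "Price", "Title"))
for index, (price, title) in enumerate(parse_goods(sample_text), start=1):
    print(row_format.format(index, price, title))
```

Note that `zip` only prevents the index error. If an item lacks one of the two fields, later prices and titles can still be paired with the wrong partner, so matching both fields in a single pattern per item may be more robust.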
