Difference between `file_df["sql"].str.contains(r"(?i).*?update.*?", regex=True)` and `file_df["sql"].str.contains(r"(?i)\.*update.*?", regex=True)`

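The two patterns differ only in what precedes `update`. In the first, `r"(?i).*?update.*?"`, the leading `.*?` lazily matches zero or more of any character except a newline, and then the literal `update` is matched case-insensitively. In the second, `r"(?i)\.*update.*?"`, the leading `\.*` matches zero or more literal `.` characters before `update`. Because `str.contains` searches anywhere in the string (it uses `re.search`), and both prefixes are allowed to match nothing, both calls return `True` for exactly the strings that contain `update` in any letter case, the same as `r"(?i)update"` would. The second pattern does not require the string to start with `.`. The prefix and the trailing `.*?` only change which substring is reported as the match, and `str.contains` ignores that.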
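As a quick check of the explanation above, the following sketch applies both patterns with `re.search`, which is the matching behavior `str.contains` relies on. The sample strings and the `plain` pattern are made up for illustration and are not part of the original question.

```python
import re

# The two patterns from the question, plus a bare "update" for comparison.
first = re.compile(r"(?i).*?update.*?")
second = re.compile(r"(?i)\.*update.*?")
plain = re.compile(r"(?i)update")

# Hypothetical SQL-like strings used only for this demonstration.
samples = [
    "UPDATE t SET a = 1",
    "select 1; ...update t set a = 1",
    "schema.update_log",
    "SELECT * FROM t",
    "line one\nupdate t set a = 2",
]


def contains_flags(pattern):
    """Mimic str.contains: True when the pattern is found anywhere."""
    return [bool(pattern.search(s)) for s in samples]


first_flags = contains_flags(first)
second_flags = contains_flags(second)
plain_flags = contains_flags(plain)
print(first_flags, second_flags, plain_flags)

# The patterns differ only in the substring they capture.
first_span = first.search("a.update b").group()
second_span = second.search("a.update b").group()
print(repr(first_span), repr(second_span))
```

The three flag lists are identical, so the filters select the same rows. Only the captured text differs, which matters for methods such as `str.extract` but not for `str.contains`.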

    import os
    import re
    import pandas as pd

    # Regular expressions
    regex = re.compile(r'^2-.*\.xlsx$')
    continuation_regex = re.compile(r'^\d+\s.*$')  # assumes a continuation sheet's first column is digits plus whitespace

    # Read every matching xlsx file
    files = [f for f in os.listdir('.') if os.path.isfile(f) and regex.match(f)]
    dfs = []
    for file in files:
        df = pd.read_excel(file, sheet_name=None, engine='openpyxl')
        for sheet_name, sheet_df in df.items():
            # Detect a continuation sheet and append it to the previous table
            if sheet_df.iloc[0, 0] and continuation_regex.match(str(sheet_df.iloc[0, 0])):
                dfs[-1] = pd.concat([dfs[-1], sheet_df])
            else:
                dfs.append(sheet_df)

    # Merge all tables
    merged_df = pd.concat(dfs)
    # Group all tables by city
    grouped_df = merged_df.groupby('城市').agg(lambda x: x.tolist())
    # Convert every list to a string and rebuild the DataFrame
    grouped_df = grouped_df.applymap(lambda x: ','.join([str(i) for i in x]))
    grouped_df = pd.DataFrame(grouped_df.to_dict())
    # Print the result
    print(grouped_df)

Modify this to iterate over a specified list.

To make the script scan a specified folder instead of the current directory, replace `os.listdir('.')` with `os.listdir(folder_path)` and join each file name with the folder path, for example:

```
import os
import re

import pandas as pd

# File names such as "2-anything.xlsx"
regex = re.compile(r'^2-.*\.xlsx$')
# Assumes a continuation sheet's first cell is digits followed by whitespace
continuation_regex = re.compile(r'^\d+\s.*$')

# Folder to scan (placeholder)
folder_path = 'path/to/folder'

# Collect the full paths of all matching xlsx files
files = [
    os.path.join(folder_path, f)
    for f in os.listdir(folder_path)
    if os.path.isfile(os.path.join(folder_path, f)) and regex.match(f)
]

dfs = []
for file in files:
    # sheet_name=None returns a dict of {sheet name: DataFrame}
    sheets = pd.read_excel(file, sheet_name=None, engine='openpyxl')
    for sheet_name, sheet_df in sheets.items():
        if sheet_df.empty:
            continue  # an empty sheet has no first cell to inspect
        first_cell = str(sheet_df.iloc[0, 0])
        # Append a continuation sheet to the previous table, if there is one
        if dfs and continuation_regex.match(first_cell):
            dfs[-1] = pd.concat([dfs[-1], sheet_df])
        else:
            dfs.append(sheet_df)

# Merge all tables
merged_df = pd.concat(dfs)

# Group by city and join each column's values into one comma-separated string
grouped_df = merged_df.groupby('城市').agg(lambda x: ','.join(str(i) for i in x))

# Print the result
print(grouped_df)
```

Replace `'path/to/folder'` with the actual folder path.
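If the goal is to iterate over an explicit list of files rather than a folder, one possible approach is to filter that list with the same file-name pattern. This is only a sketch under that assumption: `candidate_paths` and `select_matching` are hypothetical names, and the paths are invented for illustration.

```python
import os
import re

# Same file-name pattern as in the script above.
regex = re.compile(r'^2-.*\.xlsx$')


def select_matching(paths, pattern=regex):
    """Return the paths whose file name (ignoring the directory) matches."""
    return [p for p in paths if pattern.match(os.path.basename(p))]


# Hypothetical explicit list of files to process.
candidate_paths = [
    os.path.join('reports', '2-sales.xlsx'),
    os.path.join('reports', '3-sales.xlsx'),
    os.path.join('archive', '2-costs.xlsx'),
    os.path.join('archive', '2-notes.txt'),
]

files = select_matching(candidate_paths)
print(files)
```

The resulting `files` list can stand in for the `files` list that the reading loop consumes. Add an `os.path.isfile` check if some entries might not exist.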

    path = 'D:/SMT本部出勤'
    excel_dir = path
    excel_paths = [os.path.join(excel_dir, f) for f in os.listdir(excel_dir) if f.endswith('.xls')]
    dfs = []
    for excel_path in excel_paths:
        df = pd.read_excel(excel_path, engine='openpyxl')
        df = df.replace({'<a href=': '', '</a>': ''}, regex=True)  # strip hyperlink tags
        dfs.append(df)
    merged_df = pd.concat(dfs)
    output_path = r"D:\SMT本部出勤匯總\熱資料.xlsx"
    merged_df.to_excel(output_path, index=False)

This raises `zipfile.BadZipFile: File is not a zip file`.

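The error `zipfile.BadZipFile: File is not a zip file` means that `openpyxl` was handed a file that is not a ZIP archive. `openpyxl` only reads the `.xlsx` family of formats, which are ZIP containers, so it fails on anything else. Given the code in the question, the likely causes are:

1. The files are legacy `.xls` workbooks. The list comprehension selects names ending in `.xls`, which is the old binary format and not a ZIP archive, yet `read_excel` is called with `engine='openpyxl'`. Read such files with `engine='xlrd'` (the `xlrd` package must be installed), or convert them to `.xlsx` first.
2. The files are not Excel workbooks at all. The code strips `<a href=` tags, which suggests the files may be HTML tables saved with an `.xls` extension. If so, `pd.read_html` may be the appropriate reader instead of `read_excel`.
3. A file is corrupted or truncated. A damaged `.xlsx` file fails in the same way; re-export or remove it.

Check which case applies to the files in `excel_dir`, then run the merge again.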
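To tell these cases apart, one option is to inspect the first bytes of each file instead of trusting its extension. The sketch below is an assumption-laden illustration, not part of the original answer: `sniff_format` is a hypothetical helper, it relies on the usual leading signatures of ZIP and OLE2 files, and the demo files are synthetic stand-ins written to a temporary folder.

```python
import os
import tempfile

# Usual leading bytes of a ZIP container (.xlsx) and an OLE2 file (.xls).
XLSX_MAGIC = b'PK\x03\x04'
XLS_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'


def sniff_format(path):
    """Guess the real format of a file from its first bytes (hypothetical helper)."""
    with open(path, 'rb') as fh:
        head = fh.read(64)
    if head.startswith(XLSX_MAGIC):
        return 'xlsx'
    if head.startswith(XLS_MAGIC):
        return 'xls'
    # Skip an optional UTF-8 byte order mark and whitespace before checking for markup.
    if head.lstrip(b'\xef\xbb\xbf').lstrip().startswith(b'<'):
        return 'html'
    return 'unknown'


# Synthetic files that all carry an .xls extension but differ in content.
payloads = {
    'binary.xls': XLS_MAGIC + b'\x00' * 8,
    'webpage.xls': b'<table><tr><td>1</td></tr></table>',
    'renamed.xls': XLSX_MAGIC + b'\x00' * 8,
}

detected = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, payload in payloads.items():
        path = os.path.join(tmp, name)
        with open(path, 'wb') as fh:
            fh.write(payload)
        detected[name] = sniff_format(path)

print(detected)
```

A result of `'xls'` points toward `pd.read_excel(path, engine='xlrd')`, `'xlsx'` toward `engine='openpyxl'`, and `'html'` toward trying `pd.read_html(path)`. A result of `'unknown'` is worth opening by hand before merging.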
