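How can Python take multiple keywords stored in a txt file in a folder, traverse all text and Excel files under that folder, and write every line that matches a keyword to a new text file?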
Date: 2024-02-05 11:14:17
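This can be done with Python's built-in `os` and `re` modules together with the pandas library (pandas needs the `openpyxl` package installed to read `.xlsx` files).
The steps are as follows:
1. First, import the required modules: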
```python
import os
import re
import pandas as pd
```
2. Define the list of keywords to match:
```python
keywords = ['keyword1', 'keyword2', 'keyword3']
```
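The question describes the keywords as stored in a txt file rather than hard-coded. A minimal sketch of loading them, assuming one keyword per line in a UTF-8 file; the function name `load_keywords` and the file name `keywords.txt` are hypothetical and do not come from the original answer:

```python
def load_keywords(path, encoding='utf-8'):
    """Return the non-empty, stripped lines of a keyword file, without duplicates."""
    keywords = []
    with open(path, 'r', encoding=encoding) as f:
        for line in f:
            word = line.strip()
            # Skip blank lines and repeated keywords, keeping the original order
            if word and word not in keywords:
                keywords.append(word)
    return keywords
```

It could be used as `keywords = load_keywords('keywords.txt')`. If the keyword file sits inside the folder being searched, it should probably be excluded from the search, since every line in it would match.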
3. Define the path of the folder to traverse:
```python
folder_path = 'your/folder/path'
```
4. Walk the folder (including subfolders) and collect all text and Excel files:
```python
file_list = []
for root, dirs, files in os.walk(folder_path):
    for file in files:
        # Skip the result file so a rerun does not search its own output
        if file == 'output.txt':
            continue
        if file.lower().endswith(('.txt', '.xlsx')):
            file_list.append(os.path.join(root, file))
```
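The same traversal can also be written with `pathlib`. This is only an alternative sketch; the function name `collect_files` and its `extensions` and `exclude` parameters are hypothetical names introduced for illustration:

```python
from pathlib import Path

def collect_files(folder, extensions=('.txt', '.xlsx'), exclude=('output.txt',)):
    """Return sorted paths under folder whose suffix is in extensions."""
    found = []
    for path in Path(folder).rglob('*'):
        # Compare suffixes case-insensitively so 'DATA.TXT' is also picked up
        if path.is_file() and path.suffix.lower() in extensions and path.name not in exclude:
            found.append(str(path))
    return sorted(found)
```

Sorting the result keeps the order of the output stable between runs, which `os.walk` alone does not guarantee.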
5. Define a function that matches the keywords and appends each matching line to a new text file:
```python
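def match_keywords(file_path, keywords):
    # Read the rows of a text file or an Excel file
    if file_path.lower().endswith('.txt'):
        with open(file_path, 'r', encoding='utf-8') as f:
            # Drop the trailing newline so the output has no blank lines
            data = [line.rstrip('\n') for line in f]
    elif file_path.lower().endswith('.xlsx'):
        # header=None keeps the first row searchable; only the first sheet is read
        df = pd.read_excel(file_path, header=None)
        # Join the cells of each row into one tab-separated string
        data = ['\t'.join(str(cell) for cell in row) for row in df.values.tolist()]
    else:
        return
    # Append every row that contains at least one keyword to the output file
    output_file_path = os.path.join(folder_path, 'output.txt')
    with open(output_file_path, 'a', encoding='utf-8') as f:
        for row in data:
            # re.escape makes each keyword match literally, not as a regex
            if any(re.search(re.escape(keyword), row) for keyword in keywords):
                f.write(row + '\n')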
```
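When the keyword list is long, calling `re.search` once per keyword per line can be replaced by a single compiled pattern. A minimal sketch, assuming the keywords should match literally; `build_pattern` and `matching_lines` are hypothetical helper names:

```python
import re

def build_pattern(keywords):
    """Compile one regex that matches any of the keywords literally."""
    if not keywords:
        # An empty alternation would match every line, so refuse it
        raise ValueError('keywords must not be empty')
    # Escape each keyword so characters such as '.' or '(' are not treated as regex syntax
    escaped = [re.escape(k) for k in sorted(keywords, key=len, reverse=True)]
    return re.compile('|'.join(escaped))

def matching_lines(lines, pattern):
    """Return the lines that contain at least one keyword."""
    return [line for line in lines if pattern.search(line)]
```

For example, `matching_lines(rows, build_pattern(keywords))` would return the rows to write out. For a handful of keywords the plain `keyword in line` test is just as good and simpler.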
6. Loop over the file list and call the function on each file:
```python
for file_path in file_list:
    match_keywords(file_path, keywords)
```
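The output produced this way does not say which file a line came from, and because the file is opened in append mode, a rerun adds to earlier results. A minimal sketch of one way to handle both, limited to text files for brevity; `search_txt_files` and `write_results` are hypothetical names, and the `path:line_number: text` layout is an assumption rather than anything the original answer specifies:

```python
def search_txt_files(file_paths, keywords, encoding='utf-8'):
    """Return 'path:line_number: text' strings for each matching line."""
    results = []
    for path in file_paths:
        with open(path, 'r', encoding=encoding) as f:
            for number, line in enumerate(f, start=1):
                text = line.rstrip('\n')
                # Plain substring test: a line matches if it contains any keyword
                if any(keyword in text for keyword in keywords):
                    results.append(f'{path}:{number}: {text}')
    return results

def write_results(results, output_path, encoding='utf-8'):
    """Overwrite output_path with one result per line."""
    with open(output_path, 'w', encoding=encoding) as f:
        for item in results:
            f.write(item + '\n')
```

Collecting the matches first and writing once with mode `'w'` means each run replaces the previous output instead of growing it.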
Complete code example:
```python
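import os
import re
import pandas as pd

keywords = ['keyword1', 'keyword2', 'keyword3']
folder_path = 'your/folder/path'

def match_keywords(file_path, keywords):
    # Read the rows of a text file or an Excel file
    if file_path.lower().endswith('.txt'):
        with open(file_path, 'r', encoding='utf-8') as f:
            # Drop the trailing newline so the output has no blank lines
            data = [line.rstrip('\n') for line in f]
    elif file_path.lower().endswith('.xlsx'):
        # header=None keeps the first row searchable; only the first sheet is read
        df = pd.read_excel(file_path, header=None)
        # Join the cells of each row into one tab-separated string
        data = ['\t'.join(str(cell) for cell in row) for row in df.values.tolist()]
    else:
        return
    # Append every row that contains at least one keyword to the output file
    output_file_path = os.path.join(folder_path, 'output.txt')
    with open(output_file_path, 'a', encoding='utf-8') as f:
        for row in data:
            # re.escape makes each keyword match literally, not as a regex
            if any(re.search(re.escape(keyword), row) for keyword in keywords):
                f.write(row + '\n')

# Collect every .txt and .xlsx file under the folder, including subfolders
file_list = []
for root, dirs, files in os.walk(folder_path):
    for file in files:
        # Skip the result file so a rerun does not search its own output
        if file == 'output.txt':
            continue
        if file.lower().endswith(('.txt', '.xlsx')):
            file_list.append(os.path.join(root, file))

# Search each collected file
for file_path in file_list:
    match_keywords(file_path, keywords)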
```