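Extracting sentences that contain multiple keywords from a txt file with Python, reading the keyword list from an Excel sheet and saving the extracted sentences to a txt file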
Date: 2023-02-06 12:52:58
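First, open the txt file and read its contents:
```python
# Pass the encoding explicitly instead of relying on the platform default
with open('input.txt', 'r', encoding='utf-8') as f:
    text = f.read()
```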
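Chinese text files are not always saved as UTF-8, so reading can fail with a `UnicodeDecodeError`. The following is a minimal sketch of a fallback reader, not part of the original answer: the helper name `read_text` and the choice of candidate encodings (`utf-8-sig`, then `gb18030`) are assumptions for illustration and may need adjusting for your files.

```python
def read_text(path, encodings=("utf-8-sig", "gb18030")):
    """Try each candidate encoding in turn and return the first successful decode."""
    last_error = None
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding) as f:
                return f.read()
        except UnicodeDecodeError as exc:
            # Remember the failure and try the next candidate encoding
            last_error = exc
    raise ValueError("could not decode %s with any of %r" % (path, encodings)) from last_error
```

With this helper, `text = read_text('input.txt')` could stand in for the plain `open` call when the encoding of the input is not known in advance.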
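Next, use a regular expression to match the sentences that contain a keyword. The keywords are escaped with `re.escape` and wrapped in a non-capturing group `(?:...)`, because with a capturing group `re.findall` returns only the captured keyword rather than the whole sentence:
```python
import re
keywords = ['关键词1', '关键词2', '关键词3']  # placeholder keywords
# Escape each keyword so that regex metacharacters in it are matched literally
pattern = '|'.join(re.escape(k) for k in keywords)
# Match whole sentences that contain any keyword and collect them in a list.
# Both full-width and ASCII sentence terminators are treated as sentence ends.
sentences = re.findall(r'[^。！？!?]*(?:%s)[^。！？!?]*[。！？!?]' % pattern, text)
```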
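The question asks for sentences containing multiple keywords, whereas an alternation pattern matches a sentence as soon as any single keyword appears. One possible way to require several keywords is to split the text into sentences first and count how many distinct keywords each one contains. This is a sketch under assumptions: the function names, the `min_hits` parameter, and the set of sentence terminators are illustrative choices, not taken from the original answer.

```python
import re

# Sentence terminators, full-width and ASCII (an assumption about the input text)
SENTENCE_END = "。！？!?"

def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation when present."""
    pattern = r"[^%s]+[%s]?" % (SENTENCE_END, SENTENCE_END)
    return [s.strip() for s in re.findall(pattern, text) if s.strip()]

def sentences_with_keywords(text, keywords, min_hits=2):
    """Return the sentences that contain at least `min_hits` distinct keywords."""
    result = []
    for sentence in split_sentences(text):
        # Count distinct keywords present, using plain substring tests
        hits = sum(1 for kw in set(keywords) if kw in sentence)
        if hits >= min_hits:
            result.append(sentence)
    return result
```

Setting `min_hits=len(set(keywords))` would keep only sentences that contain every keyword, while `min_hits=1` gives the same "any keyword" behaviour as the regular expression approach.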
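To get the keyword list from an Excel sheet, read the file with the pandas library. Reading `.xlsx` files with pandas typically requires the `openpyxl` package to be installed, and the column name `Keyword` must match the header used in your sheet:
```python
import pandas as pd
df = pd.read_excel('keywords.xlsx')
# Skip empty cells and convert every value to a string
keywords = df['Keyword'].dropna().astype(str).tolist()
```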
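Keyword columns in spreadsheets often contain blank cells, stray whitespace, or repeated entries, and any of these can distort the matching (an empty keyword, for example, matches every sentence). A small cleaning step such as the one below may help. The helper name `clean_keywords` is hypothetical, and the sketch uses only the standard library so that it works on any list of raw cell values.

```python
def clean_keywords(raw_values):
    """Drop empty cells, trim whitespace, and remove duplicates, preserving order."""
    cleaned = []
    seen = set()
    for value in raw_values:
        # pandas represents empty cells as float NaN, and NaN is never equal to itself
        if value is None or value != value:
            continue
        keyword = str(value).strip()
        if keyword and keyword not in seen:
            seen.add(keyword)
            cleaned.append(keyword)
    return cleaned
```

For example, `keywords = clean_keywords(df['Keyword'].tolist())` could be applied right after the column is read.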
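Finally, write the extracted sentences to a new txt file, one sentence per line:
```python
with open('output.txt', 'w', encoding='utf-8') as f:
    for sentence in sentences:
        # Strip surrounding whitespace (such as line breaks carried over from the input)
        f.write(sentence.strip() + '\n')
```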
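The complete code is as follows:
```python
import re
import pandas as pd

# Read the keyword list (the column name 'Keyword' must match the sheet header)
df = pd.read_excel('keywords.xlsx')
keywords = df['Keyword'].dropna().astype(str).tolist()

# Build the regular expression, escaping regex metacharacters in the keywords
pattern = '|'.join(re.escape(k) for k in keywords)

# Read the txt file
with open('input.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# Match whole sentences containing any keyword and collect them in a list.
# The non-capturing group (?:...) makes findall return the full sentence.
sentences = re.findall(r'[^。！？!?]*(?:%s)[^。！？!?]*[。！？!?]' % pattern, text)

# Write the sentences to a new txt file, one per line
with open('output.txt', 'w', encoding='utf-8') as f:
    for sentence in sentences:
        f.write(sentence.strip() + '\n')
```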