Extracting tables from a PDF to Excel with Python
Date: 2024-10-08 09:07:22
To extract tables from a PDF with Python and save them to an Excel file, follow these steps:
1. **Install the required libraries**[^2]:
```sh
pip install pdfplumber openpyxl pandas
```
2. **Import the required libraries**:
```python
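import pdfplumber  # opens PDFs and detects tables on each page
import pandas as pd  # holds each table as a DataFrame and writes it to Excel
# openpyxl is not imported directly; pandas uses it as the .xlsx engine.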
```
3. **Define a function that extracts the tables**:
```python
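def extract_tables_from_pdf(pdf_path, excel_writer):
    """Write every table found in pdf_path to its own sheet of excel_writer."""
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns a list of tables, each a list of rows
            for table in page.extract_tables():
                df = pd.DataFrame(table)
                # Number the sheets so tables from different PDFs never collide
                sheet_name = f"Table{len(excel_writer.sheets) + 1}"
                df.to_excel(excel_writer, sheet_name=sheet_name, index=False, header=False)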
```
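The rows that pdfplumber returns are plain Python lists in which empty cells are `None` and text cells may contain line breaks. The following is a minimal sketch of tidying one table before it becomes a DataFrame; `clean_table` is a hypothetical helper introduced here for illustration, not part of pdfplumber or pandas.

```python
def clean_table(rows):
    """Normalize one raw table: a list of rows, each a list of str or None."""
    cleaned = []
    for row in rows:
        # Empty cells are None; text cells may contain line breaks or padding.
        cells = [" ".join(cell.split()) if cell else "" for cell in row]
        if any(cells):  # skip rows that are empty after cleaning
            cleaned.append(cells)
    return cleaned


# Made-up sample data in the shape pdfplumber produces
raw = [["Name", "Unit\nprice"], [None, None], ["Widget", " 9.50 "]]
print(clean_table(raw))  # [['Name', 'Unit price'], ['Widget', '9.50']]
```

If it suits your data, the result could be passed to `pd.DataFrame` in place of the raw table.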
4. **Create the Excel writer and run the extraction**:
```python
def save_tables_to_excel(pdf_paths, output_file):
    # The context manager saves and closes the workbook on exit
    # (writer.save() was removed in pandas 2.0).
    with pd.ExcelWriter(output_file, engine="openpyxl") as writer:
        for pdf_path in pdf_paths:
            extract_tables_from_pdf(pdf_path, writer)
```
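Each table written to the workbook needs a distinct sheet name, and Excel limits sheet names to 31 characters and rejects the characters `[ ] : * ? / \`. If descriptive names are wanted, one possible approach is sketched below; `make_sheet_name` and its naming scheme (file name, page number, table number) are assumptions made for illustration.

```python
import re
from pathlib import Path


def make_sheet_name(pdf_path, page_no, table_no):
    """Build a valid, descriptive sheet name such as 'pdf1_p3_t1'."""
    suffix = f"_p{page_no}_t{table_no}"
    # Replace the characters Excel rejects in sheet names.
    stem = re.sub(r"[\[\]:*?/\\]", "_", Path(pdf_path).stem)
    # Trim the file-name part so the whole name fits in 31 characters.
    return stem[: 31 - len(suffix)] + suffix


print(make_sheet_name("path/to/pdf1.pdf", 3, 1))  # pdf1_p3_t1
```

The returned string could be passed as `sheet_name=` to `DataFrame.to_excel`. Two files whose names share a long common prefix can still produce the same sheet name after trimming, so this sketch does not guarantee uniqueness.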
5. **Call the function on several PDF files**:
```python
pdf_paths = ["path/to/pdf1.pdf", "path/to/pdf2.pdf"]
output_file = "extracted_tables.xlsx"
save_tables_to_excel(pdf_paths, output_file)
```
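With this code, each table in the PDFs is extracted in turn and stored in a single Excel file. Remember to replace the paths in `pdf_paths` with the paths of your actual PDF files.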
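When the PDFs all sit in one folder, the path list can be built rather than typed by hand. This is a small sketch using only the standard library; `find_pdfs` is a hypothetical helper, and it assumes the files are directly inside the folder (no subfolders are searched).

```python
from pathlib import Path


def find_pdfs(folder):
    """Return the PDF files directly inside folder, sorted by path."""
    return sorted(
        str(p)
        for p in Path(folder).iterdir()
        # Compare the extension case-insensitively so .PDF is also matched
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

For example, `pdf_paths = find_pdfs("path/to")` would collect every PDF in that folder; the function raises `FileNotFoundError` if the folder does not exist.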