How to fix AttributeError: module 'pdfminer' has no attribute 'extract_tables'
Date: 2023-06-14 08:04:00
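This error means the `pdfminer` module you imported has no top-level `extract_tables` function. This is not limited to old releases: neither `pdfminer` nor its maintained fork `pdfminer.six` ships a table-extraction API (a method named `extract_tables` does exist on pages in pdfplumber, a separate library built on pdfminer.six). Upgrading therefore mainly rules out a stale or broken install, and the table logic has to come from your own code. You can work through it as follows: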
1. Upgrade the pdfminer library to the latest version:
```
pip install --upgrade pdfminer
```
2. If upgrading does not help, try the pdfminer.six library, the maintained fork of pdfminer (it is still imported as `pdfminer`):
```
pip install pdfminer.six
```
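Before changing any code, it can help to confirm which of the two distributions is actually installed, since both are imported under the same `pdfminer` name. The following is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Both distributions provide the `pdfminer` import package, so check each by name.
for name in ("pdfminer", "pdfminer.six"):
    print(name, "->", installed_version(name))
```

If both names report a version, uninstalling the older `pdfminer` distribution is probably the safest way to avoid importing the wrong one.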
Then define your own table-extraction helper in your code, for example:
```
from pdfminer.high_level import extract_pages
from pdfminer.layout import LAParams, LTTextContainer, LTTextLine


def extract_tables(pdf_path, y_tolerance=3.0):
    """Approximate table extraction with pdfminer.six.

    pdfminer has no table layout object (there is no LTTable class), so this
    is a heuristic: text lines whose bottom edges lie within `y_tolerance`
    points of each other are treated as one row. `y_tolerance` is an
    illustrative parameter, not part of the pdfminer API.
    Returns one table (a list of rows) per page that contains text.
    """
    tables = []
    for layout in extract_pages(pdf_path, laparams=LAParams()):
        # Collect every non-empty text line on the page with its position.
        cells = []
        for lt_obj in layout:
            if not isinstance(lt_obj, LTTextContainer):
                continue
            for line in lt_obj:
                if isinstance(line, LTTextLine):
                    text = line.get_text().strip()
                    if text:
                        cells.append((line.y0, line.x0, text))

        # PDF coordinates grow upwards, so sort by descending y (top first).
        cells.sort(key=lambda cell: -cell[0])

        table, row, row_y = [], [], None
        for y0, x0, text in cells:
            # A large vertical jump closes the current row.
            if row and abs(y0 - row_y) > y_tolerance:
                table.append([t for _, t in sorted(row)])
                row = []
            if not row:
                row_y = y0
            row.append((x0, text))
        if row:
            # Order the cells of the last row from left to right.
            table.append([t for _, t in sorted(row)])
        if table:
            tables.append(table)
    return tables
```
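The helper above returns a list of tables, where each table is a list of rows and each row is a list of cell strings. As a rough sketch of what to do with that result, the snippet below turns each table into CSV text with the standard library; the name `tables_to_csv` and the sample data are made up for illustration.

```python
import csv
import io


def tables_to_csv(tables):
    """Convert a list of tables (rows of cell strings) into CSV strings."""
    results = []
    for table in tables:
        buffer = io.StringIO(newline="")
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerows(table)
        results.append(buffer.getvalue())
    return results


# Hypothetical data in the same shape that extract_tables returns.
sample = [[["name", "qty"], ["apple", "3"]]]
print(tables_to_csv(sample)[0])
```

With a real file you would pass the return value of `extract_tables("your.pdf")` instead of `sample`. Cells containing commas or quotes are escaped by the `csv` module.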