Recognizing a table in an image with Python and OpenCV and converting the result to an Excel file
Date: 2023-06-26 12:05:06
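This task needs OpenCV for the image processing, pytesseract for reading the text in each cell (it is called in step 5), and Pandas for writing the Excel file. The steps are:
1. Read the image with OpenCV, convert it to grayscale, and threshold it so that the table lines are easy to detect.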
```python
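import cv2
import numpy as np

# Read the image and convert it to grayscale
img = cv2.imread('image.png')
if img is None:
    raise FileNotFoundError('image.png could not be read')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Inverted Otsu threshold: dark ink and table lines become white (255)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]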
```
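To make the `cv2.THRESH_OTSU` flag less of a black box, here is a minimal pure-Python sketch of the idea behind Otsu's method: try every gray level as a threshold and keep the one that maximizes the between-class variance. The function name `otsu_threshold` is made up for this illustration, and OpenCV's own implementation may differ in details such as tie-breaking.

```python
def otsu_threshold(pixels):
    """Return a threshold that best separates 8-bit gray values into two classes."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their gray values
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue          # no pixels in the dark class yet
        w_fg = total - w_bg
        if w_fg == 0:
            break             # every pixel is already in the dark class
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Hypothetical sample: dark ink (about 10-40) on light paper (about 200-240)
sample = [10, 25, 40] * 20 + [200, 220, 240] * 80
t = otsu_threshold(sample)
# Same convention as THRESH_BINARY_INV: values at or below t become 255
inverted = [255 if p <= t else 0 for p in sample]
```

With the inverted convention the ink and the table lines end up as white foreground, which is what the morphology step expects.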
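2. Detect the table lines. Morphological opening with a long, thin kernel keeps only the horizontal or only the vertical strokes; HoughLinesP then turns the remaining pixels into line segments.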
```python
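# Keep only long horizontal / vertical strokes (the kernels are 25 px long)
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 1))
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 25))
horizontal_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel)
vertical_lines = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel)
# Combine both masks with a bitwise OR; "+" on uint8 arrays wraps around where lines cross
line_mask = cv2.bitwise_or(horizontal_lines, vertical_lines)
# Detect line segments in the combined mask
lines = cv2.HoughLinesP(line_mask, 1, np.pi/180, 100, minLineLength=100, maxLineGap=10)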
```
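Why does a `(25, 1)` kernel isolate horizontal lines? On a single row of a binary image, opening with a flat kernel of length `k` keeps only the runs of foreground pixels that are at least `k` long, so text strokes disappear and long rules survive. The sketch below shows this in one dimension; `open_row` is a hypothetical helper, and it ignores the border handling that OpenCV applies at the image edges.

```python
def open_row(row, k):
    """1-D binary opening: keep only runs of 1s that are at least k pixels long."""
    out = [0] * len(row)
    run_start = None
    # The trailing 0 is a sentinel that closes a run ending at the last pixel
    for i, v in enumerate(list(row) + [0]):
        if v and run_start is None:
            run_start = i
        elif not v and run_start is not None:
            if i - run_start >= k:
                out[run_start:i] = [1] * (i - run_start)
            run_start = None
    return out


row = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
opened = open_row(row, 4)
```

Here only the run of five pixels is long enough to survive a kernel of length 4. A vertical kernel does the same thing along columns.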
3. Draw the detected lines and store their coordinates in lists.
```python
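# HoughLinesP returns None when it finds nothing
if lines is None:
    lines = []
# Draw the detected lines on a copy so the cells cropped from img stay clean for OCR
annotated = img.copy()
for line in lines:
    x1, y1, x2, y2 = line[0]
    cv2.line(annotated, (x1, y1), (x2, y2), (0, 255, 0), 2)
# Store one point per segment, split by orientation
horizontal_points = []
vertical_points = []
for line in lines:
    x1, y1, x2, y2 = line[0]
    if abs(x1 - x2) < 10:      # nearly vertical
        vertical_points.append((x1, y1))
    elif abs(y1 - y2) < 10:    # nearly horizontal
        horizontal_points.append((x1, y1))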
```
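One caveat with these lists: HoughLinesP often returns several segments for the same physical line (a ruled line is a few pixels thick and may be broken into pieces), so the lists can contain near-duplicate coordinates. A possible fix, sketched here under the assumption that coordinates closer than `tol` pixels belong to the same line, is to merge them before building the grid. `merge_close` and the value of `tol` are illustrative choices, not part of OpenCV.

```python
def merge_close(values, tol=10):
    """Merge coordinates that lie within tol of their neighbour into one rounded mean."""
    merged = []
    group = []
    for v in sorted(values):
        # A gap larger than tol starts a new group
        if group and v - group[-1] > tol:
            merged.append(round(sum(group) / len(group)))
            group = []
        group.append(v)
    if group:
        merged.append(round(sum(group) / len(group)))
    return merged


# Hypothetical x coordinates of vertical segments, with duplicates
xs = merge_close([100, 12, 10, 98, 11, 200])
```

With the lists from this step it would be called as `merge_close([x for x, _ in vertical_points])` for the column borders and `merge_close([y for _, y in horizontal_points])` for the row borders.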
4. Use the line coordinates to split the table into cells.
```python
# Sort horizontal lines by y and vertical lines by x
horizontal_points = sorted(horizontal_points, key=lambda p: p[1])
vertical_points = sorted(vertical_points, key=lambda p: p[0])
# Crop one cell between each pair of neighbouring lines
cells = []
for i in range(len(horizontal_points) - 1):
    for j in range(len(vertical_points) - 1):
        x1, _ = vertical_points[j]
        x2, _ = vertical_points[j+1]
        _, y3 = horizontal_points[i]
        _, y4 = horizontal_points[i+1]
        cell = img[y3:y4, x1:x2]
        cells.append(cell)
```
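The same grid logic can be written without touching the image, which makes it easier to check. The sketch below takes plain lists of x and y line positions and returns the cell rectangles row by row, skipping gaps narrower than `min_size` pixels (such gaps usually come from two detections of the same line). `cell_boxes` and `min_size` are names and defaults assumed for this example.

```python
def cell_boxes(xs, ys, min_size=5):
    """Return rows of (x1, y1, x2, y2) boxes between neighbouring line positions."""
    xs, ys = sorted(xs), sorted(ys)
    rows = []
    for top, bottom in zip(ys, ys[1:]):
        if bottom - top < min_size:
            continue  # two detections of the same horizontal line
        row = [
            (left, top, right, bottom)
            for left, right in zip(xs, xs[1:])
            if right - left >= min_size
        ]
        if row:
            rows.append(row)
    return rows


# Hypothetical 2 x 2 table: three vertical and three horizontal lines
boxes = cell_boxes([0, 100, 200], [0, 50, 100])
```

Because the width filter does not depend on the row, every row keeps the same number of boxes. A cell image is then `img[y1:y2, x1:x2]` for each box.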
5. Read the text in each cell with pytesseract and use Pandas to write the result to an Excel file.
```python
import pandas as pd
import pytesseract  # needs the Tesseract OCR engine installed

# OCR every cell and collect the text
data = []
for cell in cells:
    gray_cell = cv2.cvtColor(cell, cv2.COLOR_BGR2GRAY)
    text = pytesseract.image_to_string(gray_cell, lang='eng', config='--psm 6')
    data.append(text.strip())
# Arrange the flat list as rows x columns
df = pd.DataFrame(np.array(data).reshape(len(horizontal_points) - 1, len(vertical_points) - 1))
# Write the DataFrame to an Excel file
df.to_excel('table.xlsx', index=False, header=False)
```
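OCR output for a cell often contains line breaks or other stray whitespace, and the reshape fails with an unhelpful error when the number of cells does not match the grid. As a small sketch, the hypothetical helper `to_rows` below collapses whitespace in each string and checks the cell count before splitting the flat list into rows.

```python
def to_rows(texts, n_cols):
    """Collapse whitespace in each OCR string and split the flat list into rows."""
    if n_cols <= 0 or len(texts) % n_cols != 0:
        raise ValueError("cell count does not fill a whole number of rows")
    # str.split() with no argument drops newlines, tabs and form feeds
    clean = [" ".join(t.split()) for t in texts]
    return [clean[i:i + n_cols] for i in range(0, len(clean), n_cols)]


# Hypothetical raw OCR strings for a 2 x 2 table
rows = to_rows(["Name\n", " Age ", "Ann", "3 1\n\x0c"], 2)
```

The result can be passed straight to `pd.DataFrame(rows)` in place of the NumPy reshape.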