Document Scanning and OCR with OpenCV

Updated 2024-08-29 · 580 KB PDF

"Opencv|Document Scanning & Optical Character Recognition 使用OpenCV进行文档扫描与光学字符识别(OCR)的教程" 在计算机视觉领域，OpenCV是一个强大的库，它提供了丰富的功能，包括图像处理、特征检测、对象识别等。在这个场景中，我们将探讨如何使用OpenCV进行文档扫描和光学字符识别（OCR）。OCR技术主要用于从图像中自动提取文本，使得非结构化的图像数据能够被转化为可编辑和可搜索的文本。 **步骤1：导入必要的包和自定义模块** 在项目开始时，我们需要导入必要的Python库。这里我们看到`cv2`是OpenCV的主要接口，`numpy`用于处理数组操作，而`resize`是一个自定义的py文件，可能包含了调整图像大小的功能。这一步是为了确保我们有所有需要的工具来处理图像。 ```python import cv2 import numpy as np import resize ``` **步骤2：图像的导入与预处理** 首先，读取待处理的图片。如果图片分辨率足够高，甚至可以使用笔记本电脑的摄像头作为输入。然后，我们将图像尺寸调整为特定大小，以适应后续处理。在这个例子中，我们将其调整为1500x1125像素。同时，创建原始图像的一个副本，以便稍后恢复。 ```python image = cv2.imread('test.jpg') image = cv2.resize(image, (1500, 1125)) orig = image.copy() ``` 接下来，将图像转换为灰度，以减少颜色信息对识别的干扰，并通过高斯模糊减少噪声，提高边缘检测的准确性。 ```python gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) blurred = cv2.GaussianBlur(gray, (5, 5), 0) ``` 然后，使用Canny算法检测图像边缘，这是边缘检测的经典方法，可以有效地区分图像中的边界。 ```python edged = cv2.Canny(blurred, 0, 50) ``` 保留一个Canny算法处理后的边缘图像副本，以便后续分析。 ```python orig_edged = edged.copy() ``` **步骤3：获取图像的大致轮廓** 找到边缘图像中的轮廓，只保留最大的那个，初始化扫描区域。OpenCV的`findContours`函数用于从二值图像中找出轮廓。这里，我们使用`RETR_LIST`模式以列表形式返回所有轮廓，`CHAIN_APPROX_NONE`表示保留每个轮廓的所有点。 ```python contours, hierarchy = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE) contours = sorted(contours, key=cv2.contourArea, reverse=True)[:1] ``` 通过排序和选择面积最大的轮廓，我们假设这个轮廓最接近文档的边界。在实际应用中，还需要进一步的处理，如四边形拟合，来精确地确定文档的边界框，以便裁剪出文档区域并进行OCR。对于OCR部分，通常会使用Tesseract或其他专门的OCR引擎来识别图像中的文字。这个过程涉及到了图像预处理、边缘检测、轮廓分析等关键技术，是实现文档扫描和OCR的基础步骤。通过优化这些步骤，我们可以提高识别的准确性和效率。

OpenCV | Document Scanning & Optical Character Recognition (OCR)

Step 1. Import the required packages and a local Python file named resize.

import cv2
import numpy as np
import resize

Step 2. Load and preprocess the image.

Read in the picture to be processed. If its resolution is good enough, a frame from the laptop camera can be used instead.

image = cv2.imread('test.jpg')
image = cv2.resize(image, (1500, 1125))  # dsize is (width, height)

# Keep a copy of the resized image for the perspective transformation in Step 5.
orig = image.copy()

# Convert the image to grayscale, then apply a Gaussian blur to reduce noise.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Use the Canny algorithm for edge detection.
edged = cv2.Canny(blurred, 0, 50)

# Keep a copy of the Canny edge image.
orig_edged = edged.copy()

Step 3. Get approximate contours of the image.

Find the contours in the edge image, sort them by area with the largest first, and take the first one that can be approximated by four points as the document outline.

# findContours() finds contours in a binary image.
# Note: OpenCV 3.x returns three values (image, contours, hierarchy).
contours, hierarchy = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)

# Sort the contours by area, largest first.
contours = sorted(contours, key=cv2.contourArea, reverse=True)

# Get approximate contours:
target = None
for c in contours:
    # Perimeter of the closed contour (or length of the curve).
    p = cv2.arcLength(c, True)

    # Approximate the contour with a polygon, using 0.02 * p as the precision.
    # The contour is closed, so the closed parameter is True.
    approx = cv2.approxPolyDP(c, 0.02 * p, True)

    # The first approximation with four vertices is the rectangular outline we are looking for.
    if len(approx) == 4:
        target = approx
        break

Step 4. Create a function to rectify the target contour, that is, to put its four corners in a consistent order.

Note: the rectify function is stored in resize.py.

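# resize.py
import numpy as np

def rectify(h):
    h = h.reshape((4, 2))
    hnew = np.zeros((4, 2), dtype=np.float32)

    # x + y is smallest at the top-left corner and largest at the bottom-right corner.
    add = h.sum(1)
    hnew[0] = h[np.argmin(add)]
    hnew[2] = h[np.argmax(add)]

    # Discrete difference along axis 1, i.e. y - x for each point:
    # smallest at the top-right corner, largest at the bottom-left corner.
    diff = np.diff(h, axis=1)
    hnew[1] = h[np.argmin(diff)]
    hnew[3] = h[np.argmax(diff)]

    # hnew now holds the four vertices of the detected document in the order
    # top-left, top-right, bottom-right, bottom-left.
    return hnew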

approx = resize.rectify(target)
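A short worked example may make the ordering rule easier to follow. The sketch below repeats the rectify logic with NumPy only and feeds it four made-up corner points in shuffled order, shaped (4, 1, 2) the way cv2.approxPolyDP returns them.

```python
import numpy as np

def rectify(h):
    """Order four corners as top-left, top-right, bottom-right, bottom-left."""
    h = h.reshape((4, 2))
    hnew = np.zeros((4, 2), dtype=np.float32)
    add = h.sum(1)               # x + y for each point
    hnew[0] = h[np.argmin(add)]  # top-left has the smallest x + y
    hnew[2] = h[np.argmax(add)]  # bottom-right has the largest x + y
    diff = np.diff(h, axis=1)    # y - x for each point
    hnew[1] = h[np.argmin(diff)] # top-right has the smallest y - x
    hnew[3] = h[np.argmax(diff)] # bottom-left has the largest y - x
    return hnew

# Assumption: illustrative corner coordinates, deliberately out of order.
corners = np.array([[[330, 240]], [[90, 60]], [[70, 260]], [[310, 40]]], dtype=np.int32)
ordered = rectify(corners)
print(ordered.tolist())
# → [[90.0, 60.0], [310.0, 40.0], [330.0, 240.0], [70.0, 260.0]]
```

The sums x + y are 570, 150, 330, and 350, and the differences y - x are -90, -30, 190, and -270, which is how the four points end up in clockwise order starting from the top-left.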

Step 5. Use a perspective transformation to map the target onto a 400 x 600 rectangle.

# Corners of the 400 x 600 target image: top-left, top-right, bottom-right, bottom-left.
pts2 = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])

# Use the getPerspectiveTransform function to obtain the perspective transformation matrix.
# approx holds the four vertices of the quadrilateral in the source image;
# pts2 holds the four corresponding vertices in the target image.
M = cv2.getPerspectiveTransform(approx, pts2)

# Use the warpPerspective function to apply the transformation to the source image;
# the output image dst is 400 x 600 (width x height).
dst = cv2.warpPerspective(orig, M, (400, 600))

Step 6. Try several ways of post-processing the perspective-transformed image to obtain the final result.

We can compare the processing methods below and choose the most suitable one for the final result. The processed images are not shown in this article; if you are interested, try them yourself.

# Convert the perspective-transformed image to grayscale.
dst = cv2.cvtColor(dst, cv2.COLOR_BGR2GRAY)

# Draw the detected outline on the image in green with a thickness of 2.
# The index -1 draws every contour in the list, which here contains only target.
cv2.drawContours(image, [target], -1, (0, 255, 0), 2)
