Steps for Document Scanning and OCR Processing with OpenCV

Updated 2024-08-29

"本资源主要涉及OpenCV在文档扫描与光学字符识别(OCR)中的应用。通过使用Python的OpenCV库和其他辅助包，实现对图像的预处理，包括图像缩放、灰度化、高斯模糊和边缘检测，进而提取文档的轮廓，最终可能进行OCR识别，提取图像中的文字信息。" 在文档扫描与光学字符识别中，OpenCV是一个强大的工具，它提供了丰富的图像处理功能。以下是基于给定内容的详细步骤和知识点解析： 1. 导入必要的包：首先，我们需要导入`cv2`（OpenCV的Python接口）、`numpy`（用于数组操作）和自定义的`resize`模块，可能用于调整图像尺寸。这些是进行图像处理的基本依赖。 2. 图像读取与初步处理：使用`cv2.imread()`读取待处理的图像。如果图像分辨率足够好，也可以直接使用笔记本电脑的摄像头获取图像。之后，将图像调整到特定尺寸（例如1500x1125），这有助于后续处理。`cv2.resize()`函数用于此目的。保持原始图像的副本，以备后续使用。 3. 图像转换：将彩色图像转换为灰度图像，使用`cv2.cvtColor()`函数，参数`cv2.COLOR_BGR2GRAY`完成此转换。这有助于减少颜色对边缘检测的影响，并简化图像。 4. 降噪：对灰度图像进行高斯模糊，使用`cv2.GaussianBlur()`函数，可以去除图像中的噪声，提高边缘检测的准确性。高斯滤波器能够平滑图像，降低高频噪声。 5. 边缘检测：应用Canny算法进行边缘检测，使用`cv2.Canny()`函数，找到图像中的边缘。这个步骤可以识别出图像中的边界，对于文档扫描尤其重要，因为它可以帮助确定文档的边界。 6. 轮廓提取：使用`cv2.findContours()`找出图像边缘中的轮廓。该函数返回一个轮廓列表和层次结构信息。这里选择`cv2.RETR_LIST`作为检索模式，意味着所有轮廓都被返回为一个列表，`cv2.CHAIN_APPROX_NONE`表示保存每个轮廓的所有点，以便保留所有细节。 7. 轮廓排序与筛选：对找到的轮廓进行排序，通常是为了选择最大的轮廓，这可能是文档的主要部分。这一步可能涉及到进一步的筛选，只保留最接近文档形状的轮廓。 8. OCR识别（未在给定内容中明确说明）：在轮廓提取后，可能使用OCR库（如Tesseract）对图像中的文字进行识别。首先，可能需要对图像进行额外的处理，如二值化或倾斜校正，以优化文字识别效果。然后，应用OCR引擎来识别并提取文本。总结，这个过程涉及到了OpenCV中的多个图像处理技术，包括读取、预处理、边缘检测和轮廓提取，这些都是文档扫描和OCR的基础。在实际应用中，可能还需要进一步优化，例如调整阈值、处理多页文档、识别不同语言的文本等。

OpenCV | Document Scanning & Optical Character Recognition (OCR)

Step 1. Import the required packages and a local Python file named resize.

import cv2

import numpy as np

import resize

Step 2. Load and preprocess the image.

Read in the picture to be scanned. If the resolution is good enough, the picture can also be taken with the laptop camera.

image = cv2.imread('test.jpg')
# Resize to a fixed working size of 1500 x 1125 (width x height).
image = cv2.resize(image, (1500, 1125))
# Keep a copy of the resized image for the perspective transformation later.
orig = image.copy()
# Convert the image to grayscale, then apply a 5 x 5 Gaussian blur to reduce noise.
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
# Use the Canny algorithm for edge detection (thresholds 0 and 50).
edged = cv2.Canny(blurred, 0, 50)
# Keep a copy of the edge image produced by the Canny algorithm.
orig_edged = edged.copy()
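The fixed size (1500, 1125) used above has a 4:3 aspect ratio, so an input with a different ratio would be stretched. The following is a minimal sketch of one way to choose a size that keeps the original proportions. `scaled_size` is a hypothetical helper, not part of OpenCV or of the article's resize.py, and the target width of 1500 is simply taken from the example above.

```python
def scaled_size(width, height, target_width=1500):
    """Return (new_width, new_height) that keeps the aspect ratio.

    Hypothetical helper for illustration. The result is in the
    (width, height) order that cv2.resize expects for its size argument.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target_width / width
    return target_width, max(1, round(height * scale))


# A 4000 x 3000 photo keeps its 4:3 ratio.
print(scaled_size(4000, 3000))  # (1500, 1125)
# A 1920 x 1080 frame is not stretched to 4:3.
print(scaled_size(1920, 1080))  # (1500, 844)
```

For an image loaded with cv2.imread, the width is image.shape[1] and the height is image.shape[0], and the returned pair could be passed to cv2.resize in place of the fixed size.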

Step 3. Get the approximate contour of the document.

Find the contours in the edge image, sort them by area from largest to smallest, and take the first one that can be approximated by four points as the outline of the document.

# findContours() finds contours in a binary image.
# It returns two values in OpenCV 4.x (OpenCV 3.x returns three).
contours, hierarchy = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
# Sort the contours by area, largest first, with Python's sorted function.
contours = sorted(contours, key=cv2.contourArea, reverse=True)
# Get approximate contours. target stays None if no four-point contour is found.
target = None
for c in contours:
    # Calculate the perimeter of the closed contour.
    p = cv2.arcLength(c, True)
    # Approximate the contour by a polygon with a precision of 0.02 * p.
    # The curve is closed, so the parameter closed is True.
    approx = cv2.approxPolyDP(c, 0.02 * p, True)
    # The first approximation with four vertices is the quadrilateral we are looking for.
    if len(approx) == 4:
        target = approx
        break
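The sorting and approximation steps above rely on two measurements of each contour: its area (cv2.contourArea) and its perimeter (cv2.arcLength with closed set to True). The sketch below reproduces both measurements in plain Python for simple polygons, to show how the largest-first ordering and the 0.02 * p tolerance come about. The names `polygon_area` and `polygon_perimeter` and the sample shapes are made up for illustration. OpenCV's own functions work on contour arrays, not on lists of tuples.

```python
import math


def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Length of the closed curve through the points."""
    return sum(
        math.dist(points[i], points[(i + 1) % len(points)])
        for i in range(len(points))
    )


# Illustrative shapes: a small square of noise and a page-sized rectangle.
shapes = [
    [(10, 10), (20, 10), (20, 20), (10, 20)],
    [(100, 50), (500, 50), (500, 650), (100, 650)],
]

# Largest area first, as in sorted(contours, key=cv2.contourArea, reverse=True).
shapes.sort(key=polygon_area, reverse=True)
page = shapes[0]

p = polygon_perimeter(page)
epsilon = 0.02 * p  # tolerance in pixels, as passed to cv2.approxPolyDP
print(polygon_area(page), p, epsilon)
```

The tolerance is the maximum distance allowed between the original curve and its polygonal approximation, so scaling it with the perimeter lets the same factor work for small and large contours.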

Step 4. Create a function to rectify the target, that is, to put its four corner points in a consistent order.

Note: the rectify function is stored in resize.py.

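def rectify(h):
    h = h.reshape((4, 2))
    hnew = np.zeros((4, 2), dtype=np.float32)
    # Sum x + y for each point: the smallest sum is the top-left corner,
    # the largest sum is the bottom-right corner.
    add = h.sum(1)
    hnew[0] = h[np.argmin(add)]
    hnew[2] = h[np.argmax(add)]
    # Calculate the discrete difference along axis 1, which is y - x for each point:
    # the smallest is the top-right corner, the largest is the bottom-left corner.
    diff = np.diff(h, axis=1)
    hnew[1] = h[np.argmin(diff)]
    hnew[3] = h[np.argmax(diff)]
    # hnew holds the four vertices of the detected document in the order
    # top-left, top-right, bottom-right, bottom-left.
    return hnew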

approx = resize.rectify(target)
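To see what rectify returns, the sketch below applies the same ordering logic to four corner points given in an arbitrary order. The function body mirrors the rectify function of Step 4. The sample coordinates are made up, and the input is shaped (4, 1, 2) because that is the shape of a four-point result from cv2.approxPolyDP.

```python
import numpy as np


def rectify(h):
    """Order four points as top-left, top-right, bottom-right, bottom-left."""
    h = h.reshape((4, 2))
    hnew = np.zeros((4, 2), dtype=np.float32)
    add = h.sum(1)                # x + y for each point
    hnew[0] = h[np.argmin(add)]   # top-left has the smallest x + y
    hnew[2] = h[np.argmax(add)]   # bottom-right has the largest x + y
    diff = np.diff(h, axis=1)     # y - x for each point
    hnew[1] = h[np.argmin(diff)]  # top-right has the smallest y - x
    hnew[3] = h[np.argmax(diff)]  # bottom-left has the largest y - x
    return hnew


# Made-up corners of a slightly tilted page, in arbitrary order.
target = np.array(
    [[[620, 710]], [[90, 80]], [[70, 690]], [[600, 60]]], dtype=np.int32
)
ordered = rectify(target)
print(ordered.tolist())
# [[90.0, 80.0], [600.0, 60.0], [620.0, 710.0], [70.0, 690.0]]
```

This sum and difference heuristic assumes the page is roughly upright. For a page rotated close to 45 degrees, two corners can tie for the same extreme value and the ordering may come out wrong.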

Step 5. Use a perspective transformation to map the target onto a 400 x 600 rectangle.

pts2 = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])

M = cv2.getPerspectiveTransform(approx, pts2)

# Use the getPerspectiveTransform function to obtain the perspective transformation matrix.

# (approx holds the four corner points of the quadrilateral in the source image; pts2 holds the four corresponding corner points in the target image.)

dst = cv2.warpPerspective(orig, M, (400,600))

# Use the warpPerspective function to apply the perspective transformation to the source image. The output image dst is 400 x 600 pixels.
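The matrix M maps each source corner (x, y) to its target (u, v) through u = (a*x + b*y + c) / (g*x + h*y + 1) and v = (d*x + e*y + f) / (g*x + h*y + 1). Four point pairs therefore give eight linear equations in the eight unknowns a to h. The sketch below sets up and solves that system with NumPy to illustrate the idea. It is not OpenCV's implementation, `solve_perspective` and `apply_perspective` are hypothetical names, and the source corners are made-up values.

```python
import numpy as np


def solve_perspective(src, dst):
    """Solve for the 3 x 3 matrix that maps four src points to four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u * (g*x + h*y + 1) = a*x + b*y + c, rearranged to be linear in a..h
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        # v * (g*x + h*y + 1) = d*x + e*y + f
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    coeffs = np.linalg.solve(
        np.array(rows, dtype=np.float64), np.array(rhs, dtype=np.float64)
    )
    # The ninth entry is fixed to 1 to remove the overall scale ambiguity.
    return np.append(coeffs, 1.0).reshape(3, 3)


def apply_perspective(M, point):
    """Map one (x, y) point through M using homogeneous coordinates."""
    u, v, w = M @ np.array([point[0], point[1], 1.0])
    return u / w, v / w


# Made-up ordered corners (top-left, top-right, bottom-right, bottom-left).
src = [(90, 80), (600, 60), (620, 710), (70, 690)]
dst = [(0, 0), (400, 0), (400, 600), (0, 600)]

M = solve_perspective(src, dst)
for s, d in zip(src, dst):
    # Each source corner should land on its target corner.
    assert np.allclose(apply_perspective(M, s), d, atol=1e-6)
```

For the same points passed as np.float32 arrays, cv2.getPerspectiveTransform should return a matrix that agrees with this one up to floating-point error, since both fix the bottom-right entry to 1.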

Step 6. Optimize the perspective-transformed image in several different ways to obtain the final result.

We can compare the processing methods below and choose the most suitable one for the final result. The results of the image processing are not shown in this article. If you are interested, try it yourself.

dst = cv2.cvtColor(dst, cv2.COLOR_BGR2GRAY)

# Grayscale the image after perspective transformation

cv2.drawContours(image, [target], -1, (0, 255, 0), 2)

# Draw the target outline on the image. -1 draws every contour in the list, the color is green, and the line thickness is 2.
