Document Scanning and OCR with OpenCV

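
This article describes how to use the OpenCV library for document scanning and optical character recognition (OCR). First, import the required packages and a Python file named resize. Next, read the image and preprocess it: grayscale conversion, noise reduction, and edge detection. Then find the approximate contours in the image and filter them to obtain the document boundary. Finally, apply a perspective transformation to the image to prepare it for OCR.

OpenCV is a widely used open-source library for image processing and computer vision tasks, including document scanning and OCR. Document scanning turns a photographed paper document into a clean digital image; OCR recognizes the text in that image.

**Step 1: Import the required packages**

`import cv2` and `import numpy as np` load OpenCV and NumPy, which provide the image processing and numerical routines. `import resize` loads a custom Python file; in this project it holds the `rectify` helper that orders the four corners of the detected document.

**Step 2: Preprocess the image**

- `cv2.imread('test.jpg')` reads the image. If its resolution is good enough, a frame from a laptop camera can be used instead.
- `cv2.resize(image, (1500, 1125))` resizes the image to a fixed size so that later steps behave consistently.
- `orig = image.copy()` keeps an unmodified copy of the image.
- `cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)` converts the image to grayscale, which simplifies later processing.
- `cv2.GaussianBlur(gray, (5, 5), 0)` applies a Gaussian blur to suppress noise before edge detection. The blur also softens edges slightly, so the kernel is kept small.
- `cv2.Canny(blurred, 0, 50)` runs the Canny algorithm to detect the prominent edges in the image.

**Step 3: Get the contours of the image**

- `cv2.findContours()` finds contours in the edge image. The `cv2.RETR_LIST` and `cv2.CHAIN_APPROX_NONE` arguments retrieve every contour, without hierarchy, and store every boundary point.
- The contours are sorted by area, largest first. The first one that can be approximated by four vertices is taken as the document boundary.

**Step 4: Perspective transformation and OCR**

Once the document boundary is known, a perspective transformation maps it onto an upright rectangle so that the text runs horizontally or vertically, which helps OCR accuracy. OpenCV's `cv2.warpPerspective()` function performs the transformation. An OCR library such as Tesseract can then be run on the result to extract the text.

The quality of the preprocessing directly affects the OCR recognition rate: adjusting brightness and contrast and removing noise effectively can all improve the result. Multi-page documents may need more elaborate algorithms to detect and separate each page.

In summary, OpenCV combined with suitable preprocessing and an OCR library can scan documents and recognize their text. The process covers reading, color conversion, denoising, edge detection, contour detection, and geometric transformation, plus integration with the OCR library. Understanding these steps is the basis for building your own document processing system.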


OpenCV | Document Scanning & Optical Character Recognition (OCR)

Step 1. Import the required packages and a Python file named resize for the project.

import cv2

import numpy as np

import resize

Step 2. Load and preprocess the image.

Read the image to be processed. If its resolution is good enough, a frame captured with a laptop camera can be used instead.

image = cv2.imread('test.jpg')

image = cv2.resize(image, (1500, 1125))

orig = image.copy()

# Create a copy of the original image.

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

blurred = cv2.GaussianBlur(gray, (5, 5), 0)

# Convert the image to grayscale, then apply a Gaussian blur to reduce noise.

edged = cv2.Canny(blurred, 0, 50)

# Use canny algorithm for edge detection

orig_edged = edged.copy()

# Create a copy processed by the canny algorithm.

Step 3. Get approximate contours of the image.

Find the contours in the edge image, sort them by area, and take the largest one that has four vertices as the outline of the document.

contours, hierarchy = cv2.findContours(edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)

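# findContours() finds contours in a binary image. OpenCV 4.x returns (contours, hierarchy); OpenCV 3.x returns (image, contours, hierarchy), so the unpacking must be adjusted there.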

contours = sorted(contours, key=cv2.contourArea, reverse=True)

# Sort the contours by area, largest first.

# Get approximate contours:

target = None  # stays None if no four-sided contour is found

for c in contours:
    # Perimeter of the closed contour.
    p = cv2.arcLength(c, True)
    # Approximate the contour by a polygon, allowing a deviation of up to 2% of the perimeter.
    # The contour is closed, so the closed parameter is True.
    approx = cv2.approxPolyDP(c, 0.02 * p, True)
    # The first (largest) contour with four vertices is the quadrilateral we are looking for.
    if len(approx) == 4:
        target = approx
        break

Step 4. Create a function to rectify and resize the target image.

Note: the rectify function is stored in resize.py.

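# resize.py needs its own "import numpy as np".
def rectify(h):
    # Order the four corners as top-left, top-right, bottom-right, bottom-left.
    h = h.reshape((4, 2))
    hnew = np.zeros((4, 2), dtype=np.float32)

    add = h.sum(1)
    hnew[0] = h[np.argmin(add)]  # top-left: smallest x + y
    hnew[2] = h[np.argmax(add)]  # bottom-right: largest x + y

    # Discrete difference along axis 1, i.e. y - x for each point.
    diff = np.diff(h, axis=1)
    hnew[1] = h[np.argmin(diff)]  # top-right: smallest y - x
    hnew[3] = h[np.argmax(diff)]  # bottom-left: largest y - x

    # hnew now holds the four vertices of the detected document in a fixed order.
    return hnew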

approx = resize.rectify(target)
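To see what the corner ordering does, it helps to run it on four points given in arbitrary order. The following is a small NumPy-only sketch of the same sum/difference heuristic; the name `order_corners` and the sample coordinates are assumptions for illustration. The input has shape (4, 1, 2), the shape that `cv2.approxPolyDP` produces.

```python
import numpy as np


def order_corners(pts):
    """Order 4 points as top-left, top-right, bottom-right, bottom-left."""
    pts = np.asarray(pts, dtype=np.float32).reshape(4, 2)
    ordered = np.zeros((4, 2), dtype=np.float32)

    s = pts.sum(axis=1)               # x + y for each point
    d = np.diff(pts, axis=1).ravel()  # y - x for each point

    ordered[0] = pts[np.argmin(s)]  # top-left
    ordered[2] = pts[np.argmax(s)]  # bottom-right
    ordered[1] = pts[np.argmin(d)]  # top-right
    ordered[3] = pts[np.argmax(d)]  # bottom-left
    return ordered


# Corners of a rectangle in arbitrary order, shaped like approxPolyDP output.
shuffled = np.array([[[250, 160]], [[50, 40]], [[50, 160]], [[250, 40]]])
ordered = order_corners(shuffled)
print(ordered.tolist())
# → [[50.0, 40.0], [250.0, 40.0], [250.0, 160.0], [50.0, 160.0]]
```

The heuristic assumes the document is roughly upright in the photo. For a quadrilateral rotated close to 45 degrees, two corners can tie on the sum or the difference, and the same point may be selected twice.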

Step 5. Use a perspective transformation to map the target onto a 400 × 600 rectangle.

pts2 = np.float32([[0, 0], [400, 0], [400, 600], [0, 600]])

M = cv2.getPerspectiveTransform(approx, pts2)

# Use the getPerspectiveTransform function to obtain the perspective transformation matrix.

# (approx holds the four vertices of the quadrilateral in the source image; pts2 holds the four corresponding vertices in the target image.)

dst = cv2.warpPerspective(orig, M, (400,600))

# Use the warpPerspective function to apply the perspective transformation to the source image; the output image dst is 400 × 600 (width × height).

Step 6. Try several ways of optimizing the perspective-transformed image to obtain the final result.

We can compare the processing methods below and choose the most suitable one for the final result. The output images are not shown in this article; if you are interested, try the methods yourself.

dst = cv2.cvtColor(dst, cv2.COLOR_BGR2GRAY)

# Grayscale the image after perspective transformation

cv2.drawContours(image, [target], -1, (0, 255, 0), 2)

# Draw the outline on the image. -1 means draw every contour in the list (here only target); the color is green in BGR order and the line thickness is 2.
