Fourier Spectrum and Random Forests: A New Method for Globally Identifying Text Images


"这篇论文探讨了机器从全局视角自动识别文本图像的问题，针对以往依赖局部特征、计算成本高昂且常需GPU支持的挑战，提出了一个新颖而高效的解决方案。该方案通过傅立叶光谱提取整体特征，再用随机森林进行文本与非文本图像分类。实验表明，这种方法在多个公共数据集上的表现有效。" 在当前的计算机视觉领域，自动识别文本图像的能力具有重要的实际应用价值，如文档处理、图像理解以及智能搜索等。然而，传统的文本图像识别方法往往基于局部特征，例如边缘检测、纹理分析和角点检测，这些方法在计算上非常耗时，且通常需要高性能的GPU硬件支持。这不仅增加了系统的运行成本，也限制了实时性和便携性。为了解决这一问题，研究者们提出了一种新的全局视角的识别策略。该策略的核心在于从傅立叶光谱中提取整体特征。傅立叶变换是一种强大的工具，能将图像从空间域转换到频率域，揭示图像的频谱信息。在文本图像中，由于字符的结构和分布，其傅立叶光谱往往呈现出特定的模式。因此，通过分析这种全局的傅立叶特征，可以有效地捕捉图像的整体特性，而无需进行复杂的局部特征提取。在提取了整体特征后，论文采用了随机森林算法来进行图像分类。随机森林是一种集成学习方法，由多个决策树构成，能够处理高维度数据并减少过拟合的风险。在文本与非文本图像的分类任务中，随机森林可以根据傅立叶特征对图像进行高效且准确的判断。实验结果显示，这个新方案在多个公共数据集上的性能优越，证明了从整体角度识别文本图像的有效性。这不仅降低了计算复杂度，减少了对GPU的依赖，还提高了识别的速度和准确性。这种方法的出现，为文本图像识别提供了一个更简洁、成本效益更高的途径，对于推动文本检测技术的发展具有积极意义。这篇研究工作为机器自动发现文本图像提供了一个全新的视角，通过全局特征提取和随机森林分类，解决了局部特征方法的局限性，提高了文本图像识别的效率和实用性。未来的研究可能会进一步优化这种全局特征表示，或者结合其他机器学习模型，以适应更复杂和多样化的文本图像识别场景。

Available online at www.ijpe-online.com

vol. 15, no. 1, January 2019, pp. 281-287

DOI: 10.23940/ijpe.19.01.p28.281287

* Corresponding author.

E-mail address: yaochao@nwpu.edu.cn

Can Machine Automatically Discover Text Image from Overall Perspective

Wei Jiang, Jiayi Wu, and Chao Yao^b,*

^a School of Software, North China University of Water Resources and Electric Power, Zhengzhou, 450045, China

^b School of Automation, Northwestern Polytechnic University, Xi’an, 710071, China

Abstract

Recently, more and more researchers have focused on the problem of how to automatically distinguish text images from non-text ones. Most previous works are built on local features, which are computationally expensive, and usually employ GPUs in their procedure. To address this problem, we propose a new, simple, but effective scheme from an overall perspective. In the proposed scheme, a holistic feature is first extracted from the Fourier spectrum. This feature describes the characteristics of the image or sub-image as a whole, without local feature extraction. Random forests are then used to classify images as text or non-text. Experimental results on several public datasets demonstrate that this scheme is efficient and effective.
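The two-stage scheme in the abstract (a holistic spectral feature, then random forests) can be sketched end to end with NumPy and scikit-learn's `RandomForestClassifier`. Everything specific here is an assumption for illustration: the radial-spectrum feature, the synthetic "text-like" and "smooth" images standing in for real data, and the hyperparameters. It does not reproduce the paper's feature or datasets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def radial_spectrum_feature(img, n_bins=8):
    """Hypothetical holistic feature: log-magnitude spectrum weight per radial band."""
    img = img - img.mean()
    log_spec = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(img))))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot((yy - h // 2) / (h / 2), (xx - w // 2) / (w / 2))
    keep = r <= 1.0
    bins = np.minimum((r[keep] * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins, weights=log_spec[keep], minlength=n_bins)
    return hist / max(hist.sum(), 1e-12)


def text_like_image(rng, size=64):
    """Synthetic stand-in for a text image: rows of random vertical strokes."""
    img = np.zeros((size, size))
    line_h, gap = int(rng.integers(4, 9)), int(rng.integers(3, 8))
    for top in range(int(rng.integers(0, 6)), size - line_h, line_h + gap):
        img[top:top + line_h, rng.random(size) < rng.uniform(0.3, 0.6)] = 1.0
    return img


def smooth_image(rng, size=64, sigma=0.03):
    """Synthetic stand-in for a non-text image: Gaussian low-pass filtered noise."""
    f = np.fft.fftfreq(size)
    lowpass = np.exp(-(f[:, None] ** 2 + f[None, :] ** 2) / (2 * sigma ** 2))
    noise = rng.standard_normal((size, size))
    return np.real(np.fft.ifft2(np.fft.fft2(noise) * lowpass))


rng = np.random.default_rng(0)
images = [text_like_image(rng) for _ in range(100)] + [smooth_image(rng) for _ in range(100)]
labels = np.array([1] * 100 + [0] * 100)  # 1 = text, 0 = non-text
features = np.array([radial_spectrum_feature(im) for im in images])

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=0, stratify=labels)
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy on toy data: {accuracy:.2f}")
```

On this toy data the classes are easy to separate because the low-pass images have almost no high-frequency energy. Near-perfect accuracy is therefore expected here and says nothing about real natural images. A random forest fits this stage well because it needs little tuning and works directly on unscaled histogram features.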

Keywords: natural images; holistic feature; text/non-text image classification; random forests

(Submitted on October 12, 2018; Revised on November 11, 2018; Accepted on December 23, 2018)

1. Introduction

Text/non-text image classification is a useful and significant problem, with applications in image and video retrieval and management, road navigation, and so on. However, the problem remains open and challenging, and it is attracting growing attention from researchers all over the world.

Text in natural images usually carries a large amount of information that is useful in many applications, such as image retrieval and scene analysis. Therefore, text detection and recognition in natural images have long been active research areas in computer vision. Since 2015, a new problem has been posed: how to automatically distinguish text images from non-text ones in natural scenes. Making this distinction is significant and valuable. On social networks, only 10-15% of images contain text [1], so running text detection and recognition directly on every image wastes a large amount of time and expensive computational power. If non-text images are filtered out of the collection of natural images using limited time and computational resources, a lot of time and computation will be saved.
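The saving argued for in the preceding paragraph can be made concrete with a back-of-the-envelope cost model. The sketch makes a hypothetical assumption: every image first passes through a cheap text/non-text filter, and only images flagged as text go on to full detection and recognition. The function name and the example costs (5 ms for the filter, 100 ms for the full pipeline) are illustrative assumptions, not measurements from the paper. Only the 10-15% text share comes from the text [1].

```python
def expected_cost_per_image(p_text, cost_filter, cost_pipeline,
                            recall=1.0, false_positive_rate=0.0):
    """Expected per-image cost when a text/non-text filter runs first.

    p_text: fraction of images that actually contain text.
    recall: fraction of text images the filter correctly passes on.
    false_positive_rate: fraction of non-text images wrongly passed on.
    """
    flagged = p_text * recall + (1.0 - p_text) * false_positive_rate
    return cost_filter + flagged * cost_pipeline


baseline = 100.0  # hypothetical cost of running the full pipeline on every image (ms)
ideal = expected_cost_per_image(0.15, 5.0, 100.0)
noisy = expected_cost_per_image(0.15, 5.0, 100.0, false_positive_rate=0.1)
print(f"ideal filter: {ideal:.1f} ms/image, saving {1 - ideal / baseline:.1%}")
print(f"10% false positives: {noisy:.1f} ms/image, saving {1 - noisy / baseline:.1%}")
```

With these assumed numbers, an ideal filter brings the cost to 20 ms per image, an 80% saving. Letting 10% of non-text images through raises it to 28.5 ms, a 71.5% saving. A recall below 1 lowers the cost further, but only by dropping text images, so the filter has to trade accuracy against efficiency.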

Several attempts have been made to tackle the text/non-text image classification problem. According to the type of image, previous work can be divided into three categories: document images, video images, and natural images.

For document images, Alessi [2] attempted to detect candidate text blocks and then discriminated text documents from non-text documents by setting a threshold. Indermuhle [3] and Vidya [4] each proposed a scheme to address the classification of text and non-text regions in handwritten documents. These works, however, are designed only for document images, not for natural images.
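The block-then-threshold idea attributed to Alessi [2] can be shown with a toy version. This is not Alessi's algorithm. The edge-density test for a candidate block, the block size, and both thresholds are hypothetical choices. The sketch shows only the two-step structure: first flag candidate text blocks, then decide for the whole document by thresholding the fraction of flagged blocks.

```python
import numpy as np


def candidate_block_ratio(image, block=16, density_thresh=0.15, step_thresh=0.5):
    """Fraction of blocks flagged as text candidates (hypothetical criterion).

    A block is a candidate if the share of strong horizontal intensity steps
    inside it exceeds density_thresh; text strokes create many such steps.
    """
    img = np.asarray(image, dtype=float)
    steps = np.abs(np.diff(img, axis=1)) > step_thresh  # shape (h, w - 1)
    h, w = img.shape
    flags = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            density = steps[y:y + block, x:x + block - 1].mean()
            flags.append(density > density_thresh)
    if not flags:  # image smaller than one block
        return 0.0
    return float(np.mean(flags))


def is_text_document(image, ratio_thresh=0.2, **block_kwargs):
    """Second step: call the page 'text' if enough blocks are candidates."""
    return candidate_block_ratio(image, **block_kwargs) > ratio_thresh


rng = np.random.default_rng(1)
page = np.zeros((64, 64))
for top in range(4, 60, 12):                          # five lines of random strokes
    page[top:top + 6, rng.random(64) < 0.4] = 1.0
blank = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))  # smooth gradient, no strokes
print(is_text_document(page), is_text_document(blank))
```

Both thresholds are hand-set, which is exactly what makes such schemes fragile outside clean document images. That limitation is the one the paragraph above points out.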

For video images, Shivakumara [5-7] proposed methods in which the video image is first divided into several blocks. These blocks are then classified as text or non-text by clustering wavelet or edge features. Shivakumara’s works are
