Available online at www.ijpe-online.com
vol. 15, no. 1, January 2019, pp. 281-287
DOI: 10.23940/ijpe.19.01.p28.281287
* Corresponding author.
E-mail address: yaochao@nwpu.edu.cn
Can Machine Automatically Discover Text Image
from Overall Perspective
Wei Jiang^a, Jiayi Wu^a, and Chao Yao^b,*
^a School of Software, North China University of Water Resources and Electric Power, Zhengzhou, 450045, China
^b School of Automation, Northwestern Polytechnic University, Xi’an, 710071, China
Abstract
Recently, a growing number of researchers have focused on how to automatically distinguish text images from non-text ones.
Most previous works rely on local features, which are computationally expensive and usually require a GPU.
To address this problem, we propose a simple but effective scheme from an overall perspective. In the proposed
scheme, a holistic feature is first extracted from the Fourier spectrum, which describes the characteristics of the image or sub-image
as a whole without local feature extraction; then, random forests are used to classify images as text or non-text. Experimental
results on several public datasets demonstrate that this scheme is efficient and effective.
Keywords: natural images; holistic feature; text/non-text image classification; random forests
(Submitted on October 12, 2018; Revised on November 11, 2018; Accepted on December 23, 2018)
© 2019 Totem Publisher, Inc. All rights reserved.
1. Introduction
Text/non-text image classification is a useful and significant problem, with applications in image and video retrieval
and management, road navigation, and so on. However, the problem remains open and challenging, and it is attracting
increasing attention from researchers worldwide.
Text in natural images usually carries a large amount of information, which is useful in many applications
such as image retrieval and scene analysis. Therefore, text detection and recognition in natural images have long
been active research areas in computer vision. Since 2015, a new problem has been posed: how to automatically
distinguish text images from non-text ones in natural scenes. This distinction is valuable because, in social networks,
only 10-15% of images contain text [1]; running text detection and recognition directly on every image therefore wastes a
large amount of time and computational power. If non-text images can be removed from the collection with limited time
and computational resources, much of that cost is saved.
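The saving argued above can be made concrete with a back-of-the-envelope cost model. The sketch below is only an illustration and is not part of the proposed scheme: the function name `expected_cost_per_image` and both timing constants are hypothetical, and it assumes an ideal filter that forwards exactly the text images to the detector. Only the 10-15% text-image fraction is taken from [1].

```python
def expected_cost_per_image(text_fraction, filter_cost, detect_cost):
    """Average cost per image when a text/non-text filter runs first.

    Assumes an ideal filter: every image pays filter_cost, and only the
    text_fraction of images that contain text also pay detect_cost.
    """
    if not 0.0 <= text_fraction <= 1.0:
        raise ValueError("text_fraction must lie in [0, 1]")
    return filter_cost + text_fraction * detect_cost


# Hypothetical timings in arbitrary units; not measurements from the paper.
FILTER_COST = 1.0
DETECT_COST = 20.0

for fraction in (0.10, 0.15):  # text-image fraction reported in [1]
    with_filter = expected_cost_per_image(fraction, FILTER_COST, DETECT_COST)
    saving = 1.0 - with_filter / DETECT_COST
    print(f"text fraction {fraction:.2f}: "
          f"cost {with_filter:.2f} vs {DETECT_COST:.2f}, saving {saving:.0%}")
```

Under this model the filter pays off whenever filter_cost is smaller than (1 - text_fraction) * detect_cost; a real classifier makes errors, which would reduce the saving.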
To tackle the text/non-text image classification problem, some attempts have been made. According to the category
of image addressed, previous work can be divided into three groups: document images, video images, and natural images.
For document images, Alessi [2] tried to detect candidate text blocks and then discriminated text documents from
non-text documents by applying a threshold. Indermuhle [3] and Vidya [4] both proposed schemes to address the text/non-text
region classification problem in handwritten documents. The works mentioned above are designed only for document
images, not for natural images.
For video images, Shivakumara [5-7] proposed methods in which the video image is first divided into several
blocks, which are then classified as text or non-text through clustering on wavelet or edge features. Shivakumara’s works are