Pornographic images recognition based on spatial pyramid partition and
multi-instance ensemble learning
Daxiang Li (a,c,*), Na Li (a,c), Jing Wang (b), Tingge Zhu (a,c)
a School of Telecommunication and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
b Computer Graphics, Imaging and Vision (CGIV) Research Group, School of Computing and Engineering, University of Huddersfield, United Kingdom
c Key Laboratory of Electronic Information Application Technology for Scene Investigation, Ministry of Public Security, Xi’an 710121, China
article info
Article history:
Received 21 September 2013
Received in revised form 13 April 2015
Accepted 18 April 2015
Available online 28 April 2015
Keywords:
Multi-instance learning
Pornographic images recognition
Extreme learning machine
abstract
To tackle the problem of pornographic image recognition, a novel multi-instance learning (MIL) algorithm based on the extreme learning machine (ELM) and classifier ensembles is proposed. First, a spatial pyramid partition (SPP)-based multi-instance modeling technique transforms pornographic image recognition into a typical MIL problem, in which each image corresponds to a bag and each partitioned sub-block, described by low-level visual features (color, texture and shape), corresponds to an instance. Second, a collection of visual words (VWs) is generated by hierarchical k-means clustering. Then, based on the fuzzy membership function between instances and VWs, a fuzzy histogram fusion-based metadata calculation method is proposed to convert each bag into a single sample, which allows the MIL problem to be solved directly by a standard single-instance learning (SIL) machine. Finally, a group of ELM base classifiers with different numbers of hidden nodes is constructed, and their weights are determined dynamically by a performance weighting rule. This classifier ensemble strategy improves the overall adaptability of the proposed ELMCE-MIL algorithm. Experimental results show that the method is robust and outperforms other similar algorithms.
© 2015 Elsevier B.V. All rights reserved.
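The bag-construction steps summarized in the abstract can be sketched as follows. They are the SPP partition of an image into instances, the hierarchical k-means vocabulary of visual words, and the fuzzy-histogram conversion of a bag into one vector. This is a minimal illustration under stated assumptions, not the authors' implementation. The per-block descriptor is a stand-in: per-channel RGB mean and standard deviation replace the paper's color, texture and shape features. The pyramid depth (`levels`), tree branching factor and tree depth are hypothetical parameters. The fuzzy membership uses the fuzzy c-means form u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), because the abstract does not specify the exact membership function or fusion rule.

```python
import numpy as np


def spp_instances(img, levels=2):
    """Split an H x W x 3 image into a spatial pyramid of sub-blocks (instances).

    Level l contributes 2^l x 2^l blocks, so levels=2 gives 1 + 4 + 16 = 21 instances.
    Descriptor is a stand-in (assumption): per-channel mean and std of block pixels.
    """
    h, w, _ = img.shape
    instances = []
    for level in range(levels + 1):
        n = 2 ** level
        ys = np.linspace(0, h, n + 1).astype(int)
        xs = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                block = img[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].reshape(-1, 3)
                instances.append(np.concatenate([block.mean(0), block.std(0)]))
    return np.asarray(instances)


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for c in range(k):
            if np.any(labels == c):  # keep the old center if a cluster empties
                centers[c] = X[labels == c].mean(0)
    return centers, labels


def hierarchical_kmeans(X, branch=2, depth=2, seed=0):
    """Recursively split X with k-means; the leaf centroids serve as visual words."""
    if depth == 0 or len(X) < branch:
        return [X.mean(0)]
    _, labels = kmeans(X, branch, seed=seed)
    leaves = []
    for c in range(branch):
        subset = X[labels == c]
        if len(subset) > 0:
            leaves.extend(hierarchical_kmeans(subset, branch, depth - 1, seed))
    return leaves


def build_vocabulary(X, branch=2, depth=2):
    """Stack the leaf centroids into a (n_words, n_features) array."""
    return np.vstack(hierarchical_kmeans(X, branch, depth))


def fuzzy_memberships(X, vocab, m=2.0, eps=1e-12):
    """Fuzzy c-means style membership (assumed form); each row sums to 1."""
    d = np.sqrt(((X[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)) + eps
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(-1)


def bag_to_vector(instances, vocab):
    """Fuse instance memberships into one normalized fuzzy histogram per bag."""
    return fuzzy_memberships(instances, vocab).mean(0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = [rng.random((32, 32, 3)) for _ in range(4)]  # random stand-in images
    bags = [spp_instances(im) for im in images]
    vocab = build_vocabulary(np.vstack(bags), branch=2, depth=2)
    X = np.array([bag_to_vector(b, vocab) for b in bags])
    print(X.shape)  # (4, number of visual words); at most 4 words with branch=2, depth=2
```

Each row of `X` is an ordinary fixed-length vector, so any single-instance classifier can be trained on it. This is the point of converting bags to single samples.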
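The ensemble step described in the abstract can be sketched as below. This is a hedged illustration rather than the paper's exact algorithm. The base learner is a standard single-hidden-layer ELM: a randomly drawn sigmoid hidden layer, with output weights obtained from the Moore-Penrose pseudo-inverse. Several parts of the sketch are assumptions made for the example: the hidden-node counts (10, 20, 40), the ±1 label coding, and the form of the "performance weighting rule", taken here as weights proportional to each base classifier's validation accuracy and normalized to sum to one.

```python
import numpy as np


class ELM:
    """Single-hidden-layer extreme learning machine for labels in {-1, +1}."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of a fixed, randomly drawn hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        # Input weights and biases are random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights: least-squares solution beta = pinv(H) @ y.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def decision(self, X):
        return self._hidden(X) @ self.beta


def fit_weighted_ensemble(X_tr, y_tr, X_val, y_val, hidden_sizes=(10, 20, 40)):
    """Train one ELM per hidden-layer size; weight each by validation accuracy (assumed rule)."""
    models, scores = [], []
    for i, n_hidden in enumerate(hidden_sizes):
        model = ELM(n_hidden, seed=i).fit(X_tr, y_tr)
        pred = np.where(model.decision(X_val) >= 0, 1, -1)
        models.append(model)
        scores.append(np.mean(pred == y_val))
    weights = np.asarray(scores, dtype=float)
    if weights.sum() == 0:  # degenerate case: fall back to equal weights
        weights = np.ones_like(weights)
    return models, weights / weights.sum()


def ensemble_predict(models, weights, X):
    """Weighted sum of base-classifier outputs, thresholded at zero."""
    score = sum(w * m.decision(X) for w, m in zip(weights, models))
    return np.where(score >= 0, 1, -1)


if __name__ == "__main__":
    # Random vectors with a linear decision boundary stand in for bag vectors.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    models, weights = fit_weighted_ensemble(X[:120], y[:120], X[120:160], y[120:160])
    acc = np.mean(ensemble_predict(models, weights, X[160:]) == y[160:])
    print(weights, acc)
```

Varying only the hidden-layer size gives base classifiers that differ in capacity. Weighting them by held-out performance lets the better-suited ones dominate without choosing a single size in advance.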
1. Introduction
With the rapid development of computer and communication technologies, the Internet has become one of the primary means of accessing a wide variety of information and knowledge. However, alongside the useful knowledge people need, the vast amount of information on the Internet also contains a rapidly growing volume of objectionable content, such as violence, pornography and rumors. Because images convey much richer semantic information than text, pornographic images have the most direct impact on teenagers' physical and mental health. Purifying the network environment and keeping teenagers away from pornographic web pages has therefore become a serious challenge for families and for society. An automated, effective and accurate pornographic image recognition method that can prevent teenagers from browsing online pornographic images is thus both important and necessary [1,2].
Blacklist-based and keyword-based approaches are two common kinds of pornographic content filtering methods [3–6]. Blacklist-based methods block all access to websites on a list of URLs where pornographic content has been reported. However, because the modern network environment is highly dynamic, with URLs constantly appearing and changing, these methods cannot effectively filter all URLs that contain pornographic content. Keyword-based methods, on the other hand, attempt to filter images by analyzing sensitive text on web pages. However, many words in the pornographic lexicon also appear on web pages for legitimate purposes (e.g. educational sites about breast cancer). In addition, some pornographic websites embed their text into images, which makes keyword-based analysis inapplicable. Consequently, content-based pornographic image recognition, which adapts better when identifying pornographic content on web pages, has become an active research topic in recent years.
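The weaknesses of blacklist-based and keyword-based filtering described above can be illustrated with a toy sketch. The blacklist entry, the keyword lexicon and the page texts are all hypothetical, and real filters are far more elaborate. The sketch only shows three effects. A host-name blacklist misses a newly registered mirror. Whole-word matching flags an educational page. Text rendered into an image leaves the matcher nothing to inspect.

```python
import re
from urllib.parse import urlparse

# Hypothetical blacklist of reported hosts (".example" is a reserved test domain).
BLACKLIST = {"blocked.example"}
# Hypothetical sensitive-word lexicon, for illustration only.
LEXICON = {"nude", "breast", "xxx"}


def is_blacklisted(url: str) -> bool:
    """Block a URL only if its host name has already been reported."""
    return urlparse(url).hostname in BLACKLIST


def keyword_flag(page_text: str) -> bool:
    """Flag a page if any lexicon word occurs as a whole word (case-insensitive)."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return not LEXICON.isdisjoint(words)


if __name__ == "__main__":
    print(is_blacklisted("http://blocked.example/gallery"))  # True
    print(is_blacklisted("http://mirror.example/gallery"))   # False: new host, not yet listed
    print(keyword_flag("Breast cancer screening: a self-examination guide"))  # True: false positive
    print(keyword_flag(""))  # False: text embedded in images is invisible to the matcher
```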
Many content-based pornographic image recognition methods have been proposed recently, and they can be roughly categorized into three groups [7–9]: model-based, feature-based and region-based methods. (1) The model-based method was first proposed by Fleck and Forsyth in 1996 [10,11]. The
http://dx.doi.org/10.1016/j.knosys.2015.04.014
0950-7051/© 2015 Elsevier B.V. All rights reserved.
* Corresponding author at: School of Telecommunication and Information Engineering, Xi’an University of Posts and Telecommunications, Xi’an 710121, China.
E-mail address: 389097028@qq.com (D. Li).
Knowledge-Based Systems 84 (2015) 214–223