Image Retrieval using CNN and Low-level Feature
Fusion for Crime Scene Investigation Image Database
Ying Liu¹²³, Yanan Peng¹*, Dan Hu¹, Daxiang Li¹²³, Keng-Pang Lim¹³, Nam Ling³⁴
1 Center for Image and Information Processing, Xi’an University of Posts and Telecommunications, Xi’an 710121, China
2 Key Laboratory of Electronic Information Application Technology for Scene Investigation, Ministry of Public Security, Xi’an, 710121, China
3 International Joint Research Center for Wireless Communication and Information Processing, Shaanxi, Xi’an, 710121, China
4 Department of Computer Engineering, Santa Clara University, California, 95053, USA
Abstract: Crime scene investigation (CSI) image retrieval is
used to search for crime evidence and is critical to solving
various crimes. In recent years, Convolutional Neural Networks
(CNNs) have demonstrated outstanding performance in
large-scale image database retrieval. However, the limited
number of CSI images makes CNN training prone to
over-fitting. To address this, this paper proposes to cascade
two CNN models obtained through transfer learning and to
combine the CNN features with low-level image features to
better describe CSI images. First, two pre-trained CNN models
are fine-tuned on the target image set. CNN features are
extracted from the fully connected layer of each model and
concatenated as the high-level feature of the image. These
concatenated CNN features are then fused with the low-level
features of the target images, and the resulting fused features
are used for image retrieval. Experimental results on a CSI
image database demonstrate the effectiveness of the proposed
algorithm when the training set is limited. In addition,
experiments on the GHIM-10K database demonstrate the
generalizability of the proposed algorithm.
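The pipeline summarized above (two fine-tuned CNNs, concatenated fully connected features, fusion with low-level features, then retrieval) can be illustrated with a minimal sketch. It assumes the CNN and low-level feature vectors have already been extracted. The toy vectors, the image names, the per-part L2 normalization, the weights `w_cnn`/`w_low`, and the Euclidean ranking are all illustrative assumptions, not the paper's stated fusion or matching rule.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(cnn_a, cnn_b, low_level, w_cnn=1.0, w_low=1.0):
    """Concatenate the fully connected features of two CNNs, then append low-level features.

    Assumption: each part is L2-normalized so that no descriptor dominates the
    distance merely because of its scale; w_cnn and w_low are hypothetical weights.
    """
    # Concatenated high-level feature (list + list joins the two CNN vectors).
    high = l2_normalize(l2_normalize(cnn_a) + l2_normalize(cnn_b))
    low = l2_normalize(low_level)
    return [w_cnn * x for x in high] + [w_low * x for x in low]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, database, top_k=3):
    """Return the names of the top_k database images closest to the query feature."""
    ranked = sorted(database.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:top_k]]


# Toy 2-D "CNN" outputs and 2-D low-level features (hypothetical values and names).
db = {
    "shoeprint_01": fuse_features([0.9, 0.1], [0.2, 0.8], [0.5, 0.5]),
    "fingerprint_07": fuse_features([0.1, 0.9], [0.7, 0.3], [0.9, 0.1]),
    "blood_03": fuse_features([0.5, 0.5], [0.5, 0.5], [0.1, 0.9]),
}
q = fuse_features([0.85, 0.15], [0.25, 0.75], [0.5, 0.5])
print(retrieve(q, db, top_k=2))  # → ['shoeprint_01', 'blood_03']
```

Normalizing each part before concatenation is one common way to keep a high-dimensional CNN vector from swamping a short low-level descriptor; the paper may normalize or weight the parts differently.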
Crime scene investigation (CSI) images are an important part
of the information collected at crime scenes. Classification
and retrieval of CSI images provide important clues and play
an important role in solving serial crimes [1]. Therefore, there
is an urgent need for an automatic and effective image
classification and retrieval system that can quickly find
relevant images among a large number of CSI images,
improving the efficiency of investigations while saving
manpower and material resources.
Currently, there are few studies on CSI image retrieval.
Existing CSI image retrieval technologies can be divided into
two categories: CSI image retrieval based on low-level
features and CSI image retrieval based on high-level
semantics. CSI image retrieval based on low-level features
uses a content-based image retrieval (CBIR) framework to
extract low-level features of the image (such as the color
histogram, gray-level co-occurrence matrix, Gabor features,
and wavelet texture features) or to fuse different low-level
features; these studies confirm the feasibility of CBIR
technology in CSI image retrieval [2, 3]. In [4], three
low-level features are combined to improve CSI image
retrieval performance: dominant color descriptors as color
features, the gray-level co-occurrence matrix as texture
features, and edge features obtained by gradient vector flow.
The disadvantage is that the computation is complex and slow.
In [5], an image retrieval method based on a regional semantic
template is proposed. First, the user submits a query image
and a region of interest, from which a regional semantic
template is constructed and used for pre-classification;
finally, the images are ranked. Experiments show that the
algorithm effectively improves the accuracy of CSI image
retrieval. Ref. [6]
proposes a two-layer CSI image retrieval framework. First,
the feature database corresponding to the CSI image database
is computed, and a support vector machine (SVM) classifier
capable of multi-semantic classification is pre-trained. After
the investigator submits a query image, the SVM
automatically determines its semantic category from the
image features, and matching retrieval is then performed only
on the images in the library that belong to that category.
Experimental results show that this method outperforms the
Query By Example (QBE) method on multiple retrieval
metrics while cutting retrieval time by half. Introducing
relevance feedback (RF) is another effective way to improve
CSI image retrieval. In [7], RF is used to automatically adjust
the weights of shoe print features to improve precision.
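Relevance feedback that adjusts feature weights, as in [7], can be sketched by re-weighting feature dimensions after the investigator marks some results as relevant. The rule below (weight inversely proportional to the standard deviation of a dimension among the relevant images, a classic CBIR heuristic) is a swapped-in assumption, not necessarily the rule used in [7]; the function names and toy vectors are hypothetical.

```python
import math
import statistics


def update_weights(relevant_features, eps=1e-6):
    """Weight each dimension by 1 / (std among relevant images + eps), normalized to sum to 1.

    Dimensions that stay consistent across the images marked as relevant
    receive large weights; eps avoids division by zero.
    """
    dims = len(relevant_features[0])
    inverse_spread = []
    for d in range(dims):
        values = [f[d] for f in relevant_features]
        inverse_spread.append(1.0 / (statistics.pstdev(values) + eps))
    total = sum(inverse_spread)
    return [w / total for w in inverse_spread]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance used to re-rank images after feedback."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


# Three images marked relevant: dimension 0 is identical, dimension 1 varies widely.
relevant = [[0.8, 0.1, 0.5], [0.8, 0.9, 0.4], [0.8, 0.5, 0.6]]
weights = update_weights(relevant)

query = [0.8, 0.5, 0.5]
candidate_a = [0.8, 0.0, 0.5]  # agrees with the query on the consistent dimension
candidate_b = [0.5, 0.5, 0.5]  # differs on the consistent dimension
print(weighted_distance(query, candidate_a, weights) < weighted_distance(query, candidate_b, weights))  # → True
```

With uniform weights, `candidate_b` would rank first (distance 0.3 versus 0.5); after feedback, the dimension the relevant images agree on dominates and `candidate_a` ranks first.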
Although the above methods have achieved some good results,
they do not bridge the “semantic gap” between low-level
features and high-level semantics; narrowing this gap could
improve the accuracy of image retrieval significantly.
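Two of the low-level descriptors mentioned above, the gray-level histogram and the gray-level co-occurrence matrix (GLCM), can be sketched as follows. This is an illustrative sketch assuming a grayscale image already quantized to a few gray levels and a single horizontal pixel offset; it computes three common Haralick-style statistics (contrast, energy, homogeneity) and is not the exact feature configuration of the cited works. A color histogram would be built the same way over quantized color bins instead of gray levels.

```python
def gray_histogram(image, levels):
    """Normalized histogram of an image given as rows of ints in [0, levels)."""
    counts = [0] * levels
    for row in image:
        for v in row:
            counts[v] += 1
    n = sum(counts)
    return [c / n for c in counts]


def glcm(image, levels, dx=1, dy=0):
    """Normalized, non-symmetric co-occurrence matrix for the pixel offset (dx, dy)."""
    m = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[image[r][c]][image[r2][c2]] += 1  # count the pair (reference, neighbor)
    total = sum(sum(row) for row in m)
    return [[v / total for v in row] for row in m]


def glcm_features(p):
    """Contrast, energy (angular second moment) and homogeneity of a normalized GLCM."""
    contrast = energy = homogeneity = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            contrast += (i - j) ** 2 * v
            energy += v * v
            homogeneity += v / (1 + abs(i - j))
    return contrast, energy, homogeneity


# A 4x4 image with 4 gray levels (a toy example, not a CSI image).
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
hist = gray_histogram(img, 4)
texture = glcm_features(glcm(img, 4))
low_level_feature = hist + list(texture)  # one possible low-level feature vector
print(hist, texture)
```

For this toy image the horizontal GLCM has 12 pixel pairs, giving a contrast of 7/12 and an energy of 1/6; in practice several offsets and directions are usually averaged.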
With the pioneering work of Hinton et al. [8] in 2006, deep
learning has developed rapidly over the past decade. Several
types of deep learning frameworks, such as convolutional
neural networks (CNNs) and deep belief networks (DBNs),
have been applied with unprecedented success to digit
recognition [9], image classification [10], face recognition
[11], and other applications. Deep learning is also widely
used in image classification and retrieval. For example, in the
ImageNet competition, the top-5 accuracy of traditional
classifiers was 71.8% in 2010 and 74.3% in 2011. In 2012,
Hinton and his student Alex Krizhevsky et al. used deep
learning to raise the accuracy to 84.7%, and in the 2017
competition the final accuracy reached 97.3%. A. Babenko
and J. Donahue [12, 13] extracted features from the fully
connected layers of CNNs as high-level semantic features for
retrieval; image features extracted from convolutional layers
have also been used for retrieval [14], with good results.
Juan A. Carvajal
Proceedings, APSIPA Annual Summit and Conference 2018 12-15 November 2018, Hawaii
978-988-14768-5-2 ©2018 APSIPA APSIPA-ASC 2018