L. Du, A.T.S. Ho and R. Cong Signal Processing: Image Communication 81 (2020) 115713
Fig. 4. Examples of tampering localization results for the QFT based method proposed
in [34]: (a) source image; (b) normalized reconstructed image corresponding to (a);
(c) tampered image with attack; (d) normalized reconstructed image corresponding to
(c); (e) binary map; (f) detected tampered region.
defined in polar coordinates, in which $f_r$, $f_g$ and $f_b$ represent the red, green
and blue channels of the image $f(x, y)$, respectively. They first defined
a quaternion image that combined multiple features and reconstructed
a stable image by quaternion low-pass filter. After obtaining the stable
image, they divided the image into nonoverlapping blocks and used
a local binary coding to represent each block. For tamper detection,
a multiscale difference map fusion approach was investigated to fuse
difference maps, resulting from the analysis of the subtraction between
two binary maps with different sliding windows.
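The block-wise coding step above can be sketched as follows. This is only a minimal illustration of binarizing nonoverlapping blocks; the block size and the mean-threshold rule are assumptions of ours, not the exact local binary coding used in [34].

```python
import numpy as np

def block_binary_codes(img, block=8):
    """Split an image into nonoverlapping blocks and binarize each block
    against its own mean gray value (an assumed thresholding rule,
    standing in for the paper's local binary coding)."""
    h, w = img.shape
    h, w = h - h % block, w - w % block   # drop ragged edges
    codes = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            b = img[r:r + block, c:c + block].astype(float)
            codes.append((b >= b.mean()).astype(np.uint8))
    return codes
```

Subtracting the binary maps of two images block by block would then yield the difference maps that the fusion step operates on.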
Remarks. These approaches mainly depend on the properties of the
applied transforms: the input image undergoes a frequency transformation
so that the extracted features depend on the values of its frequency
coefficients in the transform space. Currently, most features are robust
against only one or a few types of attacks, and it may not be feasible
to extract a single, absolutely robust feature that satisfies all
scenarios. It is worth mentioning that image feature hash construction
via QFT is an effective way to fulfill the requirement of processing
different features in a holistic manner. A brief summary of invariant
feature transform based methods is presented in Table 3.
3.2. Local feature points based methods
Local feature patterns are a group of important robust features for
generating image hashes. Local feature patterns usually include edges,
corners, blobs, salient regions and so on. Since image hashes should be
invariant to content-preserving processing, robust and repeatable features
with low computational cost are desired.
Yang et al. [38] proposed content based image hashing using companding
and Gray code. Morlet wavelet coefficients were used at feature
points to generate robust image features. Then, they combined a robust
feature point detector and a robust content singularity descriptor at these
feature points. Finally, the Morlet wavelet coefficients were quantized
and coded using companding and Gray code. The Morlet wavelet is a
continuous wavelet, a single-frequency complex sinusoid modulated by a
Gaussian envelope, and is used to detect linear structures perpendicular
to the orientation of the wavelet. The 2D Morlet wavelet is defined as
\[
\varphi_M(R) = \left(e^{iV_0 \cdot R} - e^{-\frac{1}{2} V_0^2}\right) e^{-\frac{1}{2} R^2}
\tag{14}
\]
where $R = (r_1, r_2)$ denotes the two-dimensional spatial coordinates, and
$V_0 = (v_1, v_2)$ is the wave-vector of the mother wavelet. Liu et al. [39] proposed
a SIFT operator based hash algorithm, which was mainly focused
on the robustness against geometric attacks. For decision making, a
generalized set distance based matching operation was designed. Then,
Lv et al. [40] proposed a novel shape contexts based image hashing
approach using robust local feature points. They first used scale in-
variant feature transform (SIFT) to detect robust feature points and
incorporated the Harris criterion to select the most stable points. To
characterize local information, they introduced the shape contexts into
hash generation to represent the geometric distribution of the detected
feature points. They used a descriptor to represent these feature points
as a unique signature. Local extremum search is performed on a series
of difference-of-Gaussian (DOG) images in the scale space 𝜎, and local
feature points are obtained as candidate points for scale-invariant key
points.
The construction of the DOG is as follows. The image $I(x, y)$ is first
convolved with a series of Gaussian kernel functions $G(x, y, \sigma)$ with
continuously increasing scales $\sigma_1 < \sigma_2 < \cdots < \sigma_n$:
\[
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)
\tag{15}
\]
Then, a DOG image is generated from two Gaussian-blurred images at nearby
scales $c\sigma$ and $\sigma$ as
\[
D(x, y, \sigma) = L(x, y, c\sigma) - L(x, y, \sigma)
= (G(x, y, c\sigma) - G(x, y, \sigma)) * I(x, y)
\tag{16}
\]
It provides a close approximation of the scale-normalized Laplacian of
Gaussian:
\[
G(x, y, c\sigma) - G(x, y, \sigma) \approx (c - 1)\sigma^2 \nabla^2 G
\tag{17}
\]
Substituting (17) into (16) and using the properties of convolution, we
obtain
\[
D(x, y, \sigma) \approx (c - 1)\sigma^2 \nabla^2 G * I(x, y)
= (c - 1)\sigma^2 G * \nabla^2 I(x, y)
\tag{18}
\]
where $\nabla^2 = \partial^2/\partial x^2 + \partial^2/\partial y^2$ is the Laplacian operator.
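The DOG construction of Eqs. (15)–(16) can be sketched as follows. This is a minimal illustration; the kernel truncation radius, the separable implementation, and the scale factor $c = \sqrt{2}$ are our own assumptions.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1D Gaussian kernel, truncated at ~3*sigma and normalized to sum to 1."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable 2D Gaussian convolution: L(x, y, sigma) = G * I (Eq. 15)."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def difference_of_gaussians(img, sigma, c=np.sqrt(2)):
    """D(x, y, sigma) = L(x, y, c*sigma) - L(x, y, sigma) (Eq. 16)."""
    return gaussian_blur(img, c * sigma) - gaussian_blur(img, sigma)
```

Local extrema of $D$ over both space and the scale index are then kept as candidate key points.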
Wang et al. [41] proposed an image forensic signature for content
authenticity analysis. In the proposed method, adaptive Harris corner
detection algorithm was used to extract image feature points. Corner
detection is a method used in computer vision systems to extract
specific types of features and infer image content. Among the various corner
detection methods, a typical algorithm is the Harris operator. Denote an
image as $I$, and let $I_x$ and $I_y$ represent the gradients of the image
gray value in the horizontal and vertical directions, respectively. The
discrete two-dimensional zero-mean Gaussian kernel function is
\[
G(\sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)
\tag{19}
\]
Let $A = G(\sigma) \otimes I_x^2$, $B = G(\sigma) \otimes I_y^2$, and
$C = D = G(\sigma) \otimes I_x I_y$. The Harris corner detection function is
\[
R = A \times B - C \times D - c \cdot (A + B)^2
\tag{20}
\]
Here, $\sigma$ is a scale parameter, $\otimes$ denotes the convolution operator, and $c$ is a constant.
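The Harris response of Eq. (20) can be sketched as follows. This is a minimal illustration in which the gradient operator (central differences via `np.gradient`) and the constant $c = 0.04$ are assumed choices, not necessarily those of the adaptive detector in [41].

```python
import numpy as np

def _gaussian_blur(img, sigma):
    """Separable Gaussian smoothing, standing in for G(sigma) ⊗ (·)."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, rows)

def harris_response(img, sigma=1.0, c=0.04):
    """Harris response R = A*B - C*D - c*(A+B)^2 per Eq. (20)."""
    Iy, Ix = np.gradient(img.astype(float))   # gray-value gradients
    A = _gaussian_blur(Ix * Ix, sigma)        # A = G(sigma) ⊗ Ix^2
    B = _gaussian_blur(Iy * Iy, sigma)        # B = G(sigma) ⊗ Iy^2
    C = _gaussian_blur(Ix * Iy, sigma)        # C = D = G(sigma) ⊗ IxIy
    return A * B - C * C - c * (A + B) ** 2

# A bright square on a dark background: the response peaks near the
# square's corners and vanishes in flat regions.
img = np.zeros((20, 20))
img[8:14, 8:14] = 1.0
R = harris_response(img)
```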
For each feature point, they defined a circular feature point neigh-
borhood, and computed the mean and variance of the vector con-
structed by pixel gray value in the neighborhood. These statistics of
feature point neighborhood, together with the position coordinates
of feature point, were used to construct forensic signature by using
Huffman coding. By using the Fisher criterion, it provided an adaptive
method to generate the signature matching threshold value. However,
just as with other feature point based image hashing methods, the hash size
depended on the image size and texture.
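The neighborhood statistics step of [41] can be sketched as follows. The circular radius and the handling of image borders are assumptions of ours; only the mean and variance of the neighborhood gray values come from the description above.

```python
import numpy as np

def neighborhood_stats(img, point, radius=5):
    """Mean and variance of the gray values inside a circular neighborhood
    around a feature point (radius is an assumed parameter; pixels outside
    the image are simply excluded by the mask)."""
    y0, x0 = point
    ys, xs = np.ogrid[:img.shape[0], :img.shape[1]]
    mask = (ys - y0) ** 2 + (xs - x0) ** 2 <= radius ** 2
    values = img[mask].astype(float)
    return values.mean(), values.var()
```

These statistics, together with the point coordinates, would then be packed into the forensic signature via Huffman coding.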
Image hashing using feature points has limitations under large-scale
distortions such as additive noise and blurring, because the key points
detected from the distorted image are not exactly the same as those of the
original image. In order to address these limitations, Yan et al. [42,43] proposed
a multi-scale image hashing method by using the location-context in-
formation of the features generated by adaptive local feature extraction
techniques. Firstly, they produced multiple content-preserving attacked
images, and then extracted SIFT features. The SIFT feature points
were matched with the corresponding feature points extracted from
the host image by a matching algorithm. Thus, the adaptive feature
points together with their corresponding descriptor were generated,
which were more robust for hash generation. Finally, the Round