JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015
Static Hand Gesture Recognition with
Electromagnetic Scattered Field via Complex
Attention Convolutional Neural Network
Min Tan, Jian Zhou, Kuiwen Xu, Zhiyou Peng and Zhenchao Ma
Abstract—We present a novel learning-based static gesture recognition framework that uses electromagnetic (EM) scattered field data and can address several significant issues of traditional vision-based recognition approaches. An end-to-end complex-valued attention convolutional neural network is devised to train the gesture recognizer, in which the attention module is designed to learn robust region-of-interest (ROI)-aware features. Extensive numerical experiments are conducted on a public static hand gesture dataset, and both full- and limited-aperture measurements with transverse magnetic (TM) wave illumination are investigated. The numerical results show that: 1) both the complex-valued convolution and the attention module contribute to the excellent performance, with a recognition accuracy above 99.0% for the full aperture and about 95.32% under a limited one-eighth aperture; and 2) the proposed method not only scales well to the limited-aperture case but also performs much better than previous state-of-the-art deep networks.
Index Terms—Attention module, complex-valued convolutional
network, EM wave sensing, scattered field, gesture recognition.
I. INTRODUCTION
Hand gesture recognition is one of the most efficient modes of human-computer interaction and has wide applications in gaming, control of consumer electronics (TV, DVD, etc.), robot control, virtual reality environments, and natural language communication [1], [2]. Although it is a long-standing research topic in which much progress has been made, developing a robust gesture recognition system is still very challenging. This is primarily because traditional recognition systems take 2D RGB images as input, and such static hand gesture images often suffer from poor lighting, unfavorable viewpoints, cluttered backgrounds, etc. To address these issues, some researchers have begun to focus on gesture recognition with wearable 3D sensors [3]. Although they achieve high accuracy, such wearable 3D devices are cumbersome for users, and their data incur high memory and computational costs.
With the rapid development of wireless communication systems, electromagnetic (EM) wave sensing has been increasingly used for many tasks [4], [5]. Compared with RGB images, EM wave sensing and imaging are much less sensitive to lighting conditions and viewpoints, and they handle scale-related issues more easily owing to their non-contact nature. In addition, EM wave response data outperform RGB images and 3D gesture data in terms of non-contact monitoring and computational costs (e.g., storage and data size). These advantages strongly motivate the use of EM waves for object sensing, recognition, reconstruction (electromagnetic inverse problems), imaging, etc. Recently, Lan et al. used EM wave sensing data to build a relatively shallow convolutional neural network (CNN) for gesture recognition [6], and this work has shed some light on the use of EM wave sensing for hand gesture recognition.

This work was supported by the National Natural Science Foundation of China (No. 61972119, No. 61602136, No. 81603198, No. 61601161, and No. 61806063) and the Zhejiang Provincial Natural Science Foundation of China (No. LY19F020038 and LY19F010012). (Corresponding author: Kuiwen Xu.)
In this letter, we design an end-to-end complex-valued convolutional neural network that learns the gesture recognizer from scattered field data. The contributions of this work are threefold: 1) to the best of our knowledge, this is the first work to apply scattered field data at a single microwave frequency to hand gesture recognition with learning-based methods; 2) a novel complex-valued neural network is devised for scattered-field-based recognition, in which a three-layer convolutional network with a block attention module is designed, and the proposed approach outperforms several sophisticated classical networks in both efficiency and accuracy; and 3) under limited aperture, more than 95% recognition accuracy can be achieved, which makes the approach promising for wider adoption in both the research and industrial communities.
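The internal form of the complex-valued convolution in contribution 2) is not specified in this section. A common way to realize it with real arithmetic, assumed here, splits the kernel $W = A + iB$ and the input $X = x + iy$ so that $W * X = (A * x - B * y) + i\,(A * y + B * x)$. The NumPy sketch below is an illustrative single-channel version, not the authors' implementation; the helper names `conv2d_valid`, `complex_conv2d`, and `crelu` are hypothetical, and CReLU (ReLU applied separately to the real and imaginary parts) is one common activation choice, also an assumption.

```python
import numpy as np


def conv2d_valid(x, w):
    """2D 'valid' cross-correlation (what CNN layers call convolution).

    Works for real or complex arrays; complex products are taken as-is,
    without conjugating the kernel.
    """
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.result_type(x, w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out


def complex_conv2d(x, w):
    """Complex convolution built from four real ones (hypothetical helper).

    (A + iB) * (x + iy) = (A*x - B*y) + i(A*y + B*x)
    """
    a, b = w.real, w.imag
    xr, xi = x.real, x.imag
    real = conv2d_valid(xr, a) - conv2d_valid(xi, b)
    imag = conv2d_valid(xi, a) + conv2d_valid(xr, b)
    return real + 1j * imag


def crelu(z):
    """CReLU: ReLU applied separately to the real and imaginary parts."""
    return np.maximum(z.real, 0) + 1j * np.maximum(z.imag, 0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative sizes only: a 16 x 32 complex input (e.g. an MSR matrix)
    # and a 3 x 3 complex kernel.
    x = rng.standard_normal((16, 32)) + 1j * rng.standard_normal((16, 32))
    w = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    y = crelu(complex_conv2d(x, w))
    print(y.shape)  # (14, 30)
    # The split-real form matches direct complex arithmetic.
    print(np.allclose(complex_conv2d(x, w), conv2d_valid(x, w)))  # True
```

Composing the complex product from four real convolutions keeps the phase of the scattered field inside the network instead of discarding it (e.g., by taking magnitudes), while still using only real-valued operations.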
II. FORMULATION OF THE PROBLEM
A. Electromagnetic Scattering Forward Problem
Here, for convenience, a two-dimensional (2D) scenario
with transverse magnetic (TM) wave illumination is set up.
The permittivity, permeability, and wave number of the homogeneous background medium (free space) are denoted as $\varepsilon_0$, $\mu_0$, and $k_0$, respectively. The unknown targets (e.g., a hand) are located in the domain of interest (DoI) $D$ in free space. A total of $N_i$ incident sources (incidences) located at $\mathbf{r}_j$ ($j = 1, 2, \ldots, N_i$) are placed in the measurement domain $S$ outside the DoI. The DoI is sequentially illuminated by the TM-polarized wave emitted by each incidence. For each incidence, the scattered field is collected by a total of $N_r$ receivers located at $\mathbf{r}_q$ ($q = 1, 2, \ldots, N_r$) along a circle in $S$. Consequently, for the full-aperture measurement, an $N_i \times N_r$ multi-static response (MSR) matrix is formed.
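The MSR data layout can be sketched as follows. The sketch assumes $N_i$ sources and $N_r$ receivers spaced uniformly on a circle of radius $R$; this section does not give the array geometry, so the spacing, the radius, the example sizes, and all helper names (`circular_array`, `build_msr`, `limited_aperture`) are hypothetical. The forward solver is left as a placeholder callable `field_fn`, standing in for the CG-FFT MoM solver used in the numerical tests, and `limited_aperture` is only one possible reading of a limited aperture (a single contiguous sector of receivers).

```python
import numpy as np


def circular_array(n, radius):
    """Return (n, 2) coordinates of n antennas spaced uniformly on a circle.

    Uniform angular spacing is an assumption made for illustration.
    """
    theta = 2 * np.pi * np.arange(n) / n
    return radius * np.column_stack((np.cos(theta), np.sin(theta)))


def build_msr(tx_pos, rx_pos, field_fn):
    """Assemble the N_i x N_r multi-static response (MSR) matrix.

    Row j corresponds to incidence j and column q to receiver q.
    field_fn(r_j, r_q) is a placeholder for a forward solver that returns
    the complex scattered field at r_q for the source at r_j.
    """
    msr = np.empty((len(tx_pos), len(rx_pos)), dtype=complex)
    for j, r_j in enumerate(tx_pos):
        for q, r_q in enumerate(rx_pos):
            msr[j, q] = field_fn(r_j, r_q)
    return msr


def limited_aperture(msr, fraction):
    """Hypothetical limited-aperture rule: keep one contiguous angular
    sector holding round(fraction * N_r) receivers (at least one)."""
    n_keep = max(1, int(round(msr.shape[1] * fraction)))
    return msr[:, :n_keep]


def toy_field(r_j, r_q):
    """Toy stand-in for the forward solver; it has no physical meaning."""
    return np.exp(1j * np.dot(r_j, r_q))


if __name__ == "__main__":
    n_i, n_r, radius = 16, 32, 1.0  # illustrative values, not the paper's
    tx = circular_array(n_i, radius)
    rx = circular_array(n_r, radius)
    k_full = build_msr(tx, rx, toy_field)
    print(k_full.shape)                           # (16, 32)
    print(limited_aperture(k_full, 1 / 8).shape)  # (16, 4)
```

Under this assumed rule, a one-eighth aperture keeps $N_r/8$ receivers per incidence, so the network input shrinks from $N_i \times N_r$ to $N_i \times N_r/8$ complex entries.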
To compute the scattered field in the numerical tests, a rectangular DoI is chosen so that the conjugate-gradient fast Fourier transform (CG-FFT) scheme of the method of moments (MoM) [7] can be applied; on such a uniform grid, the discretized integral operator becomes a convolution that the FFT evaluates efficiently. A total of $M = M_1 \times M_2$ ($M_1$ and $M_2$ are the numbers of subunits along the x- and y-axes, respectively) small subunits