there was still a lot of diversity among different people and
circumstances [40]. Transfer learning attempts to solve this
problem. In emotion recognition, transfer learning makes use
of the knowledge of the source domains (previous subjects) to
help improve the classification accuracy in the target domain
(the present subject) [41]. A recent study [20] offers a
comprehensive survey of transfer learning in BCIs. Among
traditional models, the most popular algorithms are based
on the common spatial pattern (CSP) [42–44], which aims to
find an invariant subspace onto which all subjects’ data are
projected. In this way, the part shared by different subjects’
data is preserved, and the differences are compensated. To our
knowledge, few studies have addressed transfer methods in
aBCIs. Lu et al. personalized the emotion classifier trained on
previous subjects to the present subject with transductive
parameter transfer (TPT) [39]. However, these transfer methods
were combined with shallow models, e.g., SVM. In our work, we
discuss the most popular transfer method in DL, i.e., the
fine-tuning technique [45].
In this paper, we adopt DE to characterize EEG signals, as it
has been shown to be suitable for emotion decoding in previous
studies [10, 25, 26]. To preserve the information contained in
the positional relationship between the electrodes, we organize
data from different channels as 2-D sparse maps to train HCNN
classifiers on the Delta, Theta, Alpha, Beta, and Gamma bands.
We implement recognition models with SVM and KNN and compare
them with DL. We also compare the performance of two deep
models (HCNN and SAE) in emotion decoding. We analyze the
classification accuracy across frequency bands to locate the
critical bands. To evaluate how well HCNN parameters transfer
among different subjects with the fine-tuning technique, we
define three training strategies. Finally, we use PCA to
visualize in 2-D maps how the representations evolve along the
hierarchical structure and discuss the information-processing
mechanism inside HCNN.
The rest of this paper is organized as follows: “Methods”
introduces the core methods we use. “Experiments” describes
the experimental procedures in detail. “Results” evaluates the
experiments. “Discussion” analyzes the results, and
“Conclusion” concludes the study.
Methods
In this section, we first present the general framework of the
HCNN-based emotion recognition system and then introduce
each module in sequence.
The General Framework for Emotion Recognition
The HCNN-based emotion recognition system is shown in
Fig. 1. A typical aBCI is a closed-loop system [46] that
involves feedback. Emotion recognition is the key module in an
aBCI. For illustration, we include the feedback path in the
figure. Visual and auditory stimuli are used together to evoke
the subject’s emotions, and EEG signals are recorded from 62
electrodes. After standard preprocessing, we compute DE
features on all channels at a certain time interval and organize
them as 2-D maps. After the sparse operation, we use HCNN
to process the input maps and classify emotion states.
Preprocessing
The raw EEG data are downsampled to a sampling rate of 200 Hz.
The signals are visually checked, and recordings seriously
contaminated by EMG and EOG are removed manually. With
the help of the EOG recordings, blink artifacts are located and
removed. To filter out noise and remove remaining artifacts,
the EEG data are processed with a bandpass filter between 0.3
and 50 Hz.
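The downsampling and bandpass steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the original sampling rate (1000 Hz), the Butterworth design, the filter order, the zero-phase filtering, and the function name `preprocess_eeg` are all assumptions, since the text specifies only the 200-Hz target rate and the 0.3–50 Hz passband. The manual rejection of contaminated recordings and the EOG-based blink removal are not covered.

```python
from math import gcd

import numpy as np
from scipy import signal


def preprocess_eeg(raw, fs_in=1000, fs_out=200, band=(0.3, 50.0), order=4):
    """Downsample and bandpass-filter EEG of shape (n_channels, n_samples).

    fs_in, order, and the Butterworth design are assumptions for this sketch;
    only fs_out = 200 Hz and the 0.3-50 Hz band come from the text.
    """
    raw = np.asarray(raw, dtype=float)

    # Polyphase resampling applies its own anti-aliasing filter.
    g = gcd(int(fs_in), int(fs_out))
    down = signal.resample_poly(
        raw, up=int(fs_out) // g, down=int(fs_in) // g, axis=-1
    )

    # Second-order sections stay numerically stable for the low 0.3 Hz edge;
    # sosfiltfilt runs the filter forward and backward (zero phase).
    sos = signal.butter(order, band, btype="bandpass", fs=fs_out, output="sos")
    return signal.sosfiltfilt(sos, down, axis=-1)
```

For a 62-channel recording stored as an array of shape (62, n_samples), `preprocess_eeg(raw)` returns an array of shape (62, n_samples / 5) under the assumed 1000-Hz input rate.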
Short-Time Fourier Transform and Differential Entropy
Fourier transform (FT) is often used to analyze the frequency
configuration of a time-domain signal, and it is widely used in
EEG decomposition. However, the FT assumes that brain wave
activity is stationary, which evidently does not hold.
Therefore, the time series should be cut into short time
segments, and within each segment, the brain's electrical
activity is treated as approximately stationary. This idea
is called the short-time Fourier transform (STFT). The STFT
decomposes a function of time (in our case, the EEG signal) into
its constituent frequencies at fixed time intervals. The STFT
is computed as follows:
$$X(\tau, \omega) = \int_{-\infty}^{\infty} x(t)\, \omega(t - \tau)\, e^{-j\omega t}\, dt, \tag{1}$$
where x(t) is the original signal and ω(t) is the window
function (not to be confused with the angular frequency ω in
the exponent). The Hanning window (shown in (2), a discrete
version) is a linear combination of modulated rectangular
windows, and it is commonly used in applications that require
low aliasing and little spectral leakage.
$$\omega(n) = \frac{1}{2}\left(1 - \cos\frac{2\pi n}{N-1}\right), \tag{2}$$
where n is the sample index within the window (0 ≤ n ≤ N − 1)
and N is the number of samples in the window.
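A discrete counterpart of the windowed transform in (1), using the Hanning window in (2), can be sketched as follows. The window length, hop size, and function names (`hann_window`, `stft`) are hypothetical choices for illustration; the text does not state the values used in the experiments.

```python
import numpy as np


def hann_window(N):
    """Discrete Hanning window: 0.5 * (1 - cos(2*pi*n / (N - 1))), n = 0..N-1."""
    n = np.arange(N)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * n / (N - 1)))


def stft(x, fs, win_len, hop):
    """Short-time Fourier transform of a 1-D signal.

    Returns (freqs, times, spec), where spec has shape
    (n_frames, win_len // 2 + 1). win_len and hop are in samples.
    """
    x = np.asarray(x, dtype=float)
    w = hann_window(win_len)
    n_frames = 1 + (len(x) - win_len) // hop

    # Cut the signal into segments and taper each one with the window.
    frames = np.stack(
        [x[i * hop : i * hop + win_len] * w for i in range(n_frames)]
    )

    # One FFT per segment; rfft keeps only the non-negative frequencies.
    spec = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs
    return freqs, times, spec
```

With a 200-Hz signal and a hypothetical 1-s window (`win_len=200`), the frequency resolution is 1 Hz, so the conventional EEG bands can be read off by selecting the corresponding rows of `freqs`.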
DE [25] is an extension of the Shannon entropy, which is defined as
$$h(x) = -\sum_{i=1}^{N} p(x_i) \log\bigl(p(x_i)\bigr). \tag{3}$$
DE is the continuous version of the Shannon entropy, and its
original formulation can be written as follows:
$$h(x) = -\int f(x) \log\bigl(f(x)\bigr)\, dx. \tag{4}$$
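As a sanity check on (4), the integral can be evaluated numerically for a density whose differential entropy is known in closed form. The sketch below assumes a Gaussian density purely for illustration, for which h(x) = ½ log(2πeσ²); the function names and the integration settings are hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def differential_entropy_numeric(pdf, lo, hi, steps=100_000):
    """Midpoint-rule approximation of -integral of f(x) * log(f(x)) on [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        fx = pdf(lo + (k + 0.5) * dx)
        if fx > 0.0:  # f * log(f) tends to 0 as f tends to 0
            total -= fx * math.log(fx) * dx
    return total


def gaussian_de_closed_form(sigma):
    """Closed-form differential entropy of a Gaussian, in nats."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)


sigma = 2.0
numeric = differential_entropy_numeric(
    lambda x: gaussian_pdf(x, 0.0, sigma), -10 * sigma, 10 * sigma
)
closed = gaussian_de_closed_form(sigma)
```

Under the Gaussian assumption, the entropy depends only on the variance, so the numerical value and the closed form should agree closely when the integration range covers many standard deviations.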
Cogn Comput