IEEE SIGNAL PROCESSING LETTERS, VOL. 26, NO. 12, DECEMBER 2019 1907
A Hybrid R-BILSTM-C Neural Network Based
Text Steganalysis
Yan Niu, Juan Wen, Ping Zhong, and Yiming Xue, Member, IEEE
Abstract—With the emergence of generation-based steganography,
traditional text steganalysis methods show unsatisfactory detection
performance because their manually extracted features are simple and
non-universal. Recently proposed deep learning-based text steganalysis
methods achieve high detection accuracy by extracting high-level
features. In this letter, a hybrid text steganalysis method
(R-BILSTM-C) is proposed that combines the advantages of the
Bidirectional Long Short Term Memory Recurrent Neural Network
(Bi-LSTM) and the Convolutional Neural Network (CNN). The proposed
method efficiently captures both local features and long-term semantic
information from text to improve the detection accuracy. The Bi-LSTM
architecture captures the long-term semantic information of texts, and
asymmetric convolution kernels of different sizes extract the local
relationships between words. In addition, the high-dimensional semantic
feature space is visualized. Experimental results show that the proposed
method adapts efficiently to different steganographic algorithms and
achieves comparable or superior detection performance across various
sentence lengths compared with other state-of-the-art text steganalysis
methods.
Index Terms—Text steganalysis, Bi-LSTM, CNN, long-term
semantic feature, local feature.
I. INTRODUCTION
LINGUISTIC steganography, which embeds secret information into
texts, has attracted widespread attention because the texts used
most frequently in daily life provide a large number of carriers.
Generally, linguistic steganography can be roughly divided into
two main categories: embedded-steganographic algorithms [1]–[3]
and generation-based steganographic algorithms [4]–[6]. Among the
embedded-steganographic algorithms, synonym substitution-based
steganography is widely used because the substitution hardly
causes any semantic change. Generation-based steganography
utilizes the powerful feature extraction and expression abilities
of neural networks to acquire the statistical and semantic features
of a large number of training samples, and then generates
high-quality steganographic texts.
Manuscript received August 19, 2019; revised October 24, 2019; accepted
November 4, 2019. Date of publication November 18, 2019; date of current
version December 12, 2019. This work was supported by the National Natural
Science Foundation of China under Grant 61872368 and Grant 61802410. The
associate editor coordinating the review of this manuscript and approving it for
publication was Dr. Roberto Caldelli. (Corresponding author: Ping Zhong.)
Y. Niu, J. Wen, and Y. Xue are with the College of Information and Electrical
Engineering, China Agricultural University, Beijing 100083, China (e-mail:
niuyan@cau.edu.cn; wenjuan@cau.edu.cn; xueym@cau.edu.cn).
P. Zhong is with the College of Science, China Agricultural University, Beijing
100083, China (e-mail: zping@cau.edu.cn).
Digital Object Identifier 10.1109/LSP.2019.2953953
As the counter-technique to steganography, text steganalysis,
which aims to detect the existence of secret messages in text,
has developed rapidly. Most traditional text steganalysis methods
are built on the general machine learning framework [7]–[14].
However, these traditional methods adapt poorly to different kinds
of steganographic algorithms because they are designed around the
statistical changes caused by one specific steganographic method.
They also show unsatisfactory detection performance on the latest
generation-based text steganographic algorithms, because the manually
extracted features, such as word frequency distribution [8]–[11] and
context fitness [10], are simple and non-universal. With the
development of generation-based text steganography [4]–[6], some
researchers have studied text steganalysis algorithms based on deep
learning [15]–[17]. Wen et al. [15] propose a CNN-based text
steganalysis model that captures the local correlations between words.
Yang et al. [16] utilize the strong feature expression capability of
Recurrent Neural Networks (RNNs) to extract long-term semantic
features. Although the current deep learning-based steganalysis
methods achieve high detection performance in distinguishing stego
texts by extracting high-level features, they can still be improved.
Note that a CNN is able to capture the local semantic correlations of
texts but does not perform well in learning long-term sequential
information, while an RNN is ideal for processing sequences of any
length [18]. Moreover, the Long Short Term Memory (LSTM), as a variant
of the RNN, is able to capture long-term contextual dependencies and
to alleviate the vanishing gradient problem of the RNN.
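The contrast between convolutional and recurrent feature extraction can be illustrated with a small sketch. The code below is not the model of this letter, nor that of [15] or [16]. It is a toy under stated assumptions: scalar word features, a single width-3 filter, a plain tanh recurrence rather than an LSTM, and hypothetical names (conv1d_valid, rnn_final_state) and weights chosen only for illustration. It changes the first word of a sequence and checks which outputs are affected.

```python
import math


def conv1d_valid(seq, kernel):
    """Slide a width-k kernel over seq; each output sees only k adjacent items."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, seq[i:i + k]))
        for i in range(len(seq) - k + 1)
    ]


def rnn_final_state(seq, w_in=0.5, w_rec=0.9):
    """Plain tanh recurrence (an assumption; not an LSTM).

    The final state is a function of every item in the sequence.
    """
    h = 0.0
    for x in seq:
        h = math.tanh(w_rec * h + w_in * x)
    return h


# Toy scalar "word features" (hypothetical values, not from the letter).
original = [0.2, -0.4, 0.1, 0.3, -0.2, 0.5, -0.1, 0.4]
perturbed = [0.9] + original[1:]  # change only the first word

kernel = [0.3, -0.2, 0.5]  # width-3 filter with illustrative weights

conv_a = conv1d_valid(original, kernel)
conv_b = conv1d_valid(perturbed, kernel)

# Only the first window overlaps position 0, so all later outputs match.
local_only = conv_a[1:] == conv_b[1:]

# The recurrent state at the last position still depends on the first word.
state_gap = abs(rnn_final_state(original) - rnn_final_state(perturbed))
```

In this sketch, local_only is True because only the first convolution window overlaps the changed word, whereas state_gap is nonzero because the recurrent state at the last position still depends on the first word. In a plain recurrence this trace shrinks as the sequence grows, which is the regime where the gating of an LSTM is expected to help.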
In this letter, we propose a hybrid and universal text steganalysis
algorithm based on deep learning, named R-BILSTM-C, which extracts
local and global features by combining Bi-LSTM with CNN. The proposed
scheme identifies the subtle differences in the semantic spatial
distribution before and after the secret messages are embedded. It
first converts each sentence into the corresponding matrix by the
fusion strategy in the word embedding layer, and then concatenates the
forward and backward semantic features by Bi-LSTM to better express
the long-term contextual features and the word order information.
Inspired by the Inception modules in [19], we employ asymmetric
convolution kernels of different sizes to extract the local features,
which not only improves the performance of the model, but also
accelerates the training process and relieves over-fitting by
substantially reducing the number of parameters. Thus,
1070-9908 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.