STACKED SPARSE AUTOENCODER (SSAE) BASED FRAMEWORK FOR NUCLEI PATCH
CLASSIFICATION ON BREAST CANCER HISTOPATHOLOGY
Jun Xu¹, Lei Xiang¹, Renlong Hang¹, Jianzhong Wu²
¹ Nanjing University of Information Science and Technology, Nanjing 210044, China.
² Jiangsu Cancer Hospital, Nanjing 210000, China.
ABSTRACT
In this paper, a Stacked Sparse Autoencoder (SSAE) based
framework is presented for nuclei classification on breast
cancer histopathology. SSAE works very well in learning
useful high-level features that better represent the raw
input data. To show the effectiveness of the proposed
framework, SSAE+Softmax is compared with a conventional
Softmax classifier, PCA+Softmax, and single-layer Sparse
Autoencoder (SAE)+Softmax in classifying the nuclei and
non-nuclei patches extracted from breast cancer
histopathology. The SSAE+Softmax for nuclei patch
classification yields an accuracy of 83.7%, an F1 score of
82%, and an AUC of 0.8992, outperforming the Softmax
classifier, PCA+Softmax, and SAE+Softmax.
Index Terms— Deep learning, Sparse Autoencoder,
Breast Cancer Histopathology
1. INTRODUCTION
With the recent advent and cost-effectiveness of whole-slide
digital scanners, tissue histopathology slides can now be dig-
itized and stored in digital image form. Digital pathology
makes computerized quantitative analysis of histopathology
imagery possible. In the context of breast cancer (BC), the
size, arrangement, and morphology of nuclei in breast
histopathology are important biomarkers for predicting
patient outcome [1]. However, the manual detection of BC
nuclei in histopathology is a tedious and time-consuming
process that is infeasible in the clinical setting.
Therefore, it is important to develop an efficient method
for automatically detecting BC nuclei. Previous approaches
to nuclei or cell segmentation, including region growing,
thresholding, clustering, level sets [2], supervised
color-texture based methods, and watershed based methods,
are not very robust to the highly variable shapes and sizes
of BC nuclei, or to artifacts in the histological fixing,
staining, and digitization processes. In [1], we presented
a semi-automated nuclear detection scheme based on the
Expectation Maximization (EM) algorithm. These previous
works in segmentation or classification of nuclei are mostly
based on supervised learning. For histopathological images,
it is usually expensive to obtain enough labeled data for
training. On the other hand, with the rapid development of
digital pathology technology, it is easy to obtain a large
amount of unlabeled data. Moreover, histopathological images
are generally high-resolution data. The performance of
current supervised discriminative models would be greatly
improved if we could develop an efficient way to make use of
such large, unlabeled, and highly structured data. One
solution to this problem is to learn a good feature
representation that captures much of the structure in the
unlabeled input data. The discriminative model then works in
this new feature space for subsequent classification of the
desired objects.
This work is supported by National Science Foundation of China (No.
61273259) and Six Major Talents Summit of Jiangsu Province (No.
2013-XXRJ-019). Email: xujung@gmail.com.
Recently, significant progress has been made in learning
image representations from pixel-level (low-level) features
in order to identify high-level features in images. These
high-level features are often learned as hierarchical
representations from large amounts of unlabeled data. Deep
learning is such a hierarchical learning approach: it learns
high-level features from raw pixel intensities that are
sufficiently useful for a classifier to differentiate
objects. Deep learning has shown great accomplishments in
vision and learning since the first deep autoencoder network
was proposed by Hinton et al. in [3]. It has attracted great
attention from researchers in both industry and academia.
Recently, a deep max-pooling convolutional neural network
was presented for detecting mitosis in breast histological
images [4]. Similar work from this team won the ICPR 2012
mitosis detection competition. Inspired by these works, in
this paper we present a Stacked Sparse Autoencoder (SSAE)
framework for nuclei classification on breast histopathology.
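The idea of learning a feature representation from unlabeled
data can be illustrated with a minimal sketch. The code below
is not the SSAE framework of this paper: it is a plain
(non-sparse), single-hidden-layer autoencoder trained by
stochastic gradient descent on synthetic 4-pixel "patches".
The function names train_autoencoder and encode, the toy
data, and all hyperparameters are assumptions made for
illustration only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_autoencoder(data, n_hidden, epochs=300, lr=0.5, seed=0):
    """Train a one-hidden-layer autoencoder with SGD back-propagation.

    Returns (params, losses); losses[i] is the mean squared
    reconstruction error accumulated during epoch i.
    """
    rng = random.Random(seed)
    n_in = len(data[0])
    # Encoder weights W1 (n_hidden x n_in), decoder weights W2 (n_in x n_hidden).
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    b2 = [0.0] * n_in
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            # Forward pass: input -> hidden code -> reconstruction.
            h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
                 for j in range(n_hidden)]
            y = [sigmoid(sum(w * hj for w, hj in zip(W2[i], h)) + b2[i])
                 for i in range(n_in)]
            total += sum((yi - xi) ** 2 for yi, xi in zip(y, x)) / n_in
            # Backward pass: squared-error gradients (constants folded into lr).
            d_out = [(y[i] - x[i]) * y[i] * (1 - y[i]) for i in range(n_in)]
            d_hid = [sum(d_out[i] * W2[i][j] for i in range(n_in)) * h[j] * (1 - h[j])
                     for j in range(n_hidden)]
            for i in range(n_in):
                for j in range(n_hidden):
                    W2[i][j] -= lr * d_out[i] * h[j]
                b2[i] -= lr * d_out[i]
            for j in range(n_hidden):
                for k in range(n_in):
                    W1[j][k] -= lr * d_hid[j] * x[k]
                b1[j] -= lr * d_hid[j]
        losses.append(total / len(data))
    return (W1, b1, W2, b2), losses


def encode(params, x):
    """Map an input vector to its learned hidden-layer feature vector."""
    W1, b1, _, _ = params
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]


# Synthetic unlabeled "patches": two noisy 4-pixel intensity patterns.
_rng = random.Random(1)
_patterns = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
patches = [[min(1.0, max(0.0, p + _rng.uniform(-0.05, 0.05)))
            for p in _patterns[i % 2]] for i in range(40)]

params, losses = train_autoencoder(patches, n_hidden=2)
features = [encode(params, x) for x in patches]  # inputs for a later classifier
```

In such a setup, no labels are used while the features are
learned; a classifier such as Softmax would then be trained
on the vectors in features using whatever labeled subset is
available.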
2. METHOD
An autoencoder is an unsupervised feature learning algorithm
that aims to develop a better feature representation of
high-dimensional input data by finding correlations among
the data. Basically, an autoencoder is simply a multi-layer
feed-forward neural network trained with back-propagation to
represent its input. By applying back-propagation, the
autoencoder tries to decrease the discrepancy as much as possible