Semi-supervised Transfer Learning for Convolutional Neural Network based
Chinese Character Recognition
Yejun Tang, Bing Wu, Liangrui Peng and Changsong Liu
Tsinghua National Laboratory for Information Science and Technology
Department of Electronic Engineering, Tsinghua University, Beijing, China
Email: {tangyj,plr,lcs}@ocrserv.ee.tsinghua.edu.cn, bingwuthu@gmail.com
Abstract—Although transfer learning has attracted great interest from researchers, how to utilize unlabeled data remains an open and important problem in this field. We propose a novel semi-supervised transfer learning (STL) method that incorporates the Multi-Kernel Maximum Mean Discrepancy (MK-MMD) loss into the traditional fine-tuning based Convolutional Neural Network (CNN) transfer learning framework for Chinese character recognition. The proposed method includes three steps. First, a CNN model is trained on massive labeled samples in the source domain. Then the CNN model is fine-tuned with a few labeled samples in the target domain. Finally, the CNN model is trained with both a large number of unlabeled samples and the limited labeled samples in the target domain to minimize the MK-MMD loss. Experiments investigate detailed configurations and parameters of the proposed STL method with several widely used CNN architectures, including AlexNet,
GoogLeNet, and ResNet. Experimental results on practical
Chinese character transfer learning tasks, such as Dunhuang
historical Chinese character recognition, indicate that the pro-
posed method can significantly improve recognition accuracy
in the target domain.
I. INTRODUCTION
With the emergence of deep learning, Optical Character
Recognition (OCR) has achieved great progress in recent
years. However, the deep learning framework faces two challenges. First, training a deep neural network requires massive labeled samples, which are hard to obtain in some tasks. Second, many machine learning methods work well only under the assumption that the training data and the test data follow exactly the same distribution [1], whereas the two distributions often differ slightly in real-world scenarios, so the performance of these methods is likely to be unsatisfactory there. The domain of the training samples is often denoted as the source domain, and the domain of the test samples is denoted as the target domain. In such cases, transfer learning is necessary to transfer classification knowledge from the source domain to the target domain.
Transfer learning can be classified into three categories according to the training data: supervised transfer learning, whose training samples are labeled; unsupervised transfer learning, whose training samples are unlabeled; and semi-supervised transfer learning, whose training data comprise mostly unlabeled samples and a few labeled samples. Supervised transfer learning is currently the most straightforward and commonly used approach, in both the traditional feature-extractor-and-classifier framework and the deep neural network framework. In the traditional framework, Zhang et al. [2] proposed a linear style transfer mapping method, Li et al. [3] applied this method to historical Chinese character recognition, and Feng et al. [4] proposed a nonlinear transfer mapping method based on Gaussian processes. These methods utilize the samples in the source domain and the labeled samples in the target domain to train a parameter transfer mapping unit; the linear case is sketched below.
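The sketch below fits a regularized linear style transfer mapping in NumPy: given target-domain features x_i and corresponding source-domain targets t_i (e.g. class means), it solves min_{A,b} sum_i ||A x_i + b - t_i||^2 + beta*||A - I||_F^2 + gamma*||b||^2 in closed form. The function name, regularization weights, and least-squares layout are illustrative assumptions, not the exact formulation of [2].

    import numpy as np

    def style_transfer_mapping(X, T, beta=1.0, gamma=1.0):
        """Fit an affine map (A, b) minimizing
        sum_i ||A x_i + b - t_i||^2 + beta*||A - I||_F^2 + gamma*||b||^2.
        X: (n, d) target-domain features; T: (n, d) source-domain targets."""
        n, d = X.shape
        Xa = np.hstack([X, np.ones((n, 1))])           # augmented features [x; 1]
        # Ridge-style regularizer: pull A toward the identity and b toward zero.
        R = np.diag([beta] * d + [gamma])
        W0 = np.vstack([np.eye(d), np.zeros((1, d))])  # regularization target [I; 0]
        # Normal equations: (Xa^T Xa + R) W = Xa^T T + R W0, with W = [A^T; b^T].
        W = np.linalg.solve(Xa.T @ Xa + R, Xa.T @ T + R @ W0)
        return W[:d].T, W[d]                           # A: (d, d), b: (d,)

At recognition time, a target-domain feature x would be mapped to A x + b and then classified with the source-domain classifier.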
In the deep neural network framework, Oquab et al. [5] added an adaptation layer on top of AlexNet and transferred the weights by fine-tuning the network with labeled samples in the target domain for an image recognition task. Zhang et al. [6] added an unsupervised adaptation layer to their network to adapt to the variation of writing styles in handwritten Chinese character recognition tasks. Tang et al. [7] applied the parameter fine-tuning based transfer learning method to the Dunhuang historical Chinese character recognition task. The method proved effective, but its recognition accuracy depends heavily on the number of fine-tuning samples; a minimal fine-tuning sketch follows.
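The sketch below illustrates such parameter fine-tuning in PyTorch; the network choice, checkpoint path, class count, frozen layers, and learning rate are illustrative assumptions rather than the configurations used in [5]-[7].

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_CLASSES = 3755  # e.g. the GB2312 level-1 character set (illustrative)

    # A torchvision AlexNet stands in for the source-domain CNN; the
    # checkpoint path is assumed to hold the source-domain weights.
    net = models.alexnet(num_classes=NUM_CLASSES)
    net.load_state_dict(torch.load("source_cnn.pth"))

    # Optionally freeze the convolutional layers, which tend to capture
    # generic stroke features, and fine-tune the fully connected layers.
    for p in net.features.parameters():
        p.requires_grad = False

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(
        (p for p in net.parameters() if p.requires_grad),
        lr=1e-4, momentum=0.9)  # small learning rate preserves source knowledge

    net.train()
    for images, labels in target_loader:  # assumed DataLoader over the few labeled target samples
        optimizer.zero_grad()
        loss = criterion(net(images), labels)
        loss.backward()
        optimizer.step()

A small learning rate and frozen early layers reduce the risk of overwriting the source-domain knowledge when only a few target samples are available.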
Due to this limitation of supervised transfer learning, researchers have recently shown increasing interest in unsupervised transfer learning. Domain adaptation is one of the major approaches to unsupervised transfer learning. It aims to find a representation that minimizes the discrepancy between the probability distributions of the source domain and the target domain. The key problem in this process is how to compare probability distributions and define their discrepancy. Various similarity measures have been used, such as the Kullback-Leibler divergence, the total variation distance [8], the Kolmogorov distance [9], and the Wasserstein distance [10]. Gretton et al. [11] showed that a kernel embedding of probability distributions into a reproducing kernel Hilbert space (RKHS) allows two probability measures to be compared via the distance between their respective embeddings, and proposed the Maximum Mean Discrepancy (MMD), which yields a consistent estimate at low computational cost; a sketch of the empirical estimate is given below.
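MMD embeds each distribution as the mean of its kernel features in the RKHS and measures the distance between the two mean embeddings, MMD^2(p, q) = || E_{x~p}[phi(x)] - E_{y~q}[phi(y)] ||_H^2. The NumPy sketch below computes the standard unbiased empirical estimate, averaged uniformly over several Gaussian bandwidths as a simple multi-kernel stand-in; the bandwidth set and the uniform kernel weights are illustrative assumptions (MK-MMD in [12] instead learns the kernel weights).

    import numpy as np

    def gaussian_kernel(a, b, sigma):
        # Pairwise Gaussian kernel matrix between the rows of a and of b.
        d2 = (np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :]
              - 2.0 * a @ b.T)
        return np.exp(-d2 / (2.0 * sigma**2))

    def mk_mmd2(x, y, sigmas=(1.0, 2.0, 4.0)):
        """Unbiased estimate of squared MMD between samples x: (m, d) and
        y: (n, d), averaged over a family of Gaussian kernels."""
        m, n = len(x), len(y)
        k_xx = sum(gaussian_kernel(x, x, s) for s in sigmas) / len(sigmas)
        k_yy = sum(gaussian_kernel(y, y, s) for s in sigmas) / len(sigmas)
        k_xy = sum(gaussian_kernel(x, y, s) for s in sigmas) / len(sigmas)
        # Exclude diagonal (self-similarity) terms for the unbiased estimate.
        return ((k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
                + (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
                - 2.0 * k_xy.mean())

Computed on hidden-layer features of source- and target-domain batches, such an estimate can be added to the classification loss as a domain-discrepancy penalty.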
Long et al. [12] proposed a novel network structure in which the MK-MMD loss is adopted to minimize