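Improving Handwritten Chinese Character Recognition Accuracy: A CNN Method with Local Discriminant Training and Global Optimization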


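This paper examines local discriminant training and global optimization strategies for handwritten Chinese character recognition based on Convolutional Neural Networks (CNNs). The authors are Xiangsheng Zeng, Donglai Xiang, Liangrui Peng, Changsong Liu, and Xiaoqing Ding, of the Tsinghua National Laboratory for Information Science and Technology and the Department of Electronic Engineering, Tsinghua University, Beijing, China, 100084.

For local discriminant training, the authors combine triplet loss with cross-entropy loss as the loss function of the CNN model. Softmax is the conventional choice for classification tasks; in this work, an additional fully-connected layer is inserted before the final layer of the CNN, and the triplet loss is applied there. The goal is to make the model more sensitive to differences between samples and to sharpen the separation between classes, so that the network learns the local features that distinguish one Chinese character from another and recognition accuracy improves.

For global optimization, the authors use a Conditional Random Field (CRF). A CRF is a structured prediction model that can take the global dependencies among feature vectors into account, here in the feature space of a CNN already trained with triplet loss. Applying the CRF refines the recognition result further by integrating contextual information.

In the experiments, the authors test different CNN models on a real handwritten Chinese character dataset to evaluate the method. By comparing and analyzing these models, they show how local discriminant training and global optimization work together to improve the accuracy and robustness of a CNN-based handwritten Chinese character recognition system.

Overall, the paper studies how training strategy and optimization technique can raise the performance of CNN-based Chinese character recognition, which is relevant to practical applications of deep learning in handwritten character recognition. Readers can learn how to combine different types of loss functions, structured models, and deep learning frameworks to improve a recognition task.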

Local Discriminant Training and Global Optimization for Convolutional Neural Network based Handwritten Chinese Character Recognition

Xiangsheng Zeng, Donglai Xiang, Liangrui Peng, Changsong Liu, Xiaoqing Ding

Tsinghua National Laboratory for Information Science and Technology

Department of Electronic Engineering, Tsinghua University, Beijing, China, 100084

Email: {cengxs13, xdl13}@tsinghua.org.cn; {penglr, lcs, dingxq}@tsinghua.edu.cn

Abstract—This paper investigates local discriminant training and global optimization methods for Convolutional Neural Network (CNN) to improve its discriminant ability and recognition accuracy. For local discriminant training, we propose to combine triplet loss and softmax with cross-entropy loss as the loss function. The triplet loss is incorporated into an additional fully-connected layer before the final fully-connected layer of a CNN model. For global optimization, we use Conditional Random Field (CRF) to further utilize the pairwise distance of the CNN feature vectors trained with triplet loss. Experiments with different CNN models on handwritten Chinese character samples show that the combined local discriminant training and global optimization scheme achieves better character recognition accuracy and confidence analysis performance.

Keywords-convolutional neural network; handwritten Chinese character recognition; triplet loss; conditional random field

I. INTRODUCTION

Convolutional Neural Network (CNN) has provided an end-to-end solution for character recognition, image classification and other machine learning tasks. However, Nguyen et al. [1] demonstrate that CNNs are easily fooled in that they classify many unrecognizable images with near-certainty as members of a recognizable class. In a practical Optical Character Recognition (OCR) system, the input of a CNN model for segmentation based character recognition usually includes outliers such as over-segmented characters and touching characters. Reliable confidence analysis will provide better rejection for these outliers. This motivates us to improve the recognition accuracy and confidence analysis performance of CNN by using discriminant training or optimization.
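
To make the idea of confidence-based rejection concrete, the following is a minimal sketch that rejects a sample when its maximum softmax probability falls below a threshold. The function names, the threshold value, and the toy logits are assumptions made for illustration; this excerpt does not specify the confidence measure the authors use.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def classify_with_rejection(logits, threshold=0.9):
    """Return (labels, confidence); a label is -1 where confidence < threshold.

    The threshold of 0.9 is a hypothetical value chosen for this example.
    """
    probs = softmax(np.asarray(logits, dtype=float))
    labels = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    labels[confidence < threshold] = -1  # -1 marks a rejected sample
    return labels, confidence

# Toy logits for two inputs over three classes (made-up numbers).
logits = np.array([[8.0, 0.5, 0.1],   # one class clearly dominates
                   [1.0, 0.9, 1.1]])  # ambiguous, e.g. an over-segmented fragment
labels, confidence = classify_with_rejection(logits, threshold=0.9)
print(labels)  # [ 0 -1]
```

A threshold of this kind only rejects outliers well if the network's confidence is trustworthy, which is the weakness reported in [1] and the reason confidence analysis is treated as an evaluation target alongside accuracy.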

For discriminant training of CNN, it is straightforward to incorporate discriminative features [2] [3]. Fukuda et al. [2] propose to use projection matrices composed of eigenvectors estimated by Linear Discriminant Analysis (LDA) objective function as initial weights for the first convolutional layer in CNN. Chen et al. [3] present a novel and effective method to learn a rotation-invariant and Fisher discriminative CNN (RIFD-CNN) model. This is achieved by introducing and learning a rotation-invariant layer and a Fisher discriminative layer on the basis of the existing high-capacity CNN architectures. The Fisher discriminative layer is trained by imposing the Fisher discrimination criterion on the CNN features so that they have small within-class variation and large between-class variation.
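
As an illustration of what "small within-class variation and large between-class variation" measures, the sketch below computes the traces of the within-class and between-class scatter of a set of feature vectors and takes their ratio. It is the generic Fisher criterion only, not the RIFD-CNN layer of [3], whose exact formulation is not given in this text; the function name and the toy data are assumptions.

```python
import numpy as np

def scatter_traces(features, labels):
    """Return (within-class, between-class) scatter traces for row-vector features."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    global_mean = features.mean(axis=0)
    within = 0.0
    between = 0.0
    for c in np.unique(labels):
        members = features[labels == c]
        class_mean = members.mean(axis=0)
        # Spread of the samples around their own class mean.
        within += ((members - class_mean) ** 2).sum()
        # Spread of the class mean around the global mean, weighted by class size.
        between += len(members) * ((class_mean - global_mean) ** 2).sum()
    return within, between

# Two tight, well-separated classes in 2-D (made-up numbers).
features = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
within, between = scatter_traces(features, labels)
fisher_ratio = between / within
print(within, between, fisher_ratio)  # 4.0 100.0 25.0
```

A larger ratio means the classes are better separated relative to their internal spread, which is the property a Fisher-style training objective pushes the features toward.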

As a local discriminant training strategy, triplet loss was first proposed for face verification to enforce a margin between each pair of faces from one person and all other faces [4]. The formulations of triplet units, loss functions and sample mining methods have received a lot of attention. Song et al. [5] propose to learn semantic CNN feature embedding where similar samples are mapped close to each other and samples from different classes are mapped apart. Zhang et al. [6] optimize the max-margin loss on triplet samples to learn a deep hashing function for image retrieval. Wang et al. [7] propose an efficient triplet sampling algorithm to learn fine-grained image similarity. Wang et al. [8] propose a hard negative mining method for triplet sampling. Simo-Serra et al. [9] present an aggressive mining strategy biased towards patches that are hard to classify. Shi et al. [10] propose a novel moderate positive sample mining method to deal with the problem of large within-class variation.
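
The sketch below shows one common form of triplet loss, a hinge on squared Euclidean distances with a margin, and adds it to a softmax cross-entropy term in the way the abstract describes. The margin, the weighting factor `weight`, the function names, and the toy vectors are assumptions for illustration; the exact formulation and weighting used in the paper are not shown in this excerpt.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge loss on squared Euclidean distances over a batch of triplets."""
    d_pos = ((anchor - positive) ** 2).sum(axis=1)  # anchor to same-class sample
    d_neg = ((anchor - negative) ** 2).sum(axis=1)  # anchor to other-class sample
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()

def cross_entropy(logits, targets):
    """Mean softmax cross-entropy for integer class targets."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def combined_loss(anchor, positive, negative, logits, targets,
                  margin=1.0, weight=1.0):
    """Cross-entropy plus weighted triplet loss; `weight` is a hypothetical knob."""
    return (cross_entropy(logits, targets)
            + weight * triplet_loss(anchor, positive, negative, margin))

# Toy 2-D embeddings (made-up numbers).
anchor = np.array([[0.0, 0.0]])
positive = np.array([[1.0, 0.0]])
easy_negative = np.array([[3.0, 0.0]])  # already far beyond the margin
hard_negative = np.array([[0.0, 1.0]])  # as close to the anchor as the positive
logits = np.array([[0.0, 0.0]])
targets = np.array([0])

easy = triplet_loss(anchor, positive, easy_negative)   # 0.0
hard = triplet_loss(anchor, positive, hard_negative)   # 1.0
total = combined_loss(anchor, positive, hard_negative, logits, targets, weight=0.5)
```

The easy negative contributes no loss because it already satisfies the margin, which is why the mining strategies cited above concentrate on selecting triplets that still violate it.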

For global optimization of CNN, Jaderberg et al. [11] incorporate Conditional Random Field (CRF) to find the character sequence that maximizes the CRF score, enforcing the consistency of the individual predictions. Isola et al. [12] also propose to solve the image collection parsing problem by using CRF, similar to what has been proposed for solving the pixel parsing problem from the global optimization point of view.
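
To illustrate what finding the label sequence that maximizes a CRF score can look like, here is a minimal sketch of Viterbi decoding on a chain with unary and pairwise scores. The chain structure, the score tables, and the function name are assumptions for illustration only; the models in [11] and [12], and the CRF built on pairwise feature distances in this paper, are not specified in this excerpt and may differ.

```python
import numpy as np

def viterbi_decode(unary, pairwise):
    """Return (path, score) maximizing the sum of unary and pairwise scores.

    unary:    (T, K) array, score of label k at position t
    pairwise: (K, K) array, score of label j directly following label i
    """
    unary = np.asarray(unary, dtype=float)
    pairwise = np.asarray(pairwise, dtype=float)
    T, K = unary.shape
    score = unary[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # candidate[i, j]: best score ending in label i at t-1, then label j at t
        candidate = score[:, None] + pairwise + unary[t][None, :]
        backptr[t] = candidate.argmax(axis=0)
        score = candidate.max(axis=0)
    # Trace the best path backwards from the best final label.
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(score.max())

# Made-up scores: the middle position slightly prefers label 1 on its own,
# but the pairwise term rewards neighbouring positions that agree.
unary = np.array([[2.0, 1.0], [1.0, 1.5], [2.0, 1.0]])
pairwise = np.array([[1.0, -1.0], [-1.0, 1.0]])
independent = unary.argmax(axis=1).tolist()  # [0, 1, 0]
path, best_score = viterbi_decode(unary, pairwise)
print(path, best_score)  # [0, 0, 0] 7.0
```

Decoding each position independently gives `[0, 1, 0]`, while the joint optimum is `[0, 0, 0]`: the pairwise term overrides a weak individual prediction, which is the sense in which a CRF enforces consistency among predictions.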

Inspired by the above works, we propose the local discriminant training and global optimization strategies for CNN. The local discriminant training is realized by incorporating triplet loss into the loss function in the training process, and the global optimization is fulfilled by deploying CRF. We conduct experiments on the ICDAR 2013 offline handwritten Chinese character recognition competition dataset. Although the state-of-the-art performance on the ICDAR 2013 competition dataset has been achieved by Zhang et al. [13] with the combination method of the traditional feature extraction and the writer adapted deep convolutional neural network, our focus is to compare the recognition accuracy and confidence analysis performance of the local discriminant training and global optimization

2017 14th IAPR International Conference on Document Analysis and Recognition

DOI 10.1109/ICDAR.2017.70
