Fine-grained Patient Similarity Measuring
using Deep Metric Learning
Jiazhi Ni (2,3), Jie Liu (1,2,*), Chenxin Zhang (2,3), Dan Ye (2), Zhirou Ma (2)
(1) State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences
(2) Institute of Software, Chinese Academy of Sciences, Beijing, China
(3) University of Chinese Academy of Sciences, Beijing, China
{nijiazhi14, ljie, zhangchenxin17, yedan, mazhirou}@otcaix.iscas.ac.cn
ABSTRACT
Patient similarity measuring plays a significant role in many
healthcare applications, such as cohort studies and treatment
comparative effectiveness research. Existing methods mainly
rely on supervised metric learning to study patient similarity
from Electronic Health Records (EHRs), and they face the
challenge of differentiating patients across a large number of
fine-grained disease categories. Deep metric learning has achieved
notable success in fine-grained image categorization; however,
it cannot be directly applied to the classification of patients
with hierarchical disease labels. In this paper, we present a novel
three-layer patient similarity deep metric learning framework
(PSDML) that optimizes a quadruple loss, improved from the triplet
loss, to learn an embedding distance for disease classification
among patients. The contextual semantic relations among multiple
diagnosis labels encoded in ICD-10 are taken into account to
compute the supervised distance between patients. To address
diagnosis class imbalance, patient tuples that violate the loss
constraints of the deep metric learning framework are chosen
preferentially as samples to accelerate the convergence of the
neural network. We conducted a KNN multi-label classification
experiment using the learned similarity metric on real EHRs on
stroke disease collected by the Chinese Stroke Data Center. The
results demonstrate substantial improvement over the baselines.
KEYWORDS
Patient Similarity, Distance Metric Learning, Deep Metric
Learning, Multi Label Classification
*Corresponding Author.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or
distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned
by others than ACM must be honored. Abstracting with credit is permitted. To
copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from
Permissions@acm.org.
CIKM'17, November 6–10, 2017, Singapore
© 2017 Association for Computing Machinery.
ACM ISBN 978-1-4503-4918-5/17/11…$15.00
https://doi.org/10.1145/3132847.3133022
1 INTRODUCTION
Patient similarity measuring is a fundamental and important
task in clinical decision support applications built on the
Electronic Health Records (EHRs) from outpatient care, inpatient
care and medical research. The goal is to derive a clinically
meaningful distance metric that measures the similarity between
pairs of patients represented by their key clinical indicators.
Fine-grained disease classification relies heavily on the
underlying patient similarity distance metric to correctly
measure the relations among input EHRs. Consequently, we adapt
deep metric learning methods from image and speech recognition
to patient similarity measurement.
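As a point of reference for what a "distance metric between patient pairs" can look like, the following is a minimal sketch of a Mahalanobis-style distance, a standard form in classical metric learning. It is not the method proposed in this paper. The function names, the two-dimensional feature vectors, and the matrices below are illustrative assumptions only.

```python
import math

def linear_map(L, x):
    """Apply the matrix L (a list of rows) to the vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]

def mahalanobis_distance(L, x, y):
    """Return sqrt((x - y)^T M (x - y)) with M = L^T L.

    Writing M as L^T L keeps it positive semi-definite, so the result
    equals the Euclidean distance between the projected points Lx and Ly.
    """
    diff = [a - b for a, b in zip(x, y)]
    projected = linear_map(L, diff)
    return math.sqrt(sum(v * v for v in projected))

# Hypothetical patients described by two made-up, already normalized indicators.
patient_a = [1.0, 0.0]
patient_b = [0.0, 1.0]

identity = [[1.0, 0.0], [0.0, 1.0]]   # reduces to the plain Euclidean distance
weighted = [[2.0, 0.0], [0.0, 0.5]]   # assumed weights stressing the first indicator

d_euclid = mahalanobis_distance(identity, patient_a, patient_b)
d_weighted = mahalanobis_distance(weighted, patient_a, patient_b)
```

In classical metric learning the matrix L (or M) is fitted from labeled pairs. Deep metric learning replaces the linear map with a neural network embedding.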
Deep metric learning has gained much popularity recently,
with remarkable success in image and speech recognition.
Compared to standard distance metric learning, it learns a
nonlinear embedding representation of the data using deep
neural networks, and it has shown significant accuracy
improvements by learning deep representations with a contrastive
loss or a triplet loss in applications such as face recognition and
image retrieval. However, in the medical field, the existing
deep metric learning frameworks based on contrastive loss or
triplet loss cannot adequately describe patient similarity.
The problem arises partly because employing only one negative
and one positive sample in each update ignores the interactions
with other classes. In addition, because patients carry multiple
diagnosis labels and these labels (encoded in ICD-10) are
contextually and semantically related, traditional distance metric
learning methods in the medical field, such as the Locally
Supervised Metric Learning (LSML) algorithm [1] or the
Mahalanobis distance, cannot work effectively in real medical
settings. Table 1 summarizes the typical information contained in
our EHRs, including medical image conclusions and multiple
diagnosis labels (some other events and time factors are omitted
due to space limitations), where the abbreviations are explained
in Table 2. In this work, we adapt advanced deep metric learning
methods from the image field to address the following questions:
• How to obtain supervised information by encoding the
multiple diagnosis labels of one patient?
• How to solve the diagnosis class imbalance problem of the
EHRs and ensure fast convergence of the deep neural network?
• How to construct the deep metric learning framework with
a proper loss function definition for fine-grained disease
classification?
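For reference, the contrastive loss and triplet loss mentioned above can be sketched as follows. These are the generic forms from the image-recognition literature, written up to constant factors. They are not the quadruple loss proposed in this paper. The function names, the margin value, and the toy two-dimensional embeddings are assumptions made for illustration.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(u, v, same_class, margin=1.0):
    """Pull same-class pairs together; push other pairs beyond the margin."""
    d = euclidean(u, v)
    if same_class:
        return d ** 2
    return max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Require the negative to be farther from the anchor than the positive.

    The loss is zero once the squared anchor-negative distance exceeds the
    squared anchor-positive distance by at least the margin.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos ** 2 - d_neg ** 2 + margin)

# Toy embeddings (made up): one anchor, one positive, two candidate negatives.
anchor = [0.0, 0.0]
positive = [0.5, 0.0]
easy_negative = [2.0, 0.0]   # already far away: constraint satisfied
hard_negative = [0.6, 0.0]   # nearly as close as the positive: constraint violated

loss_easy = triplet_loss(anchor, positive, easy_negative)
loss_hard = triplet_loss(anchor, positive, hard_negative)
```

Each triplet update sees exactly one positive and one negative sample, which is the limitation noted above. Only triplets that violate the margin constraint, such as the one built from `hard_negative`, yield a non-zero loss and therefore a gradient.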
Session 7A: Health Analytics 1