Deep Learning-Driven Healthcare Data Mining: From Structured to Unstructured Data


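"Advances in Mining Heterogeneous Healthcare Data" is a tutorial on using deep learning to mine heterogeneous data in healthcare. Prepared by Fenglong Ma, Muchao Ye, Junyu Luo, Cao Xiao, and Jimeng Sun, it covers the different types of data found in electronic health records (EHR), their applications, and recent advances in deep learning for mining both structured and unstructured healthcare data.

An EHR is a longitudinal record of health information generated over a patient's repeated visits to healthcare institutions. Adoption of EHR systems by US hospitals rose markedly between 2008 and 2015, and by 2017, 94% of hospitals were using EHR data in hospital processes that inform clinical practice. EHR data supports a wide range of applications, including disease diagnosis, risk prediction, and treatment recommendation.

The tutorial has two main parts:

1. Mining structured health data:
- Computational phenotyping: analyzing structured medical data such as medical records, laboratory results, and medication prescriptions to identify and define patients' clinical phenotypes, which helps in studying disease patterns.
- Disease detection / risk prediction: using deep learning models to pick up signs of disease ahead of time, improving the accuracy of early diagnosis, and to predict disease risk in support of preventive care.
- Treatment recommendation: using deep learning algorithms to generate personalized treatment plans from a patient's individual health status and so improve treatment outcomes.

2. Mining unstructured health data:
- Automated ICD coding: using deep learning to assign International Classification of Diseases (ICD) codes to clinical notes automatically, improving coding efficiency and accuracy.
- Understandable medical language translation: using deep learning models to turn medical terminology into plain language, improving communication between clinicians and patients.
- Medical report generation: using deep learning-based natural language processing to generate detailed medical examination reports automatically, reducing physicians' workload.
- Clinical trial mining: searching large volumes of medical literature to find and match suitable clinical trials, speeding up the development of new drugs and therapies.

The tutorial is aimed at junior and senior students, engineers, and researchers interested in applying deep learning to healthcare, and it requires little prior knowledge. It closes with a discussion of open problems and a Q&A session that encourages participants to explore the topics further.

In short, the tutorial introduces recent deep learning methods for making efficient use of healthcare data and for addressing the challenges of structured and unstructured data. Participants learn how deep learning can improve disease diagnosis, raise the quality and efficiency of medical services, and advance medical research.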

Phenotyping

• Goal: Learning medical concept representations from EHR data
• Approach: Predicting the next visit information according to all the previous visits

Intra-visit Skip-gram

• Model all pairs of medical codes in a visit
• Visit contains codes $\{c_1, c_2, \ldots, c_n\}$
• $c_i$: i-th code among the code vocabulary $C$
• $p(c_j \mid c_i)$: Skip-gram probability (see below)
• Each code $c_1, c_2, \ldots, c_n$ is used as the "input" in turn (repeat for n times)
• Learn $W_c$, the code representation
• $V_t$: t-th visit
• $c_i, c_j$: codes in the visit $V_t$
• $[:, j]$: j-th column of the matrix

... context window size, exp is the element-wise exponential function, and $\mathbf{1}$ denotes an all-ones vector. We have used MATLAB's notation for selecting a row in $W_s$ and a coordinate of $b_s$.

3.3 Learning from the code-level information

As we described in the introduction, healthcare datasets contain two-level information: visit-level sequence information and code-level co-occurrence information. Since the loss function in Eq. (2) can efficiently capture the sequence-level information, we now need to find a way to use the second source of information, i.e., the intra-visit co-occurrence of the codes.

A natural choice to capture the code co-occurrence information is to use Skip-gram. The main idea is that the representations of the codes that occur in the same visit should predict each other. To embed Skip-gram in Med2Vec, we can train $W_c \in \mathbb{R}^{m \times |C|}$ (which also produces intermediate visit-level representations) so that the i-th column of $W_c$ is the representation of the i-th medical code among the $|C|$ codes in total. Note that, given the unordered nature of the codes inside a visit, unlike the original Skip-gram, we do not distinguish between the "input" medical code and the "output" medical code. In text, it is sensible to assume that a word can serve a different role as a center word and a context word, whereas in EHR datasets we cannot classify codes as center or context codes. It is also desirable to learn the representations of different types of codes (e.g., diagnosis, medication, procedure codes) in the same latent space so that we can capture the hidden relationships between them.

However, precise interpretation of Skip-gram codes will be difficult because $W_c$ has both positive and negative values. For intuitive interpretation, we should learn code representations with non-negative values. Note that in Eq. (1), if the binary vector $x_t$ is a one-hot vector, then the intermediate visit representation $u_t$ becomes a code representation. Therefore, using the Skip-gram algorithm, we train the non-negative weight $\mathrm{ReLU}(W_c)$ instead of $W_c$. This not only uses the intra-visit co-occurrence information but also guarantees non-negative code representations. Moreover, ReLU produces sparse code representations, which further facilitates interpretation of the codes.

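The code representations to be learned are denoted as a matrix $W'_c = \mathrm{ReLU}(W_c) \in \mathbb{R}^{m \times |C|}$. From a sequence of visits $V_1, \ldots, V_T$, the code-level representations can be learned by maximizing the following log-likelihood,

$$\max_{W_c} \; \frac{1}{T} \sum_{t=1}^{T} \; \sum_{i:\, c_i \in V_t} \; \sum_{j:\, c_j \in V_t,\, j \neq i} \log p(c_j \mid c_i), \qquad (3)$$

$$\text{where } p(c_j \mid c_i) = \frac{\exp\left( W'_c[:, j]^{\top} W'_c[:, i] \right)}{\sum_{k=1}^{|C|} \exp\left( W'_c[:, k]^{\top} W'_c[:, i] \right)}. \qquad (4)$$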
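The code-level objective can be illustrated with a small sketch in plain Python. It is not the authors' implementation: the function names, the toy weight matrix, and the toy visits are assumptions made for illustration, and each visit is assumed to be a list of distinct code indices. The score between two codes is the dot product of their ReLU-ed columns, and the probability is a softmax over all codes in the vocabulary.

```python
import math


def relu_matrix(W):
    """Element-wise ReLU of a matrix stored as a list of rows (m x |C|)."""
    return [[max(0.0, v) for v in row] for row in W]


def column(W, j):
    """Return the j-th column of W, i.e. the representation of code j."""
    return [row[j] for row in W]


def log_p(W_prime, j, i):
    """log p(c_j | c_i): log-softmax over all codes k of W'[:, k] . W'[:, i]."""
    n_codes = len(W_prime[0])
    c_i = column(W_prime, i)
    scores = [
        sum(a * b for a, b in zip(column(W_prime, k), c_i))
        for k in range(n_codes)
    ]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores[j] - log_z


def code_log_likelihood(W, visits):
    """Average over visits of the sum of log p(c_j | c_i) over pairs i != j."""
    W_prime = relu_matrix(W)
    total = 0.0
    for visit in visits:
        for i in visit:
            for j in visit:
                if j != i:
                    total += log_p(W_prime, j, i)
    return total / len(visits)


# Toy data (made up): m = 2 dimensions, |C| = 4 codes, T = 2 visits.
W = [[0.5, -0.2, 1.0, 0.0],
     [0.1, 0.8, -0.3, 0.4]]
visits = [[0, 2], [1, 2, 3]]
ll = code_log_likelihood(W, visits)
print(ll)
```

Training would adjust the entries of W to make this value larger. Because every ordered pair inside a visit is scored, no code is treated specially as a center or context code.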

3.4 Unified training

The single unified framework can be obtained by adding the two objective functions (3) and (2) as follows,

$$\operatorname*{argmin}_{W_{c,v,s},\, b_{c,v,s}} \; \frac{1}{T} \sum_{t=1}^{T} \Bigg\{ -\sum_{i:\, c_i \in V_t} \; \sum_{j:\, c_j \in V_t,\, j \neq i} \log p(c_j \mid c_i) \; + \sum_{-w \le k \le w,\, k \neq 0} \Big[ -x_{t+k}^{\top} \log \hat{y}_t - (\mathbf{1} - x_{t+k})^{\top} \log(\mathbf{1} - \hat{y}_t) \Big] \Bigg\}$$

By combining the two objective functions we learn both code representations and visit representations from the same source of patient visit records, exploiting intra-visit co-occurrence information and inter-visit sequential information at the same time.
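How the two terms are added can be sketched as follows. This is a simplified illustration rather than the paper's training code. It assumes that the per-visit code-level log-likelihoods (`code_ll`) and the model's predictions for each visit (`Y_hat`) have already been computed elsewhere. The names, the small `eps` added inside the logarithms, and the choice to skip context positions that fall outside the sequence are all assumptions.

```python
import math


def visit_term(x_ctx, y_hat, eps=1e-12):
    """-x^T log(y_hat) - (1 - x)^T log(1 - y_hat) for one context visit.

    x_ctx is the binary code vector of a neighbouring visit and y_hat the
    prediction made from the current visit; eps guards against log(0).
    """
    return -sum(
        x * math.log(y + eps) + (1 - x) * math.log(1 - y + eps)
        for x, y in zip(x_ctx, y_hat)
    )


def unified_loss(code_ll, X, Y_hat, w):
    """Average over visits of (negative code-level term + visit-level term).

    code_ll[t] : sum of log p(c_j | c_i) over code pairs of visit t
    X[t]       : binary code vector of visit t
    Y_hat[t]   : predicted code probabilities produced from visit t
    w          : context window size
    """
    T = len(X)
    total = 0.0
    for t in range(T):
        total -= code_ll[t]  # maximising log-likelihood = minimising its negative
        for k in range(-w, w + 1):
            if k != 0 and 0 <= t + k < T:  # assumption: skip out-of-range neighbours
                total += visit_term(X[t + k], Y_hat[t])
    return total / T


# Toy data (made up): two visits over a vocabulary of two codes.
X = [[1, 0], [0, 1]]
Y_hat = [[0.5, 0.5], [0.5, 0.5]]
loss = unified_loss([-1.0, -2.0], X, Y_hat, w=1)
print(loss)
```

A single optimizer step on this sum updates the code-level and visit-level parameters together, which is what makes the framework unified.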

3.5 Interpretation of learned representations

While the original Skip-gram learns code representations that have interesting properties such as additivity, in healthcare we need stronger interpretability. We need to be able to associate clinical meaning with each dimension of both code and visit representations. Interpreting the learned representations is based on analyzing each coordinate in both the code and visit embedding spaces.

Interpreting code representations.

If information is properly embedded into a lower-dimensional non-negative space, each coordinate of the lower dimension can be readily interpreted. Non-negative matrix factorization (NMF) is a good example. Since we trained $\mathrm{ReLU}(W_c) \in \mathbb{R}^{m \times |C|}$, a non-negative matrix, to represent the medical codes, we can employ a simple method to interpret the meaning of each coordinate of the m-dimensional code embedding space. We can find the top k codes that have the largest values for the i-th coordinate of the code embedding space as follows,

$$\operatorname{argsort}(W'_c[i, :])[1 : k]$$

where argsort returns the indices of a vector that index its values in descending order. By studying the returned medical codes, we can view each coordinate as a disease group. Detailed examples are given in Section 5.1.
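The top-k lookup for a code coordinate can be sketched in a few lines of plain Python. The helper name `top_k_codes` and the toy matrix are made up for illustration. Note that Python indexes from 0, whereas the MATLAB-style `[1 : k]` in the text indexes from 1.

```python
def top_k_codes(W_prime, i, k):
    """Indices of the k codes with the largest values on coordinate i.

    W_prime is the non-negative code matrix (m rows, one column per code),
    stored as a list of rows.
    """
    row = W_prime[i]
    # Sort code indices by their value on this coordinate, largest first.
    order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    return order[:k]


# Toy matrix (made up): m = 2 coordinates, |C| = 4 codes.
W_prime = [[0.0, 0.9, 0.1, 0.4],
           [0.7, 0.0, 0.0, 0.2]]

print(top_k_codes(W_prime, 0, 2))  # codes most associated with coordinate 0
print(top_k_codes(W_prime, 1, 2))  # codes most associated with coordinate 1
```

Mapping the returned indices back to their code descriptions is what lets a reader judge whether a coordinate corresponds to a coherent disease group.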

Interpreting visit representations.

To interpret the learned visit vectors, we can use the same principle we used for interpreting the code representation. For the i-th coordinate of the n-dimensional visit embedding space, we can find the top k coordinates of the code embedding space that have the strongest values as follows,

$$\operatorname{argsort}(W_v[i, :])[1 : k]$$

where we use the same argsort as before. Once we obtain a set of code coordinates, we can use the knowledge learned from interpreting the code representations to understand how each visit coordinate is associated with a group of diseases. This simple interpretation is possible because the intermediate visit representation $u_t$ is a non-negative vector, due to the ReLU activation function.

In the experiments, we also tried to find the input vector that most activates the target visit coordinate [14, 21]. However, the results were very sensitive to the initial value of $x_t$, and even averaging over multiple samples produced unreliable results.
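The two-step lookup for visit coordinates can be sketched as follows: first find the code coordinates that weigh most on a visit coordinate, then find the codes that weigh most on each of those. The function names and toy matrices are assumptions for illustration. The sketch also assumes that the visit-layer weight matrix has exactly one column per code coordinate, so any additional input columns would need to be sliced off first.

```python
def argsort_desc(values):
    """Indices of `values` ordered from the largest value to the smallest."""
    return sorted(range(len(values)), key=lambda j: values[j], reverse=True)


def interpret_visit_coordinate(W_v, W_c_prime, i, k_coords, k_codes):
    """Map visit coordinate i to its top code coordinates, then to top codes.

    W_v       : visit-layer weights, n rows (visit coords) x m cols (code coords)
    W_c_prime : non-negative code matrix, m rows (code coords) x |C| cols (codes)
    Returns {code_coordinate: [top code indices]}.
    """
    top_coords = argsort_desc(W_v[i])[:k_coords]
    return {
        coord: argsort_desc(W_c_prime[coord])[:k_codes]
        for coord in top_coords
    }


# Toy matrices (made up): n = 2, m = 3, |C| = 4.
W_c_prime = [[0.9, 0.0, 0.2, 0.0],
             [0.0, 0.8, 0.0, 0.3],
             [0.1, 0.0, 0.0, 0.7]]
W_v = [[0.1, 0.6, 0.5],
       [0.9, 0.0, 0.2]]

result = interpret_visit_coordinate(W_v, W_c_prime, i=0, k_coords=2, k_codes=2)
print(result)
```

Each key in the result is a code coordinate that the chosen visit coordinate relies on, and each value lists the codes that characterize it.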

3.6 Complexity analysis

We first analyze the computational complexity of the code-level objective function Eq. (3). Without loss of generality, we assume the visit records of all patients are concatenated into a single sequence of visits. Then the complexity for Eq. (3) is as follows,

$$O(T M^2 |C| m)$$



where $T$ is the number of visits, $M^2$ is the average of the squared number of medical codes within a visit, $|C|$ is the number of unique medical codes, and $m$ is the size of the code representation. The $M^2$ factor comes from iterating over all possible pairs of codes within a visit. The complexity of the visit-level objective function Eq. (2) is as follows,

$$O\big(T w (|C|(m + n) + mn)\big)$$

where $w$ is the size of the context window and $n$ is the size of the visit representation. The added terms come from generating a visit representation via an MLP. Since the size of the code representation $m$ and the size of the visit representation $n$ generally have the ...

Choi et al. Med2Vec: Multi-layer Representation Learning for Medical Concepts. KDD 2016.
