parameters $a = \{a_1, \ldots, a_K\}$. In the following we describe the mechanism for selecting appropriate L-MVAE components during training.
C. The selection of L-MVAE mixture’s components during
training
Certain research studies [24], [25] have considered equal contributions for the components of deep learning mixture systems. However, in this paper we consider that each mixture component is specialized for a specific task. The selection of a specific mixture component is performed through the mixing weights $w_i$, $i = 1, \ldots, K$. We assume that the weighting probability for each mixture component is drawn from a multinomial distribution, such as the Bernoulli distribution, defined by a Dirichlet prior.
Assignment vector. In the following, we introduce an assignment vector $\mathbf{c}$, with each of its entries $c_i \in \{0, 1\}$, $i = 1, \ldots, K$, indicating whether or not the $i$-th expert is included in the mixture. Each $c_i$ is sampled from a Bernoulli distribution. Before starting the training, we set all entries to $c_i = 0$, $i = 1, \ldots, K$. The assignment probability for each mixing component is calculated by considering the sample log-likelihood of each expert after learning each task, as:
\[
p(c_j) = 1 - \frac{\exp\!\left(-\mathcal{L}^j_{VAE}(\mathbf{x}_b)\right) + u\,c'_j}{\sum_{i=1}^{K}\left[\exp\!\left(-\mathcal{L}^i_{VAE}(\mathbf{x}_b)\right) + u\,c'_i\right]}\,,
\tag{3}
\]
where $\mathbf{x}_b$ is sampled from the given data batch, drawn from the database corresponding to the current task learning. $c'_j$ denotes the assignment variable for the $j$-th expert and represents the value resulting from learning the previous task, before evaluating Eq. (3). The term $u\,c'_j$ is used to ensure that $p(c_j)$ falls outside the range of admissible values when $c'_j = 1$, when evaluating Eq. (3), and therefore we consider $u$ to be a large value. Then we find the maximum probability for a mixing component:
\[
p(c_{j^*}) = \max\left(p(c_1), \ldots, p(c_K)\right),
\tag{4}
\]
where $j^*$ represents the index of the selected VAE component according to the parameters learnt during the previous tasks. We then normalize the other assignment variables, except for $j^*$:
\[
p(c_i) = \begin{cases} 1, & c'_i = 1 \\ 0, & c'_i = 0 \end{cases}, \qquad i = 1, 2, \ldots, K,\; i \neq j^*.
\tag{5}
\]
Since $c'_i$ is an assignment corresponding to the learning process of the previous task, before evaluating Eq. (3), we use Eq. (5) in order to determine the dropout status of each expert during the current task learning. Eq. (5) recovers the dropout status of all experts except for the $j^*$-th expert, which is actually dropped out from future training because it is going to be used for recording and reproducing the information associated with the current task being learnt. When learning the first task, all mixture components are trained. Then, when learning the second task, only $K - 1$ components are trained, while one component is no longer trained because it is considered as a depository of the information associated with the first task. This component will consequently be used to generate information consistent with the probabilistic representation associated with the first task. This process continues such that, for the last task, at least one VAE is still available for training. The number of mixing components $K$ considered initially should therefore be larger than, or at least equal to, the number of tasks assumed to be learned during the lifelong learning process. In Section VI we describe a mechanism for expanding the mixture.
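As an illustration of the selection mechanism in Eqs. (3)-(5), a minimal Python/NumPy sketch is given below. The helper names (assignment_probs, select_expert), the loss array losses and the default constant u = 1e6 are illustrative assumptions rather than the actual implementation.

```python
import numpy as np

def assignment_probs(losses, c_prev, u=1e6):
    """Eq. (3): assignment probabilities p(c_j) for one data batch x_b.

    losses : per-expert losses L^j_VAE(x_b), shape (K,)
    c_prev : previous assignment vector c' with entries in {0, 1}, shape (K,)
    u      : large constant preventing already-assigned experts from being selected
    """
    scores = np.exp(-losses) + u * c_prev
    return 1.0 - scores / scores.sum()

def select_expert(p, c_prev):
    """Eqs. (4)-(5): pick the expert j* with the maximum p(c_j) and recover the
    dropout status of all the other experts from the previous assignments."""
    j_star = int(np.argmax(p))   # Eq. (4)
    c_new = c_prev.copy()        # Eq. (5): previously fixed experts remain fixed
    c_new[j_star] = 1            # j* is dropped out of future training
    return j_star, c_new
```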
The sampling of mixing weights. Suppose that L-MVAE has finished learning the $t$-th task. We collect several batches of samples $\{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ from the $(t+1)$-th task, where each $\mathbf{x}_i$ represents the $i$-th batch of samples; these batches are used to evaluate the assignment vector $\mathbf{c}$ by using Eq. (3). We calculate the average probability $p(c_j) = \sum_{i=1}^{N} p(c^i_j)/N$, where each $p(c^i_j)$ represents the assignment probability evaluated for $\mathbf{x}_i$. Then we find $p(c_{j^*})$ by using Eq. (4) and we recover the previous assignments, except for $c_{j^*}$, by using Eq. (5).
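This batch-averaged selection can be sketched by reusing the hypothetical assignment_probs and select_expert helpers from the previous listing; vae_losses is an assumed callable returning the $K$ per-expert losses for a given batch.

```python
def select_for_new_task(batches, c_prev, vae_losses):
    """Average p(c_j) over the N batches collected from the (t+1)-th task,
    then apply Eqs. (4)-(5) to the averaged probabilities."""
    p_avg = np.mean([assignment_probs(vae_losses(x), c_prev) for x in batches], axis=0)
    return select_expert(p_avg, c_prev)
```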
The Dirichlet parameters are calculated in order to fix the mixture components containing the information corresponding to the previously learnt tasks, while making the other mixture components available for training on future tasks. For each mixing component, depending on whether it has already been used for learning a previous task ($c_i = 1$) or not ($c_i = 0$), we consider
\[
a_i = \begin{cases} e, & c_i = 1 \\ \dfrac{1 - e K'}{K - K'}, & c_i = 0 \end{cases}, \qquad i = 1, \ldots, K,
\tag{6}
\]
where $e$ is a very small positive value and $K'$ represents the number of tasks learnt so far, out of a total of $K$ given tasks, during the lifelong learning.
A small value for the Dirichlet parameters implies that the
corresponding mixture components are no longer trained. The
mixing weights $w_1, \ldots, w_K$ are sampled from a Dirichlet distribution with parameters $a_1, \ldots, a_K$. We then train the mixture model with $w_1, \ldots, w_K$ by using Eq. (2) when learning the $(t+1)$-th task.
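A possible sketch of Eq. (6) and of the sampling of the mixing weights is given below, under the same assumptions as in the previous listings; eps stands for the small positive value $e$ and c is the current assignment vector, with $K'$ of its entries equal to 1.

```python
def dirichlet_params(c, eps=1e-6):
    """Eq. (6): a near-zero concentration for the fixed experts (c_i = 1), with the
    remaining mass shared equally among the experts still available for training
    (c_i = 0). Assumes K' < K, i.e. at least one expert is still trainable."""
    K = len(c)
    K_prime = int(np.sum(c))   # number of tasks learnt so far
    return np.where(c == 1, eps, (1.0 - eps * K_prime) / (K - K_prime))

# Mixing weights used in Eq. (2) when learning the (t+1)-th task:
# w = np.random.dirichlet(dirichlet_params(c))
```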
Testing phase. Suppose that, after the lifelong learning process, we have trained $K$ components. In the testing phase, we perform the selection of a single component to be used for the given data samples. We first calculate the selection probabilities $\{v_1, \ldots, v_K\}$ by evaluating the log-likelihood of the data sample under each component:
\[
v_j = \frac{\exp\!\left(-\dfrac{1}{\mathcal{L}^j_{VAE}(\mathbf{x})}\right)}{\sum_{i=1}^{K} \exp\!\left(-\dfrac{1}{\mathcal{L}^i_{VAE}(\mathbf{x})}\right)}\,, \qquad j = 1, \ldots, K.
\tag{7}
\]
Then we select a component by sampling the mixing weight vector $\mathbf{w}$ from the categorical distribution $\mathrm{Cat}(v_1, \ldots, v_K)$.
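Following the reconstructed form of Eq. (7), the testing-phase selection could be sketched as below; losses is again assumed to be the vector of per-component losses $\mathcal{L}^j_{VAE}(\mathbf{x})$ for a test sample.

```python
def select_component_for_testing(losses):
    """Eq. (7): selection probabilities v_j, followed by sampling the component
    index from the categorical distribution Cat(v_1, ..., v_K)."""
    v = np.exp(-1.0 / losses)
    v = v / v.sum()
    j = int(np.random.choice(len(v), p=v))
    return j, v
```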
The structure of the proposed L-MVAE model is shown in Fig. 1. In the next section, we evaluate the convergence properties of the L-MVAE model during lifelong learning.
IV. THEORETICAL ANALYSIS OF L-MVAE
In this section, we evaluate the convergence properties of the proposed L-MVAE model during lifelong learning. We evaluate the evolution of the objective function $\mathcal{L}_{L\text{-}MVAE}(\mathbf{x})$