Thrun and Mitchell [55] look at solving Boolean classification tasks in a
lifelong-learning framework, where an agent encounters a collection of related
problems over its lifetime. They learn each new task with a neural network, but
they enhance the standard gradient-descent algorithm with slope information
acquired from previous tasks. This speeds up the search for network parameters
in a target task and biases it towards the parameters for previous tasks.
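As a rough illustration of the flavor of this approach (not Thrun and Mitchell's actual algorithm), the Python sketch below fits a toy one-dimensional model to target-task data with ordinary gradient descent plus an extra term that pulls the model's slope toward slope estimates assumed to have been acquired from earlier tasks; the data, the model, and the slope_src function are all placeholders.

```python
import numpy as np

# Sketch only: slope-guided gradient descent on a toy problem.
# slope_src stands in for slope information acquired from previous tasks.
def slope_src(x):
    return np.cos(x)

# A small target-task sample.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 10)
y = np.sin(x) + 0.1 * rng.normal(size=10)

# Toy model f(x) = a*x + b*x**2, a stand-in for a neural network.
theta = np.zeros(2)
lam, lr = 0.5, 0.01   # weight on the slope-matching term, learning rate

def f(theta, x):
    return theta[0] * x + theta[1] * x ** 2

def df_dx(theta, x):
    return theta[0] + 2 * theta[1] * x

for _ in range(2000):
    # Gradient of the usual squared error ...
    err = f(theta, x) - y
    grad = np.array([np.mean(2 * err * x), np.mean(2 * err * x ** 2)])
    # ... plus the gradient of a penalty on disagreeing with the source slopes.
    slope_err = df_dx(theta, x) - slope_src(x)
    grad += lam * np.array([np.mean(2 * slope_err),
                            np.mean(2 * slope_err * 2 * x)])
    theta -= lr * grad
```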
Mihalkova and Mooney [27] perform transfer between Markov Logic Networks (MLNs). Given a learned MLN for a source task, they learn an MLN for a related target task by starting with the source-task one and diagnosing each formula, adjusting ones that are too general or too specific in the target domain. The
hypothesis space for the target task is therefore defined in relation to the source-
task MLN by the operators that generalize or specialize formulas.
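A schematic sketch of such a diagnose-and-revise loop is shown below; the formula representation, the generalize/specialize operators, and the scoring function are all hypothetical placeholders rather than Mihalkova and Mooney's actual procedure.

```python
# Sketch only: revise each source-task formula by keeping whichever of the
# {original, generalized, specialized} variants scores best on target data.

def generalize(formula):
    # Placeholder: e.g. replace a constant with a variable or drop a literal.
    return formula + " [generalized]"

def specialize(formula):
    # Placeholder: e.g. add a literal that restricts the formula.
    return formula + " [specialized]"

def score(formulas, target_data):
    # Placeholder for a real fit measure such as pseudo-likelihood of the
    # target data under the candidate MLN.
    return -0.01 * sum(len(f) for f in formulas)

def revise(source_formulas, target_data):
    revised = []
    for formula in source_formulas:
        candidates = [formula, generalize(formula), specialize(formula)]
        best = max(candidates, key=lambda c: score(revised + [c], target_data))
        revised.append(best)
    return revised
```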
Hlynsson [17] phrases transfer learning in classification as a minimum description length problem given source-task hypotheses and target-task data. That is, the chosen hypothesis for a new task can use hypotheses for old tasks but stipulate exceptions for some data points in the new task. This method aims for a tradeoff between accuracy and compactness in the new hypothesis.
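The trade-off can be made concrete with a small sketch: using rough, assumed bit costs, reusing a source hypothesis and paying only for its exceptions on the target data is compared against describing a fresh hypothesis.

```python
import math

# Sketch only: toy description-length accounting with made-up costs.

def hypothesis_cost(num_params, bits_per_param=32):
    # Bits to describe the new parts of the hypothesis itself.
    return num_params * bits_per_param

def exception_cost(num_errors, num_examples):
    # Bits to point out which target examples are exceptions, plus their labels.
    if num_errors == 0:
        return 0.0
    return num_errors * math.log2(num_examples) + num_errors

def description_length(num_params, num_errors, num_examples):
    return hypothesis_cost(num_params) + exception_cost(num_errors, num_examples)

# Reuse the source hypothesis (no new parameters, 12 exceptions) versus
# fitting a fresh 20-parameter hypothesis with 2 exceptions, on 1000 examples.
print(description_length(0, 12, 1000))   # ~131.6 bits
print(description_length(20, 2, 1000))   # ~661.9 bits
```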
Ben-David and Schuller [3] propose a transformation framework to determine
how related two Boolean classification tasks are. They define two tasks as related
with respect to a class of transformations if they are equivalent under that class;
that is, if a series of transformations can make one task look exactly like the
other. They provide conditions under which learning related tasks concurrently
requires fewer examples than single-task learning.
Bayesian Transfer
One area of inductive transfer applies specifically to Bayesian learning methods. Bayesian learning involves modeling probability distributions and taking advantage of conditional independence among variables to simplify the model.
An additional aspect that Bayesian models often have is a prior distribution,
which describes the assumptions one can make about a domain before seeing
any training data. Given the data, a Bayesian model makes predictions by combining it with the prior distribution to produce a posterior distribution. A strong
prior can significantly affect these results (see Figure 5). This serves as a natural
way for Bayesian learning methods to incorporate prior knowledge: in the case of transfer learning, source-task knowledge.
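The effect of a strong prior can be seen in a minimal worked example using a conjugate Beta-Bernoulli model (chosen here purely for illustration; the prior pseudo-counts stand in for source-task knowledge).

```python
# Sketch only: Beta(alpha, beta) prior + Bernoulli data -> Beta posterior.

def posterior_mean(alpha, beta, heads, tails):
    return (alpha + heads) / (alpha + beta + heads + tails)

heads, tails = 3, 7                          # small target-task sample

print(posterior_mean(1, 1, heads, tails))    # weak prior: ~0.33, close to the data
print(posterior_mean(80, 20, heads, tails))  # strong prior: ~0.75, pulled toward 0.8
```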
Marx et al. [24] use a Bayesian transfer method for tasks solved by a logistic
regression classifier. The usual prior for this classifier is a Gaussian distribution
with a mean and variance set through cross-validation. To perform transfer, they
instead estimate the mean and variance by averaging over several source tasks.
Raina et al. [33] use a similar approach for multi-class classification by learning
a multivariate Gaussian prior from several source tasks.
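A minimal sketch of this style of transfer is given below, assuming weight vectors already learned on several source tasks; the synthetic data, the simple gradient loop, and the diagonal-variance prior are illustrative assumptions, not the exact procedures of Marx et al. or Raina et al.

```python
import numpy as np

# Sketch only: logistic regression whose Gaussian prior (mean and variance)
# is estimated from weights learned on source tasks.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, prior_mean, prior_var, lr=0.1, steps=500):
    w = prior_mean.copy()
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y)          # negative log-likelihood term
        grad += (w - prior_mean) / prior_var   # Gaussian prior pulls w toward its mean
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
d = 5

# Pretend these weight vectors were learned on four source tasks.
source_weights = rng.normal(size=(4, d))
prior_mean = source_weights.mean(axis=0)
prior_var = source_weights.var(axis=0) + 1e-3

# A small synthetic target-task sample.
X = rng.normal(size=(20, d))
y = (X @ prior_mean + 0.5 * rng.normal(size=20) > 0).astype(float)

w_target = fit(X, y, prior_mean, prior_var)
```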
Dai et al. [7] apply a Bayesian transfer method to a Naive Bayes classifier.
They set the initial probability parameters based on a single source task, and
revise them using target-task data. They also provide some theoretical bounds
on the prediction error and convergence rate of their algorithm.
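The flavor of this approach can be sketched with word-count parameters: source-task counts set the initial estimates, and target-task counts revise them (the interpolation weight below is a placeholder, and no claim is made about the bounds in the paper).

```python
from collections import Counter

# Sketch only: Naive Bayes word probabilities initialized from a source task
# and revised with (sparse) target-task counts via simple interpolation.

def word_probs(source_counts, target_counts, vocab, source_weight=0.5):
    total_src = sum(source_counts.values())
    total_tgt = sum(target_counts.values())
    probs = {}
    for w in vocab:
        p_src = (source_counts[w] + 1) / (total_src + len(vocab))  # Laplace-smoothed
        p_tgt = (target_counts[w] + 1) / (total_tgt + len(vocab))
        probs[w] = source_weight * p_src + (1 - source_weight) * p_tgt
    return probs

vocab = {"goal", "ball", "vote", "party"}
source_counts = Counter({"goal": 40, "ball": 35, "vote": 2, "party": 3})
target_counts = Counter({"goal": 3, "ball": 1, "vote": 1})
print(word_probs(source_counts, target_counts, vocab))
```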