PREPRINT (ORIGINAL PUBLISHED AT IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING) 4
[Figure 2 elements, grouped by taxonomy dimension]
Source (Which source of knowledge is integrated?): Scientific Knowledge (Natural Sciences, Engineering, etc.); World Knowledge (Vision, Linguistics, Semantics, General K., etc.); Expert Knowledge (Intuition, Less Formal)
Representation (How is the knowledge represented?): Algebraic Equations; Differential Equations; Simulation Results; Spatial Invariances; Logic Rules; Knowledge Graphs; Probabilistic Relations; Human Feedback
Integration (Where is the knowledge integrated in the machine learning pipeline?): Training Data; Hypothesis Set (Network Architecture, Model Structure, etc.); Learning Algorithm (Regularization Terms, Constrained Opt., etc.); Final Hypothesis
Figure 2: Taxonomy of Informed Machine Learning. This taxonomy serves as a classification framework for informed
machine learning and structures approaches according to the three analysis questions above, which concern the knowledge source,
the knowledge representation and the knowledge integration. Based on a comparative and iterative literature survey, we identified for
each dimension a set of elements that represent a spectrum of different approaches. The size of each element reflects the
relative count of papers. We combine the taxonomy with a Sankey diagram in which the paths connect the elements across
the three dimensions and illustrate the approaches that we found in the analyzed papers. The broader the path, the more
papers we found for that approach. Main paths (four or more papers with the same approach across all dimensions)
are highlighted in darker grey and represent central approaches of informed machine learning.
representation and knowledge integration. Each dimension
contains a set of elements that represent the spectrum of
different approaches found in the literature. This is illustrated in
the taxonomy in Figure 2.
With respect to knowledge sources, we found three
broad categories: rather specialized and formalized scientific
knowledge, everyday world knowledge, and more
intuitive expert knowledge. We found the most informed
machine learning papers for scientific knowledge. With
respect to knowledge representations, we found diverse
and fine-grained approaches and distilled them into eight categories
(algebraic equations, differential equations, simulation results,
spatial invariances, logic rules, knowledge graphs,
probabilistic relations and human feedback). Regarding
knowledge integration, we found approaches for all stages
of the machine learning pipeline, from the training data
and the hypothesis set, through the learning algorithm, to the
final hypothesis. However, most informed machine learning
papers consider the two central stages, i.e., the hypothesis set
and the learning algorithm.
Depending on the perspective, the taxonomy can be
read from either of two sides: An application-oriented
user might prefer to read the taxonomy from left
to right, starting with some given knowledge source and
then selecting representation and integration. Conversely, a
method-oriented developer or researcher might prefer to
read the taxonomy from right to left, starting with some
given integration method. For both perspectives, knowledge
representations are important building blocks and constitute
an abstract interface that connects the application-oriented and the
method-oriented side.
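The two reading directions can be sketched in code. The following is a minimal illustration, assuming the taxonomy is stored as a plain dictionary and approaches as (source, representation, integration) triples. The element names are taken from Figure 2. The listed approaches, the variable names and the two helper functions are hypothetical and do not reproduce the survey results.

```python
from collections import defaultdict

# Element names as listed in Figure 2; the dictionary layout is an assumption.
TAXONOMY = {
    "source": ["scientific knowledge", "world knowledge", "expert knowledge"],
    "representation": [
        "algebraic equations", "differential equations", "simulation results",
        "spatial invariances", "logic rules", "knowledge graphs",
        "probabilistic relations", "human feedback",
    ],
    "integration": [
        "training data", "hypothesis set", "learning algorithm", "final hypothesis",
    ],
}

# Hypothetical approaches as (source, representation, integration) triples.
# These are placeholders for illustration, NOT findings of the survey.
APPROACHES = [
    ("scientific knowledge", "differential equations", "learning algorithm"),
    ("scientific knowledge", "simulation results", "training data"),
    ("world knowledge", "knowledge graphs", "hypothesis set"),
    ("expert knowledge", "human feedback", "learning algorithm"),
]

# Check that every placeholder triple only uses elements of the taxonomy.
for src, rep, integ in APPROACHES:
    assert src in TAXONOMY["source"]
    assert rep in TAXONOMY["representation"]
    assert integ in TAXONOMY["integration"]


def read_left_to_right(source):
    """Application-oriented view: for a given knowledge source, list the
    representations in use and the integration stages each one reaches."""
    options = defaultdict(set)
    for src, rep, integ in APPROACHES:
        if src == source:
            options[rep].add(integ)
    return {rep: sorted(stages) for rep, stages in options.items()}


def read_right_to_left(integration):
    """Method-oriented view: for a given integration stage, list the
    representations in use and the knowledge sources behind each one."""
    options = defaultdict(set)
    for src, rep, integ in APPROACHES:
        if integ == integration:
            options[rep].add(src)
    return {rep: sorted(sources) for rep, sources in options.items()}


if __name__ == "__main__":
    print(read_left_to_right("scientific knowledge"))
    print(read_right_to_left("learning algorithm"))
```

In both functions the result is keyed by representation, which mirrors its role as the interface between the application-oriented and the method-oriented side.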
3.2.2 Frequent Approaches
The taxonomy serves as a classification framework and
allows us to identify frequent approaches of informed
machine learning. In our literature survey, we categorized each
research paper with respect to each of the three taxonomy
dimensions.
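The categorization step can be sketched as a simple count over (source, representation, integration) triples. This is a minimal illustration, assuming each paper receives exactly one entry per dimension. The threshold of four papers for a main path is taken from the caption of Figure 2. The paper identifiers, their assigned categories and the function names are hypothetical placeholders and do not reproduce the survey data.

```python
from collections import Counter

# Threshold for a "main path" as stated in the caption of Figure 2.
MAIN_PATH_MIN_PAPERS = 4

# Hypothetical categorization: paper id -> (source, representation, integration).
# Placeholder data for illustration only, NOT results of the survey.
categorized_papers = {
    "paper_01": ("scientific knowledge", "differential equations", "learning algorithm"),
    "paper_02": ("scientific knowledge", "differential equations", "learning algorithm"),
    "paper_03": ("scientific knowledge", "differential equations", "learning algorithm"),
    "paper_04": ("scientific knowledge", "differential equations", "learning algorithm"),
    "paper_05": ("world knowledge", "knowledge graphs", "hypothesis set"),
    "paper_06": ("world knowledge", "knowledge graphs", "hypothesis set"),
    "paper_07": ("expert knowledge", "human feedback", "learning algorithm"),
}


def count_paths(papers):
    """Count how many papers share each combination of entries
    across the three taxonomy dimensions."""
    return Counter(papers.values())


def main_paths(path_counts, min_papers=MAIN_PATH_MIN_PAPERS):
    """Return the combinations supported by at least `min_papers` papers."""
    return [path for path, n in path_counts.items() if n >= min_papers]


if __name__ == "__main__":
    counts = count_paths(categorized_papers)
    for path, n in counts.most_common():
        print(n, " -> ".join(path))
    print("main paths:", main_paths(counts))
```

The counts correspond to the path widths in the Sankey diagram of Figure 2: the more papers share a combination, the broader its path.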
Paths through the Taxonomy. A specific combination of entries
across the taxonomy dimensions, when visually highlighted
and connected, figuratively forms a
path through the taxonomy. Such paths represent specific
approaches towards informed learning and we illustrate