Machine Learning with Integrated Prior Knowledge: A Survey and Taxonomy of Informed ML


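The first survey paper on informed machine learning (Informed ML) examines the importance and potential of bringing prior knowledge into machine learning to address insufficient training data. Conventional machine learning has been successful in many applications, but its performance is often limited when data is scarce or of low quality. To address this, researchers proposed informed machine learning, which integrates expert knowledge, domain models, or other forms of external information into the learning system in order to improve the model's generalization and the plausibility of its decisions.

The paper first defines informed machine learning and states that what separates it from conventional machine learning is the presence of an additional knowledge source and the way that knowledge is processed. The core aspects are:

1. **Knowledge source**: Knowledge can come from several channels, such as the experience of domain experts, rules, models, databases, sensor data, or established statistical regularities. The source determines the quality and applicability of the knowledge.
2. **Knowledge representation**: Knowledge has to be converted and encoded so that an algorithm can understand and use it. Depending on the complexity of the knowledge and the needs of the machine learning model, this may involve symbolic representations, probability distributions, neural networks, or other forms.
3. **Integration method**: How the knowledge is integrated into the learning process is critical. Options include feature engineering in supervised learning, guided learning in semi-supervised or unsupervised learning, knowledge transfer in transfer learning, and reward-function design in reinforcement learning.
4. **Trade-offs and challenges**: Introducing prior knowledge can bring new challenges, such as uncertain or conflicting knowledge, overfitting, or poor adaptation to new data. Researchers therefore need to balance the validity of the knowledge against the flexibility of the model.
5. **Evaluation and applications**: The paper also covers how to measure and evaluate informed machine learning methods, together with application cases in real-world settings such as natural language processing, computer vision, medical diagnosis, and recommender systems.

Informed machine learning is an active research field with broad prospects. By combining internal and external knowledge, it tries to overcome the limitations of machine learning when data is scarce and to make systems more capable and robust. The survey gives readers a comprehensive overview of this emerging field and is a useful reference for understanding and practicing knowledge-driven machine learning.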

PREPRINT (ORIGINAL PUBLISHED AT IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING)

[Figure 2 diagram: three columns connected by a Sankey diagram.
Source (Which source of knowledge is integrated?): Scientific Knowledge (Natural Sciences, Engineering, etc.); World Knowledge (Vision, Linguistics, Semantics, General K., etc.); Expert Knowledge (Intuition, Less Formal).
Representation (How is the knowledge represented?): Algebraic Equations; Differential Equations; Simulation Results; Spatial Invariances; Logic Rules; Knowledge Graphs; Probabilistic Relations; Human Feedback.
Integration (Where is the knowledge integrated in the machine learning pipeline?): Training Data; Hypothesis Set (Network Architecture, Model Structure, etc.); Learning Algorithm (Regularization Terms, Constrained Opt., etc.); Final Hypothesis.]

Figure 2: Taxonomy of Informed Machine Learning. This taxonomy serves as a classification framework for informed machine learning and structures approaches according to the three above analysis questions about the knowledge source, knowledge representation and knowledge integration. Based on a comparative and iterative literature survey, we identified for each dimension a set of elements that represent a spectrum of different approaches. The size of the elements reflects the relative count of papers. We combine the taxonomy with a Sankey diagram in which the paths connect the elements across the three dimensions and illustrate the approaches that we found in the analyzed papers. The broader the path, the more papers we found for that approach. Main paths (at least four or more papers with the same approach across all dimensions) are highlighted in darker grey and represent central approaches of informed machine learning.

representation and knowledge integration. Each dimension contains a set of elements that represent the spectrum of different approaches found in the literature. This is illustrated in the taxonomy in Figure 2.

With respect to knowledge sources, we found three broad categories: rather specialized and formalized scientific knowledge, everyday life's world knowledge, and more intuitive expert knowledge. For scientific knowledge we found the most informed machine learning papers. With respect to knowledge representations, we found versatile and fine-grained approaches and distilled eight categories (algebraic equations, differential equations, simulation results, spatial invariances, logic rules, knowledge graphs, probabilistic relations and human feedback). Regarding knowledge integration, we found approaches for all stages of the machine learning pipeline, from the training data and the hypothesis set, over the learning algorithm, to the final hypothesis. However, most informed machine learning papers consider the two central stages.
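As a rough illustration of one combination of these categories, the sketch below takes an algebraic equation as the knowledge representation and integrates it into the learning algorithm as a regularization term. The constraint `w1 + w2 = 1`, the function name `fit_informed`, the penalty weight `lam`, and the three data points are all assumptions made up for this example; none of them come from the survey.

```python
def fit_informed(xs, ys, lam):
    """Fit y ~ w1*x1 + w2*x2 by minimising

        mean squared error + lam * (w1 + w2 - 1)**2

    The equation w1 + w2 = 1 is a hypothetical piece of prior knowledge.
    lam = 0 gives an ordinary least-squares fit.
    """
    n = len(xs)
    # Setting the gradient of the objective to zero yields a 2x2 linear
    # system A w = b; the penalty adds lam to every entry of A and b.
    a11 = sum(x1 * x1 for x1, _ in xs) / n + lam
    a12 = sum(x1 * x2 for x1, x2 in xs) / n + lam
    a22 = sum(x2 * x2 for _, x2 in xs) / n + lam
    b1 = sum(x1 * y for (x1, _), y in zip(xs, ys)) / n + lam
    b2 = sum(x2 * y for (_, x2), y in zip(xs, ys)) / n + lam
    # Solve the symmetric 2x2 system with Cramer's rule.
    det = a11 * a22 - a12 * a12
    w1 = (b1 * a22 - a12 * b2) / det
    w2 = (a11 * b2 - a12 * b1) / det
    return w1, w2


def violation(w):
    """Distance of the fitted weights from the assumed equation w1 + w2 = 1."""
    return abs(w[0] + w[1] - 1.0)


# Made-up toy data (three samples, two features), not from the survey.
xs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
ys = [0.5, 0.9, 1.1]

w_plain = fit_informed(xs, ys, lam=0.0)      # purely data-driven fit
w_informed = fit_informed(xs, ys, lam=10.0)  # fit with the knowledge penalty

print("plain:   ", w_plain, "violation:", violation(w_plain))
print("informed:", w_informed, "violation:", violation(w_informed))
```

Raising `lam` trades agreement with the few available samples for agreement with the assumed equation, which is the kind of balance a regularization-based integration has to strike.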

Depending on the perspective, the taxonomy can be regarded from either one of two sides: An application-oriented user might prefer to read the taxonomy from left to right, starting with some given knowledge source and then selecting representation and integration. Vice versa, a method-oriented developer or researcher might prefer to read the taxonomy from right to left, starting with some given integration method. For both perspectives, knowledge representations are important building blocks and constitute an abstract interface that connects the application- and the method-oriented side.

3.2.2 Frequent Approaches

The taxonomy serves as a classification framework and allows us to identify frequent approaches of informed machine learning. In our literature survey, we categorized each research paper with respect to each of the three taxonomy dimensions.
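The categorization described above can be pictured as tagging each paper with one element per dimension and counting identical triples. The sketch below is a minimal illustration under that assumption: the element names follow the taxonomy, the threshold of four papers for a main path follows the Figure 2 caption, and the function names and the `papers` list are hypothetical placeholders, not data from the survey.

```python
from collections import Counter

# Elements of the three taxonomy dimensions, as named in the taxonomy.
SOURCES = {"scientific knowledge", "world knowledge", "expert knowledge"}
REPRESENTATIONS = {
    "algebraic equations", "differential equations", "simulation results",
    "spatial invariances", "logic rules", "knowledge graphs",
    "probabilistic relations", "human feedback",
}
INTEGRATIONS = {
    "training data", "hypothesis set", "learning algorithm", "final hypothesis",
}


def count_paths(papers):
    """Count how many papers share each (source, representation, integration) triple."""
    counts = Counter()
    for source, representation, integration in papers:
        if source not in SOURCES:
            raise ValueError(f"unknown source: {source}")
        if representation not in REPRESENTATIONS:
            raise ValueError(f"unknown representation: {representation}")
        if integration not in INTEGRATIONS:
            raise ValueError(f"unknown integration: {integration}")
        counts[(source, representation, integration)] += 1
    return counts


def main_paths(counts, min_papers=4):
    """Keep only paths backed by at least `min_papers` papers."""
    return {path: n for path, n in counts.items() if n >= min_papers}


# Hypothetical placeholder tags, NOT the survey's actual paper counts.
papers = (
    [("scientific knowledge", "differential equations", "learning algorithm")] * 4
    + [("world knowledge", "knowledge graphs", "hypothesis set")] * 2
    + [("expert knowledge", "human feedback", "training data")]
)

counts = count_paths(papers)
for path, n in counts.most_common():
    print(n, " -> ".join(path))
print("main paths:", main_paths(counts))
```

In a Sankey-style drawing, each count would set the width of the corresponding path, and the entries returned by `main_paths` would be the ones highlighted.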

Paths through the Taxonomy. When visually highlighting and connecting them, a specific combination of entries across the taxonomy dimensions figuratively results in a path through the taxonomy. Such paths represent specific approaches towards informed learning and we illustrate
