On the other hand, it is possible to define impurity measures for which the problem of finding optimal hyperplanes can be solved in polynomial time. For example, if one minimizes the sum of distances of misclassified examples, then the optimal solution can be found using linear programming methods (if distance is measured along one dimension only). However, classifiers are usually judged by how many points they classify correctly, regardless of how close to the decision boundary a point may lie. Thus most of the standard measures for computing impurity base their calculation on the discrete number of examples of each category on either side of the hyperplane. Section 3.3 discusses several commonly used impurity measures.
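To make the count-based view concrete, the sketch below computes one simple measure of this kind, a "sum minority" impurity: the number of examples that do not belong to the majority class on their side of the hyperplane. This is an illustrative implementation only, not OC1's code; the function name and interface are our own assumptions.

```python
# Minimal sketch (not the OC1 implementation) of a count-based impurity
# measure for an oblique split.  The hyperplane is given by a coefficient
# vector `a` and threshold `t`; an example x falls on the "left" side
# when a . x <= t.  All names here are illustrative.
from collections import Counter

def sum_minority(examples, labels, a, t):
    """Impurity = number of examples NOT in the majority class of
    their side of the hyperplane (lower means a purer split)."""
    left, right = Counter(), Counter()
    for x, y in zip(examples, labels):
        side = left if sum(ai * xi for ai, xi in zip(a, x)) <= t else right
        side[y] += 1
    # Minority count on a side = total on that side minus the majority count.
    impurity = 0
    for counts in (left, right):
        if counts:
            impurity += sum(counts.values()) - max(counts.values())
    return impurity

# Example: the split x1 + x2 <= 1 in two dimensions.
X = [(0.2, 0.3), (0.1, 0.5), (0.9, 0.8), (0.7, 0.6), (0.4, 0.4)]
y = ["A", "A", "B", "B", "A"]
print(sum_minority(X, y, a=(1.0, 1.0), t=1.0))  # -> 0: the split is pure
```

Note that the measure depends only on the discrete counts of each category per side, so moving the hyperplane without changing which side any point falls on leaves the impurity unchanged.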
Now let us address the second issue, that of the complexity of building a small tree. It is easy to show that the problem of inducing the smallest axis-parallel decision tree is NP-hard. This observation follows directly from the work of Hyafil and Rivest (1976). Note that one can generate the smallest axis-parallel tree that is consistent with the training set in polynomial time if the number of attributes is a constant. This can be done by using dynamic programming or branch and bound techniques (see Moret (1982) for several pointers). But when the tree uses oblique splits, it is not clear, even for a fixed number of attributes, how to generate an optimal (e.g., smallest) decision tree in polynomial time. This suggests that the complexity of constructing good oblique trees is greater than that for axis-parallel trees.
It is also easy to see that the problem of constructing an optimal (e.g., smallest) oblique decision tree is NP-hard. This conclusion follows from the work of Blum and Rivest (1988). Their result implies that in d dimensions (i.e., with d attributes) the problem of producing a 3-node oblique decision tree that is consistent with the training set is NP-complete. More specifically, they show that the following decision problem is NP-complete: given a training set T with n examples and d Boolean attributes, does there exist a 3-node neural network consistent with T? From this it is easy to show that the following question is NP-complete: given a training set T, does there exist a 3-leaf-node oblique decision tree consistent with T?
As a result of these complexity considerations, we took the pragmatic approach of trying to generate small trees, but not looking for the smallest tree. The greedy approach used by OC1 and virtually all other decision tree algorithms implicitly tries to generate small trees. In addition, it is easy to construct example problems for which the optimal split at a node will not lead to the best tree; thus our philosophy as embodied in OC1 is to find locally good splits, but not to spend excessive computational effort on improving the quality of these splits.
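For concreteness, the following sketch shows the generic greedy top-down scheme that this philosophy corresponds to. The helper find_locally_good_split stands in for any local split optimizer (OC1's randomized search is described later in the paper); all names, the leaf rules, and the min_size stopping parameter are illustrative assumptions, not OC1's actual interfaces.

```python
# Schematic sketch of greedy top-down decision tree induction.  Splits
# are chosen locally and never revisited; `find_locally_good_split`
# should return a hyperplane (a, t) or None if no useful split exists.

def build_tree(examples, labels, find_locally_good_split, min_size=2):
    def leaf():
        # Label a leaf with the majority class at this node.
        return {"leaf": True, "label": max(set(labels), key=labels.count)}

    # Stop when the node is pure or too small to split further.
    if len(set(labels)) == 1 or len(examples) < min_size:
        return leaf()
    split = find_locally_good_split(examples, labels)
    if split is None:
        return leaf()
    a, t = split
    left = [(x, y) for x, y in zip(examples, labels)
            if sum(ai * xi for ai, xi in zip(a, x)) <= t]
    right = [(x, y) for x, y in zip(examples, labels)
             if sum(ai * xi for ai, xi in zip(a, x)) > t]
    if not left or not right:  # degenerate split; give up at this node
        return leaf()
    return {"leaf": False, "split": (a, t),
            "left": build_tree(*zip(*left), find_locally_good_split, min_size),
            "right": build_tree(*zip(*right), find_locally_good_split, min_size)}
```

Because each split is committed to as soon as it is chosen, the total work is bounded by the cost of the local split search times the number of nodes, which is what makes the greedy scheme practical despite the hardness results above.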
2. Previous Work on Oblique Decision Tree Induction
Before describing the OC1 algorithm, we will briefly discuss some existing oblique DT induction methods, including CART with linear combinations, Linear Machine Decision Trees, and Simulated Annealing of Decision Trees. There are also methods that induce tree-like classifiers with linear discriminants at each node, most notably methods using linear programming (Mangasarian, Setiono, & Wolberg, 1990; Bennett & Mangasarian, 1992, 1994a, 1994b). Though these methods can find the optimal linear discriminants for specific goodness measures, the size of the linear program grows very fast with the number of training examples.