A Communication-Efficient Parallel Algorithm for Decision Tree

Qi Meng^{1,∗}, Guolin Ke^{2,∗}, Taifeng Wang^{2}, Wei Chen^{2}, Qiwei Ye^{2}, Zhi-Ming Ma^{3}, Tie-Yan Liu^{2}

^{1} Peking University   ^{2} Microsoft Research   ^{3} Chinese Academy of Mathematics and Systems Science

^{1} qimeng13@pku.edu.cn; ^{2} {Guolin.Ke, taifengw, wche, qiwye, tie-yan.liu}@microsoft.com; ^{3} mazm@amt.ac.cn
Abstract
Decision tree (and its extensions such as Gradient Boosting Decision Trees and Random Forest) is a widely used machine learning algorithm, due to its practical effectiveness and model interpretability. With the emergence of big data, there is an increasing need to parallelize the training process of decision tree. However, most existing attempts along this line suffer from high communication costs. In this paper, we propose a new algorithm, called Parallel Voting Decision Tree (PV-Tree), to tackle this challenge. After partitioning the training data onto a number of (e.g., M) machines, this algorithm performs both local voting and global voting in each iteration. For local voting, the top-k attributes are selected from each machine according to its local data. Then, the globally top-2k attributes are determined by majority voting among these local candidates. Finally, the full-grained histograms of the globally top-2k attributes are collected from the local machines in order to identify the best (most informative) attribute and its split point. PV-Tree achieves a very low communication cost (independent of the total number of attributes) and thus scales out very well. Furthermore, theoretical analysis shows that this algorithm can learn a near-optimal decision tree, since it finds the best attribute with high probability. Our experiments on real-world datasets show that PV-Tree significantly outperforms existing parallel decision tree algorithms in the trade-off between accuracy and efficiency.
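To make the per-iteration flow concrete, the following is a minimal, self-contained sketch of one PV-Tree split decision, not the authors' implementation: the function names, the histogram layout (one row of [count, sum_y, sum_y^2] per bin), and the variance-reduction gain criterion are illustrative assumptions.

```python
# Minimal sketch of one PV-Tree split decision over M data partitions.
from collections import Counter
import numpy as np

def histogram_gain(hist):
    """Best variance-reduction gain over the split points of one attribute.
    `hist` has one row per bin: [count, sum_y, sum_y_squared]."""
    count, sy, syy = hist[:, 0], hist[:, 1], hist[:, 2]
    n, s, ss = count.sum(), sy.sum(), syy.sum()
    total_loss = ss - s * s / max(n, 1e-12)
    best_gain, best_bin = 0.0, None
    cn = cs = css = 0.0
    for b in range(len(hist) - 1):                 # candidate split after bin b
        cn += count[b]; cs += sy[b]; css += syy[b]
        rn, rs, rss = n - cn, s - cs, ss - css
        if cn == 0 or rn == 0:
            continue
        gain = total_loss - (css - cs * cs / cn) - (rss - rs * rs / rn)
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_gain, best_bin

def local_vote(local_hists, k):
    """Local voting: rank attributes by the gain on this machine's
    local histograms and propose the local top-k attributes."""
    gains = {a: histogram_gain(h)[0] for a, h in local_hists.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]

def global_vote(proposals, k):
    """Global voting: the 2k attributes proposed by the most machines
    become the global candidates."""
    votes = Counter(a for proposal in proposals for a in proposal)
    return [a for a, _ in votes.most_common(2 * k)]

def pv_tree_split(per_machine_hists, k):
    """One PV-Tree iteration: only the proposals and 2k full-grained
    histograms need to cross the network."""
    proposals = [local_vote(h, k) for h in per_machine_hists]
    candidates = global_vote(proposals, k)
    best = None
    for a in candidates:                           # merge only 2k histograms
        merged = sum(h[a] for h in per_machine_hists)
        gain, split_bin = histogram_gain(merged)
        if best is None or gain > best[0]:
            best = (gain, a, split_bin)
    return best                                    # (gain, attribute, split bin)
```

In a distributed run, local_vote would execute on each worker, and only the k proposals per worker plus the merged histograms of the 2k global candidates would be communicated (e.g., via an all-reduce), which is why the communication cost is independent of the total number of attributes.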
1 Introduction

Decision tree [16] is a widely used machine learning algorithm, since it is practically effective and the rules it learns are simple and interpretable. Based on decision tree, researchers have developed further algorithms such as Random Forest (RF) [3] and Gradient Boosting Decision Trees (GBDT) [7], which have demonstrated very promising performance in various learning tasks [5].

In recent years, with the emergence of very big training data (which cannot be held on one single machine), there has been an increasing need to parallelize the training process of decision tree. To this end, there have been two major categories of attempts:²
∗ Denotes equal contribution. This work was done when the first author was visiting Microsoft Research Asia.
² There is another category of works that parallelizes the training of sub-trees once a node is split [15]; it requires the training data to be moved between machines many times and is thus inefficient. Moreover, there are also works that accelerate decision tree construction by using pre-sorting [13, 19, 11] and binning [17, 8, 10], or by employing a shared-memory-processors approach [12, 1]. However, they are out of our scope.
30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.