1902 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 25, NO. 12, DECEMBER 2015
applied to multimedia mining, and the results of k-means
clustering and background subtraction have been presented.
Moise et al. [15] propose to employ MapReduce to accelerate
search over large collections of images. MapReduce is used
to find the nearest neighbors for the generation of image
tags in [16]. Compared with MapReduce, which abstracts
computations into map and reduce operations, the proposed
CHCF is more general because its filter paradigm admits
arbitrary operations; moreover, the proposed CHCF employs
adaptive data partitioning and knowledge-based hierarchical
scheduling to address the challenge of load imbalance.
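The contrast can be illustrated with a minimal sketch. The names below (`mapreduce`, `Filter`, `run_pipeline`) are hypothetical and are not part of the actual CHCF interface; the point is only that MapReduce fixes the computation into a map phase followed by a keyed reduce, whereas a filter paradigm composes arbitrary stream transformations.

```python
# Hypothetical sketch: MapReduce's fixed two-phase shape vs. a general
# filter pipeline. These names are illustrative, not the CHCF API.
from collections import defaultdict

def mapreduce(records, map_fn, reduce_fn):
    """MapReduce fixes the computation shape: map, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):   # map phase
            groups[key].append(value)       # shuffle/group by key
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

class Filter:
    """A filter is any stream transformation; pipelines compose freely."""
    def __init__(self, fn):
        self.fn = fn
    def __call__(self, stream):
        return self.fn(stream)

def run_pipeline(stream, filters):
    for f in filters:
        stream = f(stream)
    return list(stream)

# Word count expressed in the MapReduce style.
counts = mapreduce(
    ["a b", "b c"],
    map_fn=lambda line: [(w, 1) for w in line.split()],
    reduce_fn=lambda k, vs: sum(vs),
)  # {"a": 1, "b": 2, "c": 1}

# A filter chain with an extra selection stage, which a strict
# map/reduce decomposition does not express as naturally.
tokenize = Filter(lambda lines: (w for line in lines for w in line.split()))
drop_c = Filter(lambda words: (w for w in words if w != "c"))
words = run_pipeline(["a b", "b c"], [tokenize, drop_c])  # ["a", "b", "b"]
```

Any callable over a stream can serve as a filter stage, which is the sense in which the filter paradigm subsumes the map and reduce operations as special cases.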
As far as heterogeneous clusters are concerned, several
frameworks or paradigms have been designed in recent years.
In [17], StarPU is introduced for numerical kernel design
to execute parallel tasks on a shared-memory machine with
heterogeneous hardware. Since the number of processors in
a shared-memory machine is limited, the predefined schedul-
ing strategy employed by StarPU improves performance
effectively. However, the scheduler proposed by StarPU may
become ineffective when ported to distributed systems. This
is the reason why we design the knowledge-based hierarchical
scheduler, which is hybrid in the sense that it combines static
and dynamic scheduling policies and is thus more suitable for
distributed systems.
systems. In [18], a high-level model for parallel programming,
compiler, and runtime, called merge, is proposed, which
is similar to StarPU. Merge also aims at shared-memory
machines with heterogeneous hardwares and employs the
MapReduce programming paradigm to simplify the interface
for users, which in turn constrains its applicable range. In [19],
an elastic computing framework is proposed, which uses an
adapter to hide the differences among various kinds of processors,
such as GPUs, the Intel Phi MIC, and even field-programmable gate
arrays. However, the elastic computing framework focuses on
the design of elastic functions and provides little discussion of
scheduling. Inspired by the elastic computing framework,
the proposed CHCF employs a similar technique, called the filter
interface, to hide the differences among the underlying processors
and to remain open to future extension to other processors.
He et al. [20] propose Mars, a GPU-based MapReduce framework,
which evaluates the benefits of the cooperative usage of CPUs and
GPUs on a number of traditional applications such as string matching
and matrix multiplication. However, the scheduling policy of
Mars is not discussed in [20].
In addition to performance and scalability, a number of
works have made efforts on programmability, which share
similarities with the proposed CHCF on the goal of facilitating
domain-specific application development. In [21], a compiler
is proposed to enable users to write hybrid CPU/GPU code
by utilizing the OpenMP [22] directives. Ghoting et al. [23]
offer SystemML, a high-level declarative language for pro-
gramming machine learning applications that are compiled
and executed with MapReduce. Compared with SystemML,
the proposed CHCF compiler is hybrid in the sense that
C/C++ and Java can be utilized to develop filters. Ma and
Agrawal [24] propose a code generation system, referred
to as AUTO-GC, to translate data mining applications onto
GPU clusters. The results of two popular data mining algo-
rithms, k-means clustering and principal component
analysis, are reported to demonstrate that good scalability
is achieved without noticeable overhead in the trans-
lated code. Compared with AUTO-GC, which is essentially a
code generation system for reduction operations, the pro-
posed CHCF provides a more general programming paradigm
with filters. Lee et al. [25] develop a similar compiler
framework for translating standard OpenMP applications to
Compute Unified Device Architecture-based general-purpose
GPU applications. However, this compiler operates in a
source-to-source translation manner.
Regarding load imbalance, the centralized paradigm is
widely used by scheduling policies such as [26] and [27],
which employ a master node to take charge of scheduling.
Although this paradigm is simple and effective when the num-
ber of nodes in the system is relatively small, its limitations
are obvious. First, its effectiveness may degrade sharply
as the number of nodes increases. Second, the master makes
a one-time scheduling assignment, whereas the information
on which the scheduling decision is based changes constantly.
To overcome the disadvantage of centralized scheduling,
a distributed load balancing approach, called distributed
adaptive scheduling (DAS), is proposed in [28] for scientific
applications expressed as iterative loops with dependencies.
DAS combines the information about the run queue with the
timing history of each computation worker to make schedul-
ing decisions. It is verified that DAS can achieve significant performance
improvements over centralized approaches. However, as high-
performance clusters become heterogeneous and the scale of
the clusters continues to increase, DAS meets some challenges.
Since the computation workers have to communicate with
each other to make scheduling decisions, the overheads for
network communication may hurt the overall performance and
the effectiveness of scheduling when the number of nodes is
large. Moreover, the efficiency of DAS may become degraded
when the nodes are equipped with multiple GPUs. In contrast
to DAS, the proposed knowledge-based hierarchical scheduler
adopts a two-level design, in which scheduling decisions are made
by the master scheduler and the node schedulers cooperatively.
In other words, the master scheduler coordinates the node
schedulers, and the node schedulers make the actual scheduling
decisions. Therefore, the communication overheads are reduced
and the scheduling efficiency is improved, since scheduling
computations are distributed from the master scheduler to the
node schedulers.
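The two-level division of labor can be sketched in a few lines. This is an illustrative sketch only, with hypothetical class and method names (`MasterScheduler`, `NodeScheduler`, `dispatch`) that do not correspond to the actual CHCF interfaces: the master routes each task to a node using only coarse load information, and the node scheduler makes the fine-grained per-processor decision locally.

```python
# Illustrative two-level (hierarchical) scheduling sketch; all names are
# hypothetical and do not reflect the real CHCF implementation.
class NodeScheduler:
    """Makes local scheduling decisions for the processors on one node."""
    def __init__(self, name, processors):
        self.name = name
        self.load = {p: 0 for p in processors}  # queued tasks per processor

    def assign(self, task):
        # Local decision: pick the least-loaded processor on this node.
        proc = min(self.load, key=self.load.get)
        self.load[proc] += 1
        return (self.name, proc, task)

    def total_load(self):
        return sum(self.load.values())

class MasterScheduler:
    """Coordinates node schedulers; makes no per-processor decisions itself."""
    def __init__(self, nodes):
        self.nodes = nodes

    def dispatch(self, task):
        # Coarse coordination only: route the task to the least-loaded node;
        # that node's scheduler then chooses the processor locally.
        node = min(self.nodes, key=lambda n: n.total_load())
        return node.assign(task)

master = MasterScheduler([
    NodeScheduler("node0", ["cpu", "gpu0", "gpu1"]),
    NodeScheduler("node1", ["cpu", "gpu0"]),
])
placements = [master.dispatch(t) for t in range(5)]
```

Because the master only compares per-node aggregates, the per-task decision cost stays on the nodes, which is the sense in which distributing the scheduling computation reduces the master's communication and computation burden.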
III. CHCF OVERVIEW
A. Architecture
The overall architecture of the proposed CHCF is depicted
in Fig. 1. As mentioned above, CHCF aims at developing and
executing multimedia mining applications on hybrid systems.
To achieve this, CHCF provides a high-level programming
language coupled with a utility library, with which multimedia
mining algorithms can be implemented to run on CPUs
and/or GPUs, together with a set of tools designed for compilation,
execution, and optimization. In CHCF, the popular filter-stream
programming paradigm [29] is adopted; therefore,
all computing units, either atomic or compound, are repre-
sented by filters and effectively communicate with each other