Contents lists available at ScienceDirect
Journal of Visual Languages and Computing
journal homepage: www.elsevier.com/locate/jvlc
Exploring linear projections for revealing clusters, outliers, and trends in subsets of multi-dimensional datasets
Jiazhi Xiaᵃ, Le Gaoᵃ, Kezhi Kongᵇ, Ying Zhaoᵃ,⁎, Yi Chenᶜ, Xiaoyan Kuiᵃ, Yixiong Liangᵃ
ᵃ School of Information Science and Engineering, Central South University, Changsha, China
ᵇ State Key Lab of CAD&CG, Zhejiang University, Hangzhou, China
ᶜ Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing Technology and Business University, Beijing, China
ARTICLE INFO
Keywords:
Multi-dimensional data
Projection
Visual exploring
ABSTRACT
Identifying patterns in 2D linear projections is important for understanding multi-dimensional datasets. However, local patterns, which are formed by only part of the data points, are usually obscured by noise and missed by traditional quality measures that evaluate the whole dataset. In this paper, we propose an interactive interface for exploring 2D linear projections that reveal visual patterns in data subsets. First, we propose a voting-based algorithm that recommends the optimal projection, i.e., the projection in which the identified pattern appears most salient. Specifically, we propose three point-wise quality metrics of 2D linear projections for outliers, clusters, and trends, respectively. For each sampled projection, we measure its importance by accumulating the metrics of the selected points, and the projection with the highest importance is recommended. Second, we design an exploration interface with a scatterplot, a projection trail map, and a control panel, which allows users to explore projections by specifying data subsets of interest. Finally, using three datasets, we demonstrate the effectiveness of our approach through three case studies that explore clusters, outliers, and trends.
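The recommendation step described above (sample projections, score every point with a point-wise metric, accumulate the scores of the selected points, and keep the projection with the highest total) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: projections are sampled as random orthonormal 2D bases via Gram-Schmidt, and the point-wise outlier metric is a hypothetical stand-in (distance to the k-th nearest neighbour in the projection, normalised by its mean over all points). The names `random_projection`, `knn_distance`, and `recommend` are illustrative only.

```python
import math
import random


def random_projection(dim, rng):
    """Sample a random 2D orthonormal basis (u, v) via Gaussian vectors and Gram-Schmidt."""
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm_u = math.sqrt(sum(x * x for x in u))
    u = [x / norm_u for x in u]
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dot = sum(a * b for a, b in zip(u, v))
    v = [b - dot * a for a, b in zip(u, v)]  # remove the component of v along u
    norm_v = math.sqrt(sum(x * x for x in v))
    v = [x / norm_v for x in v]
    return u, v


def project(points, basis):
    """Map each d-dimensional point p to its 2D coordinates (p.u, p.v)."""
    u, v = basis
    return [
        (sum(p_i * u_i for p_i, u_i in zip(p, u)), sum(p_i * v_i for p_i, v_i in zip(p, v)))
        for p in points
    ]


def knn_distance(proj, i, k):
    """Hypothetical point-wise outlier metric: distance from point i to its k-th nearest 2D neighbour."""
    xi, yi = proj[i]
    dists = sorted(math.hypot(xi - x, yi - y) for j, (x, y) in enumerate(proj) if j != i)
    return dists[k - 1]


def recommend(points, selected, n_samples=200, k=3, seed=0):
    """Return the sampled basis that collects the most votes from the selected points."""
    rng = random.Random(seed)
    dim = len(points[0])
    best_basis, best_importance = None, -math.inf
    for _ in range(n_samples):
        basis = random_projection(dim, rng)
        proj = project(points, basis)
        scores = [knn_distance(proj, i, k) for i in range(len(proj))]
        mean_score = sum(scores) / len(scores)
        if mean_score == 0.0:
            continue  # every point collapsed onto its neighbours; nothing to compare
        # Each selected point votes with its score, normalised by the projection-wide mean.
        importance = sum(scores[i] / mean_score for i in selected)
        if importance > best_importance:
            best_basis, best_importance = basis, importance
    return best_basis, best_importance


if __name__ == "__main__":
    # Toy data: a 5x4 grid in the x-y plane plus one point that deviates only along z.
    pts = [(float(i % 5), float(i // 5), 0.0) for i in range(20)] + [(2.0, 2.0, 10.0)]
    (u, v), score = recommend(pts, selected=[20])
    print("z-weight of recommended plane:", u[2] ** 2 + v[2] ** 2, "importance:", score)
```

Normalising by the projection-wide mean keeps a projection from winning merely because it spreads all points apart; only projections in which the selected points stand out relative to the rest accumulate many votes. In the toy data, the outlier is hidden in the x-y plane, so the recommended plane should lean heavily on the z axis.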
1. Introduction
Multi-dimensional data visualization plays an important role in data exploration and understanding. Among a variety of visualization approaches, 2D linear projection remains the most popular method for providing insights into the structures and patterns of datasets [1]. Specifically, users are interested in the visual patterns of clusters, outliers, and trends in linear projections [2,3]. However, identifying interesting projections among the numerous possible ones is considered a fundamental challenge [1].
To resolve this issue, several approaches have been proposed to provide a small set of representative projections. First, quality measures are adopted to rank possible projections [4]. Quality measures for clusters (e.g., Linear Discriminant Analysis [5]), trends (e.g., the Pearson correlation coefficient), and outliers (e.g., statistical analysis [4]) have been widely studied. In particular, scagnostics [6] comprise nine measures describing the patterns of points in projections, covering outliers, shape, trend, and density (e.g., clumpy). Second, dissimilarities among projections are measured to reduce redundancy in the recommendation set [7]. Alternatively, Liu et al. [1] search for locally maximal projections to provide a representative set.
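As a concrete instance of ranking projections by a global quality measure, the sketch below scores each candidate projection by the absolute Pearson correlation of its 2D coordinates, a simple trend measure. It is illustrative only: the candidate set (here, the axis-parallel pairs of a scatterplot matrix) and the helper names `axis_basis`, `project`, and `rank_by_trend` are assumptions, and none of the cited systems is reproduced.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either coordinate is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def axis_basis(dim, i, j):
    """Basis (u, v) of the axis-parallel projection onto dimensions i and j."""
    u = tuple(1.0 if d == i else 0.0 for d in range(dim))
    v = tuple(1.0 if d == j else 0.0 for d in range(dim))
    return u, v


def project(points, basis):
    """Return the projected x and y coordinate lists for a 2D basis (u, v)."""
    u, v = basis
    xs = [sum(p_d * u_d for p_d, u_d in zip(p, u)) for p in points]
    ys = [sum(p_d * v_d for p_d, v_d in zip(p, v)) for p in points]
    return xs, ys


def rank_by_trend(points, candidates):
    """Rank candidate projections by |r| of their 2D coordinates, strongest trend first."""
    scored = [(abs(pearson(*project(points, b))), b) for b in candidates]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy data: dimension 1 is a linear function of dimension 0; dimension 2 is not.
    data = [(float(t), 2.0 * t + 1.0, float((7 * t) % 5)) for t in range(10)]
    pairs = [axis_basis(3, i, j) for i in range(3) for j in range(i + 1, 3)]
    for score, (u, v) in rank_by_trend(data, pairs):
        print(round(score, 3), u, v)
```

Because the score is computed over every point, a measure like this favours patterns that involve the whole dataset, which is exactly the limitation discussed next.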
However, most existing quality measures are defined on projections of the whole dataset. Real-world datasets often contain multiple clusters and noise, so local patterns that exist in a subset can be obscured by other components or by noise. For instance, clusters that lie in different subspaces are highly unlikely to all appear clearly in a single projection. Therefore, it is challenging to provide insight into local patterns based on global quality measures.
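The dilution of a local pattern by the rest of the data can be seen in a small worked example. In the hypothetical toy data below, ten points lie exactly on a line, and ten more share the same x values but have no clean linear relation to them; the same Pearson measure is then computed on the whole dataset and on the trend subset only. The data and the subset are invented purely for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either coordinate is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data: a perfect local trend y = x plus ten scrambled "noise" points.
trend = [(float(x), float(x)) for x in range(10)]
noise = [(float(x), float((7 * x) % 10)) for x in range(10)]
points = trend + noise
subset = list(range(10))  # indices of the points that form the local trend

global_r = pearson([p[0] for p in points], [p[1] for p in points])
local_r = pearson([points[i][0] for i in subset], [points[i][1] for i in subset])
print(f"whole dataset r = {global_r:.3f}, subset r = {local_r:.3f}")
# prints: whole dataset r = 0.636, subset r = 1.000
```

The trend is perfect within its subset, yet the whole-dataset score drops to 105/165, so a ranking based only on the global measure can pass over the projection that shows it.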
Let us consider a typical exploratory analysis scenario. When users explore projections for patterns of interest, the exploration process often contains three stages. First, users look around the projection space until a global or local pattern is observed. Because the exploration space is large and the dataset is usually complicated, it is non-trivial to find a projection with a clear pattern at this stage. More likely, users observe a noisy pattern, such as a set of densely gathered points mixed with sparsely distributed ones. Second, this observation yields a hypothesis about the existence of a local pattern. Specifically, the hypothesis is composed of the pattern type (e.g., cluster, trend, or outlier) and the subset of points that form the pattern. Third, this hypothesis motivates subsequent exploration operations to verify it. The loop of looking around, forming a hypothesis, and verifying it is performed iteratively throughout the exploration process.
https://doi.org/10.1016/j.jvlc.2018.08.003
Received 20 July 2018; Accepted 6 August 2018
⁎ Corresponding author.
E-mail addresses: xiajiazhi@csu.edu.cn (J. Xia), csugaole@csu.edu.cn (L. Gao), durantkong@zju.edu.cn (K. Kong), zhaoying@csu.edu.cn (Y. Zhao),
chenyi@th.btbu.edu.cn (Y. Chen), xykui@csu.edu.cn (X. Kui), yxliang@csu.edu.cn (Y. Liang).
Journal of Visual Languages and Computing 48 (2018) 52–60
Available online 09 August 2018
1045-926X/ © 2018 Elsevier Ltd. All rights reserved.