PGWinFunc: Optimizing Window Aggregate
Functions and Its Application for LBS Patterns
Jiansong Ma
1
Yu Cao
2
Xiaoling Wang
1
Chaoyong Wang
1
Cheqing Jin
1
Aoying Zhou
1
1
Shanghai Key Laboratory of Trustworthy Computing,
East China Normal University, Shanghai, China
{mjs, xlwang, ayzhou}@sei.ecnu.edu.cn
2
EMC Labs
Beijing, China
yu.cao@emc.com
Abstract—In modern cities, more and more people drive the
vehicles, equipped with the GPS devices, which create a large
scale of trajectories. Gathering and analyzing these large-scale
trajectory data provide a new opportunity to understand the
city dynamics and to reveal the hidden social and economic
phenomena. This paper designs and implements a tool, named as
PGWinFunc, to analyse trajectory data by extending a traditional
relational database. Firstly we introduce some efficient query
process and optimization methods for SQL Window Aggregate
Functions in PostgreSQL. Secondly, we present how to mine
the LBS(Location Based Service) patterns, such as the average
speed and traffic flow, from the large-scale trajectories with
SQL expression with Window Aggregate Functions. Finally,
the effectiveness and efficiency of the PGWinFunc tool are
demonstrated and we also visualized the results by BAIDU MAP.
I. INTRODUCTION
Trajectory data generated by moving vehicles becomes more
and more important, and it provides us an unprecedented
opportunity to understand the city dynamics and reveal the
hidden social and economic phenomena. In this paper, we
focus on storing and analyzing these large-scale real digital
trajectory data by the traditional RDB(Relational Database
Systems), PostgreSQL.
SQL Window Aggregate Functions perform common anal-
yses such as ranking, percentiles, moving averages and cumu-
lative in a flexible, intuitive and efficient manner, overcoming
shortcomings of the traditional alternatives such as grouped
queries, correlated subqueries and self-joins [2], [3]. As one
of the most useful standardized extensions to SQL since the
SQL:2003 standard, Window Aggregate Functions have been
widely implemented in most of the major commercial and
open-source relational database systems (e.g. Oracle, DB2,
SQL Server, Teradata, Pivotal Greenplum and PostgreSQL),
as well as in some emerging Big Data systems (e.g. Google
Tenzing, SAP HANA, Amazon Redshift, Pivotal HAWQ and
Cloudera Impala). With the Window Aggregate Functions, we
can easily tackle with the large-scale trajectories to mine the
hidden social patterns and phenomena.
In current database systems [4], [5], in principle a Window
Aggregate Function is evaluated over the windowed table in
a two-phase manner. In the first phase, the windowed table
is reordered into a set of physical window partitions, each
of which has a distinct value of the PARTITION BY key
and is sorted on the ORDER BY key. The generated window
partitions are pipelined into the second phase, where the
Window Aggregate Function is sequentially invoked for each
row over its window frame within each window partition.
While existing techniques [4], [6], [7], [8] are available to
optimize the table reordering operation in the first phase, there
are rare previous studies investigating how to save the costs of
window function calls in the second phase, which is exactly
we want to set off in this system.
In this demo, we illustrate our design of a new data analysis
system, PGWinFunc, which is implemented by extending Post-
greSQL. In this system, the user submits his SQL query with
Window Aggregate Function and obtains the analysis results
by BAIDU MAP [1] visualization technology. In PGWinFunc,
user gets the intuitive visual insights of the analysis results
instead of the traditional relational tables. These visualizations
help users quickly understand the city dynamics and reveal the
hidden social and economic phenomena.
II. FRAMEWORK
The framework of the PGWinFunc system is illustrated
in Figure 1. The PGWinFunc system is divided into two
components: online part and offline part.
Some preprocess work, including map matching and loading
data into the PostgreSQL table, are conducted in the offline
mode. The component of map-matching is to map the LBS
data onto the road-network with the most simple map matching
algorithm(we map the GPS point onto the nearest road).
For this application, we design the trajectory schema with
six basic attributes: year, month, day, hour, minute and second.
We also add some statistic attributes into this table, including
the car numbers, the average speed, the max speed and so
on. Finally, we load the trajectory data into the table in
PostgreSQL database.
In the online part, we execute users’ SQL query and
return the results by using BAIDU MAP visualization. The
PGWinFunc system accepts the request from the UI(User