PGWinFunc: Optimizing PostgreSQL Window Aggregate Functions and Their Application to Trajectory Data

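PGWinFunc is a tool that optimizes window aggregate functions in the PostgreSQL database. It is particularly suited to large-scale trajectory data analysis for extracting LBS (Location-Based Service) patterns such as average speed and traffic flow. The paper details query processing and optimization methods for SQL window aggregate functions in PostgreSQL and shows how to use these optimizations to mine hidden patterns in trajectory data.

The paper starts from the observation that, in modern cities, the large-scale trajectory data produced by the many vehicles equipped with GPS devices offers a new opportunity to understand and reveal city dynamics and socio-economic phenomena. To analyze these data effectively, the authors design and implement a tool named PGWinFunc, which extends traditional relational database functionality, specifically the window aggregate functions in PostgreSQL.

First, the paper describes methods for optimizing SQL window aggregate functions in PostgreSQL. Window aggregate functions allow computation within a specific window or group of a data set, which is especially useful for time-series data such as trajectories. The optimizations include improvements to query processing, such as more efficient indexing strategies, parallel execution and memory management, to improve query performance and reduce computation time.

Second, the authors show how to use the optimized window aggregate functions to mine LBS patterns. For example, computing the average speed in each time period makes it possible to analyze how traffic flow changes over time, and analyzing vehicle movement paths makes it possible to infer congested areas. Visit frequency at specific locations can also be analyzed to identify popular places or activity patterns, which is essential for providing location-based services.

In practice, PGWinFunc could be used in intelligent transportation systems to help city planners better understand traffic conditions, or to give map service providers more accurate real-time navigation suggestions. The tool could also be applied to business analytics, for example retailers analyzing customers' shopping behavior or social media platforms tracking the geographic distribution of user activity.

In summary, PGWinFunc optimizes window aggregate functions in PostgreSQL to process large-scale trajectory data effectively, opening new possibilities for research and applications. It improves data-processing efficiency and supports deeper insight into complex spatio-temporal data, which matters for understanding how cities operate and for studying social phenomena.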

PGWinFunc: Optimizing Window Aggregate Functions and Its Application for LBS Patterns

Jiansong Ma

Yu Cao

Xiaoling Wang

Chaoyong Wang

Cheqing Jin

Aoying Zhou

Shanghai Key Laboratory of Trustworthy Computing,

East China Normal University, Shanghai, China

{mjs, xlwang, ayzhou}@sei.ecnu.edu.cn

EMC Labs

Beijing, China

yu.cao@emc.com

Abstract—In modern cities, more and more people drive vehicles equipped with GPS devices, which generate large volumes of trajectory data. Gathering and analyzing these large-scale trajectory data provides a new opportunity to understand city dynamics and to reveal hidden social and economic phenomena. This paper designs and implements a tool, named PGWinFunc, that analyzes trajectory data by extending a traditional relational database. First, we introduce efficient query processing and optimization methods for SQL Window Aggregate Functions in PostgreSQL. Second, we present how to mine LBS (Location Based Service) patterns, such as average speed and traffic flow, from large-scale trajectories using SQL expressions with Window Aggregate Functions. Finally, we demonstrate the effectiveness and efficiency of the PGWinFunc tool and visualize the results with BAIDU MAP.

I. INTRODUCTION

Trajectory data generated by moving vehicles is becoming increasingly important, and it provides an unprecedented opportunity to understand city dynamics and reveal hidden social and economic phenomena. In this paper, we focus on storing and analyzing such large-scale, real-world trajectory data with a traditional RDB (Relational Database System), PostgreSQL.

SQL Window Aggregate Functions perform common analyses such as ranking, percentiles, moving averages and cumulative aggregates in a flexible, intuitive and efficient manner, overcoming the shortcomings of traditional alternatives such as grouped queries, correlated subqueries and self-joins [2], [3]. As one of the most useful standardized extensions to SQL since the SQL:2003 standard, Window Aggregate Functions have been implemented in most major commercial and open-source relational database systems (e.g. Oracle, DB2, SQL Server, Teradata, Pivotal Greenplum and PostgreSQL), as well as in some emerging Big Data systems (e.g. Google Tenzing, SAP HANA, Amazon Redshift, Pivotal HAWQ and Cloudera Impala). With Window Aggregate Functions, we can readily process large-scale trajectories to mine hidden social patterns and phenomena.

In current database systems [4], [5], a Window Aggregate Function is in principle evaluated over the windowed table in two phases. In the first phase, the windowed table is reordered into a set of physical window partitions, each of which has a distinct value of the PARTITION BY key and is sorted on the ORDER BY key. The generated window partitions are pipelined into the second phase, where the Window Aggregate Function is invoked sequentially for each row over its window frame within each window partition. While existing techniques [4], [6], [7], [8] optimize the table reordering operation in the first phase, few previous studies have investigated how to reduce the cost of window function calls in the second phase, which is exactly what we set out to address in this system.
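The two-phase evaluation described above can be sketched in a few lines of Python. This is only an illustrative model, not PostgreSQL's executor or the PGWinFunc implementation: the function `evaluate_window`, the column names and the sample rows are all hypothetical, and the frame is assumed to be `ROWS BETWEEN n PRECEDING AND CURRENT ROW`.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence


def evaluate_window(
    rows: Sequence[dict],
    partition_key: str,
    order_key: str,
    value_key: str,
    preceding: int,
    aggregate: Callable[[List[float]], float],
) -> List[dict]:
    """Model of two-phase window evaluation (hypothetical helper).

    Assumed frame: ROWS BETWEEN <preceding> PRECEDING AND CURRENT ROW.
    """
    # Phase 1: reorder the table into window partitions, one per distinct
    # PARTITION BY value, each sorted on the ORDER BY key.
    partitions: Dict[Hashable, List[dict]] = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for part in partitions.values():
        part.sort(key=lambda r: r[order_key])

    # Phase 2: invoke the aggregate once per row over that row's frame.
    out: List[dict] = []
    for key in sorted(partitions):
        part = partitions[key]
        for i, row in enumerate(part):
            start = max(0, i - preceding)
            frame = [r[value_key] for r in part[start:i + 1]]
            out.append({**row, "win": aggregate(frame)})
    return out


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


# Made-up sample rows: road identifier, time step, observed speed.
rows = [
    {"road": "B", "t": 2, "speed": 30.0},
    {"road": "A", "t": 1, "speed": 40.0},
    {"road": "A", "t": 3, "speed": 20.0},
    {"road": "A", "t": 2, "speed": 60.0},
    {"road": "B", "t": 1, "speed": 50.0},
]
result = evaluate_window(rows, "road", "t", "speed", preceding=1, aggregate=mean)
```

In this naive model, phase two rebuilds every frame from scratch, so its cost grows with both the number of rows and the frame size. That repeated work is the kind of second-phase cost the paragraph above refers to.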

In this demo, we illustrate the design of a new data analysis system, PGWinFunc, which is implemented by extending PostgreSQL. In this system, the user submits a SQL query containing Window Aggregate Functions and obtains the analysis results through BAIDU MAP [1] visualization. Instead of traditional relational tables, PGWinFunc gives the user intuitive visual insight into the analysis results. These visualizations help users quickly understand city dynamics and reveal hidden social and economic phenomena.

II. FRAMEWORK

The framework of the PGWinFunc system is illustrated in Figure 1. The system is divided into two components: an online part and an offline part.

Preprocessing work, including map matching and loading data into the PostgreSQL table, is conducted in offline mode. The map-matching component maps the LBS data onto the road network with the simplest map-matching algorithm: each GPS point is mapped onto the nearest road.
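A nearest-road match of this kind might look like the following minimal sketch. It assumes that points are already in a planar (projected) coordinate system and that each road is a list of straight segments. The road identifiers, coordinates and function names are hypothetical, not taken from the PGWinFunc code.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Euclidean distance from point p to the closest point on segment seg."""
    (ax, ay), (bx, by) = seg
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp to the segment's extent.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def match_to_nearest_road(p: Point, roads: Dict[str, List[Segment]]) -> str:
    """Return the id of the road whose nearest segment is closest to p."""
    return min(
        roads,
        key=lambda rid: min(point_segment_distance(p, s) for s in roads[rid]),
    )


# Made-up road network: two parallel roads five units apart.
roads = {
    "road_1": [((0.0, 0.0), (10.0, 0.0))],
    "road_2": [((0.0, 5.0), (10.0, 5.0))],
}
matched = match_to_nearest_road((3.0, 1.0), roads)
```

The linear scan over all roads is adequate for a sketch. For a city-scale road network, a spatial index would normally replace it.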

For this application, we design the trajectory schema with six basic attributes: year, month, day, hour, minute and second. We also add some statistical attributes to this table, including the number of cars, the average speed, the maximum speed and so on. Finally, we load the trajectory data into the table in the PostgreSQL database.
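As a rough illustration of how such a table could be queried with window aggregates, the sketch below builds a small table with the attributes listed above and computes a two-row moving average speed and moving car count per road. Several things here are assumptions: the table and column names, the extra `road_id` column (taken to be the output of map matching) and the sample rows are invented for illustration. Python's built-in `sqlite3` module (SQLite 3.25 or later) stands in for PostgreSQL only so that the example runs without a server. The `OVER` and `WINDOW` syntax used is standard SQL that PostgreSQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: six time attributes plus statistical attributes.
# road_id and all column names are hypothetical.
conn.execute(
    """
    CREATE TABLE trajectory (
        road_id   TEXT,
        year      INTEGER,
        month     INTEGER,
        day       INTEGER,
        hour      INTEGER,
        minute    INTEGER,
        second    INTEGER,
        car_count INTEGER,
        avg_speed REAL,
        max_speed REAL
    )
    """
)

# Made-up sample rows, one per road and hour.
sample = [
    ("r1", 2014, 5, 1, 8, 0, 0, 12, 40.0, 55.0),
    ("r1", 2014, 5, 1, 9, 0, 0, 20, 20.0, 35.0),
    ("r1", 2014, 5, 1, 10, 0, 0, 8, 60.0, 70.0),
    ("r2", 2014, 5, 1, 8, 0, 0, 5, 50.0, 65.0),
    ("r2", 2014, 5, 1, 9, 0, 0, 7, 30.0, 45.0),
]
conn.executemany("INSERT INTO trajectory VALUES (?,?,?,?,?,?,?,?,?,?)", sample)

# Per road, ordered by time: aggregate over the current and previous row.
query = """
SELECT road_id,
       hour,
       AVG(avg_speed) OVER w AS moving_avg_speed,
       SUM(car_count) OVER w AS moving_traffic_flow
FROM trajectory
WINDOW w AS (
    PARTITION BY road_id
    ORDER BY year, month, day, hour, minute, second
    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
)
ORDER BY road_id, hour
"""
result = conn.execute(query).fetchall()
conn.close()
```

Each output row keeps its own road and hour while carrying an aggregate over neighboring rows, which is what a grouped query alone cannot express without a self-join or a correlated subquery.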

In the online part, we execute users' SQL queries and return the results using BAIDU MAP visualization. The PGWinFunc system accepts the request from the UI (User
