Introduction to the Unified Data Warehouse/Data Lake
Data warehousing continues to evolve. As organizations collect and analyze large amounts
of disparate and diverse data, they are often looking to modernize their data warehousing
environments to support new use cases, such as powering the machine learning pipeline at the
heart of enterprise AI.
TDWI research indicates that newer data types such as machine data, text data, image data, and
other unstructured and semistructured data sources are gaining popularity for use in analytics.
Different users—such as data scientists, business analysts, and business users—want to derive
insights and take action on this data. Yet in many cases, the evolution of complex data has
outstripped a company’s ability to manage it for business value.
For years, TDWI research has tracked the modernization and evolution of data warehouse (DW)
architectures, as well as the emergence of the data lake (DL) design pattern for organizing
massive volumes of analytics data.¹ We have seen both the DW and the DL grow in popularity,
especially in the cloud. The new generation of DWs are, in fact, DLs that are designed, first and
foremost, to govern the cleansed, consolidated, and sanctioned data used to build and train
machine learning models.
In recent years, enterprise data practitioners have seen DW and DL architectures converge into
a powerful new type of platform. Within this evolved silo-busting environment, DWs and DLs
incorporate distinct but integrated, overlapping, and interoperable architectures that include
standard functional layers. This unified DW/DL architecture continues to evolve, blurring the
architectural distinctions between these formerly discrete approaches to deploying, processing,
and managing analytics data.
One of the hallmarks of the unified DW/DL architecture is its ability to support a wider range
of data structures, end-user types, and business use cases than either of its constituent
micro-architectures. This may explain why 89% of respondents to this survey view the
unified DW/DL as an opportunity.
The Current State of the Data Warehouse and Data Lake
DWs have their roots in business intelligence (BI). Most DWs—whether legacy or modern—were
designed primarily for business reporting and related practices in performance management,
dashboards, self-service, and OLAP, enabled by squeaky-clean, aggregated, and transformed data.
BI remains a core use case of the unified DW/DL. As organizations strive to derive value from
their data, they are often modernizing their DW environments to support self-service, advanced
analytics, and data sharing.
Nevertheless, artificial intelligence’s many use cases are the principal driver behind the evolution
of DWs into unified DW/DLs.
Initially built on the Apache Hadoop open-source data analytics platform, DLs have evolved
over the past decade to include object stores and run on public, private, hybrid, and other cloud
architectures. DLs primarily support artificial intelligence (AI), machine learning (ML), and
other advanced analytics that may require a wider range of unstructured and semistructured
data types, may scale to much larger volumes of stored data, and often handle more complex and
dynamic analytics workloads than the traditional DW.
¹ See, for instance, the 2018 TDWI Best Practices Report: Multiplatform Data Architectures, available at tdwi.org/bpreports.