The Who, What, and Why of Data Lake Table Formats
May 16, 2024
48 min
Free
data-lakehouse
table-formats
apache-hudi
delta-lake
data-partitioning
z-order-sorting
schema-evolution
data-lake
apache-iceberg
data-management
data-engineering
metadata
Description
This presentation examines data lake table formats, explaining what a data lakehouse is and tracing the evolution of data storage from traditional databases to data lakes and finally to data lakehouses. The talk focuses on three primary table formats: Apache Iceberg, Apache Hudi, and Delta Lake, detailing their architectures, how they manage data and metadata, and their respective strengths and weaknesses. It covers key features such as ACID transactions, schema evolution, copy-on-write vs. merge-on-read strategies, Z-ordering for data layout optimization, and partition evolution. The speaker also touches on format interoperability and the importance of these formats in enabling modern data analytics and AI/ML workloads.