Fast Data Loading for Deep Learning Workloads with LakeFS Mount
December 05, 2024
30 min
Free
data-versioning
lakefs
data-loading
git-like
data-consistency
data-lake
deep-learning
mlops
data-management
object-storage
data-pipelines
tensorflow
Description
This talk explores how LakeFS Mount speeds up data loading for deep learning workloads. Amit Kesarwani demonstrates how mounting a dataset as a local file system, combined with prefetching of metadata and data, improves GPU utilization and eliminates download delays. The presentation covers LakeFS's Git-like data versioning capabilities (branching, committing, and merging) and shows how LakeFS Mount fits into existing workflows, enabling rapid iteration and reproducible AI/ML experiments. The demo trains a TensorFlow model locally on data sourced from LakeFS and shows how the run can be reproduced by cloning the repository and remounting the dataset.
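The practical effect of mounting a dataset as a local file system is that training code can use ordinary file I/O instead of an object-store client. The sketch below is illustrative only and is not taken from the talk: the mount path `./pets`, the `DATASET_MOUNT` environment variable, the one-directory-per-label layout, and the helper names `iter_examples` and `read_bytes` are all assumptions, and the dataset is assumed to have been mounted already.

```python
import os
from pathlib import Path
from typing import Iterator, Tuple

# Hypothetical mount point: assumes the dataset was already mounted here.
# The path, env var name, and directory layout are illustrative assumptions.
MOUNT_DIR = Path(os.environ.get("DATASET_MOUNT", "./pets"))

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def iter_examples(root: Path) -> Iterator[Tuple[Path, str]]:
    """Yield (file path, label) pairs; the label is the parent directory name."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path, path.parent.name


def read_bytes(path: Path) -> bytes:
    """Read one example with plain file I/O; no object-store client is used."""
    return path.read_bytes()


if __name__ == "__main__":
    # Only walk the directory if the (assumed) mount point actually exists.
    if MOUNT_DIR.is_dir():
        for path, label in iter_examples(MOUNT_DIR):
            print(label, path.name, len(read_bytes(path)), "bytes")
```

Because the mount looks like any other directory, the same paths could be handed to a TensorFlow input pipeline (for example via `tf.data.Dataset.list_files`) without changing how the training code reads its data.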