Data Versioning in Generative AI: A Pathway to Cost-effective ML

May 16, 2024 · 36 min

Description

In this talk, Dmitry Petrov, CEO of DVC, explores the challenges specific to data versioning in Generative AI workflows and how to address them. He discusses strategies for reducing costs by minimizing processing time and the number of API calls made to external models such as ChatGPT. The talk also covers practical ways for ML researchers to share datasets and collaborate smoothly, and it examines how Generative AI has changed data versioning, which now extends to annotations and embeddings.

Key takeaways include:
1. Differences in data management for Generative AI vs. traditional ML.
2. Saving costs on compute and API calls through data versioning.
3. Improving team collaboration via dataset sharing.
4. Efficiently versioning annotations, embeddings, and auto-labels alongside data.
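The cost-saving idea behind takeaways 2 and 4 can be sketched with content addressing. Each data item is identified by a hash of its bytes, and the output of an expensive model call (an auto-label, an embedding) is stored under that hash. An item that carries over unchanged into a new dataset version is then never sent to the model again. This is a minimal sketch under stated assumptions: `AnnotationCache`, `content_hash`, and `fake_labeler` are hypothetical names, and `fake_labeler` stands in for a paid API call. It is not DVC's API or the method presented in the talk.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def content_hash(data: bytes) -> str:
    """Identify a data item by the SHA-256 of its bytes, not by its file name."""
    return hashlib.sha256(data).hexdigest()


class AnnotationCache:
    """Hypothetical on-disk cache mapping content hash -> model output.

    Each result is stored as <cache_dir>/<hash>.json, so the annotation is
    versioned by the exact content it was computed from.
    """

    def __init__(self, cache_dir: Path, model_fn: Callable[[bytes], dict]):
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.model_fn = model_fn  # stands in for an expensive external API call
        self.calls = 0  # how many times model_fn was actually invoked

    def get(self, data: bytes) -> dict:
        path = self.cache_dir / f"{content_hash(data)}.json"
        if path.exists():
            # Cache hit: reuse the stored result, no API call.
            return json.loads(path.read_text())
        self.calls += 1
        result = self.model_fn(data)
        path.write_text(json.dumps(result))
        return result


def fake_labeler(data: bytes) -> dict:
    # Hypothetical placeholder for a paid call to an external model.
    return {"label": "long" if len(data) > 5 else "short"}


with tempfile.TemporaryDirectory() as tmp:
    cache = AnnotationCache(Path(tmp), fake_labeler)
    v1 = [b"hello", b"hello world"]
    v2 = v1 + [b"new sample"]  # the next dataset version adds one item
    labels = [cache.get(item) for item in v1 + v2]
    print(cache.calls)  # 3: only unique contents reach the model
```

Keying on content rather than file names means that renaming or reshuffling files does not trigger new API calls, while any edit to an item's bytes does. A shared cache directory would also let several researchers reuse each other's paid results.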