Building a Multimodal RAG: A Step-by-Step Guide

This talk provides a step-by-step guide for AI/ML practitioners on building a multimodal Retrieval-Augmented Generation (RAG) system. Ivan Nardini and Holt Skinner from Google Cloud demonstrate how to handle data modalities such as PDFs, images, audio, and video, and how to integrate them into a RAG pipeline using Google Cloud's Vertex AI and the Gemini API. The presentation covers the key stages from prototype to MVP, focusing on data preprocessing, embedding generation, retrieval mechanisms, and evaluation metrics for multimodal RAG applications.

Description