Building a Multimodal RAG: A Step-by-Step Guide
December 09, 2024
1h 5m
Free
rag
gemini
vertex-ai
generative-ai
llm
multimodal-ai
ai-ml
google-cloud
embeddings
vector-database
data-processing
python
Description
This talk is a step-by-step guide for AI/ML practitioners to building a multimodal Retrieval-Augmented Generation (RAG) system. Ivan Nardini and Holt Skinner from Google Cloud demonstrate how to handle data modalities such as PDFs, images, audio, and video, and how to integrate them into a RAG pipeline using Google Cloud's Vertex AI and the Gemini API. The presentation covers the key stages from prototype to MVP: data preprocessing, embedding generation, retrieval mechanisms, and evaluation metrics for multimodal RAG applications.
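As a rough illustration of the retrieval and evaluation stages named above, the following minimal sketch ranks items from several modalities by cosine similarity and scores the result with recall@k. It is not the speakers' implementation: the vectors are hand-written toy values standing in for embeddings that a multimodal embedding model would produce, and the item ids, the `retrieve` and `recall_at_k` helpers, and the index layout are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, index, k=2):
    """Return the ids of the k items most similar to the query vector."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item["vector"]), reverse=True)
    return [item["id"] for item in ranked[:k]]

def recall_at_k(retrieved, relevant):
    """Fraction of the relevant ids that appear in the retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)

# Toy index: in a real pipeline each vector would come from an embedding
# model applied to a preprocessed chunk (PDF page, image, audio or video clip).
# All ids and values here are made up for illustration.
INDEX = [
    {"id": "report.pdf#p3", "modality": "pdf", "vector": [0.9, 0.1, 0.0]},
    {"id": "chart.png", "modality": "image", "vector": [0.8, 0.2, 0.1]},
    {"id": "call.mp3#00:42", "modality": "audio", "vector": [0.0, 0.9, 0.3]},
    {"id": "demo.mp4#01:10", "modality": "video", "vector": [0.1, 0.2, 0.9]},
]

query = [1.0, 0.1, 0.0]  # stand-in for an embedded user question
top = retrieve(query, INDEX, k=2)
score = recall_at_k(top, relevant=["report.pdf#p3", "demo.mp4#01:10"])
print(top, score)
```

Because every modality is mapped into one shared vector space, a single similarity search can return a PDF page and an image side by side; a production system would likely replace the linear scan with a vector database.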