From Idea to Production: AI Infra for Scaling LLM Apps

May 16, 2024 39 min Free

MLOps World - MLOps World & Generative AI World 2024

llm ai ai-infrastructure llm-ops prompt-engineering model-deployment gpu data-pipelines rag cost-optimization generative-ai llm-applications

Description

AI applications must adapt to new models, evolving workflows, and complex debugging challenges. This talk addresses the AI infrastructure needed to scale LLM applications from beta to production, covering prompt management, data pipelines, Retrieval-Augmented Generation (RAG), cost optimization, and GPU availability. Join Guy Eshet to explore the challenges of building generative AI and LLM apps, strategies for designing adaptability into them, and ways to prepare applications for future model advancements.
