Serving GenAI Workloads At Scale With LitServe

December 05, 2024 41 min Free

MLOps World - MLOps World & Generative AI World 2024

lit-serve pytorch-lightning api-serving dynamic-batching genai ai llm python autoscaling rag openai-api docker

Description

This talk explores how to serve AI models at scale with high throughput, covering techniques such as dynamic batching and autoscaling for complex GenAI workloads. It introduces LitServe, an open-source library from Lightning AI designed to serve any AI model efficiently. The discussion also covers Retrieval-Augmented Generation (RAG) and demonstrates how LitServe can be used to build scalable, robust serving solutions, including serving LLMs behind an OpenAI-compatible API. Examples show how to integrate LitServe into existing applications, package them as Docker images, and deploy them on cloud platforms or on-premises.

