Serving GenAI Workloads At Scale With LitServe

December 5, 2024 · 41 min · Free

Description

This talk covers how to serve AI models with high throughput at scale, including techniques such as dynamic batching and autoscaling for complex GenAI workloads. It introduces LitServe, an open-source library from Lightning AI for serving any AI model efficiently. The talk discusses Retrieval Augmented Generation (RAG) and demonstrates how LitServe can be used to build scalable, robust serving solutions, including serving LLMs behind an OpenAI API-compatible interface. It also shows how to integrate LitServe into existing applications, package the resulting services as Docker images, and deploy them to cloud platforms or on-premises.