Serving GenAI Workloads At Scale With LitServe
December 05, 2024
41 min
lit-serve
pytorch-lightning
api-serving
dynamic-batching
genai
ai
llm
python
autoscaling
rag
openai-api
docker
Description
This talk explores serving AI models at scale, covering techniques such as dynamic batching and autoscaling for complex GenAI workloads. It introduces LitServe, an open-source library from Lightning AI designed to serve any AI model with high throughput. The discussion covers Retrieval-Augmented Generation (RAG) and demonstrates how LitServe can be used to build scalable, robust AI serving solutions, including serving LLMs behind an OpenAI-compatible API. Examples show how to integrate LitServe into existing applications, package them as Docker images, and deploy them on cloud platforms or on-premises.
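The dynamic batching mentioned above can be sketched in plain Python: incoming requests are queued individually, and the server flushes them to the model in a single batch once the batch is full or a short timeout expires, trading a little latency for much higher throughput. This is an illustrative stdlib sketch of the idea, not LitServe's internal implementation; the function name and parameters (`dynamic_batcher`, `max_batch_size`, `batch_timeout`) are hypothetical.

```python
import queue
import time


def dynamic_batcher(requests_q, handle_batch, max_batch_size=8, batch_timeout=0.01):
    """Collect requests until the batch is full or the timeout expires,
    then run the model once over the whole batch.

    This is the core idea behind dynamic batching: one model call
    amortized across several concurrent requests.
    """
    # Block until at least one request arrives.
    batch = [requests_q.get()]
    deadline = time.monotonic() + batch_timeout
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # timeout expired: serve a partial batch
        try:
            batch.append(requests_q.get(timeout=remaining))
        except queue.Empty:
            break  # no more requests arrived in time
    return handle_batch(batch)


if __name__ == "__main__":
    q = queue.Queue()
    for i in range(3):
        q.put(i)
    # A toy "model" that doubles each input in one batched call.
    print(dynamic_batcher(q, lambda batch: [x * 2 for x in batch]))
```

In a real server this loop runs continuously on a worker thread; LitServe exposes the same behavior through batching configuration on the server rather than hand-written loops.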