The State and Future of Cloud-Native Model Serving
Description
KServe is a cloud-native open-source project for serving production ML models, built on CNCF projects such as Knative and Istio. This talk provides an update on KServe's progress toward 1.0, covering recent developments like ModelMesh and InferenceGraph, and lays out its future roadmap.

The talk delves into the Kubernetes design patterns KServe uses to achieve its core ML inference capabilities, its design philosophy, and how it integrates with the CNCF ecosystem. The discussion highlights the InferenceService interface, which encapsulates networking, lifecycle, and server configuration, enabling seamless integration of serverless capabilities with model servers such as TensorFlow Serving, TorchServe, and Triton on CPUs and GPUs.

It then explores advanced scenarios, demonstrating how to quickly set up KServe for production-ready deployments with scalability, security, observability, and autoscaling acceleration, leveraging CNCF projects such as Knative, Istio, SPIFFE/SPIRE, OpenTelemetry, and Fluid. Finally, the presentation touches on the challenges of and solutions for serving large language models (LLMs) such as BloombergGPT, including distributed inference techniques and performance optimizations.
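To give a flavor of the InferenceService interface described above, the following is a minimal sketch using the KServe Python SDK to deploy a scikit-learn model; the service name, namespace, and storage URI are illustrative placeholders, not examples taken from the talk.

    from kubernetes import client
    from kserve import (
        KServeClient,
        V1beta1InferenceService,
        V1beta1InferenceServiceSpec,
        V1beta1PredictorSpec,
        V1beta1SKLearnSpec,
        constants,
    )

    # An InferenceService is the single interface described in the abstract:
    # the user declares the model server and model location, while the KServe
    # controller wires up networking, lifecycle, and (on Knative) autoscaling.
    isvc = V1beta1InferenceService(
        api_version=constants.KSERVE_V1BETA1,
        kind=constants.KSERVE_KIND,
        metadata=client.V1ObjectMeta(
            name="sklearn-iris",          # hypothetical service name
            namespace="kserve-test",      # hypothetical namespace
        ),
        spec=V1beta1InferenceServiceSpec(
            predictor=V1beta1PredictorSpec(
                sklearn=V1beta1SKLearnSpec(
                    # Publicly available sample model from the KServe docs
                    storage_uri="gs://kfserving-examples/models/sklearn/1.0/model"
                )
            )
        ),
    )

    # Submit the InferenceService to the cluster configured in kubeconfig.
    KServeClient().create(isvc)

Swapping the predictor spec (e.g., to a TorchServe or Triton runtime) changes the model server without touching the networking or lifecycle configuration, which is the encapsulation the abstract refers to.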