Running Multiple Models on the Same GPU, on Spot Instances
May 16, 2024
33 min
Tags: ml-inference, spot-instances, gpu-fractionalization, gpu, cost-optimization, generative-ai, llm, cloud-computing, aws, gcp, azure, mlops
Description
Oscar Rovira, Co-founder of Mystic AI, discusses two key cost-optimization strategies for running ML inference in the cloud: GPU fractionalization and Spot Instances. He explains what GPU fractionalization is, along with its benefits and limitations, and covers the value of Spot Instances and the challenges they introduce. The talk includes examples of how combining these techniques can increase throughput and reduce costs for Generative AI applications.