Running Multiple Models on the Same GPU, on Spot Instances
May 16, 2024
33 min
Tags: ml-inference, spot-instances, gpu-fractionalization, gpu, cost-optimization, generative-ai, llm, cloud-computing, aws, gcp, azure, mlops
Description
Oscar Rovira, Co-founder of Mystic AI, discusses two key cost-optimization strategies for running ML inference in the cloud: GPU fractionalization and Spot Instances. He explains what GPU fractionalization is, along with its benefits and limitations, and covers the value of Spot Instances and the challenges they introduce. The talk includes examples of how combining these techniques can increase throughput and reduce costs for Generative AI applications.