Memory Optimizations for Machine Learning

December 05, 2024 32 min Free

Description

This talk explores memory optimization strategies for deploying Machine Learning models, especially Large Language Models (LLMs). It examines the memory footprint of ML data structures and algorithms and details techniques such as data quantization and model pruning. A significant focus is on optimizing LLM inference: the factors that affect memory usage and practical strategies for conserving memory without compromising performance. The presentation also touches on hardware considerations and real-world examples of memory optimization in ML.
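
As a rough illustration of why quantization matters for LLM deployment, the sketch below estimates the memory needed to hold a model's weights at several numeric precisions. It is not taken from the talk: the function name `weight_memory_gib` and the 7-billion-parameter example are hypothetical, and the estimate covers weights only, ignoring activations, the KV cache, and runtime overhead.

```python
# Bytes needed to store one parameter at common precisions.
# int4 packs two parameters per byte, hence 0.5.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Approximate memory for model weights alone, in GiB (hypothetical helper)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    num_params = 7_000_000_000  # assumed model size, for illustration only
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gib(num_params, dtype):.2f} GiB")
```

Under these assumptions, each halving of the bits per parameter halves the weight footprint, which is the basic saving that quantization trades against accuracy.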