Avoid ML OOps with ML Ops: A modular approach to scaling Forethought’s E2E ML Platform
Description
Salina Wu discusses how Forethought built a scalable, cost-efficient, and automated ML Ops platform. The presentation details a modular approach to improving the company's end-to-end ML lifecycle: streamlining ML training with SageMaker, serving models efficiently with SageMaker Serverless and Multi-Model Endpoints, orchestrating ML processes with Dagster, centralizing feature engineering with Spark, and building model management tooling with Retool. The talk traces the transition from a rudimentary v0 ML architecture to an enhanced v1 and previews the v2 vision, which includes automated re-training and LLM support. Key takeaways include understanding the components of ML infrastructure, identifying and addressing bottlenecks, and migrating ML infrastructure in stages.