Building Multimodal LLMs for Product Taxonomy at Shopify

December 09, 2024 30 min Free

Description

Kshetrajna Raghavan, Senior Staff ML Engineer at Shopify, discusses the challenges and solutions involved in building and deploying multimodal Large Language Models (LLMs) for product taxonomy at scale. The talk details how Shopify uses vision-language models to process product images and descriptions, automatically classifying products and extracting their attributes. Topics include the development of a comprehensive product taxonomy, strategies for data collection and LLM fine-tuning, infrastructure choices for training and inference (such as GCP, SkyPilot, Hugging Face, and Triton), and the engineering work needed to optimize cost and performance for millions of daily predictions. The system addresses problems with how product data is structured, which affect areas such as tax calculation, search relevance, and recommendation systems, and it ultimately benefits both Shopify and its merchants.