Efficiently Fine-Tune And Serve Your Own LLMs

May 16, 2024 42 min Free

MLOps World - MLOps World & Generative AI World 2024

Alex Sherstinsky

llm-fine-tuning predibase ludwig lorax large-language-models lora parameter-efficient-fine-tuning peft transformer-models mistral-7b model-serving inference

Description

This talk covers how to efficiently fine-tune and serve open-source Large Language Models (LLMs). Alex Sherstinsky introduces LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning technique that customizes an LLM by training significantly fewer parameters than full fine-tuning. The presentation discusses the drawbacks of relying on commercial LLMs such as GPT-4, including cost and lack of ownership, and makes the case for fine-tuning open-source models for specific tasks. It then covers fine-tuning with Ludwig and efficient model serving with LoRAX, including a code walkthrough that fine-tunes three adapters for customer support scenarios on the Mistral 7B model. The talk highlights how fine-tuned models can match or exceed the performance of commercial LLMs at a fraction of the cost, with a focus on practical applications and the role of model serving in production environments.
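To give a rough sense of why LoRA trains far fewer parameters, the sketch below counts weights for a single linear layer. It is not taken from the talk: the function name `lora_param_counts`, the 4096 x 4096 layer shape, and the rank of 8 are illustrative assumptions. LoRA keeps the original weight matrix frozen and trains two small low-rank factors in its place, so the trainable count is `rank * (d_out + d_in)` rather than `d_out * d_in`.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one linear layer.

    full: parameters updated when fine-tuning the whole d_out x d_in matrix.
    lora: parameters in the two low-rank factors, B (d_out x rank) and
          A (rank x d_in), which are the only weights LoRA trains.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Illustrative (assumed) shape and rank; real values depend on the model
# and on the adapter configuration chosen for fine-tuning.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA adapter:     {lora:,} trainable parameters")
print(f"ratio:            {lora / full:.4%}")
```

Because each adapter is this small relative to the base model, it is plausible to keep several task-specific adapters, such as the three customer support adapters mentioned above, alongside a single copy of the base weights.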

Up Next

LLM Fine-Tuning for Modern AI Teams: How One E-Commerce Unicorn Cut Inference Cost by 90%

MLOps World - MLOps World & Generative AI World 2024

Emmanuel Turlay

inference-cost data-preparation mistral-7b gpt-3.5 cost-reduction llm fine-tuning ai machine-learning e-commerce natural-language-processing model-evaluation

Mastering LLM Fine-Tuning

MLOps World - MLOps World & Generative AI World 2024

data-generation model-merging supervise-fine-tuning parameter-efficient-fine-tuning peft dpo llm fine-tuning machine-learning artificial-intelligence lora

How Do You Scale to Billions of Fine-Tuned LLMs

MLOps World - MLOps World & Generative AI World 2024

cuda batching llm large-language-models fine-tuning lora inference scalability parameter-efficient-fine-tuning gpu mlops ai

Beyond the Model Zoo: Optimizing Foundation Models for Your Application

MLOps World - MLOps World & Generative AI World 2024

Salma Mayorquin

data-synthesis ai-optimization foundation-models large-language-models llm-evaluation parameter-efficient-fine-tuning peft lora mlops machine-learning artificial-intelligence model-deployment

LLM Economics: The Cost of Leveraging Large Language Models

MLOps World - MLOps World & Generative AI World 2024

cost-analysis open-source-llm commercial-llm api-costs llm large-language-models generative-ai rag fine-tuning model-deployment inference-cost python

Evaluation Techniques for Large Language Models

MLOps World - MLOps World & Generative AI World 2024

functional-correctness similarity-metrics large-language-models llm ai machine-learning benchmarking nlp natural-language-processing hugging-face prompt-engineering