Investigating the Evolution of Evaluation from Model Training to GenAI Inference

December 09, 2024 39 min Free

MLOps World - MLOps World & Generative AI World 2024

generative-ai large-language-models llm evaluation-metrics model-training fine-tuning ai-ethics bias-detection toxicity-detection natural-language-processing machine-learning

Description

This session explores the evolution of evaluation techniques in machine learning, from traditional model training through fine-tuning to the current challenges of assessing large language models (LLMs) and generative AI systems. We'll trace the progression from simple metrics like accuracy and F1 score to automated evaluation systems that can generate their own criteria and assertions. The session culminates in an in-depth look at approaches like EvalGen, which use LLMs to help create evaluation criteria aligned with human judgment while addressing phenomena like criteria drift.
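To make the contrast in the description concrete, here is a minimal Python sketch of the two ends of that progression. It is an illustration only, not material from the talk: the labels, the sample text, and the `is_concise` helper are hypothetical, and the assertion is simply the kind of small pass/fail check that LLM-assisted tools are described as helping to write, not EvalGen's actual API.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def is_concise(output: str, max_words: int = 50) -> bool:
    """Hypothetical assertion on generated text: pass if it stays under a word budget."""
    return len(output.split()) <= max_words


# Made-up labels for an imbalanced binary task (1 = positive class).
y_true = [1, 0, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]

print("accuracy:", accuracy(y_true, y_pred))
print("f1:", f1_score(y_true, y_pred))
print("concise:", is_concise("The model summarizes the report in one sentence."))
```

On imbalanced labels like these, accuracy looks healthy while F1 exposes the missed positives. Neither metric applies directly to free-form generated text, which is where assertion-style checks come in.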

About the Speaker: Anish Shah is a leading expert in AI and ML at Weights & Biases, specializing in the optimization of large language models for complex applications. With extensive experience in fine-tuning and model evaluation, Anish has contributed to significant advancements in the field of AI research and presented at many conferences and events.

Up Next

46 min

generative-ai llm llm-ops human-evaluation llm-as-judge security rag agents tool-use

31 min

Beyond the Model Zoo: Optimizing Foundation Models for Your Application

MLOps World - MLOps World & Generative AI World 2024

Salma Mayorquin

data-synthesis ai-optimization foundation-models large-language-models llm-evaluation parameter-efficient-fine-tuning peft lora mlops machine-learning artificial-intelligence model-deployment

32 min

The Secret Sauce for Deploying LLM Applications into Production

MLOps World - MLOps World & Generative AI World 2024

Josh Reini

evaluation-frameworks truelens ai-adoption llm generative-ai production-deployment rag langchain evaluation-data model-monitoring observability enterprise-ai

36 min

Evaluating LLMs and RAG Pipelines at Scale

MLOps World - MLOps World & Generative AI World 2024

Eric Korman

valor llm rag mlops llm-ops pipelines language-models natural-language-processing data-management model-training model-deployment


Towards Robust GenAI: Techniques for Evaluating Enterprise LLM Applications

Evaluation Techniques for Large Language Models

Measuring the Minds of Machines: Evaluating Generative AI Systems
