Tech Talks

How Do You Scale to Billions of Fine-Tuned LLMs? (5 min)
  Event: MLOps World & Generative AI World 2024
  Speaker: James Dbiorin
  Tags: cuda, batching, llm, large-language-models, fine-tuning, lora, inference, scalability, parameter-efficient-fine-tuning, gpu, mlops, ai
LLMs From Dream to Deployed (29 min)
  Event: MLOps World & Generative AI World 2024
  Speaker: Josh Goldstein
  Tags: chatbots, seldon, llm, large-language-models, machine-learning, mlops, deployment, retrieval-augmented-generation, rag, kubernetes, openai, hugging-face, gpu
Running Multiple Models on the Same GPU, on Spot Instances (33 min)
  Event: MLOps World & Generative AI World 2024
  Speaker: Oscar Rovira
  Tags: ml-inference, spot-instances, gpu-fractionalization, gpu, cost-optimization, generative-ai, llm, cloud-computing, aws, gcp, azure, mlops
From Idea to Production: AI Infra for Scaling LLM Apps (39 min)
  Event: MLOps World & Generative AI World 2024
  Speaker: Guy Eshet
  Tags: llm, ai, ai-infrastructure, llm-ops, prompt-engineering, model-deployment, gpu, data-pipelines, rag, cost-optimization, generative-ai, llm-applications
Lessons Learned from Scaling Large Language Models in Production (41 min)
  Event: MLOps World & Generative AI World 2024
  Speaker: Matt Squire
  Tags: ray-serve, large-language-models, llm, rag, mlops, gpu, performance-optimization, inference, scaling, python, fastapi, kubernetes, vm, vector-database
From ML Repository to ML Production Pipeline (36 min)
  Event: MLOps World & Generative AI World 2024
  Speakers: Jakub Witkowski, Dariusz Adamczyk
  Tags: production-pipelines, ml-repository, mlops, machine-learning, devops, docker, kubernetes, ci-cd, kubeflow, data-science, gpu, automation
Leverage Kubernetes To Optimize the Utilization of Your AI Accelerators (23 min)
  Event: MLOps World & Generative AI World 2024
  Speaker: Nathan Beach
  Tags: accelerators, kubernetes, kubernetes-engine, ai, gpu, optimization, training, inference, workloads, resource-utilization, cloud-computing
Memory Optimizations for Machine Learning (32 min)
  Event: MLOps World & Generative AI World 2024
  Speaker: Tejas Chopra
  Tags: model-pruning, neural-networks, cpu, data-quantization, machine-learning, llm, memory-optimization, quantization, inference, deep-learning, transformer-models, gpu
How to Run Your Own LLMs, From Silicon to Service (31 min)
  Event: MLOps World & Generative AI World 2024
  Speaker: Charles Frye
  Tags: llms, large-language-models, mlops, machine-learning-operations, inference, gpu, quantization, tensorrt-llm, vllm, modal-labs, model-serving, ai-engineering
Large Language Model Training and Serving at LinkedIn (24 min)
  Event: MLOps World & Generative AI World 2024
  Speaker: Dre Olgiati
  Tags: llm, large-language-models, ai, machine-learning, mlops, training, gpu, kubernetes, python, tensorflow, pytorch, kernels, optimization, memory-management, transformer
Streamlining AI Deployments (8 min)
  Event: MLOps World & Generative AI World 2024
  Speaker: Vasilis Vagias
  Tags: ai, llm, mlops, deployment, optimization, inference, compiler, pytorch, docker, gpu, api
Python Meets Heterogeneous Computing (59 min)
  Event: PyCon US 2023
  Speakers: William Cunningham, Santosh Kumar Radha
  Tags: python, heterogeneous-computing, distributed-computing, gpu, quantum-computing, hpc, workflow-orchestration, performance-optimization, cloud-hpc, open-source-tools, data-science, machine-learning

© 2025 Tech Talks. All rights reserved.
