Tech Talks
How Do You Scale to Billions of Fine-Tuned LLMs (5 min)
Speaker: James Dbiorin | Event: MLOps World & Generative AI World 2024
Tags: cuda, batching, llm, large-language-models, fine-tuning, lora, inference, scalability, parameter-efficient-fine-tuning, gpu, mlops, ai
LLMs From Dream to Deployed (29 min)
Speaker: Josh Goldstein | Event: MLOps World & Generative AI World 2024
Tags: chatbots, seldon, llm, large-language-models, machine-learning, mlops, deployment, retrieval-augmented-generation, rag, kubernetes, openai, hugging-face, gpu
Running Multiple Models on the Same GPU, on Spot Instances (33 min)
Speaker: Oscar Rovira | Event: MLOps World & Generative AI World 2024
Tags: ml-inference, spot-instances, gpu-fractionalization, gpu, cost-optimization, generative-ai, llm, cloud-computing, aws, gcp, azure, mlops
From Idea to Production: AI Infra for Scaling LLM Apps (39 min)
Speaker: Guy Eshet | Event: MLOps World & Generative AI World 2024
Tags: llm, ai, ai-infrastructure, llm-ops, prompt-engineering, model-deployment, gpu, data-pipelines, rag, cost-optimization, generative-ai, llm-applications
Lessons learned from scaling large language models in production (41 min)
Speaker: Matt Squire | Event: MLOps World & Generative AI World 2024
Tags: ray-serve, large-language-models, llm, rag, mlops, gpu, performance-optimization, inference, scaling, python, fastapi, kubernetes, vm, vector-database
From ML Repository to ML Production Pipeline (36 min)
Speakers: Jakub Witkowski, Dariusz Adamczyk | Event: MLOps World & Generative AI World 2024
Tags: production-pipelines, ml-repository, mlops, machine-learning, devops, docker, kubernetes, ci-cd, kubeflow, data-science, gpu, automation
Leverage Kubernetes To Optimize the Utilization of Your AI Accelerators (23 min)
Speaker: Nathan Beach | Event: MLOps World & Generative AI World 2024
Tags: accelerators, kubernetes, kubernetes-engine, ai, gpu, optimization, training, inference, workloads, resource-utilization, cloud-computing
Memory Optimizations for Machine Learning (32 min)
Speaker: Tejas Chopra | Event: MLOps World & Generative AI World 2024
Tags: model-pruning, neural-networks, cpu, data-quantization, machine-learning, llm, memory-optimization, quantization, inference, deep-learning, transformer-models, gpu
How to Run Your Own LLMs, From Silicon to Service (31 min)
Speaker: Charles Frye | Event: MLOps World & Generative AI World 2024
Tags: llms, large-language-models, mlops, machine-learning-operations, inference, gpu, quantization, tensorrt-llm, vllm, modal-labs, model-serving, ai-engineering
Large Language Model Training and Serving at LinkedIn (24 min)
Speaker: Dre Olgiati | Event: MLOps World & Generative AI World 2024
Tags: llm, large-language-models, ai, machine-learning, mlops, training, gpu, kubernetes, python, tensorflow, pytorch, kernels, optimization, memory-management, transformer
Streamlining AI Deployments (8 min)
Speaker: Vasilis Vagias | Event: MLOps World & Generative AI World 2024
Tags: ai, llm, mlops, deployment, optimization, inference, compiler, pytorch, docker, gpu, api
Python Meets Heterogeneous Computing (59 min)
Speakers: William Cunningham, Santosh Kumar Radha | Event: PyCon US 2023
Tags: python, heterogeneous-computing, distributed-computing, gpu, quantum-computing, hpc, workflow-orchestration, performance-optimization, cloud-hpc, open-source-tools, data-science, machine-learning