tech talks
32 min
Memory Optimizations for Machine Learning
MLOps World - MLOps World & Generative AI World 2024
Tejas Chopra
model-pruning
neural-networks
cpu
data-quantization
machine-learning
llm
memory-optimization
quantization
inference
deep-learning
transformer-models
gpu
30 min
A Practical Guide to Efficient AI
MLOps World - MLOps World & Generative AI World 2024
Shelby Heinecke
ai
artificial-intelligence
machine-learning
llm
large-language-models
model-optimization
quantization
small-language-models
function-calling
prompt-engineering
inference
model-efficiency
31 min
How to Run Your Own LLMs, From Silicon to Service
MLOps World - MLOps World & Generative AI World 2024
Charles Frye
llms
large-language-models
mlops
machine-learning-operations
inference
gpu
quantization
tensorrt-llm
vllm
modal-labs
model-serving
ai-engineering
29 min
On-Device ML for LLMs: Post-Training Optimization Techniques with T5 and Beyond
MLOps World - MLOps World & Generative AI World 2024
Sri Raghu Malireddi
on-device-ml
llms
t5
model-optimization
quantization
pruning
layer-fusion
inference-optimization
latency-reduction
edge-devices
mlops
grammarly
29 min
Creating our own Private OpenAI API
MLOps World - MLOps World & Generative AI World 2024
Meryem Arik
Hannes Hapke
large-language-models
llms
private-api
openai-api
self-hosting
mlops
generative-ai
inference-optimization
quantization
gpu-utilization
api-gateway
kubernetes