Tech Talks
Memory Optimizations for Machine Learning (32 min)
Event: MLOps World & Generative AI World 2024
Speaker: Tejas Chopra
Tags: model-pruning, neural-networks, cpu, data-quantization, machine-learning, llm, memory-optimization, quantization, inference, deep-learning, transformer-models, gpu
A Practical Guide to Efficient AI (30 min)
Event: MLOps World & Generative AI World 2024
Speaker: Shelby Heinecke
Tags: ai, artificial-intelligence, machine-learning, llm, large-language-models, model-optimization, quantization, small-language-models, function-calling, prompt-engineering, inference, model-efficiency
How to Run Your Own LLMs, From Silicon to Service (31 min)
Event: MLOps World & Generative AI World 2024
Speaker: Charles Frye
Tags: llms, large-language-models, mlops, machine-learning-operations, inference, gpu, quantization, tensorrt-llm, vllm, modal-labs, model-serving, ai-engineering
On-Device ML for LLMs: Post-Training Optimization Techniques with T5 and Beyond (29 min)
Event: MLOps World & Generative AI World 2024
Speaker: Sri Raghu Malireddi
Tags: on-device-ml, llms, t5, model-optimization, quantization, pruning, layer-fusion, inference-optimization, latency-reduction, edge-devices, mlops, grammarly
Creating our own Private OpenAI API (29 min)
Event: MLOps World & Generative AI World 2024
Speakers: Meryem Arik, Hannes Hapke
Tags: large-language-models, llms, private-api, openai-api, self-hosting, mlops, generative-ai, inference-optimization, quantization, gpu-utilization, api-gateway, kubernetes