Colocate Hadoop YARN with Kubernetes to Save Massive Costs on Big Data
May 01, 2023
43 min
Free
kubernetes
hadoop
yarn
big-data
cost-optimization
resource-management
cgroups
kernel
container-runtimes
scheduler
kubelet
data-infrastructure
Description
This presentation details how Shopee colocated Hadoop YARN with Kubernetes to significantly reduce big data infrastructure costs. The talk explores the challenges of low resource utilization on Kubernetes and the complexities of colocating online services with offline jobs. It covers the custom extensions to the Linux kernel, container runtime, Kubernetes scheduler, and kubelet that improved resource utilization while keeping online services stable, and explains how restrictions on offline job scheduling were overcome to achieve substantial cost savings.