Setting up Etcd with Kubernetes to Host Clusters with Thousands of Nodes

May 01, 2023 40 min Free

Description

This talk covers the challenges of setting up and configuring etcd for Kubernetes clusters with thousands of nodes, a common requirement for AI/ML/HPC workloads and internet-scale applications. The speakers, from Datadog and Isovalent, share best practices learned from running large-scale production environments. They cover optimizing etcd deployments, handling disk I/O and network throughput bottlenecks, and managing the impact of API server restarts on etcd. The session also highlights common pitfalls, such as expensive list calls and inefficient client interactions, and introduces solutions such as informers and the Kubernetes API Priority and Fairness mechanism to keep the control plane stable and performant.