Autoscaling Can Be Reliable: Running Cluster Autoscaler in Prod
May 01, 2023
25 min
Free
Description
This talk covers how to run Cluster Autoscaler reliably in production. Maciej Pytel, a software engineer at Google with extensive experience on the GKE team, shares insights on managing Kubernetes cluster nodes across various cloud providers. He walks through the essential metrics and logs for monitoring Cluster Autoscaler and for debugging common issues that can impact scaling operations, such as nodes failing to boot, resource quota exhaustion, and misconfigurations. The talk offers practical advice on identifying and resolving these problems so that autoscaling stays reliable.