Disaster Recovery: Bringing Back Production from Scratch in Under 1 Hour Using KOps, ArgoCD and Velero
May 01, 2023
37 min
Free
kubernetes
disaster-recovery
gitops
argocd
kops
velero
cloud-native
devops
cluster-management
infrastructure-as-code
aws
sre
Description
This talk recounts a real incident in which a misconfiguration took down a production Kubernetes cluster and forced a complete rebuild. The presenter details how investments in GitOps, ArgoCD, kOps, and infrastructure-as-code let the team bring production back online from scratch in under an hour. The presentation covers the challenges faced, the lessons learned, the tools that did not perform as expected, and strategies for improving disaster recovery plans and practices.