Intuit’s Customer Centric Observability Journey Using AIOps
Description
Intuit runs roughly 2,500 services on Kubernetes and treats operational excellence as a priority. Despite investments in metrics, logs, and events, a gap remained in identifying customer impact and the services causing it, which held back improvements in Mean Time To Detect (MTTD) and Mean Time To Resolve (MTTR). This talk focuses on how Numaproj, Intuit's open-source project, and other CNCF technologies address the challenge of analyzing massive data volumes in real time. Numaproj comprises Numaflow (stream processing) and Numalogic (ML models); together they enable real-time data collection, processing, and analysis, computing a normalized anomaly score for every data point. These scores significantly reduce noise, improving the signal-to-noise ratio, and are used to trigger incidents directly. Intuit's AIOps platform now detects incidents with ~98% confidence and has reduced MTTD from over 30 minutes to less than 3 minutes.
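
To make the idea of a normalized per-point anomaly score concrete, here is a minimal sketch in plain Python. It is an illustration only and does not use Numalogic or Numaflow: it swaps the ML models for a rolling z-score, and the function names (`anomaly_scores`, `should_trigger`), the 0 to 10 score range, the window size, and the threshold are all assumptions chosen for the example.

```python
import math
from collections import deque
from statistics import mean, pstdev


def anomaly_scores(values, window=30, scale=3.0, max_score=10.0):
    """Yield one score in [0, max_score] for every input data point.

    Hypothetical stand-in for an ML model: each point is compared with a
    rolling window of the points that preceded it.
    """
    history = deque(maxlen=window)
    for x in values:
        if len(history) >= 2:
            mu = mean(history)
            sigma = pstdev(history)
            if sigma > 0:
                z = abs(x - mu) / sigma
            else:
                # Flat history: any deviation at all is maximally surprising.
                z = 0.0 if x == mu else float("inf")
            # tanh squashes the unbounded z-score into a fixed range, so
            # metrics with different units become directly comparable.
            score = max_score * math.tanh(z / scale)
        else:
            score = 0.0  # not enough history to judge yet
        history.append(x)
        yield score


def should_trigger(scores, threshold=7.0, consecutive=3):
    """Return True once `consecutive` scores in a row reach `threshold`."""
    run = 0
    for s in scores:
        run = run + 1 if s >= threshold else 0
        if run >= consecutive:
            return True
    return False


# Example: a steady latency series followed by a sustained spike.
latency_ms = [100] * 20 + [500] * 3
print(should_trigger(anomaly_scores(latency_ms)))
```

Requiring several consecutive high scores before raising an incident is one simple way to trade a little detection delay for less noise; the window and threshold would need tuning against real traffic.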