Troubleshooting
NudgeBee's troubleshooting dashboard gives you a real-time view of events, errors, and anomalies across all your connected Kubernetes clusters. Instead of switching between multiple monitoring tools, you work from a single view, powered by the Semantic Knowledge Graph, that correlates metrics, logs, traces, and code. This helps you find the root cause of issues faster and can reduce mean time to resolution (MTTR) from hours to minutes.
What You Can Do Here
- Monitor real-time events — See pod crashes, OOM kills, deployment failures, and other Kubernetes events as they happen.
- AI-powered root cause analysis with NuBi — When an LLM is connected, NuBi (the SRE AI Agent) and NudgeBee's pre-built AI agents automatically analyze incidents, correlate signals across the Semantic Knowledge Graph, and suggest root causes in plain language.
- Explore the Semantic Knowledge Graph — Visualize your infrastructure dependencies and trace how issues propagate across services. See Semantic Knowledge Graph.
- Configure alerting rules — Set up custom alerting rules to get notified when specific conditions are met. See Alerting.
- Use playbooks — Apply predefined troubleshooting playbooks for common scenarios. See Playbook Catalog.
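To make the kinds of signals mentioned above more concrete, here is a minimal sketch of how pod crashes and OOM kills appear in Kubernetes pod status data. It is not NudgeBee's implementation or API. The sample pods and the `find_signals` helper are hypothetical, and the only assumption taken from Kubernetes itself is that container statuses report `CrashLoopBackOff` as a waiting reason and `OOMKilled` as a terminated reason.

```python
from typing import Iterator

# Hypothetical sample data, shaped like the `status` field of Kubernetes Pod objects.
SAMPLE_PODS = [
    {
        "name": "checkout-7d9f",
        "containerStatuses": [
            {
                "name": "app",
                "restartCount": 6,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            }
        ],
    },
    {
        "name": "search-5c2b",
        "containerStatuses": [
            {"name": "app", "restartCount": 0, "state": {"running": {}}, "lastState": {}}
        ],
    },
]


def find_signals(pods: list[dict]) -> Iterator[tuple[str, str, str]]:
    """Yield (pod, container, signal) for crash loops and OOM kills."""
    for pod in pods:
        for status in pod.get("containerStatuses", []):
            waiting = status.get("state", {}).get("waiting") or {}
            terminated = status.get("lastState", {}).get("terminated") or {}
            # A container stuck restarting reports this waiting reason.
            if waiting.get("reason") == "CrashLoopBackOff":
                yield pod["name"], status["name"], "CrashLoopBackOff"
            # A container killed for exceeding its memory limit reports this reason.
            if terminated.get("reason") == "OOMKilled":
                yield pod["name"], status["name"], "OOMKilled"


signals = list(find_signals(SAMPLE_PODS))
for pod_name, container, signal in signals:
    print(f"{pod_name}/{container}: {signal}")
```

In this sample, one container shows both signals at once, which is the sort of correlation the dashboard is meant to surface without manual inspection.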
Prerequisites: To use the troubleshooting features, you need at least one connected Kubernetes cluster and an integrated observability source. AI-powered analysis also requires an LLM connection.
What You Will Find in This Section
- Alerting — Configure custom alerting rules and thresholds.
- Playbook Catalog — Browse and apply predefined troubleshooting playbooks for common Kubernetes issues.