Triage a payments outage
A deploy went live and checkout errors spiked. Use the alert feed and logs to find the root cause and pick a safe action.
Topic intro
Incident response is about fast, safe recovery. Find the first cause, stop the failure, then capture what happened. Use alerts to spot the impact and logs to confirm the cause.
Alert feed
p95 jumped from 240ms to 1.8s in 10 minutes
Service: checkout
5xx rate at 7.2 percent
Service: payments
backlog 1,200 pending jobs
Service: payments
Mission brief
You are on call. Checkout is timing out and payments are failing. Work fast, keep changes low risk, and capture notes for the incident report.
Log explorer
Filter by service, level, or search terms.
Decision point
Root cause
Immediate action
Hint ladder
Hint 1
The first spike appears right after a deploy.
Hint 2
Payment signatures fail even before the queue backs up.
Hint 3
Look for anything about a secret or config in the logs.
Incident note
Write a short summary. Include the cause, the fix, and the prevention step.