By the end of this guide, you’ll understand what happens when a monitor fails, how to investigate an incident, and how to resolve it.

Prerequisites:
  • At least one monitor running — see First HTTP monitor
  • Familiarity with the Dashboard or CLI

When an incident appears

An incident appears in your Dashboard when a monitor’s check failures match its incident policy. Here’s the typical flow:
  1. Checks start failing
     A monitor’s check returns an error or fails an assertion. DevHelm starts watching the situation.
  2. Trigger rule matched
     After enough consecutive failures (default: 2), the trigger rule fires. The incident enters TRIGGERED status.
  3. Multi-region confirmation
     If the monitor runs from multiple regions, DevHelm waits for confirmation from additional regions. This reduces false positives from single-region network issues.
  4. Incident confirmed
     The incident moves to CONFIRMED. Alerts fire through your notification policies.
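
The four steps above can be modeled as a small decision function. This is a minimal sketch of the flow as described, not DevHelm's implementation: the function names, the `NONE` label for "no incident", and the `REGIONS_TO_CONFIRM` parameter are assumptions made for illustration. Only the default of 2 consecutive failures comes from the text.

```shell
# Sketch only: models the documented flow, not DevHelm's real logic.

# count_trailing_failures RESULT...
# Prints the current consecutive-failure streak, i.e. how many results
# at the end of the list are "fail". Any other result resets the streak.
count_trailing_failures() {
  streak=0
  for result in "$@"; do
    if [ "$result" = "fail" ]; then
      streak=$((streak + 1))
    else
      streak=0
    fi
  done
  echo "$streak"
}

# incident_status THRESHOLD FAILING_REGIONS REGIONS_TO_CONFIRM RESULT...
# Prints NONE, TRIGGERED, or CONFIRMED.
# REGIONS_TO_CONFIRM is a hypothetical parameter; pass 1 to model a
# single-region monitor, which has no extra regions to wait for.
incident_status() {
  threshold=$1
  failing_regions=$2
  regions_to_confirm=$3
  shift 3
  streak=$(count_trailing_failures "$@")
  if [ "$streak" -lt "$threshold" ]; then
    echo "NONE"        # trigger rule has not fired yet
  elif [ "$failing_regions" -lt "$regions_to_confirm" ]; then
    echo "TRIGGERED"   # rule fired, still waiting for other regions
  else
    echo "CONFIRMED"   # enough regions agree; alerts would fire here
  fi
}
```

For example, `incident_status 2 1 2 pass fail fail` prints `TRIGGERED` (two consecutive failures, but only one failing region), while `incident_status 2 2 2 fail fail` prints `CONFIRMED`.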

Investigate the incident

View incident details

devhelm incidents list
devhelm incidents get <incident-id>
The incident detail shows:
  • Severity: DOWN, DEGRADED, or MAINTENANCE
  • Affected regions — which probe regions are seeing failures
  • Timeline — every status change and update since detection
  • Trigger rule — which rule fired and why
  • Duration — how long the incident has been active

Check the timeline

The incident timeline shows exactly what happened and when:
devhelm incidents get <incident-id>
Look at the updates array — each entry records a status change, user update, or system event with a timestamp.

View failing check results

Drill into the monitor’s recent check results to understand what’s failing:
devhelm monitors checks <monitor-id> --limit 10
Look for assertion failures, HTTP error codes, timeouts, or connection errors.
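
The investigation commands above can be chained into one helper so that the incident detail and the recent check results appear together. This is a sketch under assumptions: the `triage` function and the `DEVHELM` override variable are hypothetical conveniences, and only the subcommands and the `--limit` flag already shown in this guide are used.

```shell
# Sketch only: chains the investigation commands shown in this guide.
# DEVHELM is a hypothetical override so the helper can be dry-run with a
# stub (for example DEVHELM=echo); it defaults to the `devhelm` binary.
DEVHELM=${DEVHELM:-devhelm}

# triage INCIDENT_ID MONITOR_ID [LIMIT]
# Prints the incident detail, then the monitor's most recent check results.
triage() {
  if [ "$#" -lt 2 ]; then
    echo "usage: triage <incident-id> <monitor-id> [limit]" >&2
    return 2
  fi
  incident_id=$1
  monitor_id=$2
  limit=${3:-10}   # same limit as the example above

  echo "== Incident $incident_id =="
  "$DEVHELM" incidents get "$incident_id" || return 1

  echo "== Last $limit checks for monitor $monitor_id =="
  "$DEVHELM" monitors checks "$monitor_id" --limit "$limit"
}
```

Setting `DEVHELM=echo` before calling `triage` prints the commands that would run instead of executing them, which is a cheap way to check the arguments.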

Resolve the incident

Automatic resolution

Most incidents resolve automatically. When the monitor starts passing again:
  1. Consecutive passing checks accumulate (default: 2 required)
  2. Enough regions must be healthy (default: 2)
  3. The incident moves to RESOLVED
  4. A cooldown period prevents immediate reopening (default: 5 minutes)
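
The recovery conditions above can be expressed as two small checks. This is a minimal sketch, not DevHelm's implementation: the function names and the use of epoch-second timestamps are assumptions, while the defaults (2 passing checks, 2 healthy regions, 5-minute cooldown) are taken from the list above.

```shell
# Sketch only: models the documented recovery defaults.

# should_resolve PASSING_STREAK HEALTHY_REGIONS [REQUIRED_PASSES] [REQUIRED_REGIONS]
# Succeeds (exit 0) when both recovery conditions are met.
should_resolve() {
  passing_streak=$1
  healthy_regions=$2
  required_passes=${3:-2}    # default from the guide: 2 consecutive passes
  required_regions=${4:-2}   # default from the guide: 2 healthy regions
  [ "$passing_streak" -ge "$required_passes" ] &&
    [ "$healthy_regions" -ge "$required_regions" ]
}

# in_cooldown RESOLVED_AT NOW [COOLDOWN_SECONDS]
# Succeeds while NOW is still inside the cooldown window.
# Timestamps are assumed to be epoch seconds.
in_cooldown() {
  resolved_at=$1
  now=$2
  cooldown_seconds=${3:-300}  # default from the guide: 5 minutes
  [ $((now - resolved_at)) -lt "$cooldown_seconds" ]
}
```

With the defaults, `should_resolve 2 2` succeeds and `should_resolve 3 1` fails, because one healthy region is not enough even with a longer passing streak.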

Manual resolution

If you’ve fixed the issue and don’t want to wait for automatic recovery:
devhelm incidents resolve <incident-id> \
  --body "Deployed hotfix to restore API endpoint"

Add context

Post updates during investigation to keep your team informed:
devhelm incidents update <incident-id> \
  --body "Investigating — seeing 503s from upstream dependency" \
  --notify
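
The update and resolve commands can be wrapped in helpers that refuse to run without a message, so the timeline never gets an empty entry. This is a sketch under assumptions: the helper names and the `DEVHELM` override are hypothetical, and the empty-message guard is a design choice of this sketch, not a documented requirement of the CLI.

```shell
# Sketch only: wraps the update and resolve commands shown in this guide.
# DEVHELM is a hypothetical override for dry runs; it defaults to `devhelm`.
DEVHELM=${DEVHELM:-devhelm}

# require_message MESSAGE
# Fails with exit status 2 if the message is empty or missing.
require_message() {
  if [ -z "${1:-}" ]; then
    echo "error: a message body is required" >&2
    return 2
  fi
}

# post_update INCIDENT_ID MESSAGE
# Posts an investigation update and notifies the team.
post_update() {
  require_message "${2:-}" || return
  "$DEVHELM" incidents update "$1" --body "$2" --notify
}

# resolve_incident INCIDENT_ID MESSAGE
# Resolves the incident manually with a closing note.
resolve_incident() {
  require_message "${2:-}" || return
  "$DEVHELM" incidents resolve "$1" --body "$2"
}
```

A typical sequence would be one or more `post_update` calls while investigating, followed by a single `resolve_incident` call once the fix is deployed.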

Key concepts to know

| Concept | What to know |
| --- | --- |
| Incident policy | Controls when incidents open (trigger rules) and close (recovery policy) |
| Confirmation | Multi-region validation that reduces false positives |
| Cooldown | Quiet period after resolution that prevents flapping |
| Reopening | If the monitor fails again after cooldown, the same incident reopens |

Next steps

  • Incidents overview: Full lifecycle, statuses, severities, and sources.
  • Incident policies: Customize trigger rules and recovery behavior.
  • First alert: Get notified when incidents happen.
  • Incidents guide: Day-to-day incident management workflows.