An incident represents an ongoing problem detected by a monitor, reported by a third-party service, or created manually. DevHelm automates the full lifecycle from detection through confirmation, alerting, and recovery.
Define this in code. Manage incident policies as part of your monitoring-as-code workflow: YAML format · Terraform · CI/CD patterns

Incident lifecycle

Every incident follows a predictable state machine:
Check fails → Trigger rule fires → WATCHING
WATCHING → Regions confirm → TRIGGERED
TRIGGERED → Multi-region confirmed → CONFIRMED (alerts sent)
CONFIRMED → Checks start passing → Recovery policy met → RESOLVED
RESOLVED → Cooldown period
  1. Detection — A monitor’s checks fail and match a trigger rule
  2. Watching — The system observes the failure while waiting for enough data to confirm
  3. Triggered — The trigger rule threshold is met in at least one region
  4. Confirmed — The confirmation policy validates failures across multiple regions, promoting the incident to active status and firing alerts
  5. Resolved — The recovery policy detects consecutive passing checks across enough regions, or a user resolves manually
  6. Cooldown — A configurable quiet period prevents the same monitor from immediately opening a new incident
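The status progression above can be sketched as a small state machine. This is an illustrative sketch, not DevHelm's implementation: the event names (`trigger_rule_met`, `multi_region_confirmed`, `recovery_policy_met`, `manual_resolve`) are assumptions, and the transition table is simplified (for example, manual resolution likely works from more states than shown).

```python
from enum import Enum

class Status(Enum):
    WATCHING = "WATCHING"
    TRIGGERED = "TRIGGERED"
    CONFIRMED = "CONFIRMED"
    RESOLVED = "RESOLVED"

# (current status, event) → next status. Event names are illustrative.
TRANSITIONS = {
    (Status.WATCHING, "trigger_rule_met"): Status.TRIGGERED,
    (Status.TRIGGERED, "multi_region_confirmed"): Status.CONFIRMED,
    (Status.CONFIRMED, "recovery_policy_met"): Status.RESOLVED,
    (Status.CONFIRMED, "manual_resolve"): Status.RESOLVED,
}

def advance(status: Status, event: str) -> Status:
    """Return the next status, or raise if the transition is not allowed."""
    nxt = TRANSITIONS.get((status, event))
    if nxt is None:
        raise ValueError(f"no transition from {status.value} on {event!r}")
    return nxt
```

Walking a failure through the happy path: `advance(Status.WATCHING, "trigger_rule_met")` yields `TRIGGERED`, confirmation promotes it to `CONFIRMED`, and recovery moves it to `RESOLVED`.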

Statuses

| Status | Meaning |
|---|---|
| WATCHING | Failure detected but not yet confirmed — waiting for additional check results |
| TRIGGERED | Trigger rule threshold met in at least one region |
| CONFIRMED | Active incident — failure confirmed across regions, alerts have been sent |
| RESOLVED | Incident closed — monitor recovered or manually resolved |

Severities

Each incident carries a severity that determines urgency and drives notification policy matching.
| Severity | When it's used |
|---|---|
| DOWN | Complete failure — endpoint unreachable or critical assertions failing |
| DEGRADED | Partial failure — response time thresholds exceeded or non-critical assertions failing |
| MAINTENANCE | Planned downtime — created by maintenance windows |
A single monitor can have trigger rules at different severities. For example, a response time threshold might open a DEGRADED incident, while consecutive failures open a DOWN incident.

Sources

Incidents can originate from four different sources:
| Source | Created by |
|---|---|
| MONITORS | Automatic detection when a monitor's check results match its trigger rules |
| MANUAL | User-created via Dashboard, CLI, or API for issues not covered by automation |
| STATUS_DATA | Propagated from a third-party service incident when you track it as a dependency |
| RESOURCE_GROUP | Aggregated from multiple monitors or services within a resource group |
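As a sketch of the MANUAL source, the helper below builds a create-incident request body. The field names mirror the incident object documented later on this page, but the exact create endpoint and accepted fields are defined in the API Reference — treat this payload shape as an assumption.

```python
import json

VALID_SEVERITIES = {"DOWN", "DEGRADED", "MAINTENANCE"}

def manual_incident_payload(title: str, severity: str = "DOWN") -> str:
    """Serialize a request body for a user-created (MANUAL) incident.

    Illustrative sketch: field names follow the incident object, but
    the real API contract is in the API Reference.
    """
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    return json.dumps({
        "title": title,          # short summary shown on the incident
        "severity": severity,    # drives notification policy matching
        "source": "MANUAL",      # user-created, not monitor-detected
    })
```

You would POST this body to the incidents endpoint with your usual authentication headers.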

Resolution reasons

When an incident resolves, DevHelm records the reason:
| Reason | Meaning |
|---|---|
| AUTO_RECOVERED | Monitor checks started passing and met the recovery policy |
| MANUAL | Resolved by a user via Dashboard, CLI, or API |
| AUTO_RESOLVED | Resolved by system logic (e.g., an upstream Status Data incident resolved) |

Reopening

If a monitor fails again before its cooldown period ends, DevHelm reopens the existing incident rather than creating a new one; once the cooldown has passed, a fresh failure opens a new incident. The reopenCount field tracks how many times an incident has been reopened. Reopening behavior interacts with escalation chains — you can configure whether escalation restarts from the beginning or resumes from the current step.

Incident fields

Key fields on every incident object:
| Field | Type | Description |
|---|---|---|
| id | UUID | Unique incident identifier |
| status | string | Current lifecycle status |
| severity | string | DOWN, DEGRADED, or MAINTENANCE |
| source | string | How the incident was created |
| title | string | Short summary (auto-generated or user-provided) |
| monitorId | UUID | Monitor that triggered the incident (null for manual/service incidents) |
| affectedRegions | string[] | Probe regions that observed the failure |
| reopenCount | integer | Number of times the incident was reopened |
| resolutionReason | string | How the incident was resolved |
| startedAt | datetime | When the incident was first detected |
| confirmedAt | datetime | When multi-region confirmation completed |
| resolvedAt | datetime | When the incident was resolved |
| cooldownUntil | datetime | End of cooldown period after resolution |
| shortlink | string | Short URL to the incident detail page |
For the full incident schema including all fields, see the API Reference.
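The key fields above can be modeled as a small dataclass. This is a partial sketch of the schema — the Python types chosen for each field are assumptions, and the full field list lives in the API Reference.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
import uuid

@dataclass
class Incident:
    """Subset of the incident object; types are illustrative."""
    id: uuid.UUID
    status: str                    # WATCHING | TRIGGERED | CONFIRMED | RESOLVED
    severity: str                  # DOWN | DEGRADED | MAINTENANCE
    source: str                    # MONITORS | MANUAL | STATUS_DATA | RESOURCE_GROUP
    title: str
    startedAt: datetime
    monitorId: Optional[uuid.UUID] = None     # null for manual/service incidents
    affectedRegions: list[str] = field(default_factory=list)
    reopenCount: int = 0
    resolutionReason: Optional[str] = None    # set once the incident resolves
    confirmedAt: Optional[datetime] = None
    resolvedAt: Optional[datetime] = None
    cooldownUntil: Optional[datetime] = None
    shortlink: Optional[str] = None
```

The optional fields default to `None` because they are only populated as the incident moves through its lifecycle — `confirmedAt` after confirmation, `resolvedAt` and `cooldownUntil` after resolution.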

Next steps

Incident policies

Configure trigger rules, confirmation, and recovery behavior.

Manual incidents

Create incidents for issues not caught by automated monitoring.

Alerting overview

Route incident notifications to your team.

Incident timeline

Track status changes and event history.