# Incident lifecycle
Every incident follows a predictable state machine:

- Detection — A monitor’s checks fail and match a trigger rule
- Watching — The system observes the failure while waiting for enough data to confirm
- Triggered — The trigger rule threshold is met in at least one region
- Confirmed — The confirmation policy validates failures across multiple regions, promoting the incident to active status and firing alerts
- Resolved — The recovery policy detects consecutive passing checks across enough regions, or a user resolves manually
- Cooldown — A configurable quiet period prevents the same monitor from immediately reopening a new incident
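The lifecycle above can be sketched as a small transition map. This is an illustrative sketch, not DevHelm's actual API: the type names are invented here, and a real deployment may permit transitions beyond the linear flow (for example, recovery before confirmation).

```typescript
// Illustrative incident lifecycle, following the linear flow described above.
type IncidentStatus = "WATCHING" | "TRIGGERED" | "CONFIRMED" | "RESOLVED";

// Allowed next states per status. RESOLVED -> TRIGGERED models reopening
// after the cooldown period (see "Reopening" below).
const transitions: Record<IncidentStatus, IncidentStatus[]> = {
  WATCHING: ["TRIGGERED"],
  TRIGGERED: ["CONFIRMED"],
  CONFIRMED: ["RESOLVED"],
  RESOLVED: ["TRIGGERED"],
};

function canTransition(from: IncidentStatus, to: IncidentStatus): boolean {
  return transitions[from].includes(to);
}
```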
## Statuses
| Status | Meaning |
|---|---|
| WATCHING | Failure detected but not yet confirmed — waiting for additional check results |
| TRIGGERED | Trigger rule threshold met in at least one region |
| CONFIRMED | Active incident — failure confirmed across regions, alerts have been sent |
| RESOLVED | Incident closed — monitor recovered or manually resolved |
## Severities
Each incident carries a severity that determines urgency and drives notification policy matching.

| Severity | When it’s used |
|---|---|
| DOWN | Complete failure — endpoint unreachable or critical assertions failing |
| DEGRADED | Partial failure — response time thresholds exceeded or non-critical assertions failing |
| MAINTENANCE | Planned downtime — created by maintenance windows |
Exceeded response-time thresholds open a DEGRADED incident, while consecutive failures open a DOWN incident.
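One way to picture the DOWN/DEGRADED split is a small classifier over a single check result. This is a hypothetical sketch: the check-result field names here are assumptions, not DevHelm's actual schema.

```typescript
// Hypothetical shape of one check result.
interface CheckResult {
  reachable: boolean;              // did the endpoint respond at all?
  criticalAssertionsPass: boolean; // critical assertions on the response
  responseTimeMs: number;
}

type FailureSeverity = "DOWN" | "DEGRADED";

// Complete failures map to DOWN; response-time breaches to DEGRADED.
function classify(r: CheckResult, thresholdMs: number): FailureSeverity | null {
  if (!r.reachable || !r.criticalAssertionsPass) return "DOWN";
  if (r.responseTimeMs > thresholdMs) return "DEGRADED";
  return null; // healthy check, no incident
}
```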
## Sources
Incidents can originate from four different sources:

| Source | Created by |
|---|---|
| MONITORS | Automatic detection when a monitor’s check results match its trigger rules |
| MANUAL | User-created via Dashboard, CLI, or API for issues not covered by automation |
| STATUS_DATA | Propagated from a third-party service incident when you track it as a dependency |
| RESOURCE_GROUP | Aggregated from multiple monitors or services within a resource group |
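For the MANUAL source, a creation request body might look like the following. This is a hedged sketch only: the payload field names are assumptions for illustration, and the authoritative request shape is defined in the API Reference.

```typescript
// Hypothetical payload for creating a manual incident via the API.
interface ManualIncidentRequest {
  source: "MANUAL";
  severity: "DOWN" | "DEGRADED" | "MAINTENANCE";
  title: string;
}

const payload: ManualIncidentRequest = {
  source: "MANUAL",
  severity: "DEGRADED",
  title: "Elevated error rates on checkout",
};

// Serialized body as it would be sent over HTTP.
const body = JSON.stringify(payload);
```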
## Resolution reasons
When an incident resolves, DevHelm records the reason:

| Reason | Meaning |
|---|---|
| AUTO_RECOVERED | Monitor checks started passing and met the recovery policy |
| MANUAL | Resolved by a user via Dashboard, CLI, or API |
| AUTO_RESOLVED | Resolved by system logic (e.g., an upstream Status Data incident resolved) |
## Reopening
If a monitor fails again after an incident resolves and the cooldown period has passed, DevHelm reopens the existing incident rather than creating a new one. The `reopenCount` field tracks how many times an incident has been reopened.
Reopening behavior interacts with escalation chains — you can configure whether escalation restarts from the beginning or resumes from the current step.
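One reading of the reopen rule can be sketched as a small decision function. This is illustrative only: the actual policy logic runs server-side, and the function and field names here are invented for the example.

```typescript
// Illustrative reopen decision for a failure seen after resolution.
interface ResolvedIncident {
  cooldownUntil: Date;
  reopenCount: number;
}

type Action = "SUPPRESS" | "REOPEN";

function onFailureAfterResolve(incident: ResolvedIncident, now: Date): Action {
  if (now < incident.cooldownUntil) {
    return "SUPPRESS"; // still in cooldown: don't reopen yet
  }
  incident.reopenCount += 1; // track how many times this incident reopened
  return "REOPEN";
}
```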
## Incident fields
Key fields on every incident object:

| Field | Type | Description |
|---|---|---|
| id | UUID | Unique incident identifier |
| status | string | Current lifecycle status |
| severity | string | DOWN, DEGRADED, or MAINTENANCE |
| source | string | How the incident was created |
| title | string | Short summary (auto-generated or user-provided) |
| monitorId | UUID | Monitor that triggered the incident (null for manual/service incidents) |
| affectedRegions | string[] | Probe regions that observed the failure |
| reopenCount | integer | Number of times the incident was reopened |
| resolutionReason | string | How the incident was resolved |
| startedAt | datetime | When the incident was first detected |
| confirmedAt | datetime | When multi-region confirmation completed |
| resolvedAt | datetime | When the incident was resolved |
| cooldownUntil | datetime | End of cooldown period after resolution |
| shortlink | string | Short URL to the incident detail page |
For the full incident schema including all fields, see the API Reference.
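The field table above translates roughly into the following TypeScript shape. This is a simplified sketch (nullability and datetime encoding are assumptions); consult the API Reference for the authoritative schema.

```typescript
// Simplified incident shape based on the field table above.
interface Incident {
  id: string;                       // UUID
  status: "WATCHING" | "TRIGGERED" | "CONFIRMED" | "RESOLVED";
  severity: "DOWN" | "DEGRADED" | "MAINTENANCE";
  source: "MONITORS" | "MANUAL" | "STATUS_DATA" | "RESOURCE_GROUP";
  title: string;
  monitorId: string | null;         // null for manual/service incidents
  affectedRegions: string[];
  reopenCount: number;
  resolutionReason: "AUTO_RECOVERED" | "MANUAL" | "AUTO_RESOLVED" | null;
  startedAt: string;                // ISO 8601 datetime
  confirmedAt: string | null;
  resolvedAt: string | null;
  cooldownUntil: string | null;
  shortlink: string;
}
```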
## Next steps
- **Incident policies**: Configure trigger rules, confirmation, and recovery behavior.
- **Manual incidents**: Create incidents for issues not caught by automated monitoring.
- **Alerting overview**: Route incident notifications to your team.
- **Incident timeline**: Track status changes and event history.