An incident represents an ongoing problem detected by a monitor, reported by a third-party service, or created manually. DevHelm automates the full lifecycle from detection through confirmation, alerting, and recovery.
Define this in code. Manage incident policies as part of your monitoring-as-code workflow: YAML format · Terraform · CI/CD patterns

Incident lifecycle

Every incident follows a predictable state machine:
Check fails → Trigger rule fires → WATCHING
WATCHING → Regions confirm → TRIGGERED
TRIGGERED → Multi-region confirmed → CONFIRMED (alerts sent)
CONFIRMED → Checks start passing → Recovery policy met → RESOLVED
RESOLVED → Cooldown period
  1. Detection — A monitor’s checks fail and match a trigger rule
  2. Watching — The system observes the failure while waiting for enough data to confirm
  3. Triggered — The trigger rule threshold is met in at least one region
  4. Confirmed — The confirmation policy validates failures across multiple regions, promoting the incident to active status and firing alerts
  5. Resolved — The recovery policy detects consecutive passing checks across enough regions, or a user resolves manually
  6. Cooldown — A configurable quiet period prevents the same monitor from immediately opening a new incident
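The status progression above can be sketched as a small state machine. This is an illustrative sketch, not DevHelm's implementation: the event names (`trigger_rule_met`, `multi_region_confirmed`, `recovery_policy_met`, `manual_resolve`) are assumptions, and the transition table is simplified (for example, manual resolution likely works from more states than shown).

```python
from enum import Enum

class Status(Enum):
    WATCHING = "WATCHING"
    TRIGGERED = "TRIGGERED"
    CONFIRMED = "CONFIRMED"
    RESOLVED = "RESOLVED"

# (current status, event) → next status. Event names are illustrative.
TRANSITIONS = {
    (Status.WATCHING, "trigger_rule_met"): Status.TRIGGERED,
    (Status.TRIGGERED, "multi_region_confirmed"): Status.CONFIRMED,
    (Status.CONFIRMED, "recovery_policy_met"): Status.RESOLVED,
    (Status.CONFIRMED, "manual_resolve"): Status.RESOLVED,
}

def advance(status: Status, event: str) -> Status:
    """Return the next status, or raise if the transition is not allowed."""
    nxt = TRANSITIONS.get((status, event))
    if nxt is None:
        raise ValueError(f"no transition from {status.value} on {event!r}")
    return nxt
```

Walking a failure through the happy path: `advance(Status.WATCHING, "trigger_rule_met")` yields `TRIGGERED`, confirmation promotes it to `CONFIRMED`, and recovery moves it to `RESOLVED`.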

Statuses

| Status | Meaning |
|---|---|
| WATCHING | Failure detected but not yet confirmed — waiting for additional check results |
| TRIGGERED | Trigger rule threshold met in at least one region |
| CONFIRMED | Active incident — failure confirmed across regions, alerts have been sent |
| RESOLVED | Incident closed — monitor recovered or manually resolved |

Severities

Each incident carries a severity that determines urgency and drives notification policy matching.
| Severity | When it's used |
|---|---|
| DOWN | Complete failure — endpoint unreachable or critical assertions failing |
| DEGRADED | Partial failure — response time thresholds exceeded or non-critical assertions failing |
| MAINTENANCE | Planned downtime — created by maintenance windows |
A single monitor can have trigger rules at different severities. For example, a response time threshold might open a DEGRADED incident, while consecutive failures open a DOWN incident.

Sources

Incidents can originate from four different sources:
| Source | Created by |
|---|---|
| MONITORS | Automatic detection when a monitor's check results match its trigger rules |
| MANUAL | User-created via Dashboard, CLI, or API for issues not covered by automation |
| STATUS_DATA | Propagated from a third-party service incident when you track it as a dependency |
| RESOURCE_GROUP | Aggregated from multiple monitors or services within a resource group |
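As a sketch of the MANUAL source, the helper below builds a create-incident request body. The field names mirror the incident object documented later on this page, but the exact create endpoint and accepted fields are defined in the API Reference — treat this payload shape as an assumption.

```python
import json

VALID_SEVERITIES = {"DOWN", "DEGRADED", "MAINTENANCE"}

def manual_incident_payload(title: str, severity: str = "DOWN") -> str:
    """Serialize a request body for a user-created (MANUAL) incident.

    Illustrative sketch: field names follow the incident object, but
    the real API contract is in the API Reference.
    """
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    return json.dumps({
        "title": title,          # short summary shown on the incident
        "severity": severity,    # drives notification policy matching
        "source": "MANUAL",      # user-created, not monitor-detected
    })
```

You would POST this body to the incidents endpoint with your usual authentication headers.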

Resolution reasons

When an incident resolves, DevHelm records the reason:
| Reason | Meaning |
|---|---|
| AUTO_RECOVERED | Monitor checks started passing and met the recovery policy |
| MANUAL | Resolved by a user via Dashboard, CLI, or API |
| AUTO_RESOLVED | Resolved by system logic (e.g., an upstream Status Data incident resolved) |

Reopening

If a monitor fails again before its cooldown period ends, DevHelm reopens the existing incident rather than creating a new one; once the cooldown has passed, a fresh failure opens a new incident. The reopenCount field tracks how many times an incident has been reopened. Reopening behavior interacts with escalation chains — you can configure whether escalation restarts from the beginning or resumes from the current step.

Incident fields

Key fields on every incident object:
| Field | Type | Description |
|---|---|---|
| id | UUID | Unique incident identifier |
| status | string | Current lifecycle status |
| severity | string | DOWN, DEGRADED, or MAINTENANCE |
| source | string | How the incident was created |
| title | string | Short summary (auto-generated or user-provided) |
| monitorId | UUID | Monitor that triggered the incident (null for manual/service incidents) |
| affectedRegions | string[] | Probe regions that observed the failure |
| reopenCount | integer | Number of times the incident was reopened |
| resolutionReason | string | How the incident was resolved |
| startedAt | datetime | When the incident was first detected |
| confirmedAt | datetime | When multi-region confirmation completed |
| resolvedAt | datetime | When the incident was resolved |
| cooldownUntil | datetime | End of cooldown period after resolution |
| shortlink | string | Short URL to the incident detail page |
For the full incident schema including all fields, see the API Reference.
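The key fields above can be modeled as a small dataclass. This is a partial sketch of the schema — the Python types chosen for each field are assumptions, and the full field list lives in the API Reference.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional
import uuid

@dataclass
class Incident:
    """Subset of the incident object; types are illustrative."""
    id: uuid.UUID
    status: str                    # WATCHING | TRIGGERED | CONFIRMED | RESOLVED
    severity: str                  # DOWN | DEGRADED | MAINTENANCE
    source: str                    # MONITORS | MANUAL | STATUS_DATA | RESOURCE_GROUP
    title: str
    startedAt: datetime
    monitorId: Optional[uuid.UUID] = None     # null for manual/service incidents
    affectedRegions: list[str] = field(default_factory=list)
    reopenCount: int = 0
    resolutionReason: Optional[str] = None    # set once the incident resolves
    confirmedAt: Optional[datetime] = None
    resolvedAt: Optional[datetime] = None
    cooldownUntil: Optional[datetime] = None
    shortlink: Optional[str] = None
```

The optional fields default to `None` because they are only populated as the incident moves through its lifecycle — `confirmedAt` after confirmation, `resolvedAt` and `cooldownUntil` after resolution.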

Next steps

Incident policies

Configure trigger rules, confirmation, and recovery behavior.

Manual incidents

Create incidents for issues not caught by automated monitoring.

Alerting overview

Route incident notifications to your team.

Incident timeline

Track status changes and event history.