Incident Policies

Every monitor has an incident policy that controls when incidents open, how they’re confirmed, and when they auto-resolve. A policy has three components: trigger rules, a confirmation policy, and a recovery policy.

Define this in code. Manage incident policies as part of your monitoring-as-code workflow: YAML format · Terraform · CI/CD patterns

Trigger rules

Trigger rules define the conditions that open an incident from check results. Each monitor can have multiple rules at different severities.

Rule types

Type	Behavior	Required fields
`consecutive_failures`	Opens an incident after N consecutive failed checks	`count`
`failures_in_window`	Opens an incident after N failures within a time window	`count`, `windowMinutes`
`response_time`	Opens an incident when response time exceeds a threshold	`thresholdMs`, `aggregationType`
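
The two failure-based rule types can be sketched in Python as follows. This is a minimal illustration, not DevHelm's implementation: the `CheckResult` class and both function names are hypothetical, and the sketch assumes results are ordered oldest first and that `count` is at least 1.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CheckResult:
    """Hypothetical record of a single check run."""
    timestamp: datetime
    passed: bool


def consecutive_failures_fired(results, count):
    """True when the most recent `count` results all failed."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if len(results) < count:
        return False
    return all(not r.passed for r in results[-count:])


def failures_in_window_fired(results, count, window_minutes, now):
    """True when at least `count` failures fall inside the trailing window."""
    cutoff = now - timedelta(minutes=window_minutes)
    failures = sum(1 for r in results if not r.passed and r.timestamp >= cutoff)
    return failures >= count
```

Note the difference: `consecutive_failures` is reset by any passing check, while `failures_in_window` tolerates passes between failures as long as enough failures land inside the window.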

Scope

Each rule has a scope that determines how regions are evaluated:

Scope	Behavior
`per_region`	Each region is evaluated independently — the rule must be satisfied in a single region
`any_region`	Failures are aggregated across all regions
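
To see how scope changes the outcome, here is a rough Python sketch for a consecutive-failure rule. The function name and data shape are hypothetical, and merging results by timestamp for `any_region` is an assumption about how aggregation might work, not a description of DevHelm internals.

```python
def rule_fired(results_by_region, count, scope):
    """Evaluate a consecutive-failure rule under the given scope.

    results_by_region maps a region name to a list of (timestamp, passed)
    tuples ordered oldest first. Timestamps only need to be comparable.
    """
    def last_n_failed(results):
        return len(results) >= count and all(not passed for _, passed in results[-count:])

    if scope == "per_region":
        # The rule must be satisfied within one region's own results.
        return any(last_n_failed(r) for r in results_by_region.values())
    if scope == "any_region":
        # Assumption: pool results from every region into one timeline.
        merged = sorted(
            (item for r in results_by_region.values() for item in r),
            key=lambda item: item[0],
        )
        return last_n_failed(merged)
    raise ValueError(f"unknown scope: {scope!r}")
```

With one recent failure in each of two regions and `count` set to 2, this sketch fires under `any_region` but not under `per_region`, because neither region has two consecutive failures on its own.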

Severity

Each rule targets a severity level. When multiple rules fire, the highest severity wins:

Severity	Priority
`down`	Highest — complete failure
`degraded`	Lower — partial failure or performance issue
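
The "highest severity wins" rule can be expressed as a simple ranking. In this sketch the numeric priorities and the function name are illustrative assumptions; only the ordering (`down` above `degraded`) comes from the table above.

```python
# Higher number means higher priority. The numbers themselves are arbitrary.
SEVERITY_PRIORITY = {"degraded": 1, "down": 2}


def effective_severity(fired_severities):
    """Return the highest-priority severity among fired rules, or None."""
    if not fired_severities:
        return None
    return max(fired_severities, key=SEVERITY_PRIORITY.__getitem__)
```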

Response time aggregation

For `response_time` rules, the `aggregationType` field controls how latency is evaluated across checks:

Aggregation	Behavior
`all_exceed`	Every check in the evaluation window must exceed the threshold
`average`	The average response time exceeds the threshold
`p95`	The 95th percentile exceeds the threshold
`max`	The maximum response time exceeds the threshold
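
The four aggregation types could be evaluated along these lines. This is a sketch under stated assumptions: the function name is hypothetical, "exceeds" is taken to mean strictly greater than, and the 95th percentile uses the nearest-rank method, which may differ from how DevHelm computes it.

```python
def exceeds_threshold(samples_ms, threshold_ms, aggregation_type):
    """Decide whether response times in the evaluation window breach the threshold."""
    if not samples_ms:
        return False
    if aggregation_type == "all_exceed":
        return all(s > threshold_ms for s in samples_ms)
    if aggregation_type == "average":
        return sum(samples_ms) / len(samples_ms) > threshold_ms
    if aggregation_type == "p95":
        ordered = sorted(samples_ms)
        # Nearest-rank percentile: rank = ceil(0.95 * n), using integer math.
        rank = -(-95 * len(ordered) // 100)
        return ordered[rank - 1] > threshold_ms
    if aggregation_type == "max":
        return max(samples_ms) > threshold_ms
    raise ValueError(f"unknown aggregationType: {aggregation_type!r}")
```

A single slow check is enough to trip `max`, and with few samples it can also trip `p95`, whereas `all_exceed` and `average` need sustained slowness. Pick the aggregation based on how sensitive you want the rule to be to outliers.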

Default policy

When you create a monitor without specifying a policy, DevHelm applies the following default:

Trigger: 2 consecutive failures per region → severity down
Confirmation: Multi-region, 1 region failing, wait up to max(60, frequency × 2) seconds
Recovery: 2 consecutive successes, 2 regions passing, 5-minute cooldown
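
Written out with the field names used elsewhere on this page, the default might look like the following Python sketch. The function name is hypothetical, and the sketch assumes the monitor's check frequency is expressed in seconds.

```python
def default_policy(frequency_seconds):
    """Build the default policy described above for a given check frequency.

    Assumption: frequency is given in seconds.
    """
    return {
        "triggerRules": [
            {
                "type": "consecutive_failures",
                "count": 2,
                "scope": "per_region",
                "severity": "down",
            }
        ],
        "confirmation": {
            "type": "multi_region",
            "minRegionsFailing": 1,
            # Wait at least 60 seconds, or two check intervals if that is longer.
            "maxWaitSeconds": max(60, frequency_seconds * 2),
        },
        "recovery": {
            "consecutiveSuccesses": 2,
            "minRegionsPassing": 2,
            "cooldownMinutes": 5,
        },
    }
```

Under that assumption, a monitor checking every 30 seconds would wait up to 60 seconds for confirmation, while one checking every 5 minutes would wait up to 600 seconds.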

Example

A policy with two trigger rules — one for complete failures and one for performance degradation:

{
  "triggerRules": [
    {
      "type": "consecutive_failures",
      "count": 3,
      "scope": "per_region",
      "severity": "down"
    },
    {
      "type": "response_time",
      "thresholdMs": 5000,
      "aggregationType": "p95",
      "scope": "any_region",
      "severity": "degraded"
    }
  ],
  "confirmation": {
    "type": "multi_region",
    "minRegionsFailing": 2,
    "maxWaitSeconds": 120
  },
  "recovery": {
    "consecutiveSuccesses": 3,
    "minRegionsPassing": 2,
    "cooldownMinutes": 10
  }
}
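
Before submitting a policy like this one, you may want to check it locally. The following Python sketch validates the required fields from the rule types table and the 0 to 60 range documented for `cooldownMinutes`. The function name and error messages are hypothetical, and DevHelm's own validation may enforce more than this.

```python
# Required fields per rule type, taken from the rule types table.
REQUIRED_FIELDS = {
    "consecutive_failures": {"count"},
    "failures_in_window": {"count", "windowMinutes"},
    "response_time": {"thresholdMs", "aggregationType"},
}


def validate_policy(policy):
    """Return a list of human-readable problems; an empty list means none found."""
    errors = []
    for i, rule in enumerate(policy.get("triggerRules", [])):
        rule_type = rule.get("type")
        if rule_type not in REQUIRED_FIELDS:
            errors.append(f"triggerRules[{i}]: unknown type {rule_type!r}")
            continue
        missing = REQUIRED_FIELDS[rule_type] - set(rule)
        for field in sorted(missing):
            errors.append(f"triggerRules[{i}]: {rule_type} requires '{field}'")
    cooldown = policy.get("recovery", {}).get("cooldownMinutes")
    if cooldown is not None and not 0 <= cooldown <= 60:
        errors.append("recovery.cooldownMinutes must be between 0 and 60")
    return errors
```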

Confirmation

Confirmation reduces false positives by requiring failures from a minimum number of probe regions before promoting an incident to CONFIRMED status.

Field	Type	Description
`type`	string	Confirmation strategy — currently `multi_region`
`minRegionsFailing`	integer	Minimum regions that must be failing to confirm
`maxWaitSeconds`	integer	Maximum seconds to wait for enough regions to report failures

When a trigger rule fires in one region, the confirmation policy waits up to `maxWaitSeconds` for at least `minRegionsFailing` regions in total, counting the region that triggered, to report failures. If enough regions report failures within the window, the incident moves to CONFIRMED and alerts fire. If the window expires first, the incident is discarded.

Set minRegionsFailing to 1 to confirm on the first region that reports a failure. This is useful for monitors running from a single region.
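
The confirmation window can be modeled roughly as below. This Python sketch is illustrative only: the function name, the `"PENDING"` and `"DISCARDED"` labels, and the use of plain numbers of seconds for time are all assumptions made for the example.

```python
def confirmation_outcome(triggered_at, failure_reports, min_regions_failing,
                         max_wait_seconds, now):
    """Classify an unconfirmed incident at time `now`.

    failure_reports is a list of (region, timestamp) tuples. All times are
    numbers of seconds on a shared clock.
    """
    deadline = triggered_at + max_wait_seconds
    # Count distinct regions that failed inside the window and before `now`.
    failing_regions = {
        region
        for region, ts in failure_reports
        if triggered_at <= ts <= deadline and ts <= now
    }
    if len(failing_regions) >= min_regions_failing:
        return "CONFIRMED"
    if now > deadline:
        return "DISCARDED"
    return "PENDING"
```

With `min_regions_failing` set to 1, the region that triggered the rule is enough on its own, so the outcome is confirmed immediately.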

Recovery

Recovery controls when a confirmed incident auto-resolves.

Field	Type	Description
`consecutiveSuccesses`	integer	Number of consecutive passing checks required before resolving
`minRegionsPassing`	integer	Minimum regions that must be healthy before recovery completes
`cooldownMinutes`	integer	Minutes after resolution before a new incident can open (0–60)

The recovery policy ensures stability before closing an incident. After the required consecutive successes are observed across enough regions, the incident moves to RESOLVED. The cooldown period then suppresses new incidents for the same monitor, preventing flapping.
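
As a rough model of these rules, the Python sketch below treats a region as healthy once its most recent checks include the required run of consecutive passes. That per-region interpretation, along with the function names, is an assumption for illustration and may not match DevHelm's exact behavior.

```python
from datetime import datetime, timedelta


def should_resolve(passes_by_region, consecutive_successes, min_regions_passing):
    """True when enough regions each show the required run of passing checks.

    passes_by_region maps a region name to a list of booleans (True = passed),
    ordered oldest first. Assumes consecutive_successes is at least 1.
    """
    healthy = sum(
        1
        for results in passes_by_region.values()
        if len(results) >= consecutive_successes
        and all(results[-consecutive_successes:])
    )
    return healthy >= min_regions_passing


def in_cooldown(resolved_at, now, cooldown_minutes):
    """True while new incidents for the monitor are still suppressed."""
    return now < resolved_at + timedelta(minutes=cooldown_minutes)
```

A longer cooldown trades faster detection of a genuine relapse for fewer repeated incidents on a flapping service.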

Managing policies

View a monitor’s policy

devhelm monitors get <monitor-id> --include-policy

Update a policy

devhelm monitors update-policy <monitor-id> \
  --trigger-type consecutive_failures \
  --trigger-count 3 \
  --trigger-scope per_region \
  --trigger-severity down \
  --confirmation-min-regions 2 \
  --confirmation-max-wait 120 \
  --recovery-successes 3 \
  --recovery-min-regions 2 \
  --recovery-cooldown 10

Next steps

Incidents overview

Understand the full incident lifecycle and statuses.

Monitoring regions

Learn how multi-region checks interact with confirmation policies.

Alerting overview

Configure notifications for confirmed incidents.

Maintenance windows

Suppress incidents during planned downtime.
