By the end of this guide, you’ll have monitors that detect slow endpoints before they become outages — using layered response time thresholds at warn and fail severity.

Why response time budgets matter

An endpoint that’s technically “up” but responding in 10 seconds is effectively broken for users. Response time assertions let you define performance budgets at multiple levels, catching degradation early.

Set up layered thresholds

Use two assertions — a warning for early detection and a failure for critical slowdowns:
monitors:
  - name: API Health
    type: HTTP
    config:
      url: https://api.example.com/health
    frequencySeconds: 60
    regions:
      - us-east
      - eu-west
    assertions:
      - config:
          type: response_time
          thresholdMs: 500
        severity: warn
      - config:
          type: response_time
          thresholdMs: 2000
        severity: fail
This gives you:
  • Warning at 500ms — the endpoint is slower than expected; investigate
  • Failure at 2000ms — the endpoint is critically slow; this can open a DEGRADED incident
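Conceptually, the two assertions form a ladder: each check's response time is compared against both thresholds and the most severe breach wins. A minimal Python sketch of that evaluation (the function and dict shapes mirror the YAML above but are illustrative, not the product's API):

```python
# Sketch of layered assertion evaluation; evaluate_assertions is a
# hypothetical helper, not part of the product.
def evaluate_assertions(response_ms, assertions):
    """Return the highest severity breached, or None if all pass."""
    order = {"warn": 1, "fail": 2}
    breached = [a["severity"] for a in assertions if response_ms > a["thresholdMs"]]
    return max(breached, key=order.get) if breached else None

assertions = [
    {"thresholdMs": 500, "severity": "warn"},
    {"thresholdMs": 2000, "severity": "fail"},
]
print(evaluate_assertions(300, assertions))   # None
print(evaluate_assertions(800, assertions))   # warn
print(evaluate_assertions(2500, assertions))  # fail
```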

Combine with trigger rules

Response time assertions detect individual slow checks. For incident creation, use a response_time trigger rule to aggregate across checks:
incidentPolicy:
  triggerRules:
    - type: consecutive_failures
      count: 3
      scope: per_region
      severity: down
    - type: response_time
      thresholdMs: 2000
      aggregationType: p95
      scope: any_region
      severity: degraded
  confirmation:
    type: multi_region
    minRegionsFailing: 2
    maxWaitSeconds: 120
This creates a DEGRADED incident when the p95 response time exceeds 2 seconds across any region, while still opening a DOWN incident for complete failures.
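To make the consecutive_failures rule concrete, here is a rough per-region sketch in Python; the real evaluation happens server-side, and all names here are illustrative. The multi_region confirmation step from the YAML is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical evaluator for a per-region consecutive_failures rule.
def should_open_down(check_results, count=3):
    """check_results: (region, passed) tuples in time order."""
    streak = defaultdict(int)
    for region, passed in check_results:
        streak[region] = 0 if passed else streak[region] + 1
        if streak[region] >= count:
            return True, region
    return False, None

results = [
    ("us-east", True), ("us-east", False), ("eu-west", False),
    ("us-east", False), ("eu-west", True), ("us-east", False),
]
print(should_open_down(results))  # (True, 'us-east')
```

A passing check resets that region's streak, so intermittent flakiness does not accumulate toward an incident.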

Aggregation types

Type         Behavior
-----------  ---------------------------------------------------
all_exceed   Every check in the window must exceed the threshold
average      Average response time exceeds the threshold
p95          95th percentile exceeds the threshold
max          Maximum response time exceeds the threshold
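As a sketch, the four behaviors can be expressed over a window of recent response times in milliseconds. Nearest-rank p95 is used here for simplicity; the product's exact percentile method may differ, and the function name is an assumption.

```python
import math
import statistics

# Illustrative comparison of the four aggregation behaviors.
def p95(values):
    """Nearest-rank 95th percentile (simplified)."""
    ranked = sorted(values)
    return ranked[math.ceil(0.95 * len(ranked)) - 1]

def exceeds(window, threshold_ms, aggregation):
    if aggregation == "all_exceed":
        return all(t > threshold_ms for t in window)
    if aggregation == "average":
        return statistics.mean(window) > threshold_ms
    if aggregation == "p95":
        return p95(window) > threshold_ms
    if aggregation == "max":
        return max(window) > threshold_ms
    raise ValueError(f"unknown aggregation: {aggregation}")

window = [400, 450, 480, 500, 2200]
print(exceeds(window, 2000, "max"))         # True: one spike past 2000ms
print(exceeds(window, 2000, "p95"))         # True: nearest-rank p95 is 2200ms
print(exceeds(window, 2000, "average"))     # False: mean is 806ms
print(exceeds(window, 2000, "all_exceed"))  # False: most checks are fast
```

Note how a single spike trips max but not average or all_exceed, which is why p95 is a common middle ground for degradation alerts.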

Route differently by severity

Use notification policies to handle DEGRADED and DOWN incidents differently:
Severity   Alert channel   Urgency
---------  --------------  ---------------------------------------------
DEGRADED   Slack channel   Awareness — investigate during business hours
DOWN       PagerDuty       Page on-call immediately
{
  "name": "Degraded to Slack",
  "matchRules": [
    { "type": "severity_gte", "value": "DEGRADED" }
  ],
  "escalation": {
    "steps": [{
      "delayMinutes": 0,
      "channelIds": ["<slack-channel-id>"]
    }]
  },
  "priority": 5
}
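The severity_gte match rule fires for the named severity and anything more severe. A hypothetical sketch of that comparison, assuming the ordering DEGRADED < DOWN (the evaluator function is not the product's API):

```python
# Assumed severity ordering; higher means more severe.
SEVERITY_ORDER = {"DEGRADED": 1, "DOWN": 2}

def matches(policy, incident_severity):
    """True if every severity_gte rule in the policy is satisfied."""
    return all(
        SEVERITY_ORDER[incident_severity] >= SEVERITY_ORDER[rule["value"]]
        for rule in policy["matchRules"]
        if rule["type"] == "severity_gte"
    )

slack_policy = {"matchRules": [{"type": "severity_gte", "value": "DEGRADED"}]}
pager_policy = {"matchRules": [{"type": "severity_gte", "value": "DOWN"}]}

print(matches(slack_policy, "DEGRADED"))  # True
print(matches(slack_policy, "DOWN"))      # True: DOWN also clears the bar
print(matches(pager_policy, "DEGRADED"))  # False: not severe enough to page
```

Because severity_gte DEGRADED also matches DOWN incidents, the JSON above sets a priority, which presumably arbitrates when several policies match the same incident.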

Choosing thresholds

Endpoint type          Warn threshold   Fail threshold
---------------------  ---------------  ---------------
Health check / status  200ms            1000ms
Public API             500ms            2000ms
Dashboard page         1000ms           5000ms
Background webhook     2000ms           10000ms
Base your thresholds on baseline measurements. Use the Dashboard’s response time charts to understand normal performance for each monitor.
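One way to turn baseline measurements into concrete numbers: set the warn threshold near the observed p95 and the fail threshold at a multiple of it. The helper below is a hypothetical sketch; the 4x multiplier and 50ms rounding are illustrative choices, not product recommendations.

```python
import math

# Hypothetical helper: derive warn/fail thresholds from a baseline sample.
def suggest_thresholds(baseline_ms, fail_multiplier=4):
    ranked = sorted(baseline_ms)
    p95 = ranked[math.ceil(0.95 * len(ranked)) - 1]  # nearest-rank p95
    warn = round(p95 / 50) * 50  # snap to a tidy 50ms step
    return {"warnMs": warn, "failMs": warn * fail_multiplier}

baseline = [120, 125, 130, 135, 140, 145, 150, 155, 160, 165,
            170, 180, 190, 200, 210, 230, 260, 300, 380, 900]
print(suggest_thresholds(baseline))  # {'warnMs': 400, 'failMs': 1600}
```

Anchoring warn at p95 means roughly one check in twenty triggers a warning under normal load, so anything noisier than that signals a real shift from baseline.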

Next steps

Incident policies

Configure trigger rules and response time aggregation.

HTTP assertions

Full list of HTTP assertion types.

Alert routing by tag

Send degraded and down alerts to different teams.