False positives erode trust in your monitoring. When alerts fire too often for non-issues, teams start ignoring them — and then miss real incidents. Here’s how to minimize noise.

Why false positives happen

  • Transient network issues — a single packet drop between probe and server
  • DNS propagation delays — intermittent resolution failures during changes
  • Load balancer health checks — brief unhealthy periods during deployments
  • Rate limiting — monitoring probes get throttled
  • Server restarts — momentary unavailability during graceful shutdown

Strategy 1: Multi-region confirmation

This is the most effective approach: require failures from multiple probe regions before creating an incident.
  • Single-region failure → likely a network path issue, not an outage
  • Multi-region failure → likely a real problem with your service
Configure DevHelm to require 2+ regions failing before confirming an incident.
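The confirmation rule is simple enough to sketch. This is an illustrative Python snippet, not DevHelm's actual API; the function name and region labels are made up:

```python
REQUIRED_REGIONS = 2  # assumed setting: regions that must fail to confirm

def confirmed_incident(region_results: dict[str, bool]) -> bool:
    """region_results maps region name -> whether the check passed."""
    failing = [region for region, ok in region_results.items() if not ok]
    return len(failing) >= REQUIRED_REGIONS

# Single-region failure: likely a network path issue, no incident.
print(confirmed_incident({"us-east": False, "eu-west": True, "ap-south": True}))   # False
# Multi-region failure: likely a real outage.
print(confirmed_incident({"us-east": False, "eu-west": False, "ap-south": True}))  # True
```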

Strategy 2: Consecutive failure thresholds

Require multiple consecutive failed checks before alerting:
Threshold        Behavior
1 failure        Alert immediately (noisy)
2 consecutive    Filters out single transient failures
3 consecutive    High confidence, but slower detection
For a 60-second check frequency, a 3-consecutive threshold means detection takes up to 3 minutes instead of 1.
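A consecutive-failure threshold is just a streak counter that resets on any success. A minimal sketch (illustrative class, not DevHelm's implementation):

```python
class ConsecutiveFailureAlerter:
    """Alert only after N consecutive failed checks."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.streak = 0

    def record(self, passed: bool) -> bool:
        """Record one check result; return True when an alert should fire."""
        self.streak = 0 if passed else self.streak + 1
        return self.streak == self.threshold  # fire once, on crossing the threshold

alerter = ConsecutiveFailureAlerter(threshold=3)
# One transient blip (filtered), then a sustained outage (alerted on the 3rd failure).
results = [False, True, False, False, False]
print([alerter.record(r) for r in results])  # [False, False, False, False, True]
```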

Strategy 3: Confirmation windows

Instead of counting consecutive failures, count failures within a time window:
  • “3 failures in the last 5 minutes” catches intermittent issues
  • More flexible than consecutive-only thresholds
  • Handles scenarios where checks alternate between pass and fail

Strategy 4: Smart assertions

Overly strict assertions cause false positives.
Too strict:
  • Response body must exactly match a snapshot (breaks on any content change)
  • Response time must be under 200ms (fails during normal load spikes)
Better:
  • Response body contains "status": "healthy" (tolerates other field changes)
  • Response time p95 under 2 seconds (allows occasional slow requests)
  • Status code is in the 2xx range (not just exactly 200)
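The tolerant versions of those assertions can be expressed in a few lines. This is a hypothetical helper (not DevHelm's assertion syntax) showing the spirit of each check:

```python
import json

def check_response(status: int, body: str, p95_ms: float) -> list[str]:
    """Return a list of assertion failures; empty list means healthy."""
    problems = []
    if not 200 <= status < 300:                      # any 2xx, not just exactly 200
        problems.append(f"status {status} not in 2xx range")
    if json.loads(body).get("status") != "healthy":  # one field, not a full snapshot
        problems.append('body missing "status": "healthy"')
    if p95_ms > 2000:                                # p95, so occasional slow requests pass
        problems.append(f"p95 {p95_ms}ms over 2s")
    return problems

# Extra fields and a non-200 success code don't trip the tolerant checks.
print(check_response(200, '{"status": "healthy", "version": "2.1"}', 850))  # []
```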

Strategy 5: Separate warning and failure severities

Use two-tier assertions:
  • Warning (severity: warn): response time > 1s → log but don’t alert
  • Failure (severity: fail): response time > 5s → create incident
Warnings build a trend without waking anyone up. Failures trigger real alerts.

Strategy 6: Maintenance windows

Schedule alert suppression during planned maintenance:
  • Deployments
  • Database migrations
  • Infrastructure changes
This prevents expected downtime from creating incidents.
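Suppression is a lookup against the schedule before any incident is created. The window below is a made-up example; a real setup would read its schedule from configuration:

```python
from datetime import datetime, timezone

# Hypothetical maintenance schedule: a 02:00-04:00 UTC database migration.
MAINTENANCE_WINDOWS = [
    (datetime(2024, 6, 1, 2, 0, tzinfo=timezone.utc),
     datetime(2024, 6, 1, 4, 0, tzinfo=timezone.utc)),
]

def should_alert(failure_time: datetime) -> bool:
    """Suppress incidents for failures that occur inside a scheduled window."""
    return not any(start <= failure_time <= end for start, end in MAINTENANCE_WINDOWS)

print(should_alert(datetime(2024, 6, 1, 3, 0, tzinfo=timezone.utc)))   # False: suppressed
print(should_alert(datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)))  # True: real alert
```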

Measuring false positive rate

Track your false positive rate over time:
False positive rate = (resolved-without-action incidents) / (total incidents)
If more than 10% of your incidents are resolved without any team action, your monitoring is too noisy. Tighten your confirmation strategy.
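Applying the formula above (the numbers here are an invented example):

```python
def false_positive_rate(resolved_without_action: int, total_incidents: int) -> float:
    """False positive rate = resolved-without-action incidents / total incidents."""
    return resolved_without_action / total_incidents if total_incidents else 0.0

# Example: 6 of last month's 40 incidents closed with no action taken.
rate = false_positive_rate(resolved_without_action=6, total_incidents=40)
print(f"{rate:.0%}")  # 15% -- above the 10% guideline, so tighten confirmation
```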

DevHelm configuration

Incident policies

Configure trigger rules and multi-region confirmation.

Multi-region monitoring

Set up checks from multiple locations.