Reducing False Positives

False positives erode trust in your monitoring. When alerts fire too often for non-issues, teams start ignoring them — and then miss real incidents. Here’s how to minimize noise.

Why false positives happen

Transient network issues — a single packet drop between probe and server
DNS propagation delays — intermittent resolution failures during changes
Load balancer health checks — brief unhealthy periods during deployments
Rate limiting — monitoring probes get throttled
Server restarts — momentary unavailability during graceful shutdown

Strategy 1: Multi-region confirmation

This is the most effective approach. Require failures from multiple probe regions before creating an incident:

Single-region failure → likely a network path issue, not an outage
Multi-region failure → likely a real problem with your service

Configure DevHelm to require 2+ regions failing before confirming an incident.
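To illustrate the rule, here is a minimal Python sketch of the decision logic. It is not DevHelm's configuration syntax: the function name `should_confirm_incident`, the `min_failing_regions` parameter, and the region names are assumptions made up for this example.

```python
def should_confirm_incident(results_by_region, min_failing_regions=2):
    """Return True when at least `min_failing_regions` regions report a failed check.

    `results_by_region` maps a region name to True (check passed) or False (check failed).
    """
    failing = [region for region, passed in results_by_region.items() if not passed]
    return len(failing) >= min_failing_regions


# One region failing: probably a network path issue, so no incident.
single = {"us-east": False, "eu-west": True, "ap-south": True}
# Two regions failing: probably a real problem with the service.
multi = {"us-east": False, "eu-west": False, "ap-south": True}

print(should_confirm_incident(single))  # False
print(should_confirm_incident(multi))   # True
```

Raising `min_failing_regions` trades detection coverage for fewer false alarms, so a value of 2 is a reasonable starting point when you probe from three or more regions.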

Strategy 2: Consecutive failure thresholds

Require multiple consecutive failed checks before alerting:

Threshold	Behavior
1 failure	Alert immediately (noisy)
2 consecutive	Filters out single transient failures
3 consecutive	High confidence, but slower detection

With a 60-second check interval, a threshold of 3 consecutive failures means detection can take up to 3 minutes instead of up to 1.
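The threshold behavior and the detection-time arithmetic can be sketched in a few lines of Python. This is an illustration only; the class `ConsecutiveFailureGate` and the helper `worst_case_detection_seconds` are hypothetical names, not part of DevHelm.

```python
class ConsecutiveFailureGate:
    """Signals an alert once `threshold` checks in a row have failed."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.streak = 0

    def record(self, passed):
        """Record one check result; return True while the streak meets the threshold."""
        # Any passing check resets the streak to zero.
        self.streak = 0 if passed else self.streak + 1
        return self.streak >= self.threshold


def worst_case_detection_seconds(interval_seconds, threshold):
    """Upper bound on detection time when an outage starts just after a check."""
    return interval_seconds * threshold


gate = ConsecutiveFailureGate(threshold=3)
fired = [gate.record(passed) for passed in [False, True, False, False, False]]
print(fired)  # [False, False, False, False, True]
print(worst_case_detection_seconds(60, 3))  # 180
```

The single pass in the second position resets the streak, which is why the alert only fires on the fifth check.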

Strategy 3: Confirmation windows

Instead of counting consecutive failures, count failures within a time window:

“3 failures in the last 5 minutes” catches intermittent issues
More flexible than consecutive-only thresholds
Handles scenarios where checks alternate between pass and fail
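A windowed count can be sketched with a queue of failure timestamps. The example below assumes timestamps in seconds and uses a hypothetical `FailureWindow` class; it shows the idea rather than how DevHelm implements it.

```python
from collections import deque


class FailureWindow:
    """Signals an alert when `max_failures` failures occur within `window_seconds`."""

    def __init__(self, max_failures, window_seconds):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.failure_times = deque()

    def record(self, timestamp, passed):
        """Record a check at `timestamp` (seconds); return True if the alert should fire."""
        if not passed:
            self.failure_times.append(timestamp)
        # Drop failures that have aged out of the window.
        while self.failure_times and timestamp - self.failure_times[0] >= self.window_seconds:
            self.failure_times.popleft()
        return len(self.failure_times) >= self.max_failures


window = FailureWindow(max_failures=3, window_seconds=300)
# Checks every 60 seconds that alternate between fail and pass.
results = [(0, False), (60, True), (120, False), (180, True), (240, False)]
fired = [window.record(t, passed) for t, passed in results]
print(fired)  # [False, False, False, False, True]
```

A 3-consecutive threshold would never fire on this alternating sequence, because every pass resets the streak. The window catches it on the third failure.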

Strategy 4: Smart assertions

Overly strict assertions cause false positives.

Too strict:

Response body must exactly match a snapshot (breaks on any content change)
Response time must be under 200ms (fails during normal load spikes)

Better:

Response body contains "status": "healthy" (tolerates other field changes)
Response time p95 under 2 seconds (allows occasional slow requests)
Status code is in the 2xx range (not just exactly 200)
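The tolerant assertions above might look like this in Python. The sketch assumes the response body is JSON and that a p95 latency has already been computed elsewhere; `check_response` is a hypothetical helper, not a DevHelm API.

```python
import json


def check_response(status_code, body, p95_seconds):
    """Return a list of failed-assertion messages; an empty list means healthy."""
    failures = []
    # Accept any 2xx status rather than exactly 200.
    if not 200 <= status_code < 300:
        failures.append(f"unexpected status {status_code}")
    # Look only at the field we care about, so other fields can change freely.
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    if not isinstance(payload, dict) or payload.get("status") != "healthy":
        failures.append("status field is not 'healthy'")
    # Compare a p95 latency, not a single request, against a generous limit.
    if p95_seconds >= 2.0:
        failures.append(f"p95 latency {p95_seconds:.2f}s is not under 2s")
    return failures


print(check_response(201, '{"status": "healthy", "version": "1.4.2"}', 0.8))  # []
print(check_response(500, '{"status": "degraded"}', 0.8))
```

Returning every failed assertion, rather than stopping at the first, makes it easier to see whether a single strict rule is the source of the noise.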

Strategy 5: Separate warning and failure severities

Use two-tier assertions:

Warning (severity: warn): response time > 1s → log but don’t alert
Failure (severity: fail): response time > 5s → create incident

Warnings build a trend without waking anyone up. Failures trigger real alerts.
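As a minimal sketch, the two tiers reduce to a pair of thresholds. The function `classify_response_time` and its parameter names are assumptions for illustration; the 1s and 5s defaults come from the example above.

```python
def classify_response_time(seconds, warn_after=1.0, fail_after=5.0):
    """Map a response time to 'ok', 'warn' (log only), or 'fail' (create incident)."""
    # Check the stricter tier first so a very slow response is never just a warning.
    if seconds > fail_after:
        return "fail"
    if seconds > warn_after:
        return "warn"
    return "ok"


for sample in (0.4, 2.3, 6.1):
    print(sample, classify_response_time(sample))
```

Only the `fail` result should page someone. `warn` results are best written to a log or dashboard so you can spot a slow drift before it becomes an incident.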

Strategy 6: Maintenance windows

Schedule alert suppression during planned maintenance:

Deployments
Database migrations
Infrastructure changes

This prevents expected downtime from creating incidents.
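Conceptually, suppression is a time-range check applied before an alert is sent. The Python sketch below uses made-up window times and hypothetical helper names (`in_maintenance`, `should_alert`); it is not DevHelm's scheduling API.

```python
from datetime import datetime, timezone


def in_maintenance(now, windows):
    """Return True if `now` falls inside any (start, end) window; end is exclusive."""
    return any(start <= now < end for start, end in windows)


def should_alert(check_failed, now, windows):
    """Suppress alerts for failures that occur during planned maintenance."""
    return check_failed and not in_maintenance(now, windows)


# Example window: a one-hour deployment slot (illustrative date and time, in UTC).
windows = [
    (datetime(2025, 1, 15, 2, 0, tzinfo=timezone.utc),
     datetime(2025, 1, 15, 3, 0, tzinfo=timezone.utc)),
]
during = datetime(2025, 1, 15, 2, 30, tzinfo=timezone.utc)
after = datetime(2025, 1, 15, 3, 30, tzinfo=timezone.utc)
print(should_alert(True, during, windows))  # False
print(should_alert(True, after, windows))   # True
```

Using timezone-aware timestamps avoids windows that silently shift when a server or team member is in a different time zone.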

Measuring false positive rate

Track your false positive rate over time:

False positive rate = (resolved-without-action incidents) / (total incidents)

If more than 10% of your incidents are resolved without any team action, your monitoring is too noisy. Tighten your confirmation strategy.
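The formula is simple enough to compute from two counts. In this sketch the incident numbers are invented for the example and `false_positive_rate` is a hypothetical helper.

```python
def false_positive_rate(resolved_without_action, total_incidents):
    """Fraction of incidents that were resolved without any team action."""
    # With no incidents there is nothing to measure, so report 0 rather than divide by zero.
    if total_incidents == 0:
        return 0.0
    return resolved_without_action / total_incidents


# Example: 6 of 40 incidents last month needed no action.
rate = false_positive_rate(resolved_without_action=6, total_incidents=40)
print(f"{rate:.0%}")  # 15%
print(rate > 0.10)    # True
```

A rate of 15% is above the 10% guideline, so in this example the confirmation strategy needs tightening.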

DevHelm configuration

Incident policies

Multi-region monitoring