We have walked into accounts with cost anomaly alerts wired to a Slack channel that everyone muted in month one. The alerts were not wrong, exactly - they fired on normal weekly seasonality and on a 30-euro test environment. A detector that cries wolf is worse than none, because it trains people to ignore the one alert that matters.
Segment before you threshold
The biggest single fix is monitoring per service, account, or tag - not the whole bill. A 5% jump in the total can hide a doubling of one team's spend, while a 5% wobble at the top level might just be a billing-day artifact. AWS Cost Anomaly Detection lets you define monitors per dimension, and that segmentation is what turns noise into signal.
- Monitor by service and by linked account, not the consolidated total
- Set an absolute floor - we ignore anomalies under ~50-100 euros of impact
- Account for known seasonality: weekends, month-end batch, marketing campaigns
- Route by ownership so the alert reaches whoever can act, not a shared firehose
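To make the segmentation and seasonality points concrete, here is a minimal Python sketch. It assumes you have already exported daily cost rows as `(date, service, account, cost)` tuples; the function names and the "median of earlier same-weekday values" baseline are our illustrative assumptions, not how AWS Cost Anomaly Detection works internally.

```python
from collections import defaultdict
from statistics import median


def weekday_baseline(history, day):
    """Median cost on the same weekday as `day`, using earlier dates only.

    Comparing Monday to previous Mondays is a crude way to absorb
    weekly seasonality (an assumption of this sketch).
    """
    same_weekday = [
        cost for d, cost in history.items()
        if d < day and d.weekday() == day.weekday()
    ]
    return median(same_weekday) if same_weekday else None


def segment_deltas(rows, day):
    """Group rows by (service, account) and compare `day` to its baseline.

    rows: iterable of (date, service, account, cost) tuples.
    Returns {(service, account): (baseline, actual)}.
    """
    per_segment = defaultdict(dict)
    for d, service, account, cost in rows:
        history = per_segment[(service, account)]
        history[d] = history.get(d, 0.0) + cost

    result = {}
    for key, history in per_segment.items():
        baseline = weekday_baseline(history, day)
        # Skip segments with no prior same-weekday data or no spend today.
        if baseline is not None and day in history:
            result[key] = (baseline, history[day])
    return result
```

With one segment going from 100 to 200 a day and another flat at 1,900, the total moves only 5%, but the per-segment view shows the doubling directly.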
Two thresholds, not one
We tune on both a percentage and an absolute amount, and an alert has to clear both. A 300% spike on a 5-euro resource is noise; a 12% rise on a 40,000-euro service is a real problem. Requiring both conditions kills the long tail of trivial-but-loud alerts that cause people to mute the channel in the first place.
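The two-condition rule is small enough to sketch in a few lines of Python. The default values here (10% and 50 euros) are placeholders we picked for illustration; tune both to your own spend.

```python
def should_alert(baseline, actual, min_pct=0.10, min_abs=50.0):
    """Return True only if the increase clears BOTH thresholds.

    min_pct and min_abs are illustrative defaults, not recommendations.
    """
    delta = actual - baseline
    if delta < min_abs:
        # Covers decreases and increases too small to matter in euros.
        return False
    if baseline <= 0:
        # No baseline means the percentage is undefined; in this sketch
        # the absolute floor alone decides.
        return True
    return delta / baseline >= min_pct


# The two cases from the text above:
print(should_alert(5, 20))         # 300% spike, 15 euros of impact
print(should_alert(40_000, 44_800))  # 12% rise, 4,800 euros of impact
```

The first call is suppressed by the absolute floor and the second passes both checks, which is the behavior described above.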
The goal is not to catch every anomaly. It is to make sure that when the channel pings, people still look.
Every alert needs an owner and a next step
An anomaly that lands in a channel with no owner dies there. We route each monitor to the team that owns that service and include the likely culprits in the message - which usage type moved, by how much, since when. The reviewer should be able to confirm or dismiss in under two minutes; otherwise the alert quietly joins the ignored pile.
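As a rough sketch of what "owner plus next step" can look like in code: the snippet below maps a service to a channel and builds a one-line message with the usage type, the size of the move, and the start date. The `Anomaly` fields, the `OWNERS` map, and the channel names are all hypothetical; in practice the ownership data would come from your own tagging or service catalog.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Anomaly:
    service: str
    usage_type: str
    baseline: float  # expected daily cost, EUR
    actual: float    # observed daily cost, EUR
    since: date


# Hypothetical ownership map and fallback channel.
OWNERS = {"EC2": "#team-platform", "S3": "#team-data"}
FALLBACK = "#finops-triage"


def route(anomaly):
    """Pick the channel of the owning team, or a triage channel if unknown."""
    return OWNERS.get(anomaly.service, FALLBACK)


def format_alert(a):
    """One line a reviewer can confirm or dismiss quickly."""
    delta = a.actual - a.baseline
    pct = f"{delta / a.baseline:+.0%}" if a.baseline > 0 else "n/a"
    return (
        f"[{a.service}] {a.usage_type}: "
        f"{a.baseline:.0f} -> {a.actual:.0f} EUR/day "
        f"({delta:+.0f} EUR, {pct}) since {a.since.isoformat()}"
    )
```

Routing unknown services to a triage channel rather than dropping them is a design choice of this sketch; it keeps unowned spend visible until someone claims it.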
aws ce create-anomaly-monitor \
  --anomaly-monitor '{"MonitorName":"per-service","MonitorType":"DIMENSIONAL","MonitorDimension":"SERVICE"}'

Review the misses, not just the hits
Once a quarter we look back at the cost spikes that happened and check whether the detector caught them. False negatives - the real anomaly that never fired - are far more dangerous than a few false positives, and they only show up if you deliberately audit for them. A detector worth keeping fires maybe a handful of times a month, each one worth a glance. If it is firing daily, it is not protecting the budget - it is just background noise with a price tag.
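The quarterly look-back can be as simple as matching a hand-curated list of known spikes against the alerts that actually fired. The sketch below assumes both are lists of `(service, date)` pairs and treats a spike as caught if an alert for the same service fired within a couple of days; the two-day window and the function names are assumptions for illustration.

```python
from datetime import timedelta


def find_misses(known_spikes, fired_alerts, window_days=2):
    """Return the known spikes that no alert covered.

    known_spikes, fired_alerts: iterables of (service, date) pairs.
    window_days is an illustrative tolerance, not a recommendation.
    """
    alerts = list(fired_alerts)
    window = timedelta(days=window_days)
    misses = []
    for service, spike_day in known_spikes:
        caught = any(
            alert_service == service and abs(alert_day - spike_day) <= window
            for alert_service, alert_day in alerts
        )
        if not caught:
            misses.append((service, spike_day))
    return misses


def recall(known_spikes, fired_alerts, window_days=2):
    """Share of known spikes the detector caught, or None if there were none."""
    spikes = list(known_spikes)
    if not spikes:
        return None
    missed = len(find_misses(spikes, fired_alerts, window_days))
    return 1 - missed / len(spikes)
```

Each entry returned by `find_misses` is a candidate for a threshold or segmentation change, which is where the audit pays for itself.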