Outliers are where the interesting stuff happens, and outliers happen to individ...

Outliers are where the interesting stuff happens, and outliers happen to individual instances. Aggregates are useful but can be very misleading. You can have milliseconds 99 percentile latency with ~1% of requests timing out.

I wouldn’t alert on a single machine having CPU issues, but I’m definitely interested in a small collection of individual machines all having CPU issues at the same time.