There was a time when the directive was to monitor everything: if there was a template, assign it. That "capture it all" attitude has created serious database overhead, and honestly, some monitors capture data we have zero use for. The result is a mess of application monitors that need care and feeding, especially when custom passwords are changed or the monitored service is uninstalled. I'm looking for a straightforward method to review which application monitors are assigned to which nodes. The goal is to present that review to the NetOps team so we get a better sense of what we are collecting, can identify what has no use, and possibly find important metrics that aren't being captured. Getting upwards of 3000 emails from "critical" and then "up" monitors is over the top; alerting needs to be better managed so that the emails reflect real issues in our environment, not monitors that hit a default trigger value. I'd rather not waste time setting baselines if a monitor serves no useful purpose. It's possible that many of our dev servers only need to alert when they are no longer up, since they are constantly being mashed and restored.