We’re just getting started with SAM & LEM, and I’m looking for suggestions on how to handle alerting and alarm management. We don’t have a true 24x7 operations team, but we use an on-call rotation to respond to critical issues after business hours. I’m assuming this is typical of many organizations: the business wants systems to be up and running 24x7 but doesn’t want a 24x7 Operations Center. So I’m curious how others are using SolarWinds to manage this.
Do most of you rely on the built-in alerts to page your teams 24x7 after spending a few weeks or months tweaking the thresholds and minimizing the “noise”? Or do you have different paging rules for after hours (maybe up/down only) and then respond to other health alarms during business hours? Do your teams keep SAM/LEM open at all times to monitor systems, or only when they’re troubleshooting an alert? Also, I haven’t seen this feature yet, but do you have an acknowledgement process where someone needs to respond to an alert within a specific amount of time before it escalates to a different on-call person?