Channel: THWACK: All Content - Server & Application Monitor

volume: the alert says it's down, the web console says... nothing

Got an alert from SAM that the volume is down:

Volume "V:\ Label:FluxCapacitor AF04DD95" on Tier 1 node "Distribution01 (Aspera)": Down

   Responding: N

   Type: Fixed Disk (ID: 4)

   Size: 43.7 T (Percent Used: 1 %, Index: 0)

The alert:

[Screenshot: Screen Shot 2014-09-02 at 7.02.15 PM.png]

When I test this alert, it shows the volume as "down".

And here is the same volume in the web console:

[Screenshot: Screen Shot 2014-09-02 at 6.38.38 PM.png]

The page doesn't seem to say whether the volume is up or down, except indirectly: the "Next Poll" field is clearly stale. Is there a way to make the volume page show its status, along with whether it is "responding" and how many allocation errors it has?

I am still investigating, but so far it looks like one of our SAN volumes changed its signature and is therefore dead as far as SAM is concerned. The nodes themselves are happy: the SAN volume is still available to them, just with a different signature. They couldn't care less about that, because the volume mapping is done by a driver and they only see a drive letter, not the signature. So SAM fired alerts saying the SAN had been disconnected from a bunch of nodes, while the nodes themselves are OK and the web console shows no problems.
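
To make that hypothesis concrete, here is a toy model in Python. It assumes, hypothetically, that the monitor identifies a volume by its full caption (drive letter, label, and serial, as in "V:\ Label:FluxCapacitor AF04DD95"), while the node only cares about the drive letter. The Volume class, both lookup functions, and the new serial "1234ABCD" are made up for illustration; this is not how SAM actually matches volumes, just a sketch of why the two views could disagree after a signature change.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    drive_letter: str  # e.g. "V:\\"
    label: str         # e.g. "FluxCapacitor"
    serial: str        # volume serial/signature, e.g. "AF04DD95"

    def caption(self) -> str:
        # Mirrors the caption format seen in the alert text.
        return f"{self.drive_letter} Label:{self.label} {self.serial}"


def host_sees(volumes: list[Volume], drive_letter: str) -> bool:
    """The node only cares whether something is mounted at the drive letter."""
    return any(v.drive_letter == drive_letter for v in volumes)


def monitor_sees(volumes: list[Volume], monitored_caption: str) -> bool:
    """Hypothetical monitor: looks the volume up by its full caption, serial included."""
    return any(v.caption() == monitored_caption for v in volumes)


before = [Volume("V:\\", "FluxCapacitor", "AF04DD95")]
monitored = before[0].caption()  # captured when the volume was added to monitoring

# The SAN volume comes back with a new (hypothetical) signature, same letter and label.
after = [Volume("V:\\", "FluxCapacitor", "1234ABCD")]

print(host_sees(after, "V:\\"))        # True: the node is happy
print(monitor_sees(after, monitored))  # False: the monitor reports "Down"
```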

I work the night shift (for my own tranquility and that of my coworkers). The alerts went off during the day; my coworkers looked at them and didn't see any indication of a problem, either in SAM or on the nodes. Then they sent me tender and sweet emails: "yo dude, what did you do to SolarWinds? It says we have a problem when in fact we don't!"

Hence we have a few questions:

  1. How can I make the SAM web console show a volume's status on the volume page and, if the volume is down, for how long it has been down?
  2. Besides "status", the alerts seem to have access to fields such as "responding" (yes/no), but I can't find a way to display them in the Orion Web Console. Where are they?
  3. Is there a way to reconnect the SAN volume, now with a new signature, to all the nodes that used to have it without too much manual labor, i.e. without going to each node, deleting the ghost volume, then running "list resources" and adding the new one?
  4. Is there a way to display a list in OWC (Orion Web Console) similar to "down nodes" or "nodes with problems", but for volumes?
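
For questions 1, 2 and 4, here is a rough sketch of the kind of "down volumes" report I'm after: every volume that isn't Up, with its "responding" flag, its allocation error count, and how long it has been down, longest outage first. Everything here is hypothetical: the VolumeStatus fields mirror the alert text rather than any real Orion schema, the timestamps, the second volume, and the error counts are invented sample values, and where the real data would come from is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class VolumeStatus:
    node: str
    caption: str
    status: str             # "Up" or "Down", as in the alert text
    responding: bool        # the "Responding: Y/N" field from the alert
    allocation_errors: int  # hypothetical allocation-failure counter
    status_since: datetime  # hypothetical: when the current status began


def down_volumes(volumes, now):
    """List every volume that is not Up, longest outage first."""
    rows = [
        (v.node, v.caption, v.responding, v.allocation_errors, now - v.status_since)
        for v in volumes
        if v.status != "Up"
    ]
    return sorted(rows, key=lambda row: row[4], reverse=True)


# Hypothetical sample data; only the first node and caption come from the alert.
now = datetime(2014, 9, 2, 19, 0)
vols = [
    VolumeStatus("Distribution01 (Aspera)", "V:\\ Label:FluxCapacitor AF04DD95",
                 "Down", False, 0, datetime(2014, 9, 2, 12, 0)),
    VolumeStatus("Distribution01 (Aspera)", "C:\\ Label:System 1A2B3C4D",
                 "Up", True, 0, datetime(2014, 8, 1, 0, 0)),
]

for node, caption, responding, errors, downtime in down_volumes(vols, now):
    print(f"{node} | {caption} | responding={'Y' if responding else 'N'} "
          f"| allocation errors={errors} | down for {downtime}")
```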

Thanks!

