Node_to_Discovery_Profile_assignment.OrionReport
discovery_sonar_status.txt
Advanced Alerts sending multiple alert e-mail messages
I've been setting up e-mail alerts for our various applications and I've run into a situation I can't figure out.
I have an alert that is set up to watch an application and alert if it enters a 'critical' state. I currently have the application in that state on a server, so I can test its real functionality.
The application monitor is set up as a performance counter monitor that watches the application and goes critical when the application exceeds 120MB. There is also a process monitor set up to watch the executable in the process list, go critical if the process goes over 90% CPU usage, and report a status of 'down' if the application is not running.
These alerts are working fine within SolarWinds, but the alert e-mails are what I can't figure out.
I have the alert set to trigger when the node is 'not equal to' 'down', the application name is equal to the name of the application, and the component availability is equal to Critical.
In the Trigger actions, I have two actions.
The first is as follows:
NetperfMon Event Log:
Message to send to NPM Log:
NetPerMon Event Log: Component ${N=SwisEntity;M=ComponentAlert.ComponentName} on Application ${N=SwisEntity;M=Application.ApplicationAlert.ApplicationName} on Node ${N=SwisEntity;M=Application.Node.Caption} is ${N=SwisEntity;M=ComponentAlert.ComponentAvailability} with status: ${N=SwisEntity;M=ComponentAlert.StatusOrErrorDescription}
The second action is the send E-Mail/Page alert.
In the e-mail setup, I have:
Subject:
Alert: MonitoredApp on ${N=SwisEntity;M=Application.Node.Caption} is critical.
Message:
MonitoredApp on ${N=SwisEntity;M=Application.Node.Caption} is critical:
CPU Usage: ${N=SwisEntity;M=ComponentAlert.PercentCPU}%
Memory Usage: ${N=SwisEntity;M=ComponentAlert.StatisticData}MB
Date: ${N=Generic;M=DateTime;F=DateTime}
I want the 'CPU Usage' line to report the application's CPU usage, not the server's overall CPU usage, so that line may be wrong for what I'm trying to do.
What I get is two messages. Both have identical subjects, but the content is slightly different.
The first comes through as
MonitoredApp on Servername is critical:
CPU Usage: %
Memory Usage: 202.92578125MB
Date: Thursday, May 21, 2015 4:17 PM
The second comes through as:
MonitoredApp on Servername is critical:
CPU Usage: 0%
Memory Usage: MB
Date: Thursday, May 21, 2015 4:17 PM
Does anyone have any idea why the single alert is sending two separate, incomplete messages, and how I might fix it so it sends one e-mail with all the information?
Thanks
AppInsight for Exchange: Exchange PerfMon counters getting disabled
I am using AppInsight for Exchange to monitor my Exchange servers. I have 2 mailbox servers (both are VMs) set up in a DAG. The problem is that the servers are always in an UNKNOWN state, because the Exchange PerfMon counters keep disabling themselves after about a day or so. I can confirm this on the Exchange MB server, in the registry: HKLM\SYSTEM\CurrentControlSet\Services\msftesqllDX-Exchange\Performance, where the "Disable Performance Counters" value is non-zero. I've searched the event logs and can't figure out why it is getting disabled. If I change it back to 0, the counters start working again and everything turns green, for about a day. Then something disables the counters again, and it goes back to Unknown...
I have tried re-installing the libraries from scratch, but the result is the same--it works at first, but then stops within a day. Everything in the "Information Store" is unavailable (Active Connection Count, Active User Count, RPC Requests, RPC Averaged Latency, etc.)
My active mailbox server also has some other "Custom Statistic Monitors" that are "Unknown". Not sure if this is related or not...
I know that this is not necessarily a SAM issue; I'm just wondering if anyone has seen this before.
alerts not migrated in 6.2
A number of alerts that are important to us were disabled / not migrated when we upgraded to 6.2. The error message says, "object/properties used in trigger/reset conditions of this alert are not yet supported in new web-based alerting". How do I review those un-migrated alerts so I can try to develop workarounds? Or, better yet, when will ALL old alerts be supported?
Server 2008 R2 vs Server 2012
We currently have our Orion products running on Server 2003 R2 on an old machine (6 years old or so). We are looking to move to the latest versions, such as SAM 6.2 and NPM 11.5, by July. My question is: if we put it on a 2008 or 2012 server, does it matter? Do you get more functionality from things like PowerShell, or more ease of use, by going with Server 2012 over 2008? (We are currently running SAM 6.0, NPM 10.6, IVIM 1.8.1.)
I’ve searched through thwack and I haven’t noticed anything that addresses this. I have to try to complete our plan so we can set this up by the time we get the hardware for this (around July 15th). Is it worth convincing management of 2012?
High Page/Sec and locking up server and application
Hi,
We have a critical business application (VM) where the Pages/sec starts to increase in the morning, reaches 100% after a few hours, and then locks the application. In Orion we want to be able to monitor the processes that are causing this high Pages/sec and create trend charts so we can track them over time. We have increased memory in the server from 12MB to 16MB to 32MB, and it just increases the time it takes for Pages/sec to reach 100% (2 hr, 3 hr, 4 hr), after which the application locks. Is there a way in Orion to monitor the processes that are causing this high Pages/sec?
Thank you,
Scott DeJong
unable to add windows node
I am trying to add a Windows node using SNMP. SNMP is already configured on the server, and the SNMP test succeeds, but I also get another error stating that the server does not respond with the supplied read/write community string. I tried again after rebooting the node and it was added successfully, but now SolarWinds is unable to fetch the List Resources information for that server.
I tried the same steps again after deleting and re-adding the node.
Solarwinds not polling nodes correctly(Physical or Virtual)
On the Node Details page, SolarWinds shows hardware info (Physical or Virtual). For one node it shows the wrong info: a physical Linux server is shown as Virtual, even though hardware health is also being polled for the same server.
I deleted that node and re-added it in SolarWinds, but the issue is still there.
I created a case with SolarWinds but have not received a solution (CASE#788537).
Oracle WebLogic (JMX)
RID Pool Depletion
I'm working with our Global Domain admins, who want to create an alert that monitors RID pool depletion but triggers when the count has increased by more than a certain amount compared to the last read.
What we want is similar to the log file difference monitor template I've implemented on our Unix systems, that alerts when it sees new log files generated since last poll.
However, I couldn't find any thwack template or thwack discussion specific to RID pool monitoring. The script they have in place now simply reports the current pool size, not the difference since the last poll, so it cannot alert on an increase of 1000 since the last poll. Would simply enabling 'Count statistic as difference' on what we have in place work, or does anyone already have a count/difference PowerShell script for the RID pool that you could share?
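For illustration only, the "difference since last poll" idea can be sketched as follows. This is a minimal Python sketch of the arithmetic, not the script the domain admins use and not a SAM template: the state file name `rid_pool_last.json`, the helper names, and the way the current reading is obtained are all assumptions. Only the threshold of 1000 comes from the requirement above.

```python
import json
from pathlib import Path

# Hypothetical location for remembering the previous reading between polls.
STATE_FILE = Path("rid_pool_last.json")
# Increase since the last poll that should raise an alert (from the post).
THRESHOLD = 1000


def rid_delta(current, state_file=STATE_FILE):
    """Return the change since the previous poll, then store the new reading.

    The first poll has nothing to compare against, so it reports 0.
    """
    previous = None
    if state_file.exists():
        previous = json.loads(state_file.read_text())["reading"]
    state_file.write_text(json.dumps({"reading": current}))
    return 0 if previous is None else current - previous


def check(current, threshold=THRESHOLD, state_file=STATE_FILE):
    """Return (delta, should_alert) for the current reading."""
    delta = rid_delta(current, state_file)
    return delta, delta >= threshold
```

A script monitor would report `delta` as its statistic and let the threshold decide the status. Whether the built-in 'Count statistic as difference' option gives the same result would still need to be confirmed against the product documentation.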
SSL Cert Expiration Monitor on SAM Server Error
Just starting to deploy SAM and NPM for the first time. When I set up the SSL Certificate Expiration Date Monitor pointed at the SAM server itself, it throws: "Exception message is: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host"
Orion 2015.1.1, SAM 6.2 Hotfix1 NPM 11.5.1
Server 2012R2
Thanks!
Windows services restarts by a group of users
Hi
We have been running SAM 6.2 for a few weeks, and some power users in our Production Services teams now want more control over the servers they are responsible for.
So can we...
Give control of a group of servers to an AD group of users?
AND allow them to restart only particular Windows services?
I really only want them to be able to restart particular Windows services that might have fallen over.
Thank you in advance for any help.
regards
Richard
Storage/Volume Capacity Forecast
I just upgraded (Orion Platform 2015.1.1, SAM 6.2.0, QoE 2.0, NCM 7.3.2, NPM 11.5.1, WPM 2.2.0, IVIM 2.0.1) and found the Capacity Forecast stuff... too cool! I have been telling myself to do this through some sort of custom report, but you have saved me from spending the time!
One issue (I know, why look a gift horse in the mouth)....
I think the Storage/Volume Capacity Forecast is based on percentage used. This works great until you add space to the drive; then the forecast is useless for months on that node. If it used consumed space and then calculated the percentage from the current total, it would not go all wonky after adding space.
This is not exactly a situation I would have thought of if I made my report, but now that we are on the other side, I would change it to handle this.
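To make the suggestion concrete, here is a small Python sketch with made-up numbers. It does not claim to show how the product actually calculates the forecast; the function name `days_until_limit`, the sample data, and the plain least-squares trend are all assumptions used only to illustrate the difference between trending on percentage and trending on consumed space.

```python
def days_until_limit(samples, limit):
    """Fit a straight line to (day, value) samples and return how many days
    after the last sample the line reaches `limit`.
    Returns None when the trend is flat or falling."""
    n = len(samples)
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(value for _, value in samples) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in samples)
    sxy = sum((day - mean_x) * (value - mean_y) for day, value in samples)
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    last_day = samples[-1][0]
    return (limit - intercept) / slope - last_day


# Hypothetical volume: grows 10 GB per day, expanded from 500 GB to 1000 GB on day 3.
used_gb = [(0, 400), (1, 410), (2, 420), (3, 430), (4, 440)]
total_gb = [500, 500, 500, 1000, 1000]
percent = [(day, used / total * 100) for (day, used), total in zip(used_gb, total_gb)]

by_percent = days_until_limit(percent, 100)            # trend broken by the resize
by_consumed = days_until_limit(used_gb, total_gb[-1])  # trend unaffected by the resize
```

With these numbers the percentage series drops from 84% to 43% when the space is added, so its fitted trend points downward and no forecast comes out, while the consumed-space series keeps its steady 10 GB/day slope and still yields a fill date against the new total.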
Is this a bug? Feature request? first report?
Can you force SAM v6.1.1 to use NTLMv2 or Kerberos for authentication?
We recently applied a new security policy to some of our servers that blocks the ability for SAM to query them. The domain controllers lit up with failed password attempts. After many late nights and much investigation we found that this new policy is blocking NTLMv1 authentication attempts.
Is there a way to force SAM to use NTLMv2 or, preferably, Kerberos only?
Warranty status only for HP. Dell Servers remain in unknown.
Hi, we have SAM 6.1. It correctly gets the warranty status for HP servers, but apparently it cannot retrieve the status for Dell servers.
Are there any known issues / fixes for this?
Thanks
Asset Inventory showing wrong Time Zone
One of my servers has the time zone "(GMT-05:00) Eastern Time (US & Canada)", but Asset Inventory shows "GMT +4.30 Kabul" as the time zone. I checked on the server and found that the correct time zone is configured; the command prompt also shows the correct time zone.
I'm not sure why SolarWinds is not polling the correct value.
Account used for SAM on Linux accesses passwd, hosts, etc
Hi Everyone,
We are using SAM in our company and we recently set it up to monitor our Linux systems. The templates used were for CPU, Memory, Disk and NTP. I just wanted to clarify two points.
1) SAM connects via SSH every 3 minutes (the polling time set) to run the scripts in these templates. Is this the only way SAM collects the information?
2) Each time SAM uses the account to connect to the Linux server via SSH, the server generates permission alerts for passwd, hosts, nsswitch.conf, etc/environment files. Why is that happening?
Thanks in advance.
SAM very slow to load data since upgrade to 6.2.0
Hello everyone,
We upgraded SAM to 6.2.0 several weeks ago, and since then it takes a very long time to load group data and alert data. After the first load it seems to be faster, as if caching is being done on the client side. Is there a setting change that will speed up the initial load of data? We have a good many groups, and it's frustrating for people just starting to use the tool or for people who have cleared their cache.
Any help is greatly appreciated.
Greg
Baseline collection
Hello, we have been using AppInsight for SQL for quite some time, and just recently we began using the Baseline thresholds for monitoring the components within the monitor. We have been seeing some errors in the message center about some components being unable to calculate a baseline due to insufficient statistical deviation. Just wondering what the cause might be; it's occurring on multiple nodes where we've been running this monitor for a long time, and we are seeing success on other nodes.