Measure Database/Instance Availability, NOT "Application Availability"

November 30, 2016, 11:11 am

In short: I want to monitor, and eventually quantify in a report down the line, the availability of either a SQL Server instance or an Oracle Database. I understand that there is an "application" availability metric that exists in the SAM monitoring tool; however, this also lumps in whether sub-components of the template have gone critical or not, and doesn't give a true database/instance uptime/availability.

At greater length: What I am after could be considered a "ping" check of sorts. I just want to know: can I reach the database/instance? If not, record it, and we would obviously get a corresponding alert as well with the current (out-of-the-box) alerting in place. The issue I've come across with the built-in functionality is that if one counter or another goes critical for one polling interval, the application is shown as unavailable or in a critical state for that time, even though it is likely that there was only a dip in buffer cache hit ratio or something similar for 5 minutes and the SQL instance never actually went down or became unavailable.

Is something like this possible through either custom queries or tweaking a current template? I mean, I get why someone might want to see the "application" availability the way it is designed now, but it really skews results when trying to measure whether the databases/instances were truly unavailable or not.

If I'm way off base here, and this can be accomplished with the current out-of-the-box settings in SAM, please let me know. I just can't seem to find anything useful in getting something more pertinent to what I'm looking for in my research. Again, I want to know if the instance/database is accessible, and only if it is down (timeouts, instance down, etc.) then show that it is critical/unavailable.
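As a rough illustration of the "ping"-style check described above, here is a minimal Python sketch that tests only whether the instance's TCP listener accepts a connection and then turns a series of poll results into an availability percentage. The function names are hypothetical, this is not a SAM feature, and a successful TCP connect does not prove that logins or queries succeed. The ports in the usage note are the usual defaults (1433 for SQL Server, 1521 for the Oracle listener) and may differ in your environment.

```python
import socket
from typing import Sequence


def instance_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def availability_percent(results: Sequence[bool]) -> float:
    """Percentage of polls in which the instance was reachable."""
    if not results:
        raise ValueError("no polls recorded")
    return 100.0 * sum(results) / len(results)
```

A poll loop might call instance_reachable("db01.example.local", 1433) once per interval (the host name is a placeholder), append the result to a list, and report availability_percent over the reporting period.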

Hopefully, in my rambling, I've made a little sense. Any help is appreciated.

Thank you in advance

↧

Covering a gap in SAM.

November 29, 2016, 7:48 am

As you have probably all come across yourselves, I'm trying to cover a gap in monitoring: when a server hangs, it still responds to ICMP requests, so it doesn't always raise an alert.

I have seen that whenever an application template is applied to a particular device, the application will alert as being "Unknown".

This will work for me, as I can write an application template to monitor a service or a process to achieve the same result. What I need to know is: is there a suitable Windows service or process that will always be running but, should the server hang, would trigger this application alert for our team to respond to and perform first-line troubleshooting?

Thanks all

Dan

↧

Minimum Rights needed to monitor a Windows Server

November 28, 2016, 11:53 am

We are in the process of setting up a bunch of new servers, and I am trying to find the minimum rights an account needs to monitor a server. I have searched the SolarWinds site and here but can't find anything that states this clearly. If anyone can direct me to an article on this, that would be great.
Thanks!

Dave

↧

Windows PowerShell monitor component error - 'return code is different than expected'

September 23, 2016, 9:22 am

I'm testing the Symantec NetBackup application monitoring template with our Windows 2008 R2 NetBackup servers, and all of the components are green or "up" except for the job status component. It's down and shows this message:

"The return code is different than expected. Testing on node '127.0.0.1' failed with 'Down' status ('Down' might be different if script exits with a different exit code).
Error finding master server . Check "master" argument and credentials. ERROR:"

I'm not sure how to troubleshoot this issue. I know PowerShell 2.0 is installed out of the box, but perhaps this template needs a newer version?

Any help would be appreciated.

Banner:

Orion Platform 2016.2.100, WPM 2.2.1, IPAM 4.3.2, SRM 6.3.0, VNQM 4.2.4, NCM 7.5.1, NPM 12.0.1, DPA 10.2.0, QoE 2.2.0, NTA 4.2.1, IVIM 2.1.2, SAM 6.3.0, NetPath 1.0.1

↧

SAM 6.3.0/NPM 12.0.1 InformationService.log EasyNetQ entries

November 1, 2016, 8:23 am

Since upgrading to SAM 6.3.0/NPM 12.0.1 (we went straight from SAM 6.2.3/NPM 11.5.3 to SAM 6.3.0/NPM 12.0.1, so maybe this is prevalent in 12.0.0 as well), our SolarWinds.InformationService.ServiceV3.exe (henceforth referred to as SWISv3) memory usage has gone from averaging 800 - 1,000 MB to averaging 2.5 - 5 GB. Despite the higher memory usage, this doesn't seem to affect anything until, occasionally, the SWISv3 process jumps up to using 15 to 25 GB and we start seeing problems. I don't know if it is related, but in the SWISv3 log files (located at C:\ProgramData\SolarWinds\InformationService\v3.0\Orion.InformationService.log) there are nearly 30,000 lines, virtually all of which are the following:

[EasyNetQ consumer dispatch thread] WARN  SolarWinds.InformationService.ChangeBroker.Broker - (null) (null)
 Change indication for Orion.APM.Component is missing values to provide in a notification, but additional query returned no matching entity. Please check the validity of key values in the indication reported.

With the occasional few lines of this sprinkled in:

[EasyNetQ consumer dispatch thread] WARN  SolarWinds.InformationService.ChangeBroker.Broker - (null) (null)
 Change indication for Orion.APM.SqlDatabase is required to contain {ItemID, ApplicationID} key properties, but {ItemID} missing.

Currently the 20 previous Orion.InformationService.log files in the directory are 10,241 KB each, so there are over 10 MB of these EasyNetQ log entries in each log file. I'm posting this on Thwack rather than opening a Support case because there don't seem to be any issues I can directly attribute to this, so I wanted to see whether other users are experiencing the same thing and/or whether a SolarWinds employee (like aLTeReGo perhaps) has seen this with other clients and can comment on it. I don't think this should be normal behavior, and I'm also concerned about the much larger memory usage of the SWIS service with the new version. I haven't installed the newest Orion Core hotfix that is out now, but reading the notes for it I don't see anything it fixes that relates to this, so I'm not sure what kind of difference it would make.
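To get a quick picture of which entities dominate these warnings, a small Python sketch like the following could tally the "Change indication for ..." lines per entity name. The function name and the regular expression are illustrative assumptions based only on the two sample messages quoted above; other message formats in the log would simply be ignored.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the entity name in lines such as
# "Change indication for Orion.APM.Component is missing values ..."
ENTITY_RE = re.compile(r"Change indication for (\S+)")


def tally_change_indications(lines: Iterable[str]) -> Counter:
    """Count 'Change indication' warnings per entity name."""
    counts: Counter = Counter()
    for line in lines:
        match = ENTITY_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

To run it against a real log, pass an open file object, for example open(path, errors="replace"), where path points at one of the Orion.InformationService.log files.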

↧

En Masse Group Edit

December 9, 2016, 7:31 am

I've only been working with SolarWinds for 6 months now, so forgive me if this question has already been covered. Recently one of my Systems Admins brought to my attention that an Application group wasn't properly displaying down components/nodes. When I checked the group's settings, I noticed that under Advanced settings the Status rollup mode was set to show only Worst status. After some reading and discussing it with my coworker and manager, we came to the conclusion that it would be best to set the groups to show Mixed status. Now here's the problem: my predecessor configured all 894 groups to Worst status. Is there a way to edit all the groups at once, like one can do when editing properties of nodes? Otherwise this is going to be a long, monotonous task.

-Cheers

↧

PowerShell Component Monitor permissions issues

December 9, 2016, 7:54 am

I created this basic test PowerShell component monitor, which fails with the following error message when I run it:

Get-Counter : Unable to access the desired computer or service. Check the permissions and authentication of the log service or the interactive user session against those on the computer or service being monitored.

# This gets a Windows performance counter from the specified IP

$counter = Get-Counter "\\${IP}\LogicalDisk(C:)\free megabytes"

$size = $counter.CounterSamples.CookedValue / 1024 # Convert megabytes to gigabytes

Write-Output "Statistic.Drive:0"

Write-Output "Message.Drive:$size"

The credentials used to run the monitor have domain admin privileges and are the same for every other component monitor and system poller.

These same credentials have local admin access on the SAM server.

The line above is successful if I test run the component monitor against the SAM server (so ${IP} is the localhost)

WMI is allowed through the Windows Firewall on each server and they are all on the same VLAN

The strange thing however, is that the line works if I log into the SAM server with the same SAM service account, fire up a PowerShell console and run the Get-Counter command to query a server other than localhost

$someRemoteServer = "1.2.3.4"

Get-Counter "\\$someRemoteServer\logicaldisk(c:)\free megabytes"

If the SAM server itself can directly query other servers' performance counters, what is preventing the Windows PowerShell Component Monitor from doing so with the same credentials?

Any help would be great.

↧

IIS Applications

May 27, 2016, 5:08 am

Hi all

How do I monitor applications running on IIS? I have Windows servers (WMI, WinRM, and PowerShell enabled), and I assigned them AppInsight for IIS. Some IIS components are grey, some are green (see attached).

↧

Can I avoid using the SA user on my MSSQL?

December 21, 2015, 6:42 am

I can't find any documentation on how to avoid using the SA account.

Can anyone help me with a link or a how-to guide?

↧

WANTED: Application Template Hunters! Win a trip to Austin SWUG, an Xbox One S and more!

December 9, 2016, 12:34 pm

Help us track down the most wanted monitoring templates from our list of app outlaws and create custom application templates for Server and Application Monitor. Get started now>>

The top 10 template hunters will each win an Xbox One S 500 GB Console - Battlefield Bundle. One of the top ten will be selected by a panel of judges to win the GRAND PRIZE: An all-expense paid trip to Austin, TX where you will attend a SWUG @ SolarWinds HQ, have exclusive access to ALL 5 Head Geeks, & appear LIVE before the 50th episode of SolarWinds Lab!

Take a look at the "wanted" list and submit yours to be entered to win! Submit your templates here>>

↧

certificate renewal reminders

December 9, 2016, 5:30 am

Hello,

I'm not sure if this is the right section, but I was wondering if there is a way to monitor pending certificate expirations in SolarWinds? We had a couple of certs expire that caused some headaches yesterday and are looking to monitor this.
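For illustration only, the underlying check can be sketched in a few lines of Python: read the certificate's notAfter date and compute the days remaining. This is a generic sketch and not a SolarWinds feature; the function names and the 30-day threshold mentioned afterwards are assumptions. Note that with the default TLS context used here, a certificate that has already expired fails the handshake instead of returning a date.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between now and a certificate notAfter string such as
    'Jan  5 09:34:43 2018 GMT' (negative if already expired)."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter field of the certificate presented by host:port."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

A monitor built on this could warn when days_until_expiry(fetch_not_after(host)) drops below a chosen number of days, for example 30.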

↧

What Does Solarwinds Collect

December 9, 2016, 6:28 am

I know this is a very general question and hard to answer, but management is asking for a rough figure for the number of different components that our SolarWinds products collect. We use SAM, poll via WMI, and use VMAN. If anyone can provide any documentation, that would be great.

↧

Would you watch a SolarWinds reality TV show?

February 11, 2015, 2:20 pm

Before you answer, consider the reality shows out there already:

Duck Dynasty
Honey Boo-Boo
Storage Wars
Swamp People
Ice Road Truckers
Finding Bigfoot
Ax Men

After interacting with all of my co-workers and product users, I realized that the software development process can be...entertaining, to say the least. Trust me, if you're a geek, there are some strange characters here, especially me. I would watch religiously. Be great marketing too. We can name it something slick like, "Developers," or "Thwack Men."

At least you wouldn't lose IQ points by watching our show. (Although, I must confess that I have watched the other shows I've listed...and screamed in agony from time to time.)

(Hey, my thwack snuggie went from dream to reality!)

↧

Funniest Superhero Death Ever!

July 11, 2014, 6:35 pm

↧

SQL Server User Experience Alert Question

December 1, 2016, 1:02 pm

I have added a custom SQL query to a database server via an Application Monitoring Template, using the SQL Server User Experience Monitor on that template. The query works and I get a value. I can also alert on the status. The problem is that I want to include the Statistic Data (the value from SQL) in my alert, but I can't seem to find the $variable to accomplish this. Basically, this value is the number of jobs our server is processing, and I am alerted when the queue length is above a certain value, but I'd like the email to include the queue length so that when we get the alert we know what we're dealing with.

Any way to make this happen?

Thanks!

↧

Alert Manager: Continuous loading

November 14, 2016, 5:02 am

Hi all,

Does anyone else get this issue where, after carrying out some function in Alert Manager, the page refreshes and greys out whilst it reloads (shows the loading dialog)?

However, after it completes and before you can highlight or click another element, the page greys out again and reloads once more, and this continues ad infinitum.

I'm using Google Chrome Version 54.0.2840.71 m browser, and the following Orion modules.

Orion Platform 2016.1.5300

IPAM 4.3.2

NCM 7.5

NPM 12.0

DPA 10.0.1

QoE 2.1.0

NTA 4.2.0

IVIM 2.1.2

SAM 6.2.4

NetPath 1.0

Thanks.

↧

JBoss (JMX)

April 5, 2012, 2:45 pm

↧

Alert Action - Run Linux Script With Putty

March 2, 2016, 11:39 am

Overview

This article shows how you can execute scripts against a Linux server as an alert action. The scripts can be customized however you'd prefer them to function. This example shows how to do a simple service/daemon restart for Apache. Some prerequisites need to be in place first. If you've already done those, skip down to the Orion Configuration part.

Prerequisites:

Change the following Orion services to run as a service account. I just created one called 'orionservice' and then restarted those 3 services once that change was made. The service account needs to be a local admin to the Orion server.

Install Putty on your Orion server. Make sure you download the Windows Installer version. Once installed, log into your Orion server with the service account that you created. Open Putty and connect to the Linux server in question. You have to manually accept the SSH host key, since accepted keys are stored on a per-user basis (in this case, for the service account). Putty does not allow an SSH key to be automatically accepted via the command line. You only have to do this once per Linux server. There are other SSH utilities that do allow SSH keys to be accepted automagically via the CLI, but this article will focus on Putty.

Optional

Add the install location for Putty to your PATH environment variable. This helps when creating the alert action, since it can call just putty.exe instead of using the full Windows path to it.

Orion Configuration

1. In the Alert Trigger Condition you'll add the action: Execute an External Program.

Here is the full line in the 'Network path to external program'. I'm using an Orion variable, ${N=SwisEntity;M=Application.Node.IP_Address}, which allows the action to be dynamic.

putty.exe -l <username> -pw <password> -m C:\SolarWindsScripts\Linux\apache.sh ${N=SwisEntity;M=Application.Node.IP_Address}

***Security Note: Putty also allows you to specify a key file with -i instead of using a password which is more secure.***

2. You'll replace the <username> and <password> with the actual linux account that you use.

3. On your Orion server, create a folder in a location that your service account has access to. I created mine at C:\SolarWindsScripts\Linux\. In that folder you can place your scripts; this example just has a basic sh script called apache.sh.

Example of apache.sh file

/etc/init.d/httpd restart

exit 0

4. Save Changes to that action and test. You can also test this by running the full command from step 1 in Command Prompt or PowerShell. Just replace the Orion variable at the end with the actual IP of the Linux server.
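For testing outside Orion, the command line from step 1 can be assembled programmatically. The following Python sketch only builds the argument list (it does not launch Putty) and uses the -l, -pw, -i and -m options mentioned in this article. The function name, the example IP address, and the key file path are hypothetical.

```python
from typing import List, Optional


def build_putty_command(
    host: str,
    username: str,
    script_path: str,
    key_file: Optional[str] = None,
    password: Optional[str] = None,
) -> List[str]:
    """Build the putty.exe argument list for running a local script file
    against a remote host, using either a key file (-i) or a password (-pw)."""
    if (key_file is None) == (password is None):
        raise ValueError("provide exactly one of key_file or password")
    cmd = ["putty.exe", "-l", username]
    if key_file is not None:
        cmd += ["-i", key_file]  # key-based login, per the security note
    else:
        cmd += ["-pw", password]
    cmd += ["-m", script_path, host]
    return cmd
```

For example, build_putty_command("10.0.0.5", "orion", r"C:\SolarWindsScripts\Linux\apache.sh", key_file=r"C:\SolarWindsScripts\Linux\orion.ppk") returns the list that could be passed to subprocess.run when testing by hand.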

From here you can write scripts to accomplish just about anything and improve any automation roles. This gives SAM the ability to 'self heal' an application before anyone has to be involved in troubleshooting.

If this article was helpful, please feel free to rate it.

↧

CheckPoint SI Percentage OID

December 9, 2016, 1:33 pm

This may be very common knowledge among the MIB Masters, but I just wanted to post this very brief configuration to help some poor soul who has to configure a poller to capture the SI of a CheckPoint Firewall as a percentage.

To spare you all a long story, I performed a brute force review of an SNMP walk in desperation and fought for hours to find an OID that would report on the System Interrupts for a CPU in a CheckPoint device. This was requested by our client who was replacing a Cacti server with a Solarwinds monitoring system.

So for all of you about to pull out your hair, try the following OIDs and transform to see if that gets the results you want. If not, hang in there and look to the SNMP walk. It is very possible that the feature you are looking to poll is not supported by your device or the version of SNMP polling. Above all else, don't forget to breathe.

****Disclaimer**** These OIDs will not work with every CheckPoint. Make sure you perform an SNMP walk on the desired device to get a list of the MIBs that are compatible with your device.

CheckPoint SI time:

1.3.6.1.4.1.2021.11.56 ssCpuRawInterrupt: interrupt-level CPU time. BSD

1.3.6.1.4.1.2021.11.52 ssCpuRawSystem: system CPU time

Convert with the following:

100*({ssCpuRawInterrupt}/{ssCpuRawSystem})
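The transform above is simple arithmetic, sketched here in Python for clarity. The function name is hypothetical, and the guard against a zero denominator is an addition for the sketch, not part of the original transform.

```python
def interrupt_percent(raw_interrupt: float, raw_system: float) -> float:
    """Apply 100 * (ssCpuRawInterrupt / ssCpuRawSystem) to two polled values."""
    if raw_system <= 0:
        raise ValueError("ssCpuRawSystem must be positive")
    return 100.0 * raw_interrupt / raw_system
```

For instance, polled values of 450 for ssCpuRawInterrupt and 1000 for ssCpuRawSystem give 45.0, which is the alert threshold used in this post.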

In our environment we have it polling every 5 minutes and created an alert with the following conditions:

Alert on:

Custom Node Poller

Scope of Alert:

Only following sets of objects

Node | DeviceType | is equal to | CheckPoint Firewall

The actual trigger condition:

All Child conditions must be satisfied (AND)

Custom Node Poller | Unique Name (Custom Poller) | is equal to | {your converted poller}

Custom Node Poller | Current Numeric Value | is greater than or equal to | 45

Condition must exist for more than 30 Minutes
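To make the trigger logic concrete, here is a rough Python sketch of "value at or above 45 for more than 30 minutes" with 5-minute polling. It is only an approximation for reasoning about how many polls are needed and is not how the Orion alert engine is implemented; the function name and parameters are assumptions.

```python
from typing import Sequence


def sustained_breach(
    samples: Sequence[float],
    threshold: float = 45.0,
    poll_minutes: int = 5,
    hold_minutes: int = 30,
) -> bool:
    """Return True if the most recent consecutive samples (oldest first in
    `samples`) have all been >= threshold for more than hold_minutes."""
    run = 0
    for value in reversed(samples):
        if value < threshold:
            break
        run += 1
    # n consecutive polls span (n - 1) * poll_minutes of elapsed time.
    return run >= 1 and (run - 1) * poll_minutes > hold_minutes
```

Under these assumptions, eight consecutive polls at or above the threshold (spanning 35 minutes) are needed before the condition is considered met.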

↧

Where does WMI get the location data on a windows server?

November 15, 2013, 10:24 am

After searching diligently through the forums and manuals, I cannot find where the location field is read from for a Windows server when I am polling it with WMI. Could someone please point me in the right direction?

Thanks,

-Cory

↧