When Up/Down by IP isn't enough...

January 24, 2020, 10:36 am

Hello Everyone!

I have a situation. Last night the AV in our environment was patched, which caused a number of Windows Servers to hang during reboot. The servers were still pingable, so they did not throw any DOWN alerts. What process or service would be best to monitor via WMI on approximately 2200 Windows Servers without taxing SolarWinds or the pollers?

Any ideas would be greatly appreciated!

Thanks!

Scott

↧

Regex for Node Names in Dynamic Group Custom Queries

January 24, 2020, 10:22 am

I'm hoping to get some help here on this.

I'm trying to build Dynamic Groups to automate template application in SAM (yay!).

In order to meet my requirements for the group, I need to use regular expressions since simple "contains" and "ends with" and so on do not allow for enough sophistication.

A support ticket gave me the answer that regex is supported by Dynamic Groups, and that it is supported for Node Names.

I have found many examples on the THWACK forums of regex being used for IPs, but none for node names. I have also been unsuccessful in creating any working regular expressions for node names myself.

Has anyone had any luck with this?

↧

When Up/Down by IP isn't enough...

January 24, 2020, 9:54 am

Hey Everyone!

I am having an issue with monitoring our Windows Servers. Last night our AV was patched and the servers were restarted, but some failed to reboot all the way. The servers were pingable, so they didn't throw any DOWN alerts, but they were not functional.

Any ideas on which services or processes I could monitor on approximately 2500 servers to catch failed patching that leaves a server reporting as UP but non-responsive, without taxing SolarWinds and my polling engines?

Any advice is appreciated!

Thanks!

Scott

PS - Running SAM 6.7.0
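
One low-cost idea for the question above, as a minimal sketch rather than a SAM feature: a server that hangs partway through boot may still answer ping while refusing connections on its service ports. The Python below only illustrates that check. The function name `port_responds` and the suggestion of RDP (3389) or SMB (445) as the port are assumptions, not anything taken from the post.

```python
import socket


def port_responds(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout.

    A host that answers ICMP but is hung during boot will usually fail
    this check, because nothing is listening on the service port yet.
    """
    try:
        # create_connection performs the full TCP handshake, then we
        # close the socket immediately; no application data is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A call such as `port_responds("server01", 3389)` (hypothetical host name) makes one short-lived connection per server per poll, so the load on a poller should stay small even across a couple of thousand nodes.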

↧

SAM 2019.4.1 - Upgrading from My Orion Deployment

December 23, 2019, 8:34 am

In the comments of What We're Working On For Server & Application Monitor (Updated December, 2019), I noted that those of you on SAM 6.9 and higher (including 2019.4.x) can now upgrade your entire Orion deployment from the web console. In a previous walk-through I went through the steps of upgrading to SAM 6.9.1 (SAM 6.9.1 - Upgrading from My Orion Deployment), which you can reference to see the changes in this workflow. Here, I'll be doing a similar walk-through with my upgrade from SAM 2019.4 to 2019.4.1.

Release Notes

Planning My Upgrade

In the screenshot below you can see I have my Orion deployment installed with SAM 2019.4 and Orion Platform 2019.4 Hotfix 1.

Clicking on Settings -> My Orion Deployment -> Updates Available takes me to the following screen. I now have a few options, including "Upgrade all products and evaluations to the latest version",

"Install only recommended patches and hot fixes",

and "Install only product evaluations."

I'm going to keep my selection on my preferred option, "Upgrade all products and evaluations to the latest versions." Clicking "Next" will allow me to see whether I meet the conditions to upgrade.

Looks like my connection to all the servers in my deployment is good.

Clicking "Next" will allow me to see if there are any potential problems that would prevent my upgrade from occurring.

It looks like I need to confirm that I have backed up my database and check that my .NET Framework meets the minimum system requirement of .NET 4.8.

For those of you who aren't ready to upgrade and were looking for what changes would have occurred, at this point simply hit 'cancel' and plan for your next change window. I like to take screenshots of the issues spotlighted here so I'm really ready with all the changes necessary.

Many customers aren't aware of this, but by canceling here, you're able to utilize the system checks as a planning tool to be prepared for your upgrade window. Selections up to this point in the upgrade process do not impact your system, so you can run through these steps as many times as necessary.

Upgrading

Personally I'm ready to upgrade now, so I will confirm my database backup,

Accept the EULA,

and proceed with my upgrade.

Now I'm going to brew some tea and catch up on some THWACK requests while I'm waiting for my upgrade to finish.

Note: While my main polling engine is being upgraded, my web console is down. During this time, installation progress is shown on a temporary page, and after completion I will be redirected to my main web console. I recommend not closing this temporary page, because the only way to navigate back to it is to RDP into the main polling engine server and click the "My Orion Deployment" shortcut.

Now that my main polling engine has been upgraded, I've been redirected to the main Orion web console. I logged back into the web console and am treated to this glorious sight.

and I'm done!

Have you been using the web console upgrade? If so, how's it treating you? Let the product team know if we need to make some improvements. I'm sure those of you with distributed systems are also very interested in the line item in What We're Working On For Server & Application Monitor (Updated December, 2019) regarding: Centralized upgrades - pre-stage upgrades for reduced downtime

If you are and are interested in giving us feedback then: Sign Up to Participate in SolarWinds Feedback Sessions

Happy upgrading!

↧

Dynamic Log Parser.apm-template

January 24, 2020, 1:56 pm

↧

Synthetic Testing of Multiple Services

January 23, 2020, 9:38 am

I know I've seen several mentions of synthetic testing in the past and even found a PDF for SolarWinds Synthetic End User Monitor, but I can't find an actual product that does this.

I see Quality of Experience monitoring, but nothing actually performing some of the synthetic testing.

What I'm getting at is, how can I ensure services such as DHCP, DNS, or websites are loading and functioning in an acceptable time-frame?

I would like something to perform DNS look-ups and ensure the response is both correct and received within a few milliseconds.

I would imagine this being an agent you could install on a Windows or Linux server, preferably something that could run on a Raspberry Pi that we could toss in a closet at some of our remote sites.

Thank you!
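
A minimal sketch of the DNS part of this request, assuming a small Python script is acceptable on the remote box. It is not a SolarWinds component. The function names are hypothetical, and the lookup goes through the operating system's resolver, so local caching can make the timing look better than a fresh query would.

```python
import socket
import time


def timed_lookup(hostname, resolver=socket.gethostbyname):
    """Resolve hostname and return (address, elapsed milliseconds)."""
    start = time.perf_counter()
    address = resolver(hostname)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return address, elapsed_ms


def check_dns(hostname, expected_ip, max_ms, resolver=socket.gethostbyname):
    """Return (ok, detail): ok is True only if the answer is correct AND fast."""
    try:
        address, elapsed_ms = timed_lookup(hostname, resolver)
    except OSError as exc:
        # socket.gaierror (name resolution failure) is a subclass of OSError.
        return False, f"lookup failed: {exc}"
    if address != expected_ip:
        return False, f"got {address}, expected {expected_ip}"
    if elapsed_ms > max_ms:
        return False, f"slow: {elapsed_ms:.1f} ms > {max_ms} ms"
    return True, f"ok: {address} in {elapsed_ms:.1f} ms"
```

The `resolver` parameter exists only so the check can be exercised without a network. In real use the default is enough, for example `check_dns("intranet.example.com", "10.0.0.5", 50)` with placeholder values.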

↧

Help with output

January 23, 2020, 4:20 am

Hi,

I just created a PowerShell script to check the number of days before a certificate expires.

The script runs fine in PowerShell and outputs the correct number.

Can someone please guide me on how to report that number (for example, 360) as statistic data so that I can use it in the alert trigger?

Thank You

regards,
Alex
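
As far as I understand the SAM script monitor convention, the script has to print its value on a line of the form `Statistic: <number>`, optionally with a `Message: <text>` line, and the statistic then becomes available for thresholds and alerts. Please verify that against the SolarWinds documentation. The sketch below shows only the output shape, written in Python for illustration. The function names and the certificate label are hypothetical, and in PowerShell the same two lines would be written with `Write-Host`.

```python
from datetime import date


def days_until(expiry: date, today: date) -> int:
    """Whole days from today until the certificate expiry date."""
    return (expiry - today).days


def sam_output(days_left: int, cert_name: str = "example cert") -> str:
    """Build the two lines a script monitor is assumed to parse.

    'Statistic:' carries the numeric value used for thresholds;
    'Message:' carries free text shown alongside it.
    """
    lines = [
        f"Message: {cert_name} expires in {days_left} days",
        f"Statistic: {days_left}",
    ]
    return "\n".join(lines)
```

Printing `sam_output(days_until(expiry, date.today()))` as the last step of the script should then give the alert trigger a numeric statistic to compare against.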

↧

App insight for MS SQL - SAM component monitor consumption

January 20, 2020, 3:20 am

Dear SolarWinds Experts,

I have been involved in 3 different client projects where SAM was scoped to monitor MS SQL databases, and we enabled this using AppInsight for MS SQL.

According to the documentation, the AppInsight for MS SQL template consumes 50 component monitors, but when I assign it to MS SQL servers, this one application template consumes 300-500 component monitors on average.

My constraint is not SAM licensing; the problem is the Additional Polling Engine (APE) scalability limit. Each APE can accommodate 10K component monitors, and onboarding a single MS SQL server consumes ~800 components, as shown in the case below.

I am using the SWQL utility to verify my understanding.

In the Application table, I retrieve the Application ID (41) for the MS SQL application that is applied to Node ID (75).

If I then check the component list for Application ID 41, the count goes beyond 800.

Could someone help me understand whether anything is wrong with my understanding?

↧

Lenovo ThinkSystems Servers - Anyone currently monitoring, if so and how?

January 27, 2020, 9:03 am

Hi,

We're looking to migrate our data-center to Lenovo hardware. We're currently about 60% VMWare and 40% physical Windows and Linux. On the VMWare side, we're getting hardware health through the VMWare API integration.

But, on the physical Windows and Linux boxes I've been unable to find any working hardware monitoring solutions. I wanted to reach out and see, is anyone else monitoring Lenovo servers, DE's, anything Lenovo not through vmware?

We have the feature request in to see if we can get xClarity API integration, but wanted to see if anyone has a makeshift solution in place currently.

Thanks!

↧

What We're Working On For Server & Application Monitor (Updated December, 2019)

May 15, 2015, 5:53 pm

The latest release of Server & Application Monitor (SAM) is available on solarwinds.com and in your customer portal. See the release notes for a comprehensive look at the features contained within. >> SAM 2019.4 Release Notes

You ask, we listen. Many of the top features being worked on in SAM are generated through your feedback, your participation in our user sessions and your votes in our Server & Application Monitor Feature Requests forum.

Simplified API polling – targeting streamlined support for Azure and Microsoft APIs, additional features for status/response time monitoring and improvements for parsing complex data structures.
Scale improvements - increasing supported component count per polling engine in a single SAM instance
WinRM support
- WS-Management (WinRM) Support for Application Monitors
Improved custom property management - redesigned custom property editor
Standardizing Status in Orion - Volume status improvements
- Volume Management and Status Feature
- Granular Disk Thresholds
New dashboard framework - next generation summary dashboard framework
- Create Custom Pie Charts
- Allow Resources to Span Multiple Columns
UI performance optimizations - Faster and more responsive web UI
Centralized upgrades - pre-stage upgrades for reduced downtime
Orion maps - bridging the feature parity gap with Network Atlas

Give Us Feedback

We actively refine the product roadmap to solve your problems. Participate in user sessions for THWACK points and personalized input into the future of SAM.

↧

NOC VIEW ROTATION TAB

March 22, 2019, 12:40 am

I cannot find the NOC view rotation tab. Does anyone have any suggestions? Thank you.

↧

Application Mapping

January 27, 2020, 11:28 am

Hello All,

Does anyone know of a way to discover a server mapping showing the communication for servers? Ultimately, we would want to select a node and see what other servers it's communicating with. We are trying to automate a process of using SW to help find what servers coincide with different applications. Some of our application platforms can consist of 20 servers, the challenge is how we would use SW to help with the discovery.

Thanks!

↧

AppInsight for SQL Deadlocks/sec on Solarwinds DB

January 28, 2020, 12:36 am

Hi All

We have SQL 2016 for our backend, and we have AppInsight for SQL monitoring it. All seems to be good, except that deadlocks/sec is always above 0, usually 0.10.

Is there anything we should be looking at in order to sort out this issue? Everything else seems to be OK.

I'm wondering if there is a standard process we should go through to rectify this, as AppInsight is showing as critical on the monitoring DB server.

This is all on the monitoring DB; no other SQL DBs seem to have this issue.

↧

Expected Downtime of an Application

January 28, 2020, 4:15 am

Hello,

I have an application being monitored on SolarWinds that sometimes goes down as expected so a backup can take place. Is there any way for me to stop alerts being generated for when it goes down? I've considered muting and unmanaging but I'd rather just filter it out of the alerts altogether because there are times when it is expected to go down and other times not. Is it possible to set an alert so that it will be generated when it unexpectedly goes down?

Many thanks.
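
The condition being asked for boils down to "down AND not inside the backup window". Here is a minimal Python sketch of that logic, assuming the backup runs in a fixed daily window. The function names and any window times are hypothetical, and the sketch says nothing about how a SolarWinds alert would be configured to express the same thing.

```python
from datetime import datetime, time


def in_backup_window(now: datetime, start: time, end: time) -> bool:
    """True if now falls inside the daily window [start, end).

    Handles windows that cross midnight, e.g. 23:00 to 01:00.
    """
    current = now.time()
    if start <= end:
        return start <= current < end
    # Window wraps past midnight: inside if after start OR before end.
    return current >= start or current < end


def should_alert(is_down: bool, now: datetime, start: time, end: time) -> bool:
    """Alert only for downtime that happens outside the expected window."""
    return is_down and not in_backup_window(now, start, end)
```

With a window of 23:00 to 01:00, an outage at 23:30 would be suppressed while the same outage at 14:00 would still alert, which is the behavior that muting or unmanaging the whole application cannot give.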

↧

Upgrade to 2019.4.1: Configuration wizard problem

January 28, 2020, 7:23 am

Hello to the THWACK community,

I am currently having trouble upgrading from SAM 6.7.1 (Orion Platform 2018.4 HF3) to SAM 2019.4.1

The actual software install seems to go fine, but a problem arises during the configuration wizard.

I get to this point, and the wizard hangs and runs transactions continually on the DB until the transaction log fills up at 100GB.

Overall Progress: 72.2%

Configuring general components for plugins 98.8% - Configuring Cortex Integration General

It sits here for approximately 30 minutes, then will fail.

According to a DBA that looked in to the database during the 'hang' period:

"There is a common table expression running that is filling up the log"

I can see a repeating pattern in the configuration wizard log during this time:

2020-01-09 20:27:29,918 [147] DEBUG SqlHelper - SQL: IF EXISTS(SELECT * FROM [dbo].[sysobjects] WHERE id = OBJECT_ID(N'[dbo].[HA_PoolMembersView]') AND type in (N'V'))

AND EXISTS(SELECT * FROM [dbo].[sysobjects] WHERE id = OBJECT_ID(N'dbo.HA_PoolMembers') AND type in (N'U'))

BEGIN

SELECT PoolMemberId, PoolMemberType, PoolId, HostName, ElectionPriority, Priority, PreferredStatus, PreferredStatusTimestamp, PreferredStatusRevision, Status, StatusMessage, ReasonOfFail, ReasonOfFailRevision, HeartBeat, LastHeartBeatTimestamp, PoolIdRevision

FROM dbo.HA_PoolMembersView

WHERE PoolId<>0 AND (PoolMemberType='MainPoller' OR PoolMemberType='MainPollerStandby') ORDER BY HostName

END

2020-01-09 20:27:29,918 [147] DEBUG SqlHelper - SQL: SELECT TOP 1 ServerName FROM dbo.Engines WITH (NOLOCK) WHERE ServerType = 'Primary' ORDER BY KeepAlive DESC

2020-01-09 20:27:29,918 [147] DEBUG MessageBusTopologyProvider - No message bus host change detected.

2020-01-09 20:28:05,981 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown

2020-01-09 20:28:05,981 [151] DEBUG ScheduledTask - Running SolarWinds.Orion.LogMgmt.RuleProcessing.Rules.RuleDataSynchronizerLoadRetry

2020-01-09 20:28:05,981 [151] DEBUG ScheduledTask - SolarWinds.Orion.LogMgmt.RuleProcessing.Rules.RuleDataSynchronizerLoadRetry succeeded

2020-01-09 20:28:05,981 [Scheduler] DEBUG Scheduler - Sleeping for 1.00:00:00.0200000

2020-01-09 20:28:05,981 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown

2020-01-09 20:28:05,981 [Scheduler] DEBUG Scheduler - Sleeping for 00:01:00.0200000

2020-01-09 20:28:11,153 [156] DEBUG SqlHelper - SQL: SELECT CAST((CASE WHEN SERVERPROPERTY('edition') = 'SQL Azure' THEN 1 ELSE 0 END) AS INT)

2020-01-09 20:28:11,153 [156] INFO CwActiveInstanceChecker - Extending expiration for active instance of ConfigWizard on machine ITIS-SOLWIND1 (138.26.53.177)

2020-01-09 20:28:11,153 [156] DEBUG SqlHelper - SQL: SELECT CAST((CASE WHEN SERVERPROPERTY('edition') = 'SQL Azure' THEN 1 ELSE 0 END) AS INT)

2020-01-09 20:29:06,013 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown

2020-01-09 20:29:06,013 [Scheduler] DEBUG Scheduler - Sleeping for 1.00:00:00.0200000

2020-01-09 20:29:06,013 [168] DEBUG ScheduledTask - Running SolarWinds.Orion.LogMgmt.RuleProcessing.Rules.RuleDataSynchronizerLoadRetry

2020-01-09 20:29:06,013 [168] DEBUG ScheduledTask - SolarWinds.Orion.LogMgmt.RuleProcessing.Rules.RuleDataSynchronizerLoadRetry succeeded

2020-01-09 20:29:06,013 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown

2020-01-09 20:29:06,013 [Scheduler] DEBUG Scheduler - Sleeping for 00:01:00.0200000

2020-01-09 20:29:29,936 [157] DEBUG SqlHelper - SQL: IF EXISTS(SELECT * FROM [dbo].[sysobjects] WHERE id = OBJECT_ID(N'[dbo].[HA_PoolMembersView]') AND type in (N'V'))

AND EXISTS(SELECT * FROM [dbo].[sysobjects] WHERE id = OBJECT_ID(N'dbo.HA_PoolMembers') AND type in (N'U'))

BEGIN

FROM dbo.HA_PoolMembersView

WHERE PoolId<>0 AND (PoolMemberType='MainPoller' OR PoolMemberType='MainPollerStandby') ORDER BY HostName

END

2020-01-09 20:29:29,936 [157] DEBUG SqlHelper - SQL: SELECT TOP 1 ServerName FROM dbo.Engines WITH (NOLOCK) WHERE ServerType = 'Primary' ORDER BY KeepAlive DESC

2020-01-09 20:29:29,936 [157] DEBUG MessageBusTopologyProvider - No message bus host change detected.

2020-01-09 20:30:06,039 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown

2020-01-09 20:30:06,039 [173] DEBUG ScheduledTask - Running SolarWinds.Orion.LogMgmt.RuleProcessing.Rules.RuleDataSynchronizerLoadRetry

2020-01-09 20:30:06,039 [173] DEBUG ScheduledTask - SolarWinds.Orion.LogMgmt.RuleProcessing.Rules.RuleDataSynchronizerLoadRetry succeeded

2020-01-09 20:30:06,039 [Scheduler] DEBUG Scheduler - Sleeping for 1.00:00:00.0200000

2020-01-09 20:30:06,039 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown

2020-01-09 20:30:06,039 [Scheduler] DEBUG Scheduler - Sleeping for 00:01:00.0200000

This continues until the database transaction log fills up.

Before the upgrade, the database mdf was 34GB allocated with 22GB data inside. The log file was 45GB allocated with 1GB of data inside.

After the wizard fails, the DB is 67GB and the log is 96GB.

I contacted support 4 weeks ago, and since then it has been a continual cycle of "I see X error in the log, try this", with the "try this" part taking several hours at the least, and leaving me where I started every time. The latest thing to try is to stand up a new server and install Orion there. This seems like it will be a painful process to me.

I definitely appreciate anyone taking the time to read through this. By posting here, I am hoping to get some help or advice that will lead me forward.

I have a few questions:

1. Is reinstalling Orion on a new server a common troubleshooting step?

2. If reinstalling on a new server is the best path forward, what can I expect to lose in the move? I am looking for any gotchas like data that is not stored in the database. I don't *think* we have anything custom, but what should I look for?

I currently have the following in Orion:

~550 servers

~2000 volumes,

111 Application monitors (1331 component monitors)

20 universal device pollers

67 alerts

↧

Alert action execute external program

June 24, 2019, 8:23 am

We have a service that hangs, but appears to still be running. By monitoring a log file, we can detect when the service loses connection to a PBX. What we need to do is to execute several commands (Batch file) that do the following:

REM When a disconnect occurs, the server may be unable to reconnect for 2 or 3 minutes.

REM Set a delay of 3 minutes before restarting the hung services.

Timeout /t 180 /NOBREAK

REM Stop the Windows Service that lost connectivity.

net stop "<ServiceName>"

REM Wait 30 seconds to allow the service to terminate.

Timeout /t 30 /NOBREAK

REM Kill any remaining instances of the service tasks still in memory.

REM This will terminate all running tasks with the name of <ServiceName.exe>

REM The /F flag is for Forcing the tasks to terminate.

TaskKill /IM <ServiceName.exe> /F

REM Wait 10 seconds to allow the tasks to terminate.

Timeout /t 10 /NOBREAK

REM Start the service.

Net Start "<ServiceName>"

What we need is a way to execute the equivalent of these steps from SolarWinds, on the impacted Node. We have set up a monitor that watches for new instances of the lost connection string in the application log file. The next step is sending email alerts to the app admin team and automatically restarting the hung service.

I believe SAM service restart may be part of this. But I am not sure how to use it if SAM doesn't detect the service has failed. If someone would point me to an article or tutorial for setting this up, I would appreciate it.

Thanks,
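
For reference, the same sequence as the batch file can be written as one function that an external-program alert action could call. This is a minimal Python sketch under stated assumptions: the function and parameter names are hypothetical, the `net` and `taskkill` commands are the ones from the batch file above, and it would need to run on the impacted Windows node with sufficient rights.

```python
import subprocess
import time


def restart_hung_service(service_name, image_name,
                         run=subprocess.run, sleep=time.sleep):
    """Mirror the batch file: wait, stop, kill leftovers, start.

    run and sleep are injectable so the sequence can be tested
    without touching real services.
    """
    # The server may be unable to reconnect for 2 or 3 minutes.
    sleep(180)
    # Stop the Windows service that lost connectivity; failure is
    # tolerated because the service may already be stopped.
    run(["net", "stop", service_name], check=False)
    # Allow the service time to terminate.
    sleep(30)
    # Force-kill any remaining processes with this image name.
    run(["taskkill", "/IM", image_name, "/F"], check=False)
    # Allow the tasks time to terminate.
    sleep(10)
    # Start the service; raise if the start command itself fails.
    run(["net", "start", service_name], check=True)
```

Because the trigger here is a log-file match and not a failed service, the alert would fire on the log monitor and simply hand the service name and executable name to a script like this.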

↧

SolarWinds Module Interaction: WPM / SAM with AppMon

January 23, 2020, 8:53 am

Hello -

I'm still relatively new to SW, so I've been learning by fire.

I have a SW server with some custom App Monitors configured.

The server has SAM and WPM installed right now but we are looking at removing WPM.

My question is: is there a way for me to tell if a customized Application Monitor is specifically using WPM? This is the question I need answered.

Also, generally, is there a way to tell what module (SAM, WPM, NCM, NTA, etc.) is being used on a server in alert, application monitors, etc.? With all of the different modules, at times it's difficult to ascertain which module is being used for a specific purpose.

TIA.

↧

Sending an alert when the sum of my statistics is greater than 10?

January 28, 2020, 1:43 pm

I'm using a PowerShell monitor with five script outputs.

Each script output has its own display name and threshold, and this has been working for over a year now.

(I've had various outputs reach the "warning" threshold and I have alerts set up for that; it works great.)

But I've had a few cases where multiple script outputs reached the warning threshold at the same time, which indicates a bigger issue that I need to alert on.

Thus, is it possible to create an alert that will send an email when the sum of all the statistics is greater than 10?

Here are my 5 script outputs:

https://i.imgur.com/uJkDQ9n.png

Is this how you do it? (This is my alert config screen)

I can't tell if this is going to work because I'm not sure if it is summing up all the statistics?
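
One way to sidestep the question of whether the alert engine sums the statistics is to have the script report the sum itself as an extra output, which can then carry an ordinary threshold. The sketch below shows that idea in Python for illustration only. It assumes the multi-output convention of `Statistic.<Name>: <value>` lines, which should be checked against the SolarWinds documentation, and the output names and function names are hypothetical.

```python
from typing import Mapping


def with_total(outputs: Mapping[str, float], total_name: str = "Total") -> str:
    """Render each output plus one extra output holding the sum."""
    lines = [f"Statistic.{name}: {value}" for name, value in outputs.items()]
    # The additional output lets a plain threshold act on the combined value.
    lines.append(f"Statistic.{total_name}: {sum(outputs.values())}")
    return "\n".join(lines)


def sum_exceeds(outputs: Mapping[str, float], limit: float = 10) -> bool:
    """True when the combined value is strictly greater than the limit."""
    return sum(outputs.values()) > limit
```

The existing five outputs and their individual warning alerts stay untouched; the new alert would look only at the `Total` output.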

↧

Widget for Warning & Critical Components

January 10, 2020, 2:59 am

Hi all, I'm looking to create a widget that lists all warning/critical components within templates, specifically (but not limited to) those within AppInsight templates.

Any easy way to do this without scripting?

Thanks

↧

SAM Event Log monitor, not working with keyword

January 29, 2020, 8:55 am

I have an event log monitor that works, unless I add keyword matching.

In the section Include events:, I enter "Cannot Create Thread"

The event description comes across as:

File: This is a test - Reason Cannot Create Thread. error: The operation completed successfully.

Line: %2.

Reason: %3

Am I missing something?

↧