Hello to the THWACK community,
I am currently having trouble upgrading from SAM 6.7.1 (Orion Platform 2018.4 HF3) to SAM 2019.4.1.
The actual software install seems to go fine, but a problem arises during the configuration wizard.
I get to this point, and the wizard hangs and runs transactions continually on the DB until the transaction log fills up at 100GB.
Overall Progress: 72.2%
Configuring general components for plugins 98.8% - Configuring Cortex Integration General
It sits here for approximately 30 minutes, then fails.
According to a DBA who looked into the database during the 'hang' period:
"There is a common table expression running that is filling up the log"
I can see a repeating pattern in the configuration wizard log during this time:
2020-01-09 20:27:29,918 [147] DEBUG SqlHelper - SQL: IF EXISTS(SELECT * FROM [dbo].[sysobjects] WHERE id = OBJECT_ID(N'[dbo].[HA_PoolMembersView]') AND type in (N'V'))
AND EXISTS(SELECT * FROM [dbo].[sysobjects] WHERE id = OBJECT_ID(N'dbo.HA_PoolMembers') AND type in (N'U'))
BEGIN
SELECT PoolMemberId, PoolMemberType, PoolId, HostName, ElectionPriority, Priority, PreferredStatus, PreferredStatusTimestamp, PreferredStatusRevision, Status, StatusMessage, ReasonOfFail, ReasonOfFailRevision, HeartBeat, LastHeartBeatTimestamp, PoolIdRevision
FROM dbo.HA_PoolMembersView
WHERE PoolId<>0 AND (PoolMemberType='MainPoller' OR PoolMemberType='MainPollerStandby') ORDER BY HostName
END
2020-01-09 20:27:29,918 [147] DEBUG SqlHelper - SQL: SELECT TOP 1 ServerName FROM dbo.Engines WITH (NOLOCK) WHERE ServerType = 'Primary' ORDER BY KeepAlive DESC
2020-01-09 20:27:29,918 [147] DEBUG MessageBusTopologyProvider - No message bus host change detected.
2020-01-09 20:28:05,981 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown
2020-01-09 20:28:05,981 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown
2020-01-09 20:28:05,981 [151] DEBUG ScheduledTask - Running SolarWinds.Orion.LogMgmt.RuleProcessing.Rules.RuleDataSynchronizerLoadRetry
2020-01-09 20:28:05,981 [151] DEBUG ScheduledTask - SolarWinds.Orion.LogMgmt.RuleProcessing.Rules.RuleDataSynchronizerLoadRetry succeeded
2020-01-09 20:28:05,981 [Scheduler] DEBUG Scheduler - Sleeping for 1.00:00:00.0200000
2020-01-09 20:28:05,981 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown
2020-01-09 20:28:05,981 [Scheduler] DEBUG Scheduler - Sleeping for 00:01:00.0200000
2020-01-09 20:28:11,153 [156] DEBUG SqlHelper - SQL: SELECT CAST((CASE WHEN SERVERPROPERTY('edition') = 'SQL Azure' THEN 1 ELSE 0 END) AS INT)
2020-01-09 20:28:11,153 [156] INFO CwActiveInstanceChecker - Extending expiration for active instance of ConfigWizard on machine ITIS-SOLWIND1 (138.26.53.177)
2020-01-09 20:28:11,153 [156] DEBUG SqlHelper - SQL: SELECT CAST((CASE WHEN SERVERPROPERTY('edition') = 'SQL Azure' THEN 1 ELSE 0 END) AS INT)
2020-01-09 20:29:06,013 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown
2020-01-09 20:29:06,013 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown
2020-01-09 20:29:06,013 [Scheduler] DEBUG Scheduler - Sleeping for 1.00:00:00.0200000
2020-01-09 20:29:06,013 [168] DEBUG ScheduledTask - Running SolarWinds.Orion.LogMgmt.RuleProcessing.Rules.RuleDataSynchronizerLoadRetry
2020-01-09 20:29:06,013 [168] DEBUG ScheduledTask - SolarWinds.Orion.LogMgmt.RuleProcessing.Rules.RuleDataSynchronizerLoadRetry succeeded
2020-01-09 20:29:06,013 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown
2020-01-09 20:29:06,013 [Scheduler] DEBUG Scheduler - Sleeping for 00:01:00.0200000
2020-01-09 20:29:29,936 [157] DEBUG SqlHelper - SQL: IF EXISTS(SELECT * FROM [dbo].[sysobjects] WHERE id = OBJECT_ID(N'[dbo].[HA_PoolMembersView]') AND type in (N'V'))
AND EXISTS(SELECT * FROM [dbo].[sysobjects] WHERE id = OBJECT_ID(N'dbo.HA_PoolMembers') AND type in (N'U'))
BEGIN
SELECT PoolMemberId, PoolMemberType, PoolId, HostName, ElectionPriority, Priority, PreferredStatus, PreferredStatusTimestamp, PreferredStatusRevision, Status, StatusMessage, ReasonOfFail, ReasonOfFailRevision, HeartBeat, LastHeartBeatTimestamp, PoolIdRevision
FROM dbo.HA_PoolMembersView
WHERE PoolId<>0 AND (PoolMemberType='MainPoller' OR PoolMemberType='MainPollerStandby') ORDER BY HostName
END
2020-01-09 20:29:29,936 [157] DEBUG SqlHelper - SQL: SELECT TOP 1 ServerName FROM dbo.Engines WITH (NOLOCK) WHERE ServerType = 'Primary' ORDER BY KeepAlive DESC
2020-01-09 20:29:29,936 [157] DEBUG MessageBusTopologyProvider - No message bus host change detected.
2020-01-09 20:30:06,039 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown
2020-01-09 20:30:06,039 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown
2020-01-09 20:30:06,039 [173] DEBUG ScheduledTask - Running SolarWinds.Orion.LogMgmt.RuleProcessing.Rules.RuleDataSynchronizerLoadRetry
2020-01-09 20:30:06,039 [173] DEBUG ScheduledTask - SolarWinds.Orion.LogMgmt.RuleProcessing.Rules.RuleDataSynchronizerLoadRetry succeeded
2020-01-09 20:30:06,039 [Scheduler] DEBUG Scheduler - Sleeping for 1.00:00:00.0200000
2020-01-09 20:30:06,039 [Scheduler] DEBUG Scheduler - Checking if Scheduler should shutdown
2020-01-09 20:30:06,039 [Scheduler] DEBUG Scheduler - Sleeping for 00:01:00.0200000
This continues until the database transaction log fills up.
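To make the repeating pattern easier to see, a small script can group the log entries by logger and message and count how often each one recurs. This is only a rough sketch: it assumes the line format shown in the excerpt above (timestamp, thread in brackets, level, logger, ` - `, message), and the name `summarize_log` is made up for illustration, not part of any SolarWinds tooling.

```python
import re
from collections import Counter

# Matches entries such as:
# 2020-01-09 20:27:29,918 [147] DEBUG SqlHelper - SQL: SELECT ...
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]+)\] "
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<logger>\S+) - "
    r"(?P<msg>.*)$"
)


def summarize_log(lines, width=80):
    """Count log entries per (logger, message truncated to `width` chars).

    Lines that do not start with a timestamp (for example the
    continuation lines of a multi-line SQL statement) are skipped,
    so each logged statement is counted once.
    """
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line.rstrip("\n"))
        if match:
            counts[(match["logger"], match["msg"][:width])] += 1
    return counts
```

Passing an open log file to `summarize_log` and printing `.most_common(10)` on the result lists the entries that repeat most often during the hang.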
Before the upgrade, the database mdf was 34GB allocated with 22GB data inside. The log file was 45GB allocated with 1GB of data inside.
After the wizard fails, the DB is 67GB and the log is 96GB.
I contacted support 4 weeks ago, and since then it has been a continual cycle of "I see X error in the log, try this", with the "try this" part taking at least several hours and leaving me where I started every time. The latest suggestion is to stand up a new server and install Orion there. That seems like it will be a painful process.
I definitely appreciate anyone taking the time to read through this. By posting here, I am hoping to get some help or advice that will lead me forward.
I have a few questions:
1. Is reinstalling Orion on a new server a common troubleshooting step?
2. If reinstalling on a new server is the best path forward, what can I expect to lose in the move? I am looking for any gotchas, like data that is not stored in the database. I don't *think* we have anything custom, but what should I look for?
I currently have the following in Orion:
~550 servers
~2000 volumes
111 Application monitors (1331 component monitors)
20 universal device pollers
67 alerts