How are you Linux gurus handling the disk space reporting discrepancy that can, among other things, hide from Oracle DBAs the fact that their file systems are nearly ready to crash?
Configuration:
- SAM 6.2.1
- LINUX: RHEL 6.4 & 6.7
- NET-SNMP: 5.5-44 & 5.5-54
Issue: SolarWinds calculates remaining space from user space consumed only, rather than from user space consumed plus the reserved space
$ df -B1 -P /lv*[69]
Filesystem 1-blocks Used Available Capacity Mounted on
/dev/mapper/vgdata1-lvoraundo 52710469632 49985454080 40660992 100% /lvi3pld06
/dev/mapper/vgbackup1-lvorabackup 369766273024 329342873600 21638115328 94% /lvi3pld09
Math
{1-block} - {Used} != {Available}
52710469632 - 49985454080 = 2725015552 ( != 40660992 )
Difference: 2725015552 - {Available} = 2725015552 - 40660992 = 2684354560 --> about 5% of {1-block} (5.09%)
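However,
{1-block} - ( {Used} + ( 0.05 * {1-block} ) ) ~= {Available}
(only approximately: the reserve is 5% of the file system's full block count, which is slightly larger than the {1-block} total that df prints)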
This is because the EXT file system includes its reserved blocks in the total space it reports, while df reports as Available only the space that is truly available to unprivileged user processes on the system.
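For anyone who wants to see the two "free" figures side by side on a live box, here is a minimal Python sketch. It relies only on the standard os.statvfs call; the function name space_report and the "/" mount point are my own placeholders, not anything from SolarWinds or net-snmp. The reserved figure is simply the gap between f_bfree (all free blocks) and f_bavail (free blocks an unprivileged process may use).

```python
import os

def space_report(path):
    """Return byte counts for a mount point, splitting free space into
    user-available and root-reserved parts (hypothetical helper)."""
    st = os.statvfs(path)
    unit = st.f_frsize                 # block counts below are in these units
    total = st.f_blocks * unit         # total size as the file system reports it
    free_all = st.f_bfree * unit       # every free block, reserve included
    free_user = st.f_bavail * unit     # free blocks usable by non-root processes
    return {
        "total": total,
        "used": total - free_all,
        "available": free_user,
        "reserved": free_all - free_user,  # unused part of the root reserve
    }

# "/" is only a placeholder; substitute the mount point you care about.
report = space_report("/")
for name, value in report.items():
    print(f"{name:>10}: {value}")
```

On an EXT file system with the default reserve, "total - used" will be larger than "available" by roughly the reserved amount, which is the same gap shown in the df numbers above.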
SNMP OID responses:
.1.3.6.1.2.1.25.2.3.1.1.46 = INTEGER: 46
.1.3.6.1.2.1.25.2.3.1.2.46 = OID: .1.3.6.1.2.1.25.2.1.4
.1.3.6.1.2.1.25.2.3.1.3.46 = STRING: /lvi3pld06
.1.3.6.1.2.1.25.2.3.1.4.46 = INTEGER: 4096 Bytes
.1.3.6.1.2.1.25.2.3.1.5.46 = INTEGER: 12868767
.1.3.6.1.2.1.25.2.3.1.6.46 = INTEGER: 12203480
.1.3.6.1.2.1.25.2.3.1.1.49 = INTEGER: 49
.1.3.6.1.2.1.25.2.3.1.2.49 = OID: .1.3.6.1.2.1.25.2.1.4
.1.3.6.1.2.1.25.2.3.1.3.49 = STRING: /lvi3pld09
.1.3.6.1.2.1.25.2.3.1.4.49 = INTEGER: 4096 Bytes
.1.3.6.1.2.1.25.2.3.1.5.49 = INTEGER: 90274969
.1.3.6.1.2.1.25.2.3.1.6.49 = INTEGER: 80405975
.1.3.6.1.2.1.25.4.2.1.1.49 = INTEGER: 49
Math that SW uses: {.1.3.6.1.2.1.25.2.3.1.5.X} - {.1.3.6.1.2.1.25.2.3.1.6.X} = Available (that is, hrStorageSize - hrStorageUsed, counted in allocation units of {.1.3.6.1.2.1.25.2.3.1.4.X} bytes)
--> This is the Available figure they then use in their free-space calculations, but as the numbers show, it repeats the faulty math above: it does not account for the reserved space on EXT file systems.
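To make the size-minus-used calculation concrete, here is a small Python sketch that plugs in the index 46 values from the SNMP responses above and compares the result with the Available column from df. The function and constant names are mine, chosen for illustration; this shows the arithmetic only, not SolarWinds' actual code.

```python
# Values copied from the SNMP responses above for index 46 (/lvi3pld06).
ALLOC_UNIT = 4096        # .1.3.6.1.2.1.25.2.3.1.4.46, bytes per allocation unit
SIZE_UNITS = 12868767    # .1.3.6.1.2.1.25.2.3.1.5.46 (hrStorageSize)
USED_UNITS = 12203480    # .1.3.6.1.2.1.25.2.3.1.6.46 (hrStorageUsed)
DF_AVAILABLE = 40660992  # "Available" column from df -B1 -P for the same mount

def snmp_free_bytes(size_units, used_units, alloc_unit):
    """Free space as size minus used, converted from allocation units to bytes."""
    return (size_units - used_units) * alloc_unit

snmp_free = snmp_free_bytes(SIZE_UNITS, USED_UNITS, ALLOC_UNIT)
overstated = snmp_free - DF_AVAILABLE

print(snmp_free)    # 2725015552, same as {1-block} - {Used} from df
print(overstated)   # 2684354560, the reserved space counted as free
```

The SNMP-derived figure reports about 2.7 GB free on a file system where df says user processes have about 40 MB left.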
Consequence:
Graphs, alerts, etc. all show about 5% more space available than there actually is. This requires quite a bit of training for users who receive alerts: someone who sees a 95% utilization report on a 1T file system thinks they have plenty of space left, when in fact they have almost none.
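The same gap shows up in the percentages. The Python sketch below computes utilization two ways for the two file systems in the df output above: used divided by total size (what a size-minus-used approach yields), and used divided by used plus available, rounded up, which is my assumption about how df arrives at its Capacity column and does reproduce the 100% and 94% shown. The function names are hypothetical.

```python
import math

def pct_used_size_based(size, used):
    """Utilization when the reserve is treated as free space: used / total."""
    return 100.0 * used / size

def pct_used_df_style(used, available):
    """Utilization relative to user-reachable space, rounded up
    (assumed to match the Capacity column printed by df)."""
    return math.ceil(100.0 * used / (used + available))

# (mount, 1-blocks, Used, Available) copied from the df -B1 -P output above.
filesystems = [
    ("/lvi3pld06", 52710469632, 49985454080, 40660992),
    ("/lvi3pld09", 369766273024, 329342873600, 21638115328),
]

for mount, size, used, available in filesystems:
    size_based = round(pct_used_size_based(size, used), 1)
    df_style = pct_used_df_style(used, available)
    print(f"{mount}: size-based {size_based}%  df-style {df_style}%")
```

For /lvi3pld06 the size-based figure is about 94.8% while df already reports 100%, which is exactly the "95% utilized but effectively full" situation described above.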
How are you Linux gurus handling this through SolarWinds/Orion/SAM?