Endlich bin ich dazu gekommen, den schon im Post vom letzten Sommer angekündigten Check für S.M.A.R.T. -Werte zu veröffentlichen.
Ich hatte es ja schon angedroht, nun mache ich es wahr. 😉
Wir haben diesen Check nun schon ein gutes Jahr im Betrieb und er liefert seine Daten von daher kann ich ihn nur wärmstens empfehlen.
Bilder gibt es dieses Mal auch zu sehen, lasst sie am besten einfach mal auf euch wirken.
Wenn der Eine oder Andere von euch noch weitere tolle Ideen/Vorschläge hat, dann lasst es uns wissen. Wir lieben Monitoring, von daher stehen die Chancen recht gut, dass wir noch mehr umsetzen wollen.
Ich schrieb ja auch schon, dass die freundlichen Kollegen der Thomas Krenn AG auch ein Plugin für das Ermitteln der S.M.A.R.T. -Werte gebaut haben, dieses erfüllt im Wesentlichen auch die gleichen Aufgaben. Ich wollte das Ganze aber doch noch ein wenig toppen.
Hier ein paar Beispiele wie es aufzurufen ist:

# cd /path/to/check/smartvalues
# ./check_smartvalues -db check_smartvalues.db.json -c check_smartvalues.db.json -s -d 'megaraid,22 /dev/sda'
OK: [ 22 OK ] - [ 0 WARNING ] - [ 0 CRITICAL ] - [ 0 UNKNOWN ] in 0.163578987121582ms ( for details pls take a look in longoutput )
OK: on LSI MegaRAID - [ DeviceID 22 ] - #5 - Reallocated_Sector_Ct is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #9 - Power_On_Hours is [ 100 ] - [ w: 200, c: 300 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #12 - Power_Cycle_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #170 - Available_Reserved_Space is [ 100 / 010 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #171 - Program_Fail_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #172 - Erase_Fail_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #174 - Unexpected_Power_Loss is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #183 - Runtime_Bad_Block is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #184 - End-to-End_Error is [ 100 / 090 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #187 - Reported_Uncorrect is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #190 - Airflow_Temperature_Cel is [ 31 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #192 - Power-Off_Retract_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #199 - UDMA_CRC_Error_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #225 - Host_Writes is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #226 - Timed_Workload_Media_Wear is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #227 - Timed_Workload_Host_Read/Write _Ratio is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #228 - Power-off_Retract_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #232 - Available_Reservd_Space is [ 100 / 010 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #233 - Media_Wearout_Indicator is [ 043 ] - [ w: 35, c: 25 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #241 - Total_LBAs_Written is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #242 - Total_LBAs_Read is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #249 - Total_NAND_Writes is [ 100 ]

oder mit mehr als nur einer Platte …

# ./check_smartvalues -db check_smartvalues.db.json -c check_smartvalues.db.json -s -d 'megaraid,22 /dev/sda' -d 'megaraid,21 /dev/sda'
OK: [ 44 OK ] - [ 0 WARNING ] - [ 0 CRITICAL ] - [ 0 UNKNOWN ] in 0.18438987121582ms ( for details pls take a look in longoutput )
OK: on LSI MegaRAID - [ DeviceID 22 ] - #5 - Reallocated_Sector_Ct is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #9 - Power_On_Hours is [ 100 ] - [ w: 200, c: 300 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #12 - Power_Cycle_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #170 - Available_Reserved_Space is [ 100 / 010 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #171 - Program_Fail_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #172 - Erase_Fail_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #174 - Unexpected_Power_Loss is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #183 - Runtime_Bad_Block is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #184 - End-to-End_Error is [ 100 / 090 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #187 - Reported_Uncorrect is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #190 - Airflow_Temperature_Cel is [ 31 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #192 - Power-Off_Retract_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #199 - UDMA_CRC_Error_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #225 - Host_Writes is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #226 - Timed_Workload_Media_Wear is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #227 - Timed_Workload_Host_Read/Write _Ratio is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #228 - Power-off_Retract_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #232 - Available_Reservd_Space is [ 100 / 010 ]
OK: on LSI MegaRAID - [ DeviceID 22 ] - #233 - Media_Wearout_Indicator is [ 043 ] - [ w: 35, c: 25 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #241 - Total_LBAs_Written is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #242 - Total_LBAs_Read is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #249 - Total_NAND_Writes is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #5 - Reallocated_Sector_Ct is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #9 - Power_On_Hours is [ 100 ] - [ w: 200, c: 300 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #12 - Power_Cycle_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #170 - Available_Reserved_Space is [ 100 / 010 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #171 - Program_Fail_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #172 - Erase_Fail_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #174 - Unexpected_Power_Loss is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #183 - Runtime_Bad_Block is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #184 - End-to-End_Error is [ 100 / 090 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #187 - Reported_Uncorrect is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #190 - Airflow_Temperature_Cel is [ 32 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #192 - Power-Off_Retract_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #199 - UDMA_CRC_Error_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #225 - Host_Writes is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #226 - Timed_Workload_Media_Wear is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #227 - Timed_Workload_Host_Read/Write _Ratio is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #228 - Power-off_Retract_Count is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #232 - Available_Reservd_Space is [ 100 / 010 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #233 - Media_Wearout_Indicator is [ 041 ] - [ w: 35, c: 25 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #241 - Total_LBAs_Written is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #242 - Total_LBAs_Read is [ 100 ]
OK: on LSI MegaRAID - [ DeviceID 21 ] - #249 - Total_NAND_Writes is [ 100 ]

und hier das Ergebnis im neuen Icinga Web 2 …
sdd22-smart-values
 
Nützliche Links: