With two bulletins published by Hewlett Packard Enterprise (HPE), several solid state disks (SSD) were declared vulnerable to a firmware bug that makes the counter for power-on hours overflow after 32,768 or 40,000 hours, depending on the model, and renders the disk completely inaccessible. A quote from the vendor:

This … firmware is considered a critical fix and is required to address the issue detailed below. HPE strongly recommends immediate application of this critical fix. Neglecting to update to SSD Firmware Version … will result in drive failure and data loss at 40,000 (or 32,768) hours of operation and require restoration of data from backup if there is no fault tolerance, such as RAID 0 or even in a fault tolerance RAID mode if more SSDs fail than can be supported by the fault tolerance of the RAID mode on the logical drive. Example: RAID 5 logical drive with two failed SSDs.
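The two thresholds are easier to picture with a little arithmetic. 32,768 is 2^15, one more than the largest value a signed 16-bit integer can hold, which suggests (HPE does not say so explicitly) that the counter wraps around in such a type. The sketch below assumes a drive that runs around the clock and shows that wrap-around as well as how many days remain before each threshold is reached. The helper names as_int16 and days_until are hypothetical.

```python
HOURS_PER_DAY = 24
THRESHOLDS = (32_768, 40_000)  # failure points named in the HPE bulletins


def as_int16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed (two's complement) integer."""
    value &= 0xFFFF
    return value - 0x10000 if value >= 0x8000 else value


def days_until(threshold: int, power_on_hours: int) -> float:
    """Days of continuous (24/7) operation left before power_on_hours reaches threshold."""
    return max(threshold - power_on_hours, 0) / HOURS_PER_DAY


if __name__ == "__main__":
    print(as_int16(32_767))  # 32767
    print(as_int16(32_768))  # -32768: one hour later the value wraps to negative
    for limit in THRESHOLDS:
        years = limit / (HOURS_PER_DAY * 365)
        print(f"{limit} h = {years:.2f} years of uptime")
    # Assumption: a drive with 8086 power-on hours that keeps running 24/7
    for limit in THRESHOLDS:
        print(f"{limit} h reached in {days_until(limit, 8086):.0f} days")
```

At 24/7 operation, 32,768 hours correspond to about 3.7 years and 40,000 hours to about 4.6 years, so drives bought in the same batch tend to reach the limit at roughly the same time.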

One of our customers asked us for help with identifying the affected drives after noticing that some of their servers were affected. We have written a custom Icinga plugin that checks for affected drives and identifies where firmware updates are required. The only requirement is SNMP access to the servers or devices that need to be checked. The plugin lists all drives it finds, compares their models against a list of affected models, and compares their firmware version against the fixed version recommended by HPE.
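The comparison step can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the AFFECTED table, the Drive type and check_drive are hypothetical, and the single table entry is only inferred from the sample output in this post, where MO3200JFFCL with firmware HPD8 is reported as "firmware update applied". The real list of models and fixed versions has to come from the HPE bulletins.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical excerpt of an "affected models" table: model -> first fixed firmware.
# The single entry is inferred from this post's sample output, not from HPE data.
AFFECTED = {
    "MO3200JFFCL": "HPD8",
}


@dataclass
class Drive:
    model: str
    firmware: str


def check_drive(drive: Drive) -> Tuple[str, str]:
    """Return (state, message) for one drive found via SNMP."""
    fixed = AFFECTED.get(drive.model)
    if fixed is None:
        return "OK", "model not affected"
    # Assumption: firmware strings of one model compare correctly as plain
    # strings (e.g. "HPD2" < "HPD8"); this would break for e.g. "HPD10".
    if drive.firmware >= fixed:
        return "OK", "firmware update applied"
    return "CRITICAL", f"firmware {drive.firmware} affected, update to {fixed}"


if __name__ == "__main__":
    for d in (Drive("MO3200JFFCL", "HPD8"), Drive("MO3200JFFCL", "HPD2")):
        print(d.model, d.firmware, check_drive(d))
```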

When everything is fine, you should see something like this:

OK - All 2 controllers and 33 drives seem fine
[OK] controller (0) model=p816i-a serial=XXX firmware=1.65 - firmware older than affected
[OK] controller (4) model=p408e-p serial=XXX firmware=1.65 - firmware older than affected
[OK] (0.9 ) model=MO003200JWFWR serial=XXX firmware=HPD2 hours=8086
[OK] (0.11) model=EK000400GWEPE serial=XXX firmware=HPG0 hours=8086
[OK] (0.12) model=EK000400GWEPE serial=XXX firmware=HPG0 hours=8086
[OK] (0.14) model=MO003200JWFWR serial=XXX firmware=HPD2 hours=8086
[OK] (4.0 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.1 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.2 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.3 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.4 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
[OK] (4.5 ) model=MO3200JFFCL serial=XXX firmware=HPD8 hours=7568 - firmware update applied
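In the output above, the first line carries the overall result, followed by one line per controller and drive. Icinga derives the service state from the plugin's exit code, following the Monitoring Plugins conventions (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). A minimal sketch of how per-item states could be folded into that first line and the exit code is shown below; the function summarize and the wording used for the non-OK case are hypothetical, not taken from check_hp_firmware.

```python
from typing import List, Tuple

# Exit codes from the Monitoring Plugins guidelines, which Icinga uses
# to determine the service state.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def summarize(states: List[str], controllers: int, drives: int) -> Tuple[int, str]:
    """Pick the worst state and build the first line of the plugin output.

    Assumption: "worst" simply means the highest exit code here; some
    plugins rank CRITICAL above UNKNOWN instead.
    """
    worst = max(states, key=EXIT_CODES.__getitem__, default="OK")
    if worst == "OK":
        text = f"All {controllers} controllers and {drives} drives seem fine"
    else:
        bad = sum(1 for state in states if state != "OK")
        text = f"{bad} of {controllers + drives} items need attention"
    return EXIT_CODES[worst], f"{worst} - {text}"


if __name__ == "__main__":
    code, line = summarize(["OK"] * 35, controllers=2, drives=33)
    print(line)  # OK - All 2 controllers and 33 drives seem fine
    # A real plugin would print the per-item lines next and exit with `code`.
```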

You can find the plugin on GitHub as check_hp_firmware; its release page provides prebuilt binaries for Linux.

Feedback or questions are welcome as GitHub issues, directly in the project.

Please make sure you have also read the official documents from HPE:

Update 2020-04-09: The plugin was enhanced to check for controller firmware vulnerabilities as well, and is now named check_hp_firmware. See the new blog post.

Markus Frosch
Principal Consultant

Markus works at NETWAYS as a Principal Consultant and supports customers in implementing Nagios, Icinga and other open source systems management tools. Besides his professional work, Markus is an active contributor to the Debian project.