All of our Dell servers are managed via OMSA (OpenManage Server Administrator), so we can easily tackle most of the hardware issues that occur.
The aim of this blog post is to compile a list addressing the most common obstacles we have run into so far. The list is sorted by my personal preferences and is, of course, by no means complete. Please excuse my paint skills as well.
Please note: We're mostly running some kind of Linux on our servers; some solutions might work on Windows, too. Root access and the current OMSA version are assumed.
Most of the following issues can be resolved by scheduling a four-hour downtime and upgrading the kernel, BIOS and firmware; several reboots later, you may be greeted by the OMSA web interface. Yay. Of course, this is not a suitable approach if all you want is to generate an HDD report for warranty purposes.
Let's dive into the list:
Please ensure that all OMSA-related processes are running correctly. Simply ssh to your machine and run "srvadmin-services.sh status" (in our case located in /opt/dell/srvadmin/sbin).
The srvadmin-services script is a convenient tool to check the processes. It can also be used to restart the processes in the correct order.
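The status check and the restart can be wrapped in a small helper. This is only a sketch: the function name omsa_services is made up for illustration, and the default path is simply the one from our machines, so pass a different one if your install lives elsewhere.

```shell
# Hypothetical helper (not part of OMSA): run srvadmin-services.sh with the
# given action. The default path is the one mentioned above; pass another
# path as the second argument if your install differs.
omsa_services() {
    action="${1:-status}"
    script="${2:-/opt/dell/srvadmin/sbin/srvadmin-services.sh}"

    if [ ! -x "$script" ]; then
        echo "srvadmin-services.sh not found or not executable: $script" >&2
        return 2
    fi

    "$script" "$action"
}

# Usage (as root):
#   omsa_services status     # show the state of the OMSA services
#   omsa_services restart    # restart them in the correct order
```

Checking for the script first gives a clearer message than the shell's own "No such file or directory" when OMSA is installed in a different location.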
You were not using SSL. Please use "https" to connect to your server.
Error code: SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY
You are now using SSL and, presumably, Firefox, which refuses the connection because the server offers a Diffie-Hellman key it considers too weak. This is an issue everybody will be facing in the future. Just a quick workaround and not really a solution: Chrome works for me (as of Dec 27).
There will be a follow-up to this issue, very likely in a separate and more detailed post.
Browser type is not supported:
Simply wait or hit "Try Again".
If you're using IPMI as well, you may have configured different users for different tasks. OMSA works differently: by default you have to log in as "root", using the root password of the OS running on this server.
500 Internal Server Error, java.lang.NoClassDefFoundError:
Ah, this one was tricky. In our case, this issue boiled down to the sysadmin obsession: uptime.
We tried everything, from restarting the services by hand in various orders and reinstalling the binaries to looking for help in the support forums, with the expected outcomes.
During the long uptime of the machine, some old processes of the OMSA services kept running and could not be killed by srvadmin-services. By finding them with a simple "ps aux | grep dsm_" (some were /etc/init.d/ related), fearlessly killing them via "pkill -9 -f dsm_" and restarting via srvadmin-services, we could finally access the web interface again.
Other admins may run into different issues, with different solutions.
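The cleanup described above can be sketched as a small function. Treat it as an illustration under assumptions: the name kill_stale is hypothetical, and pgrep is used in place of "ps aux | grep" only because it does not list itself. Keep in mind that "pkill -f" matches the whole command line, so do not call this from a script or shell whose own command line contains "dsm_".

```shell
# Hypothetical helper: force-kill every process whose command line matches
# the pattern (default: dsm_, as in the post).
kill_stale() {
    pattern="${1:-dsm_}"

    # List the matching processes first; pgrep exits non-zero if none exist.
    pgrep -l -f "$pattern" || return 0

    # SIGKILL, as in the post. pkill also exits non-zero if nothing matched.
    pkill -9 -f "$pattern" || true
}

# Usage (as root), with the path from our machines:
#   kill_stale
#   /opt/dell/srvadmin/sbin/srvadmin-services.sh restart
```

Listing the processes before killing them leaves a record of what was removed, which helps if the restart still fails afterwards.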