This week I ran into an error message I hadn’t seen before while setting up a brand-new UCS environment using a couple of eBay-purchased fabric interconnects. The test lab environment consisted of two 6248UP FIs and a 5108 chassis populated with three half-width B200 blades.
The error messages I was coming across read “B: UNRESPONSIVE: INAPPLICABLE” and “B: memb state UNRESPONSIVE, lead state INAPPLICABLE, mgmt services state: INVALID heartbeat state PRIMARY_FAILED”.
This procedure is disruptive, and I trust you won’t attempt it in a production environment. (You’re unlikely to see this state in a stable production environment anyway, unless of course a firmware upgrade was made recently and wasn’t executed properly.)
My cluster IP address 10.100.1.210 wasn’t responding, so I had to SSH directly into fabric interconnect A’s IP address 10.100.1.208 and force it to become primary so that I could access it from a remote location. Until then, attempts to log in to UCSM simply failed with an error.
SSH directly to the responsive fabric interconnect and:
Test-Fab-A# connect local-mgmt
Test-Fab-A(local-mgmt)# cluster force primary
When I checked the cluster state using the show cluster extended-state command in local-mgmt mode, I saw an error message for fabric interconnect B. This happened after each reboot of fabric interconnect B, and the cluster wouldn’t form properly. Fabric interconnect B kept throwing the UNRESPONSIVE, INAPPLICABLE error message.
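For reference, the failing state looked roughly like this (hostname is from my lab, and the output is trimmed to the relevant lines):

```
Test-Fab-A# connect local-mgmt
Test-Fab-A(local-mgmt)# show cluster extended-state
...
B: memb state UNRESPONSIVE, lead state INAPPLICABLE,
   mgmt services state: INVALID
   heartbeat state PRIMARY_FAILED
...
```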
When consoled directly into the affected fabric interconnect, I found fabric interconnect B was stuck at the loader> prompt. One issue turned out to be that the kickstart file the fabric interconnect was trying to initialize from didn’t even exist in memory, hence the boot process getting stuck at the loader prompt. This was only one part of the problem.
The fix for me took two main steps. If the second step isn’t executed, the solution is only temporary and the problem reappears whenever the problematic fabric interconnect reboots.
Issuing a dir command at the loader> prompt lists files available in memory. In this case, look for a kickstart image and a system image.
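At the loader prompt that looks something like the following; the image filenames below are illustrative, and yours will carry whatever version your FI has on flash:

```
loader> dir

bootflash:
  ucs-6100-k9-kickstart.5.2.3.N2.2.21b.bin
  ucs-6100-k9-system.5.2.3.N2.2.21b.bin
```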
From the loader prompt, issue boot [name_of_KICKSTART_image_file]. This extracts and boots the kickstart image and, when it finishes, leaves you at the Switch(boot)# prompt.
While at the Switch(boot)# prompt, execute the command load [name_of_SYSTEM_image_file].
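Putting those two recovery steps together, the sequence looks like this (filenames are illustrative; use the ones your dir listing showed):

```
loader> boot ucs-6100-k9-kickstart.5.2.3.N2.2.21b.bin
  ... kickstart image extracts and boots ...
Switch(boot)# load ucs-6100-k9-system.5.2.3.N2.2.21b.bin
  ... system image loads and the FI finishes booting ...
```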
Now the fabric interconnect is fully booted with the system image and will be able to join the cluster (assuming you have the proper L1 and L2 cable connections in place and the images match those running on the primary fabric interconnect).
Verify the fabric interconnect is clustered in properly, then go to the UCSM page by browsing to the cluster IP address and login.
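A quick way to confirm the cluster is healthy before opening UCSM is show cluster state from local-mgmt; a healthy pair reports HA READY (hostname below is from my lab, and the output is abbreviated):

```
Test-Fab-A# connect local-mgmt
Test-Fab-A(local-mgmt)# show cluster state
...
A: UP, PRIMARY
B: UP, SUBORDINATE

HA READY
```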
In UCSM, select Equipment, open the Firmware Management > Installed Firmware tab, then click Activate Firmware. Here I was able to see a mismatch in the Running Version between the FIs, and the Startup Version for the affected fabric interconnect B showed an empty value.
From the drop-down menu of available packages, I selected the kernel and system images I wanted it to boot and made sure the Startup Versions of the kernel and system images on fabric interconnects A and B matched.
As soon as you click “Apply,” a dialog box warns that this procedure requires a reboot of the fabric interconnects and that you will lose access to UCS Manager during the process.
This activates the firmware you selected on both fabric interconnects and completes after a certain period of time, leaving both FIs running uniform versions.
Finally, I tested by rebooting first the whole fabric interconnect cluster and then only fabric interconnect B.
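To reboot just the affected fabric interconnect, you can do it from its own local-mgmt context; note this reboots the FI you are connected to, so make sure you’re on B (hostname below is from my lab):

```
Test-Fab-B# connect local-mgmt
Test-Fab-B(local-mgmt)# reboot
```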
The boot process was now able to proceed to extracting the proper kickstart and system images. This got rid of the error message, and fabric interconnect B joined the cluster without any issues.