Nutanix: AHV (Hypervisor) Upgrade Stuck on Nutanix CE Cluster
Hello everybody,
in my lab CE cluster we ran into the following problem while upgrading our AHV version from 2018.01.31 to 2018.05.01.
We started the upgrade from the upgrade wizard in Prism… so far so good,
but even after hours the AHV upgrade was still stuck..
After some genesis and cluster restarts, the upgrade failed with the error „Failed to revoke token from „x.x.x.x“, taken for reason…“
So we tried to run the AHV upgrade again from the „Upgrade Software“ wizard from Nutanix, but it got stuck again, this time on a different subtask..
We had to dig deeper and found out that one of our hosts was in maintenance mode.
To list all hosts, use the command „acli host.list“, or first run just „acli“ to enter the Acropolis CLI and then use the „host.list“ command there. (Quick tip: tab completion works in the Acropolis CLI 😉 )
The command „host.exit_maintenance_mode x.x.x.x“ should take the host out of maintenance mode, but not in our case 🙁
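As a rough sketch, the two acli calls from this step can be printed by a small dry-run helper. It only echoes the commands for a given host address and does not run them, because acli exists only on a CVM. The function name maintenance_cmds and the address 10.0.0.11 are made up for illustration.

```shell
#!/bin/sh
# Dry-run sketch: print the acli commands from this post for one host.
# maintenance_cmds is a hypothetical helper name, and 10.0.0.11 is a
# placeholder address, not a value from our cluster.
maintenance_cmds() {
    host="$1"
    # First list all hosts, then ask the given host to leave maintenance mode.
    echo "acli host.list"
    echo "acli host.exit_maintenance_mode $host"
}

maintenance_cmds 10.0.0.11
```

Copy the printed lines to a CVM shell once the address is the right one.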
We didn't find any solution on the web either, nobody seemed to have the same problem.. 🙁 So we decided to try a manual upgrade of the AHV version with a new USB stick image, the only way we had to upgrade our AHV version without data loss.
IMPORTANT: Our AOS version was already at 2018.05.01. Only upgrade your USB image manually when your AOS is already at the same version as the new image.
Because our CVMs are on the SSD storage, no data or configuration should be lost when we upgrade the AHV version manually. So I prepared three new USB sticks with the new 2018.05.01 version.
We shut down our first CVM with the command „shutdown -h now“ and checked the status on the AHV host with the command „virsh list“. After a while the CVM was stopped.
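Instead of re-running „virsh list“ by hand, the wait could be sketched as a small polling loop on the AHV host. This is only a sketch under one assumption: that the CVM's libvirt domain name contains "CVM" (check your own „virsh list“ output first). The helper name cvm_running is made up.

```shell
#!/bin/sh
# Sketch: wait until no running libvirt domain looks like a CVM.
# Assumption: the CVM's domain name contains "CVM"; adjust the pattern
# to whatever `virsh list` shows on your host.
cvm_running() {
    # Reads domain names on stdin, one per line; succeeds if any contains "CVM".
    grep -q "CVM"
}

# `virsh list --name` prints only the names of running domains.
# Guarded so the sketch does nothing on a machine without virsh.
if command -v virsh >/dev/null 2>&1; then
    while virsh list --name | cvm_running; do
        sleep 10
    done
    echo "no running CVM found"
fi
```

If virsh cannot reach libvirt at all, the loop also ends, so a quick manual „virsh list“ afterwards is still a good idea.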
After the CVM had stopped, we swapped the USB stick and started the Nutanix CE install process. At the install options step we chose the option „Repair Host (All data preserved)“.
A few minutes later our new AHV was finished and the „old“ CVM was running again. After a check, none of the hosts were in maintenance mode any longer and all were up to date. 🙂
But wait.. our shitty task was still in the Recent Tasks list.. 🙁
To delete this task, follow the KB 1217 article from Nutanix.
On a CVM, the command „progress_monitor_cli -fetchall | egrep "entity_id|entity_type|operation"“ lists only the information that you need to delete the tasks.
With the command „progress_monitor_cli --entity_id="6" --entity_type=node --operation=upgrade_hypervisor -delete“ I deleted every single task. (Quick tip: write the operation and the entity type in lowercase only.)
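If there are several stuck tasks, the delete command can be assembled by a small helper. The following is a minimal dry-run sketch: delete_cmd is a made-up name, it only prints the command from this post with the values filled in, and the uppercase inputs in the example are invented spellings just to show the lowercasing from the quick tip.

```shell
#!/bin/sh
# Dry-run sketch: build the delete command for one stuck task.
# delete_cmd is a hypothetical helper; it prints the command, it does not run it.
delete_cmd() {
    entity_id="$1"
    # Lowercase the entity type and the operation, as the quick tip recommends.
    entity_type=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    operation=$(printf '%s' "$3" | tr '[:upper:]' '[:lower:]')
    echo "progress_monitor_cli --entity_id=\"$entity_id\" --entity_type=$entity_type --operation=$operation -delete"
}

# Example with the values from this post (the uppercase spelling is illustrative only).
delete_cmd 6 NODE UPGRADE_HYPERVISOR
```

Check each printed line against the fetchall output before pasting it into a CVM shell.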
And done! We hope this post will help some of you 😉
Greetz diekolbs