HA is natively built into AHV to ensure the availability of guest VMs in the event of a node outage.
There are two VM HA modes: Default and Guaranteed.
With Default, no configuration is needed; it is on automatically out of the box. When an AHV node becomes unavailable, the VMs that were running on the failed node will restart on the remaining hosts in the cluster, as long as sufficient resources are available. If resources are not available, not all VMs will restart.
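The best-effort behavior above can be sketched roughly as follows. This is a simplified illustration, not Acropolis code: the host/VM structures, the memory-only placement check, and the "most free memory" choice are all assumptions for the sake of the example.

```python
# Hypothetical sketch of Default (best-effort) HA restarts: place each VM
# from the failed node on a surviving host with enough free memory; any VM
# that does not fit anywhere simply stays powered off.

def best_effort_restart(failed_vms, hosts):
    """failed_vms: list of (vm_name, mem_gib); hosts: dict host_name -> free GiB."""
    restarted, left_off = [], []
    for vm, mem in failed_vms:
        # Pick the surviving host with the most free memory (illustrative policy)
        host = max(hosts, key=hosts.get)
        if hosts[host] >= mem:
            hosts[host] -= mem
            restarted.append((vm, host))
        else:
            left_off.append(vm)  # insufficient resources: VM stays down

    return restarted, left_off

up, down = best_effort_restart([("vm1", 16), ("vm2", 64)],
                               {"hostA": 32, "hostB": 24})
# vm1 fits on hostA; vm2 needs 64 GiB, fits nowhere, and stays powered off
```

The key point the sketch captures is that Default mode makes no reservation up front, so whether a given VM restarts depends entirely on what happens to be free after the failure.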
Guaranteed mode must be enabled from Prism: select the settings cog, then Manage VM High Availability.
This configuration option reserves space throughout the nodes in a cluster to guarantee that all VMs on a node will be able to restart in the event of a node failure.
The Acropolis Master is responsible for keeping track of node health by monitoring the connections to libvirt on all hosts within the cluster. The Acropolis Master is also responsible for restarting VMs on healthy hosts in an HA event. The following diagram, courtesy of NutanixBible.com, explains HA host monitoring.
As you would expect, if the Acropolis Master is impacted, a new Acropolis Master will be elected on the remaining nodes in the cluster.
After reading the above, you are probably wondering how AHV calculates the amount of resources it needs to reserve in a Guaranteed configuration. The amount is dependent on the Replication Factor in use: one node's worth of resources is reserved if all containers are set to RF2, and two nodes' worth is reserved if ANY container is set to RF3. If a cluster has nodes with different memory capacities, AHV will automatically use the node with the largest capacity when making the calculation.
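The reservation rule above can be expressed as a short calculation. This is a hedged sketch of the logic as described in this post, not Nutanix's actual implementation; the function names and the GiB inputs are illustrative.

```python
# Illustrative sketch of the Guaranteed HA reservation rule: tolerate one
# node failure when every container is RF2, two when any container is RF3,
# sizing the reservation against the largest-capacity node(s).

def node_failures_to_tolerate(container_rfs):
    """All containers RF2 -> reserve 1 node's worth; any RF3 -> reserve 2."""
    return 2 if any(rf == 3 for rf in container_rfs) else 1

def reserved_memory_gib(node_memory_gib, container_rfs):
    """Total memory reserved across the cluster, per the rule above."""
    failures = node_failures_to_tolerate(container_rfs)
    # Size the reservation against the largest node(s) in a mixed cluster
    largest_first = sorted(node_memory_gib, reverse=True)
    return sum(largest_first[:failures])

# Mixed cluster of 256, 192 and 128 GiB nodes, all containers at RF2:
print(reserved_memory_gib([256, 192, 128], [2, 2]))  # 256
# The same cluster with one RF3 container reserves two nodes' worth:
print(reserved_memory_gib([256, 192, 128], [2, 3]))  # 448
```

Note how the mixed-capacity case works: even though the smallest node only holds 128 GiB of VMs, the reservation is sized against the 256 GiB node, so a failure of any node, including the largest, can be absorbed.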
Nutanix keeps simplicity at the forefront, and VM HA within AHV is no different.