Wednesday, April 11, 2012

When you can't afford host-based HA failover and use percentage-based HA failover

Issue - When a host in the cluster was put into maintenance mode, its VMs were not all migrated off automatically; only a few VMs moved off the host.
Action - It was a 2-node cluster. 50% of the cluster's resources were reserved for HA failover, which effectively reserves 1 of the 2 hosts. When one of the two hosts is put into maintenance mode, HA can no longer satisfy its rule of keeping 50% of the cluster's resources in reserve, because only one host remains powered on. On top of that, HA admission control was enabled, which blocks any VM power-on (and therefore any migration) that would violate the reservation.
Disabling admission control (and HA) let the VMs migrate automatically to the other host when one host was put into maintenance mode.
Recommendation: In a 2-node cluster, never enable HA admission control, especially if you are using percentage-based HA failover instead of host-based failover.
In a cluster with more than 3 hosts, it is advisable to use host-based HA failover rather than percentage-based failover. Wherever percentage-based failover is used for lack of resources for host-based HA failover, admission control should be disabled if you want all of your VMs to be powered on regardless of resource availability.
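The failure above comes down to simple arithmetic. Here is a quick shell illustration of the admission-control math for this 2-node case (an illustration only, not an actual VMware command; the variable names are mine):

```shell
# Illustration only: percentage-based admission control math for a 2-node cluster.
total_hosts=2
reserved_pct=50          # 50% of cluster capacity reserved for HA failover

# Capacity left when one host enters maintenance mode:
available_pct=$(( (total_hosts - 1) * 100 / total_hosts ))   # 50

# Admission control needs unreserved capacity left over; 50% available minus
# 50% reserved leaves nothing, so power-ons (and the migrations that need
# them) are blocked.
if [ "$available_pct" -gt "$reserved_pct" ]; then
    echo "power-ons allowed"
else
    echo "power-ons blocked"   # this branch runs: 50 is not greater than 50
fi
```

With 4 hosts and a 25% reservation the same math leaves 75% available after one host goes into maintenance mode, which is why larger clusters don't hit this wall.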

Upgrading your VMware storage? Hold on and read this first

The best practice for removing a LUN or datastore:
1. Make sure no data on the LUN/datastore is in use: migrate the running VMs to storage that is not being upgraded (or power them off), remove any powered-off VMs from the inventory, and make sure no ISO file on it is mounted to any VM or host.
2. Unpresent the LUN from the hosts - either remove it on the VMware side or, better, unpresent it to the ESXi hosts from the storage array's console.
3. Right-click the cluster and rescan for datastores (if the LUN was presented to hosts outside the cluster, right-click the datacenter and rescan there instead).
Failing to unpresent a LUN or datastore properly can freeze the host at some point later: the host keeps retrying the dead paths in an endless loop because it can no longer reach the LUN (an all-paths-down condition).
Suggestions are welcome.
VMware now has a KB article on this: KB 2004605.
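On ESXi 5.x, the per-host part of the procedure above can also be done from the shell (a sketch along the lines of what KB 2004605 describes; the datastore label and naa device ID below are placeholders for your own):

```shell
# List mounted VMFS volumes to find the datastore's label and backing device.
esxcli storage filesystem list

# Unmount the datastore from this host (placeholder label "old_lun01"):
esxcli storage filesystem unmount -l old_lun01

# Detach the backing device so the host stops probing its paths
# (placeholder device ID):
esxcli storage core device set --state=off -d naa.60060160a0b01200c0ffee00

# After the LUN has been unpresented on the array, rescan the adapters:
esxcli storage core adapter rescan --all
```

Repeat the unmount/detach on every host that sees the LUN before unpresenting it on the array side.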

Tuesday, April 3, 2012

Upgrading ESXi 5.0 to 5.0 Update 1 fails with Update Manager or esxcli

Issue: VMware Update Manager fails to remediate 2 of 3 ESXi 5.0 hosts with the error "VMware vSphere Update Manager had an unknown error. Check the events and log files for details."
All 3 hosts were at build 474610, and only one was successfully updated to build 623860.
Update Manager 5.0.08039.
What did not work:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1017377
The user was already in the dbo schema.
http://communities.vmware.com/thread/282048
Tried migrating all the VMs off the host and remediating after staging, but that failed too.
http://communities.vmware.com/message/1984212
We could browse the syslogs directory and its contents from the shell.
Detached the baselines and reattached them - no go.
Recreated the baselines - no go.
Scanning fails with the same error.
Downloaded the offline bundle for the upgrade and tried updating with esxcli, but it errored out:
esxcli software vib update --depot /vmfs/volumes/syslogs/update-from-esxi5.0-5.0_update01.zip
Got no data from process /usr/lib/vmware/esxcli-software vib.install --updateonly   -d "/vmfs/volumes/syslogs/update-from-esxi5.0-5.0_update01.zip"
Resolution: Configured the scratch partition manually and tried installing the patches again - this time it worked fine.
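For reference, setting the scratch location manually from the host shell looks roughly like this (a sketch; the datastore and directory names are placeholders, and the change takes effect only after a reboot):

```shell
# Create a persistent scratch directory on a VMFS datastore
# (placeholder datastore and directory names):
mkdir -p /vmfs/volumes/datastore1/.locker-esx01

# Point the host's ScratchConfig.ConfiguredScratchLocation advanced
# option at it:
vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string \
    /vmfs/volumes/datastore1/.locker-esx01

# Reboot the host for the new scratch location to take effect.
```

Without a persistent scratch partition (common on hosts booted from USB/SD), ESXi keeps scratch on a ramdisk, which is a known cause of patch staging failures like the one above.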