Monday, November 24, 2014

2/3 Node Cluster and HA Admission Control

Everybody knows how much we love VMware HA. It is one of VMware's key selling points and an awesome failover feature for reducing unplanned downtime. There is, however, something that many VMware users overlook or give too little importance to: the admission control settings. First, let us see what the heck this is (I know you know it, but some don't, so skip a paragraph if you want).
Admission control, as the name suggests, controls the admission of VMs onto a host when one of its co-hosts goes down unexpectedly and all of that host's VMs are being restarted on other hosts. To put it simply, suppose bus 'A' (HOST A) and bus 'B' (HOST B) are going from X to Y. Each can accommodate 52 passengers (VMs), but each is carrying only 30. Let us say bus B (HOST B) breaks down and its passengers (VMs) want to travel on bus A instead. If the company policy (admission control) is to not allow such a movement (admission control left at the default or enabled), then these 30 passengers will be stranded right there until bus B is repaired (HOST B is repaired and brought up). If the company policy okays such an adjustment (admission control disabled), then not all passengers will travel comfortably, but they will all travel from X to Y (not all VMs will get all the resources promised, but they will be powered on).
Now, if you have a 2-node cluster and you have somehow left admission control at the default or enabled it, think again. The default HA setting is to tolerate 1 host failure (in other words, always reserve enough resources to accommodate all the VMs of 1 host). But if 1 of the 2 hosts fails, only 1 host is left running, and there is no other host on which HA can reserve resources. Admission control will also stop the failed host's VMs from being powered on on the surviving host if doing so violates the HA rules, and most probably it will. So please test failover on your 2-node cluster with the default HA settings (the test will most probably fail). You have 2 solutions.
1) Disable admission control
2) Add a 3rd host
I believe the 1st option is much more sensible than the 2nd. Please let me know whether you agree or differ, but before you do, please check out this VMware KB article (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1007006). I know it talks about DPM too, but the article mostly holds true even when you aren't using DPM or even DRS. I have always advised my clients against enabling admission control on a 2-node cluster or on a cluster with a resource crunch. You may differ,
but if you haven't tested your cluster then yes, you will suffer. [I like rhymes ;) ]
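
The 2-node versus 3-node argument above can be put into rough numbers. The sketch below is only back-of-the-envelope arithmetic, assuming identical hosts and a single capacity figure per host; it is not how vSphere HA actually computes slots or reservations, and the function names are made up for illustration. It reuses the bus numbers: 52 seats per host and 60 VMs in total.

```python
def reserved_fraction(hosts: int, failures_to_tolerate: int = 1) -> float:
    """Share of total cluster capacity that has to stay free so the VMs of
    the failed host(s) can restart elsewhere. Assumes identical hosts."""
    if hosts < 1 or not 0 <= failures_to_tolerate <= hosts:
        raise ValueError("need hosts >= 1 and 0 <= failures_to_tolerate <= hosts")
    return failures_to_tolerate / hosts


def usable_capacity(hosts: int, per_host_capacity: int,
                    failures_to_tolerate: int = 1) -> int:
    """Capacity left on the surviving hosts after the tolerated failures."""
    return (hosts - failures_to_tolerate) * per_host_capacity


def fits_after_failure(hosts: int, per_host_capacity: int, total_demand: int,
                       failures_to_tolerate: int = 1) -> bool:
    """True if every VM still gets its full share on the surviving hosts."""
    surviving = usable_capacity(hosts, per_host_capacity, failures_to_tolerate)
    return total_demand <= surviving


if __name__ == "__main__":
    # Hypothetical figures taken from the bus analogy: 52 seats, 60 passengers.
    for n in (2, 3):
        print(f"{n} hosts: reserve {reserved_fraction(n):.0%} of the cluster, "
              f"everything fits after one failure: {fits_after_failure(n, 52, 60)}")
```

With 2 hosts, half of the cluster has to sit idle to honour a 1-host-failure policy, and the 60 VMs do not fit into the 52 seats that survive. With 3 hosts the reserve drops to a third and the same load fits.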

Not enough resources to failover this virtual machine. vSphere HA will retry when resources become available.
warning
1/12/2011 7:11:08 AM
EXCHANGE