Thursday, June 20, 2013

Your VMware vswitch redundancy may not be so failproof

Incident: A host with 4 vmnics recently lost its network connection because of a vmnic failure. You might be wondering how all the vmnics could fail at the same time.
Design flaw: vswitch0 had 4 vmnics in it (vmnicA0, A1, A3, A4), but it still failed because these vmnics were not separate entities; they were 4 ports of a single physical NIC. It was a quad-port NIC, and the firmware on the PCI NIC crashed momentarily. Even though the outage lasted less than a minute, it was still an issue.
Design consideration:
Let us say the host has 2 quad-port NICs, NIC A and NIC B.
Make sure every vswitch is made up of vmnics from both NIC A and NIC B.
Example:
vswitch0=vmnicA0+vmnicB0
Pros: this avoids 3 single points of failure:
a) hardware failure of a single multi-port PCI NIC,
b) failure/crashing of the firmware of the multi-port NIC,
c) failure/crashing of the driver of the multi-port NIC.
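
The rule above can be turned into a quick check. Below is a minimal sketch in Python, assuming uplinks are named the way this post names them (vmnic, then a card letter, then a port number, e.g. vmnicA0). Real ESXi hosts number their uplinks differently (vmnic0, vmnic1, ...), so on a real host the uplink-to-card mapping would have to come from the hardware inventory. The function names card_of and single_card_vswitches are made up for this illustration.

```python
import re

# Assumption: uplinks follow this post's illustrative naming,
# "vmnic" + card letter + port number (e.g. vmnicA0, vmnicB3).
_UPLINK_NAME = re.compile(r"^vmnic([A-Z])(\d+)$")


def card_of(uplink):
    """Return the card letter of an uplink named like vmnicA0 (hypothetical helper)."""
    match = _UPLINK_NAME.match(uplink)
    if match is None:
        raise ValueError(f"unexpected uplink name: {uplink!r}")
    return match.group(1)


def single_card_vswitches(vswitches):
    """Return the sorted names of vswitches whose uplinks span fewer than two cards.

    vswitches maps a vswitch name to the list of its uplink names.
    A vswitch with no uplinks at all is flagged as well.
    """
    flagged = []
    for name, uplinks in vswitches.items():
        cards = {card_of(uplink) for uplink in uplinks}
        if len(cards) < 2:
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__":
    layout = {
        # The layout from the incident: four ports, all on card A.
        "vswitch0": ["vmnicA0", "vmnicA1", "vmnicA3", "vmnicA4"],
        # The recommended layout: one port from each card.
        "vswitch1": ["vmnicA0", "vmnicB0"],
    }
    print(single_card_vswitches(layout))
```

With this sample layout only vswitch0 is flagged: it has four uplinks, but all of them sit on card A. Counting distinct cards rather than uplinks is the point of the check, since a vswitch can look redundant by uplink count and still depend on one piece of hardware.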