Wednesday, June 19, 2013

Intermittent network drops/disconnections in a VMware environment

It is hard to identify what is wrong when your VMs or host have an intermittent network issue. Here are the notes I made during a recent encounter with this kind of problem.
Assuming vSwitch0 has 2 NICs (vmnic0, vmnic1):
Start a continuous ping to a test VM on the problematic host.
Set vmnic0 as active and vmnic1 as unused, and check for network drops.
Set vmnic1 as active and vmnic0 as unused, and check for network drops.
Let us assume the packet drops occurred with vmnic1 active.
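To compare the two runs it helps to put a number on the drops. Below is a minimal Python sketch, assuming you saved the summary line that a Linux-style ping prints when it stops. The function name and the sample summary strings are hypothetical, made up for illustration, and not output from a real host.

```python
import re

def packet_loss_percent(ping_summary: str) -> float:
    """Extract the packet-loss percentage from a ping summary line."""
    match = re.search(r"(\d+(?:\.\d+)?)% packet loss", ping_summary)
    if match is None:
        raise ValueError("no packet-loss figure found in ping output")
    return float(match.group(1))

# Hypothetical summaries, one captured while each uplink was the only active one.
runs = {
    "vmnic0": "100 packets transmitted, 100 received, 0% packet loss, time 99123ms",
    "vmnic1": "100 packets transmitted, 93 received, 7% packet loss, time 99210ms",
}

loss = {nic: packet_loss_percent(summary) for nic, summary in runs.items()}
suspect_uplinks = [nic for nic, pct in loss.items() if pct > 0]
print(loss)
print(suspect_uplinks)
```

Since the problem is intermittent, let each ping run long enough to catch it before comparing the figures.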
Assume that:
vmnic1 is connected to switchport1
vmnic0 is connected to switchport0
Swap those connections:
vmnic1 will now connect to switchport0
vmnic0 will now connect to switchport1
Then check whether you still see packet drops on the same vmnic1.
If yes, then either the cable or vmnic1 itself is probably faulty.
[You can isolate this by replacing the cable with a known good one]
If no, and the packets are now dropping on vmnic0, then either switchport1 or the cable is probably faulty.
[You can isolate this by replacing the cable with a known good one]
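The swap test above boils down to a small decision table. Here is a rough Python sketch of it. The function name is hypothetical, and treating any outcome not covered in the notes as inconclusive is my assumption.

```python
def suspects_after_swap(drops_on_vmnic1: bool, drops_on_vmnic0: bool) -> list:
    """Map the post-swap observations to the components worth checking next.

    Before the swap, drops were seen only with vmnic1 active (on switchport1).
    After the swap, vmnic1 uses switchport0 and vmnic0 uses switchport1.
    """
    if drops_on_vmnic1 and not drops_on_vmnic0:
        # The fault stayed with the NIC.
        return ["vmnic1", "cable"]
    if drops_on_vmnic0 and not drops_on_vmnic1:
        # The fault followed the switch port.
        return ["switchport1", "cable"]
    # Assumption: outcomes the notes do not cover are treated as inconclusive.
    return []

print(suspects_after_swap(drops_on_vmnic1=True, drops_on_vmnic0=False))
print(suspects_after_swap(drops_on_vmnic1=False, drops_on_vmnic0=True))
```

In both cases the cable stays on the list until it has been replaced with a known good one.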

Extended scenario:
If you have 4 vmnics, divide them into two groups of 2 [GroupA = vmnic0, vmnic1; GroupB = vmnic2, vmnic3].
Once you identify which group shows the packet drops, repeat the process for the vmnics inside that group.
For example, if packets drop when GroupB is active, set GroupA and vmnic2 as unused and vmnic3 as active, and check for network drops.
Then repeat the test the other way round: GroupA and vmnic3 unused, vmnic2 active.
If the drops are on vmnic2, keep only vmnic2 active and the others unused, then swap the cable and the switch port one at a time to rule each of them in or out as the faulty component.
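The group-by-group narrowing can be sketched as a simple halving search. This is only an illustration in Python, assuming a single faulty path. The name `find_dropping_vmnic` and the `drops_when_active` callback are hypothetical. In real life the callback is you, setting the listed uplinks active and all the others unused, then watching the ping.

```python
def find_dropping_vmnic(vmnics, drops_when_active):
    """Halve the uplink list until one vmnic with drops is left.

    drops_when_active(group) must return True if packets drop while only
    the uplinks in `group` are active and every other uplink is unused.
    Assumes at least one uplink is already known to drop packets.
    """
    candidates = list(vmnics)
    while len(candidates) > 1:
        half = len(candidates) // 2
        group_a, group_b = candidates[:half], candidates[half:]
        if drops_when_active(group_a):
            candidates = group_a
        elif drops_when_active(group_b):
            candidates = group_b
        else:
            return None  # neither group showed drops: inconclusive
    return candidates[0] if candidates else None

# Stand-in for the manual test: pretend vmnic2 is the bad uplink.
simulated_check = lambda active: "vmnic2" in active

culprit = find_dropping_vmnic(["vmnic0", "vmnic1", "vmnic2", "vmnic3"], simulated_check)
print(culprit)
```

Once a single vmnic is left, the cable and switch port swaps still have to be done by hand, one at a time.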

Before you do all this, make sure the drivers and firmware on your I/O devices are up to date for your VMware host OS version.