Monday, December 17, 2012

ESX 4.1 host disconnected

Issue: both hosts in the cluster are in a disconnected state.
What didn't work:
Updated the IP address of the vCenter server under VC > Administration > Runtime Settings.
Removed the hosts and added them back to the vCenter server, but no go.
Opened the iLO for another host and renamed the host in lowercase.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003735
and made sure the entries in these files were correct:
/etc/hosts
/etc/resolv.conf
/etc/sysconfig/network
/etc/vmware/esx.conf
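A quick way to eyeball those entries is to grep each file for the host name or IP. This is only a rough sketch, not something from the KB: check_host_files is a made-up helper name and esx01 is a placeholder host name.

```shell
# Hypothetical helper (not from the KB): show the non-comment lines in each
# file that mention a given host name or IP address.
# Usage: check_host_files NAME_OR_IP FILE...
check_host_files() {
    needle=$1
    shift
    for f in "$@"; do
        echo "== $f"
        # -i: ignore case (host names were renamed to lowercase above)
        # -F: treat the needle as plain text, so dots in an IP are literal
        grep -i -F -- "$needle" "$f" 2>/dev/null \
            | grep -v '^[[:space:]]*#' \
            || echo "(no match)"
    done
}
```

For example: check_host_files esx01 /etc/hosts /etc/resolv.conf /etc/sysconfig/network /etc/vmware/esx.conf. A file that prints "(no match)", or prints the old name or IP, is worth a closer look.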
Reinstalled the management agents and restarted the vCenter service on the host, but only one host stayed connected; the other kept getting disconnected after 1 or 2 minutes.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003714
On the non-working host, tried:
service mgmt-vmware stop && service vmware-vpxa stop && service vmware-vmkauthd stop && service xinetd restart && rpm -qa | grep -i vpxa | awk '{print $1}' | xargs rpm -ef $1 && userdel vpxuser && rpm -qa | grep -i aam | awk '{print $1}' | xargs rpm -ef $1 && service mgmt-vmware start && service vmware-vmkauthd start
Made sure the services were started again and tried to connect, but no go.
nslookup works from the host to the other hosts and the vCenter server, and vice versa;
reverse nslookup works too.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001493
Updated the /etc/opt/vmware/vpxa/vpxa.cfg file with the correct vCenter IP address, but no go.
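To confirm what the agent config actually holds, the address can be pulled out with sed. A minimal sketch, assuming the vCenter address sits in a <serverIp> element (that is how I read the KB above; check your own file). vpxa_server_ip is a made-up helper name.

```shell
# Hypothetical helper: print the vCenter address recorded in vpxa.cfg.
# Assumption: the address is stored as <serverIp>a.b.c.d</serverIp>.
# Usage: vpxa_server_ip [PATH_TO_VPXA_CFG]
vpxa_server_ip() {
    cfg=${1:-/etc/opt/vmware/vpxa/vpxa.cfg}
    # Print only the text between the opening and closing tags.
    sed -n 's:.*<serverIp>\([^<]*\)</serverIp>.*:\1:p' "$cfg"
}
```

Running vpxa_server_ip on each host and comparing the output with the real vCenter IP shows at a glance whether a host still points at an old address.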
http://kb.vmware.com/selfservice/microsites/microsite.do?cmd=displayKC&docType=kc&externalId=1012382&sliceId=1&docTypeID=DT_KB_1_1
Checked the port requirements for the host from the vCenter server using telnet:
443, 902, 80 and 5989 were open, but 623 for DPM wasn't.
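The same check can be scripted from any box that has bash, instead of one telnet per port. This is a rough sketch, not what I ran: port_state and check_ports are made-up names, esx02.example.local is a placeholder, and a filtered (silently dropped) port may hang until the TCP connect times out. Like telnet, it only tests TCP; as far as I know the host-to-vCenter heartbeat uses UDP 902, which a check like this will not catch.

```shell
# Hypothetical helper: report whether a TCP port accepts connections.
# Uses bash's built-in /dev/tcp redirection in a child bash, so the
# descriptor is closed again as soon as the child exits.
# Usage: port_state HOST PORT
port_state() {
    if bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null; then
        echo open
    else
        echo closed
    fi
}

# Hypothetical helper: check several TCP ports on one host.
# Usage: check_ports HOST PORT...
check_ports() {
    host=$1
    shift
    for port in "$@"; do
        echo "$port $(port_state "$host" "$port")"
    done
}
```

For example: check_ports esx02.example.local 443 902 80 5989 623 prints one line per port with "open" or "closed".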
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1005757
Added the following heartbeat setting:
<heartbeat>
<notRespondingTimeout>120</notRespondingTimeout>
</heartbeat>
but no go.

What worked / what I forgot to check: noticed an issue with the Managed IP Address in vCenter Settings, as one of the hosts' IP addresses was entered there.
Changed it to the vCenter IP.
It finally turned out to be an issue with the Windows firewall. Disabling the firewall reconnected all the hosts.
Hosts stayed connected after that. Worked on the DNS pointers and fixed the DNS issues, as the IP address of one of the hosts had been changed.