Thursday, January 5, 2012

ESX 4.1 CPU utilization 0, slow vMotion and snapshots

Issue: 3 of our many ESX 4.1 hosts (HP BL465c + 2 x DL785) show a CPU utilization of 0 in vCenter, and both snapshots and vMotion on them are very slow.
Symptoms:
1. The CPU utilization of the affected hosts in vCenter is 0 all the time. While it reads 0, vMotion and snapshots are very slow, and the backups fail because they depend on the snapshots.
2. If we restart the management agents on these hosts, the hosts become disconnected. When we reconnect them (right-click the host > Connect), the CPU utilization shows normally after a minute or two, only to drop back to 0 after about 30 minutes.
3. Manually disconnecting or connecting the host in vCenter (right-click the host > Connect/Disconnect) takes a very long time.
Resolution: Restart the management agents:
service mgmt-vmware restart
service vmware-vpxa restart
This makes the host disconnect from vCenter quickly. (Restarting the management agents does not normally disconnect a host from the vCenter Server; here it does because the agents are faulty.)
Then uninstall the vCenter agent and the HA agent:
[root@esx-server /]# rpm -qa | grep vpxa
[root@esx-server /]# rpm -e VMware-vpxa-x.x.0-xxxxx
[root@esx-server /]# rpm -qa | grep -i aam
[root@esx-server /]# rpm -e VMware-aam-vcint-#.#.#-#
[root@esx-server /]# rpm -e VMware-aam-haa-#.#.#-#
Now right-click the host and choose Connect to connect it back to the vCenter Server.
Once the host is connected again, the CPU utilization is back to normal, and vMotion and snapshots run at their normal speed.
Repeat the same steps on every affected host that shows a CPU utilization of 0.
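
The uninstall steps can also be scripted. The following is only a minimal sketch and was not part of the original fix: the function names match_pkgs and plan_agent_removal are made up for illustration, and the script just prints the rpm -e commands instead of running them, so the package names can be checked first.

```shell
# List packages whose name starts with the given prefix.
# Reads a package list (as printed by `rpm -qa`) on stdin.
match_pkgs() {
    grep -i "^$1" || true
}

# Print the rpm -e commands in the same order as the manual steps:
# the vCenter agent first, then the two HA (aam) packages.
# Nothing is removed here; the commands are only printed.
plan_agent_removal() {
    pkg_list=$(cat)
    for prefix in VMware-vpxa VMware-aam-vcint VMware-aam-haa; do
        printf '%s\n' "$pkg_list" | match_pkgs "$prefix" | while read -r pkg; do
            echo "rpm -e $pkg"
        done
    done
}
```

On an affected host, running "rpm -qa | plan_agent_removal" prints the commands to review before running them by hand.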


Monday, January 2, 2012

ESXi 5 host loses LUN connectivity after reboot

Issue: Every time the host is rebooted, it loses its datastore connections and does not see any LUNs.
Other similar servers running ESXi 5.0 do not have this issue, even though all of these hosts were upgraded at the same time with the same software.

Symptoms: To get the datastores back, we have to uncheck "Override switch failover order" in the NIC teaming settings of the iSCSI port groups, check it again, and then rescan for datastores. Only then does the host see the datastores.

Resolution: Changed the NIC teaming policy on both iSCSI port groups to keep one NIC as standby instead of unused. Rebooted the host. Issue resolved.
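
The same teaming change can probably be made from the command line instead of the vSphere Client. This is a minimal sketch, assuming the port groups sit on standard vSwitches. The helper name failover_cmd is made up, and the port group and vmnic names in the usage line are placeholders. It only prints the esxcli command, so nothing changes on the host until the command is run by hand.

```shell
# Build (but do not run) the esxcli command that sets one active and one
# standby uplink on a port group.
# $1 = port group name, $2 = active uplink, $3 = standby uplink
failover_cmd() {
    echo "esxcli network vswitch standard portgroup policy failover set --portgroup-name=$1 --active-uplinks=$2 --standby-uplinks=$3"
}
```

For example, "failover_cmd iSCSI-1 vmnic2 vmnic3" prints the command for a port group named iSCSI-1 with vmnic2 active and vmnic3 on standby; repeat it with the NICs swapped for the second port group.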
More detailed explanation here: