Thursday, January 5, 2012

ESX 4.1 CPU utilization 0, slow vMotion, slow snapshots

Issue:- 3 of our many ESX 4.1 hosts (one HP BL465c and two DL785) show a CPU utilization of 0 in vCenter, and snapshots and vMotion on them are very slow.
Symptoms:- 1. The CPU utilization of the affected hosts in vCenter is 0 all the time. While it reads 0, vMotion and snapshots are very slow, so the backups fail because the snapshots are slow.
2. If we restart the management agents on these hosts, the hosts become disconnected. When we reconnect them (right-click the host > Connect), the CPU utilization returns to normal after a minute or two, only to drop back to 0 after about 30 minutes.
3. Manually disconnecting or connecting the host in vCenter Server (right-click the host > Connect/Disconnect) takes a very long time.
Resolution: Restart the management agents
service mgmt-vmware restart
service vmware-vpxa restart
so that the host becomes disconnected quickly. (Restarting the management agents doesn't usually disconnect a host from the vCenter Server, but in this case it does because the management agents are faulty.)
Uninstall the vCenter agent and the HA agent:
[root@esx-server /]# rpm -qa | grep vpxa
VMware-vpxa-x.x.0-xxxxx
[root@esx-server /]# rpm -e VMware-vpxa-x.x.0-xxxxx
[root@esx-server /]# rpm -qa | grep -i aam
VMware-aam-haa-#.#.#-# 
VMware-aam-vcint-#.#.#-# 
[root@esx-server /]# rpm -e VMware-aam-vcint-#.#.#-#
[root@esx-server /]# rpm -e VMware-aam-haa-#.#.#-#
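Since the exact package versions differ from host to host, the uninstall steps above can be sketched as a small dry-run helper. This is only a sketch: the function name print_agent_removals is hypothetical, and it assumes the package names start with VMware-vpxa-, VMware-aam-vcint- and VMware-aam-haa- as in the output shown above. It only prints the rpm -e commands and removes nothing itself.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): reads `rpm -qa` output on stdin and
# prints the `rpm -e` commands for the vCenter agent (vpxa) and the HA
# agent (aam) packages, in the same order as the manual steps above.
# Nothing is removed by this function; it only prints commands.
print_agent_removals() {
    pkgs=$(cat)
    for pattern in '^VMware-vpxa-' '^VMware-aam-vcint-' '^VMware-aam-haa-'; do
        printf '%s\n' "$pkgs" | grep -i -- "$pattern" | while read -r pkg; do
            printf 'rpm -e %s\n' "$pkg"
        done
    done
}

# On the ESX service console, review the output before running any of it:
#   rpm -qa | print_agent_removals
```

Printing the commands first makes it easy to check that only the agent packages are matched before anything is actually uninstalled.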
Now right-click the host and connect it back to the vCenter Server; vCenter reinstalls its agent on the host as part of reconnecting.
The CPU utilization will be back to normal,
vMotion will run at its normal speed, and
snapshots will also complete at their normal speed.
Repeat the same steps on every affected host whose CPU utilization shows 0.
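When several hosts are affected, the agent restart from the Resolution can be prepared for all of them in one go. The sketch below is a dry run that only prints one ssh command per host. It assumes root SSH access to the service console is enabled on the hosts (an assumption, not something covered above), and the function name and host names are placeholders. The uninstall and the reconnect in vCenter still have to be done per host as described.

```shell
#!/bin/sh
# Dry run (hypothetical helper): for each host name given as an argument,
# print an ssh command that restarts the two management agents.
# Assumes root SSH access to the ESX service console is enabled.
# Nothing is executed; the commands are only printed for review.
print_restart_commands() {
    for host in "$@"; do
        printf 'ssh root@%s "service mgmt-vmware restart; service vmware-vpxa restart"\n' "$host"
    done
}

# Example with placeholder host names:
#   print_restart_commands esx-host-1 esx-host-2 esx-host-3
```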