Friday, June 20, 2014

Migrating vCenter 5.x Express DB to a SQL cluster

Issue: We need to migrate the Windows 2008 Express edition database of vCenter 5.1 to a remote SQL cluster (a cluster of 3 Windows 2008 R2 SQL servers).

What worked:
Back up the existing Express edition DB.
On the primary SQL server, create a dummy DB with the same name as the original DB (e.g. VCDB).
Right-click the dummy DB, choose restore from backup, and point it to the backup files of the Express DB. Make sure you use a username with the sysadmin role.
Then use ODBC to point SSO and vCenter Server to the new SQL cluster IP (not the individual SQL servers).
This is for me to look back at when I need it in future.

Thursday, June 12, 2014

VMware HA agent unreachable :(

Issue: one of the hosts has an HA error message:
"The vSphere HA agent on the host cannot be reached.
This condition indicates that
1) a situation exists which is preventing the agent on the host from running or exiting the uninitialized state, or
2) vCenter Server is unable to connect to any of the agents running on the cluster hosts due to a networking failure or a total cluster failure."

What really worked:
Disable HA on the cluster.
Restart all hosts in the cluster (one by one, after moving all the VMs off each host).
Remove the hosts from the cluster.
Enable HA on the cluster and make sure SSL certificate checking is enabled.
Add the hosts back to the cluster.

What should have worked:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1019200
All hosts in the cluster have the same management network configuration.

It is a new installation (3 weeks old) and it hasn't worked properly since it was built.
Forward and reverse nslookup work from the vCenter server to the hosts.
Used telnet to confirm that port 902 is open from the vCenter server to the ESXi hosts.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001596
http://kb.vmware.com/selfservice/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=1003735
Updated the vCenter IP under runtime settings and reconnected the host, but the operation timed out.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1001493
vpxa.cfg has the right IP addresses.
NTP and time sync are fine.
There are no advanced configuration options set for HA.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2011974 (tried this, but no go).
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2008609
/var/log/fdm.log on the ESXi hosts shows the error message "[ClusterManagerImpl::IsBadIP] x.x.x.x is bad ip".
http://tech.zsoldier.com/2012/06/esxi-hosts-timing-out-during-ha-cluster.html
The VM and management networks are all on the same VLAN and there isn't a firewall configured between the hosts.
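The name-resolution and port checks in the list above can be repeated from a script when there are many hosts. The following is only a minimal sketch using the Python standard library, not what was actually run here (that was nslookup and telnet). The function names and the example host name in the comment are hypothetical.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def dns_round_trip(name, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve name to an IP, resolve the IP back, and report whether they agree.

    Returns (ip, reverse_name, matches). The resolver functions are parameters
    so they can be swapped out; by default the system resolver is used.
    """
    ip = forward(name)
    reverse_name = reverse(ip)[0]
    matches = reverse_name.lower().rstrip(".") == name.lower().rstrip(".")
    return ip, reverse_name, matches


# Example use (hypothetical host name, run from the vCenter server):
#   print(dns_round_trip("esx01.example.com"))
#   print(tcp_port_open("esx01.example.com", 902))
```

A mismatch from dns_round_trip only means the PTR record points at a different name than the one queried; whether that matters depends on how the hosts were added to vCenter.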

hostd.log entries
"http transaction failed on stream tcp (error:transport endpoint is not connected) with error n7vmacore15systemexceptione(connection reset by peer)"

fdm log entries
2014-06-09T13:50:32.006Z [FFEB9B90 verbose 'Cluster' opID=SWI-6058ed8] [ClusterManagerImpl::IsBadIP] x.x.x.x is bad ip.

Found SSL related errors:
2014-06-09T13:51:23.069Z [6DD59B90 error 'Message' opID=SWI-29e297b3] [MsgConnectionImpl::FinishSSLConnect] Error N7Vmacore16TimeoutExceptionE(Operation timed out) on handshake
2014-06-09T13:51:24.842Z [6DD18B90 error 'Message' opID=SWI-5992c13d] [MsgConnectionImpl::FinishSSLConnect] Error N7Vmacore16TimeoutExceptionE(Operation timed out) on handshake
2014-06-09T13:51:42.841Z [6DE1CB90 error 'Message' opID=SWI-2f2d0b51] [MsgConnectionImpl::FinishSSLConnect] Error N7Vmacore16TimeoutExceptionE(Operation timed out) on handshake
2014-06-09T13:51:43.071Z [6DD59B90 error 'Message' opID=SWI-29e297b3] [AcceptorImpl::FinishSSLAccept] Error N7Vmacore16TimeoutExceptionE(Operation timed out) creating ssl stream or doing handshake

2014-06-09T14:03:58.959Z [6DCD7B90 info 'Cluster' opID=SWI-4b7216e3] [ClusterManagerImpl::VerifyHost] Untrusted thumbprint (02:2D:63:09:48:E3:D8:7F:94:C1:7A:FB:11:12:B7:C7:EB:F5:20:3F) for host 10.1.100.233 - failing verify
2014-06-09T14:04:59.032Z [6DD18B90 info 'Cluster' opID=SWI-18eb3cb4] [ClusterManagerImpl::VerifyHost] Untrusted thumbprint (02:2D:63:09:48:E3:D8:7F:94:C1:7A:FB:11:12:B7:C7:EB:F5:20:3F) for host 10.1.100.233 - failing verify
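To see at a glance which hosts are being rejected and with which thumbprint, the "Untrusted thumbprint" entries can be pulled out of a copy of fdm.log. This is a rough sketch that assumes the entry format is exactly the one pasted above. The function name is made up, and the pattern tolerates a thumbprint that was wrapped onto a second line.

```python
import re

# Matches FDM "Untrusted thumbprint" entries. Whitespace is allowed inside the
# thumbprint group so that an entry wrapped across two lines still matches.
UNTRUSTED = re.compile(
    r"Untrusted thumbprint \(([0-9A-Fa-f:\s]+)\) for host (\S+)"
)


def untrusted_thumbprints(log_text):
    """Return a dict mapping host -> set of untrusted thumbprints in log_text."""
    found = {}
    for match in UNTRUSTED.finditer(log_text):
        # Remove any line-wrap whitespace and normalise to upper case.
        thumbprint = re.sub(r"\s+", "", match.group(1)).upper()
        found.setdefault(match.group(2), set()).add(thumbprint)
    return found


# Example use on a copy of the log:
#   with open("fdm.log", errors="replace") as handle:
#       for host, prints in untrusted_thumbprints(handle.read()).items():
#           print(host, sorted(prints))
```

The thumbprints it reports can then be compared by eye with the thumbprint of the certificate each host actually presents.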

2014-06-09T13:42:05.513Z [6DD9AB90 verbose 'HttpConnectionPool-000001'] [RemoveConnection] Connection removed; cnx: <SSL(<io_obj p:0x0d9062cc, h:-1, <TCP '0.0.0.0:0'>, <TCP '127.0.0.1:443'>>)>; pooled: 0
2014-06-09T13:24:30.312Z [FFC92B90 verbose 'HttpConnectionPool-000001'] [RemoveConnection] Connection removed; cnx: <SSL(<io_obj p:0x04d1117c, h:-1, <TCP '0.0.0.0:0'>, <TCP '127.0.0.1:443'>>)>; pooled: 0
2014-06-09T13:56:23.892Z [FFE15460 verbose 'HttpConnectionPool-000000'] [RemoveConnection] Connection removed; cnx: <SSL(<io_obj p:0x0d90316c, h:-1, <TCP '0.0.0.0:0'>, <TCP '127.0.0.1:443'>>)>; pooled: 2

2014-06-09T13:32:58.357Z [FFBEE460 error 'Message' opID=SWI-14a96433] [AcceptorImpl::FinishSSLAccept] Error N7Vmacore3Ssl12SSLExceptionE(SSL Exception: error:140000DB:SSL routines:SSL routines:short read) creating ssl stream or doing handshake --> * unable to get local issuer certificate) on handshake
2014-06-09T13:33:59.431Z [FFF5CB90 error 'Message' opID=SWI-77ccbfb7] [AcceptorImpl::FinishSSLAccept] Error N7Vmacore3Ssl12SSLExceptionE(SSL Exception: error:140000DB:SSL routines:SSL routines:short read) creating ssl stream or doing handshake

vpxd log:

During election:

2014-06-09T14:25:47.648+01:00 [05472 error 'DAS' opID=D428CBEC-00001580-9b-1d] [VpxdDasConfigLRO::Config] Timed out waiting for election to complete or for host to join existing master
2014-06-09T14:25:47.648+01:00 [05472 error 'DAS' opID=D428CBEC-00001580-9b-1d] [VpxdDasConfigLRO::Config] EnableDAS failed on host [vim.HostSystem:host-1476,uk-mal-esx-p05.dyson.global.corp]: class Vim::Fault::Timedout::Exception(vim.fault.Timedout)
2014-06-09T14:25:47.648+01:00 [05472 error 'DAS' opID=D428CBEC-00001580-9b-1d] [VpxdDasConfigLRO::Config] Timed out waiting for election to complete or for host to join existing master
2014-06-09T14:25:47.648+01:00 [05472 error 'DAS' opID=D428CBEC-00001580-9b-1d] [VpxdDasConfigLRO::Config] EnableDAS failed on host [vim.HostSystem:host-1476,uk-mal-esx-p05.dyson.global.corp]: class Vim::Fault::Timedout::Exception(vim.fault.Timedout)

FDM log:

2014-06-09T10:58:35.777Z [FFC63B90 error 'Cluster' opID=SWI-46c45c9d] [ClusterDatastore::AcquireTraditionalDatastore] open(/vmfs/volumes/5118d934-a159136a-43cd-d48564c61fed/.vSphere-HA/FDM-1D88A749-CC95-4D5C-BF5D-3CE3B8A5075D-73-603131e-UK-MAL-VC-P01/protectedlist) failed: Device or resource busy
2014-06-09T10:58:35.777Z [FFADEB90 error 'Cluster' opID=SWI-3bb36853] [ClusterDatastore::AcquireTraditionalDatastore] open(/vmfs/volumes/5118d96e-7feaf4e4-1c30-d48564c61fed/.vSphere-HA/FDM-1D88A749-CC95-4D5C-BF5D-3CE3B8A5075D-73-603131e-UK-MAL-VC-P01/protectedlist) failed: Device or resource busy
2014-06-09T10:59:05.819Z [FFD67B90 error 'Cluster' opID=SWI-6c77b0d1] [ClusterDatastore::AcquireTraditionalDatastore] open(/vmfs/volumes/5118d96e-7feaf4e4-1c30-d48564c61fed/.vSphere-HA/FDM-1D88A749-CC95-4D5C-BF5D-3CE3B8A5075D-73-603131e-UK-MAL-VC-P01/protectedlist) failed: Device or resource busy
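With this many different errors across fdm.log and vpxd.log, a quick tally by component and function helps show which ones dominate. The sketch below assumes every entry starts with a timestamp, then [thread level 'component' ...], then [Class::Method], as in the excerpts above. The names ENTRY and tally_entries are mine, not anything from VMware.

```python
import re
from collections import Counter

# timestamp [thread level 'component' optional-opID] [Class::Method] message
ENTRY = re.compile(
    r"^\S+ \[\S+ (?P<level>\w+) '(?P<component>[^']+)'[^\]]*\] "
    r"\[(?P<source>[\w:]+)\]"
)


def tally_entries(log_lines, levels=("error",)):
    """Count log entries per (component, source) for the given levels."""
    counts = Counter()
    for line in log_lines:
        match = ENTRY.match(line)
        if match and match.group("level") in levels:
            counts[(match.group("component"), match.group("source"))] += 1
    return counts


# Example use on a copy of the log:
#   with open("fdm.log", errors="replace") as handle:
#       for key, count in tally_entries(handle).most_common():
#           print(count, key)
```

Lines that do not fit the assumed layout (continuation lines, for example) are simply skipped, so the counts are a lower bound.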


http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2017233
Our action plan was to:
Review the SSL configuration and certificates in vCenter.
Disable the Denial-of-Service protection feature.
Review any security scans hitting the ESXi hosts on the VMware HA agent port (port 8182).
Update the NIC adapter firmware to the latest version on all the hosts, since it was out of date.

We also did the following, but that didn't work either:
1. Disable HA under Cluster settings
2. Ensure that SSL Certificate Checking is enabled.

For vCenter Server 5.1 and later:
In the vSphere Web Client, navigate to the vCenter Server instance.
Click the Manage tab.
Under Settings, click General.
Click Edit and select SSL settings.

3. Select vCenter requires verified host SSL certificates. If there are hosts that require manual validation, these hosts appear in the host list at the bottom of the dialog.
4. Click OK.
5. Click OK. Hosts that you have not selected are now disconnected.
6. Reconnect the host to vCenter Server.
7. Enable HA under Cluster settings.

SSL certs have been validated – the certificates are valid and are issued from a template also used for ESX hosts which don’t have this issue.

Wednesday, June 11, 2014

MCU (Most Commonly Used) VMware commands

All you command junkies out there, don't make fun of me for writing this. I am only writing it so that I, or someone like me, can look back at it when needed. :P

Check for dead paths on an ESXi host
esxcfg-mpath -b | grep -i dead

Test the entire snapshot chain. This displays any snapshot-related errors. Run it once per virtual disk when the VM has more than one disk.
vmkfstools -t0 -v10 lastsnapshot-00000n.vmdk
or
vmkfstools -q -v10 "your_disk.vmdk"
To display the CID, PID and parent file names for all the snapshots of a VM, change directory to the VM, then issue this command.
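One way to list those fields, sketched in Python rather than as the original one-liner, is to read each disk's small text descriptor file and pick out CID, parentCID and parentFileNameHint. I am assuming "PID" above means the descriptor's parentCID field. The function names are made up, and this needs a machine with Python 3 and access to the descriptor files.

```python
import re

# Descriptor lines look like: CID=fffffffe, parentCID=ffffffff,
# parentFileNameHint="vm.vmdk"
FIELD = re.compile(
    r'^(CID|parentCID|parentFileNameHint)\s*=\s*"?([^"\r\n]*)"?\s*$',
    re.MULTILINE,
)


def descriptor_fields(text):
    """Return the CID, parentCID and parentFileNameHint found in a descriptor."""
    return {key: value.strip() for key, value in FIELD.findall(text)}


def chain_is_consistent(descriptors):
    """Check that each disk's parentCID equals the CID of the next disk.

    descriptors: field dicts ordered from the newest snapshot to the base disk.
    """
    for child, parent in zip(descriptors, descriptors[1:]):
        if child.get("parentCID") != parent.get("CID"):
            return False
    return True


# Example use from inside the VM's directory (descriptor files only; the
# large -flat.vmdk / -delta.vmdk data files are skipped):
#   import glob
#   for path in sorted(glob.glob("*.vmdk")):
#       if not path.endswith(("-flat.vmdk", "-delta.vmdk")):
#           print(path, descriptor_fields(open(path, errors="replace").read()))
```

A base disk normally has parentCID=ffffffff and no parentFileNameHint, so it is the last entry in the chain.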

telnet alternative on an ESXi host
nc -z <target ip> <port>


Tuesday, June 3, 2014

Windows 2008 R2 VM black screens after VMware Tools upgrade

Issue: after upgrading VMware Tools on Windows 2008 R2 VMs, a few or many of them boot to a black screen.

Cause: instability of the SVGA video driver on the Windows 2008 R2 platform.

Resolution:
Boot the VM to the BIOS, and as soon as you exit the BIOS keep tapping F8 to see the advanced boot options.
Try booting to the last known good configuration; if that doesn't work, boot to safe mode with networking.
Uninstall VMware Tools.
Do a custom install of VMware Tools without the video drivers.
Copy C:\Program Files\Common Files\VMware\Drivers\wddm_video from another VM (one that is working fine) to the same location on this VM (just to be consistent).
In Device Manager, right-click the video adapter, choose Update Driver > search your computer for the driver, and select the C:\Program Files\Common Files\VMware\Drivers\wddm_video location; it will auto-update the driver.
Issue resolved.