Disaster recovery of vSphere after disk array failure
Yesterday I had the opportunity to help a customer who had suffered a disk array failure in which both their PDU died and one hard disk broke. After the hardware supplier had replaced the parts, some virtual machines had corrupt disks and databases.
To assist the customer I used TeamViewer, which works excellently and is fast to get up and running: no need to get the firewall guy to open new ports or set up a new VPN account! For personal use there is a free version that works for two hours at a time, but the enterprise license is not too expensive and should be considered, as the tool is so powerful!
This leads to the most important lesson: you have to be very thorough in your design and not put all your eggs in one basket. In this particular case the vRanger backup server, domain controller, SQL server and vCenter server were all on this datastore. Luckily vRanger did not have any corrupt data, and we could restore the whole vCenter server. Once vCenter was restored we could continue and restore the mailbox, SQL and domain controller servers.
I restored vCenter to a separate VM, and when we saw that the restore was OK we deleted the old vCenter server. As the customer had vSphere Essentials, we have to rename vCenter-Temp manually; how to do that is described at this link. If you have an Enterprise or higher license you can instead rename the VM and then do a Storage vMotion, and the files will get the right names. It is kind of difficult to do a storage migration when connecting directly to the ESXi host.
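The "all eggs in one basket" problem is easy to check for before a failure happens. Below is a minimal sketch in Python, assuming you can export a list of VMs with their role and datastore from your own inventory. The VM names, role labels, datastore names and the `MUST_SEPARATE` rules are all made up for illustration and are not taken from the customer's environment or from any VMware API.

```python
from collections import defaultdict

# Hypothetical inventory: VM name -> (role, datastore).
# In practice this would come from your own inventory export.
INVENTORY = {
    "vcenter01": ("vcenter", "array1-ds1"),
    "vranger01": ("backup", "array1-ds1"),
    "dc01": ("domain-controller", "array1-ds1"),
    "sql01": ("database", "array1-ds1"),
    "dc02": ("domain-controller", "array2-ds1"),
}

# Pairs of roles that should not share a datastore (an assumption for this
# sketch: the backup server must survive the loss of what it protects).
MUST_SEPARATE = [
    ("backup", "vcenter"),
    ("backup", "domain-controller"),
    ("backup", "database"),
]


def find_shared_fate(inventory, must_separate):
    """Return a sorted list of (datastore, role_a, role_b) violations."""
    roles_by_datastore = defaultdict(set)
    for role, datastore in inventory.values():
        roles_by_datastore[datastore].add(role)

    violations = []
    for datastore, roles in roles_by_datastore.items():
        for role_a, role_b in must_separate:
            if role_a in roles and role_b in roles:
                violations.append((datastore, role_a, role_b))
    return sorted(violations)


def role_on_single_datastore(inventory, role):
    """True if every VM with the given role sits on one and the same datastore."""
    datastores = {ds for r, ds in inventory.values() if r == role}
    return len(datastores) == 1


if __name__ == "__main__":
    for datastore, role_a, role_b in find_shared_fate(INVENTORY, MUST_SEPARATE):
        print(f"{datastore}: '{role_a}' shares a datastore with '{role_b}'")
    if role_on_single_datastore(INVENTORY, "vcenter"):
        print("all vCenter VMs are on a single datastore")
```

With the sample data, the backup server is flagged three times because it shares a datastore with vCenter, a domain controller and the database, which is the situation described above.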
The following points should be considered:
- Pin the vCenter server to a known datastore and host (using affinity) so that you know where it is in a disaster recovery scenario
- Make sure that the backup server can be used in case of a total datastore blackout
- After implementing a backup solution, check that you can actually do a restore as well!
- Do not put all domain controllers and DNS servers in the same vSphere cluster and datastore
- Make sure that you do not use thin disks and overcommit the datastore without the appropriate alarms set.
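To make the last point concrete: with thin disks, the space promised to the VMs can exceed what the datastore physically has, so both the used space and the overcommit ratio are worth watching. The following is a rough sketch in Python of such a check. The `Datastore` class, the field names and the 80% and 1.0x thresholds are assumptions chosen for illustration, not vSphere defaults, and the numbers would have to be fed in from your own monitoring.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    name: str
    capacity_gb: float     # physical size of the datastore
    used_gb: float         # space actually consumed on disk
    provisioned_gb: float  # sum of the maximum sizes of all virtual disks


def overcommit_ratio(ds):
    """Provisioned space divided by capacity; above 1.0 means overcommitted."""
    return ds.provisioned_gb / ds.capacity_gb


def check_datastore(ds, max_used_pct=80.0, max_overcommit=1.0):
    """Return a list of warning strings (empty if the datastore looks fine).

    The thresholds are illustrative assumptions; pick values that suit
    your own environment.
    """
    warnings = []
    used_pct = 100.0 * ds.used_gb / ds.capacity_gb
    if used_pct > max_used_pct:
        warnings.append(
            f"{ds.name}: {used_pct:.0f}% used exceeds {max_used_pct:.0f}%"
        )
    ratio = overcommit_ratio(ds)
    if ratio > max_overcommit:
        warnings.append(f"{ds.name}: provisioned {ratio:.2f}x capacity")
    return warnings


if __name__ == "__main__":
    # Made-up numbers: 1000 GB datastore, 850 GB used, 1500 GB promised to VMs.
    for line in check_datastore(Datastore("array1-ds1", 1000, 850, 1500)):
        print(line)
```

A check like this does not replace alarms in vCenter itself, but it shows what they should be watching: an overcommitted datastore is only safe as long as somebody is told before it fills up.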
Using a backup solution that backs up at the virtualization platform level, and not from inside the VM, is the most effective way to recover quickly and easily from a total datastore failure. Most backup solution providers offer this feature, and those not using it today should definitely consider it. For example, Quest vRanger can do a file-level restore from a VM backup.