Hyper-V Cluster Heartbeat makes MDT reference VM go bananas?

I have been helping a customer with their environment and we had a problem that took me a while to figure out.

They were baking reference images for their SCCM environment, and the best and easiest way to do that is of course with VMs. The problem was that when the captured image was being transferred back to the MDT server, the VM rebooted after about half of the image had been uploaded…

So what was causing this crazy behavior? It took me a little while to realize what it was all about: the Hyper-V cluster platform and its resilience and heartbeat functionality!

At first the build VM boots from the MDT image, with no integration services in place yet, but then it restarts to install applications and other components within the OS. As the customer builds a Windows 7 image, you can see it start sending heartbeats to the host.

[Screenshot: working heartbeat]

As you might know, clients and servers since Windows Vista and Windows Server 2008 ship with integration services by default, although best practice is to upgrade them as soon as possible if the VM is to remain on Hyper-V.

The interesting part in this case was that the OS rebooted from within itself when sysprep finished, in order to start the MDT image that transfers the WIM back to the MDT server. The cluster/Hyper-V did not notice this internal reboot and thus concluded that the heartbeat had stopped.

[Screenshot: lost communication]

And since the VM was a cluster resource, this heartbeat loss was handled by the default policy, and guess what: the VM was rebooted!

So which settings on the cluster resource cause this madness? First of all, the Heartbeat setting in the cluster VM resource properties:

[Screenshot: heartbeat setting]

This can be read on the TechNet site about the Heartbeat setting for Hyper-V clusters:

[Screenshot: TechNet description of the Heartbeat setting]

And then there is the policy for what the cluster should do once it thinks the VM has become unresponsive:

[Screenshot: failure response policy for the VM cluster resource]

There are different ways to stop the cluster from rebooting the machine: one is to disable the heartbeat check, and another is to set the response to failure to do nothing.
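As a sketch of how the cluster-side option can be inspected and changed from PowerShell with the Failover Clustering module (the VM name here is a made-up placeholder, and you should verify the behaviour in a lab before touching production resources):

```shell
# Run on a cluster node with the FailoverClusters PowerShell module.
# The VM's cluster resource is typically named "Virtual Machine <VMName>".
$res = Get-ClusterResource "Virtual Machine REF-BUILD01"

# RestartAction governs what the cluster does when the resource fails:
# 0 = do not restart, 1 = restart on the current node only,
# 2 = restart and fail over if needed (the default).
$res.RestartAction

# Setting it to 0 makes the cluster leave the VM alone on heartbeat loss.
$res.RestartAction = 0
```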

The customer mostly uses the VMM console, and when building a new VM for MDT reference builds they can disable the heartbeat check under integration services, and thus not have their work delayed by unwanted reboots.

[Screenshot: VMM VM settings]
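The same heartbeat exclusion can also be done per VM with the Hyper-V PowerShell module directly on the host; the VM name below is just a placeholder:

```shell
# Run on the Hyper-V host with the Hyper-V PowerShell module.
# List the integration services and their state for the reference VM.
Get-VMIntegrationService -VMName "REF-BUILD01"

# Disable only the Heartbeat service so the cluster no longer monitors
# the guest, while leaving the other integration services enabled.
Disable-VMIntegrationService -VMName "REF-BUILD01" -Name "Heartbeat"

# Re-enable it once the reference build has been captured.
Enable-VMIntegrationService -VMName "REF-BUILD01" -Name "Heartbeat"
```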

During the search for the cause I checked the host NIC drivers, as I thought it might be a transfer error, but could not find anything; on the positive side, the hosts ended up on the latest NIC firmware and drivers 😉. My suspicion that it had to be the cluster was awakened after I had spun up a test VM that was not part of the cluster, and that one succeeded in the build and transfer.

This is a rare case, and I would say that in 99% of cases you want the default behaviour, as a VM can genuinely become unresponsive and the cluster can then try a reboot to bring it back into operation.

Clarification: if you spin up a VM with an OS or PXE image that does not have integration services, the cluster will not reboot the VM after the timeout. The OS has to start sending heartbeats to the Hyper-V host first; from then on it is under surveillance and managed by the cluster until it is properly shut down!

Hope this helps someone out there wondering what is happening…

Taking the SCVMM 2012 R2 UR6 for a test drive

I noticed this evening that Microsoft released UR6 for System Center. My interest is in Virtual Machine Manager, so I wanted to test the installation and also connect an Azure IaaS subscription, as that is one of the newly added features, alongside all the fixes and of course the other additions such as Generation 2 support in service templates.

[Screenshot: the UR6 release information]

Here you can read more about the fixes and also download the files if you do not use Microsoft Update.

As my environment was connected to the Internet, I could just press install:

[Screenshot: installing UR6]

Once it was finished, a reboot of the server was required, and then I could start adding Azure subscriptions to VMM. Here you have to use a management certificate, which is easily created with makecert if you do not have another CA available!
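For reference, a self-signed management certificate can be created roughly like this (the certificate name is an example of my own choosing); the exported .cer file is then uploaded as a management certificate on the Azure subscription:

```shell
REM Creates a self-signed certificate in the personal certificate store
REM and exports the public part to a .cer file for upload to Azure.
makecert -sky exchange -r -n "CN=AzureMgmtCert" -pe -a sha1 -len 2048 -ss My "AzureMgmtCert.cer"
```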

[Screenshot: adding the Azure subscription with a management certificate]

And when that is complete you can see my VMs in Azure on the subscription, and the commands I can run on them:

[Screenshot: Azure VMs and available commands in the VMM console]

Good luck in your tests of this nice new feature.

Exclude VMs from dynamic optimization in SC VMM

In a case where we had a VM running a sensitive application that does not tolerate the ping loss during live migration between hosts in a cluster, we wanted to exclude the VM from automatic dynamic optimization. It is not entirely obvious where you find this setting for the VMs you want to exclude from this load-balancing act.

First of all, do you know where you set up automatic dynamic optimization in System Center VMM 2012 R2? For some reason you configure it on the host group and not on the cluster object in VMM:

[Screenshot: dynamic optimization settings on the host group]

So where do you exclude VMs from this optimization? If you look under a VM's properties you can check the Actions page, and there you see the magic checkbox:

[Screenshot: the exclusion checkbox on the VM's Actions page]

And now when the automatic job runs, my sensitive VM will stay on its host.
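If you prefer scripting it, the checkbox should map to the VMM cmdlets along these lines. Note that the parameter name stems from the older PRO feature and is an assumption on my part, so verify it against your VMM version; the VM name is a placeholder:

```shell
# Run in a PowerShell session connected to the VMM server
# (VMM PowerShell module loaded).
# -ExcludeFromPRO is the scripted counterpart of the exclusion
# checkbox on the VM's Actions page (verify on your VMM version).
Get-SCVirtualMachine -Name "SensitiveVM" | Set-SCVirtualMachine -ExcludeFromPRO $true
```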

SC VMM Error 803 after restoring a duplicate VM with an alternative name

I was at a customer today testing some backup/restore scenarios with their backup provider's software, and we got an interesting error: 803 “Virtual Machine restore-Test5_gen1 already exists on the virtual host other cluster nodes”. Recommended action: “Specify a new name for the virtual machine and then try the operation again”.

[Screenshot: VMM error 803]

In the backup/restore console we wanted to restore to an alternative location with an alternative name, as we had not deleted the original VM (useful if you need files from it or just want to verify some state). In the configuration of the restore job we verified that the restore process would create a new VM ID, which was the first thing we suspected VMM was complaining about.

The thing was that this error only appeared when we restored to another Hyper-V host; if we restored to the same host where the original VM resided there was no error.

As you can see, after the restore both the original VM and the alternate had the same “#CLUSTER-INVARIANT#” ID but different VMIds, and when we tried to refresh the VMs we got the error above.

[Screenshot: both VMs showing the same #CLUSTER-INVARIANT# ID in the notes field]

The solution was not so far-fetched and can be read about in KB2974441. That case is about RDS VDI, but it also explains why the ID is in the notes field in the first place: “VMM adds a #CLUSTER-INVARIANT#:{<guid>} entry to the description of the VM in Hyper-V. This GUID is the VM ID SCVMM uses to track the VM.”

For the VM not showing up in the VMM console, we simply went into the notes field in Hyper-V Manager and removed that specific “#CLUSTER-INVARIANT#” entry; after that, VMM generated a new one for the VM and it appeared in the VM list on the VMM server.
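A quick way to do the same cleanup from PowerShell on the Hyper-V host (using the VM name from the error above) is to strip the tag out of the notes and let VMM regenerate it on the next refresh:

```shell
# Run on the Hyper-V host with the Hyper-V PowerShell module.
# Remove the stale #CLUSTER-INVARIANT#:{guid} entry from the VM notes
# so that VMM generates a fresh tracking ID when the VM is refreshed.
$vm = Get-VM -Name "restore-Test5_gen1"
Set-VM -VM $vm -Notes ($vm.Notes -replace '#CLUSTER-INVARIANT#:\{[^}]*\}', '')
```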

So why was there no problem when we restored to the same host? For some reason VMM managed to detect the duplicate residing on the same host and generated a new ID in the notes field for it, so the VM appeared in the VM list without any fuss.