Hyper-V R2 Core host networking problem in VMM 2008 R2

This Friday I helped a customer with a small problem. They have a Hyper-V cluster with 4 nodes. After a switch firmware upgrade they experienced some networking instability, and while troubleshooting they accidentally unchecked the Host access checkbox in the properties of the virtual network on one host. Their networking configuration did not allow a separate NIC for management, so we had originally set the host up with this option enabled. Of course, when they applied the change the host lost its connection.

I instructed them to use 5nine Manager to configure the host locally and re-enable host access on the virtual switch. This did not help, because the server still would not respond. I went over to help them on site. The first thing we did was remove the host from the cluster, and then we started testing. When we removed the virtual switch the NIC started to respond to ping, but as soon as we re-added the NIC to the virtual switch it stopped working. We also tried removing the NIC configuration, setting it to DHCP, and then adding it to the virtual switch with host access enabled, which did not work either. After this, when we tried to set the IP address in SCONFIG, we got an error saying that the address could not be set. I thought it might be a bug in the NIC teaming software or drivers, so we updated those as well, but no luck there either.

Then we found a post on the Microsoft Enterprise Networking Team blog that described the exact same problem. It referred to a script, “nvspscrub.js”, that cleans out the host's entire virtual networking configuration when run with the /p option. We had to try something, so we ran the script and cleared all virtual switches on the host. Then we added a new virtual switch, checked Host access, and tried to set the IP address in SCONFIG. Still the same error: “Can not set IP address”. At this point we were almost ready to give up and reinstall the host. Then we thought of one last thing to try: setting the IP address through netsh (this after reading about a bug in SCONFIG), with the command

netsh int ip set add "Local area connection 3" static 192.168.8.139 255.255.255.0 192.168.8.254

It actually worked and the host started to respond to ping 🙂. Quite frustrating that we removed all the configuration only to find out that a bug in SCONFIG was causing the error.
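For reference, here is the same netsh call with the abbreviated keywords spelled out, followed by a quick check of the result. This is only a sketch: it assumes PowerShell is available on the Core host (otherwise the netsh lines work the same from cmd with the interface name in quotes), and the $nic variable is just a convenience that is not part of the original command.

```powershell
# Assumption: interface name and addresses are the ones used in the post.
$nic = 'Local area connection 3'

# Same call as above: "int ip set add" is short for "interface ipv4 set address".
# Positional arguments: interface name, source, address, mask, gateway.
netsh interface ipv4 set address $nic static 192.168.8.139 255.255.255.0 192.168.8.254

# Show what the interface ended up with, then test the gateway.
netsh interface ipv4 show addresses $nic
ping -n 2 192.168.8.254
```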

Now we had to re-add all the virtual network switches. This was of course a perfect job for PowerShell, so I wrote a script that took the configuration of one of the other hosts and created the same virtual switches on the failed host, connecting each one to the NIC corresponding to the right network and VLAN.

# Create Virtual Networks on Host
#
# Niklas Akerlund /RTS

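# Take the reference configuration from another host

Add-PSSnapin Microsoft.SystemCenter.VirtualMachineManager
$VMMserver = Get-VMMServer sbgvmm01

# Reference host (HYP04): its virtual networks, except the management network on VLAN 3750
$Networks = Get-VirtualNetwork | where {$_.VMHost -eq "HYP04.desso.se" -and $_.HostBoundVlanId -ne "3750"}

# NICs on the failed host (HYP01); kept for reference, not used further down
$NICs = Get-VMHostNetworkAdapter | where {$_.VMHost -eq "HYP01.desso.se"}

# Target host (HYP01): the failed host that gets the virtual networks re-created
$VMHost = Get-VMhost -ComputerName "HYP01.desso.se"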

foreach ($Network in $Networks) {
    # Split the network name: the leading word(s) give the NIC connection name,
    # the trailing token contains the VLAN ID
    $split = $Network.Name -split ' '
    if ($split[1] -like "VLAN*") {
        $Name = $split[0]
        $vlanToken = $split[1]
    } else {
        # Two-word NIC name; this also covers the case where the second word is "1"
        $Name = $split[0] + " " + $split[1]
        $vlanToken = $split[2]
    }

    # Skip names without a numeric VLAN ID so an old $Matches value is never reused
    if (-not ($vlanToken -match "\d+")) { continue }
    $vlanid = [int]$Matches[0]

    # Find the NIC on the target host with the matching connection name
    $HostNIC = Get-VMHostNetworkAdapter -VMHost $VMHost | where {$_.ConnectionName -eq $Name}

    if ($null -ne $HostNIC) {
        # -BoundToVMHost $FALSE keeps the Host access checkbox unchecked
        New-VirtualNetwork -Name $Network.Name -VMHost $VMHost -VMHostNetworkAdapters $HostNIC -BoundToVMHost $FALSE
        Set-VMHostNetworkAdapter -VMHostNetworkAdapter $HostNIC -VLANEnabled $TRUE -VLANMode "Trunk" -VLANTrunkID $vlanid
        Write-Host $HostNIC.ConnectionName
        Write-Host $vlanid
    }
}

After I ran this and the host got all its virtual networks back, I could add the host back to the cluster. Instead of risking typos by entering all the virtual switches manually, with some PowerShell we could be sure that we got the same configuration as the other host already in the cluster!
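The name parsing that the script relies on can be tried on its own, without VMM, before pointing it at a real host. The sketch below is not part of the original script: the function name Split-VirtualNetworkName and the sample network names are hypothetical, and it assumes the same convention the script does (NIC connection name first, then a token containing the VLAN ID).

```powershell
# Hypothetical helper that mirrors the name parsing in the script above.
# Returns an object with ConnectionName and VlanId, or $null if the name does not fit.
function Split-VirtualNetworkName {
    param([Parameter(Mandatory = $true)][string]$NetworkName)

    $parts = $NetworkName -split ' '
    if ($parts.Count -ge 2 -and $parts[1] -like 'VLAN*') {
        # One-word NIC name, e.g. "Production VLAN100"
        $connectionName = $parts[0]
        $vlanToken = $parts[1]
    } elseif ($parts.Count -ge 3) {
        # Two-word NIC name, e.g. "Team 1 VLAN200"
        $connectionName = $parts[0] + ' ' + $parts[1]
        $vlanToken = $parts[2]
    } else {
        return $null
    }

    # Pull the first run of digits out of the VLAN token
    if ($vlanToken -match '\d+') {
        New-Object PSObject -Property @{
            ConnectionName = $connectionName
            VlanId         = [int]$Matches[0]
        }
    } else {
        return $null
    }
}

# Sample names are made up for illustration
Split-VirtualNetworkName 'Production VLAN100'
Split-VirtualNetworkName 'Team 1 VLAN200'
```

With these sample names, 'Team 1 VLAN200' gives the connection name "Team 1" and VLAN ID 200, while a name without a number in the VLAN position returns nothing, so it can be skipped instead of silently reusing the previous VLAN ID.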

One thing I missed at first was -BoundToVMHost $FALSE in the New-VirtualNetwork call. As a result, all my virtual networks had the Host Access checkbox marked and I had one NIC on my host for each of them, which of course was not what I wanted. One could think that this would be false by default, but for some reason Microsoft and the VMM team thought differently. No worries, I created a small script to update my virtual networks with that option (the script above has been corrected after my mistake), so I ran:

# Update networks with BoundtoHost $false
#
# Niklas Akerlund /RTS

$Networks = Get-VirtualNetwork | where {$_.VMHost -eq "HYP01.desso.se" -and $_.HostBoundVlanId -ne "3750"}

foreach ($Network in $Networks) {
    Set-VirtualNetwork -VirtualNetwork $Network -BoundToVMHost $false
}

The network with VLAN 3750 is excluded because that is where I wanted host access to stay enabled: it is the management NIC of the host.