Thursday, May 15, 2008

Hyper-V + TCP Offloading = poor network performance?

* Update * - Here I use the term TCP Offload. The feature in question is properly termed TCP Task Offload. The name TCP Offload properly refers to TCP Chimney, and that should never be turned off.

Here are some tips that I have passed on regarding Hyper-V and TCP Offloading.

In theory, TCP Offloading should always make things better. It takes work away from your server OS, so everything should run better. Right?

As a former system administrator, I have been disabling TCP Offloading on Windows servers for years, always to fix network performance issues. This works for SQL servers, Terminal Servers, and now Hyper-V servers.

Why? Honestly, I have never dug deep enough into the issue to fully understand why. I just know that if your network throughput is pathetic on a Windows server that is running some application or role, try this trick.

(As always, proceed with caution - this might work, or it might have no effect. I accept no risk on your behalf.)

In this case, focus on the NIC driver of the base network connection - not the virtual one that your parent partition uses, but the physical NIC (the one with only the Microsoft Virtual Network Switch Protocol enabled).

Make sure it is either the WS08-provided driver or the latest from the manufacturer. It is usually safer (read: overall more stable) to use the one that ships with Windows, unless you need a feature provided by a different driver.

Try turning off TCP Offloading, as it can cause strange behavior on some networks.

Also, with Hyper-V, if you are using a manufacturer-provided driver and you want to do fancy teaming and the like at the driver level, please make sure the driver has been updated for Hyper-V, or you will get weird results here as well.

Don't use NIC teaming or NIC management software unless the manufacturer says it is Hyper-V-role happy (not just WS08 happy).

Some NIC teaming management software needs special control of the hardware, and it might not work after Hyper-V is installed, since the physical hardware is now shared.

Monday, May 12, 2008

Using a single base VHD to create many individual VMs

First of all, please note - this is a test and development trick. The use of differencing disks does reduce the disk performance of a VM (a snapshot is also a differencing disk).

Setting the parent VHD to read-only is not as critical in Hyper-V as it was in Virtual Server.

Here are the basic steps:
1) Create your base VM - base.vhd
2) Prepare it with sysprep (or your tool of choice)
* It is at this point (or after) that you can tag base.vhd as read-only
3) Create differencing disks using base.vhd as the parent
* Use New -> Hard disk -> Differencing in the Hyper-V Manager
4) Create a new VM 'using an existing VHD' for each differencing disk

You are done.

In the end you get:
_ differencing disk 1.vhd
_ differencing disk 2.vhd
_ differencing disk 3.vhd

Base.vhd can never be used for a VM again - so the VM that you created along with it must be deleted.

The only reason for making base.vhd read-only is that if it is modified at all, all of your child differencing disks will be 'broken' - broken as in they will stop booting and crash, just as if a hard drive had failed.

VHD is block based, and a differencing disk contains a 'map' that references the parent - so parts of your VM's disk reside within both VHD files (the parent and the child).
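The parent/child relationship can be sketched as a simple copy-on-write block map. This is a toy model, not the real VHD on-disk format - the class names and block layout here are invented for illustration:

```python
# Toy model of a differencing (copy-on-write) disk.
# Reads fall through to the parent unless the child's block map
# says the block was rewritten; writes always land in the child.

class Disk:
    def __init__(self, blocks):
        self.blocks = list(blocks)

    def read(self, i):
        return self.blocks[i]

class DifferencingDisk:
    def __init__(self, parent):
        self.parent = parent      # base.vhd - must never change
        self.block_map = {}       # block index -> rewritten data

    def read(self, i):
        # The 'map' decides whether a block lives here or in the parent.
        return self.block_map.get(i, self.parent.read(i))

    def write(self, i, data):
        self.block_map[i] = data  # the parent is left untouched

base = Disk(["os", "apps", "free"])
child1 = DifferencingDisk(base)
child2 = DifferencingDisk(base)

child1.write(1, "sql")                 # only child1 diverges
print(child1.read(0), child1.read(1))  # os sql
print(child2.read(1))                  # apps
```

If base.vhd is modified, every child's map now points at parent blocks that no longer contain what the child expects - which is exactly why the children stop booting.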

The versatility of Hyper-V Export / Import - fixing Hyper-V VM failover and backup problems.

The Export and Import functions of Hyper-V are not well understood. Are they a simple tool just to move virtual machines about?

I challenge that thinking; the functions can be used for much more, such as 'fixing' some VM behavior in Hyper-V.

First of all - let's explore what these two functions do:

Export gathers the pieces of a Hyper-V virtual machine into a single folder (with the same name as the VM) on a volume or at a local path.

Import simply 'attaches' to the exported virtual machine, reads in its configuration, and allows you to run the VM from the location you imported it from.
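As a rough sketch, Export behaves like gathering the VM's scattered files into one folder named after the VM. The function, folder layout, and file names below are invented for illustration; the real Export also rewrites the configuration so that Import can find everything again:

```python
import shutil
import tempfile
from pathlib import Path

def export_vm(vm_name, config, vhds, dest_root):
    # Collect the VM's pieces into a single folder named after the VM -
    # this is what makes the result easy to move, cluster, or back up.
    dest = Path(dest_root) / vm_name
    for sub, files in (("Virtual Machines", [config]),
                       ("Virtual Hard Disks", vhds)):
        folder = dest / sub
        folder.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy2(f, folder / Path(f).name)
    return dest

# Demo with throwaway files standing in for the real VM pieces.
root = Path(tempfile.mkdtemp())
cfg = root / "vm.xml"
cfg.write_text("<configuration/>")
vhd = root / "disk.vhd"
vhd.write_text("blocks")

out = export_vm("TestVM", cfg, [vhd], root / "exports")
print(sorted(p.name for p in out.rglob("*") if p.is_file()))
# ['disk.vhd', 'vm.xml']
```

Once everything lives under one folder on a shared volume, both the failover clustering fix and the backup tidiness described below fall out for free.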

Cases where these functions are useful:

The obvious - export, move, import

The not so obvious:

Fixing failover clustering of a VM - When a VM is created, the default settings place the configuration and snapshots under %programdata% and the virtual disk files under \users\public. If you want your VM to be highly available, you need all of these bits in a shared storage location. Export is a really simple way to collect the VM together and to change its configuration, snapshot, and VHD location to that shared volume. *Poof* - a bit of administration magic; you just fixed your VM for failover clustering.

Enabling a VM for easy backup and recovery - All administrators know that backup and restore is an exercise that involves chicken sacrifices, finger crossing, and sweaty palms. You quickly learn how good your backup plan is when you work through the exercise of performing a staged disaster recovery of a particular VM or application within your enterprise.

Anyway, just like for failover clustering, having your VM neatly bundled in one location makes it easier to comb through volumes and folders within your backup catalog (not everyone uses DFS).

You can also Export a copy of your running VM, just for safekeeping, until that application is upgraded again - then you have a machine ready to use for testing and recovery.

Sunday, May 11, 2008

How do I stop Failover Clustering long enough to perform some maintenance that involves a reboot without chasing my VM between my clustered hosts?

I have actually been going around with support about this issue myself. Maintenance mode does not exist for the VM workload.

In failover cluster manager I can select the Virtual Machine and take it offline - this causes my HA VM to be powered down - and also removes it from the Hyper-V manager.

If I select Manage Virtual Machine in failover cluster manager I get directed to the Hyper-V manager.

If I take my Virtual Machine configuration offline - then my VM is shut down but it is not removed from Hyper-V manager.

If I take a snapshot and try to revert, as soon as my VM gets stopped to perform the revert, it gets migrated to another host.

Here is a solution that works ONLY IF you need your VM off (this works for changing hardware, etc.):
Open the properties of the Virtual Machine resource in Failover Clustering.
Set the offline action to Shut down (this is supposed to stop the VM from failing over and restarting on another host).
Take the resource offline.
Return to Hyper-V and make your changes.

Here is a solution if you want failover clustering to leave your VM alone for a while (so you can reboot it without it failing over):
Open the Virtual Machine properties in Failover Clustering.
Select the Policies tab.
Select "If resource fails, do not restart".
Save that setting.
Return to the Hyper-V Manager, and you can open your Virtual Machine Connection console and do things within your VM that involve a shutdown or reboot, to your heart's content.

Just remember that when you want Failover Clustering to be back in charge, you need to return to Failover Clustering and undo what we did above.

Thursday, May 8, 2008

Hyper-V plus failover clustering, an interesting marriage

Hyper-V is a really cool Windows add-on; by itself it is “just another hypervisor,” but with the addition of a bunch of other Windows Roles and Features it quickly becomes much more.
Take high availability, for example: Hyper-V, plus a VM workload, plus Failover Clustering.

For those of you not already familiar with Failover Clustering, I am going to talk a bit about Windows clustering in general. First of all, I am speaking of Failover Clustering, not Network Load Balancing clustering - that is totally different.

In the generic sense, Failover Clustering is a way of taking a workload that runs on a clustered node and keeping that workload available. With Hyper-V, it means keeping a VM powered on.
The only big requirement is shared storage. This can be old-fashioned shared SCSI storage, a Fibre Channel SAN, or iSCSI. If Windows can see it as storage and you can present it to more than one server, then you have shared storage.

The Failover Clustering setup and validation wizards in Windows Server 2008 make clustering really super simple (it makes me cringe when I recall my first NT 4 cluster). You run the wizard, and if you listen to it, you have a fully MSFT-supported cluster - you even get a recommended quorum configuration.

One limitation to consider is NTFS. By default, only one node (the clustering term for a member server in a cluster) can own a LUN at any one time. To be a bit more granular, only one server can write to an NTFS partition at any one time. It is possible to share a LUN between two Windows servers, but even with one node writing and another merely looking, your volume will begin to degrade very quickly.

This sets up a one-highly-available-VM-per-LUN model for Hyper-V.

A highly available (HA) guest is made up of three parts. Part one – a configuration file. Part two – the workload. Part three – the LUN (that contains the VHD).

When a HA guest is failed over from one node to another all three parts must be moved between the nodes. The configuration is passed, the volume is passed, and the workload is passed.

The logistics behind this are that your HA guest is saved, its LUN is passed (assuming that all the bits of the VM reside in one folder), and then the guest is started (resumed).

The passing of the LUN prevents having more than one VM workload on a shared volume, as the other VMs would end up being ignored.

Why, you might ask? In a previous post I mentioned struggling with failover clustering for an hour or so, and above I mention that it is not Hyper-V that makes the guest highly available - it is Failover Clustering.

Failover Clustering is acting upon that HA VM workload and doing whatever it takes to keep that VM up and running. It, not Hyper-V, is what controls the VM.

Hyper-V is still involved, but only in the sense that when the VM guest heartbeat is lost for a moment, failover clustering is right there, ready to move that VM in a snap and keep that darn thing running. If there is collateral damage, that is not the fault of failover clustering, but of the admin.

Will this behavior change? Who knows. Windows clustering has worked this way for a long time now, and so has NTFS. I guess that if you could get past the NTFS limitation, then you could do it.

That is enough for now. More later.

Tuesday, May 6, 2008

Hyper-V Snapshots, more about how they work

Previously, I wrote about snapshots with Hyper-V and the architecture behind them.
There still seems to be some confusion among administrators regarding them, so let’s take another look.

The big first hurdle to understanding snapshots is that a snapshot is a moment in time.
(Start thinking like Crewman Daniels in Star Trek: Enterprise and forget the Vulcan Science Directorate's opinion that time travel is impossible.)

The second concept is how snapshots work.

I have a virtual machine that I will call “TestVM.” I just created TestVM using the new virtual machine wizard and finished installing Windows Server 2008. My intent with TestVM is to test various WS08 Roles and Features.

My root run state is my fresh and clean install of WS08.

To make an offline snapshot, I shut down the VM and take a snapshot - this snapshot is (by default) labeled with the moment in time of 'now' (5/6/08 1:06 pm).

In the background, Hyper-V copied my hardware configuration and attached a differencing disk to my TestVM.VHD; thus, when I power on the VM, anything that I do while it is running will be written to the differencing disk.

If I power on the VM, do something, and then decide that what I did was 'bad' for my VM, I perform a Revert. What does the revert do?

The revert throws away the differencing disk that my current running state is using and creates a new one, thus returning me to the moment in time when the snapshot was taken.

After performing the revert, I added IIS, and this is good. But now I want to go back to my clean snapshot moment (while saving my current moment) and do something different. In this case I choose Apply, and I select the option to "take a snapshot and apply".

Now I have a new online snapshot of 'now' (5/6/08 1:12 pm), and I return to my previous offline snapshot.

In the background, Hyper-V did something a bit different. It copied my hardware configuration, but it also saved my running memory state and created a differencing disk for when I return to that snapshot.

Now that I have gone back in time, I am running within (yet another) new differencing disk attached to TestVM.VHD.

Wow, this is getting deep. And it can - you can build an entire tree with many branches.
What I need to remember is what each moment in time represents (I can rename snapshots), and that a snapshot returns me to a moment in time that includes the hardware configuration (in case I change it), the disk state of the VM (whatever was written), and the running state of the VM at that moment.
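The whole dance above can be sketched as a toy snapshot tree. Everything here is invented for illustration - dictionaries stand in for differencing disks, and the saved memory state is ignored. The point is only that each snapshot freezes the current layer, and the running state is always a fresh differencing layer on top of some node in the tree:

```python
# Toy model of the Hyper-V snapshot tree.

class VM:
    def __init__(self, base):
        self.base = dict(base)    # the original VHD - never modified
        self.snapshots = {}       # name -> (parent name, frozen layer)
        self.current = None       # snapshot the running layer sits on
        self.working = {}         # the current differencing disk

    def write(self, key, data):
        self.working[key] = data

    def read(self, key):
        if key in self.working:
            return self.working[key]
        node = self.current
        while node is not None:   # walk up the chain of parent layers
            parent, layer = self.snapshots[node]
            if key in layer:
                return layer[key]
            node = parent
        return self.base.get(key)

    def snapshot(self, name):
        # Freeze 'now' and continue in a brand-new differencing disk.
        self.snapshots[name] = (self.current, self.working)
        self.current, self.working = name, {}

    def apply(self, name):
        # Jump to that moment; the discarded working layer is lost
        # unless you snapshotted it first ('take a snapshot and apply').
        self.current, self.working = name, {}

    def revert(self):
        # Throw the current differencing disk away and start over
        # from the snapshot we are running on top of.
        self.working = {}

vm = VM({"os": "ws08"})
vm.snapshot("clean")          # offline snapshot of the fresh install
vm.write("role", "iis")       # changes land in the new differencing disk
vm.snapshot("with-iis")       # saves this moment before going back
vm.apply("clean")             # back in time to the clean install
print(vm.read("role"))        # None - IIS is gone from the running state
vm.apply("with-iis")          # ...but the branch still exists
print(vm.read("role"))        # iis
```

Note that applying "clean" does not destroy the "with-iis" branch - both moments stay in the tree, which is what lets you build many branches from one base install.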

You can see how this is pretty powerful in a test and development type of scenario. But production is a totally different issue, and smells like a new post.