XenApp 7.x Architecture and Sizing


Peter Fine here from Dell CCC Solution Engineering, where we just finished an extensive refresh of our XenApp recommendation within the Dell Wyse Datacenter for Citrix solution architecture. Although the name "XenApp" was dropped in XenDesktop versions 7 and 7.1 of the product, it has returned officially for version 7.5. XenApp is still architecturally linked to XenDesktop from a management infrastructure perspective but can also be deployed as a standalone architecture from a compute perspective. The best part now is flexibility: start with XenApp or start with XenDesktop, then seamlessly integrate the other at a later time with no disruption to your environment. All XenApp really is now is a Windows Server OS running the Citrix Virtual Delivery Agent (VDA). That's it! XenDesktop, on the other hand, is a Windows desktop OS running the VDA.

Architecture

The logical architecture depicted below displays the relationship between the two use cases outlined in red. All of the infrastructure that controls the brokering, licensing, etc. is the same between them. This simplification of architecture comes as a result of XenApp shifting from the legacy Independent Management Architecture (IMA) to XenDesktop's FlexCast Management Architecture (FMA). It just makes sense and we are very happy to see Citrix make this move. You can read more about the individual service components of XenDesktop/ XenApp here.


Expanding the architectural view to include the physical and communication elements, XenApp fits quite nicely with XenDesktop and complements any VDI deployment. For simplicity, I recommend using compute hosts dedicated to XenApp and XenDesktop, respectively, for simpler scaling and sizing. Below you can see the physical management and compute hosts on the far left side with each of their respective components considered within. Management will remain the same regardless of what type of compute host you ultimately deploy, but there are several different deployment options. Tier 1 and Tier 2 storage are handled the same way when XenApp is in play and can make use of local or shared disk depending on your requirements. XenApp also integrates nicely with PVS, which can be used for deployment and easy scale-out scenarios. I have another post queued up for PVS sizing in XenDesktop.


From a stack view perspective, XenApp fits seamlessly into an existing XenDesktop architecture or can be deployed into a dedicated stack. Below is a view of a Dell Wyse Datacenter stack tailored for XenApp running on either vSphere or Hyper-V using local disks for Tier 1. XenDesktop slips easily into the compute layer here with our optimized host configuration. Be mindful of the upper scale when utilizing a single management stack, as 10K users and above is generally considered very large for a single farm. The important point to note is that the network, mgmt and storage layers are completely interchangeable between XenDesktop and XenApp. Only the host config in the compute layer changes slightly for XenApp-enabled hosts based on our optimized configuration.


Use Cases

There are a number of use cases for XenApp, which ultimately relies on Windows Server's RDSH role (formerly Terminal Services). The age-old and most obvious use case is hosted shared sessions, i.e. many users logging into and sharing the same Windows Server instance via RDP. This is useful for managing access to legacy apps, providing a remote access/ VPN alternative, or controlling access to an environment that can only be reached via the XenApp servers. The next step up naturally extends to application virtualization where, instead of multiple users being presented with and working from a full desktop, they simply launch the applications they need from another device. These virtualized apps, of course, consume a full shared session on the backend even though the user only interacts with a single application. Either scenario can now be deployed easily via Delivery Groups in Citrix Studio.


XenApp also complements full XenDesktop VDI through the use of application off-load. It is entirely viable to load every application a user might need within their desktop VM, but this comes at a performance and management cost. Every VDI user on a given compute host will have a percentage of allocated resources consumed by running these applications, which all have to be kept up to date and patched unless part of the base image. Leveraging XenApp with XenDesktop provides the ability to off-load applications and their loads from the VDI sessions to the XenApp hosts. Let XenApp absorb those burdens for the applications that make sense. Now instead of running MS Office in every VM, run it from XenApp and publish it to your VDI users. Patch it in one place, shrink your gold images for XenDesktop and free up resources for other, more intensive apps that aren't XenApp-friendly and really do need to run locally. Best of all, your users won't be able to tell the difference!


Optimization

We performed a number of tests to identify the optimal configuration for XenApp. There are a number of ways to go here: physical, virtual, or PVS streamed to physical/ virtual using a variety of caching options. There are also a number of ways in which XenApp can be optimized. Citrix wrote a very good blog article covering many of these optimization options, most of which we confirmed. The one outlier turned out to be NUMA, where we really didn't see much difference with it turned on or off. We ran through the following test scenarios using the core DWD architecture with LoginVSI light and medium workloads for both vSphere and Hyper-V:

  • Virtual XenApp server optimization on both vSphere and Hyper-V to discover the right mix of vCPUs, oversubscription, RAM and total number of VMs per host
  • Physical Windows 2012 R2 host running XenApp
  • The performance impact and benefit of NUMA enabled to keep the RAM accessed by a CPU local to its adjacent DIMM bank.
  • The performance impact of various provisioning mechanisms for VMs: MCS, PVS write cache to disk, PVS write cache to RAM
  • The performance impact of increased user idle time to simulate less than 80% concurrency of user activity on any given host.

To identify the best XenApp VM config we tried a number of configurations, including a mix of 1.5x CPU core oversubscription, fewer very beefy VMs and many less beefy VMs. It's important to note that we based this on the 10-core Ivy Bridge part, the E5-2690v2, which features Hyper-Threading and Turbo Boost. These things matter! The highest density and best user experience came with 6 x VMs, each outfitted with 5 x vCPUs and 16GB RAM. Of the delivery methods we tried (outlined in the table below), Hyper-V netted the best results regardless of provisioning methodology. We did not see a density difference between PVS caching methods, but PVS cache in RAM completely removed any IOPS generated against the local disk. I'll go more into PVS caching methods and results in another post.

Interestingly, of all the scenarios we tested, the native Server 2012 R2 + XenApp combination performed the poorest. PVS streamed to a physical host is another matter entirely, but unfortunately we did not test that scenario. We also saw no benefit from enabling NUMA. There was a time when a CPU accessing an adjacent CPU’s remote memory banks across the interconnect paths hampered performance, but given the current architecture in Ivy Bridge and its fat QPIs, this doesn’t appear to be a problem any longer.

The "Dell Light" workload below was adjusted to account for less than the 80% user concurrency we typically plan for in traditional VDI. Citrix observed that XenApp users in the real world tend not to all work at the same time. Fewer users working concurrently means freed resources and the opportunity to run more total users on a given compute host.

The net of this study shows that the hypervisor and XenApp VM configuration matter more than the delivery mechanism. MCS and PVS ultimately netted the same performance results but PVS can be used to solve other specific problems if you have them (IOPS).


* CPU % for ESX Hosts was adjusted to account for the fact that Intel E5-2600v2 series processors with the Turbo Boost feature enabled will exceed the ESXi host CPU metrics of 100% utilization. With E5-2690v2 CPUs the rated 100% in ESXi is 60000 MHz of usage, while actual usage with Turbo has been seen to reach 67000 MHz in some cases. The Adjusted CPU % Usage is based on 100% = 66000 MHz usage and is used in all charts for ESXi to account for Turbo Boost. Windows Hyper-V metrics by comparison do not report usage in MHz, so only the reported CPU % usage is used in those cases.

** The “Dell Light” workload is a modified VSI workload to represent a significantly lighter type of user. In this case the workload was modified to produce about 50% idle time.

†Avg IOPS observed on disk is 0 because it is offloaded to RAM.

Summary of configuration recommendations:

  • Enable Hyper-Threading and Turbo Boost for oversubscribed performance gains.
  • NUMA did not show a significant impact either enabled or disabled.
  • 1.5x CPU oversubscription per host produced excellent results: 20 physical cores x 1.5 oversubscription nets 30 logical vCPUs assigned to VMs.
  • Virtual XenApp servers outperformed a dedicated physical host with no hypervisor, so we recommend virtualized XenApp instances.
  • Using 10-core Ivy Bridge CPUs, we recommend running 6 x XenApp VMs per host, each VM assigned 5 x vCPUs and 16GB RAM (see the sketch below).
  • PVS cache in RAM (with HD overflow) reduces the user IO generated to disk to almost nothing but may require greater RAM densities on the compute hosts. 256GB is a safe high water mark using PVS cache in RAM based on a 21GB cache per XenApp VM (6 x (16GB VM RAM + 21GB cache) is roughly 222GB, leaving headroom for the hypervisor).
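To make the recommended host build concrete, here is a minimal PowerShell sketch of standing up the six XenApp VMs on a Hyper-V compute host. The VM names, virtual switch and VHDX path are illustrative assumptions, not part of the published configuration, and disk sizing will vary with your image:

# Create 6 x XenApp VMs, each with 5 vCPUs and 16GB of static RAM (names and paths are assumptions)
1..6 | ForEach-Object {
    $vmName = "XA-{0:D2}" -f $_
    New-VM -Name $vmName -MemoryStartupBytes 16GB -SwitchName "VDI-vSwitch" `
        -NewVHDPath "D:\VMs\$vmName\$vmName.vhdx" -NewVHDSizeBytes 60GB
    Set-VMProcessor -VMName $vmName -Count 5
    # Static memory keeps the per-host sizing math predictable for RDSH-style workloads
    Set-VMMemory -VMName $vmName -DynamicMemoryEnabled $false
}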

Resources:

Dell Wyse Datacenter for Citrix – Reference Architecture

XenApp/ XenDesktop Core Concepts

Citrix Blogs – XenApp Scalability

Performance Considerations for Enterprise VDI

[This post references portions of published test results from various Dell DVS Enterprise reference architectures. We design, build, test and publish E2E enterprise VDI architectures, SKU’d and sold globally. Head over here and have a look for more information.]
There are four core elements that we typically focus on for performance analysis of VDI: CPU, memory, disk, and network. Each plays a uniquely integral role in the system overall with the software in play defining how each element is consumed and to what extent. In this post I’ll go over some of the key considerations when planning an enterprise VDI infrastructure.

CPU

First things first: no matter what you've heard or read, the primary VDI bottleneck is CPU. We in CCC/ DVS at Dell prove this again and again, across all hypervisors, all brokers and any hardware configuration. There are special use case caveats to this of course, but generally speaking, your VDI environment will run out of compute CPU before it runs out of anything else! CPU is a finite shared resource, unlike memory, disk or network, which can all be incrementally increased or adjusted. There are many special purpose vendors and products out there that will tell you the VDI problem is memory or IOPS; those can be issues, but you will always come back to CPU.
Intel's Ivy Bridge is upon us now, delivering more cores at higher clocks with more cache and supporting faster memory. It is decidedly cheaper to purchase a more expensive pair of CPUs than it is to purchase an entire additional server. For that reason, we recommend running [near] top bin CPUs in your compute hosts, as we see measurable benefit in running faster chips. For management hosts you can get by with a lower spec CPU, but if you want the best return on investment for your compute hosts and as many users as you can get per host, buy the fast CPUs! Our recommendation in DVS Enterprise will follow the lateral succession to Ivy Bridge from the Sandy Bridge parts we previously recommended: Xeon E5-2690v2 (IB) vs E5-2690 (SB).
The 2690v2 is a 10 core part using a 22nm process with a 130w TDP clocking in at 3.0GHz and supporting up to 1866MHz DDR3 memory. We tested the top of the line E5-2697v2 (12 cores) as well as the faster 1866MHz memory and saw no meaningful improvement in either case to warrant a core recommendation. It’s all about the delicate balance of the right performance for the right price.
There is no 10c part in the AMD Opteron 6300 line so the closest competitor is the Opteron 6348 (Piledriver). As has always been the case, the AMD parts are a bit cheaper and run more efficiently. AMD clocks lower (with turbo) and due to no hyperthreading feature, executes fewer simultaneous threads. The 6348 also only supports 1600MHz memory but provides a few additional instruction sets. Both run 4 memory channels with an integrated memory controller. AMD also offers a 16c part at its top end in the 6386SE. I have no empirical data on AMD vs Intel for VDI at this time.
Relevant CPU spec comparison, our default selection for DVS Enterprise highlighted in red:

Performance analysis:

To drive this point home regarding the importance of CPU in VDI, here are 2 sets of test results published in my reference architecture for DVS Enterprise on XenDesktop, one on vSphere, one on Hyper-V, both based on Sandy Bridge (we haven't published our Ivy Bridge data yet). MCS vs PVS is another discussion entirely, but in either case CPU is always the determining factor of scale. These graphs are based on tests using MCS and R720s fitted with 2 x E5-2690 CPUs and 192GB RAM running the LoginVSI Light workload.
Hyper-V:
The 4 graphs below tell the tale fairly well for 160 concurrent users. Hyper-V does a very good job of optimizing CPU while consuming slightly higher amounts of other resources. Network consumption, while very reasonable and much lower than you might expect for 160 users, is considerably larger than in the vSphere use case in the next example. Once steady state has been reached, CPU peaks right around 85% which is the generally accepted utilization sweet spot making the most of your hardware investment while leaving head room for unforeseen spikes or temporary resource consumption. Memory in use is on the high side given the host had 192GB, but that can be easily remedied by raising to 256GB. 

vSphere:

Similar story for vSphere, although the user density below is representative of only 125 desktops of the same user workload. This speaks to another trend we are seeing more and more of, which is a stark CPU performance reduction on vSphere compared to Hyper-V for non-View brokers. 35 fewer users overall here, but disk performance is also acceptable. In the example below, CPU spikes slightly above 85% at full load with disk performance and network consumption well within reasonable margins. Where vSphere really shines is in its memory management capabilities, thanks to features like Transparent Page Sharing; as you can see, the active memory is quite a bit lower than what has actually been assigned.

Are 4 sockets better than 2?

Not necessarily. 4-socket servers, such as the Dell PowerEdge R820, use a different multi-processor (MP) CPU architecture, currently based on Sandy Bridge EP (E5-4600 family), versus the dual processor (DP) CPU architectures of their dual socket server counterparts. MP CPUs and their 4-socket servers are inherently more expensive, especially considering the additional RAM required to support the additional user density. 2 additional CPUs do not mean twice the user density in a 4-socket platform either! A similarly configured 2-socket server is roughly half the cost of a 4-socket box and it is for this reason that we recommend the Dell PowerEdge R720 for DVS Enterprise. You will get more users on 2 x R720s for less than if you ran a single R820.

Memory

Memory architecture is an important consideration for any infrastructure planning project. Our experience shows that VDI appears to be less sensitive to memory bandwidth performance than other enterprise applications. Besides overall RAM density per host, DIMM speed and loading configuration are important considerations. In Sandy and Ivy Bridge CPUs, there are 4 memory channels, 3 DIMM slots each, per CPU (12 slots total). Your resulting DIMM clock speed and total available bandwidth will vary depending on how you populate these slots.

As you can see from the table below, loading all slots on a server at 3 DPC (3 DIMMs per channel) will result in a forced clock reduction to either 1066MHz or 1333MHz depending on the DIMM voltage. If you desire to run at 1600MHz or 1866MHz (Ivy), you cannot populate the 3rd slot per channel, which will leave 8 empty DIMM slots per server. You'll notice that the higher memory clocks are also achievable using lower voltage RDIMMs.
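If you want to sanity check what the populated DIMMs actually negotiated, a quick query of the hardware inventory from the host itself will show the per-DIMM slot, size and reported clock (a hedged example, run from an elevated PowerShell session):

# List each populated DIMM with its slot, size and reported speed (MHz)
Get-WmiObject Win32_PhysicalMemory |
    Select-Object BankLabel, DeviceLocator, @{Name='SizeGB';Expression={$_.Capacity/1GB}}, Speed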

Make sure to always use the same DIMM size, clock and slot loading to ensure a balanced configuration. To follow the example of 256GB in a compute host, the proper loading to maintain maximum clock speeds and 4-channel bandwidth is as follows per CPU:

If 256GB is not required, leaving the 4th channel empty is also acceptable in "memory optimized" BIOS mode, although it does reduce the overall memory bandwidth by 25%. In our tests with the older Sandy Bridge E5-2690 CPUs, we did not find that this affected desktop VM performance.

Disk

There are 3 general considerations for storage that vary depending on the requirements of the implementation: capacity, performance and protocol.
Usable capacity must be sufficient to meet the needs of both Tier1 and Tier2 storage requirements which will differ greatly based on persistent or non-persistent desktops. We generally see an excess of usable capacity as a result of the number of disks required to provide proper performance. This of course is not always the case as bottlenecks can often arise in other areas, such as array controllers. It is less expensive to run RAID10 in fewer arrays to achieve a given performance requirement, than it is to run more arrays at RAID50. Sometimes you need to maximize capacity, sometimes you need to maximize performance. Persistent desktops (full clones) will consume much more disk than their non-persistent counterparts so additional storage capacity can be purchased or a deduplication technology can be leveraged to reduce the amount of actual disk required. If using local disk, in a Local Tier 1 solution model, inline dedupe software can be implemented to reduce the amount of storage required by several fold. Some shared storage arrays have this capability built in. Other solutions, such as Microsoft’s native dedupe capability in Server 2012 R2, make use of file servers to host Hyper-V VMs via SMB3 to reduce the amount of storage required.
Disk performance is another deep well of potential and caveats again related directly to the problems one needs to solve.  A VDI desktop can consume anywhere from 3 to over 20 IOPS for steady state operation depending on the use case requirements. Sufficient steady state disk performance can be provided without necessarily solving the problems related to boot or login storms (many desktop VMs being provisioned/ booted or many users logging in all at once). Designing a storage architecture to withstand boot or login storms requires providing a large amount of available IOPS capability which can be via hardware or software based solutions, neither generally inexpensive. Some products combine the ability to provide high IOPS while also providing dedupe capabilities. Generally speaking, it is much more expensive to provide high performance for potential storms than it is to provide sufficient performance for normal steady state operations. When considering SSDs and shared storage, one needs to be careful to consider the capabilities of the array’s controllers which will almost always exhaust before the attached disk will. Just because you have 50K IOPS potential in your disk shelves on the back end, does not mean that the array is capable of providing that level of performance on the front end!
There is not a tremendous performance difference between storage protocols used to access a shared array on the front end; this boils down to preference these days. Fibre Channel has been proven to outperform iSCSI and file protocols (NFS) by a small margin, but performance alone is not really reason enough to choose between them. Local disk also works well, but concessions may need to be made with regard to HA and VM portability. Speed, reliability, limits/ maximums, infrastructure costs and features are key considerations when deciding on a storage protocol. At the end of the day, any of these methods will work well for an enterprise VDI deployment. What features do you need and how much are you willing to spend?

Network

Network utilization is consistently (and maybe surprisingly) low, often in the Kbps per user range. VDI architectures in and of themselves simply don't drive a ton of steady network utilization. VDI is bursty and will exhibit higher levels of consumption during large aggregate activities such as provisioning or logins. Technologies like Citrix Provisioning Server will inherently drive greater consumption by nature. What will drive the most variance here is much more reliant on the upstream applications in use by the enterprise and their associated architectures. This is about as subjective as it gets, so it is impossible to speculate in any kind of fashion across the board. You will, however, have a potentially high number of users on a large number of hosts, so comfortable network oversubscription planning should be done to ensure proper bandwidth in and out of the compute host or blade chassis. Utilizing enterprise-class switching components that are capable of operating at line rate for all ports is advisable. Will you really need hundreds of gigs of bandwidth? I really doubt it. Proper HA is generally desirable along with adherence to sensible network architectures (core/ distribution, leaf/spine). I prefer to do all of my routing at the core, leaving anything Layer 2 at the Top of Rack. Uplink to your core or distribution layers using 10Gb links, which can be copper (TwinAx) for shorter runs or fiber for longer runs.

In Closing

That about sums it up for the core 4 performance elements. To put a bow on this, hardware utilization analysis is fine and definitely worth doing, but user experience is ultimately what’s important here. All components must sing together in harmony to provide the proper level of scale and user experience. A combination of subjective and automated monitoring tests during a POC will give a good indication of what users will experience.  At Dell, we use Stratusphere UX by Liquidware Labs to measure user experience in combination with Login Consultants LoginVSI for load generation. A personal, subjective test (actually log in to a session) is always a good idea when putting your environment under load, but a tool like Stratusphere UX can identify potential pitfalls that might otherwise go unnoticed.
Keeping tabs on latencies, queue lengths and faults, then reporting each user's experience into a magic-style quadrant will give you the information required to ascertain whether your environment will perform as designed, or send you scrambling to make adjustments.

Server 2012 Native NIC Teaming and Hyper-V

One of the cool new features introduced in Server 2012 is the ability to team your NICs natively, fully supported by Microsoft, without using any vendor-specific drivers or software. OEM driver-based teaming solutions have been around for much longer but have never been supported by Microsoft and are usually the first thing you are asked to disable if you ever call PSS for support. Server 2012 teaming is easy to configure and can be used for simple to very complex networking scenarios, converged or not. In this post I'll cover the basics of teaming with convergence options, with a focus on Hyper-V scenarios. A quick note on vernacular: Hyper-V host, parent, parent partition, or management OS all refer to essentially the same thing. Microsoft likes to refer to the network connections utilized by the Hyper-V host as the "management OS" in the system dialogs.

100% of the teaming configuration can be handled via PowerShell with certain items specifically requiring it. Basic functionality can be configured via the GUI which is most easily accessed from the Local Server page in Server Manager:

The NIC Teaming app isn’t technically part of Server Manager but it sure looks and feels like it. Click New Team under the tasks drop down of the Teams section:

Name the team, select the adapters you want to include (up to 32), then set the additional properties. Switch independent teaming mode should suit the majority of use cases but can be changed to static or LACP mode if required. Load balancing mode options consist of address hash or Hyper-V port. If you intend to use this team within a Hyper-V switch, make sure the latter is selected. Specifying all adapters as active should also serve the majority of use cases, but a standby adapter can be specified if required.

Click ok and your new team will appear within the info boxes in the lower portion of the screen. Additional adapters can be added to an existing team at any time.
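For reference, the same team can be created and grown entirely from PowerShell; the adapter and team names below are assumptions for illustration:

# Create a switch independent team using the Hyper-V port load balancing mode
New-NetLbfoTeam -Name "HV-Team" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort
# Add another member to the existing team later
Add-NetLbfoTeamMember -Name "NIC3" -Team "HV-Team"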

Every team and vNIC you create will be named and enabled via the Microsoft Network Adapter Multiplexor Driver. This is the device name that Hyper-V will see so name your teams intuitively and take note of which driver number is assigned to which team (multiplexor driver #2, etc).

In the Hyper-V Virtual Switch Manager, under the external network connection type you will see every physical NIC in the system as well as a multiplexor driver for every NIC team. Checking the box below the selected interface does exactly what it suggests: shares that connection with the management OS (parent).

VLANs (non Hyper-V)

VLANs can be configured a number of different ways depending on your requirements. At the highest level, VLANs can be configured on the NIC team itself. From within the Adapters and Interfaces dialog, click the Team Interfaces tab, then right-click the team you wish to configure.


Entering a specific VLAN will limit all traffic on this team accordingly. If you intend to use this team with a Hyper-V switch DO NOT DO THIS! It will likely cause confusion and problems later. Leave any team to be used in Hyper-V in default mode with a single team interface and do your filtering within Hyper-V.

Team interfaces can also be configured from this dialog, which will create vNICs tied to a specific VLAN. This can be useful for specific services that need to communicate on a dedicated VLAN outside of Hyper-V. First select the team on the left, then on the right the Add Interface drop-down item will become active.


Name the interface and set the intended VLAN. Once created these will also appear as multiplexor driver devices in the network connections list. They can then be assigned IP addresses the same as any other NIC. vNICs created in this manner are not intended for use in Hyper-V switches!
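The same type of team interface can be created from PowerShell as well; the team name, VLAN ID and interface name here are assumptions:

# Add a second team interface bound to VLAN 100 for a service that lives outside of Hyper-V
Add-NetLbfoTeamNic -Team "Mgmt-Team" -VlanID 100 -Name "Backup-VLAN100"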

VLANS (Hyper-V)

For Hyper-V VLAN assignments, the preferred method is to let the Hyper-V switch perform all filtering. This varies a bit depending on whether you are assigning management VLANs to the Hyper-V parent or to guest VMs. For guest VMs, VLAN IDs should be specified within vNICs connected to a Hyper-V port. If multiple VLANs need to be assigned, add multiple network adapters to the VM and assign VLAN IDs as appropriate. This can also be accomplished in PowerShell using the Set-VMNetworkAdapterVlan command.
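As a quick hedged example (VM name, adapter name and VLAN ID are assumptions), adding a second adapter to a guest and tagging it looks like this:

# Add a vNIC to the VM, then restrict it to VLAN 20 in access mode
Add-VMNetworkAdapter -VMName "APP-01" -SwitchName "HV-Switch" -Name "Backend"
Set-VMNetworkAdapterVlan -VMName "APP-01" -VMNetworkAdapterName "Backend" -Access -VlanId 20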

NIC teaming can also be enabled within a guest VM by configuring the advanced feature item within the VM's network adapter settings. This is an either/or scenario: guest vNICs can be teamed only if there is no teaming in the Management OS.
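That advanced setting can also be flipped from PowerShell (VM and adapter names are assumptions):

# Allow the guest OS to team this vNIC with another
Set-VMNetworkAdapter -VMName "APP-01" -Name "Backend" -AllowTeaming On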

Assigning VLANs to interfaces used by the Hyper-V host can be done a couple of different ways. Most basically, if your Hyper-V host is to share the team or NIC of a virtual switch this can be specified in the Virtual Switch Manager for one of your external vSwitches. Optionally a different VLAN can be specified for the management OS.

Converged vs Non-converged

Before I go much further on carving VLANs for the management OS out of Hyper-V switches, let’s look at a few scenarios and identify why we may or may not want to do this. Thinking through some of these scenarios can be a bit tricky conceptually, especially if you’re used to working with vSphere. In ESXi all traffic sources and consumers go through a vSwitch, in all cases. ESXi management, cluster communications, vMotion, guest VM traffic…everything. In Hyper-V you can do it this way, but you don’t have to, and depending on how or what you’re deploying you may not want to.

First let's look at a simple convergence model. This use case applies to a clustered VDI infrastructure with all desktops and management components running on the same set of hosts: 2 x 10Gb NICs configured in a team, all network traffic for both guest VMs and parent management traversing the same interfaces, and storage traffic split out via NPAR to the MPIO drivers. The NICs are teamed for redundancy, guest VMs attach to the vSwitch, and the management OS receives weighted vNICs from the same vSwitch. Once a team is assigned to a Hyper-V switch, it cannot be used by any other vSwitch.

Assigning vNICs to be consumed by the management OS is done via PowerShell using the Add-VMNetworkAdapter command. There is no way to do this via the GUI. The vNIC is assigned to the parent via the ManagementOS designator, named whatever you like and assigned to the virtual switch (as named in Virtual Switch Manager).
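A hedged sketch of that convergence model, using the vNIC names that appear in the VLAN and weighting commands below (the switch and team names are assumptions):

# Build the Hyper-V switch on top of the team with weight-based QoS, without the default host vNIC
New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "HV-Team" -MinimumBandwidthMode Weight -AllowManagementOS $false
# Carve out the parent partition vNICs from the same switch
Add-VMNetworkAdapter -ManagementOS -Name "Management" -SwitchName "ConvergedSwitch"
Add-VMNetworkAdapter -ManagementOS -Name "Cluster" -SwitchName "ConvergedSwitch"
Add-VMNetworkAdapter -ManagementOS -Name "Live Migration" -SwitchName "ConvergedSwitch"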

Once the commands are successfully issued, you will see the new vNICs created in the network connections list that can be assigned IP addresses and configured like any other interface.

You can also see the lay of the land in PowerShell (elevated) by issuing the Get-VMNetworkAdapter command.
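For example, to list the parent partition vNICs and the switch each one hangs off of:

# Show the Management OS vNICs with their switch bindings
Get-VMNetworkAdapter -ManagementOS | Format-Table Name, SwitchName, MacAddress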


Going a step further, assign VLANs and weighted priorities for the management OS vNICs:

Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "Cluster" -Access -VlanId 25
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "Live Migration" -Access -VlanId 50
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "Management" -Access -VlanId 10

Set-VMNetworkAdapter -ManagementOS -Name "Cluster" -MinimumBandwidthWeight 40
Set-VMNetworkAdapter -ManagementOS -Name "Live Migration" -MinimumBandwidthWeight 20
Set-VMNetworkAdapter -ManagementOS -Name "Management" -MinimumBandwidthWeight 5

For our second scenario let's consider a non-converged model with 6 NICs: 2 for iSCSI storage, 2 for the Hyper-V switch and 2 for the management OS. The NICs used for any Ethernet-based storage protocol traffic should not be teamed; let MPIO take care of this. The NIC team used by the Hyper-V switch will not be shared with any other host function, and the same goes for the management team.

This is a perfectly acceptable method of utilizing more NICs on a Hyper-V host if you haven’t bought into the converged network models. Not everything must go through a vSwitch if you don’t want it to. Flexibility is good but this is the point that may confuse folks. vNICs dedicated to the management OS attached to the Mgmt NIC team can be carved out via PowerShell or via the Teaming GUI as Team Interfaces. Since these interfaces will exist and function outside of Hyper-V, it is perfectly acceptable to configure them in this manner. Storage traffic should be configured using MPIO and not NIC teaming. 

The native Server 2012 NIC teaming offering is powerful and a much welcomed addition to the Server software stack. There are certainly opportunities for Microsoft to enhance and simplify this feature further by adding best practice gates to prevent misconfiguration. The bandwidth weighting calculations could also be made simpler and more WYSIWYG. Converged or not, there are a number of ways to configure teaming in Server 2012 to fit any use case or solution model, with an enhanced focus paid to Hyper-V.

Dell DVS Enterprise for Server 2012 RDS (2013 update)

We just published another version of our architecture for Server 2012 RDS originally published a year ago. You can find the full architecture here that includes all configuration and lab test items, but I’ll summarize some of the interesting items.

Increased User Density on Windows 8

Hyper-V 2012 previously posted some impressive numbers when we tested guest VMs running Windows 7. Hyper-V continues to impress as we put Windows 8 through its paces. Single host RDVH densities rose to 250 sessions with RDSH coming in at 300 (4 x RDSH VMs). This puts the Dell Windows 2012 RDS solution bundle at 4 servers total at maximum scale (3 x compute + 1 mgmt) assuming no HA. Hyper-V is quickly leaving vSphere-based solutions behind, as those can no longer compete when it comes to virtualization performance. It's not even close anymore.

Shared Graphic Acceleration

New in this release is support for shared GPU acceleration using AMD S7000 and S9000 cards. There is a bug affecting the NVIDIA-based enterprise GPU parts, which we'll enable as soon as it makes sense. GPU-enabled compute hosts have a lower CPU spec with a higher power demand. The workload we associate with GPU enablement is for premium+ users assuming a need for higher end compute resources. We net around 70 premium graphics users per GPU-enabled host. These can be added at any scale anywhere in the base RDS solution bundle.

Dell VRTX for ROBO VDI

Newly launched is the Dell VRTX platform that combines up to 4 M-series server blades, shared DAS and networking all within a 5U chassis.

We have designed around the platform to provide scaled SMB VDI for 250-500 users in a clustered Hyper-V configuration. Check out the RA for a deeper dive!

Server 2012 RDS and Hyper-V continue to prove a formidable solution in the VDI space. There are scaling limits to RDS specifically that we impose due to limited management functionality over a certain user scale. Our recommendation for environments requiring greater than 600-700 pooled VDI users is to consider the Dell DVS Enterprise for vWorkspace solution that I talk about here.

Server 2012 RC to RTM Upgrade

I'm finally getting around to rebuilding my Server 2012 lab, which has a mix of RC and Beta builds that will time bomb soon. A clean install of everything would be ideal, but I'm feeling lazy and want to leave my existing settings and configurations in place. For my last lone 2012 Beta install, which is unfortunately my domain controller, I'll have to rebuild, but for my RC installs I can upgrade. Although natively this process would fail, and it is not officially supported by Microsoft as an upgrade path, it can be done. Obviously this applies only to test environments, as surely no one would run a non-RTM OS build in production anyway.

The key is to edit the cversions.ini file in the sources folder of your installation media. Change the value of the “MinServer=” line to 8400.0 (the last RC build). This tells the upgrade installer that it’s ok to have at least the RC build for an in place upgrade. Otherwise the upgrade will fail.
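If you prefer to script the change, something like this works (the media path below is an assumption; edit a local, writable copy of the installation media, not a read-only ISO):

# Bump the minimum allowed build so the RC-to-RTM in-place upgrade is permitted
$ini = "D:\Server2012-RTM\sources\cversions.ini"
(Get-Content $ini) -replace '^MinServer=.*', 'MinServer=8400.0' | Set-Content $ini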

To upgrade you must launch the installer from within the OS not via booting the media. If you forget and try the latter, the installer will kindly direct you to reboot and try again using the proper method. Unfortunately, this doesn’t work for Beta builds but you’ll probably want to upgrade those to fresher bits anyway.

For more info check out Ivo Beerens post here.

Unidesk: Layered VDI Management

VDI is one of the most intensive workloads in the datacenter today and by nature uses every major component of the enterprise technology stack: networking, servers, virtualization, storage, load balancing. No stone is left unturned when it comes to enterprise VDI. Physical desktop management can also be an arduous task with large infrastructure requirements of its own. The sheer complexity of VDI drives a lot of interesting and feverish innovation in this space but also drives a general adoption reluctance for some who fear the shift is too burdensome for their existing teams and datacenters. The value proposition Unidesk 2.0 brings to the table is a simplification of the virtual desktops themselves, simplified management of the brokers that support them, and comprehensive application management.

The Unidesk solution plugs seamlessly into a new or existing VDI environment and is comprised of the following key components:

  • Management virtual appliance
  • Master CachePoint
  • Secondary CachePoints
  • Installation Machine

 

Solution Architecture

At its core, Unidesk is a VDI management solution that does some very interesting things under the covers. Unidesk requires vSphere at the moment but can manage VMware View, Citrix XenDesktop, Dell Quest vWorkspace, or Microsoft RDS. You could even manage each type of environment from a single Unidesk management console if you had the need or proclivity. Unidesk is not a VDI broker in and of itself, so that piece of the puzzle is very much required in the overall architecture. The Unidesk solution works from the concept of layering, which is increasingly becoming a hotter topic as both Citrix and VMware add native layering technologies to their software stacks. I’ll touch on those later. Unidesk works by creating, maintaining, and compositing numerous layers to create VMs that can share common items like base OS and IT applications, while providing the ability to persist user data including user installed applications, if desired. Each layer is stored and maintained as a discrete VMDK and can be assigned to any VM created within the environment. Application or OS layers can be patched independently and refreshed to a user VM. Because of Unidesk’s layering technology, customers needing persistent desktops can take advantage of capacity savings over traditional methods of persistence. A persistent desktop in Unidesk consumes, on average, a similar disk footprint to what a non-persistent desktop would typically consume.

CachePoints (CP) are virtual appliances that are responsible for the heavy lifting in the layering process. Currently there are two distinct types of CachePoints: Master and Secondary. The Master CP is the first to be provisioned during the setup process and maintains the primary copy of all layers in the environment. Master CPs replicate the pertinent layers to Secondary CPs, which have the task of actually combining layers to build the individual VMs, a process called Compositing. Due to the role played by each CP type, the Secondary CPs will need to live on the Compute hosts with the VMs they create. Local or Shared Tier 1 solution models can be supported here, but the Secondary CPs will need to be able to access the "CachePoint and Layers" volume at a minimum.

The Management Appliance is another virtual machine that comes with the solution to manage the environment and individual components. This appliance provides a web interface used to manage the CPs, layers, images, as well as connections to the various VDI brokers you need to interface with. Using the Unidesk management console you can easily manage an entire VDI environment almost completely ignoring vCenter and the individual broker management GUIs. There are no additional infrastructure requirements for Unidesk specifically outside of what is required for the VDI broker solution itself.

Installation Machines are provided by Unidesk to capture application layers and make them available for assignment to any VM in the solution. This process is very simple and intuitive requiring only that a given application is installed within a regular VM. The management framework is then able to isolate the application and create it as an assignable layer (VMDK). Many of the problems traditionally experienced using other application virtualization methods are overcome here. OS and application layers can be updated independently and distributed to existing desktop VMs.

Here is an exploded and descriptive view of the overall solution architecture summarizing the points above:

Storage Architecture

The Unidesk solution is able to leverage three distinct storage tiers to house the key volumes: Boot Images, CachePoint and Layers, and Archive.

  • Boot Images – Contains images having very small footprints and consist of a kernel and pagefile used for booting a VM. These images are stored as VMDKs, like all other layers, and can be easily recreated if need be. This tier does not require high performance disk.
  • CachePoint and Layers – This tier stores all OS, application, and personalization layers. Of the three tiers, this one sees the most IO so if you have high performance disk available, use it with this tier.
  • Archive – This tier is used for layer backup including personalization. Repairs and restored layers can be pulled from the archive and placed into the CachePoint and Layers volume for re-deployment, if need be. This tier does not require high performance disk.


The Master CP stores layers in the following folder structure, each layer organized and stored as a VMDK.

Installation and Configuration

New in Unidesk 2.x is the ability to execute a completely scripted installation. You'll need to decide ahead of time what IPs and names you want to use for the Unidesk management components as these are defined during setup. This portion of the install is rather lengthy, so it's best to have things squared away before you begin. Once the environment variables are defined, the setup script takes over and builds the environment according to your design.

Once setup has finished, the Management appliance and Master CP will be ready, so you can log into the mgmt console to take the configuration further. Among the initial key activities to complete will be setting up an Active Directory junction point and connecting Unidesk to your VDI broker. Unidesk should already be talking to your vCenter server at this point.

Your broker mgmt server will need to have the Unidesk Integration Agent installed which you should find in the bundle downloaded with the install. This agent listens on TCP 390 and will connect the Unidesk management server to the broker. Once this agent is installed on the VMware View Connection Server or Citrix Desktop Delivery Controller, you can point the Unidesk management configuration at it. Once synchronized all pool information will be visible from the Unidesk console.

A very neat feature of Unidesk is that you can build many AD junction points from different forests if necessary. These junction points will allow Unidesk to interact with AD and provide the ability to create machine accounts within the domains.

Desktop VM and Application Management

Once Unidesk can talk to your vSphere and VDI environments, you can get started building OS layers, which will serve as your gold images for the desktops you create. A killer feature of the Unidesk solution is that you only need a single gold image per OS type, even across numerous VDI brokers. Because the broker agents can be layered and deployed as needed, you can reuse a single image across disparate View and XenDesktop environments, for example. Setting up an OS layer simply points Unidesk at an existing gold image VM in vCenter and makes it consumable for subsequent provisioning.

Once successfully created, you will see your OS layers available and marked as deployable.

 

Before you can install and deploy applications, you will need to deploy a Unidesk Installation Machine which is done quite simply from the System page. You should create an Installation Machine for each type of desktop OS in your environment.

Once the Installation Machine is ready, creating layers is easy. From the Layers page, simply select “Create Layer,” fill in the details, choose the OS layer you’ll be using along with the Installation machine and any prerequisite layers.

 

To finish the process, you’ll need to log into the Installation Machine, perform the install, then tell the Unidesk management console when you’re finished and the layer will be deployable to any VM.

Desktops can now be created as either persistent or non-persistent. You can deploy to already existing pools or, if you need a new persistent pool created, Unidesk will take care of it. Choose the type of OS template to deploy (XP or Win7), select the connected broker to which you want to deploy the desktops, choose an existing pool or create a new one, and select the number of desktops to create.

Next select the CachePoint that will deploy the new desktops along with the network they need to connect to and the desktop type.

Select the OS layer that should be assigned to the new desktops.

Select the application layers you wish to assign to this desktop group. All your layers will be visible here.

Choose the virtual hardware, performance characteristics and backup frequency (Unidesk Archive) of the desktop group you are deploying.

Select an existing or create a new maintenance schedule that defines when layers can be updated within this desktop group.

Deploy the desktops.

Once the creation process is underway, the activity will be reflected under the Desktops page as well as in vCenter tasks. When completed all desktops will be visible and can be managed entirely from the Unidesk console.

Sample Architecture

Below are some possible designs that can be used to deploy Unidesk into a Local or Shared Tier 1 VDI solution model. For Local Tier 1, both the Compute and Management hosts will need access to shared storage, even though VDI sessions will be hosted locally on the Compute hosts. 1Gb PowerConnect or Force10 switches can be used in the Network layer for LAN and iSCSI. The Unidesk boot images should be stored locally on the Compute hosts along with the Secondary CachePoints that will host the sessions on that host. All of the typical VDI management components will still be hosted on the Mgmt layer hosts along with the additional Unidesk management components. Since the Mgmt hosts connect to and run their VMs from shared storage, all of the additional Unidesk volumes should be created on shared storage. Recoverability is achieved primarily in this model through use of the Unidesk Archive function. Any failed Compute host VDI session information can be recreated from the Archive on a surviving host.

Here is a view of the server network and storage architecture with some of the solution components broken out:

For Shared Tier 1 the layout is slightly different. The VDI sessions and “CachePoint and Layers” volumes must live together on Tier 1 storage while all other volumes can live on Tier 2. You could combine the two tiers for smaller deployments, perhaps, but your mileage will vary. Blades are also an option here, of course. All normal vSphere HA options apply here with the Unidesk Archive function bolstering the protection of the environment.

Unidesk vs. the Competition

Both Citrix and VMware have native solutions available for management, application virtualization, and persistence, so you will have to decide if Unidesk is worth the price of admission. On the View side, if you buy a Premier license, you get ThinApp for applications, Composer for non-persistent linked clones, and soon the technology from VMware's recent Wanova acquisition will be available. The native View persistence story isn't great at the moment, but Wanova Mirage will change that when made available. Mirage will add a few layers to the mix, including OS, apps, and persistent data, but will not be as granular as the multi-layer Unidesk solution. The Wanova tech notwithstanding, you should be able to buy a cheaper/ lower level View license, as with Unidesk you will need neither ThinApp nor Composer. Unidesk's application layering is superior to ThinApp, with little in the way of applications that cannot be layered, and it can provide persistent or non-persistent desktops with almost the same footprint on disk. Add to that the Unidesk single management pane for both applications and desktops, and there is a compelling value to be considered.

On the Citrix side, if you buy an Enterprise license, you get XenApp for application virtualization, Provisioning Services (PVS), and Personal vDisk (PVD) for persistence from the recent RingCube acquisition. With XenDesktop you can leverage Machine Creation Services (MCS) or PVS for either persistent or non-persistent desktops. MCS is dead simple while PVS is incredibly powerful but an extraordinary pain to set up and configure. XenApp builds on top of Microsoft's RDS infrastructure and requires additional components of its own, such as SQL Server. PVD can be deployed with either catalog type, PVS or MCS, and adds a layer of persistence for user data and user installed applications. While PVD provides only a single layer, that may be more than suitable for any number of customers. The overall Citrix solution is time tested and works well, although the underlying infrastructure requirements are numerous and expensive. XenApp offloads application execution from the XenDesktop sessions, which will in turn drive greater overall host densities. Adding Unidesk to a Citrix stack again allows a customer to buy in at a lower licensing level, although Citrix is seemingly reducing the value of augmenting its software stack by including more at lower license levels. For instance, PVD and PVS are available at all licensing levels now. The big upsell now is for the inclusion of XenApp. Unidesk removes the need for MCS, PVS, PVD, and XenApp, so you will have to ask yourself if the Unidesk approach is preferred to the Citrix method. The net result will certainly be less overall infrastructure required, but net licensing costs may very well be a wash.

Dell DVS Reference Architecture for Windows Server 2012 (update)

This architecture is a comprehensive update to a previous RA I wrote a few months back on RDS VDI using Server 2012. Below are the pertinent updated content sections of the document or you can pull from its published location on dell.com (August 2012): LINK

Summary

The Dell Windows Server 2012 RDS solution provides a robust and scalable VDI platform for pooled, personal and Session host deployments. Using VDI-optimized hardware in a configuration that has been validated and proven by Dell DVS Engineering, you can deploy Microsoft based VDI that is both cost effective and high performing. Our layered architecture provides flexibility to maximize your infrastructure investment with the capability to expand and contract where necessary.

New content:

  • Single server 50 user/ POC offering
  • Improved densities for both RDSH and RDVH session sources
  • RDS options and Hyper-V architecture detail
  • HA offering
  • Updated test results and analysis

Local Tier 1 – Solution Layers

Only a single high performance Force10 S55 48-port switch is required to get started in the Network layer. This switch will host all solution traffic consisting of 1Gb iSCSI and LAN sources. Additional switches can be added and stacked as required to provide High Availability for the Network layer.

The Compute layer consists of the server resources responsible for hosting the user sessions, whether shared via RDSH (formerly Terminal Services) or pooled via RDVH (see section 4.5.1 for a detailed explanation of each role). The RDVH role requires Hyper-V as well as hardware assisted virtualization so must be installed into the parent partition of the Hyper-V instance. The RDSH role is enabled within dedicated VMs on the same or dedicated hosts in the Compute layer.
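For reference, enabling the two Compute layer roles boils down to a pair of feature installs; a hedged sketch, run on the physical host and inside each RDSH VM respectively:

# On the physical Compute host (Hyper-V parent partition)
Install-WindowsFeature -Name RDS-Virtualization -IncludeManagementTools
# Inside each RDSH VM
Install-WindowsFeature -Name RDS-RD-Server -IncludeManagementTools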

Management components are dedicated to their own layer so as not to negatively impact the user sessions running in the Compute layer. This physical separation of resources provides clean, linear, and predictable scaling without the need to reconfigure or move resources within the solution as you grow. The Management layer will host all the RDS VMs necessary to support the infrastructure as well as a file server to host SMB shares for user Profile Disks or data.

The Storage layer is made up of the capacity-dense and performance-capable EqualLogic 4100E iSCSI array. 12TB is provided in base form, which can scale as high as 36TB to suit your capacity requirements. A second 4100E can be added and grouped with the first array to provide greater capacity or performance.

 

Local Tier 1 – 50 User/ Pilot

For a very small deployment or a pilot effort to familiarize yourself with the solution architecture, we offer a 50 user/ pilot solution. This architecture is non-distributed with all VDI and Management functions on a single host. If additional scaling is desired, you can grow into a larger distributed architecture seamlessly with no loss of your initial investment.

Local Tier 1 – Combined

As a logical entry point to the distributed RDS solution stack, a combined architecture is offered to host both the RD Virtualization Host (RDVH) and RD Session Host (RDSH) roles within the same physical Compute host while separating the Management layer. This will enable users requiring either shared RDP or pooled VDI sessions to be hosted on the same physical server. The value of this solution is a minimum infrastructure investment with maximum VDI flexibility easily tailored to shared and pooled user types. Horizontal scaling is achieved simply by adding additional Compute hosts.

Local Tier 1 – Base

In the base distributed architecture the RDVH or RDSH roles are assigned to a dedicated Compute host. This architecture can support either a single RDVH or RDSH Compute host or one of each. This solution provides maximum Compute host user density for each broker model and allows clean linear upward scaling. You’ll notice that the hardware spec is slightly different for the two Compute host types, giving additional RAM to the virtualization host. This of course can be adjusted to suit your specific needs.

Fully Expanded

The fully expanded architecture provides linear upward scale for both the RDVH and RDSH roles optimized for 600 pooled VDI sessions or over 1000 shared. See Appendix for test results. This solution supports up to 4 Compute hosts of any combination running either RDVH or RDSH roles to meet the needs of the enterprise.

High Availability

High availability (HA) is currently offered to protect all layers of the solution architecture. An additional ToR switch is added to the Network layer and stacked to provide redundancy, additional Compute and Mgmt hosts are added to their respective layers, and Hyper-V clustering is introduced in the Management layer.

Solution Density Summary

| Design Scale | Management Hosts | Compute Hosts | RDSH Sessions | RDVH Sessions | HA |
| --- | --- | --- | --- | --- | --- |
| 50 User / Pilot | 0 | 1 | 100 | 50 | – |
| Combined | 1 | 1 | 130 | 75 | + 1 Compute, + 1 Mgmt |
| Base | 1 | 1 | 260 | 145 | + 1 Compute, + 1 Mgmt |
| Expanded | 1 | 2 | 520 | 290 | + 1 Compute, + 1 Mgmt |
| Fully Expanded | 1 | 4 | 1040 | 600 | + 1 Compute, + 1 Mgmt |

The RDSH and RDVH session counts are either/or alternatives per Compute host role, except in the Combined design where a single Compute host runs both roles (130 RDSH + 75 RDVH).

RDS Options

Server 2012 RDS provides a number of VDI options to meet your needs, all within a single, simple, wizard-driven environment that is easy to set up and manage.

  • Sessions, hosted by the RDSH role (formerly Terminal Services), provide easy access to a densely shared session environment. Each RDP-based session shares the total available server resources with all other sessions logged in concurrently on the server. This is the most cost effective option and a great place to start with Server 2012 RDS. An RDS CAL is required for each user or device accessing this type of environment (a minimal PowerShell deployment sketch follows this list).
  • Pooled VMs are the non-persistent user desktop VMs traditionally associated with VDI. Each user VM is assigned a dedicated slice of the host server’s resources to guarantee the performance of each desktop. The desktop VM is dedicated to a single user while in use then returned to the pool at logoff or reboot and reset to a pristine gold image state for the next user. Applications can be built into gold images or published via RemoteApps. A VDA license is required for each non-PC device accessing this type of environment.
  • Personal VMs are persistent 1-to-1 desktop VMs assigned to a specific entitled user. All changes made by Personal VM users will persist through logoffs and reboots making this a truly personalized computing experience. A VDA license is required for each non-PC device accessing this type of environment.
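As a minimal, hedged example of the wizard equivalent in PowerShell (server FQDNs and the collection name are assumptions):

# Stand up a basic session-based deployment, then create a collection on the session host
New-RDSessionDeployment -ConnectionBroker "RDCB01.domain.local" -WebAccessServer "RDWA01.domain.local" -SessionHost "RDSH01.domain.local"
New-RDSessionCollection -CollectionName "Standard Sessions" -SessionHost "RDSH01.domain.local" -ConnectionBroker "RDCB01.domain.local"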

Compute Server Infrastructure

The Compute host configuration varies slightly as to whether it will be hosting RDSH or RDVH roles, or both. The RDVH role must be enabled in the Hyper-V parent partition thus providing one RDVH role per Compute host if pooled or personal VMs are required. The RDSH role should be enabled in up to 4 VMs on a single Compute host to support up to 260 session-based users.

The requirements for RDSH VMs are outlined below. All application and non-OS-related files should be installed on the 20GB data vDisk:

| Role | vCPU | Startup RAM (GB) | Dynamic Memory Min / Max | Dynamic Memory Buffer | Dynamic Memory Weight | NIC | OS + Data vDisk (GB) | Tier 2 Volume (GB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RD Session Host | 8 | 16 | 512 MB / 20 GB | 20% | Med | 1 | 40 + 20 | – |
 

Management Server Infrastructure

The Management host configuration consists of VMs running in Hyper-V child partitions with the pertinent RDS roles enabled. No RDS roles need to be enabled in the root partition for Management hosts.

Management role requirements for the base solution are summarized below. Data disks should be used for role-specific application files/data, logs, IIS web files, etc., and should exist in the Management volume on the 4100E array. Please note that the Tier 2 volume presented to the file server is designated as a pass-through disk (PTD).

| Role | vCPU | Startup RAM (GB) | Dynamic Memory Min / Max | Dynamic Memory Buffer | Dynamic Memory Weight | NIC | OS + Data vDisk (GB) | Tier 2 Volume (GB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RDCB + License Server | 1 | 4 | 512 MB / 8 GB | 20% | Med | 1 | 40 + 10 | – |
| RDWA + RDG | 1 | 4 | 512 MB / 8 GB | 20% | Med | 1 | 40 + 10 | – |
| File Server | 1 | 4 | 512 MB / 8 GB | 20% | Med | 1 | 40 + 10 | 2048 (PTD) |
| TOTALS | 3 | 12 | | | | 3 | 120 + 30 | 2048 |

 

Solution High Availability

Each layer in the solution architecture can be individually protected to prevent an extended service outage. The Network layer only requires an additional switch configured in a stack with the first.

Protecting the Compute layer for RDSH and RDVH is accomplished by adding an additional host to a collection, which effectively increases the hosting capacity of that collection. Session requests are fulfilled by all hosts in the collection, so each host carries reserve capacity to insure against a host failure. Care must be taken to ensure that user provisioning does not exceed the overflow capacity provided by the additional node; a simple fail-safe is to tightly control, via Active Directory, the number of users entitled to connect to the environment. In a failure scenario, users working on a failed host simply reconnect to a fresh session on a surviving Compute host.
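To make the reserve-capacity math concrete, here is a minimal Python sketch built on the per-host density in the Solution Density Summary above (260 RDSH sessions per Compute host in the Base configuration); the helper function and host counts are illustrative assumptions, not part of the reference architecture.

```python
# Minimal sketch of the N+1 reserve-capacity check described above.
# The 260 sessions/host figure comes from the Base density summary;
# the function and host counts are illustrative assumptions.

SESSIONS_PER_RDSH_HOST = 260  # Base Compute host density from the summary table

def safe_user_count(hosts_in_collection: int,
                    per_host: int = SESSIONS_PER_RDSH_HOST) -> int:
    """Max users to provision so the collection can absorb one host failure."""
    if hosts_in_collection < 2:
        raise ValueError("N+1 protection requires at least two hosts")
    return (hosts_in_collection - 1) * per_host

# Base config (1 host) plus its +1 Compute HA node: provision no more than 260 users.
print(safe_user_count(2))   # -> 260
# Fully Expanded (4 hosts) plus its +1 Compute HA node: no more than 1040 users.
print(safe_user_count(5))   # -> 1040
```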

To implement HA for the Management layer, we also add an additional host, along with a few more layers of redundancy. The following will protect each of the critical infrastructure components in the solution:

  • The Management hosts will be configured in a Hyper-V cluster (Node and Disk Majority).
  • The storage volume that hosts the Management VMs will be upgraded to a Cluster Shared Volume (CSV).
  • SQL Server will be added to the environment to support RD Connection Broker HA.
    • Optionally SQL mirroring can be configured to further protect SQL.
  • The RD Connection Broker will be configured for HA.

Volumes

| Volume | Host | Size (GB) | RAID | Storage Array | Purpose | File System | CSV |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Management | 1 | 500 | 50 | Tier 2 | RDS VMs, File Server | NTFS | Yes |
| Management | 2 | 500 | 50 | Tier 2 | RDS VMs, File Server | NTFS | Yes |
| SQL Data | 2 | 100 | 50 | Tier 2 | SQL Data Disk | NTFS | Yes |
| SQL Logs | 2 | 100 | 50 | Tier 2 | SQL Logs Disk | NTFS | Yes |
| SQL TempDB Data | 2 | 5 | 50 | Tier 2 | SQL TempDB Data Disk | NTFS | Yes |
| SQL TempDB Logs | 2 | 5 | 50 | Tier 2 | SQL TempDB Logs Disk | NTFS | Yes |
| SQL Witness | 1 | 1 | 50 | Tier 2 | SQL Witness Disk | NTFS | Yes |
| Quorum 1 | | 500 MB | 50 | Tier 2 | Hyper-V Cluster Quorum | NTFS | Yes |
| User Data | | 2048 | 50 | Tier 2 | File Server | NTFS | No |
| User Profiles | | 20 | 50 | Tier 2 | User profiles | NTFS | No |
| Templates / ISO | | 200 | 50 | Tier 2 | ISO / gold image storage (optional) | NTFS | Yes |

 

Architecture Flow

 

Dell DVS and Microsoft team up to deliver VDI on Windows Server 2012

My name is Peter and I am the Principal Engineering Architect for Desktop Virtualization at Dell.

The DVS team at Dell has partnered with Microsoft to launch a new product delivering VDI on Server 2012. This product was announced at the Microsoft Worldwide Partner Conference in Toronto this week, and we have published a Reference Architecture detailing the solution (link). This initial release targets the SMB market, providing support for ~500 pooled VDI desktops. The architecture can and will scale much higher, but the intent was to get the ball rolling in the smaller markets.
The Dell solution stack for Remote Desktop Services (RDS) on Server 2012 can be configured in a few different ways. RDS comprises two key roles for hosting desktops: RD Session Host (RDSH), formerly Terminal Services, and RD Virtualization Host (RDVH) for pooled or personal desktops. Both of these roles can coexist on the same compute host, if desired, to provide each type of VDI technology. Your lighter users who may only need web and email access, for example, should do fine on an RDSH host in a hosted shared session model. Knowledge workers would leverage the RDVH technology, comparable to VMware View and Citrix XenDesktop, for persistent or non-persistent desktops. Windows Server 2012 provides a one-stop shop for VDI in a robust and easy-to-deliver software stack.

In this solution, Windows Server 2012 Hyper-V runs in the parent partition on both management and compute layer hosts, while all management roles are enabled on dedicated VMs in child partitions. In the combined solution stack, two VMs are created on the compute host to run the RDSH and RDVH roles, respectively. Density numbers depend on the amount of resources given to and consumed by the RDVH VDI sessions, so scaling is highly relative to the specific use case. Only 3 management VMs are required to host the RDS environment and, unlike every other VDI solution on the market, SQL Server is not required here in the early stages. If you wish to provide HA for your RD Connection Broker, then a SQL Server is required. Top of rack, we provide the best-in-class Force10 S55 1Gb switch, which includes unique features such as the Open Automation and Virtualization frameworks. For user data and management VM storage we leverage the EqualLogic PS4100E iSCSI array with 12TB of raw storage. As is the case in our other DVS Enterprise offerings, the network and storage layers are optional purchases from Dell.

The base offering of this solution dedicates Compute hosts in the compute layer to either the RDSH or RDVH role, depending on customer needs. Each role type has its own per-host user density that scales independently of the other. These are very conservative densities for the first phase of this product launch and in no way represent the limit of what this platform is capable of.

We top out the solution stack with 5 total servers, 4 in the compute layer, and 1 in the management layer. The compute layer hosts can be mixed and matched with regard to RD role.

The basic guidance at this point: if this solution architecture meets the needs of your enterprise, fantastic; if not, we urge you to look at the DVS Enterprise 6020 solution (discussed here). This was meant to serve as a high-level overview of the new solution stack, so if you’d like to dig deeper, please take a look at the RA document below.
Dell DVS Reference Architecture for Windows Server 2012: here.
Product launch announcement: Link

Resource Sharing in Windows Remote Desktop Services

Resource sharing is at the crux of every virtual environment and is ultimately what makes a shared environment feasible at all. A very common pain point in Remote Desktop Services/Terminal Services (RDS) environments is the potential for a single user to negatively impact every other user on that RDS host. Server 2008 R2 includes a feature called Dynamic Fair Share Scheduling (DFSS) to balance CPU usage among users. This is a proactive feature: it is enabled by default and levels the CPU playing field at all times, based on how many users are logged in and how much CPU is available. In Server 2012 the fair share features have been expanded to include network and disk as well.
From the Server 2012 RC whitepaper, here is the 2012 fair share feature set:

  • Network Fair Share. Dynamically distributes available bandwidth across sessions based on the number of active sessions to enable equal bandwidth usage.
  • Disk Fair Share. Prevents sessions from excessive disk usage by equal distribution of disk I/O among sessions.
  • CPU Fair Share. Dynamically distributes processor time across sessions based on the number of active sessions and load on these sessions. This was introduced in Windows Server 2008 R2 and has been improved.

Cool! New features are great. Fair sharing has traditionally been an on-or-off feature, and that is where the extent of the configuration ends. From what I can see in the 2012 RC, that doesn’t appear to have changed. Of course, if you are running Citrix XenApp, you would want to disable all Windows fair share features and let XA take care of those functions. Fair sharing can be controlled via Group Policy or the registry, but only the CPU piece is visible in the RC GPOs.

 

The CPU setting is also stored all by itself in the registry, under \Quota System.

I do see the additional fair share elements in the registry, however, so the missing GPO elements should appear in the RTM version of the product. Obviously, 1 = on, 0 = off. It looks like there is some registry organization work that still needs to happen.
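If you want to check the current state outside of Group Policy, here is a minimal Python sketch that reads the CPU flag from the \Quota System key noted above. The EnableCpuQuota value name is the one tied to DFSS in 2008 R2, so treat it, and the whereabouts of the 2012 disk/network flags, as assumptions to verify on your own build.

```python
# Minimal sketch: read the DFSS CPU fair share flag from the \Quota System key.
# EnableCpuQuota is the value name associated with DFSS in 2008 R2 -- verify it,
# and the location of the 2012 disk/network flags, on your own build.
import winreg

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Quota System"

def cpu_fair_share_enabled() -> bool:
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
        value, _ = winreg.QueryValueEx(key, "EnableCpuQuota")
        return value == 1  # 1 = on, 0 = off, as noted above

if __name__ == "__main__":
    print("CPU fair share enabled:", cpu_fair_share_enabled())
```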

In an environment where 100% of the users run 100% of the same applications with 100% predictable usage patterns, this model works fine. The trouble begins when you need to support an environment that requires some users or applications to be prioritized over others. There is also nothing here to deal with application memory usage. You could make the argument that this is the point at which these special users should be carved off of the shared session host and given VDI sessions. Luckily, WSRM is available to provide tighter controls in a shared session environment.
Windows System Resource Manager (WSRM) is a tool that has been around since Server 2003. Its purpose is to granularly define resource limits as they pertain to specific users, sessions, applications, or IIS app pools. WSRM’s use isn’t limited to RD Session Hosts; it just happens to be very useful in a shared environment. It should be noted that WSRM is a reactive tool, so a certain threshold has to be crossed before the limits it imposes kick in. In the case of CPU, the host has to reach 70% utilization first; then any defined WSRM CPU limitation policies begin to take effect on the host. Best practices call for the use of targeted CPU limits to restrict resources, not memory limits. Use memory limits only if an application is exhibiting a memory consumption problem.
Here is a quick example of an allocation policy limiting IE to 25% CPU. This policy would need to be set as the managing policy for it to take effect once the host’s total CPU utilization reaches or exceeds 70%.
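To make the reactive behavior concrete, here is a rough Python sketch of the same idea using the third-party psutil package: nothing happens until host CPU crosses the 70% threshold, and only then is the targeted process reined in. This is purely an illustration of the reactive model, not how WSRM enforces its policies.

```python
# Illustration of the reactive model only -- this is NOT how WSRM is implemented.
# Nothing happens until total host CPU crosses 70%; only then is the targeted
# process (IE, per the 25% example above) deprioritized. Requires the
# third-party psutil package and Windows priority classes.
import time
import psutil

HOST_CPU_THRESHOLD = 70.0        # WSRM limits only engage above ~70% host CPU
TARGET_PROCESS = "iexplore.exe"  # process named in the sample allocation policy

def throttle_if_busy() -> None:
    if psutil.cpu_percent(interval=1) < HOST_CPU_THRESHOLD:
        return  # below the threshold the policy stays dormant
    for proc in psutil.process_iter(["name"]):
        if (proc.info["name"] or "").lower() == TARGET_PROCESS:
            try:
                proc.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)
            except (psutil.AccessDenied, psutil.NoSuchProcess):
                pass  # skip processes we cannot touch

if __name__ == "__main__":
    while True:
        throttle_if_busy()
        time.sleep(5)
```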


Another simpler option could be to use weighted remote sessions, categorizing users into basic, standard, and premium workloads to appropriately prioritize resources.

In the Server 2012 RC Add Roles and Features wizard, it is clearly called out that WSRM is now deprecated. Server 2012 will have the tool but the next server release in 4 years will not. Hopefully Microsoft has something up their sleeves to replace this tool or bolster the configurability of the fair sharing features.

References:
2012 list of deprecated features: http://technet.microsoft.com/en-us/library/hh831568.aspx
WSRM for Server 2012: http://technet.microsoft.com/library/hh997019
Server 2012 RC whitepaper: http://download.microsoft.com/download/5/D/B/5DB1C7BF-6286-4431-A244-438D4605DB1D/WS%202012%20White%20Paper_Hyper-V.pdf