Dell EMC ScaleIO – Scale and Performance for Citrix XenServer

You’ll be hearing a lot more about Dell EMC ScaleIO in the coming months and years. ScaleIO is a Software Defined Storage (SDS) solution from Dell EMC that boasts massive flexibility, scalability and performance potential. It can run on nearly any hardware configuration from any vendor, all-flash or spinning disk, and supports XenServer, Linux and VMware vSphere. Like most other SDS solutions, ScaleIO requires a minimum of three nodes to build a cluster, but unlike most others, it can scale to over 1000 nodes in a single contiguous cluster. ScaleIO uses a parallel IO technology that actively engages every disk in the cluster at all times to process IO, and this performance potential increases as you scale and add nodes with additional disks. In practical terms, every VM in the cluster can leverage the entire available performance profile of every disk in the cluster. Limits can be imposed, of course, to prevent any single consumer from draining all available IO, but the aggregate potential is immense.

 

Key Features

  • Tiering – Unlike most SDS offerings available, ScaleIO requires no tiering. For an all-flash configuration, every SSD in every node is completely usable. DAS Cache is required on hybrid models for write-back caching. It should also be noted that for hybrid configurations, each disk type must be assigned to separate storage pools and volumes. There is no mechanism to move data from SSD to HDD automatically.
  • Scalability – 3 nodes are required to get started and the cluster can scale to over 1000 nodes.
  • Flexibility – Software solution that supports all hardware, all major hypervisors. Able to be deployed hyper-converged or storage only.
  • Data protection – two-copy mirror mesh architecture eliminates single points of failure.
  • Enterprise features – QOS, thin provisioning, snapshotting.
  • Performance – Parallel IO architecture eliminates disk bottlenecks by using every disk available in the cluster 100% of the time. The bigger you scale, the more disks you have contributing capacity and parallel IO. This is the ScaleIO killer feature.

 

ScaleIO Architecture

ScaleIO can be configured in an all-combined Hyper-converged Infrastructure (HCI) model or in a traditional distributed storage model. The HCI variant installs all component roles on every node in the cluster, with any node able to host VMs. The distributed storage model installs the server pieces on dedicated infrastructure used for presenting storage only, while the client consumer pieces are installed on separate compute-only infrastructure. This model offers the ability to scale storage and compute completely independently if desired. For this post I’ll be focusing on the HCI model using Citrix XenServer. XenServer is a free-to-use open source hypervisor with a paid support option. It doesn’t get much easier than this: XenServer runs on each node and XenCenter runs on the admin device of your choosing, with no additional management infrastructure required! ScaleIO is licensed via a simple $/TB model but can also be used in trial mode.

 

The following are the primary components that comprise the ScaleIO architecture:

  • ScaleIO Data Client (SDC) – Block device driver that runs on the same server instance as the consuming application, in this case VMs. In the HCI model all nodes will run the SDC.
  • ScaleIO Data Server (SDS) – Not to be confused with the general industry term SDS, “SDS” in the ScaleIO context is a server role installed on nodes that contribute storage to the cluster. The SDS performs IOs at the request of the SDCs. In the HCI model all nodes will run the SDS role.
  • Metadata Manager (MDM) – Very important role! The MDM manages the device mappings, volumes, snapshots, storage capacity, errors and failures. MDM also monitors the state of the storage system and initiates rebalances or rebuilds as required.
    • The MDM communicates asynchronously with the SDC and SDS services via a separate data path so as not to affect their performance.
    • ScaleIO requires at least 3 instances of the MDM: Master, Slave and Tie-Breaker.
    • A maximum of 5 MDM roles can be installed within a single cluster: 3 x MDMs + 2 x tie-breakers.
    • Only one MDM Master is active at any time within a cluster. The other roles are passive unless a failure occurs.
  • ScaleIO Gateway – This role includes Installation Manager for initial setup as well as the REST Gateway and SNMP trap sender. The Gateway can be installed on the ScaleIO cluster nodes or an external management node.
    • If the gateway is installed on a Linux server, it can only be used to deploy ScaleIO to Linux hosts.
    • If the gateway is installed on a Windows server, it can be used to deploy ScaleIO to Linux or Windows hosts.
    • XenServer, based on CentOS, fully qualifies as a “Linux host”.

 

Once the ScaleIO cluster is configured, all disks on each host are assigned to the SDS local to that host. Finally, volumes are created and presented for consumption by applications within the cluster.

 

The diagram below illustrates the relationship of the SDC and SDS roles in the HCI configuration:

 

My Lab Environment:

  • 3 x Dell PowerEdge R630
    • Dual Intel E5-2698v4
    • 512GB RAM
    • 480GB Boot SSD
    • 4 x 400GB SSDs
    • 5 x 1TB (7.2K RPM)
    • 2 x 10Gb NICs
  • XenServer 6.5
  • ScaleIO 2.0
  • XenDesktop 7.11
  • I also used a separate Windows server on an old R610 for my ScaleIO Gateway/ Installation Manager

 

Here is the high-level architecture of my deployment. Note that there is a single compute volume mounted locally on each host via the XenServer pool, so it is depicted below as logical on each host:

 

Prep

As of this writing, XenServer 7.0 is shipping but 6.5 is the version currently supported by ScaleIO. The deployment steps for 7.0 should be very similar once it is officially supported. First install XenServer on each node and install XenCenter on your PC or management server, then add all nodes to XenCenter, create a new pool, and fully patch each node with the XenCenter integrated update utility. If the disks installed in your nodes have any prior formatting, it needs to be removed in the PERC BIOS or via the fdisk utility within XenServer.

Now would be a good time to increase the memory available to Dom0 to the maximum 4GB, especially if you plan to run more than 50 VMs per node. From an SSH session or local command shell, execute:

/opt/xensource/libexec/xen-cmdline --set-xen dom0_mem=4096M,max:4096M
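A reboot is required for the new Dom0 memory setting to take effect. As a quick sanity check afterwards (the boot config path is an assumption based on a standard XenServer 6.5 dom0), confirm the parameter stuck and that dom0 now reports roughly 4GB:

grep dom0_mem /boot/extlinux.conf
free -m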

 

Install the packages required for ScaleIO on each node: numactl and libaio. OpenSSL also needs to be updated to 1.0.1 by applying Hotfix XS65ESP1022 via the XenCenter update utility.
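A quick way to check the OpenSSL package level on a node before and after applying the hotfix (a generic RPM query, so treat the exact package names it returns as informational):

rpm -qa | grep -i openssl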

libaio should already be present, but before numactl can be added the yum repositories need to be enabled. Open the base repository configuration file:

vi /etc/yum.repos.d/CentOS-Base.repo

Enable the Base and released updates repositories by changing “enabled=0” to “enabled=1” in each section, then save the file (:wq).
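If you would rather not edit the file by hand, something like the following should work against the stock CentOS-Base.repo layout. The [base] and [updates] section names are assumptions, so verify the result afterwards with yum repolist:

sed -i '/^\[base\]/,/^\[/ s/^enabled=0/enabled=1/; /^\[updates\]/,/^\[/ s/^enabled=0/enabled=1/' /etc/yum.repos.d/CentOS-Base.repo
yum repolist enabled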

 

Next install numactl; libaio should report as already installed with nothing to do. Repeat this on each node.

yum install numactl
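To confirm both prerequisites are now in place on a node:

rpm -q numactl libaio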

 

ScaleIO Installation

Download all the required ScaleIO files: Gateway for Windows or Linux, as well as all the installation packages for ScaleIO (SDC, SDS, MDM, xcache, and lia). Install the ScaleIO Gateway files, either for Windows to do an external remote deployment, or for Linux to install the gateway on one of the XenServer nodes. For this installation I used an external Windows server to conduct my deployment. The ScaleIO Gateway installation for Windows has two prerequisites which must be installed first:

  • Java JRE
  • Microsoft Visual C++ 2010 x64 Redist

Next run the gateway MSI, which will create a local web instance used for the remainder of the setup process. Once complete, connect to https://localhost/login.jsp in the local browser and log in as admin with the password you specified during the Gateway setup.

 

Once logged in, browse to and upload the XenServer installation packages to the Installation Manager (installed with the Gateway).

 

Here you can see that I have the ScaleIO installation packages for Xen 6.5 specifically.

 

Once ready to install, you will be presented with the option either to upload an installation CSV file (easier for large deployments) or, if just getting started, to use the installation wizard for 3- or 5-node clusters. I will be selecting the 3-node wizard.

 

Specify the passwords for MDM and LIA, accept the EULA, then enter the IP addresses and passwords for the mandatory three MDM instances at the bottom; these IP addresses should be those of your physical XenServer nodes. Once all information is properly entered, the Start Installation button will become clickable. If you did not install the OpenSSL 1.0.1 hotfix earlier, this step will FAIL.

 

Several phases will follow (query, upload, install, configure) which should be initiated and monitored from the Monitor tab. You will be prompted to start the following phase assuming there were no failures during the current phase. Once all steps complete successfully, the operation will report as successful and can be marked as complete.

 

ScaleIO Configuration

Next install and then open the ScaleIO GUI on your mgmt server or workstation and connect to the master MDM node configured previously. The rest of the configuration steps will be carried out here.

 

First thing, from the Backend tab, rename your default system and Protection Domain names to something of your choosing. Then create a new storage pool or rename the default pool.

 

I’m naming my pools based on the disk media, flash and spinning. Create the pool within the Protection Domain, give it a name and select the caching options as appropriate.

 

Before we can build volumes we need to assign each disk to each SDS instance running on each node. First identify the disk device names on each host via SSH or local command shell by running:

fdisk -l

 

Each device will be listed as /dev/sdx. Right-click the first SDS on the first node and select Add Device. In the dialog that follows, add each local disk on this node, assign it to the appropriate pool and give it a meaningful, unique name. If your disk add operation fails here, you probably have previous formatting on your disks which needs to be removed first using fdisk or the PERC BIOS! You can add both SSD and HDD devices as part of this operation.
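If you do hit that failure, one way to clear old metadata is to zero out the start of the offending device. This is destructive and the device name below is only an example, so double-check which disk you are wiping before running anything like it:

dd if=/dev/zero of=/dev/sdb bs=1M count=10     # wipes the first 10MB (partition table and any stale metadata signatures)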

 

Once this has been successfully completed for each node, you will see all disks assigned to each SDS in the GUI along with the total capacity contributed by each. You’ll notice that the circular disk icons next to each line are empty, because the disks themselves are still empty at the moment.

 

Now keep in mind that ScaleIO is a mirror mesh architecture, so only half of that 4.4TB capacity is usable. From the Frontend tab, right-click the Storage Pool and create a new volume with this in mind. I’m using thick provisioning here for the sake of simplicity.

 

Once the volume is created, map it to each SDC which will make it available as a new disk device on each node. Right-click the volume you just created and select Map Volumes. Select all hosts then map volumes.

 

If you run fdisk -l now, you will see a new device called /dev/scinia. The final step required before we can use our new storage is mounting this new device on each host, which only needs to be done once if your hosts are configured in a pool. By default the XenServer LVM filter does not recognize our ScaleIO “scini” devices. Edit the LVM configuration file and add this device type as highlighted below. Pay particular attention to the text formatting here or LVM will continue to ignore your new volumes.

vi /etc/lvm/lvm.conf
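The screenshot from the original post isn’t reproduced here, but the entry belongs in the devices { } section of lvm.conf. A minimal sketch of what to add, based on the standard types = [ "name", max_partitions ] syntax (treat the exact line as an assumption and compare it against a known-good configuration):

devices {
    ...
    # allow LVM to scan ScaleIO block devices ("scini"), up to 16 partitions per device
    types = [ "scini", 16 ]
}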

 

 

Next, confirm that LVM can see the new ScaleIO device by running the following on each node:

lvmdiskscan

 

Now it’s time to create the XenServer Storage Repository (SR). Identify the UUIDs of the ScaleIO volumes presented to a host as well as the UUID of the host itself; just pick any host in the pool to work with. You will need the output of both of these commands when we create the SR.

ls -l /dev/disk/by-id | grep scini

xe host-list
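If those commands return more output than you want to wade through, xe supports trimming the output to specific fields, for example:

xe host-list params=uuid,name-label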

 

This next part is critical! Because ScaleIO presents storage as a block device, which XenServer consumes as an LVM-based SR, all thin provisioning support will be absent. This means that all VMs deployed in this environment will be full thick clones only. You can thin provision on the ScaleIO backend, but all blocks must still be allocated to your provisioned VMs. This may or may not work for your scenario, but for those of you paying attention, this knowledge is your reward. 🙂

Another very important consideration is that you need to change the “name-label” value for every SR you create. XenServer will allow you to create duplicate entries, leaving you to guess which is which later! Change the host-uuid, device-config and name-label values in the command below to match your environment.

xe sr-create content-type="ScaleIO" host-uuid=8ce515b8-bd42-4cac-9f76-b6456501ad12 type=LVM device-config:device=/dev/disk/by-id/scsi-emc-vol-6f560f045964776a-d35d155b00000000 shared=true name-label="SIO_FlashVol"

 

This command will give no response if successful and only an error if it wasn’t. Verify that the volumes are now present by looking at the Pool storage tab in XenCenter. Creating this SR on one host will enable access for all hosts within the pool.
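You can also confirm from the CLI using standard xe filtering (substitute whatever name-label you used above):

xe sr-list name-label="SIO_FlashVol" params=uuid,name-label,type,shared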

 

Now the disk resources are usable, and we can confirm this on the ScaleIO side by going back to the dashboard in the ScaleIO GUI. Here we can see the total amount of raw storage in this cluster (15.3TB); the spare amount is listed in blue (4.1TB), the thin yellow band is the volume capacity, and the thicker green band behind it shows the protected capacity. Because I’m using thick volumes here, I only have 1.4TB unused. The rest of the information displayed here should be self-explanatory.

 

ScaleIO & XenDesktop

Since we’re talking about Citrix here, I would be remiss if I didn’t mention how XenDesktop works in a ScaleIO environment. The setup is fairly straightforward and, since we’re using XenServer as the hypervisor, there is no middle management layer like vCenter or SCVMM. XenDesktop talks directly to XenServer for provisioning, which is awesome.

 

If running a hybrid configuration, you can separate OS and temporary files from PvDs, if desired. Here I’ve placed my PvDs on my less expensive spinning volume. It’s important to note that IntelliCache will not work with this architecture!

 

Select the VM network, which should be a bond of at least two NICs created within your XenServer pool. If you haven’t done this yet, now would be a good time.

 

Finish the setup process then build a machine catalog using a gold master VM that you configured and shut down previously. I’ve configured this catalog for 100 VMs with 2GB RAM and 10GB disk cache. The gold master VM has a disk size of 24GB. Here you can see my catalog being built and VMs are being created on each host in my pool.

 

This is where things get a bit…sticky. Because ScaleIO is a block device, XenServer does not support the use of thin provisioning, as mentioned earlier. What this means is that the disk-saving benefits of non-persistent VDI will be absent as well. Each disk image, PvD and temporary storage allocation for each VM in your catalog will consume its full assigned allotment. This may or may not be a deal breaker for you. The only way to get thin provisioning within XenServer and XenDesktop is by using shared storage presented as NFS or volumes presented as EXT3, which in this mix of ingredients applies to local disk only. In short, if you choose to deploy VDI on ScaleIO using XenServer, you will have to use thick clones for all VMs.

 

Performance

Lastly, a quick note on performance, which I tested using the Diskspd utility from Microsoft. You can get very granular with diskspd to model your workload, if you know its characteristics. I ran the following command for a 60-second run of fully random 4K IO weighted at 70% writes, using 4 threads with 2 outstanding IOs each against a 500MB test file, with caching left enabled and latency captured.

Diskspd.exe -b4k -d60 -L -o2 -t4 -r -w70 -c500M c:\io.dat

 

Here’s the real-time output of that command from the ScaleIO viewpoint to illustrate performance against the SSD tier. Keep in mind this is all being generated from a single VM, yet all disks in the entire cluster are active. You might notice the capacity layout is different in the screenshot below; I reconfigured my protection domain to use thin provisioning instead of thick when I ran this test.

 

Here is a look at the ScaleIO management component resource consumption during a diskspd test against the spinning tier, again generated from a single VM. Clearly the SDS role is the busiest, and this will be the case across all nodes in the cluster. The SDC doesn’t generate enough load to register in top, plus it changes its PID every second.

 

Closing

There are a lot of awesome HCI options in the market right now, many from Dell EMC. The questions you should be asking are: Does the HCI solution support the hypervisor I want to run? Does the HCI solution provide the hardware flexibility I need? And is the HCI solution cost effective? ScaleIO might be perfect for your use case, or maybe one of our solutions based on Nutanix or VSAN might be better suited. ScaleIO and XenServer can provide a massively scalable, massively flexible solution at a massively competitive price point. Keep in mind the rules on usable disk and thin provisioning when you go through your sizing exercise.

 

 

Resources

XenServer Hotfixes

Configure Dom0 memory

ScaleIO Installation Guide

ScaleIO User Guide

Dell XC Series 2.0: Product Architectures

Following our launch of the new 13G-based XC series platform, I present our product architectures for the VDI-specific use cases. Of the platforms available, this use case is focused on the extremely powerful 1U XC630 with Haswell CPUs and 2133MHz RAM. We offer these appliances on both Server 2012 R2 Hyper-V and vSphere 5.5 U2 with Citrix XenDesktop, VMware Horizon View, or Dell vWorkspace.  All platform architectures have been optimized, configured for best performance and documented.

Platforms

We have three platforms to choose from, optimized around cost and performance, all being ultimately flexible should specific parameters need to change. The A5 model is the most cost effective, leveraging 8-core CPUs, 256GB RAM, 2 x 200GB SSDs for performance and 4 x 1TB HDDs for capacity. For POCs, small deployments or light application virtualization, this platform is well suited. The B5 model steps up the performance by adding four cores per socket, increasing the RAM density to 384GB and doubling the performance tier to 2 x 400GB SSDs. This platform will provide the best bang for the buck on medium-density deployments of light or medium-level workloads. The B7 is the top of the line, offering 16-core CPUs and a higher capacity tier of 6 x 1TB HDDs. For deployments requiring maximum density of knowledge or power user workloads, this is the platform for you.
image
At 1U with dual CPUs, 24 DIMM slots and 10 drive bays…loads of potential and flexibility!
image

Solution Architecture

Utilizing 3 platform hardware configurations, we are offering 3 VDI solutions on 2 hypervisors, so there is a lot of flexibility and many options. A 3-node cluster minimum is required, with every node containing a Nutanix Controller VM (CVM) to handle all IO. The SATADOM is present for boot responsibilities, hosting the hypervisor as well as the initial setup of the Nutanix Home area. The SSDs and NL-SAS disks are passed through directly to each CVM, which straddles the hypervisor and hardware. Every CVM contributes its directly-attached disks to the storage pool, which is stretched across all nodes in the Nutanix Distributed File System (NDFS) cluster. NDFS is not dependent on the hypervisor for HA or clustering. Hyper-V cluster storage pools are presented to the hosts as SMB version 3, and vSphere clusters are presented with NFS. Containers can be created within the storage pool to logically separate VMs based on function, and they also provide isolation of configurable storage characteristics in the form of dedupe and compression. In other words, you can enable compression on your management VMs within their dedicated container, but not on your VDI desktops, also within their own container. The namespace is presented to the cluster in the form of \\NDFS_Cluster_name\container_name.
The first solution I’ll cover is Dell’s Wyse vWorkspace, which supports either 2012 R2 Hyper-V or vSphere 5.5. For small deployments or POCs we offer this solution in a “floating mgmt” architecture which combines the vWorkspace infrastructure management roles and VDI desktops or shared session VMs. vWorkspace with Hyper-V enables a special technology for non-persistent/ shared image desktops called Hyper-V Catalyst, which includes 2 components: HyperCache and HyperDeploy. Hyper-V Catalyst provides some incredible performance boosts and requires that the vWorkspace infrastructure components communicate directly with the Hyper-V hypervisor. This also means that vWorkspace does not require SCVMM to perform provisioning tasks for non-persistent desktops!

  • HyperCache – Provides virtual desktop performance enhancement by caching parent VHDs in host RAM. Read requests are satisfied from cache including requests from all subsequent child VMs.
  • HyperDeploy – Provides instant cloning of parent VHDs massively diminishing virtual desktop pool deployment times.

You’ll notice the HyperCache components included in the Hyper-V architectures below. The floating management model runs 3 to 6 hosts, depicted below with management, desktop and RDSH VMs logically separated by function only from a storage container perspective. The recommendation of 3-7 RDSH VMs is based on our work optimizing around NUMA boundaries; I’ll dive deeper into that in an upcoming post. The B7 platform is used in the architectures below.
image
Above ~1000 users we recommend the traditional distributed management architecture to enable more predictable scaling and performance of both the compute and management hosts. The basic architecture remains the same and scales to the full extent supported by the hypervisor, in this case Hyper-V, which supports up to 64 hosts. NDFS does not have a scaling limitation, so several hypervisor clusters can be built within a single contiguous NDFS namespace. Our recommendation is to build independent Failover Clusters for compute and management so each can scale up or out independently.
The architecture below depicts a B7 build on Hyper-V applicable to Citrix XenDesktop or Wyse vWorkspace.
image

This architecture is relatively similar for Wyse vWorkspace or VMware Horizon View on vSphere 5.5 U2, but with fewer total compute hosts per HA cluster (32). For vWorkspace, Hyper-V Catalyst is not present in this scenario, so vCenter is required to perform desktop provisioning tasks.
image
For the storage containers, the best practice of less is more still stands. If you don’t need a particular feature don’t enable it, as it will consume additional resources. Deduplication is always recommended on the performance tier since the primary OpLog lives on SSD and will always benefit. Dedupe or compression on the capacity tier is not recommended unless you absolutely need it, and if you do, prepare to increase each CVM RAM allocation to 32GB.

Container | Purpose | Replication Factor | Perf Tier Deduplication | Capacity Tier Deduplication | Compression
Ds_compute | Desktop VMs | 2 | Enabled | Disabled | Disabled
Ds_mgmt | Mgmt Infra VMs | 2 | Enabled | Disabled | Disabled
Ds_rdsh | RDSH Server VMs | 2 | Enabled | Disabled | Disabled

Network Architecture

As a hyperconverged appliance, the network architecture leverages the converged model. A pair of 10Gb NICs minimum in each node handles all traffic for the hypervisor, guests and storage operations between CVMs. Remember that the storage of all VMs is kept local to the host on which the VM resides, so the only traffic that will traverse the network is LAN and replication. There is no need to isolate storage protocol traffic when using Nutanix.
Hyper-V and vSphere are functionally similar. For Hyper-V there are 2 vSwitches per host: one external vSwitch that aggregates all of the services of the host management OS as well as the vNICs for the connected VMs, and one private internal vSwitch to which the CVM alone connects so it can communicate with the hypervisor. The 10Gb NICs are connected to an LBFO team configured in Dynamic mode.
image
In vSphere it’s the same story but with the concept of Port Groups and vMotion.
image
We have tested the various configurations per our standard processes and documented the performance results which can be found in the link below. These docs will be updated as we validate additional configurations.

Product Architectures for 13G XC launch:

Resources:

About Wyse vWorkspace HyperCache
About Wyse vWorkspace HyperDeploy

Dell XC Series – Product Architectures

Hyperconverged Web-scale Software Defined Storage (SDS) solutions are white hot right now and Nutanix is leading the pack with their ability to support all major hypervisors (vSphere, Hyper-V and KVM) while providing nearly unlimited scale. Dell partnering with Nutanix was an obvious, mutually beneficial choice for the reasons above, plus Dell supplies a much more robust server platform. Dell also provides global reach for services and support as well as solving other challenges such as hypervisors installed in the factory.

Nutanix operates below the hypervisor layer and as a result requires a lot of tightly coupled interaction directly with the hardware. Many competing platforms in this space sit above the hypervisor, so they require vSphere, for example, to provide access to storage and HA, but they are also constrained by the hypervisor’s limitations (scale). Nutanix uses its own clustering algorithm and doesn’t rely on a common transactional database, which can otherwise cause additional challenges when building solutions that span multiple sites. Because of this, the Nutanix Distributed Filesystem (NDFS) has no known limits of scale. There are current Nutanix installations in the thousands of nodes across a contiguous namespace and now you can build them on Dell hardware.

Along with the Dell XC720xd appliances, we have released a number of complementary workload Product Architectures to help customers and partners build solutions using these new platforms. I’ll discuss the primary architectural elements below.

Wyse Datacenter Appliance Architecture for Citrix

Wyse Datacenter Appliance Architecture for VMware

Wyse Datacenter Appliance Architecture for vWorkspace

 

Nutanix Architecture

Three nodes minimum are required for NDFS to achieve quorum, so that is the minimum solution buy-in; storage and compute capacity can then be increased incrementally by adding one or more nodes to an existing cluster. The Nutanix architecture uses a Controller VM (CVM) on each host which participates in the NDFS cluster and manages the hard disks local to its own host. Each host requires two tiers of storage: high performance/ SSD and capacity/ HDD. The CVM manages the reads/writes on each host and automatically tiers the IO across these disks. A key value proposition of the Nutanix model is data locality, which means that the data for a given VM running on a given host is stored locally on that host as opposed to having reads and writes cross the network. This model scales indefinitely in a linear block manner where you simply buy and add capacity as you need it. Nutanix creates a storage pool that is distributed across all hosts in the cluster and presents this pool back to the hypervisor as NFS or SMB.

You can see from the figure below that the CVM engages directly with the SCSI controller through which it accesses the disks local to the host it resides. Since Nutanix sits below the hypervisor and handles its own clustering and data HA, it is not dependent upon the hypervisor to provide any features nor is it limited by any related limitations.

image

From a storage management and feature perspective, Nutanix provides two tiers of optional deduplication performed locally on each host (SSD and HDD individually), compression, tunable replication (number of copies of each write spread across disparate nodes in the cluster) and data locality (keeps data local to the node the VM lives on). Within a storage pool, containers are created to logically group VMs stored within the namespace and enable specific storage features such as replication factor and dedupe. Best practice says that a single storage pool spread across all disks is sufficient but multiple containers can be used. The image below shows an example large scale XC-based cluster with a single storage pool and multiple containers.

image

While the Nutanix architecture can theoretically scale indefinitely, practicality might dictate that you design your clusters around the boundaries of the hypervisors: 32 nodes for vSphere, 64 nodes for Hyper-V. The decision to do this will be more financially impactful if you separate your resources along the lines of compute and management in distinct SDS clusters. You could also, optionally, install many maximum-node hypervisor clusters within a single very large, contiguous Nutanix namespace, which is fully supported. I’ll discuss the former option below as part of our recommended pod architecture.

Dell XC720xd platforms

For our phase 1 launch we have five platforms to offer that vary in CPU, RAM and size/ quantity of disks. Each appliance is 2U, based on the 3.5” 12th-gen PowerEdge R720xd, and supports from 5 to 12 total disks per node in a mix of SSD and HDD. The A5 platform is the smallest with a pair of 6-core CPUs, 200GB SSDs and a recommended 256GB RAM. The B5 and B7 models are almost identical except for the 8-core CPU on the B5 and the 10-core CPU on the B7. The C5 and C7 boast a slightly higher clocked 10-core CPU with doubled SSD densities and 4-5x more in the capacity tier. The suggested workloads are specific, with the first three targeted at VDI customers. If greater capacity is required, the C5 and C7 models work very well for this purpose too.

image

For workload to platform sizing guidance, we make the following recommendations:

Platform | Workload | Special Considerations
A5 | Basic/ light task users, app virt | Be mindful of limited CPU cores and RAM densities
B5 | Medium knowledge workers | Additional 4 cores and greater RAM to host more VMs or sessions
B7 | Heavy power users | 20 cores per node + a recommended 512GB RAM to minimize oversubscription
C5 | Heavy power users | Higher density SSDs + 20TB in the capacity tier for large VMs or amount of user data
C7 | Heavy power users | Increased number of SSDs with larger capacity for greater amount of T1 performance

Here is a view of the 12G-based platform representing the A5-B7 models; the C5 and C7 would add additional disks in the second disk bay. The two disks in the rear flexbay are 160GB SSDs configured in RAID1 via PERC, used to host the hypervisor and CVM; these disks do not participate in the storage pool. The six disks in front are controlled by the CVM directly via the LSI controller and contribute to the distributed storage pool across all nodes.

image

Dell XC Pod Architecture

This being a 10Gb hyperconverged architecture, the leaf/ spine network model is recommended. We do recommend a 1Gb switch stack for iDRAC/ IPMI traffic and build the leaf layer from 10Gb Force10 parts. The S4810, recommended for SFP+ based platforms, is shown in the graphic below; the S4820T can be used for 10GBase-T.

image

In our XC series product architecture, the compute, management and storage layers, typically all separated, are combined here into a single appliance. For solutions based on vWorkspace under 10 nodes, we recommend a “floating management” architecture which allows the server infrastructure VMs to move between hosts also being used for desktop VMs or RDSH sessions. You’ll notice in the graphics below that compute and management are combined into a single hypervisor cluster which hosts both of these functions.

Hyper-V is shown below which means the CVMs present the SMBv3 protocol to the storage pool. We recommend three basic containers to separate infrastructure mgmt, desktop VMs and RDSH VMs. We recommend the following feature attributes based on these three containers (It is not supported to enable compression and deduplication on the same container):

Container | Purpose | Replication Factor | Perf Tier Deduplication | Capacity Tier Deduplication | Compression
Ds_compute | Desktop VMs | 2 | Enabled | Enabled | Disabled
Ds_mgmt | Mgmt Infra VMs | 2 | Enabled | Disabled | Disabled
Ds_rdsh | RDSH Server VMs | 2 | Enabled | Enabled | Disabled

You’ll notice that I’ve included the resource requirements for the Nutanix CVMs (8 x vCPUs, 32GB vRAM). The vRAM allocation can vary depending on the features you enable within your SDS cluster. 32GB is required, for example, if you intend to enable both SSD and HDD deduplication. If you only require SSD deduplication and leave the HDD tier turned off, you can reduce your CVM vRAM allocation to 16GB. We highly recommend that you disable any features that you do not need or do not intend to use!

image

For vWorkspace solutions over 1000 users or solutions based on VMware Horizon or Citrix XenDesktop, we recommend separating the infrastructure management in all cases. This allows management infrastructure to run in its own dedicated hypervisor cluster while providing very clear and predictable compute capacity for the compute cluster. The graphic below depicts a 1000-6000 user architecture based on vWorkspace on Hyper-V. Notice that the SMB namespace is stretched across both of the discrete compute and management infrastructure clusters, each scaling independently. You could optionally build dedicated SDS clusters for compute and management if you desire, but remember the three node minimum, which would raise your minimum build to 6 nodes in this scenario.

image

XenDesktop on vSphere, up to 32 nodes max per cluster, supporting around 2500 users in this architecture:

image

Horizon View on vSphere, up to 32 nodes max per cluster, supporting around 1700 users in this architecture:

image

Network Architecture

Following the leaf/ spine model, each node should connect 2 x 10Gb ports to a leaf switch which are then fully mesh connected to an upstream set of spine switches.

image

On each host there are two virtual switches: one for external access to the LAN and internode communication and one private internal vSwitch used for the CVM alone. On Hyper-V the two NICs are configured in a LBFO team on each host with all management OS vNICs connected to it.

image

vSphere follows the same basic model except for port groups configured for the VM type and VMKernel ports configured for host functions:

image

Performance results

The tables below summarize the user densities observed for each platform during our testing. Please refer to the product architectures linked at the beginning of this post for the detailed performance results for each solution.

image

image

Resources:

http://en.community.dell.com/dell-blogs/direct2dell/b/direct2dell/archive/2014/11/05/dell-world-two-questions-cloud-client-computing

http://blogs.citrix.com/2014/11/07/dell-launches-new-appliance-solution-for-desktop-virtualization/

http://blogs.citrix.com/2014/11/10/xendesktop-technologies-introduced-as-a-new-dell-wyse-datacenter-appliance-architecture/

http://blogs.vmware.com/euc/2014/11/vmware-horizon-6-dell-xc-delivering-new-economics-simplicity-desktop-application-virtualization.html

http://stevenpoitras.com/the-nutanix-bible/

http://www.dell.com/us/business/p/dell-xc-series/pd

Citrix XenDesktop and PVS: A Write Cache Performance Study

image
If you’re unfamiliar, PVS (Citrix Provisioning Server) is a vDisk deployment mechanism available for use within a XenDesktop or XenApp environment that uses streaming for image delivery. Shared read-only vDisks are streamed to virtual or physical targets on which users can access random pooled or static desktop sessions. Random desktops are reset to a pristine state between logoffs, while users requiring static desktops have their changes persisted within a Personal vDisk pinned to their own desktop VM. Any changes that occur within the duration of a user session are captured in a write cache. This is where the performance-demanding write IOs occur and where PVS offers a great deal of flexibility as to where those writes can occur. Write cache destination options are defined via PVS vDisk access modes, which can dramatically change the performance characteristics of your VDI deployment. While PVS does add a degree of complexity to the overall architecture, since its own infrastructure is required, it is worth considering because it can reduce the amount of physical computing horsepower required for your VDI desktop hosts. The following diagram illustrates the relationship of PVS to Machine Creation Services (MCS) in the larger architectural context of XenDesktop. Keep in mind also that PVS is frequently used to deploy XenApp servers as well.
image
PVS 7.1 supports the following write cache destination options (from Link):

  • Cache on device hard drive – Write cache can exist as a file in NTFS format, located on the target-device’s hard drive. This write cache option frees up the Provisioning Server since it does not have to process write requests and does not have the finite limitation of RAM.
  • Cache on device hard drive persisted (experimental phase only) – The same as Cache on device hard drive, except cache persists. At this time, this write cache method is an experimental feature only, and is only supported for NT6.1 or later (Windows 7 and Windows 2008 R2 and later).
  • Cache in device RAM – Write cache can exist as a temporary file in the target device’s RAM. This provides the fastest method of disk access since memory access is always faster than disk access.
  • Cache in device RAM with overflow on hard disk – When RAM is zero, the target device write cache is only written to the local disk. When RAM is not zero, the target device write cache is written to RAM first.
  • Cache on a server – Write cache can exist as a temporary file on a Provisioning Server. In this configuration, all writes are handled by the Provisioning Server, which can increase disk IO and network traffic.
  • Cache on server persistent – This cache option allows for the saving of changes between reboots. Using this option, after rebooting, a target device is able to retrieve changes made from previous sessions that differ from the read only vDisk image.

Many of these were available in previous versions of PVS, including cache to RAM, but what makes v7.1 more interesting is the ability to cache to RAM with overflow to HDD. This provides the best of both worlds: extreme RAM-based IO performance without the risk, since you can now overflow to HDD if the RAM cache fills. Previously you had to be very careful to ensure your RAM cache didn’t fill completely as that could result in catastrophe. Granted, if the need to overflow does occur, affected user VMs will be at the mercy of your available HDD performance capabilities, but this is still better than the alternative (BSOD).

Results

Even when caching directly to HDD, PVS shows lower IOPS/ user numbers than MCS does on the same hardware. We decided to take things a step further by testing a number of different caching options. We ran tests on both Hyper-V and ESXi using our standard 3 user VM profiles against LoginVSI’s low, medium, high workloads. For reference, below are the standard user VM profiles we use in all Dell Wyse Datacenter enterprise solutions:

Profile Name | Number of vCPUs per Virtual Desktop | Nominal RAM (GB) per Virtual Desktop | Use Case
Standard | 1 | 2 | Task Worker
Enhanced | 2 | 3 | Knowledge Worker
Professional | 2 | 4 | Power User

We tested three write caching options across all user and workload types: cache on device HDD, RAM + Overflow (256MB) and RAM + Overflow (512MB). Doubling the amount of RAM cache on the more intensive workloads paid off big, reducing host disk IOPS to nearly zero. That’s almost 100% of user-generated IO absorbed completely by RAM. We didn’t capture the IOPS generated in RAM here using PVS, but as the fastest medium available in the server, and from previous work done with other in-RAM technologies, I can tell you that 1600MHz RAM is capable of tens of thousands of IOPS per host. We also tested thin vs thick provisioning using our high-end profile when caching to HDD, just for grins. Ironically, thin provisioning outperformed thick for ESXi, while the opposite proved true for Hyper-V. To achieve these impressive IOPS numbers on ESXi it is important to enable intermediate buffering (see links at the bottom). The RAM + overflow rows below are the ones to note. Note: IOPS per user below indicates IOPS generation as observed at the disk layer of the compute host; a near-zero value does not mean these sessions generated almost no IO, only that almost none of it reached disk.

Hypervisor | PVS Cache Type | Workload | Density | Avg CPU % | Avg Mem Usage (GB) | Avg IOPS/User | Avg Net KBps/User
ESXi | Device HDD only | Standard | 170 | 95% | 1.2 | 5 | 109
ESXi | 256MB RAM + Overflow | Standard | 170 | 76% | 1.5 | 0.4 | 113
ESXi | 512MB RAM + Overflow | Standard | 170 | 77% | 1.5 | 0.3 | 124
ESXi | Device HDD only | Enhanced | 110 | 86% | 2.1 | 8 | 275
ESXi | 256MB RAM + Overflow | Enhanced | 110 | 72% | 2.2 | 1.2 | 284
ESXi | 512MB RAM + Overflow | Enhanced | 110 | 73% | 2.2 | 0.2 | 286
ESXi | HDD only, thin provisioned | Professional | 90 | 75% | 2.5 | 9.1 | 250
ESXi | HDD only, thick provisioned | Professional | 90 | 79% | 2.6 | 11.7 | 272
ESXi | 256MB RAM + Overflow | Professional | 90 | 61% | 2.6 | 1.9 | 255
ESXi | 512MB RAM + Overflow | Professional | 90 | 64% | 2.7 | 0.3 | 272

For Hyper-V we observed a similar story, but did not enable intermediate buffering at the recommendation of Citrix. This is important! Citrix strongly recommends not using intermediate buffering on Hyper-V as it degrades performance. Most other numbers are well in line with the ESXi results, save for the cache-to-HDD numbers being slightly higher.

Hypervisor | PVS Cache Type | Workload | Density | Avg CPU % | Avg Mem Usage (GB) | Avg IOPS/User | Avg Net KBps/User
Hyper-V | Device HDD only | Standard | 170 | 92% | 1.3 | 5.2 | 121
Hyper-V | 256MB RAM + Overflow | Standard | 170 | 78% | 1.5 | 0.3 | 104
Hyper-V | 512MB RAM + Overflow | Standard | 170 | 78% | 1.5 | 0.2 | 110
Hyper-V | Device HDD only | Enhanced | 110 | 85% | 1.7 | 9.3 | 323
Hyper-V | 256MB RAM + Overflow | Enhanced | 110 | 80% | 2 | 0.8 | 275
Hyper-V | 512MB RAM + Overflow | Enhanced | 110 | 81% | 2.1 | 0.4 | 273
Hyper-V | HDD only, thin provisioned | Professional | 90 | 80% | 2.2 | 12.3 | 306
Hyper-V | HDD only, thick provisioned | Professional | 90 | 80% | 2.2 | 10.5 | 308
Hyper-V | 256MB RAM + Overflow | Professional | 90 | 80% | 2.5 | 2.0 | 294
Hyper-V | 512MB RAM + Overflow | Professional | 90 | 79% | 2.7 | 1.4 | 294

Implications

So what does it all mean? If you’re already a PVS customer this is a no-brainer: upgrade to v7.1 and turn on “cache in device RAM with overflow on hard disk” now. Your storage subsystems will thank you. The benefits are clear on ESXi and Hyper-V alike. If you’re deploying XenDesktop soon and debating MCS vs PVS, this is a very strong mark in the “pro” column for PVS. The fact of life in VDI is that we always run out of CPU first, but that doesn’t mean we get to ignore or undersize IO performance, as that’s important too. Enabling RAM to absorb the vast majority of user write cache IO allows us to stretch our HDD subsystems even further, since their burden is diminished. Cut your local disk costs by two thirds or stretch those shared arrays two or three times further. PVS cache in RAM + overflow allows you to design your storage around capacity requirements with less need to overprovision spindles just to meet IO demands (and waste capacity in the process).
References:
DWD Enterprise Reference Architecture
http://support.citrix.com/proddocs/topic/provisioning-7/pvs-technology-overview-write-cache-intro.html
When to Enable Intermediate Buffering for Local Hard Drive Cache

XenApp 7.x Architecture and Sizing

image

Peter Fine here from Dell CCC Solution Engineering, where we just finished an extensive refresh of our XenApp recommendation within the Dell Wyse Datacenter for Citrix solution architecture. Although not called “XenApp” in XenDesktop versions 7 and 7.1 of the product, the name has returned officially for version 7.5. XenApp is still architecturally linked to XenDesktop from a management infrastructure perspective but can also be deployed as a standalone architecture from a compute perspective. The best part of all now is flexibility. Start with XenApp or start with XenDesktop, then seamlessly integrate the other at a later time with no disruption to your environment. All XenApp really is now is a Windows Server OS running the Citrix Virtual Delivery Agent (VDA). That’s it! XenDesktop, on the other hand, is a Windows desktop OS running the VDA.

Architecture

The logical architecture depicted below displays the relationship between the two use cases outlined in red. All of the infrastructure that controls the brokering, licensing, etc. is the same between them. This simplification of architecture comes as a result of XenApp shifting from the legacy Independent Management Architecture (IMA) to XenDesktop’s FlexCast Management Architecture (FMA). It just makes sense and we are very happy to see Citrix make this move. You can read more about the individual service components of XenDesktop/ XenApp here.

image

Expanding the architectural view to include the physical and communication elements, XenApp fits quite nicely with XenDesktop and complements any VDI deployment. For simplicity, I recommend using compute hosts dedicated to XenApp and XenDesktop, respectively, for simpler scaling and sizing. Below you can see the physical management and compute hosts on the far left side with each of their respective components considered within. Management will remain the same regardless of what type of compute host you ultimately deploy, but there are several different deployment options. Tier 1 and Tier 2 storage are handled the same way when XenApp is in play, and can make use of local or shared disk depending on your requirements. XenApp also integrates nicely with PVS, which can be used for deployment and easy scale-out scenarios. I have another post queued up for PVS sizing in XenDesktop.

image

From a stack view perspective, XenApp fits seamlessly into an existing XenDesktop architecture or can be deployed into a dedicated stack. Below is a view of a Dell Wyse Datacenter stack tailored for XenApp running on either vSphere or Hyper-V using local disks for Tier 1. XenDesktop slips easily into the compute layer here with our optimized host configuration. Be mindful of the upper scale when utilizing a single management stack, as 10K users and above is generally considered very large for a single farm. The important point to note is that the network, mgmt and storage layers are completely interchangeable between XenDesktop and XenApp. Only the host config in the compute layer changes slightly for XenApp-enabled hosts based on our optimized configuration.

image

Use Cases

There are a number of use cases for XenApp, which ultimately relies on Windows Server’s RDSH role (terminal services). The age-old and most obvious use case is hosted shared sessions, i.e. many users logging into and sharing the same Windows Server instance via RDP. This is useful for managing access to legacy apps, providing a remote access/ VPN alternative, or controlling access to an environment which can only be reached via the XenApp servers. The next step up naturally extends to application virtualization where, instead of multiple users being presented with and working from a full desktop, they simply launch the applications they need to use from another device. These virtualized apps, of course, consume a full shared session on the backend even though the user only interacts with a single application. Either scenario can now be deployed easily via Delivery Groups in Citrix Studio.

image

XenApp also complements full XenDesktop VDI through the use of application off-load. It is entirely viable to load every application a user might need within their desktop VM, but this comes at a performance and management cost. Every VDI user on a given compute host will have a percentage of allocated resources consumed by running these applications, which all have to be kept up to date and patched unless part of the base image. Leveraging XenApp with XenDesktop provides the ability to off-load applications and their loads from the VDI sessions to the XenApp hosts. Let XenApp absorb those burdens for the applications that make sense. Now instead of running MS Office in every VM, run it from XenApp and publish it to your VDI users. Patch it in one place, shrink your gold images for XenDesktop and free up resources for other more intensive, non-XenApp-friendly apps you really need to run locally. Best of all, your users won’t be able to tell the difference!

image

Optimization

We performed a number of tests to identify the optimal configuration for XenApp. There are a number of ways to go here: physical, virtual, or PVS streamed to physical/ virtual using a variety of caching options. There are also a number of ways in which XenApp can be optimized. Citrix wrote a very good blog article covering many of these optimization options, most of which we confirmed. The one outlier turned out to be NUMA, where we really didn’t see much difference with it turned on or off. We ran through the following test scenarios using the core DWD architecture with LoginVSI light and medium workloads for both vSphere and Hyper-V:

  • Virtual XenApp server optimization on both vSphere and Hyper-V to discover the right mix of vCPUs, oversubscription, RAM and total number of VMs per host
  • Physical Windows 2012 R2 host running XenApp
  • The performance impact and benefit of NUMA enabled to keep the RAM accessed by a CPU local to its adjacent DIMM bank.
  • The performance impact of various provisioning mechanisms for VMs: MCS, PVS write cache to disk, PVS write cache to RAM
  • The performance impact of an increased user idle time to simulate a less than 80+% concurrency of user activity on any given host.

To identify the best XenApp VM config we tried a number of configurations, including a mix of 1.5x CPU core oversubscription, fewer very beefy VMs and many less beefy VMs. It is important to note that we based this on the 10-core Ivy Bridge part, the E5-2690v2, which features Hyper-Threading and Turbo Boost. These things matter! The highest density and best user experience came with 6 x VMs, each outfitted with 5 x vCPUs and 16GB RAM. Of the delivery methods we tried (outlined in the table below), Hyper-V netted the best results regardless of provisioning methodology. We did not see a density difference between PVS caching methods, but PVS cache in RAM completely removed any IOPS generated against the local disk. I’ll go more into PVS caching methods and results in another post.

Interestingly, of all the scenarios we tested, the native Server 2012 R2 + XenApp combination performed the poorest. PVS streamed to a physical host is another matter entirely, but unfortunately we did not test that scenario. We also saw no benefit from enabling NUMA. There was a time when a CPU accessing an adjacent CPU’s remote memory banks across the interconnect paths hampered performance, but given the current architecture in Ivy Bridge and its fat QPIs, this doesn’t appear to be a problem any longer.

The “Dell Light” workload below was adjusted to account for less than 80% user concurrency, which is what we typically plan for in traditional VDI. Citrix observed that XenApp users in the real world tend not to work all at the same time. Fewer users working concurrently means freed resources and the opportunity to run more total users on a given compute host.

The net of this study shows that the hypervisor and XenApp VM configuration matter more than the delivery mechanism. MCS and PVS ultimately netted the same performance results but PVS can be used to solve other specific problems if you have them (IOPS).

image

* CPU % for ESX Hosts was adjusted to account for the fact that Intel E5-2600v2 series processors with the Turbo Boost feature enabled will exceed the ESXi host CPU metrics of 100% utilization. With E5-2690v2 CPUs the rated 100% in ESXi is 60000 MHz of usage, while actual usage with Turbo has been seen to reach 67000 MHz in some cases. The Adjusted CPU % Usage is based on 100% = 66000 MHz usage and is used in all charts for ESXi to account for Turbo Boost. Windows Hyper-V metrics by comparison do not report usage in MHz, so only the reported CPU % usage is used in those cases.

** The “Dell Light” workload is a modified VSI workload to represent a significantly lighter type of user. In this case the workload was modified to produce about 50% idle time.

†Avg IOPS observed on disk is 0 because it is offloaded to RAM.

Summary of configuration recommendations:

  • Enable Hyper-Threading and Turbo for oversubscribed performance gains.
  • NUMA did not show to have a tremendous impact enabled or disabled.
  • 1.5x CPU oversubscription per host produced excellent results. 20 physical cores x 1.5 oversubscription netting 30 logical vCPUs assigned to VMs.
  • Virtual XenApp servers outperform dedicated physical hosts with no hypervisor so we recommend virtualized XenApp instances.
  • Using 10-Core Ivy Bridge CPUs, we recommend running 6 x XenApp VMs per host, each VM assigned 5 x vCPUs and 16GB RAM.
  • PVS cache in RAM (with HD overflow) will reduce the user IO generated to disk to almost nothing but may require greater RAM densities on the compute hosts. 256GB is a safe high-water mark using PVS cache in RAM, based on a 21GB cache per XenApp VM.

Resources:

Dell Wyse Datacenter for Citrix – Reference Architecture

XenApp/ XenDesktop Core Concepts

Citrix Blogs – XenApp Scalability

Dell Wyse Datacenter for Citrix XenDesktop

Dell DVS Enterprise is now Dell Wyse Datacenter (DWD) and with the brand change we are proud to announce the 6th release of our flagship set of enterprise VDI solutions for Citrix XenDesktop. Dell Wyse Datacenter is the only true end-to-end (E2E) VDI solution suite in the industry, providing best-of-breed hardware and software for any size enterprise. What makes Dell different is that we carefully vet, craft and test our solutions using real-world requirements and workloads, making it easier for customers to adopt our solutions. Dell Wyse Datacenter includes a number of platform and scaling options to meet almost any need: rack, blade, local disk, iSCSI, fiber channel, low cost, high performance, high scale…we have a validated and proven answer to solve your VDI business problem.

The full DWD reference architectures for Citrix can be found here:

Core RA

VRTX RA

Graphics RA

New in DWD 6th release:

  • Citrix XenDesktop 7.1 support
  • Support for Dell EqualLogic PS6210 series
  • Dell Wyse Cloud Client updates
  • High performance SAN-less blade offering
  • Support for LoginVSI 4
  • High IOPS persistent desktops using Atlantis ILIO: Link
  • Citrix vGPU support for graphical acceleration

Citrix XenDesktop 7.1

Architecturally, Citrix XenDesktop has never been simpler. Yes, this is still a potentially very large scale distributed architecture that solves a complicated problem, but the moving parts have been greatly simplified. One of the most important architectural shifts involves the way XenApp is integrated within a XenDesktop infrastructure. Now the management infrastructures are combined, and all that separates shared hosted sessions from dedicated desktops is where the Virtual Delivery Agent (VDA) is installed. All “XenApp” really means now is that the VDA is installed on an RDSH host, physical or virtual. That’s it! Otherwise, Machine Creation Services (MCS) performance is getting closer to Provisioning Services (PVS) and catalogs/ delivery groups have never been easier to manage.

image

 

Support for Dell EqualLogic PS6210 series

Dell EqualLogic has begun shipping the new and improved PS6210 series which most notably provides the DWD Tier 1 iSCSI hybrid array: PS6210XS. Not only are the Base-T and SFP controller ports doubled but so is the performance! We have literally seen a 100% performance improvement on the PS6210XS hosting VDI desktops which will stretch your shared storage dollar and users per rack unit even further.

image

 

Dell Wyse Cloud Client updates

DWD E2E solutions provide everything required to design, implement and manage industry-leading VDI infrastructures. This of course includes one of the most important aspects, how the end user actually interfaces with the environment: the client. DWD for Citrix XenDesktop includes Citrix Ready clients to meet any organizational need: ThinOS, Windows Embedded, Linux, dual/quad core, BYOD, HDMI, tablet and Chromebook.

One of the most exciting new clients from Dell Wyse is the Cloud Connect. The Cloud Connect turns any HDMI-enabled TV or monitor into a thin client through which to access your VDI desktop. Portable and high performing!

image

 

High performance SAN-less blade offering

New in DWD is a high-performance SAN-less blade offering that leverages Citrix MCS. This very simple but high performance design makes use of a pair of 800GB SSDs in each blade to host the mgmt or compute functions. Gold images are stored on each compute blade with a corresponding non-persistent pool spun up on each blade by the Delivery Controller. This configuration is capable of supporting the maximum possible compute host density based on our standard CPU and RAM recommendations. HA is provided as an option for the mgmt layer through the introduction of shared Tier 2 and clustering with a second mgmt blade.

image

 

Support for LoginVSI 4

Login Consultants’ Login VSI is the industry-standard VDI load generation tool used to simulate actual workloads. Moving from version 3.7 to 4 introduces some changes in the load generated and consequently the densities we see on our VDI solution infrastructures. Most notably, version 4 reduces the amount of read/write IO generated but increases the CPU utilization. All of our test results have been updated using the new tool.

 

image

 

High IOPS persistent desktops using Atlantis ILIO

Partnering with Atlantis Computing, we have collaborated on a solution to deliver high IOPS for persistent desktops, where disk is typically at a premium. ILIO is a software solution that fits seamlessly into the existing DWD Local Tier 1 model and uses compute host RAM to store and execute desktop VMs. No server storage medium is faster than RAM, and with ILIO’s deduplication ability, only a fraction of the storage that would otherwise be required is necessary. This solution pairs easily with the highly reliable and economical 1Gb EqualLogic PS6100E, which hosts the mgmt VMs, user data, and persistent desktop data.

image

 

Citrix vGPU support for graphical acceleration

Fleshing out the DWD virtualized graphics offerings, we are proud to include the new Citrix vGPU technology, which utilizes the NVIDIA Grid boards. For XenDesktop, we offer vDGA (pass-through) and vSGA (shared) graphics on vSphere and now vGPU on Citrix XenServer. vGPU enables a hybrid variety of graphics acceleration by sharing the on-board GPUs while also enabling OpenGL capabilities by running graphics drivers in the VMs themselves. This is done through vGPU profiles that are targeted to specific use cases and maximum Grid board capabilities. The richer the profile (more framebuffer for higher resolutions), the fewer users each Grid board supports.
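The users-per-board math is simple framebuffer division. The sketch below uses approximate, hypothetical numbers for a K1-class board (4 GPUs, 4GB framebuffer each); consult the current NVIDIA Grid documentation for the actual profiles your board supports.

    # Illustrative users-per-board math for vGPU profiles.
    # Board and profile sizes below are approximate/hypothetical examples.
    def users_per_board(gpus_per_board, fb_per_gpu_mb, profile_fb_mb):
        """Each physical GPU is carved into fixed-size vGPU profiles."""
        return gpus_per_board * (fb_per_gpu_mb // profile_fb_mb)

    # K1-class board: 4 GPUs x 4GB framebuffer each
    print(users_per_board(4, 4096, 512))    # lighter profile -> 32 users per board
    print(users_per_board(4, 4096, 2048))   # richer profile  -> 8 users per board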

image

High level vGPU architecture:

image

 

Stay tuned for more exciting VDI products from Dell coming late this spring. Citrix Synergy is right around the corner. In the meantime please check out our RAs and thanks for stopping by!

DVS Enterprise for Citrix – RA Updates!

We finally have the past several months of work published in our latest RAs. I posted the VRTX + XenDesktop RA earlier, which was part of this release but made it out the door first.

Reference Architecture for DVS Enterprise for Citrix XenDesktop 7: Link

Notable new features included in this release:

 

Intel Ivy Bridge CPU support for all servers (E5-2600 v2) with increased user densities

Intel Ivy Bridge CPUs enable greater performance capability in the existing 12G Dell PowerEdge server line. More cores and faster RAM clocks are now available to get more out of every dollar invested in your VDI deployment. Using Hyper-V 2012 with PVS on XenDesktop 7 we are now seeing user densities of up to 200 concurrent users per compute host! It should be said that we tested with the new 1866MHz DIMMs and found no benefit to justify the cost increase over the 1600MHz DIMMs.

 

Dell PowerEdge M IO Aggregator added to extended SCL replacing M8024-K modules

The IO Aggregator modules are used in the A fabric of the M620 blade-based solutions for 10Gb connectivity (internal + uplink). They can be used to provide 20Gb of iSCSI bandwidth to each blade or, for fiber channel solutions, can be sliced into virtual NICs for LAN traffic using NPAR.

 

Update to Dell Compellent Tier 1 scaling and configuration (SCOS 6.3)

SCOS 6.3 brings with it some exciting performance enhancements to the Dell Compellent SC8000 controller architecture. We saw a 200% performance improvement over the legacy 6.2 code, bringing the DVS-recommended Compellent array to support upwards of 2000 users. Keep in mind that this is pure SAS, no SSD or fancy flash, and that’s not the top end for this array either. We’ll be pushing this array further in the months to come. This update adds 2 additional shelves to the existing 1000-user offering and is capable of booting all of those desktops on demand in less than an hour.

 

Dell Wyse Cloud Client updates

Citrix Ready and certified for use with HDX to achieve the best end user experience possible.

 

Dell PowerEdge VRTX offering for ROBO: LINK

Dell’s award-winning and industry recognized consolidated infrastructure used to host up to 500 VDI users at a price that can’t be beat.

 

Shared graphics (vSGA) on NVIDIA Grid cards: LINK

We’ve expanded our graphic acceleration offering to include vSGA (shared mode) in addition to vDGA (pass-through mode) on VMware vSphere. vSGA offers a higher per compute host user density by virtualizing the GPU cores.

 

Microsoft Lync 2013 test results and sizing: LINK

Unified Communications in VDI can be tricky. Dell DVS has put Lync 2013 through its paces in the XenDesktop environment and demonstrated what can be expected from a sizing and performance perspective.

Citrix XenDesktop on Dell VRTX

Dell Desktop Virtualization Solutions/ Cloud Client Computing is proud to announce support for the Dell VRTX on DVS Enterprise for Citrix XenDesktop.  Dell VRTX is the new consolidated platform that combines blade servers, storage and networking into a single 5U chassis that we are enabling for the Remote Office/ Branch Office (ROBO) use case. There was tremendous interest in this platform at both the Citrix Synergy and VMworld shows this year!

Initially this solution offering will be available running on vSphere, with Hyper-V to follow later, ultimately using a similar architecture to what I did in the Server 2012 RDS solution. The solution can be configured with either 2 or 4 M620 blades, Shared Tier 1 (cluster in a box), with 15 or 25 local 2.5” SAS disks. 10 x 15K SAS disks are assigned to 2 blades for Tier 1, or 20 disks for 4 blades. Tier 2 is handled via 5 x 900GB 10K SAS disks. A high-performance shared PERC sits behind the scenes accelerating IO and enabling all disk volumes to be shared amongst all hosts.
Targeted sizing for this solution is 250-500 XenDesktop (MCS) users with HA options available for networking. Since all blades will be clustered, all VDI sessions and mgmt components will float across all blades to provide redundancy. The internal 8-port 1Gb switch can connect to existing infrastructure, or a Force10 switch can be added ToR. Otherwise there is nothing externally required for the VRTX VDI solution, other than the end user connection device. We of course recommend Dell Wyse Cloud Clients to suit those needs.
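As a quick sanity check on the disk counts above, and a rough capacity estimate for Tier 2 under an assumed RAID5 layout (the RAID level is my assumption here, not stated in the RA):

    # Disk counts for the 2- and 4-blade VRTX configurations, plus a usable
    # capacity estimate for Tier 2 assuming RAID5 (hypothetical RAID level).
    T1_DISKS_PER_BLADE_PAIR = 10   # 15K SAS for VDI sessions
    T2_DISKS = 5                   # 900GB 10K SAS for mgmt and user data

    for blades in (2, 4):
        t1 = T1_DISKS_PER_BLADE_PAIR * (blades // 2)
        print(f"{blades} blades: {t1} Tier 1 + {T2_DISKS} Tier 2 = {t1 + T2_DISKS} disks total")

    raid5_usable_tb = (T2_DISKS - 1) * 0.9    # one disk's worth of capacity lost to parity
    print(f"Tier 2 usable capacity (assumed RAID5): ~{raid5_usable_tb:.1f}TB")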

For the price point this one is tough to beat. Everything you need for up to 500 users in a single box! Pretty cool.
Check out the Reference Architecture for Dell VRTX and Citrix XenDesktop: Link

Unidesk: Layered VDI Management

VDI is one of the most intensive workloads in the datacenter today and by nature uses every major component of the enterprise technology stack: networking, servers, virtualization, storage, load balancing. No stone is left unturned when it comes to enterprise VDI. Physical desktop management can also be an arduous task with large infrastructure requirements of its own. The sheer complexity of VDI drives a lot of interesting and feverish innovation in this space but also drives a general adoption reluctance for some who fear the shift is too burdensome for their existing teams and datacenters. The value proposition Unidesk 2.0 brings to the table is a simplification of the virtual desktops themselves, simplified management of the brokers that support them, and comprehensive application management.

The Unidesk solution plugs seamlessly into a new or existing VDI environment and is comprised of the following key components:

  • Management virtual appliance
  • Master CachePoint
  • Secondary CachePoints
  • Installation Machine

 

Solution Architecture

At its core, Unidesk is a VDI management solution that does some very interesting things under the covers. Unidesk requires vSphere at the moment but can manage VMware View, Citrix XenDesktop, Dell Quest vWorkspace, or Microsoft RDS. You could even manage each type of environment from a single Unidesk management console if you had the need or proclivity. Unidesk is not a VDI broker in and of itself, so that piece of the puzzle is very much required in the overall architecture. The Unidesk solution works from the concept of layering, which is becoming an increasingly hot topic as both Citrix and VMware add native layering technologies to their software stacks. I’ll touch on those later. Unidesk works by creating, maintaining, and compositing numerous layers to create VMs that can share common items like the base OS and IT applications, while providing the ability to persist user data, including user-installed applications, if desired. Each layer is stored and maintained as a discrete VMDK and can be assigned to any VM created within the environment. Application or OS layers can be patched independently and refreshed to a user VM. Because of Unidesk’s layering technology, customers needing persistent desktops can take advantage of capacity savings over traditional methods of persistence. A persistent desktop in Unidesk consumes, on average, a similar disk footprint to what a non-persistent desktop would typically consume.
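Conceptually, a composited desktop is just an ordered collection of layer VMDKs. The sketch below is purely illustrative (it is not the Unidesk API); the layer names and paths are made up.

    # Conceptual sketch of layering: each layer is a discrete VMDK, and a desktop
    # is composited from an OS layer, app layers, and an optional personalization
    # layer. Names and paths are hypothetical.
    from dataclasses import dataclass, field
    from typing import Optional

    @dataclass
    class Layer:
        name: str
        kind: str    # "os", "app", or "personalization"
        vmdk: str    # path to the layer's VMDK

    @dataclass
    class Desktop:
        name: str
        os_layer: Layer
        app_layers: list = field(default_factory=list)
        personalization: Optional[Layer] = None

        def composited_vmdks(self):
            """The set of VMDKs a CachePoint would combine for this VM."""
            disks = [self.os_layer.vmdk] + [l.vmdk for l in self.app_layers]
            if self.personalization:
                disks.append(self.personalization.vmdk)
            return disks

    win7 = Layer("Win7-Gold", "os", "layers/os/win7.vmdk")
    office = Layer("Office2010", "app", "layers/app/office2010.vmdk")
    pvd = Layer("user01-personal", "personalization", "layers/user/user01.vmdk")
    print(Desktop("user01", win7, [office], pvd).composited_vmdks())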

CachePoints (CP) are virtual appliances that are responsible for the heavy lifting in the layering process. Currently there are two distinct types of CachePoints: Master and Secondary. The Master CP is the first to be provisioned during the setup process and maintains the primary copy of all layers in the environment. Master CPs replicate the pertinent layers to Secondary CPs, which have the task of actually combining layers to build the individual VMs, a process called compositing. Due to the role played by each CP type, the Secondary CPs will need to live on the Compute hosts with the VMs they create. Local or Shared Tier 1 solution models can be supported here, but the Secondary CPs will need to be able to access the “CachePoint and Layers” volume at a minimum.

The Management Appliance is another virtual machine that comes with the solution to manage the environment and individual components. This appliance provides a web interface used to manage the CPs, layers, and images, as well as connections to the various VDI brokers you need to interface with. Using the Unidesk management console you can easily manage an entire VDI environment while almost completely ignoring vCenter and the individual broker management GUIs. There are no additional infrastructure requirements for Unidesk beyond what is required for the VDI broker solution itself.

Installation Machines are provided by Unidesk to capture application layers and make them available for assignment to any VM in the solution. This process is very simple and intuitive, requiring only that a given application be installed within a regular VM. The management framework is then able to isolate the application and create it as an assignable layer (VMDK). Many of the problems traditionally experienced using other application virtualization methods are overcome here. OS and application layers can be updated independently and distributed to existing desktop VMs.

Here is an exploded and descriptive view of the overall solution architecture summarizing the points above:

Storage Architecture

The Unidesk solution is able to leverage three distinct storage tiers to house the key volumes: Boot Images, CachePoint and Layers, and Archive.

  • Boot Images – Contains images with very small footprints, each consisting of a kernel and pagefile used for booting a VM. These images are stored as VMDKs, like all other layers, and can be easily recreated if need be. This tier does not require high performance disk.
  • CachePoint and Layers – This tier stores all OS, application, and personalization layers. Of the three tiers, this one sees the most IO, so if you have high performance disk available, use it here.
  • Archive – This tier is used for layer backup, including personalization. Repaired and restored layers can be pulled from the Archive and placed into the CachePoint and Layers volume for re-deployment, if need be. This tier does not require high performance disk.

image

The Master CP stores layers in a simple folder structure, with each layer organized and stored as a VMDK.

Installation and Configuration

New in Unidesk 2.x is the ability to execute a completely scripted installation. You’ll need to decide ahead of time what IPs and names you want to use for the Unidesk management components as these are defined during setup. This portion of the install is rather lengthy, so it’s best to have things squared away before you begin. Once the environment variables are defined, the setup script takes over and builds the environment according to your design.
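Something like the following worksheet is worth filling out before you start. This is only a hypothetical way to organize the values you will be prompted for, not Unidesk's actual answer-file format, and all names and addresses are made up.

    # Hypothetical pre-install worksheet for the scripted setup.
    unidesk_setup = {
        "management_appliance": {"name": "UD-MGMT01", "ip": "10.0.10.20"},
        "master_cachepoint":    {"name": "UD-MCP01",  "ip": "10.0.10.21"},
        "secondary_cachepoints": [
            {"name": "UD-SCP01", "ip": "10.0.10.22", "host": "compute-01"},
            {"name": "UD-SCP02", "ip": "10.0.10.23", "host": "compute-02"},
        ],
        "gateway": "10.0.10.1",
        "dns_servers": ["10.0.10.5", "10.0.10.6"],
    }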

Once setup has finished, the Management Appliance and Master CP will be ready, so you can log into the mgmt console to take the configuration further. Among the initial key activities to complete are setting up an Active Directory junction point and connecting Unidesk to your VDI broker. Unidesk should already be talking to your vCenter server at this point.

Your broker mgmt server will need to have the Unidesk Integration Agent installed, which you should find in the bundle downloaded with the install. This agent listens on TCP 390 and will connect the Unidesk management server to the broker. Once this agent is installed on the VMware View Connection Server or Citrix Desktop Delivery Controller, you can point the Unidesk management configuration at it. Once synchronized, all pool information will be visible from the Unidesk console.
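A quick way to verify the agent is reachable before pointing Unidesk at it is a simple TCP check against port 390. The hostname below is a placeholder.

    # Check that the Unidesk Integration Agent port (TCP 390) is reachable.
    import socket

    def port_open(host, port, timeout=3):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    print(port_open("ddc01.example.local", 390))   # True if the agent is listening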

A very neat feature of Unidesk is that you can build many AD junction points from different forests if necessary. These junction points will allow Unidesk to interact with AD and provide the ability to create machine accounts within the domains.

Desktop VM and Application Management

Once Unidesk can talk to your vSphere and VDI environments, you can get started building OS layers, which will serve as your gold images for the desktops you create. A killer feature of the Unidesk solution is that you only need a single gold image per OS type, even for numerous VDI brokers. Because the broker agents can be layered and deployed as needed, you can reuse a single image across disparate View and XenDesktop environments, for example. Setting up an OS layer simply points Unidesk at an existing gold image VM in vCenter and makes it consumable for subsequent provisioning.

Once successfully created, you will see your OS layers available and marked as deployable.

 

Before you can install and deploy applications, you will need to deploy a Unidesk Installation Machine which is done quite simply from the System page. You should create an Installation Machine for each type of desktop OS in your environment.

Once the Installation Machine is ready, creating layers is easy. From the Layers page, simply select “Create Layer,” fill in the details, choose the OS layer you’ll be using along with the Installation machine and any prerequisite layers.

 

To finish the process, you’ll need to log into the Installation Machine, perform the install, then tell the Unidesk management console when you’re finished and the layer will be deployable to any VM.

Desktops can now be created as either persistent or non-persistent. You can deploy to already existing pools, or if you need a new persistent pool created, Unidesk will take care of it. Choose the type of OS template to deploy (XP or Win7), select the connected broker to which you want to deploy the desktops, choose an existing pool or create a new one, and select the number of desktops to create.

Next select the CachePoint that will deploy the new desktops along with the network they need to connect to and the desktop type.

Select the OS layer that should be assigned to the new desktops.

Select the application layers you wish to assign to this desktop group. All your layers will be visible here.

Choose the virtual hardware, performance characteristics and backup frequency (Unidesk Archive) of the desktop group you are deploying.

Select an existing or create a new maintenance schedule that defines when layers can be updated within this desktop group.

Deploy the desktops.

Once the creation process is underway, the activity will be reflected under the Desktops page as well as in vCenter tasks. When completed all desktops will be visible and can be managed entirely from the Unidesk console.

Sample Architecture

Below are some possible designs that can be used to deploy Unidesk into a Local or Shared Tier 1 VDI solution model. For Local Tier 1, both the Compute and Management hosts will need access to shared storage, even though VDI sessions will be hosted locally on the Compute hosts. 1Gb PowerConnect or Force10 switches can be used in the Network layer for LAN and iSCSI. The Unidesk boot images should be stored locally on the Compute hosts along with the Secondary CachePoints that will host the sessions on each host. All of the typical VDI management components will still be hosted on the Mgmt layer hosts along with the additional Unidesk management components. Since the Mgmt hosts connect to and run their VMs from shared storage, all of the additional Unidesk volumes should be created on shared storage. Recoverability is achieved in this model primarily through the Unidesk Archive function: VDI session information from a failed Compute host can be recreated from the Archive on a surviving host.
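To summarize the Local Tier 1 placement described above in one place (my own informal summary of the paragraph, not an official layout table):

    # Local Tier 1 placement summary (informal, derived from the design above).
    local_tier1_placement = {
        "Boot Images":                  "local disk on each Compute host",
        "Secondary CachePoints":        "local disk on each Compute host",
        "Master CachePoint":            "shared storage (Mgmt hosts)",
        "Unidesk Management Appliance": "shared storage (Mgmt hosts)",
        "Archive":                      "shared storage (Mgmt hosts)",
        "VDI broker mgmt VMs":          "shared storage (Mgmt hosts)",
    }
    for volume, location in local_tier1_placement.items():
        print(f"{volume:30s} -> {location}")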

Here is a view of the server network and storage architecture with some of the solution components broken out:

For Shared Tier 1 the layout is slightly different. The VDI sessions and “CachePoint and Layers” volumes must live together on Tier 1 storage while all other volumes can live on Tier 2. You could combine the two tiers for smaller deployments, perhaps, but your mileage will vary. Blades are also an option here, of course. All normal vSphere HA options apply here with the Unidesk Archive function bolstering the protection of the environment.

Unidesk vs. the Competition

Both Citrix and VMware have native solutions available for management, application virtualization, and persistence, so you will have to decide if Unidesk is worth the price of admission. On the View side, if you buy a Premier license, you get ThinApp for applications, Composer for non-persistent linked clones, and soon the technology from VMware’s recent Wanova acquisition will be available. The native View persistence story isn’t great at the moment, but Wanova Mirage will change that when made available. Mirage will add a few layers to the mix, including OS, apps, and persistent data, but will not be as granular as the multi-layer Unidesk solution. The Wanova tech notwithstanding, you should be able to buy a cheaper, lower-level View license since with Unidesk you will need neither ThinApp nor Composer. Unidesk’s application layering is superior to ThinApp, with little in the way of applications that cannot be layered, and can provide persistent or non-persistent desktops with almost the same footprint on disk. Add to that the Unidesk single management pane for both applications and desktops, and there is a compelling value to be considered.

On the Citrix side, if you buy an Enterprise license, you get XenApp for application virtualization, plus Provisioning Services (PVS) and Personal vDisk (PVD, from the recent RingCube acquisition) for persistence. With XenDesktop you can leverage Machine Creation Services (MCS) or PVS for either persistent or non-persistent desktops. MCS is dead simple while PVS is incredibly powerful but an extraordinary pain to set up and configure. XenApp builds on top of Microsoft’s RDS infrastructure and requires additional components of its own, such as SQL Server. PVD can be deployed with either catalog type, PVS or MCS, and adds a layer of persistence for user data and user-installed applications. While PVD provides only a single layer, that may be more than suitable for any number of customers. The overall Citrix solution is time tested and works well, although the underlying infrastructure requirements are numerous and expensive. XenApp offloads application execution from the XenDesktop sessions, which in turn drives greater overall host densities. Adding Unidesk to a Citrix stack again allows a customer to buy in at a lower licensing level, although Citrix is steadily reducing the incentive to augment its stack by including more at the lower license levels. For instance, PVD and PVS are available at all licensing levels now. The big upsell now is the inclusion of XenApp. Unidesk removes the need for MCS, PVS, PVD, and XenApp, so you will have to ask yourself if the Unidesk approach is preferred to the Citrix method. The net result will certainly be less overall infrastructure required, but net licensing costs may very well be a wash.

Enterprise VDI

My name is Peter and I am the Principal Engineering Architect for Desktop Virtualization at Dell. 🙂

VDI is a fire hot topic right now and there are many opinions out there on how to approach it. If you’re reading this I probably don’t need to sell you on the value of the concept but more and more companies are deploying VDI instead of investing in traditional PC refreshes. All trending data points to this shift only going up over the next several years as the technology gets better and better. VDI is a very interesting market segment as it encompasses the full array of cutting edge enterprise technologies: network, servers, storage, virtualization, database, web services, highly distributed software architectures, high-availability, and load balancing. Add high capacity and performance requirements to the list and you have a very interesting segment indeed! VDI is also constantly evolving with a very rich ecosystem of players offering new and interesting technologies to keep up with. This post will give you a brief look at the enterprise VDI offering from Dell.
As a customer, and just a year ago I still was one, it’s very easy to get caught up in the marketing hype, making it difficult to realize the true value of a product or platform. With regard to VDI, we are taking a different approach at Dell. Instead of trying to lure you with inconceivable and questionable per-server user densities, we have decided to take a very honest and realistic approach in our solutions. I’ll explain this in more detail later.
Dell currently offers 2 products in the VDI space: Simplified, which is the SMB-focused VDI-in-a-box appliance I discussed here (link), and Enterprise, which can also start very small but has much longer legs to scale to suit a very large environment. I will be discussing the Enterprise platform in this post, which is where I spend the majority of my time. In the resources section at the bottom of this posting you will find links to 2 reference architectures that I co-authored. They serve as the basis for this article.

DVS Enterprise

Dell DVS Enterprise is a multi-tiered turnkey solution composed of rack or blade servers and iSCSI or FC storage, built on industry-leading hypervisors, software and VDI brokers. We have designed DVS Enterprise to encompass tons of flexibility to meet any customer need, and it can suit 50 to 50,000 users. As opposed to the more rigid “block” type products, our solutions are tailored to the customer to provide exactly what is needed, with flexibility for leveraging existing investments in network, storage, and software.
The solution stacks consist of 4 primary tiers: network, compute, management, and storage. Network and storage can be provided by the customer, provided the existing infrastructure meets our design and performance requirements. The Compute tier is where the VDI sessions execute, whether running on local or shared storage. The Management tier is where VDI broker VMs and supporting infrastructure run. These VMs run off of shared storage in all solutions so Management tier hosts can always be clustered to provide HA. All tiers, while inextricably linked, can scale independently.
image

The DVS Enterprise portfolio consists of 2 primary solution models: “Local Tier 1” and “Shared Tier 1”. DVS Engineering spends considerable effort validating and characterizing core solution components to ensure your VDI implementation will perform as it is supposed to. Racks, blades, 10Gb networking, Fiber Channel storage…whatever mix of ingredients you need, we have it. Something for everyone.

Local Tier 1

“Tier 1” in the DVS context defines the disk from which the VDI sessions execute, which is therefore the faster, higher-performing disk. Local Tier 1 applies only to rack servers (due to the amount of disk required) while Shared Tier 1 can be rack or blade. Tier 2 storage is present in both solution architectures and, while having a reduced performance requirement, is utilized for user profile/data and management VM execution. The graphic below depicts the management tier VMs on shared storage while the compute tier VDI sessions are on local server disk:

image

This local Tier 1 Enterprise offering is uniquely Dell as most industry players focus solely on solutions revolving around shared storage. The value here is flexibility and that you can buy into high performance VDI no matter what your budget is. Shared Tier 1 storage has its advantages but is costly and requires a high performance infrastructure to support it. The Local Tier 1 solution is cost optimized and only requires 1Gb networking.

Network

We are very cognizant that networking can be a touchy subject, with a lot of customers pledging fierce loyalty to the well-known market leader. Hey, I was one of those customers just a year ago. We get it. That said, a networking purchase from Dell is entirely optional as long as you have suitable infrastructure in place. From a cost perspective, PowerConnect provides strong performance at a very attractive price point and is the default option in our solutions. Our premium Force10 networking product line is positioned well to compete directly with the market leader, from top of rack (ToR) to large chassis-based switching. Force10 is an optional upgrade in all solutions. For the Local Tier 1 solution, a simple 48-port 1Gb switch is all that is required; the PC6248 is shown below:

image

Servers

The PowerEdge R720 is a solid rack server platform that suits this solution model well, with up to 2 x 2.9GHz 8-core CPUs, 768GB RAM, and 16 x 2.5” 15K SAS drives. There is more than enough horsepower in this platform to suit any VDI need. Again, flexibility is an important tenet of Dell DVS, so other server platforms can be used if desired to meet specific needs.

Storage

A shared Tier 2 storage purchase from Dell is entirely optional in the Local Tier 1 solution, but shared Tier 2 storage itself is a required component of the architecture. The EqualLogic 4100X is a solid entry-level 1Gb iSCSI storage array that can be configured to provide up to 22TB of raw storage on 10K SAS disks. You can of course go bigger to the 6000 series in EqualLogic or integrate a Compellent array with your choice of storage protocol. It all depends on your need to scale.

image

Shared Tier 1

In the Shared Tier 1 solution model, an additional shared storage array is added to handle the execution of the VDI sessions in larger scale deployments. Performance is a key concern in the shared Tier 1 array and contributes directly to how the solution scales. All Compute and Mgmt hosts in this model are diskless and can be either rack or blade. In smaller scale solutions, the functions of Tier 1 and Tier 2 can be combined as long as there is sufficient capacity and performance on the array to meet the needs of the environment. 

image

Network

The network configuration changes a bit in the Shared Tier 1 model depending on whether you are using rack or blade servers and which block storage protocol you employ. Block storage traffic should be separated from LAN, so iSCSI will leverage a discrete 10Gb infrastructure while fiber channel will leverage an 8Gb fabric. The PowerConnect 8024F is a 10Gb SFP+ based switch used for iSCSI traffic destined to either EqualLogic or Compellent storage and can be stacked to scale. The fiber channel industry leader Brocade is used for FC fabric switching.

In the blade platform, each chassis has 3 available fabrics that can be configured with Ethernet, FC, or InfiniBand switching. In DVS solutions, the chassis is configured with the 48-port M6348 switch interconnect for LAN traffic and either Brocade switches for FC or a pair of 10Gb 8024-K switches for iSCSI. Ethernet-based chassis switches are stacked for easy management.

Servers

Just like the Local Tier 1 solution, the R720 can be used if rack servers are desired, or the half-height dual-socket M620 if blades are desired. The M620 is on par with the R720 in all regards except for disk capacity and top-end CPU. The R720 can be configured with a higher 2.9GHz 8-core CPU to leverage greater user density in the compute tier. The M1000e blade chassis can support 16 half-height blades.

Storage

Either EqualLogic or Compellent arrays can be utilized in the storage tier. The performance demands of Tier 1 storage in VDI are very high, so design considerations dealing with boot storms and steady-state performance are critical. Each EqualLogic array is a self-contained iSCSI storage unit with an active/passive controller pair and can be grouped with other arrays for management. The 6110XS, depicted below, is a hybrid array containing a mix of high-performance SSD and SAS disks. EqualLogic’s automatic tiering technology dynamically moves hot and cold data between tiers to ensure the best performance at all times. Even though each controller now only has a single 10Gb port, vertical port sharing ensures that a controller port failure does not necessitate a controller failover.

Compellent can also be used in this space and follows a more traditional linear scale. SSDs are used for “hot” storage blocks, especially during boot storms, and 15K SAS disks are used to store the cooler blocks on dense storage. To add capacity and throughput, additional shelves are looped into the array architecture. Compellent has its own auto-tiering functionality that can be scheduled off-hours to rebalance the array from day to day. It also employs a mechanism that puts hot data on the outer tracks of the disk platters where it can be read easily and quickly. High performance and redundancy are achieved through an active/active controller architecture. The 32-bit Series 40 controller architecture is soon to be replaced by the 64-bit SC8000 controllers, alleviating the previous x86-based cache limits.

Another nice feature of Compellent is its inherent flexibility. The controllers are flexible like servers, allowing you to install the number and type of IO cards you require: FC, iSCSI, FCoE, and SAS for the backend. Need more front-end bandwidth or another backend SAS loop? Just add the appropriate card to the controller.
In the lower user count solutions, Tier 1 and Tier 2 storage functions can be combined. In the larger scale deployments these tiers should be separated and scale independently.

VDI Brokers

Dell DVS currently supports both VMware View 5 and Citrix XenDesktop 5 running on top of the vSphere 5 hypervisor. All server components run Windows Server 2008 R2, with database services provided by SQL Server 2008 R2. I have worked diligently to create a simple, flexible, unified architecture that expands effortlessly to meet the needs of any environment.

image

Choice of VDI broker generally comes down to customer preference, and each solution has its advantages and disadvantages. View has a very simple backend architecture consisting of 4 essential server roles: SQL, vCenter, View Connection Server (VCS) and Composer. Composer is the secret sauce that provides the non-persistent linked clone technology and is installed on the vCenter server. One downside to this is that because of Composer’s reliance on vCenter, the total number of VMs per vCenter instance is reduced to 2000, instead of the published 3000 per HA cluster in vSphere 5. This means that you will have multiple vCenter instances depending on how large your environment is. The advantage to View is scaling footprint, as 4 management hosts are all that is required to serve a 10,000 user environment. I wrote about View architecture design previously for version 4 (link).
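The 2000-VM ceiling makes the vCenter count easy to estimate with simple ceiling division; the user counts below are just examples.

    # Estimate vCenter instances needed when Composer caps each instance at 2000 VMs.
    import math

    VMS_PER_VCENTER = 2000

    for users in (2000, 5000, 10000):
        instances = math.ceil(users / VMS_PER_VCENTER)
        print(f"{users} users -> {instances} vCenter instance(s)")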

image

View Storage Accelerator (VSA), officially supported in View 5.1, is the biggest game-changing feature in View 5.x thus far. VSA changes the user workload IO profile, reducing the number of IOPS consumed by each user. VSA allows a portion of the host server’s RAM to be used for host caching, largely absorbing read IOs. This reduces the impact of boot storms and makes the Tier 1 storage in use more efficient. Before VSA there was a much larger disparity between XenDesktop and View users in terms of IOPS; now the gap is greatly diminished.
View can be used with 2 connection protocols, the proprietary PCoIP protocol or native RDP. PCoIP is an optimized protocol intended to provide a greater user experience through richer media handling and interaction. Most users will probably be just fine running RDP as PCoIP has a greater overhead that uses more host CPU cycles. PCoIP is intended to compete head on with the Citrix HDX protocol and there are plenty of videos running side by side comparisons if you’re curious. Below is the VMware View logical architecture flow:

XenDesktop (XD), while similar in basic function, is very different from View. Let’s face it, Citrix has been doing this for a very long time. Client virtualization is what these guys are known for, and through clever innovation and acquisitions over the years they have managed to bolster their portfolio as the most formidable client virtualization player in this space. A key difference between View and XD is the backend architecture. XD is much more complex and requires many more server roles than View, which affects the size and scalability of the management tier. This is very complex software so there are a lot of moving parts: SQL, vCenter, license server, web interfaces, desktop delivery controllers, provisioning servers… there are more pieces to account for that all have their own unique scaling elements. XD is not as inextricably tied to vCenter as View is, so a single vCenter instance should be able to support the published maximum number of sessions per HA cluster.

image

One of the neat things about XD is that you have a choice in desktop delivery mechanisms. Machine Creation Services (MCS) is the default mechanism provided in the DDC. At its core this provides a dead simple method for provisioning desktops and functions very similarly to View in this regard. Citrix recommends using MCS only for 5000 or fewer VDI sessions. For greater than 5000 sessions, Citrix recommends using their secret weapon: Provisioning Server (PVS). PVS provides the ability to stream desktops to compute hosts using gold master vDisks, customizing the placement of VM write-caches, all the while reducing the IO profile of the VDI session. PVS leverages TFTP to boot the VMs from the master vDisk. PVS isn’t just for virtual desktops either, it can also be used for other infrastructure servers in the architecture such as XenApp servers and provides dynamic elasticity should the environment need to grow to meet performance demands. There is no PVS equivalent on the VMware side of things.
With Citrix’s recent acquisition and integration of RingCube in XD, there are now new catalog options available for MCS and PVS in XD 5.6: pooled with personal vDisk or streamed with personal vDisk. The personal vDisk (PVD) is disk space that can be dedicated on a per-user basis for personalization information, application data, etc. PVD is intended to provide a degree of end user experience persistence in an otherwise non-persistent environment. Additional benefits of XD include seamless integration with XenApp for application delivery as well as the long-standing benefits of the ICA protocol: session reliability, encrypted WAN acceleration, NetScaler integration, etc. Below is the Citrix XenDesktop logical architecture flow:

High Availability

HA is provided via several different mechanisms across the solution architecture tiers. In the network tier HA is accomplished through stacking switches whether top of rack (ToR) or chassis-based. Stacking functionally unifies an otherwise segmented group of switches so they can be managed as a single logical unit. Discrete stacks should be configured for each service type, for example a stack for LAN traffic and a stack for iSCSI traffic. Each switch type has its stacking limits so care has been taken to ensure the proper switch type and port count to meet the needs of a given configuration.

Load balancing is provided via native DNS in smaller stacks, especially for file and SQL services, and moves to a virtual appliance based model above 1000 users. NetScaler VPX or F5 LTM-VE can be used to load balance larger environments. NetScalers are sized based on required throughput, as each appliance can manage millions of concurrent TCP sessions.
Protecting the compute tier differs a bit between the Local and Shared Tier 1 solutions, as well as between View and XenDesktop. In the Local Tier 1 model there is no shared storage in the compute tier, so vSphere HA can’t help us here. With XD, PVS can provide HA functionality by controlling the placement of VDI VMs from a failed host to a hot standby.

The solution for View is not quite so elegant in the Local Tier 1 model as there is no mechanism to automatically move VMs from a failed host. What we can do, though, is mimic HA functionality by manually creating a resource reservation on each compute host. This creates a manual, RAID-like model where there is reserve capacity to host a failed server’s VDI sessions.

In the shared tier 1 model, the compute tier has shared storage so we can take full advantage of vSphere HA. This also applies to the management tier in all solution models. There are a few ways to go here when configuring admission control. Thankfully there are now more options than only calculating slot sizes and overhead. The simplest way to go is specifying a hot standby for dedicated failover. The downside is that you will have gear sitting idle. If that doesn’t sit well with you then you could specify a percentage of cluster resources to reserve. This will thin the load running on each host in the cluster but at least won’t waste resources entirely.
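If you go the percentage route, the reservation math is straightforward: to tolerate F host failures in an N-host cluster, reserve roughly F/N of the cluster's resources. A rough sketch:

    # Back-of-the-envelope admission control reservation.
    def reserve_percent(hosts, failures_to_tolerate=1):
        return 100.0 * failures_to_tolerate / hosts

    for hosts in (4, 8, 16):
        print(f"{hosts} hosts -> reserve ~{reserve_percent(hosts):.0f}% of cluster resources")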

If the use of DRS is desired, care needs to be taken in large scale scenarios as this technology will functionally limit each HA cluster to 1280 VMs.

Protection for the storage tier is relatively straightforward, as each storage array has its own built-in protections for controllers and RAID groups. In smaller solution stacks (under 1000 users) a file server VM is sufficient to host user data, profiles, etc. For deployments larger than 1000 users, we recommend that NAS be leveraged to provide this service. Our clustered NAS solutions for both EqualLogic and Compellent are high performing and scalable to meet the needs of very large deployments. That said, NAS is available as an HA option at any time, for any solution size.

Validation

The Dell DVS difference is that our solutions are validated and characterized around real-world requirements and scenarios. Everyone that competes in this space plays the marketing game, but we actually put our solutions through their paces. Everything we sell in our core solution stacks has been configured, tested, scrutinized, optimized, and measured for performance at all tiers in the solution. Additional consulting and blueprinting services are available to help customers properly size VDI for their environments by analyzing user workloads to build a custom solution to meet those needs. Professional services are also available to stand up and support the entire solution.

The DVS Enterprise solution is constantly evolving with new features and options coming this summer. Keep an eye out here for more info on the latest DVS offerings as well as discussions on the interesting facets of VDI.

References:

Dell DVS: VMware View Reference Architecture 

Dell DVS: Citrix XenDesktop Reference Architecture