Dell XC Series 2.0: Product Architectures

Following our launch of the new 13G-based XC series platform, I present our product architectures for the VDI-specific use cases. Of the platforms available, this use case focuses on the extremely powerful 1U XC630 with Haswell CPUs and 2133MHz RAM. We offer these appliances on both Server 2012 R2 Hyper-V and vSphere 5.5 U2 with Citrix XenDesktop, VMware Horizon View, or Dell vWorkspace. All platform architectures have been optimized, configured for best performance, and documented.

Platforms

We have three platforms to choose from, optimized around cost and performance and ultimately flexible should specific parameters need to change. The A5 model is the most cost effective, leveraging 8-core CPUs, 256GB RAM, 2 x 200GB SSDs for performance, and 4 x 1TB HDDs for capacity. For POCs, small deployments, or light application virtualization, this platform is well suited. The B5 model steps up the performance by adding four cores per socket, increasing the RAM density to 384GB, and doubling the performance tier to 2 x 400GB SSDs. This platform will provide the best bang for the buck on medium-density deployments of light or medium level workloads. The B7 is the top of the line, offering 16-core CPUs and a higher capacity tier of 6 x 1TB HDDs. For deployments requiring maximum density of knowledge or power user workloads, this is the platform for you.
image
At 1U with dual CPUs, 24 DIMM slots and 10 drive bays…loads of potential and flexibility!
image

Solution Architecture

Utilizing 3 platform hardware configurations, we are offering 3 VDI solutions on 2 hypervisors, which adds up to lots of flexibility and many options. A 3-node cluster minimum is required, with every node containing a Nutanix Controller VM (CVM) to handle all IO. The SATADOM handles boot duties, hosting the hypervisor as well as the initial setup of the Nutanix Home area. The SSDs and NL-SAS disks are passed through directly to each CVM, which straddles the hypervisor and hardware. Every CVM contributes its directly attached disks to the storage pool, which is stretched across all nodes in the Nutanix Distributed File System (NDFS) cluster. NDFS is not dependent on the hypervisor for HA or clustering. Hyper-V cluster storage pools are presented to the hosts via SMB 3 and vSphere clusters via NFS. Containers can be created within the storage pool to logically separate VMs based on function. These containers also provide isolation of configurable storage characteristics in the form of dedupe and compression. In other words, you can enable compression on your management VMs within their dedicated container, but not on your VDI desktops, also within their own container. The namespace is presented to the cluster in the form of \\NDFS_Cluster_name\container_name.
The first solution I’ll cover is Dell’s Wyse vWorkspace, which supports either 2012 R2 Hyper-V or vSphere 5.5. For small deployments or POCs we offer this solution in a “floating mgmt” architecture which combines the vWorkspace infrastructure management roles and the VDI desktops or shared session VMs. vWorkspace on Hyper-V enables a special technology for non-persistent/shared image desktops called Hyper-V Catalyst, which includes 2 components: HyperCache and HyperDeploy. Hyper-V Catalyst provides some incredible performance boosts and requires that the vWorkspace infrastructure components communicate directly with the Hyper-V hypervisor. This also means that vWorkspace does not require SCVMM to perform provisioning tasks for non-persistent desktops!

  • HyperCache – Provides virtual desktop performance enhancement by caching parent VHDs in host RAM. Read requests are satisfied from cache including requests from all subsequent child VMs.
  • HyperDeploy – Provides instant cloning of parent VHDs massively diminishing virtual desktop pool deployment times.

You’ll notice the HyperCache components included in the Hyper-V architectures below. The floating management model uses 3 to 6 hosts, depicted below with management, desktop and RDSH VMs logically separated by function only from a storage container perspective. The recommendation of 3-7 RDSH VMs is based on our work optimizing around NUMA boundaries; I’ll dive deeper into that in an upcoming post. The B7 platform is used in the architectures below.
image
Above ~1000 users we recommend the traditional distributed management architecture to enable more predictable scaling and performance of both the compute and management hosts. The basic architecture stays the same and scales to the full extent supported by the hypervisor, in this case Hyper-V, which supports up to 64 hosts. NDFS does not have a scaling limitation, so several hypervisor clusters can be built within a single contiguous NDFS namespace. Our recommendation is to build discrete Failover Clusters for compute and management so each can scale up or out independently.
The architecture below depicts a B7 build on Hyper-V applicable to Citrix XenDesktop or Wyse vWorkspace.
image

This architecture is relatively similar for Wyse vWorkspace or VMware Horizon View on vSphere 5.5 U2, but with fewer total compute hosts per HA cluster (32 total). For vWorkspace, Hyper-V Catalyst is not present in this scenario, so vCenter is required to perform desktop provisioning tasks.
image
For the storage containers, the best practice of less is more still stands. If you don’t need a particular feature don’t enable it, as it will consume additional resources. Deduplication is always recommended on the performance tier, since the primary OpLog lives on SSD and will always benefit. Dedupe or compression on the capacity tier is not recommended unless, of course, you absolutely need it. And if you do, prepare to increase each CVM’s RAM allocation to 32GB.

Container  | Purpose         | Replication Factor | Perf Tier Deduplication | Capacity Tier Deduplication | Compression
Ds_compute | Desktop VMs     | 2                  | Enabled                 | Disabled                    | Disabled
Ds_mgmt    | Mgmt Infra VMs  | 2                  | Enabled                 | Disabled                    | Disabled
Ds_rdsh    | RDSH Server VMs | 2                  | Enabled                 | Disabled                    | Disabled

Network Architecture

As a hyperconverged appliance, the network architecture leverages the converged model. A minimum of two 10Gb NICs in each node handles all traffic for the hypervisor, guests and storage operations between CVMs. Remember that the storage of every VM is kept local to the host on which the VM resides, so the only traffic that will traverse the network is LAN and replication. There is no need to isolate storage protocol traffic when using Nutanix.
Hyper-V and vSphere are functionally similar. For Hyper-V there are 2 vSwitches per host: 1 external vSwitch that aggregates all of the services of the host management OS as well as the vNICs for the connected VMs, and 1 private internal vSwitch to which the CVM alone connects so it can communicate with the hypervisor. The 10Gb NICs are connected to an LBFO team configured in Dynamic mode.
image
In vSphere it’s the same story but with the concept of Port Groups and vMotion.
image
We have tested the various configurations per our standard processes and documented the performance results which can be found in the link below. These docs will be updated as we validate additional configurations.

Product Architectures for 13G XC launch:

Resources:

About Wyse vWorkspace HyperCache
About Wyse vWorkspace HyperDeploy

vSphere 5.5: Navigating the Web Client

As I called out in my vSphere 5.5 upgrade post, the vSphere Client is now deprecated in 5.5, so in preparation for an inevitable future I’m forcing myself to use the Web Client to gain familiarity. Turns out there was way more moving around than I initially thought, so I’m selfishly documenting a few pertinent items that seemed less than intuitive to me the first time through. Some things are just easier to do in the legacy vSphere Client, or maybe I’m just too accustomed to it after 3 generations of ESX/i. In any case, I encourage you to use the Web Client as well and hopefully these tips will help.

Topics covered in this post:

  • How to configure iSCSI software adapters
  • How to add datastores
  • How to manage multipathing
  • How to rename an ESXi host
  • Cool changes in Recent Tasks pane
  • Traffic Shaping
  • Deploying vCenter Operations Manager (vCOps)

How to configure iSCSI software adapters:

This assumes that the preliminary steps of setting up your storage array and requisite physical networking have already been properly completed. The best and easiest way to do this is via dedicated switches and server NICs for iSCSI in a Layer 2 switch segment. Use whatever IP scheme you like; this should be a closed fabric and there is no reason to route this traffic.

First things first, if you don’t have a software iSCSI adapter created on your hosts, create one in the Storage Adapters section of Storage Management for a particular ESXi host. Once created, it will appear in the list below. A quick note on software vs hardware iSCSI initiators: physical initiators can generally do iSCSI offload OR jumbo frames, not both. We have seen the use of jumbo frames to be more impactful to performance than iSCSI offload, so software initiators with jumbo frames enabled are the preferred way to go here.
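If you’d rather script this step, it’s a one-liner in PowerCLI. A minimal sketch, assuming a hypothetical host named esxi01.lab.local and an existing Connect-VIServer session:

# Enable the software iSCSI initiator on the host (host name is an example)
Get-VMHostStorage -VMHost "esxi01.lab.local" | Set-VMHostStorage -SoftwareIScsiEnabled $true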

Click over to the Networking tab and create a new vSwitch with a VMkernel Network Adapter for iSCSI.

Choose the physical adapters to be used in this vSwitch, create a useful network Port Group label such as iSCSI-1, and assign an IP address that can reach the storage targets. Repeat this process and add a second VMkernel adapter to the same vSwitch. Configure your VMK ports to use opposing physical NICs. This is done by editing the port group settings and changing the Failover order. This allows you to cleanly share 2 physical NICs for 2 iSCSI connections within a single vSwitch.

In my case VMK2 is active on vmnic3 and VMK3 is active on vmnic1 providing physical path redundancy to the storage array.
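For reference, the same vSwitch, VMkernel ports and failover overrides can be scripted in PowerCLI. This is only a sketch using the example NICs above (vmnic1/vmnic3) with placeholder host, vSwitch and IP values, so adjust to your environment:

# Create the iSCSI vSwitch with both physical uplinks
$vs = New-VirtualSwitch -VMHost "esxi01.lab.local" -Name "vSwitch1" -Nic vmnic1,vmnic3

# Add two VMkernel ports, one per port group (IPs are placeholders)
New-VMHostNetworkAdapter -VMHost "esxi01.lab.local" -VirtualSwitch $vs -PortGroup "iSCSI-1" -IP 10.10.10.21 -SubnetMask 255.255.255.0
New-VMHostNetworkAdapter -VMHost "esxi01.lab.local" -VirtualSwitch $vs -PortGroup "iSCSI-2" -IP 10.10.10.22 -SubnetMask 255.255.255.0

# Pin each port group to opposing NICs by overriding the failover order
Get-VirtualPortGroup -VMHost "esxi01.lab.local" -Name "iSCSI-1" | Get-NicTeamingPolicy | Set-NicTeamingPolicy -InheritFailoverOrder:$false -MakeNicActive vmnic3 -MakeNicUnused vmnic1
Get-VirtualPortGroup -VMHost "esxi01.lab.local" -Name "iSCSI-2" | Get-NicTeamingPolicy | Set-NicTeamingPolicy -InheritFailoverOrder:$false -MakeNicActive vmnic1 -MakeNicUnused vmnic3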

When all is said and done, your vSwitch configuration should look something like this:

Next, under the iSCSI software adapter, add the target IP of your storage (the group IP for EqualLogic). Authentication needs and requirements will vary between organizations. Choose and configure this scheme appropriately for your environment. For my lab, I scope connections based on subnet alone, which defines the physical Layer 2 boundary of my iSCSI fabrics.
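Scripted, adding the discovery (send targets) address might look like the PowerCLI sketch below; the group IP shown is a placeholder:

# Grab the software iSCSI adapter and add the array's discovery address
$hba = Get-VMHostHba -VMHost "esxi01.lab.local" -Type IScsi | Where-Object {$_.Model -like "*Software*"}
New-IScsiHbaTarget -IScsiHba $hba -Address "10.10.10.10" -Type Send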

Next configure the network port binding to ensure that the port groups you defined earlier get bound to the iSCSI software adapter using the proper physical interfaces.

At this point, if you have any volumes created on your array and presented to your host, a quick rescan should reveal the devices presented to your host as LUNs.

You should also see 2 paths per LUN (device) per host based on 2 physical adapters connecting to your array. EqualLogic is an active/passive array so only connections to the active controller will be seen here.

If you run into trouble making this work after these steps, jump over to the vSphere Client which does make this process a bit easier. Also keep in mind that all pathing will be set to Fixed by default. See my How to manage multipathing topic below for guidance on changing this.

iSCSI works very well with jumbo frames, which is an end-to-end Layer 2 technology, so make sure an MTU of 9000 is set on all ESXi iSCSI vSwitches and VMK ports, as well as on the NICs of the storage array. Your switches must be capable of supporting jumbo frames as well. This will increase the performance of your iSCSI network and front-end storage operation speeds.
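The same MTU settings can be pushed with PowerCLI; a sketch assuming the vSwitch and port group names used earlier:

# Set MTU 9000 on the iSCSI vSwitch and on each iSCSI VMkernel port
Get-VirtualSwitch -VMHost "esxi01.lab.local" -Name "vSwitch1" | Set-VirtualSwitch -Mtu 9000 -Confirm:$false
Get-VMHostNetworkAdapter -VMHost "esxi01.lab.local" -VMKernel | Where-Object {$_.PortGroupName -like "iSCSI*"} | Set-VMHostNetworkAdapter -Mtu 9000 -Confirm:$false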

 

How to add datastores:

Once your new datastore has been provisioned from your storage platform and presented to your ESXi hosts, from the Hosts and Clusters view navigate to Related Objects, then Datastores. From here click the Create a New Datastore button.

Choose the host or cluster to add the datastore to, choose whether it is NFS or VMFS, name the datastore and choose a host that can see it. You should see the raw LUN in the dialog below.

Choose the VMFS version and any partition options you want to implement. Confirm and deploy.

If presenting to multiple hosts, once the VMFS datastore is created and initialized on one, they all should see it assuming the raw device is present via a previous adapter rescan.
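If you’d rather script it, a PowerCLI sketch of the rescan and VMFS creation follows; the datastore name and the device’s canonical (naa) identifier are placeholders you would pull from Get-ScsiLun:

# Rescan, then carve a VMFS-5 datastore out of the new LUN (canonical name is a placeholder)
Get-VMHostStorage -VMHost "esxi01.lab.local" -RescanAllHba | Out-Null
New-Datastore -VMHost "esxi01.lab.local" -Name "VDI-DS01" -Path "naa.xxxxxxxxxxxxxxxx" -Vmfs -FileSystemVersion 5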

 

How to manage multipathing:

From the Hosts and clusters view, click the Storage tab, choose the datastore you want to manage, click Manage in the middle pane then click Connectivity and Multipathing under Settings.

Alternatively, from the Hosts and Clusters view (from any level item), navigate to Related Objects, then Datastores. Either click the volume you want to edit or choose Settings from the dropdown. Either method will get you to the same place.

From the datastore Settings page, click Manage and under Settings (once again) click Connectivity and Multipathing. In the middle of the screen you should see all hosts attached to whatever datastore you selected. Clicking on each host will reveal the current Path Selection Policy below, “Fixed” by VMware default along with the number of paths present per host.

To change this to Round Robin, click Edit Multipathing, change the Path Selection Policy, repeat for each host connected to the datastore.
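With many LUNs and hosts this gets tedious, so here is a PowerCLI sketch that flips every Fixed-policy disk device on a host to Round Robin; narrow the filter to your array’s LUNs if the host also has local disk:

# Change the path selection policy to Round Robin for Fixed devices on this host
Get-VMHost "esxi01.lab.local" | Get-ScsiLun -LunType disk |
    Where-Object {$_.MultipathPolicy -eq "Fixed"} |
    Set-ScsiLun -MultipathPolicy "RoundRobin"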

 

How to rename an ESXi host:

Renaming hosts is one area that the Web Client has made significantly easier (once you figure out where to go)! Select a host from the Hosts and Clusters view, click Manage, click Networking, then TCP/IP Configuration below.

From the DNS Configuration menu, select “Enter settings manually” and put whatever hostname you would like here.

VMware recommends putting a host in maintenance mode and disconnecting it from vCenter before doing this. I did this hot with my host active in an HA cluster with zero ill effects. I did it a few times just to make sure. The other way to do this is via the CLI. Connect to your ESXi host via SSH, vMA or vCLI and run:

esxcli system hostname set --host=hostname

Cool changes in Recent Tasks pane:

Not only is the Recent Tasks pane off to the right now, which I really like, but it also breaks out tasks by All, Running and Failed for easier viewing, including the ability to view just your own tasks in environments with many admins. Previously these tasks were all lumped together and longer running tasks would get buried in the task stream.

image

 

The Recent Tasks pane also provides a new and better method to deal with pop-up configuration dialogs. Ever start configuring something using the old vSphere Client, get 4-5 clicks deep in the pop-up configuration, then realize you need some other piece of information requiring you to cancel out so you can go back to some other area in vCenter? This problem is now resolved in the web client with a cool new section of the Tasks pane called Work in Progress. It doesn’t matter what you’re doing or how far along you are in the dialog. If you need to break away for any reason, you can simply minimize the pop up and come back to it later. These minimized pop-ups will show in the Work in Progress pane below recent tasks.

The example here shows 3 concurrent activities in various states: a vMotion operation, an edit of a VM’s settings, and even a clone operation of that same VM. Any activity that generates a pop-up dialog can be set aside and picked up again later. This is a huge improvement over the legacy vSphere Client. Very cool!!

 

Traffic Shaping:

It appears that in the web client you can only apply traffic shaping at the vSwitch level, not at an individual port group or VMK level. Here you can see shaping available for the standard vSwitch:

These settings, while viewable in the VMK policies summary, are not changeable (that I can see).

To override the vSwitch shaping policy and apply one to an individual port group or VMK, you have to use the legacy vSphere Client. Not sure if this is an oversight on VMware’s part or yet another sign of things to come requiring dvSwitching to assign shaping policies below the vSwitch level.

 

Deploying vCenter Operations Manager (vCOps):

Made extremely easy in vSphere 5.5 via the web client is the deployment of the incredible vCOps vApp for advanced monitoring of your environment. VMware has made detailed performance monitoring of your vSphere world incredibly simple and intuitive through this easy to set up and use vApp. Really impressive. From the home page, click vCenter Operations Management.

On the Getting Started screen, click Deploy vCOps. If you have a valid vmware.com login, enter it here to download the OVF and related files for deployment. You can alternatively point to the file locally if you have it already.

Accept the EULAs and choose all the placement and sizing options for the VM.

A word of caution, do not expect DRS to make a host placement decision for you here during the deployment. The wizard will allow you to select your cluster as a resource destination but the deployment will ultimately fail. Choose a specific host to deploy the VM to instead.

The requisite files will be downloaded from VMware directly and deployed to your environment. Off to the races!

Once deployed, you’ll see 2 new VMs running under the vCOps vApp object in your datacenter.

Once the VMs are powered on and the vApp has been started, you should see new options under vCenter Operations Manager.

First, click the Configure link to open the admin site in a web page. The default login for the admin account is admin/admin; for root the password is vmware. Configure the initial setup to point to vCenter and the analytics VM, which it should detect. Install the certificates as prompted and continue through the registration process.

Once complete, return to the vCOps page in vCenter and click Open; a new web page will launch for you to consume the vCOps goodness. After a short while performance stats should start pouring in for everything in your vSphere environment. Usage patterns and workload profiles can be identified so appropriate adjustments can be made. What you do from here with the data collected is entirely up to you. 🙂

A couple more screens just to show you the capability of vCOps, since I like it so much. Storage at the datastore view:

VM performance view:

Performance Considerations for Enterprise VDI

[This post references portions of published test results from various Dell DVS Enterprise reference architectures. We design, build, test and publish E2E enterprise VDI architectures, SKU’d and sold globally. Head over here and have a look for more information.]
There are four core elements that we typically focus on for performance analysis of VDI: CPU, memory, disk, and network. Each plays a uniquely integral role in the system overall with the software in play defining how each element is consumed and to what extent. In this post I’ll go over some of the key considerations when planning an enterprise VDI infrastructure.

CPU

First things first, no matter what you’ve heard or read, the primary VDI bottleneck is CPU. We in CCC/DVS at Dell prove this again and again, across all hypervisors, all brokers and any hardware configuration. There are special use case caveats to this of course, but generally speaking, your VDI environment will run out of compute CPU before it runs out of anything else! CPU is a finite shared resource, unlike memory, disk or network, which can all be incrementally increased or adjusted. There are many special purpose vendors and products out there that will tell you the VDI problem is memory or IOPS; those can be issues, but you will always come back to CPU.
Intel’s Ivy Bridge is upon us now, delivering more cores at higher clocks with more cache and supporting faster memory. It is decidedly cheaper to purchase a more expensive pair of CPUs than it is to purchase an entire additional server. For that reason, we recommend running [near] top bin CPUs in your compute hosts, as we see measurable benefit in running faster chips. For management hosts you can get by with a lower spec CPU, but if you want to get the best return on investment for your compute hosts and get as many users as you can per host, buy the fast CPUs! Our recommendation in DVS Enterprise will follow the lateral succession from the Sandy Bridge parts we previously recommended to Ivy Bridge: Xeon E5-2690v2 (IB) vs E5-2690 (SB).
The 2690v2 is a 10 core part using a 22nm process with a 130w TDP clocking in at 3.0GHz and supporting up to 1866MHz DDR3 memory. We tested the top of the line E5-2697v2 (12 cores) as well as the faster 1866MHz memory and saw no meaningful improvement in either case to warrant a core recommendation. It’s all about the delicate balance of the right performance for the right price.
There is no 10c part in the AMD Opteron 6300 line so the closest competitor is the Opteron 6348 (Piledriver). As has always been the case, the AMD parts are a bit cheaper and run more efficiently. AMD clocks lower (with turbo) and due to no hyperthreading feature, executes fewer simultaneous threads. The 6348 also only supports 1600MHz memory but provides a few additional instruction sets. Both run 4 memory channels with an integrated memory controller. AMD also offers a 16c part at its top end in the 6386SE. I have no empirical data on AMD vs Intel for VDI at this time.
Relevant CPU spec comparison, our default selection for DVS Enterprise highlighted in red:

Performance analysis:

To drive this point home regarding the importance of CPU in VDI, here are 2 sets of test results published in my reference architecture for DVS Enterprise on XenDesktop, one on vSphere and one on Hyper-V, both based on Sandy Bridge (we haven’t published our Ivy Bridge data yet). MCS vs PVS is another discussion entirely, but in either case CPU is always the determining factor of scale. These graphs are based on tests using MCS and R720s fitted with 2 x E5-2690 CPUs and 192GB RAM running the LoginVSI Light workload.
Hyper-V:
The 4 graphs below tell the tale fairly well for 160 concurrent users. Hyper-V does a very good job of optimizing CPU while consuming slightly higher amounts of other resources. Network consumption, while very reasonable and much lower than you might expect for 160 users, is considerably larger than in the vSphere use case in the next example. Once steady state has been reached, CPU peaks right around 85% which is the generally accepted utilization sweet spot making the most of your hardware investment while leaving head room for unforeseen spikes or temporary resource consumption. Memory in use is on the high side given the host had 192GB, but that can be easily remedied by raising to 256GB. 

vSphere:

Similar story for vSphere, although the user density below is representative of only 125 desktops of the same user workload. This speaks to another trend we are seeing more and more of, which is a stark CPU performance reduction for vSphere compared to Hyper-V with non-View brokers. 35 fewer users overall here, but disk performance is also acceptable. In the example below, CPU spikes slightly above 85% at full load with disk performance and network consumption well within reasonable margins. Where vSphere really shines is in its memory management capabilities thanks to features like Transparent Page Sharing; as you can see, the active memory is quite a bit lower than what has actually been assigned.

Are 4 sockets better than 2?

Not necessarily. 4-socket servers, such as the Dell PowerEdge R820, use a different multi-processor (MP) CPU architecture, currently based on Sandy Bridge EP (E5-4600 family), versus the dual processor (DP) CPU architectures of their dual-socket server counterparts. MP CPUs and their 4-socket servers are inherently more expensive, especially considering the additional RAM required to support the additional user density. 2 additional CPUs do not mean twice the user density in a 4-socket platform either! A similarly configured 2-socket server is roughly half the cost of a 4-socket box, and it is for this reason that we recommend the Dell PowerEdge R720 for DVS Enterprise. You will get more users on 2 x R720s for less than if you ran a single R820.

Memory

Memory architecture is an important consideration for any infrastructure planning project. Our experience shows that VDI appears to be less sensitive to memory bandwidth performance than other enterprise applications. Besides overall RAM density per host, DIMM speed and loading configuration are important considerations. In Sandy and Ivy Bridge CPUs, there are 4 memory channels, 3 DIMM slots each, per CPU (12 slots total). Your resulting DIMM clock speed and total available bandwidth will vary depending on how you populate these slots.

As you can see from the table below, loading all slots on a server at 3 DPC (3 DIMMs per channel) will result in a forced clock reduction to either 1066 or 1333 depending on the DIMM voltage. If you desire to run at 1600MHz or 1866MHz (Ivy) you cannot populate the 3rd slot per channel, which will net 8 empty DIMM slots per server. You’ll notice that the higher memory clocks are also achievable using lower voltage RDIMMs.

Make sure to always use the same DIMM size, clock and slot loading to ensure a balanced configuration. To follow the example of 256GB in a compute host, the proper loading to maintain maximum clock speeds and 4-channel bandwidth is as follows per CPU:

If 256GB is not required, leaving the 4th channel empty is also acceptable in “memory optimized” BIOS mode, although it does reduce the overall memory bandwidth by 25%. In our tests with the older Sandy Bridge E5-2690 CPUs, we did not find that this affected desktop VM performance.
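To put the 256GB example into concrete numbers (a quick sanity check assuming 16GB RDIMMs): 2 sockets x 4 channels x 2 DIMMs per channel = 16 DIMMs, and 16 x 16GB = 256GB, populated at 2 DPC so the configuration keeps its maximum supported clock and full 4-channel bandwidth.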

Disk

There are 3 general considerations for storage that vary depending on the requirements of the implementation: capacity, performance and protocol.
Usable capacity must be sufficient to meet the needs of both Tier1 and Tier2 storage requirements which will differ greatly based on persistent or non-persistent desktops. We generally see an excess of usable capacity as a result of the number of disks required to provide proper performance. This of course is not always the case as bottlenecks can often arise in other areas, such as array controllers. It is less expensive to run RAID10 in fewer arrays to achieve a given performance requirement, than it is to run more arrays at RAID50. Sometimes you need to maximize capacity, sometimes you need to maximize performance. Persistent desktops (full clones) will consume much more disk than their non-persistent counterparts so additional storage capacity can be purchased or a deduplication technology can be leveraged to reduce the amount of actual disk required. If using local disk, in a Local Tier 1 solution model, inline dedupe software can be implemented to reduce the amount of storage required by several fold. Some shared storage arrays have this capability built in. Other solutions, such as Microsoft’s native dedupe capability in Server 2012 R2, make use of file servers to host Hyper-V VMs via SMB3 to reduce the amount of storage required.
Disk performance is another deep well of potential and caveats again related directly to the problems one needs to solve.  A VDI desktop can consume anywhere from 3 to over 20 IOPS for steady state operation depending on the use case requirements. Sufficient steady state disk performance can be provided without necessarily solving the problems related to boot or login storms (many desktop VMs being provisioned/ booted or many users logging in all at once). Designing a storage architecture to withstand boot or login storms requires providing a large amount of available IOPS capability which can be via hardware or software based solutions, neither generally inexpensive. Some products combine the ability to provide high IOPS while also providing dedupe capabilities. Generally speaking, it is much more expensive to provide high performance for potential storms than it is to provide sufficient performance for normal steady state operations. When considering SSDs and shared storage, one needs to be careful to consider the capabilities of the array’s controllers which will almost always exhaust before the attached disk will. Just because you have 50K IOPS potential in your disk shelves on the back end, does not mean that the array is capable of providing that level of performance on the front end!
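As a rough sizing sketch using assumed numbers: 500 desktops at 10 IOPS each is 5,000 front-end IOPS at steady state; with a write-heavy mix of roughly 20% reads/80% writes and a RAID10 write penalty of 2, the back end must absorb about (0.2 x 5,000) + (0.8 x 5,000 x 2) = 9,000 IOPS, and that is before adding any headroom for boot or login storms or accounting for what the array controllers can actually deliver.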
There is not a tremendous performance difference between storage protocols used to access a shared array on the front end; this boils down to preference these days. Fibre Channel has been proven to outperform iSCSI and file protocols (NFS) by a small margin, but performance alone is not really reason enough to choose between them. Local disk also works well but concessions may need to be made with regard to HA and VM portability. Speed, reliability, limits/maximums, infrastructure costs and features are key considerations when deciding on a storage protocol. At the end of the day, any of these methods will work well for an enterprise VDI deployment. What features do you need and how much are you willing to spend?

Network

Network utilization is consistently (and maybe surprisingly) low, often in the Kbps range per user. VDI architectures in and of themselves simply don’t drive a ton of steady network utilization. VDI is bursty and will exhibit higher levels of consumption during large aggregate activities such as provisioning or logins. Technologies like Citrix Provisioning Server will inherently drive greater consumption by nature. What will drive the most variance here is much more reliant on the upstream applications in use by the enterprise and their associated architectures. This is about as subjective as it gets, so it is impossible to speculate in any kind of fashion across the board. You will have a potentially high number of users on a large number of hosts, so comfortable network oversubscription planning should be done to ensure proper bandwidth in and out of the compute host or blade chassis. Utilizing enterprise-class switching components that are capable of operating at line rate for all ports is advisable. Will you really need hundreds of gigs of bandwidth? I really doubt it. Proper HA is generally desirable along with adherence to sensible network architectures (core/distribution, leaf/spine). I prefer to do all of my routing at the core, leaving anything Layer 2 at the top of rack. Uplink to your core or distribution layers using 10Gb links, which can be copper (TwinAx) for shorter runs or fiber for longer runs.
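For a rough illustration with assumed numbers: 150 desktop sessions on a host averaging 150Kbps each amounts to only about 22Mbps of steady-state LAN traffic per host, which is why the bursty events such as logins and provisioning, plus upstream application traffic, deserve far more planning attention than raw per-user bandwidth.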

In Closing

That about sums it up for the core 4 performance elements. To put a bow on this, hardware utilization analysis is fine and definitely worth doing, but user experience is ultimately what’s important here. All components must sing together in harmony to provide the proper level of scale and user experience. A combination of subjective and automated monitoring tests during a POC will give a good indication of what users will experience.  At Dell, we use Stratusphere UX by Liquidware Labs to measure user experience in combination with Login Consultants LoginVSI for load generation. A personal, subjective test (actually log in to a session) is always a good idea when putting your environment under load, but a tool like Stratusphere UX can identify potential pitfalls that might otherwise go unnoticed.
Keeping tabs on latencies, queue lengths and faults, then reporting each user's experience into a magic-style quadrant, will give you the information required to ascertain whether your environment will perform as designed or send you scrambling to make adjustments.

Server 2012 Native NIC Teaming and Hyper-V

One of the cool new features introduced in Server 2012 is the ability to team your NICs natively, which is fully supported by Microsoft without using any vendor-specific drivers or software. OEM driver-based teaming solutions have been around for much longer but have never been supported by Microsoft and are usually the first thing you are asked to disable if you ever call for PSS support. Server 2012 teaming is easy to configure and can be used for simple to very complex networking scenarios, converged or non. In this post I’ll cover the basics of teaming with convergence options, with a focus on Hyper-V scenarios. A quick note on vernacular: Hyper-V host, parent, parent partition, and management OS all refer to essentially the same thing. Microsoft likes to refer to the network connections utilized by the Hyper-V host as the “management OS” in the system dialogs.

100% of the teaming configuration can be handled via PowerShell with certain items specifically requiring it. Basic functionality can be configured via the GUI which is most easily accessed from the Local Server page in Server Manager:

The NIC Teaming app isn’t technically part of Server Manager but it sure looks and feels like it. Click New Team under the tasks drop down of the Teams section:

Name the team, select the adapters you want to include (up to 32), then set the additional properties. Switch independent teaming mode should suit the majority of use cases but can be changed to static or LACP mode if required. Load balancing mode options consist of address hash or Hyper-V port. If you intend to use this team within a Hyper-V switch, make sure the latter is selected. Specifying all adapters as active should also serve the majority of use cases, but a standby adapter can be specified if required.

Click ok and your new team will appear within the info boxes in the lower portion of the screen. Additional adapters can be added to an existing team at any time.
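The same team can be created in a single line of PowerShell. A minimal sketch, assuming two adapters named NIC1 and NIC2 (check Get-NetAdapter for your actual names):

# Switch independent team with Hyper-V port load balancing, suitable for use with a Hyper-V switch
New-NetLbfoTeam -Name "VDI-Team" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort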

Every team and vNIC you create will be named and enabled via the Microsoft Network Adapter Multiplexor Driver. This is the device name that Hyper-V will see so name your teams intuitively and take note of which driver number is assigned to which team (multiplexor driver #2, etc).

In the Hyper-V Virtual Switch Manager, under the external network connection type you will see every physical NIC in the system as well as a multiplexor driver for every NIC team. Checking the box below the selected interface does exactly what it suggests: shares that connection with the management OS (parent).

VLANs (non Hyper-V)

VLANs can be configured a number of different ways depending on your requirements. At the highest level, a VLAN can be configured on the NIC team itself. From within the Adapters and Interfaces dialog, click the Team Interfaces tab, then right click the team you wish to configure.

image

Entering a specific VLAN will limit all traffic on this team accordingly. If you intend to use this team with a Hyper-V switch DO NOT DO THIS! It will likely cause confusion and problems later. Leave any team to be used in Hyper-V in default mode with a single team interface and do your filtering within Hyper-V.

Team Interfaces can also be configured from this dialog, which will create vNICs tied to a specific VLAN. This can be useful for specific services that need to communicate on a dedicated VLAN not part of Hyper-V. First select the team on the left; the Add Interface drop-down item on the right will then become active.

image

Name the interface and set the intended VLAN. Once created these will also appear as multiplexor driver devices in the network connections list. They can then be assigned IP addresses the same as any other NIC. vNICs created in this manner are not intended for use in Hyper-V switches!
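Scripted, a dedicated team interface on its own VLAN might look like this; the team name and VLAN ID are examples:

# Add a VLAN-tagged team interface (vNIC) to an existing team for non-Hyper-V traffic
Add-NetLbfoTeamNic -Team "Mgmt-Team" -Name "Mgmt-VLAN15" -VlanID 15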

VLANS (Hyper-V)

For Hyper-V VLAN assignments, the preferred method is to let the Hyper-V switch perform all filtering. This varies a bit depending on if you are assigning management VLANs to the Hyper-V parent or to guest VMs. For guest VMs, VLAN IDs should be specified within vNICs connected to a Hyper-V port. If multiple VLANs need to be assigned, add multiple network adapters to the VM and assign VLAN IDs as appropriate. This can also be accomplished in PowerShell using the Set-VMNetworkAdapterVlan command.
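For example, tagging a guest’s vNIC in access mode might look like the sketch below; the VM, switch and VLAN values are examples only:

# Tag the VM's default network adapter for VLAN 20 in access mode
Set-VMNetworkAdapterVlan -VMName "Desktop01" -VMNetworkAdapterName "Network Adapter" -Access -VlanId 20

# Add a second adapter for a different VLAN, then tag it
Add-VMNetworkAdapter -VMName "Desktop01" -SwitchName "ConvergedSwitch" -Name "App-VLAN30"
Set-VMNetworkAdapterVlan -VMName "Desktop01" -VMNetworkAdapterName "App-VLAN30" -Access -VlanId 30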

NIC teaming can also be enabled within a guest VM by configuring the advanced feature item within the VM’s network adapter settings. This is an either/or scenario: guest vNICs can be teamed if there is no teaming in the Management OS.

Assigning VLANs to interfaces used by the Hyper-V host can be done a couple of different ways. Most basically, if your Hyper-V host is to share the team or NIC of a virtual switch this can be specified in the Virtual Switch Manager for one of your external vSwitches. Optionally a different VLAN can be specified for the management OS.

Converged vs Non-converged

Before I go much further on carving VLANs for the management OS out of Hyper-V switches, let’s look at a few scenarios and identify why we may or may not want to do this. Thinking through some of these scenarios can be a bit tricky conceptually, especially if you’re used to working with vSphere. In ESXi all traffic sources and consumers go through a vSwitch, in all cases. ESXi management, cluster communications, vMotion, guest VM traffic…everything. In Hyper-V you can do it this way, but you don’t have to, and depending on how or what you’re deploying you may not want to.

First let’s look at a simple convergence model. This use case applies to a clustered VDI infrastructure with all desktops and management components running on the same set of hosts. With 2 x 10Gb NICs configured in a team, all network traffic for both guest VMs and parent management traverses the same interfaces, with storage traffic split out via NPAR to MPIO drivers. The NICs are teamed for redundancy, guest VMs attach to the vSwitch, and the management OS receives weighted vNICs from the same vSwitch. Once a team is assigned to a Hyper-V switch, it cannot be used by any other vSwitch.

Assigning vNICs to be consumed by the management OS is done via PowerShell using the Add-VMNetworkAdapter command. There is no way to do this via the GUI. The vNIC is assigned to the parent via the ManagementOS designator, named whatever you like and assigned to the virtual switch (as named in Virtual Switch Manager).
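A minimal sketch of this converged build, assuming a team named VDI-Team: the vSwitch is created with weight-based QoS and without the default management vNIC, then the parent vNICs are added explicitly (the names match the commands that follow):

# Create the Hyper-V switch on the team with weight-based minimum bandwidth
New-VMSwitch -Name "ConvergedSwitch" -NetAdapterName "VDI-Team" -MinimumBandwidthMode Weight -AllowManagementOS $false

# Carve out vNICs for the management OS
Add-VMNetworkAdapter -ManagementOS -Name "Management" -SwitchName "ConvergedSwitch"
Add-VMNetworkAdapter -ManagementOS -Name "Cluster" -SwitchName "ConvergedSwitch"
Add-VMNetworkAdapter -ManagementOS -Name "Live Migration" -SwitchName "ConvergedSwitch"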

Once the commands are successfully issued, you will see the new vNICs created in the network connections list that can be assigned IP addresses and configured like any other interface.

You can also see the lay of the land in PowerShell (elevated) by issuing the Get-VMNetworkAdapter command.

image

Going a step further, assign VLANs and weighted priorities for the management OS vNICs:

Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "Cluster" -Access -VlanId 25
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "Live Migration" -Access -VlanId 50
Set-VMNetworkAdapterVlan -ManagementOS -VMNetworkAdapterName "Management" -Access -VlanId 10

Set-VMNetworkAdapter -ManagementOS -Name "Cluster" -MinimumBandwidthWeight 40
Set-VMNetworkAdapter -ManagementOS -Name "Live Migration" -MinimumBandwidthWeight 20
Set-VMNetworkAdapter -ManagementOS -Name "Management" -MinimumBandwidthWeight 5

For our second scenario let’s consider a non-converged model with 6 NICs: 2 for iSCSI storage, 2 for the Hyper-V switch, and 2 for the management OS. The NICs used to access any Ethernet-based storage protocol traffic should not be teamed; let MPIO take care of this. The NIC team used by the Hyper-V switch will not share with any other host function, and the same goes for the management team.

This is a perfectly acceptable method of utilizing more NICs on a Hyper-V host if you haven’t bought into the converged network models. Not everything must go through a vSwitch if you don’t want it to. Flexibility is good but this is the point that may confuse folks. vNICs dedicated to the management OS attached to the Mgmt NIC team can be carved out via PowerShell or via the Teaming GUI as Team Interfaces. Since these interfaces will exist and function outside of Hyper-V, it is perfectly acceptable to configure them in this manner. Storage traffic should be configured using MPIO and not NIC teaming. 

The native Server 2012 NIC teaming offering is powerful and a much welcomed addition to the Server software stack. There are certainly opportunities for Microsoft to enhance and simplify this feature further by adding best practice gates to prevent misconfiguration. The bandwidth weighting calculations could also be made simpler and more in line with WYSIWYG. Converged or non, there are a number of ways to configure teaming in Server 2012 to fit any use case or solution model, with an enhanced focus paid to Hyper-V.

Unidesk: Layered VDI Management

VDI is one of the most intensive workloads in the datacenter today and by nature uses every major component of the enterprise technology stack: networking, servers, virtualization, storage, load balancing. No stone is left unturned when it comes to enterprise VDI. Physical desktop management can also be an arduous task with large infrastructure requirements of its own. The sheer complexity of VDI drives a lot of interesting and feverish innovation in this space but also drives a general adoption reluctance for some who fear the shift is too burdensome for their existing teams and datacenters. The value proposition Unidesk 2.0 brings to the table is a simplification of the virtual desktops themselves, simplified management of the brokers that support them, and comprehensive application management.

The Unidesk solution plugs seamlessly into a new or existing VDI environment and is comprised of the following key components:

  • Management virtual appliance
  • Master CachePoint
  • Secondary CachePoints
  • Installation Machine

 

Solution Architecture

At its core, Unidesk is a VDI management solution that does some very interesting things under the covers. Unidesk requires vSphere at the moment but can manage VMware View, Citrix XenDesktop, Dell Quest vWorkspace, or Microsoft RDS. You could even manage each type of environment from a single Unidesk management console if you had the need or proclivity. Unidesk is not a VDI broker in and of itself, so that piece of the puzzle is very much required in the overall architecture. The Unidesk solution works from the concept of layering, which is increasingly becoming a hotter topic as both Citrix and VMware add native layering technologies to their software stacks. I’ll touch on those later. Unidesk works by creating, maintaining, and compositing numerous layers to create VMs that can share common items like base OS and IT applications, while providing the ability to persist user data including user installed applications, if desired. Each layer is stored and maintained as a discrete VMDK and can be assigned to any VM created within the environment. Application or OS layers can be patched independently and refreshed to a user VM. Because of Unidesk’s layering technology, customers needing persistent desktops can take advantage of capacity savings over traditional methods of persistence. A persistent desktop in Unidesk consumes, on average, a similar disk footprint to what a non-persistent desktop would typically consume.

CachePoints (CP) are virtual appliances that are responsible for the heavy lifting in the layering process. Currently there are two distinct types of CachePoints: Master and Secondary. The Master CP is the first to be provisioned during the setup process and maintains the primary copy of all layers in the environment. Master CPs replicate the pertinent layers to Secondary CPs, which have the task of actually combining layers to build the individual VMs, a process called Compositing. Due to the role played by each CP type, the Secondary CPs will need to live on the Compute hosts with the VMs they create. Local or Shared Tier 1 solution models can be supported here, but the Secondary CPs will need to be able to access the “CachePoint and Layers” volume at a minimum.

The Management Appliance is another virtual machine that comes with the solution to manage the environment and individual components. This appliance provides a web interface used to manage the CPs, layers, images, as well as connections to the various VDI brokers you need to interface with. Using the Unidesk management console you can easily manage an entire VDI environment almost completely ignoring vCenter and the individual broker management GUIs. There are no additional infrastructure requirements for Unidesk specifically outside of what is required for the VDI broker solution itself.

Installation Machines are provided by Unidesk to capture application layers and make them available for assignment to any VM in the solution. This process is very simple and intuitive requiring only that a given application is installed within a regular VM. The management framework is then able to isolate the application and create it as an assignable layer (VMDK). Many of the problems traditionally experienced using other application virtualization methods are overcome here. OS and application layers can be updated independently and distributed to existing desktop VMs.

Here is an exploded and descriptive view of the overall solution architecture summarizing the points above:

Storage Architecture

The Unidesk solution is able to leverage three distinct storage tiers to house the key volumes: Boot Images, CachePoint and Layers, and Archive.

  • Boot Images – Contains images having very small footprints and consist of a kernel and pagefile used for booting a VM. These images are stored as VMDKs, like all other layers, and can be easily recreated if need be. This tier does not require high performance disk.
  • CachePoint and Layers – This tier stores all OS, application, and personalization layers. Of the three tiers, this one sees the most IO so if you have high performance disk available, use it with this tier.
  • Archive – This tier is used for layer backup including personalization. Repairs and restored layers can be pulled from the archive and placed into the CachePoint and Layers volume for re-deployment, if need be. This tier does not require high performance disk.

image

The Master CP stores layers in the following folder structure, each layer organized and stored as a VMDK.

Installation and Configuration

New in Unidesk 2.x is the ability to execute a completely scripted installation. You’ll need to decide ahead of time what IPs and names you want to use for the Unidesk management components, as these are defined during setup. This portion of the install is rather lengthy, so it’s best to have things squared away before you begin. Once the environment variables are defined, the setup script takes over and builds the environment according to your design.

Once setup has finished, the Management appliance and Master CP will be ready, so you can log into the mgmt console to take the configuration further. Among the initial key activities to complete are setting up an Active Directory junction point and connecting Unidesk to your VDI broker. Unidesk should already be talking to your vCenter server at this point.

Your broker mgmt server will need to have the Unidesk Integration Agent installed which you should find in the bundle downloaded with the install. This agent listens on TCP 390 and will connect the Unidesk management server to the broker. Once this agent is installed on the VMware View Connection Server or Citrix Desktop Delivery Controller, you can point the Unidesk management configuration at it. Once synchronized all pool information will be visible from the Unidesk console.

A very neat feature of Unidesk is that you can build many AD junction points from different forests if necessary. These junction points will allow Unidesk to interact with AD and provide the ability to create machine accounts within the domains.

Desktop VM and Application Management

Once Unidesk can talk to your vSphere and VDI environments, you can get started building OS layers which  will serve as your gold images for the desktops you create. A killer feature of the Unidesk solution is that you only need a single gold image per OS type even for numerous VDI brokers. Because the broker agents can be layered and deployed as needed, you can reuse a single image across disparate View and XenDesktop environments, for example. Setting up an OS layer simply points Unidesk at an existing gold image VM in vCenter and makes it consumable for subsequent provisioning. 

Once successfully created, you will see your OS layers available and marked as deployable.

 

Before you can install and deploy applications, you will need to deploy a Unidesk Installation Machine which is done quite simply from the System page. You should create an Installation Machine for each type of desktop OS in your environment.

Once the Installation Machine is ready, creating layers is easy. From the Layers page, simply select “Create Layer,” fill in the details, choose the OS layer you’ll be using along with the Installation machine and any prerequisite layers.

 

To finish the process, you’ll need to log into the Installation Machine, perform the install, then tell the Unidesk management console when you’re finished and the layer will be deployable to any VM.

Desktops can now be created as either persistent or non-persistent. You can deploy to already existing pools, or if you need a new persistent pool created, Unidesk will take care of it. Choose the type of OS template to deploy (XP or Win7), select the connected broker to which you want to deploy the desktops, choose an existing pool or create a new one, and select the number of desktops to create.

Next select the CachePoint that will deploy the new desktops along with the network they need to connect to and the desktop type.

Select the OS layer that should be assigned to the new desktops.

Select the application layers you wish to assign to this desktop group. All your layers will be visible here.

Choose the virtual hardware, performance characteristics and backup frequency (Unidesk Archive) of the desktop group you are deploying.

Select an existing or create a new maintenance schedule that defines when layers can be updated within this desktop group.

Deploy the desktops.

Once the creation process is underway, the activity will be reflected under the Desktops page as well as in vCenter tasks. When completed all desktops will be visible and can be managed entirely from the Unidesk console.

Sample Architecture

Below are some possible designs that can be used to deploy Unidesk into a Local or Shared Tier 1 VDI solution model. For Local Tier 1, both the Compute and Management hosts will need access to shared storage, even though VDI sessions will be hosted locally on the Compute hosts. 1Gb PowerConnect or Force10 switches can be used in the Network layer for LAN and iSCSI. The Unidesk boot images should be stored locally on the Compute hosts along with the Secondary CachePoints that will host the sessions on that host. All of the typical VDI management components will still be hosted on the Mgmt layer hosts along with the additional Unidesk management components. Since the Mgmt hosts connect to and run their VMs from shared storage, all of the additional Unidesk volumes should be created on shared storage. Recoverability is achieved primarily in this model through use of the Unidesk Archive function. Any failed Compute host VDI session information can be recreated from the Archive on a surviving host.

Here is a view of the server network and storage architecture with some of the solution components broken out:

For Shared Tier 1 the layout is slightly different. The VDI sessions and “CachePoint and Layers” volumes must live together on Tier 1 storage while all other volumes can live on Tier 2. You could combine the two tiers for smaller deployments, perhaps, but your mileage will vary. Blades are also an option here, of course. All normal vSphere HA options apply here with the Unidesk Archive function bolstering the protection of the environment.

Unidesk vs. the Competition

Both Citrix and VMware have native solutions available for management, application virtualization, and persistence, so you will have to decide if Unidesk is worth the price of admission. On the View side, if you buy a Premier license, you get ThinApp for applications, Composer for non-persistent linked clones, and soon the technology from VMware’s recent Wanova acquisition will be available. The native View persistence story isn’t great at the moment, but Wanova Mirage will change that when made available. Mirage will add a few layers to the mix including OS, apps, and persistent data but will not be as granular as the multi-layer Unidesk solution. The Wanova tech notwithstanding, you should be able to buy a cheaper/lower-level View license, since with Unidesk you will need neither ThinApp nor Composer. Unidesk’s application layering is superior to ThinApp, with little in the way of applications that cannot be layered, and can provide persistent or non-persistent desktops with almost the same footprint on disk. Add to that the Unidesk single management pane for both applications and desktops, and there is a compelling value to be considered.

On the Citrix side, if you buy an Enterprise license, you get XenApp for application virtualization, Provisioning Services (PVS), and Personal vDisk (PVD) for persistence, the latter from the recent RingCube acquisition. With XenDesktop you can leverage Machine Creation Services (MCS) or PVS for either persistent or non-persistent desktops. MCS is dead simple while PVS is incredibly powerful but an extraordinary pain to set up and configure. XenApp builds on top of Microsoft’s RDS infrastructure and requires additional components of its own such as SQL Server. PVD can be deployed with either catalog type, PVS or MCS, and adds a layer of persistence for user data and user installed applications. While PVD provides only a single layer, that may be more than suitable for any number of customers. The overall Citrix solution is time tested and works well, although the underlying infrastructure requirements are numerous and expensive. XenApp offloads application execution from the XenDesktop sessions, which will in turn drive greater overall host densities. Adding Unidesk to a Citrix stack again affords a customer the ability to buy in at a lower licensing level, although Citrix is seemingly reducing the value of augmenting its software stack by including more at lower license levels. For instance, PVD and PVS are available at all licensing levels now. The big upsell now is for the inclusion of XenApp. Unidesk removes the need for MCS, PVS, PVD, and XenApp, so you will have to ask yourself if the Unidesk approach is preferred to the Citrix method. The net result will certainly be less overall infrastructure required, but net licensing costs may very well be a wash.

Whole House Networking using MoCA

If you’re a Verizon FIOS customer then you are already using MoCA, the Multimedia over Coax Alliance standard. This technology is in use by many cable and satellite providers and supports up to 175Mbps of streamed HD video. All of the cable set top boxes (STB) in your house connect to your FIOS router through coax cables and stream video content using MoCA, which is enabled in the router. As you can see below, each device connecting via MoCA receives an IP address just like any other device on your network.

Ok, that’s cool but wifi is ubiquitous these days right, why use MoCA? If you have wifi-enabled devices in your home media stack, awesome, but many TVs, receivers, game consoles and media players are network-capable but not wifi-enabled. If you have a usable coax run in the wall, you can extend your Ethernet to these devices using this technology that is already very much flowing through your household. There are a couple of ways to do this, the first is using a MoCA adapter.

image

If the location you want to extend your network in also has a STB, then you will need to split the signal using a coax 2-way splitter. One output will go to the STB, the other to the adapter. Whatever device you want to connect to your network can do so using CAT5/6 to the Ethernet port in the adapter. Simple. The catch is that these adapters are not cheap at $75 or so apiece and you only get a single Ethernet port per adapter. Of course you could connect a switch behind the adapter if you need more ports. Verizon fully supports this, by the way, and although these adapters are made by Actiontec, there are Verizon branded adapters that you can buy directly from Verizon.

There is another method, which is how I do this at my house. Instead of using expensive MoCA adapters, you can use MoCA-capable FIOS routers. There are many benefits to doing this: you get a built-in switch, you can use the router to extend your wifi network if you wish, and they are dirt cheap. The Actiontec MI424WR is the router you want (Rev F or later ideally) and can be found on that giant well-known internet flea market for ~$20.

From a connection perspective, using the router is the same as the adapter: coax in, CAT5/6 out to your devices. Just a few best practices if you go this route:

  • Give your MoCA network extenders dedicated IP addresses
  • Disable the wireless antenna if you don’t plan to use it
  • Disable the Broadband Connection (coax)
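
If you do give the extenders dedicated IP addresses as suggested above, a quick ping sweep is an easy way to confirm they all stay reachable after a power blip or firmware update. Here is a minimal Python sketch; the two extender addresses are hypothetical placeholders, not addresses from my setup:

```python
import platform
import subprocess

# Hypothetical dedicated addresses assigned to the MoCA extender routers
EXTENDERS = ["192.168.1.2", "192.168.1.3"]

def is_reachable(ip: str) -> bool:
    """Send a single ping and report whether the host answered."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "1", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

if __name__ == "__main__":
    for ip in EXTENDERS:
        print(f"MoCA extender {ip}: {'up' if is_reachable(ip) else 'DOWN'}")
```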

They did a great job over at the media center maniac drawing out what this looks like when using routers to extend your network over MoCA. Just substitute Linksys DMA-2200s for TVs, receivers, etc. 🙂

Enterprise VDI

My name is Peter and I am the Principal Engineering Architect for Desktop Virtualization at Dell. 🙂

VDI is a fire hot topic right now and there are many opinions out there on how to approach it. If you’re reading this I probably don’t need to sell you on the value of the concept but more and more companies are deploying VDI instead of investing in traditional PC refreshes. All trending data points to this shift only going up over the next several years as the technology gets better and better. VDI is a very interesting market segment as it encompasses the full array of cutting edge enterprise technologies: network, servers, storage, virtualization, database, web services, highly distributed software architectures, high-availability, and load balancing. Add high capacity and performance requirements to the list and you have a very interesting segment indeed! VDI is also constantly evolving with a very rich ecosystem of players offering new and interesting technologies to keep up with. This post will give you a brief look at the enterprise VDI offering from Dell.
As a customer (and just a year ago I still was one), it’s very easy to get caught up in the marketing hype, which makes it difficult to see the true value of a product or platform. With regard to VDI, we are taking a different approach at Dell. Instead of trying to lure you with inconceivable and questionable per-server user densities, we have decided to take a very honest and realistic approach in our solutions. I’ll explain this in more detail later.
Dell currently offers 2 products in the VDI space: Simplified, which is the SMB-focused VDI-in-a-box appliance I discussed here (link), and Enterprise which can also start very small but has much longer legs to scale to suit a very large environment. I will be discussing the Enterprise platform in this post which is where I spend the majority of my time. In the resources section at the bottom of this posting you will find links to 2 reference architectures that I co-authored. They serve as the basis for this article.

DVS Enterprise

Dell DVS Enterprise is a multi-tiered turnkey solution comprised of rack or blade servers and iSCSI or FC storage, built on industry-leading hypervisors, software, and VDI brokers. We have designed DVS Enterprise with tons of flexibility to meet any customer need, and it can suit 50 to 50,000 users. As opposed to the more rigid “block” type products, our solutions are tailored to the customer to provide exactly what is needed, with the flexibility to leverage existing investments in network, storage, and software.
The solution stacks consist of 4 primary tiers: network, compute, management, and storage. Network and storage can be provided by the customer, given the existing infrastructure meets our design and performance requirements. The Compute tier is where the VDI sessions execute, whether running on local or shared storage. The management tier is where VDI broker VMs and supporting infrastructure run. These VMs run off of shared storage in all solutions so management tier hosts can always be clustered to provide HA. All tiers, while inextricably linked, can scale independently.
image

The DVS Enterprise portfolio consists of 2 primary solution models: “Local Tier 1” and “Shared Tier 1”. DVS Engineering spends considerable effort validating and characterizing core solution components to ensure your VDI implementation will perform as it is supposed to. Racks, blades, 10Gb networking, Fiber Channel storage…whatever mix of ingredients you need, we have it. Something for everyone.

Local Tier 1

“Tier 1” in the DVS context refers to the disk from which the VDI sessions execute, which is therefore the faster, higher-performing tier. Local Tier 1 applies only to rack servers (due to the amount of disk required) while Shared Tier 1 can be rack or blade. Tier 2 storage is present in both solution architectures and, while carrying a reduced performance requirement, is used for user profiles/data and management VM execution. The graphic below depicts the management tier VMs on shared storage while the compute tier VDI sessions run on local server disk:

image

This local Tier 1 Enterprise offering is uniquely Dell as most industry players focus solely on solutions revolving around shared storage. The value here is flexibility and that you can buy into high performance VDI no matter what your budget is. Shared Tier 1 storage has its advantages but is costly and requires a high performance infrastructure to support it. The Local Tier 1 solution is cost optimized and only requires 1Gb networking.

Network

We are very cognizant that network can be a touchy subject, with a lot of customers pledging fierce loyalty to the well-known market leader. Hey, I was one of those customers just a year ago. We get it. That said, a networking purchase from Dell is entirely optional as long as you have suitable infrastructure in place. From a cost perspective, PowerConnect provides strong performance at a very attractive price point and is the default option in our solutions. Our premium Force10 networking product line is positioned well to compete directly with the market leader from top of rack (ToR) to large chassis-based switching. Force10 is an optional upgrade in all solutions. For the Local Tier 1 solution, a simple 48-port 1Gb switch is all that is required; the PC6248 is shown below:

image

Servers

The PowerEdge R720 is a solid rack server platform that suits this solution model well with up to 2 x 2.9GHz 8-core CPUs, 768GB RAM, and 16 x 2.5” 15K SAS drives. There is more than enough horsepower in this platform to suit any VDI need. Again, flexibility is an important tenet of Dell DVS, so other server platforms can be used if desired to meet specific needs.

Storage

A shared Tier 2 storage purchase from Dell is entirely optional in the Local Tier 1 solution, but shared Tier 2 storage itself is a required component of the architecture (an existing customer array can fill this role if it meets the requirements). The Equallogic 4100X is a solid entry-level 1Gb iSCSI storage array that can be configured to provide up to 22TB of raw storage running on 10K SAS disks. You can of course go bigger to the 6000 series in Equallogic, or integrate a Compellent array with your choice of storage protocol. It all depends on your need to scale.

image

Shared Tier 1

In the Shared Tier 1 solution model, an additional shared storage array is added to handle the execution of the VDI sessions in larger scale deployments. Performance is a key concern in the shared Tier 1 array and contributes directly to how the solution scales. All Compute and Mgmt hosts in this model are diskless and can be either rack or blade. In smaller scale solutions, the functions of Tier 1 and Tier 2 can be combined as long as there is sufficient capacity and performance on the array to meet the needs of the environment. 

image

Network

The network configuration changes a bit in the Shared Tier 1 model depending on whether you are using rack or blade servers and which block storage protocol you employ. Block storage traffic should be separated from the LAN, so iSCSI will leverage a discrete 10Gb infrastructure while fiber channel will leverage an 8Gb fabric. The PowerConnect 8024F is a 10Gb SFP+ based switch used for iSCSI traffic destined for either Equallogic or Compellent storage, and it can be stacked to scale. The fiber channel industry leader Brocade is used for FC fabric switching.

In the blade platform, each chassis has 3 available fabrics that can be configured with Ethernet, FC, or Infiniband switching. In DVS solutions, the chassis is configured with the 48-port M6348 switch interconnect for LAN traffic and either Brocade switches for FC or a pair of 10Gb 8024-K switches for iSCSI. Ethernet-based chassis switches are stacked for easy management.

Servers

Just like the Local Tier 1 solution, the R720 can be used if rack servers are desired, or the half-height dual-socket M620 if blades are desired. The M620 is on par with the R720 in all regards except disk capacity and top-end CPU: the R720 can be configured with a higher 2.9GHz 8-core CPU to drive greater user density in the compute tier. The M1000e blade chassis can support 16 half-height blades.

Storage

Either Equallogic or Compellent arrays can be utilized in the storage tier. The performance demands of Tier 1 storage in VDI are very high so design considerations dealing with boot storms and steady-state performance are critical. Each Equallogic array is a self-contained iSCSI storage unit with an active/passive controller pair that can be grouped with other arrays to be managed. The 6110XS, depicted below, is a hybrid array containing a mix of high performance SSD and SAS disks. Equallogic’s active tiering technology dynamically moves hot and cold data between tiers to ensure the best performance at all times. Even though each controller now only has a single 10Gb port, vertical port sharing ensures that a controller port failure does not necessitate a controller failover.

Compellent can also be used in this space and follows a more traditional linear scale. SSDs are used for “hot” storage blocks, especially boot storms, and 15K SAS disks store the cooler blocks on dense storage. To add capacity and throughput, additional shelves are looped into the array architecture. Compellent has its own auto-tiering functionality that can be scheduled off-hours to rebalance the array from day to day. It also employs a mechanism that puts hot data on the outer ring of the disk platters where it can be read easily and quickly. High performance and redundancy are achieved through an active/active controller architecture. The 32-bit Series 40 controller architecture is soon to be replaced by the 64-bit SC8000 controllers, alleviating the previous x86-based cache limits.

Another nice feature of Compellent is its inherent flexibility. The controllers are flexible like servers, allowing you to install the number and type of IO cards you require: FC, iSCSI, FCoE, and SAS for the back end. Need more front-end bandwidth or another back-end SAS loop? Just add the appropriate card to the controller.
In the lower user count solutions, Tier 1 and Tier 2 storage functions can be combined. In the larger scale deployments these tiers should be separated and scale independently.

VDI Brokers

Dell DVS currently supports both VMware View 5 and Citrix XenDesktop 5 running on top of the vSphere 5 hypervisor. All server components run Windows Server 2008 R2 and database services provided by SQL Server 2008 R2. I have worked diligently to create a simple, flexible, unified architecture that expands effortlessly to meet the needs of any environment.

image

Choice of VDI broker generally comes down to customer preference, though each solution has its advantages and disadvantages. View has a very simple backend architecture consisting of 4 essential server roles: SQL, vCenter, View Connection Server (VCS), and Composer. Composer is the secret sauce that provides the non-persistent linked clone technology and is installed on the vCenter server. One downside is that, because of Composer’s reliance on vCenter, the total number of VMs per vCenter instance is reduced to 2000, instead of the published 3000 per HA cluster in vSphere 5. This means you will have multiple vCenter instances depending on how large your environment is. The advantage to View is its scaling footprint, as 4 management hosts are all that is required to serve a 10,000-user environment. I wrote about View architecture design previously for version 4 (link).
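
To make that scaling implication concrete, here is a quick back-of-the-napkin sketch of the ceiling-division math. The 2000-VM Composer limit and the 10,000-user example come from the paragraph above; the code itself is purely illustrative:

```python
import math

COMPOSER_VM_LIMIT = 2000  # VMs per vCenter instance when Composer is in the picture

def vcenter_instances_needed(total_desktops: int) -> int:
    """Ceiling division: how many vCenter instances a View environment requires."""
    return math.ceil(total_desktops / COMPOSER_VM_LIMIT)

for users in (2000, 5000, 10000):
    print(f"{users:>6} desktops -> {vcenter_instances_needed(users)} vCenter instance(s)")
```

A 10,000-user environment, for example, lands at 5 vCenter instances under this limit.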

image

View Storage Accelerator (VSA), officially supported in View 5.1, is the biggest game-changing feature in View 5.x thus far. VSA changes the user workload IO profile, thus reducing the number of IOPS consumed by each user. VSA provides the ability to dedicate a portion of the host server’s RAM to host caching, largely absorbing read IOs. This reduces the demand of boot storms as well as making the Tier 1 storage in use more efficient. Before VSA there was a much larger disparity between XenDesktop and View users in terms of IOPS; now the gap is greatly diminished.
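
To illustrate why absorbing read IOs matters, the sketch below estimates how many IOPS still reach the Tier 1 array once a host-side cache is in play. The per-user IOPS, read/write mix, and cache hit rate are hypothetical placeholders, not figures from our validation work:

```python
def array_iops(users: int, iops_per_user: float, read_fraction: float,
               cache_hit_rate: float) -> float:
    """Estimate the IOPS that still reach the array after the host cache absorbs read hits."""
    total = users * iops_per_user
    reads_absorbed = total * read_fraction * cache_hit_rate
    return total - reads_absorbed

# Hypothetical example: 1000 users, 10 IOPS each, 40% reads, 80% read cache hit rate
before = 1000 * 10
after = array_iops(1000, 10, read_fraction=0.4, cache_hit_rate=0.8)
print(f"Without host caching: {before:.0f} IOPS hit the array")
print(f"With host caching:    {after:.0f} IOPS hit the array")
```
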
View can be used with 2 connection protocols, the proprietary PCoIP protocol or native RDP. PCoIP is an optimized protocol intended to provide a greater user experience through richer media handling and interaction. Most users will probably be just fine running RDP as PCoIP has a greater overhead that uses more host CPU cycles. PCoIP is intended to compete head on with the Citrix HDX protocol and there are plenty of videos running side by side comparisons if you’re curious. Below is the VMware View logical architecture flow:

XenDesktop (XD), while similar in basic function, is very different from View. Let’s face it, Citrix has been doing this for a very long time. Client virtualization is what these guys are known for, and through clever innovation and acquisitions over the years they have bolstered their portfolio to become the most formidable client virtualization player in this space. A key difference between View and XD is the backend architecture. XD is much more complex and requires many more server roles than View, which affects the size and scalability of the management tier. This is very complex software, so there are a lot of moving parts: SQL, vCenter, license server, web interfaces, desktop delivery controllers, provisioning servers… more pieces to account for, each with its own unique scaling elements. XD is not as inextricably tied to vCenter as View is, so a single vCenter instance should be able to support the published maximum number of sessions per HA cluster.

image

One of the neat things about XD is that you have a choice in desktop delivery mechanisms. Machine Creation Services (MCS) is the default mechanism provided in the DDC. At its core this provides a dead simple method for provisioning desktops and functions very similarly to View in this regard. Citrix recommends using MCS only for 5000 or fewer VDI sessions. For greater than 5000 sessions, Citrix recommends using their secret weapon: Provisioning Server (PVS). PVS provides the ability to stream desktops to compute hosts using gold master vDisks, customizing the placement of VM write-caches, all the while reducing the IO profile of the VDI session. PVS leverages TFTP to boot the VMs from the master vDisk. PVS isn’t just for virtual desktops either, it can also be used for other infrastructure servers in the architecture such as XenApp servers and provides dynamic elasticity should the environment need to grow to meet performance demands. There is no PVS equivalent on the VMware side of things.
With Citrix’s recent acquisition and integration of RingCube in XD, there are now new catalog options available for MCS and PVS in XD 5.6: pooled with personal vDisk or streamed with personal vDisk. The personal vDisk (PVD) is disk space that can be dedicated on a per-user basis for personalization information, application data, etc. PVD is intended to provide a degree of end user experience persistence in an otherwise non-persistent environment. Additional benefits of XD include seamless integration with XenApp for application delivery as well as the long-standing benefits of the ICA protocol: session reliability, encrypted WAN acceleration, NetScaler integration, etc. Below is the Citrix XenDesktop logical architecture flow:

High Availability

HA is provided via several different mechanisms across the solution architecture tiers. In the network tier HA is accomplished through stacking switches whether top of rack (ToR) or chassis-based. Stacking functionally unifies an otherwise segmented group of switches so they can be managed as a single logical unit. Discrete stacks should be configured for each service type, for example a stack for LAN traffic and a stack for iSCSI traffic. Each switch type has its stacking limits so care has been taken to ensure the proper switch type and port count to meet the needs of a given configuration.

Load balancing is provided via native DNS in smaller stacks, especially for file and SQL, and moves into a virtual appliance based model over 1000 users. NetScaler VPX or F5 LTM-VE can be used to load balance larger environments. NetScalers are sized based on required throughput as each appliance can manage millions of concurrent TCP sessions.
Protecting the compute tier differs a bit between the Local and Shared Tier 1 solutions, as well as between View and XenDesktop. In the Local Tier 1 model there is no shared storage in the compute tier, so vSphere HA can’t help us here. With XD, PVS can provide HA functionality by controlling the placement of VDI VMs, moving them from a failed host to a hot standby.

The solution for View is not quite so elegant in the Local Tier 1 model, as there is no mechanism to automatically move VMs from a failed host. What we can do, though, is mimic HA functionality by manually creating a resource reservation on each compute host. This creates a manual, RAID-like model in which there is reserve capacity to host a failed server’s VDI sessions.

In the shared tier 1 model, the compute tier has shared storage so we can take full advantage of vSphere HA. This also applies to the management tier in all solution models. There are a few ways to go here when configuring admission control. Thankfully there are now more options than only calculating slot sizes and overhead. The simplest way to go is specifying a hot standby for dedicated failover. The downside is that you will have gear sitting idle. If that doesn’t sit well with you then you could specify a percentage of cluster resources to reserve. This will thin the load running on each host in the cluster but at least won’t waste resources entirely.
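
If you go the percentage route, the reservation is effectively “one host’s worth” of the cluster. A minimal sketch of that arithmetic, assuming you only want to tolerate a single host failure (the cluster sizes are just examples):

```python
def admission_control_reserve_pct(hosts_in_cluster: int, failures_to_tolerate: int = 1) -> float:
    """Percentage of cluster CPU/RAM to reserve so surviving hosts can absorb a failure."""
    if failures_to_tolerate >= hosts_in_cluster:
        raise ValueError("Cannot tolerate the failure of every host in the cluster")
    return 100.0 * failures_to_tolerate / hosts_in_cluster

for hosts in (4, 8, 16):
    pct = admission_control_reserve_pct(hosts)
    print(f"{hosts}-host cluster: reserve ~{pct:.1f}% of cluster resources")
```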

If the use of DRS is desired, care needs to be taken in large scale scenarios as this technology will functionally limit each HA cluster to 1280 VMs.

Protection for the storage tier is relatively straightforward, as each storage array has its own built-in protections for controllers and RAID groups. In smaller solution stacks (under 1000 users) a file server VM is sufficient to host user data, profiles, etc. For deployments larger than 1000 users, we recommend that NAS be leveraged to provide this service. Our clustered NAS solutions for both Equallogic and Compellent are high performing and scalable to meet the needs of very large deployments. That said, NAS is available as an HA option at any time, for any solution size.

Validation

The Dell DVS difference is that our solutions are validated and characterized around real-world requirements and scenarios. Everyone that competes in this space plays the marketing game but we actually put our solutions through their paces. Everything we sell in our core solution stacks has been configured, tested, scrutinized, optimized, and measured for performance at all tiers in the solution. Additional consulting and blue printing services are available to help customers properly size VDI for their environments by analyzing user workloads to build a custom solution to meet those needs. Professional services is also available to stand up and support the entire solution.

The DVS Enterprise solution is constantly evolving with new features and options coming this summer. Keep an eye out here for more info on the latest DVS offerings as well as discussions on the interesting facets of VDI.

References:

Dell DVS: VMware View Reference Architecture 

Dell DVS: Citrix XenDesktop Reference Architecture

Wake On LAN

Happy New Year!!

While I wait for the work I’ve been doing for the past 6 months to be publicly released (so I can talk about it), here is a quick post about WOL. WOL technologies have been around for a while and are used to wake a computer from sleep by sending a magic packet or pattern to the sleeping PC. The best analogy I’ve seen for this is a bunch of people in a room together and one person across the room shouting out another’s name; everyone but the person whose name was called ignores the call. Sleep mode shuts down all major processing on the PC but keeps active tasks (documents, etc.) alive in RAM, which continues to be energized during sleep. This allows the PC to be quickly awakened and resume normal operation right where the user left off.

Use Case

I have 6 Windows 7 PCs in my house, all of which are connected to a HomeGroup, along with a NAS, a LAN-enabled TV, an Xbox 360, and a PS3, all capable of delivering and consuming content via DLNA. I let my PCs sleep after 2 hours of inactivity, which saves power. Sometimes I want to access content or simply RDP to a PC that is sleeping. Instead of walking around and physically waking up a sleeping PC, why not leverage WOL? This can be useful in corporate environments as well if you don’t already have tools in place.

Set Up

First you need to enable WOL in the properties of your NIC; most NICs these days support it. In the advanced properties tab look for “Wake-Up Capabilities.” The important value to enable here is the Magic Packet; enabling Pattern is fine too.

Wake Up

Now when the PC sleeps it can be awakened remotely. There are a few different tools out there that can do this. I’ll be working with Depicus’ tools, which include, among others, command line and GUI versions: Wake On LAN GUI/Command Line. Both are free, dead-simple executables that require no install and can easily live in your Dropbox for portability. There are also apps for Android and iOS. Each version ultimately accomplishes the same thing and can target a host over the internet if need be. All that is needed is the MAC address of the PC to wake, its IP, subnet, and port. For LAN wake-up use port 7; if waking across the internet you will need to specify a port that is allowed through your firewall as well as a public IP.

The GUI provides a dropdown to specify local subnet or internet. In my tests the GUI required the dashes in the MAC address; it gave an error without them. The command line version will accept either format.
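
If you’d rather script the wake-up yourself instead of using the Depicus tools, the magic packet format is simple: 6 bytes of 0xFF followed by the target MAC address repeated 16 times, broadcast over UDP (port 7 for LAN use, as noted above). Here is a minimal Python sketch with a placeholder MAC address; like the command line tool, it accepts the MAC with or without separators:

```python
import socket

def send_magic_packet(mac: str, broadcast_ip: str = "255.255.255.255", port: int = 7) -> None:
    """Build and broadcast a WOL magic packet: 6 x 0xFF, then the MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace("-", "").replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    packet = b"\xff" * 6 + mac_bytes * 16

    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast_ip, port))

# Placeholder MAC for the sleeping PC; substitute your own
send_magic_packet("00-11-22-33-44-55")
```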

Here you can see the PC I’m waking up is 10.10.1.19. It is fully asleep when the ping starts, times out to unreachable, then comes back alive once the magic packet is sent.

That’s it! Another reason to stay seated. 🙂

References:

Power Management for Network Devices in Windows 7

Depicus

TMG 2010: Reverse Server Publishing

One of my favorite and most versatile tools in the arsenal is Microsoft Forefront Threat Management Gateway (TMG). Although “ISA” still slips out occasionally, TMG is what ISA was: a firewall, a proxy, a router, and on my short list of solutions to use when traditional methods of routing between networks are not possible, for whatever reason.

Here is a scenario: I have a private L2 network, not reachable from or connected to the corporate network. The private network, unfortunately, uses an IP scheme also in use somewhere on the corporate network. So I cannot simply attach this network to the corporate LAN directly and changing the private network architecture is just not an option right now. Enter the ultimate software network bridge: TMG. But the plot thickens… I have resources on the private side that I want to make available to the corporate network, namely a VDI pool and a Kace web server.

Planning and Design

First and foremost, I need to protect the corporate network. Nothing that goes on in the private network should be able to negatively impact the corporate LAN. This is just a common-sense precaution. With that in mind, the private network will be considered “outside” and firewalled to protect the “inside” LAN. I only have 2 networks in this scenario, so a simple perimeter network configuration with 2 interfaces will suffice. My TMG server will be virtual, of course, hosted on an ESXi server with physical connections to both networks. The guest OS will be Server 2008 R2 running TMG 2010 SP1. Because the private network will be considered “the internet” for all intents and purposes, I will have to reverse publish/proxy those resources I wish to make available to the internal network. Here is the topology I will be implementing:

This design is essentially the opposite of what you would normally do with TMG, where you might publish OWA or some other internal resource to the internet. I’ve also used TMG to route specific protocols between otherwise disconnected networks. The end result of this architecture is that users on the LAN side will be able to access the published resources on the “outside.” For this exercise I will be reusing the TMG server’s inside IP address to publish my web server, so users inside will be able to browse to http://10.1.1.50 and view the website hosted on 192.168.1.10.

Setup

All the basic ISA/TMG setup best practices still apply. Define the networks your internal and external NICs connect to, use a gateway and DNS on only one (the other will be static routed), disable all protocol stack services on the outside interface (NetBIOS, etc.), set the NIC binding order properly, and so on. These practices are all very well documented so I won’t cover them here. There is also nothing special to configure in the TMG network rules. You don’t need to create an External to Internal route or NAT rule. The magic is all in the publishing rule.

First, create a new Web Site Publishing Rule from within the Firewall Policy tab. Give it a name and set the rule action to Allow. Publish a single web site and decide if you want SSL to protect the server connection between TMG and the web server. For this exercise I will not be using SSL.

Because DNS on the LAN side does not resolve anything in the private network, we need to specify an IP address for the internal web site. Enter the site name, check the box to use an IP address, and enter it. We can’t actually use the site name because it isn’t resolvable, but the wizard requires it; we will change it once the setup is complete.

We don’t need to enter a specific path for this website, and we will accept requests for any domain name. Next we need to create a web listener: click New and enter a name. Choose to not require SSL and select the Internal network to listen for incoming web requests.

Select No Authentication and finish to complete the listener setup. Once complete the listener should look like this:

image

TMG has the ability to pass authentication requests made by published web servers to clients. In this case we don’t need this so ensure that delegation and direct client authentication are disabled. This rule will apply to all users. Once the base rule is created there are still a few tweaks to be made before it will work as intended.

Open the new rule you just created. On the “To” tab, change the published site entry to the IP address and delete the IP in the second field. Ensure that requests are set to appear to come from TMG.

On the “Public Name” tab, change the rule to apply to “requests for the following sites” and enter TMG’s inside IP address.

Now the secret sauce: Link Translation. Without it, the home page may load, but every link the user clicks on the website would point back to the source web server’s IP address. Those requests would fail because there is no routing from the corporate LAN to the private network. Link translation replaces the true source IP address mapping with the TMG server’s IP address. On the Link Translation tab, click apply link translation to this rule, then click Configure.
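
To make the concept concrete, here is a tiny sketch of what link translation amounts to, using the 192.168.1.10 and 10.1.1.50 addresses from this example. This is purely illustrative of the substitution; it is not how TMG implements the feature internally:

```python
# Illustration only: swap the unreachable internal address for the published TMG address
INTERNAL_ADDRESS = "192.168.1.10"   # web server on the private network
PUBLISHED_ADDRESS = "10.1.1.50"     # TMG's inside IP, reachable from the corporate LAN

def translate_links(html: str) -> str:
    """Rewrite absolute links so clients follow the published address, not the internal one."""
    return html.replace(f"http://{INTERNAL_ADDRESS}", f"http://{PUBLISHED_ADDRESS}")

page = '<a href="http://192.168.1.10/reports/index.html">Reports</a>'
print(translate_links(page))
# -> <a href="http://10.1.1.50/reports/index.html">Reports</a>
```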

The mappings button will show you all translations applied to that rule in a web browser report. At this point, after clicking apply, you should be able to test the rule by clicking the Test Rule button. This will initiate a path ping to ensure the destination is reachable as configured. Before it will take effect, you will need to apply the rule in TMG. Now you can test by accessing the website on the corporate LAN side.

Publishing the VDI host will be similar using standard http/https to access the resources. Link translation will again be the key there to ensure that everything that is published through TMG will appear to be hosted by TMG to the end user.

Ooma Telo Review

Voice Over IP (VOIP) has been a hot topic for those in the know for some time now, but it is becoming increasingly consumer targeted for its obvious benefits. Plain Old Telephone Service (POTS) lines are going the way of the dinosaurs, with a few unique exceptions that just don’t work well with VOIP yet. I started my VOIP journey with Cisco at work and Vonage at home around 7-8 years ago. Vonage was still an up-and-comer then, with its ability to provide phone service right over your home internet connection; all you needed was a service contract and a Vonage-supplied router to make calls. The problem with Vonage is that they have become more like a traditional phone company with their call plans, although still much cheaper than a traditional POTS line plus long distance. Near the end of my experience with Vonage, the call quality was terrible, which I chalked up to wireless interference. In my new house I didn’t want a home phone at all, but I was eventually convinced by my wife, so I set out to find something better.

During my search I came across a “next gen” VOIP phone company called Ooma. With staggeringly positive reviews all over the internet, Ooma has an interesting proposition: buy their router and make unlimited free calls nationwide while paying only the mandatory federal/local fees, which for me will equate to ~$3.50/month (911 included). That’s right, they don’t do phone plans with minute options like Vonage. The router is $200 and can be purchased from Amazon. By my math, based on what I was paying monthly with Vonage, this solution will pay for itself in 10 months. Ooma offers a Premier service for $10/month extra that adds features like an instant second line, 3-way conferencing, and additional voicemail options. It is important to note that this is an OPT-OUT service, so they will automatically enroll you and start billing after a free 60-day trial period. My current feeling is that the basic services are more than ample, so I plan to opt out. Ooma offers number porting for $40/number if you wish to keep your current digits.
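
For the curious, the payback math is simple division. Here is a quick sketch using the $200 device cost and the ~$3.50/month in fees from above; the $25 prior monthly bill is a placeholder you would swap for your own Vonage (or POTS) bill:

```python
import math

DEVICE_COST = 200.00        # Ooma Telo up-front cost
OOMA_MONTHLY_FEES = 3.50    # approximate mandatory federal/local fees

def payback_months(previous_monthly_bill: float) -> int:
    """Months until the device cost is recovered out of the monthly savings."""
    monthly_savings = previous_monthly_bill - OOMA_MONTHLY_FEES
    if monthly_savings <= 0:
        raise ValueError("No savings versus the previous service")
    return math.ceil(DEVICE_COST / monthly_savings)

# A placeholder prior bill of $25/month works out to roughly 10 months to break even
print(payback_months(25.00), "months to break even")
```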

Installation

Setup is incredibly easy, and if you’ve used a service like Vonage, this isn’t much different. The voice router needs to be added to your network via Ethernet so it can reach the internet, and your phone plugs directly into it. I prefer to use wireless phones where there is a single central base with many wireless satellites, so I only plug the “master” base into the Ooma router. This method achieves maximum flexibility by overcoming any house phone wiring limitations and is essentially no different from how you would do it with any other type of phone service. I chose to use 2 firewalls in my home environment and put my voice router in the “DMZ” segment, which I also did with Vonage. Here is my setup in simplified form:

 image
When you first plug in and power on your Ooma router, it will automatically pull the latest firmware, which takes a few minutes; during this time you will see the bottom row of button lights flashing. Once the updates are complete, log into the Ooma activation site and complete the setup of your device. Once activated, you will create a login on the My Ooma site, where a wizard will step you through the rest of the setup process. Number porting can take some time, but they’ll give you a temporary number to use in the meantime. If you opted to create a new number, you should now have a dial tone and be able to make calls. Ooma uses a very special and fancy dial tone. 😉

Features

Aesthetically the device is quite elegant. The face is coated with a soft rubberized material that is quite pleasing to the touch. All edges are rounded and the bottom is finished with a high-gloss piano black plastic. The button lights can be made brighter or turned off altogether. Voicemail can be accessed directly on the device like an old-school answering machine, via the phone, or via the website.

The My Ooma website is where you’ll make all service configuration changes. The dashboard is still a work in progress but you can see your voicemails, setup progress, and stats from here. Clicking “call logs” on the left will reveal detailed information for all call activity. What’s really neat is that from this view you can white or blacklist any number in your history (premier feature)! I’ll show you blacklisting in just a second. The Voicemail and Contacts areas are fairly self-explanatory.
Clicking the “Preferences” button up top will reveal the meat of the configurable options. Under Voicemail you have the option to control how many rings before a call goes to voicemail as well as whether to send email or SMS notifications including audio attachments.
Now on to my favorite premier feature: Blacklisting. This is a great feature that allows you to completely control who calls your house and how those calls are dealt with. Send a blocked caller a disconnected number message or just let the line ring continuously. You can use the community list which is Ooma’s list of telemarketers or control your own. Many may find the $10/month premier price worth it for this feature alone. But you should know that this can be done for free in Google Voice. GV adds another layer to your overall voice solution but the value is becoming more and more compelling. 
Call forwarding does what it says in the traditional sense or you can enable multi-ring for one device like a cell phone. You can also manage multiple phone numbers, your ring pattern, as well as play with some [currently] experimental Google Voice and iPhone integration.
Privacy settings control your ability to block the outbound caller-ID display of your number plus anonymous call block.
Everything else under Preferences is inconsequential. Under the Account tab at the top you manage billing, account, and services information. Take note that this is where you go to opt out of the automatic Premier services upgrade.
image
Available add-ons include the premier service, international calling, warranty extensions, and a few others.

Conclusion

Overall I am extremely impressed with Ooma. I just finished a Webex training class in which I was dialed into a conference call from literally 9-5 for 4 days. The call quality was excellent and my calls didn’t drop once! Since I’m using the same Panasonic DECT phones I had in my other house, I am left to believe that my bad call quality experience with Vonage was due to their router; all of my other gear is the same. Ooma looks and feels like a very polished and mature product in form and function. The home network setup is an effortless process and the web portal is feature-rich, with Ooma making visible improvements to enhance the user experience. As long as you commit to using Ooma for a few years, the $200 buy-in along with the small monthly fees will be well worth your while. My only complaint is having to opt out of the Premier service and that it is not immediately clear which features are basic vs. Premier.