Clustering Server 2012 R2 with iSCSI Storage

Yay, last post of 2014! Haven’t invested in the hyperconverged Software Defined Storage model yet? No problem, there’s still time. In the meantime, here is how to cluster Server 2012 R2 using tried and true EqualLogic iSCSI shared storage.

EQL Group Manager

First, prepare your storage array(s) by logging into EQL Group Manager. This post assumes that your basic array IP, access and security settings are in place. Set up the local CHAP account to be used later. Your organization’s security access policies or requirements might dictate a different standard here.

image

Create and assign an Access Policy to the VDS/VSS volume in Group Manager, otherwise this volume will not be accessible. This will make subsequent steps easier when it’s time to configure ASM.

image

Create some volumes in Group Manager now so you can connect your initiators easily in the next step. It’s a good idea to create your cluster quorum LUN now as well.

image

 

Host Network Configuration

First configure the interfaces you intend to use for iSCSI on your cluster nodes. Best practice says that you should limit your iSCSI traffic to a private Layer2 segment that is not routed and only connects to the devices that will participate in the fabric. This is no different from Fiber Channel in that regard, unless you are using a converged methodology and sharing your higher bandwidth NICs. If you are using Broadcom NICs you can choose between jumbo frames and hardware offload; the larger frames will likely net the greater performance benefit. Each host NIC used to access your storage targets should have a unique IP address able to reach the network of those targets within the same private Layer2 segment. While these NICs can technically be teamed using the native Windows LBFO mechanism, best practice says that you shouldn’t, especially if you plan to use MPIO to load balance traffic. If your NICs will be shared (not dedicated to iSCSI alone) then LBFO teaming is supported in that configuration. To keep things clean and simple I’ll be using 4 NICs: 2 dedicated to LAN, 2 dedicated to iSCSI SAN. Both LAN and SAN connections are also physically separated onto their own switching fabrics, which is another best practice.

image

MPIO – the manual method

First, start the MS iSCSI service, which you will be prompted to do, and check its status in PowerShell using Get-Service -Name msiscsi.

image
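If you prefer to script this step, the same can be done entirely in PowerShell; a minimal sketch using the in-box msiscsi service:

# Start the Microsoft iSCSI initiator service, set it to start automatically and check its status
Start-Service -Name msiscsi
Set-Service -Name msiscsi -StartupType Automatic
Get-Service -Name msiscsi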

Next, install MPIO using Install-WindowsFeature Multipath-IO

Once installed and your server has been rebooted, you can set additional options in PowerShell or via the MPIO dialog under File and Storage Services -> Tools.

image

Open the MPIO settings and tick “Add support for iSCSI devices” under Discover Multi-Paths, then reboot again. Any change you make here will prompt for a reboot, so make all of your changes at once so you only have to do this one time.

image
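The feature install and MPIO claim can also be scripted with the in-box MPIO cmdlets; a hedged sketch (a reboot is still required after claiming iSCSI devices, and the EQL DSM from the HIT Kit will take over pathing if you install it later):

# Install MPIO, claim iSCSI devices for the Microsoft DSM and set Round Robin as the default policy
Install-WindowsFeature -Name Multipath-IO
Enable-MSDSMAutomaticClaim -BusType iSCSI
Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy RR
Get-MPIOSetting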

The easier way to do this from the onset is using the EqualLogic Host Integration Tools (HIT Kit) on your hosts. If you don’t want to use HIT for some reason, you can skip from here down to the “Connect to iSCSI Storage” section.

Install EQL HIT Kit (The Easier Method)

The EqualLogic HIT Kit will make it much easier to connect to your storage array as well as configure the MPIO DSM for the EQL arrays. Better integration, easier to optimize performance, better analytics. If there is a HIT Kit available for your chosen OS, you should absolutely install and use it. Fortunately there is indeed a HIT Kit available for Server 2012 R2.

image

Configure MPIO and PS group access via the links in the resulting dialog.

image

In ASM (launched via the “configure…” links above), add the PS group and configure its access. Connect to the VSS volume using the CHAP account and password specified previously. If the VDS/VSS volume is not accessible on your EQL array, this step will fail!

image

Connect to iSCSI targets

Once your server is back up from the last reboot, launch the iSCSI Initiator tool and you should see any discovered targets, assuming they are configured and online. If you used the HIT Kit you will already be connected to the VSS control volume and will see the Dell EQL MPIO tab.

image

Choose an inactive target in the discovered targets list and click Connect. Be sure to enable multi-path in the pop-up that follows, then click Advanced.

image

Enable CHAP log on and specify the username and password set up previously:

image

If your configuration is good the status of your target will change to Connected immediately. Once your targets are connected, the raw disks will be visible in Disk Manager and can be brought online by Windows.

image
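The same connections can be made with the iSCSI PowerShell cmdlets if you prefer to script them; a sketch where the portal IP and CHAP credentials are placeholders for your own values:

# Register the EQL group discovery address, then connect all discovered targets with CHAP and MPIO enabled
New-IscsiTargetPortal -TargetPortalAddress 10.10.10.10
Get-IscsiTarget | Where-Object { $_.IsConnected -eq $false } |
    Connect-IscsiTarget -IsMultipathEnabled $true -IsPersistent $true -AuthenticationType ONEWAYCHAP -ChapUsername "chapuser" -ChapSecret "ChapSecret1234"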

When you create new volumes on these disks, save yourself some pain down the road and give them the same label as what you assigned in Group Manager! The following information can be pulled out of the ASM tool for each volume:

image
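Bringing the new disks online and labeling them to match can be scripted as well; a minimal sketch using the Storage module, where the disk selection and the label are examples to adapt:

# Initialize any raw disks, create a partition and format with a label matching the Group Manager volume name
Get-Disk | Where-Object PartitionStyle -eq 'RAW' |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -UseMaximumSize -AssignDriveLetter |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel "CLUSTER-VOL1" -Confirm:$false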

Failover Clustering

With all the storage pre-requisites in place you can now build your cluster. Setting up a Failover Cluster has never been easier, assuming all your ducks are in a row. Create your new cluster using the Failover Cluster Manager tool and let it run all compatibility checks.

image
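The equivalent PowerShell, if you prefer it over the wizard; node names, cluster name and IP below are examples:

# Install the feature, run validation, then create the cluster
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools
Test-Cluster -Node "NODE1","NODE2"
New-Cluster -Name "CLUSTER1" -Node "NODE1","NODE2" -StaticAddress 192.168.1.50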

Make sure your patches and software levels are identical between cluster nodes or you’ll likely fail the clustering pre-check with differing DSM versions:

image

Once the cluster is built, you can manipulate your cluster disks and bring any online as required. Cluster disks cannot be brought online until all nodes in the cluster can access the disk.

image

Next add your cluster disks to Cluster Shared Volumes to enable multi-host read/write and HA.

image
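Adding a disk to CSV can also be done in a line of PowerShell; the disk name below is an example, use whatever name the cluster assigned:

# List the disks sitting in Available Storage, then promote one to a Cluster Shared Volume
Get-ClusterGroup -Name "Available Storage" | Get-ClusterResource
Add-ClusterSharedVolume -Name "Cluster Disk 2"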

The new status will be reflected once this change is made.

image

Configure your Quorum to use the disk witness volume you created earlier. This disk does not need to be a CSV.

image
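Or via PowerShell; the disk resource name is an example:

# Configure a disk witness for the cluster quorum
Set-ClusterQuorum -DiskWitness "Cluster Disk 1"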

Check your cluster networks and make sure that iSCSI is set to not allow cluster network communication. Make sure that your cluster network is set up to allow cluster network communication as well as allowing client connections. This can of course be further segregated if desired using additional NICs to separate cluster and client communication.

image
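The same roles can be set in PowerShell; the network names are examples, and the Role values map to 0 = no cluster communication (iSCSI), 1 = cluster only, 3 = cluster and client:

# Exclude iSCSI from cluster communication and allow cluster plus client traffic on the LAN network
(Get-ClusterNetwork -Name "iSCSI").Role = 0
(Get-ClusterNetwork -Name "LAN").Role = 3
Get-ClusterNetwork | Format-Table Name, Role, Address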

Now your cluster is complete and you can begin adding highly available VMs (if using Hyper-V), SQL, File Server or other roles as required.

References:

http://blogs.technet.com/b/keithmayer/archive/2013/03/12/speaking-iscsi-with-windows-server-2012-and-hyper-v.aspx

http://blogs.technet.com/b/askpfeplat/archive/2013/03/18/is-nic-teaming-in-windows-server-2012-supported-for-iscsi-or-not-supported-for-iscsi-that-is-the-question.aspx

Dell XC Series – Product Architectures

Hyperconverged Web-scale Software Defined Storage (SDS) solutions are white hot right now and Nutanix is leading the pack with their ability to support all major hypervisors (vSphere, Hyper-V and KVM) while providing nearly unlimited scale. Dell partnering with Nutanix was an obvious, mutually beneficial choice for the reasons above, plus Dell supplies a much more robust server platform. Dell also provides global reach for services and support, as well as solving other challenges such as having hypervisors installed in the factory.

Nutanix operates below the hypervisor layer and as a result requires a lot of tightly coupled interaction with the hardware directly. Many competing platforms in this space sit above the hypervisor, so they require vSphere, for example, to provide access to storage and HA, and they are also bound by the hypervisor’s own limitations (scale). Nutanix uses its own algorithm for clustering and doesn’t rely on a common transactional database, which can cause additional challenges when building solutions that span multiple sites. Because of this the Nutanix Distributed Filesystem (NDFS) has no known limits of scale. There are current Nutanix installations in the thousands of nodes across a contiguous namespace and now you can build them on Dell hardware.

Along with the Dell XC720xd appliances, we have released a number of complementary workload Product Architectures to help customers and partners build solutions using these new platforms. I’ll discuss the primary architectural elements below.

Wyse Datacenter Appliance Architecture for Citrix

Wyse Datacenter Appliance Architecture for VMware

Wyse Datacenter Appliance Architecture for vWorkspace

 

Nutanix Architecture

Three nodes minimum are required for NDFS to achieve quorum, so that is the minimum solution buy-in; storage and compute capacity can then be increased incrementally by adding one or more nodes to an existing cluster. The Nutanix architecture uses a Controller VM (CVM) on each host which participates in the NDFS cluster and manages the hard disks local to its own host. Each host requires two tiers of storage: high performance/ SSD and capacity/ HDD. The CVM manages the reads/writes on each host and automatically tiers the IO across these disks. A key value proposition of the Nutanix model is data locality, which means that the data for a given VM running on a given host is stored locally on that host as opposed to having reads and writes crossing the network. This model scales indefinitely in a linear block manner where you simply buy and add capacity as you need it. Nutanix creates a storage pool that is distributed across all hosts in the cluster and presents this pool back to the hypervisor as NFS or SMB.

You can see from the figure below that the CVM engages directly with the SCSI controller through which it accesses the disks local to the host on which it resides. Since Nutanix sits below the hypervisor and handles its own clustering and data HA, it is not dependent upon the hypervisor to provide any features, nor is it constrained by the hypervisor’s limitations.

image

From a storage management and feature perspective, Nutanix provides two tiers of optional deduplication performed locally on each host (SSD and HDD individually), compression, tunable replication (number of copies of each write spread across disparate nodes in the cluster) and data locality (keeps data local to the node the VM lives on). Within a storage pool, containers are created to logically group VMs stored within the namespace and enable specific storage features such as replication factor and dedupe. Best practice says that a single storage pool spread across all disks is sufficient but multiple containers can be used. The image below shows an example large scale XC-based cluster with a single storage pool and multiple containers.

image

While the Nutanix architecture can theoretically scale indefinitely, practicality might dictate that you design your clusters around the boundaries of the hypervisors: 32 nodes for vSphere, 64 nodes for Hyper-V. The decision to do this will be more financially impactful if you separate your resources along the lines of compute and management in distinct SDS clusters. You could also, optionally, install many maximum-node hypervisor clusters within a single very large, contiguous Nutanix namespace, which is fully supported. I’ll discuss the former option below as part of our recommended pod architecture.

Dell XC720xd platforms

For our phase 1 launch we have five platforms to offer that vary in CPU, RAM and size/ quantity of disks. Each appliance is 2U, based on the 3.5” 12th-generation PowerEdge R720xd, and supports from 5 to 12 total disks in a mix of SSD and HDD. The A5 platform is the smallest with a pair of 6-core CPUs, 200GB SSDs and a recommended 256GB RAM. The B5 and B7 models are almost identical except for the 8-core CPU on the B5 and the 10-core CPU on the B7. The C5 and C7 boast a slightly higher clocked 10-core CPU with doubled SSD densities and 4-5x more in the capacity tier. The suggested workloads are specific, with the first three platforms targeted at VDI customers. If greater capacity is required, the C5 and C7 models work very well for this purpose too.

image

For workload to platform sizing guidance, we make the following recommendations:

| Platform | Workload | Special Considerations |
| --- | --- | --- |
| A5 | Basic/ light task users, app virt | Be mindful of limited CPU cores and RAM densities |
| B5 | Medium knowledge workers | Additional 4 cores and greater RAM to host more VMs or sessions |
| B7 | Heavy power users | 20 cores per node + a recommended 512GB RAM to minimize oversubscription |
| C5 | Heavy power users | Higher density SSDs + 20TB in the capacity tier for large VMs or amount of user data |
| C7 | Heavy power users | Increased number of SSDs with larger capacity for greater amount of T1 performance |

Here is a view of the 12G-based platform representing the A5-B7 models; the C5 and C7 would add additional disks in the second disk bay. The two disks in the rear flexbay are 160GB SSDs configured in RAID1 via PERC and are used to host the hypervisor and CVM; these disks do not participate in the storage pool. The six disks in front are controlled by the CVM directly via the LSI controller and contribute to the distributed storage pool across all nodes.

image

Dell XC Pod Architecture

This being a 10Gb hyperconverged architecture, the leaf/ spine network model is recommended. We do recommend a 1Gb switch stack for iDRAC/ IPMI traffic and build the leaf layer from 10Gb Force10 parts. The S4810, recommended for SFP+ based platforms, is shown in the graphic below; the S4820T can be used for 10GBase-T.

image

In our XC series product architecture, the compute, management and storage layers, typically all separated, are combined here into a single appliance. For solutions based on vWorkspace under 10 nodes, we recommend a “floating management” architecture which allows the server infrastructure VMs to move between hosts also being used for desktop VMs or RDSH sessions. You’ll notice in the graphics below that compute and management are combined into a single hypervisor cluster which hosts both of these functions.

Hyper-V is shown below, which means the CVMs present the storage pool via SMBv3. We recommend three basic containers to separate infrastructure mgmt, desktop VMs and RDSH VMs. We recommend the following feature attributes for these three containers (it is not supported to enable compression and deduplication on the same container):

| Container | Purpose | Replication Factor | Perf Tier Deduplication | Capacity Tier Deduplication | Compression |
| --- | --- | --- | --- | --- | --- |
| Ds_compute | Desktop VMs | 2 | Enabled | Enabled | Disabled |
| Ds_mgmt | Mgmt Infra VMs | 2 | Enabled | Disabled | Disabled |
| Ds_rdsh | RDSH Server VMs | 2 | Enabled | Enabled | Disabled |

You’ll notice that I’ve included the resource requirements for the Nutanix CVMs (8 x vCPUs, 32GB vRAM). The vRAM allocation can vary depending on the features you enable within your SDS cluster. 32GB is required, for example, if you intend to enable both SSD and HDD deduplication. If you only require SSD deduplication and leave the HDD tier turned off, you can reduce your CVM vRAM allocation to 16GB. We highly recommend that you disable any features that you do not need or do not intend to use!

image

For vWorkspace solutions over 1000 users or solutions based on VMware Horizon or Citrix XenDesktop, we recommend separating the infrastructure management in all cases. This allows management infrastructure to run in its own dedicated hypervisor cluster while providing very clear and predictable compute capacity for the compute cluster. The graphic below depicts a 1000-6000 user architecture based on vWorkspace on Hyper-V. Notice that the SMB namespace is stretched across both of the discrete compute and management infrastructure clusters, each scaling independently. You could optionally build dedicated SDS clusters for compute and management if you desire, but remember the three node minimum, which would raise your minimum build to 6 nodes in this scenario.

image

XenDesktop on vSphere, up to 32 nodes max per cluster, supporting around 2500 users in this architecture:

image

Horizon View on vSphere, up to 32 nodes max per cluster, supporting around 1700 users in this architecture:

image

Network Architecture

Following the leaf/ spine model, each node should connect 2 x 10Gb ports to a leaf switch which are then fully mesh connected to an upstream set of spine switches.

image

On each host there are two virtual switches: one for external access to the LAN and internode communication, and one private internal vSwitch used for the CVM alone. On Hyper-V the two NICs are configured in an LBFO team on each host with all management OS vNICs connected to it.

image

vSphere follows the same basic model, except with port groups configured per VM type and VMkernel ports configured for host functions:

image

Performance results

The tables below summarize the user densities observed for each platform during our testing. Please refer to the product architectures linked at the beginning of this post for the detailed performance results for each solution.

image

image

Resources:

http://en.community.dell.com/dell-blogs/direct2dell/b/direct2dell/archive/2014/11/05/dell-world-two-questions-cloud-client-computing

http://blogs.citrix.com/2014/11/07/dell-launches-new-appliance-solution-for-desktop-virtualization/

http://blogs.citrix.com/2014/11/10/xendesktop-technologies-introduced-as-a-new-dell-wyse-datacenter-appliance-architecture/

http://blogs.vmware.com/euc/2014/11/vmware-horizon-6-dell-xc-delivering-new-economics-simplicity-desktop-application-virtualization.html

http://stevenpoitras.com/the-nutanix-bible/

http://www.dell.com/us/business/p/dell-xc-series/pd

vSphere 5.5: Navigating the Web Client

As I called out in my vSphere 5.5 upgrade post, the vSphere Client is now deprecated in 5.5, so in preparation for an inevitable future I’m forcing myself to use the Web Client to gain familiarity. It turns out there was way more moving around than I initially thought, so I’m selfishly documenting a few pertinent items that seemed less than intuitive to me my first time through. Some things are just easier to do in the legacy vSphere Client, or maybe I’m just too accustomed after 3 generations of ESX/i. In any case, I encourage you to use the web client as well and hopefully these tips will help.

Topics covered in this post:

  • How to configure iSCSI software adapters
  • How to add datastores
  • How to manage multipathing
  • How to rename an ESXi host
  • Cool changes in Recent Tasks pane
  • Traffic Shaping
  • Deploying vCenter Operations Manager (vCOps)

How to configure iSCSI software adapters:

This assumes that the preliminary steps of setting up your storage array and requisite physical networking have already been properly completed. The best and easiest way to do this is via dedicated switches and server NICs for iSCSI in a Layer2 switch segment. Use whatever IP scheme you like; this should be a closed fabric and there is no reason to route this traffic.

First things first, if you don’t have a software iSCSI adapter created on your hosts, create one in the Storage Adapters section of Storage Management for a particular ESXi host. Once created, it will appear in the list below. A quick note on software vs hardware iSCSI initiators: physical initiators can generally do iSCSI offload or jumbo frames, not both. We have seen the use of jumbo frames to be more impactful to performance than iSCSI offload, so software initiators with jumbo frames enabled are the preferred way to go here.

Click over to the Networking tab and create a new vSwitch with a VMkernel Network Adapter for iSCSI.

Choose the physical adapters to be used in this vSwitch, create a useful network Port Group label such as iSCSI-1 and assign an IP address that can reach the storage targets. Repeat this process and add a second VMkernel adapter to the same vSwitch. Configure your VMK ports to use opposing physical NICs. This is done by editing the port group settings and changing the Failover order. This allows you to cleanly share 2 physical NICs for 2 iSCSI connections within a single vSwitch.

In my case VMK2 is active on vmnic3 and VMK3 is active on vmnic1 providing physical path redundancy to the storage array.

When all is said and done, your vSwitch configuration should look something like this:

Next under the iSCSI software adapter, add the target IP to your storage (group IP for EqualLogic). Authentication needs and requirements will vary between organizations. Choose and configure this scheme appropriately for your environment. For my lab, I scope connections based on subnet alone which defines the physical Layer2 boundary of my iSCSI fabrics.

Next configure the network port binding to ensure that the port groups you defined earlier get bound to the iSCSI software adapter using the proper physical interfaces.

At this point, if you have any volumes created on your array and presented to your host, a quick rescan should reveal the devices presented to your host as LUNs.

You should also see 2 paths per LUN (device) per host based on 2 physical adapters connecting to your array. EqualLogic is an active/passive array so only connections to the active controller will be seen here.

If you run into trouble making this work after these steps, jump over to the vSphere Client which does make this process a bit easier. Also keep in mind that all pathing will be set to Fixed by default. See my How to manage multipathing topic below for guidance on changing this.

iSCSI works very well with jumbo frames, which is an end-to-end Layer2 technology, so make sure an MTU of 9000 is set on all ESXi iSCSI vSwitches and VMK ports, as well as on the NICs on the storage array. Your switches must be capable of supporting jumbo frames as well. This will increase the performance of your iSCSI network and front-end storage operation speeds.
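If you prefer to script the iSCSI adapter setup, most of it can be done in PowerCLI; a hedged sketch where the host name, vSwitch name, port group names and portal IP are all examples (the VMkernel port binding step itself still needs esxcli or the GUI):

# Enable the software iSCSI adapter, add the array discovery address and set jumbo frames
$vmhost = Get-VMHost -Name "esxi01.lab.local"
Get-VMHostStorage -VMHost $vmhost | Set-VMHostStorage -SoftwareIScsiEnabled $true
$hba = Get-VMHostHba -VMHost $vmhost -Type IScsi
New-IScsiHbaTarget -IScsiHba $hba -Address "10.10.10.10" -Type Send
Get-VirtualSwitch -VMHost $vmhost -Name "vSwitch1" | Set-VirtualSwitch -Mtu 9000 -Confirm:$false
Get-VMHostNetworkAdapter -VMHost $vmhost -VMKernel | Where-Object { $_.PortGroupName -like "iSCSI*" } |
    Set-VMHostNetworkAdapter -Mtu 9000 -Confirm:$false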

 

How to add datastores:

Once your new datastore has been provisioned from your storage platform and presented to your ESXi hosts, from the Hosts and Clusters view navigate to Related Objects, then Datastores. From here click the Create a new Datastore button.

Choose the host or cluster to add the datastore to, choose whether it is NFS or VMFS, name the datastore and choose a host that can see it. You should see the raw LUN in the dialog below.

Choose the VMFS version and any partition options you want to implement. Confirm and deploy.

If presenting to multiple hosts, once the VMFS datastore is created and initialized on one, they all should see it assuming the raw device is present via a previous adapter rescan.
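The datastore creation can also be scripted in PowerCLI; a sketch where the host name, datastore name and the way the new LUN is selected are all examples to adapt:

# Rescan, pick the newly presented LUN and create a VMFS-5 datastore on it
$vmhost = Get-VMHost -Name "esxi01.lab.local"
Get-VMHostStorage -VMHost $vmhost -RescanAllHba | Out-Null
$lun = Get-ScsiLun -VmHost $vmhost -LunType disk | Where-Object { $_.CapacityGB -eq 500 }
New-Datastore -VMHost $vmhost -Name "EQL-VOL1" -Path $lun.CanonicalName -Vmfs -FileSystemVersion 5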

 

How to manage multipathing:

From the Hosts and clusters view, click the Storage tab, choose the datastore you want to manage, click Manage in the middle pane then click Connectivity and Multipathing under Settings.

Alternatively, from the Hosts and Clusters view (from any level item), navigate to Related Objects, then Datastores. Either click the volume you want to edit or choose Settings from the dropdown. Either method will get you to the same place.

From the datastore Settings page, click Manage and under Settings (once again) click Connectivity and Multipathing. In the middle of the screen you should see all hosts attached to whatever datastore you selected. Clicking on each host will reveal the current Path Selection Policy below, “Fixed” by VMware default along with the number of paths present per host.

To change this to Round Robin, click Edit Multipathing, change the Path Selection Policy, repeat for each host connected to the datastore.
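If you have many datastores and hosts, changing the policy by hand gets tedious; a PowerCLI sketch that flips every iSCSI disk device on a host to Round Robin (the host name is an example):

# Set Round Robin on all disk LUNs for a host that are not already using it
Get-VMHost -Name "esxi01.lab.local" | Get-ScsiLun -LunType disk |
    Where-Object { $_.MultipathPolicy -ne "RoundRobin" } |
    Set-ScsiLun -MultipathPolicy "RoundRobin"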

 

How to rename an ESXi host:

Renaming hosts is one area that the Web Client has made significantly easier (once you figure out where to go)! Select a host from the Hosts and Clusters view, click Manage, click Networking, then TCP/IP Configuration below.

From the DNS Configuration menu, select “Enter settings manually”, put whatever hostname you would like here.

VMware recommends putting a host in maintenance mode and disconnecting it from vCenter before doing this. I did this hot with my host active in an HA cluster with zero ill effects. I did it a few times just to make sure. The other way to do this is via the CLI. Connect to your ESXi host via SSH, vMA or vCLI and run:

esxcli system hostname set --host=hostname
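A PowerCLI alternative also exists if you would rather not SSH to the host; a sketch where the host and the new names are examples:

# Rename a host's DNS hostname and domain via PowerCLI
Get-VMHost -Name "esxi01.lab.local" | Get-VMHostNetwork | Set-VMHostNetwork -HostName "esxi02" -DomainName "lab.local"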

Cool changes in Recent Tasks pane:

Not only is the Recent Tasks pane off to the right now, which I really like, it breaks out tasks by All, Running and Failed individually for easier viewing, including the ability to view your own tasks for environments with many admins. Previously these tasks were all lumped together and longer running tasks would get buried in the task stream.

image

 

The Recent Tasks pane also provides a new and better method to deal with pop-up configuration dialogs. Ever start configuring something using the old vSphere Client, get 4-5 clicks deep in the pop-up configuration, then realize you need some other piece of information requiring you to cancel out so you can go back to some other area in vCenter? This problem is now resolved in the web client with a cool new section of the Tasks pane called Work in Progress. It doesn’t matter what you’re doing or how far along you are in the dialog. If you need to break away for any reason, you can simply minimize the pop up and come back to it later. These minimized pop-ups will show in the Work in Progress pane below recent tasks.

The example here shows 3 concurrent activities in various states: a vMotion operation, a VM edit settings and a clone operation of the same VM even. Any activity that generates a pop-up dialog can be set aside and picked up again later. This is a huge improvement over the legacy vSphere Client. Very cool!!

 

Traffic Shaping:

It appears that in the web client you can only apply traffic shaping at the vSwitch level, not at an individual port group or VMK level. Here you can see shaping available for the standard vSwitch:

These settings, while viewable in the VMK policies summary, are not changeable (that I can see).

To override the vSwitch shaping policy and apply one to an individual port group or VMK, you have to use the legacy vSphere Client. Not sure if this is an oversight on VMware’s part or yet another sign of things to come requiring dvSwitching to assign shaping policies below the vSwitch level.

 

Deploying vCenter Operations Manager (vCOps):

Deployment of the incredible vCOps vApp for advanced monitoring of your environment is made extremely easy in vSphere 5.5 via the web client. VMware has made detailed performance monitoring of your vSphere world incredibly simple and intuitive through this easy to set up and use vApp. Really impressive. From the home page, click vCenter Operations Management.

On the Getting Started screen, click Deploy vCOps. If you have a valid vmware.com login, enter it here to download the OVF and related files for deployment. You can alternatively point to the file locally if you have it already.

Accept the EULAs and choose all the placement and sizing options for the VM.

A word of caution, do not expect DRS to make a host placement decision for you here during the deployment. The wizard will allow you to select your cluster as a resource destination but the deployment will ultimately fail. Choose a specific host to deploy the VM to instead.

The requisite files will be downloaded from VMware directly and deployed to your environment. Off to the races!

Once deployed, you’ll see 2 new VMs running under the vCOps vApp object in your datacenter.

Once the VMs are powered on and the vApp has been started, you should see new options under vCenter Operations Manager.

First, click the Configure link to open the admin site in a web page. The default login for the admin account is admin/ admin; for root the password is vmware. Configure the initial setup to point to vCenter and the analytics VM, which it should detect. Install the certificates as prompted and continue through the registration process.

Once complete, return to the vCOps page in vCenter and click Open, a new web page will launch for you to consume the vCOps goodness. After a short while performance stats should start pouring in for everything in your vSphere environment. Usage patterns and workload profiles can be identified so appropriate adjustments can be made. What you do from here with the data collected is entirely up to you. 🙂

A couple more screens just to show you the capability of vCOps, since I like it so much. Storage at the datastore view:

VM performance view:

Performance Considerations for Enterprise VDI

[This post references portions of published test results from various Dell DVS Enterprise reference architectures. We design, build, test and publish E2E enterprise VDI architectures, SKU’d and sold globally. Head over here and have a look for more information.]
There are four core elements that we typically focus on for performance analysis of VDI: CPU, memory, disk, and network. Each plays a uniquely integral role in the system overall with the software in play defining how each element is consumed and to what extent. In this post I’ll go over some of the key considerations when planning an enterprise VDI infrastructure.

CPU

First things first: no matter what you’ve heard or read, the primary VDI bottleneck is CPU. We in CCC/ DVS at Dell prove this again and again, across all hypervisors, all brokers and any hardware configuration. There are special use case caveats to this of course, but generally speaking, your VDI environment will run out of compute CPU before it runs out of anything else! CPU is a finite shared resource unlike memory, disk or network, which can all be incrementally increased or adjusted. There are many special purpose vendors and products out there that will tell you the VDI problem is memory or IOPS; those can be issues, but you will always come back to CPU.
Intel’s Ivy Bridge is upon us now, delivering more cores at higher clocks with more cache and supporting faster memory. It is decidedly cheaper to purchase a more expensive pair of CPUs than it is to purchase an entire additional server. For that reason, we recommend running [near] top bin CPUs in your compute hosts as we see measurable benefit in running faster chips. For management hosts you can get by with a lower spec CPU, but if you want to get the best return on investment for your compute hosts and get as many users as you can per host, buy the fast CPUs! Our recommendation in DVS Enterprise will follow the lateral succession from the Sandy Bridge parts we previously recommended to Ivy Bridge: Xeon E5-2690v2 (IB) vs E5-2690 (SB).
The 2690v2 is a 10-core part using a 22nm process with a 130W TDP, clocking in at 3.0GHz and supporting up to 1866MHz DDR3 memory. We tested the top of the line E5-2697v2 (12 cores) as well as the faster 1866MHz memory and saw no meaningful improvement in either case to warrant a core recommendation. It’s all about the delicate balance of the right performance for the right price.
There is no 10c part in the AMD Opteron 6300 line so the closest competitor is the Opteron 6348 (Piledriver). As has always been the case, the AMD parts are a bit cheaper and run more efficiently. AMD clocks lower (with turbo) and due to no hyperthreading feature, executes fewer simultaneous threads. The 6348 also only supports 1600MHz memory but provides a few additional instruction sets. Both run 4 memory channels with an integrated memory controller. AMD also offers a 16c part at its top end in the 6386SE. I have no empirical data on AMD vs Intel for VDI at this time.
Relevant CPU spec comparison, our default selection for DVS Enterprise highlighted in red:

Performance analysis:

To drive this point home regarding the importance of CPU in VDI, here are 2 sets of test results published in my reference architecture for DVS Enterprise on XenDesktop, one on vSphere, one on Hyper-V, both based on Sandy Bridge (we haven’t published our Ivy Bridge data yet). MCS vs PVS is another discussion entirely, but in either case CPU is always the determining factor of scale. These graphs are based on tests using MCS and R720s fitted with 2 x E5-2690 CPUs and 192GB RAM running the LoginVSI Light workload.
Hyper-V:
The 4 graphs below tell the tale fairly well for 160 concurrent users. Hyper-V does a very good job of optimizing CPU while consuming slightly higher amounts of other resources. Network consumption, while very reasonable and much lower than you might expect for 160 users, is considerably larger than in the vSphere use case in the next example. Once steady state has been reached, CPU peaks right around 85% which is the generally accepted utilization sweet spot making the most of your hardware investment while leaving head room for unforeseen spikes or temporary resource consumption. Memory in use is on the high side given the host had 192GB, but that can be easily remedied by raising to 256GB. 

vSphere:

Similar story for vSphere, although the user density below is representative of only 125 desktops of the same user workload. This speaks to another trend we are seeing more and more of, which is a stark CPU performance reduction of vSphere compared to Hyper-V for non-View brokers. 35 fewer users overall here, but disk performance is also acceptable. In the example below, CPU spikes slightly above 85% at full load with disk performance and network consumption well within reasonable margins. Where vSphere really shines is in its memory management capabilities thanks to features like Transparent Page Sharing; as you can see, the active memory is quite a bit lower than what has actually been assigned.

Are 4 sockets better than 2?

Not necessarily. 4-socket servers, such as the Dell PowerEdge R820, use a different multi-processor (MP) CPU architecture, currently based on Sandy Bridge EP (E5-4600 family), versus the dual processor (DP) CPU architectures of their dual socket server counterparts. MP CPUs and their 4-socket servers are inherently more expensive, especially considering the additional RAM required to run whatever additional user densities. 2 additional CPUs also do not mean twice the user density in a 4-socket platform! A similarly configured 2-socket server is roughly half the cost of a 4-socket box and it is for this reason that we recommend the Dell PowerEdge R720 for DVS Enterprise. You will get more users on 2 x R720s cheaper than if you ran a single R820.

Memory

Memory architecture is an important consideration for any infrastructure planning project. Our experience shows that VDI appears to be less sensitive to memory bandwidth performance than other enterprise applications. Besides overall RAM density per host, DIMM speed and loading configuration are important considerations. In Sandy and Ivy Bridge CPUs, there are 4 memory channels, 3 DIMM slots each, per CPU (12 slots total). Your resulting DIMM clock speed and total available bandwidth will vary depending on how you populate these slots.

As you can see from the table below, loading all slots on a server via 3 DPC (3 DIMMs per channel) will result in a forced clock reduction to either 1066 or 1333 depending on the DIMM voltage. If you desire to run at 1600MHz or 1866Mhz (Ivy) you cannot populate the 3rd slot per channel which will net 8 empty DIMM slots per server. You’ll notice that the higher memory clocks are also achievable using lower voltage RDIMMs.

Make sure to always use the same DIMM size, clock and slot loading to ensure a balanced configuration. To follow the example of 256GB in a compute host, the proper loading to maintain maximum clock speeds and 4-channel bandwidth is 2 DIMMs per channel per CPU (for example, 8 x 16GB RDIMMs per CPU, leaving the third slot in each channel empty):

If 256GB is not required, leaving the 4th channel empty is also acceptable in “memory optimized” BIOS mode, although it does reduce the overall memory bandwidth by 25%. In our tests with the older Sandy Bridge E5-2690 CPUs, we did not find that this affected desktop VM performance.

Disk

There are 3 general considerations for storage that vary depending on the requirements of the implementation: capacity, performance and protocol.
Usable capacity must be sufficient to meet the needs of both Tier1 and Tier2 storage requirements which will differ greatly based on persistent or non-persistent desktops. We generally see an excess of usable capacity as a result of the number of disks required to provide proper performance. This of course is not always the case as bottlenecks can often arise in other areas, such as array controllers. It is less expensive to run RAID10 in fewer arrays to achieve a given performance requirement, than it is to run more arrays at RAID50. Sometimes you need to maximize capacity, sometimes you need to maximize performance. Persistent desktops (full clones) will consume much more disk than their non-persistent counterparts so additional storage capacity can be purchased or a deduplication technology can be leveraged to reduce the amount of actual disk required. If using local disk, in a Local Tier 1 solution model, inline dedupe software can be implemented to reduce the amount of storage required by several fold. Some shared storage arrays have this capability built in. Other solutions, such as Microsoft’s native dedupe capability in Server 2012 R2, make use of file servers to host Hyper-V VMs via SMB3 to reduce the amount of storage required.
Disk performance is another deep well of potential and caveats, again related directly to the problems one needs to solve. A VDI desktop can consume anywhere from 3 to over 20 IOPS for steady state operation depending on the use case requirements. Sufficient steady state disk performance can be provided without necessarily solving the problems related to boot or login storms (many desktop VMs being provisioned/ booted or many users logging in all at once). Designing a storage architecture to withstand boot or login storms requires providing a large amount of available IOPS capability, which can come via hardware or software based solutions, neither generally inexpensive. Some products combine the ability to provide high IOPS while also providing dedupe capabilities. Generally speaking, it is much more expensive to provide high performance for potential storms than it is to provide sufficient performance for normal steady state operations. When considering SSDs and shared storage, one needs to be careful to consider the capabilities of the array’s controllers, which will almost always exhaust before the attached disk will. Just because you have 50K IOPS potential in your disk shelves on the back end does not mean that the array is capable of providing that level of performance on the front end!
There is not a tremendous performance difference between storage protocols used to access a shared array on the front end, this boils down to preference these days. Fiber Channel has been proven to outperform iSCSI and file protocols (NFS) by a small margin but performance alone is not really reason enough to choose between them. Local disk also works well but concessions may need to be made with regard to HA and VM portability. Speed, reliability, limits/ maximums, infrastructure costs and features are key considerations when deciding on a storage protocol. At the end of the day, any of these methods will work well for an enterprise VDI deployment. What features do you need and how much are you willing to spend?

Network

Network utilization is consistently (and maybe surprisingly) low, often in the Kbps per user range. VDI architectures in and of themselves simply don’t drive a ton of steady network utilization. VDI is bursty and will exhibit higher levels of consumption during large aggregate activities such as provisioning or logins. Technologies like Citrix Provisioning Server will inherently drive greater consumption by nature. What will drive the most variance here is much more reliant on upstream applications in use by the enterprise and their associated architectures. This is about as subjective as it gets, so it is impossible to speculate in any kind of fashion across the board. You will have a potentially high number of users on a large number of hosts, so comfortable network oversubscription planning should be done to ensure proper bandwidth in and out of the compute host or blade chassis. Utilizing enterprise-class switching components that are capable of operating at line rates for all ports is advisable. Will you really need hundreds of gigs of bandwidth? I really doubt it. Proper HA is generally desirable along with adherence to sensible network architectures (core/ distribution, leaf/ spine). I prefer to do all of my routing at the core, leaving anything Layer2 at the Top of Rack. Uplink to your core or distribution layers using 10Gb links which can be copper (TwinAx) for shorter runs or fiber for longer runs.

In Closing

That about sums it up for the core 4 performance elements. To put a bow on this, hardware utilization analysis is fine and definitely worth doing, but user experience is ultimately what’s important here. All components must sing together in harmony to provide the proper level of scale and user experience. A combination of subjective and automated monitoring tests during a POC will give a good indication of what users will experience.  At Dell, we use Stratusphere UX by Liquidware Labs to measure user experience in combination with Login Consultants LoginVSI for load generation. A personal, subjective test (actually log in to a session) is always a good idea when putting your environment under load, but a tool like Stratusphere UX can identify potential pitfalls that might otherwise go unnoticed.
Keeping tabs on latencies, queue lengths and faults, then reporting each user’s experience into a magic-style quadrant, will give you the information required to ascertain whether your environment will perform as designed, or send you scrambling to make adjustments.

Unidesk: Layered VDI Management

VDI is one of the most intensive workloads in the datacenter today and by nature uses every major component of the enterprise technology stack: networking, servers, virtualization, storage, load balancing. No stone is left unturned when it comes to enterprise VDI. Physical desktop management can also be an arduous task with large infrastructure requirements of its own. The sheer complexity of VDI drives a lot of interesting and feverish innovation in this space but also drives a general adoption reluctance for some who fear the shift is too burdensome for their existing teams and datacenters. The value proposition Unidesk 2.0 brings to the table is a simplification of the virtual desktops themselves, simplified management of the brokers that support them, and comprehensive application management.

The Unidesk solution plugs seamlessly into a new or existing VDI environment and is comprised of the following key components:

  • Management virtual appliance
  • Master CachePoint
  • Secondary CachePoints
  • Installation Machine

 

Solution Architecture

At its core, Unidesk is a VDI management solution that does some very interesting things under the covers. Unidesk requires vSphere at the moment but can manage VMware View, Citrix XenDesktop, Dell Quest vWorkspace, or Microsoft RDS. You could even manage each type of environment from a single Unidesk management console if you had the need or proclivity. Unidesk is not a VDI broker in and of itself, so that piece of the puzzle is very much required in the overall architecture. The Unidesk solution works from the concept of layering, which is increasingly becoming a hotter topic as both Citrix and VMware add native layering technologies to their software stacks. I’ll touch on those later. Unidesk works by creating, maintaining, and compositing numerous layers to create VMs that can share common items like base OS and IT applications, while providing the ability to persist user data including user installed applications, if desired. Each layer is stored and maintained as a discrete VMDK and can be assigned to any VM created within the environment. Application or OS layers can be patched independently and refreshed to a user VM. Because of Unidesk’s layering technology, customers needing persistent desktops can take advantage of capacity savings over traditional methods of persistence. A persistent desktop in Unidesk consumes, on average, a similar disk footprint to what a non-persistent desktop would typically consume.

CachePoints (CP) are virtual appliances that are responsible for the heavy lifting in the layering process. Currently there are two distinct types of CachePoints: Master and Secondary. The Master CP is the first to be provisioned during the setup process and maintains the primary copy of all layers in the environment. Master CPs replicate the pertinent layers to Secondary CPs, which have the task of actually combining layers to build the individual VMs, a process called Compositing. Due to the role played by each CP type, the Secondary CPs will need to live on the Compute hosts with the VMs they create. Local or Shared Tier 1 solution models can be supported here, but the Secondary CPs will need to be able to access the “CachePoint and Layers” volume at a minimum.

The Management Appliance is another virtual machine that comes with the solution to manage the environment and individual components. This appliance provides a web interface used to manage the CPs, layers, images, as well as connections to the various VDI brokers you need to interface with. Using the Unidesk management console you can easily manage an entire VDI environment almost completely ignoring vCenter and the individual broker management GUIs. There are no additional infrastructure requirements for Unidesk specifically outside of what is required for the VDI broker solution itself.

Installation Machines are provided by Unidesk to capture application layers and make them available for assignment to any VM in the solution. This process is very simple and intuitive requiring only that a given application is installed within a regular VM. The management framework is then able to isolate the application and create it as an assignable layer (VMDK). Many of the problems traditionally experienced using other application virtualization methods are overcome here. OS and application layers can be updated independently and distributed to existing desktop VMs.

Here is an exploded and descriptive view of the overall solution architecture summarizing the points above:

Storage Architecture

The Unidesk solution is able to leverage three distinct storage tiers to house the key volumes: Boot Images, CachePoint and Layers, and Archive.

  • Boot Images – Contains images having very small footprints and consist of a kernel and pagefile used for booting a VM. These images are stored as VMDKs, like all other layers, and can be easily recreated if need be. This tier does not require high performance disk.
  • CachePoint and Layers – This tier stores all OS, application, and personalization layers. Of the three tiers, this one sees the most IO so if you have high performance disk available, use it with this tier.
  • Archive – This tier is used for layer backup including personalization. Repairs and restored layers can be pulled from the archive and placed into the CachePoint and Layers volume for re-deployment, if need be. This tier does not require high performance disk.

image

The Master CP stores layers in the following folder structure, each layer organized and stored as a VMDK.

Installation and Configuration

New in Unidesk 2.x is the ability to execute a completely scripted installation. You’ll need to decide ahead of time what IPs and names you want to use for the Unidesk management components as these are defined during setup. This portion of the install is rather lengthy, so it’s best to have things squared away before you begin. Once the environment variables are defined, the setup script takes over and builds the environment according to your design.

Once setup has finished, the Management appliance and Master CP will be ready, so you can log into the mgmt console to take the configuration further. Among the initial key activities to complete will be setting up an Active Directory junction point and connecting Unidesk to your VDI broker. Unidesk should already be talking to your vCenter server at this point.

Your broker mgmt server will need to have the Unidesk Integration Agent installed which you should find in the bundle downloaded with the install. This agent listens on TCP 390 and will connect the Unidesk management server to the broker. Once this agent is installed on the VMware View Connection Server or Citrix Desktop Delivery Controller, you can point the Unidesk management configuration at it. Once synchronized all pool information will be visible from the Unidesk console.

A very neat feature of Unidesk is that you can build many AD junction points from different forests if necessary. These junction points will allow Unidesk to interact with AD and provide the ability to create machine accounts within the domains.

Desktop VM and Application Management

Once Unidesk can talk to your vSphere and VDI environments, you can get started building OS layers which  will serve as your gold images for the desktops you create. A killer feature of the Unidesk solution is that you only need a single gold image per OS type even for numerous VDI brokers. Because the broker agents can be layered and deployed as needed, you can reuse a single image across disparate View and XenDesktop environments, for example. Setting up an OS layer simply points Unidesk at an existing gold image VM in vCenter and makes it consumable for subsequent provisioning. 

Once successfully created, you will see your OS layers available and marked as deployable.

 

Before you can install and deploy applications, you will need to deploy a Unidesk Installation Machine which is done quite simply from the System page. You should create an Installation Machine for each type of desktop OS in your environment.

Once the Installation Machine is ready, creating layers is easy. From the Layers page, simply select “Create Layer,” fill in the details, choose the OS layer you’ll be using along with the Installation machine and any prerequisite layers.

 

To finish the process, you’ll need to log into the Installation Machine, perform the install, then tell the Unidesk management console when you’re finished and the layer will be deployable to any VM.

Desktops can now be created as either persistent or non-persistent. You can deploy to already existing pools or, if you need a new persistent pool created, Unidesk will take care of it. Choose the type of OS template to deploy (XP or Win7), select the connected broker to which you want to deploy the desktops, choose an existing pool or create a new one, and select the number of desktops to create.

Next select the CachePoint that will deploy the new desktops along with the network they need to connect to and the desktop type.

Select the OS layer that should be assigned to the new desktops.

Select the application layers you wish to assign to this desktop group. All your layers will be visible here.

Choose the virtual hardware, performance characteristics and backup frequency (Unidesk Archive) of the desktop group you are deploying.

Select an existing or create a new maintenance schedule that defines when layers can be updated within this desktop group.

Deploy the desktops.

Once the creation process is underway, the activity will be reflected under the Desktops page as well as in vCenter tasks. When completed all desktops will be visible and can be managed entirely from the Unidesk console.

Sample Architecture

Below are some possible designs that can be used to deploy Unidesk into a Local or Shared Tier 1 VDI solution model. For Local Tier 1, both the Compute and Management hosts will need access to shared storage, even though VDI sessions will be hosted locally on the Compute hosts. 1Gb PowerConnect or Force10 switches can be used in the Network layer for LAN and iSCSI. The Unidesk boot images should be stored locally on the Compute hosts along with the Secondary CachePoints that will host the sessions on that host. All of the typical VDI management components will still be hosted on the Mgmt layer hosts along with the additional Unidesk management components. Since the Mgmt hosts connect to and run their VMs from shared storage, all of the additional Unidesk volumes should be created on shared storage. Recoverability is achieved primarily in this model through use of the Unidesk Archive function. Any failed Compute host VDI session information can be recreated from the Archive on a surviving host.

Here is a view of the server network and storage architecture with some of the solution components broken out:

For Shared Tier 1 the layout is slightly different. The VDI sessions and “CachePoint and Layers” volumes must live together on Tier 1 storage while all other volumes can live on Tier 2. You could combine the two tiers for smaller deployments, perhaps, but your mileage will vary. Blades are also an option here, of course. All normal vSphere HA options apply here with the Unidesk Archive function bolstering the protection of the environment.

Unidesk vs. the Competition

Both Citrix and VMware have native solutions available for management, application virtualization, and persistence, so you will have to decide if Unidesk is worth the price of admission. On the View side, if you buy a Premier license, you get ThinApp for applications, Composer for non-persistent linked clones, and soon the technology from VMware’s recent Wanova acquisition will be available. The native View persistence story isn’t great at the moment, but Wanova Mirage will change that when made available. Mirage will add a few layers to the mix including OS, apps, and persistent data but will not be as granular as the multi-layer Unidesk solution. The Wanova tech notwithstanding, you should be able to buy a cheaper/ lower level View license, as with Unidesk you will need neither ThinApp nor Composer. Unidesk’s application layering is superior to ThinApp, with little in the way of applications that cannot be layered, and can provide persistent or non-persistent desktops with almost the same footprint on disk. Add to that the Unidesk single management pane for both applications and desktops, and there is a compelling value to be considered.

On the Citrix side, if you buy an Enterprise license, you get XenApp for application virtualization, plus Provisioning Services (PVS) and Personal vDisk (PVD, from the recent RingCube acquisition) for persistence. With XenDesktop you can leverage Machine Creation Services (MCS) or PVS for either persistent or non-persistent desktops. MCS is deadly simple while PVS is incredibly powerful but an extraordinary pain to set up and configure. XenApp builds on top of Microsoft’s RDS infrastructure and requires additional components of its own such as SQL Server. PVD can be deployed with either catalog type, PVS or MCS, and adds a layer of persistence for user data and user installed applications. While PVD provides only a single layer, that may be more than suitable for any number of customers. The overall Citrix solution is time tested and works well although the underlying infrastructure requirements are numerous and expensive. XenApp offloads application execution from the XenDesktop sessions which will in turn drive greater overall host densities. Adding Unidesk to a Citrix stack again affords a customer the ability to buy in at a lower licensing level, although Citrix is seemingly reducing the incentive to augment its software stack by including more at lower license levels. For instance, PVD and PVS are available at all licensing levels now. The big upsell now is the inclusion of XenApp. Unidesk removes the need for MCS, PVS, PVD, and XenApp, so you will have to ask yourself if the Unidesk approach is preferred to the Citrix method. The net result will certainly be less overall infrastructure required, but net licensing costs may very well be a wash.

Dell DVS and Microsoft team up to deliver VDI on Windows Server 2012

My name is Peter and I am the Principal Engineering Architect for Desktop Virtualization at Dell.

The DVS team at Dell has partnered with Microsoft to launch a new product delivering VDI on Server 2012. This product was announced at the Microsoft Worldwide Partner Conference in Toronto this week and we have published a Reference Architecture detailing the solution (link). This initial release is targeting the SMB market providing support for ~500 pooled VDI desktops. The architecture can and will scale much higher but the intent was to get the ball rolling in the smaller markets.
The Dell solution stack for Remote Desktop Services (RDS) on Server 2012 can be configured in a few different ways.  RDS comprises two key roles for hosting desktops: RD Session Host (RDSH), formerly Terminal Services, and RD Virtualization Host (RDVH) for pooled or personal desktops. Both of these roles can coexist on the same compute host, if desired, to provide each type of VDI technology. Your lighter users that may only need web and email access, for example, should do fine on an RDSH host in a hosted shared session model. Knowledge workers would leverage the RDVH technology, comparable to VMware View and Citrix XenDesktop, for persistent or non-persistent desktops. Windows Server 2012 provides a one-stop-shop for VDI in a robust and easy to deliver software stack.

In this solution, Windows Server 2012 Hyper-V runs on both management and compute layer hosts in the parent partition, while all management roles are enabled on dedicated VMs in child partitions. In the combined solution stack, two VMs are created on the compute host to run the RDSH and RDVH roles, respectively. Density numbers are dependent on the amount of resources given to and consumed by the RDVH VDI sessions, so scaling is highly relative and dependent on the specific use case. Only 3 management VMs are required to host the RDS environment and, unlike every other VDI solution on the market, SQL Server is not required here in the early stages. If you wish to provide HA for your RD Connection Broker, then a SQL Server is required. Top of rack we provide the best-in-class Force10 S55 1Gb switch, which includes unique features such as the Open Automation and Virtualization frameworks. For user data and management VM storage we leverage the EqualLogic PS4100E iSCSI array with 12TB of raw storage. As is the case in our other DVS Enterprise offerings, the network and storage layers are optional purchases from Dell.

The base offering of this solution dedicates compute layer hosts to either the RDSH or RDVH role, depending on customer need. Each role type has its own per-host user densities that scale independently of each other. These are very conservative densities for the first phase of this product launch and in no way represent the limit of what this platform is capable of.

We top out the solution stack with 5 total servers, 4 in the compute layer, and 1 in the management layer. The compute layer hosts can be mixed and matched with regard to RD role.

The basic guidance at this point is that if this solution architecture meets the needs of your enterprise, fantastic; if not, we urge you to look at the DVS Enterprise 6020 solution (discussed here). This was meant to serve as a high-level overview of this new solution stack, so if you'd like to dig deeper, please take a look at the RA document below.
Dell DVS Reference Architecture for Windows Server 2012: here.
Product launch announcement: Link

Enterprise VDI

My name is Peter and I am the Principal Engineering Architect for Desktop Virtualization at Dell. 🙂

VDI is a red-hot topic right now and there are many opinions out there on how to approach it. If you're reading this I probably don't need to sell you on the value of the concept, but more and more companies are deploying VDI instead of investing in traditional PC refreshes. All trending data points to this shift only going up over the next several years as the technology gets better and better. VDI is a very interesting market segment as it encompasses the full array of cutting-edge enterprise technologies: network, servers, storage, virtualization, database, web services, highly distributed software architectures, high availability, and load balancing. Add high capacity and performance requirements to the list and you have a very interesting segment indeed! VDI is also constantly evolving, with a very rich ecosystem of players offering new and interesting technologies to keep up with. This post will give you a brief look at the enterprise VDI offering from Dell.
As a customer, and just 1 year ago I still was one, it's very easy to get caught up in the marketing hype, which makes it difficult to discern the true value of a product or platform. With regard to VDI, we are taking a different approach at Dell. Instead of trying to lure you with inconceivable and questionable per-server user densities, we have decided to take a very honest and realistic approach in our solutions. I'll explain this in more detail later.
Dell currently offers 2 products in the VDI space: Simplified, which is the SMB-focused VDI-in-a-box appliance I discussed here (link), and Enterprise which can also start very small but has much longer legs to scale to suit a very large environment. I will be discussing the Enterprise platform in this post which is where I spend the majority of my time. In the resources section at the bottom of this posting you will find links to 2 reference architectures that I co-authored. They serve as the basis for this article.

DVS Enterprise

Dell DVS Enterprise is a multi-tiered turnkey solution composed of rack or blade servers and iSCSI or FC storage, built on industry-leading hypervisors, software, and VDI brokers. We have designed DVS Enterprise to encompass tons of flexibility to meet any customer need, and it can suit 50 to 50,000 users. As opposed to the more rigid “block” type products, our solutions are tailored to the customer to provide exactly what is needed, with flexibility for leveraging existing investments in network, storage, and software.
The solution stacks consist of 4 primary tiers: network, compute, management, and storage. Network and storage can be provided by the customer, given the existing infrastructure meets our design and performance requirements. The Compute tier is where the VDI sessions execute, whether running on local or shared storage. The management tier is where VDI broker VMs and supporting infrastructure run. These VMs run off of shared storage in all solutions so management tier hosts can always be clustered to provide HA. All tiers, while inextricably linked, can scale independently.
image

The DVS Enterprise portfolio consists of 2 primary solution models: “Local Tier 1” and “Shared Tier 1”. DVS Engineering spends considerable effort validating and characterizing core solution components to ensure your VDI implementation will perform as it is supposed to. Racks, blades, 10Gb networking, Fiber Channel storage…whatever mix of ingredients you need, we have it. Something for everyone.

Local Tier 1

“Tier 1” in the DVS context defines the disk from which the VDI sessions execute, and is therefore the faster, higher-performing tier. Local Tier 1 applies only to rack servers (due to the amount of disk required) while Shared Tier 1 can be rack or blade. Tier 2 storage is present in both solution architectures and, while having a reduced performance requirement, is utilized for user profile/data and management VM execution. The graphic below depicts the management tier VMs on shared storage while the compute tier VDI sessions are on local server disk:

image

This Local Tier 1 Enterprise offering is uniquely Dell, as most industry players focus solely on solutions revolving around shared storage. The value here is flexibility: you can buy into high-performance VDI no matter what your budget is. Shared Tier 1 storage has its advantages but is costly and requires a high-performance infrastructure to support it. The Local Tier 1 solution is cost optimized and only requires 1Gb networking.

Network

We are very cognizant that networking can be a touchy subject, with a lot of customers pledging fierce loyalty to the well-known market leader. Hey, I was one of those customers just a year ago. We get it. That said, a networking purchase from Dell is entirely optional as long as you have suitable infrastructure in place. From a cost perspective, PowerConnect provides strong performance at a very attractive price point and is the default option in our solutions. Our premium Force10 networking product line is positioned well to compete directly with the market leader, from top of rack (ToR) to large chassis-based switching. Force10 is an optional upgrade in all solutions. For the Local Tier 1 solution, a simple 48-port 1Gb switch is all that is required; the PC6248 is shown below:

image

Servers

The PowerEdge R720 is a solid rack server platform that suits this solution model well, with up to 2 x 2.9GHz 8-core CPUs, 768GB RAM, and 16 x 2.5” 15K SAS drives. There is more than enough horsepower in this platform to suit any VDI need. Again, flexibility is an important tenet of Dell DVS, so other server platforms can be used if desired to meet specific needs.

Storage

Shared Tier 2 storage is a required component of the Local Tier 1 architecture, but purchasing it from Dell is entirely optional. The EqualLogic PS4100X is a solid entry-level 1Gb iSCSI storage array that can be configured to provide up to 22TB of raw storage running on 10K SAS disks. You can of course go bigger to the 6000 series in EqualLogic or integrate a Compellent array with your choice of storage protocol. It all depends on your need to scale.

image

Shared Tier 1

In the Shared Tier 1 solution model, an additional shared storage array is added to handle the execution of the VDI sessions in larger scale deployments. Performance is a key concern in the shared Tier 1 array and contributes directly to how the solution scales. All Compute and Mgmt hosts in this model are diskless and can be either rack or blade. In smaller scale solutions, the functions of Tier 1 and Tier 2 can be combined as long as there is sufficient capacity and performance on the array to meet the needs of the environment. 

image

Network

The network configuration changes a bit in the Shared Tier 1 model depending on whether you are using racks or blades and which block storage protocol you employ. Block storage traffic should be separated from LAN, so iSCSI will leverage a discrete 10Gb infrastructure while Fiber Channel will leverage an 8Gb fabric. The PowerConnect 8024F is a 10Gb SFP+ based switch used for iSCSI traffic destined to either EqualLogic or Compellent storage, and it can be stacked to scale. The Fiber Channel industry leader Brocade is used for FC fabric switching.

In the blade platform, each chassis has 3 available fabrics that can be configured with Ethernet, FC, or Infiniband switching. In DVS solutions, the chassis is configured with the 48-port M6348 switch interconnect for LAN traffic and either Brocade switches for FC or a pair of 10Gb 8024-K switches for iSCSI. Ethernet-based chassis switches are stacked for easy management.

Servers

Just like the Local Tier 1 solution, the R720 can be used if rack servers are desired, or the half-height dual-socket M620 if blades are desired. The M620 is on par with the R720 in all regards except for disk capacity and top-end CPU. The R720 can be configured with a higher 2.9GHz 8-core CPU to achieve greater user density in the compute tier. The M1000e blade chassis can support 16 half-height blades.

Storage

Either EqualLogic or Compellent arrays can be utilized in the storage tier. The performance demands of Tier 1 storage in VDI are very high, so design considerations dealing with boot storms and steady-state performance are critical. Each EqualLogic array is a self-contained iSCSI storage unit with an active/passive controller pair that can be grouped with other arrays and managed as a single unit. The 6110XS, depicted below, is a hybrid array containing a mix of high-performance SSD and SAS disks. EqualLogic's active tiering technology dynamically moves hot and cold data between tiers to ensure the best performance at all times. Even though each controller now only has a single 10Gb port, vertical port sharing ensures that a controller port failure does not necessitate a controller failover.

Compellent can also be used in this space and follows a more traditional linear scale. SSDs are used for “hot” storage blocks, especially boot storms, and 15K SAS disks are used to store the cooler blocks on dense storage. To add capacity and throughput, additional shelves are looped into the array architecture. Compellent has its own auto-tiering functionality that can be scheduled off-hours to rebalance the array from day to day. It also employs a mechanism that puts the hot data on the outer ring of the disk platters where it can be read easily and quickly. High performance and redundancy are achieved through an active/active controller architecture. The 32-bit Series 40 controller architecture is soon to be replaced by the 64-bit SC8000 controllers, alleviating the previous x86-based cache limits.

Another nice feature of Compellent is its inherent flexibility. The controllers are flexible like servers, allowing you to install the number and type of IO cards you require: FC, iSCSI, FCoE, and SAS for the backend. Need more front-end bandwidth or another backend SAS loop? Just add the appropriate card to the controller.
In the lower user count solutions, Tier 1 and Tier 2 storage functions can be combined. In the larger scale deployments these tiers should be separated and scale independently.

VDI Brokers

Dell DVS currently supports both VMware View 5 and Citrix XenDesktop 5 running on top of the vSphere 5 hypervisor. All server components run Windows Server 2008 R2, with database services provided by SQL Server 2008 R2. I have worked diligently to create a simple, flexible, unified architecture that expands effortlessly to meet the needs of any environment.

image

Choice of VDI broker generally comes down to customer preference; each solution has its advantages and disadvantages. View has a very simple backend architecture consisting of 4 essential server roles: SQL, vCenter, View Connection Server (VCS) and Composer. Composer is the secret sauce that provides the non-persistent linked clone technology and is installed on the vCenter server. One downside to this is that because of Composer's reliance on vCenter, the total number of VMs per vCenter instance is reduced to 2000, instead of the published 3000 per HA cluster in vSphere 5. This means that you will have multiple vCenter instances depending on how large your environment is. The advantage of View is its small management footprint, as 4 management hosts are all that is required to serve a 10,000 user environment. I wrote about View architecture design previously for version 4 (link).

image

View Storage Accelerator (VSA), officially supported in View 5.1, is the biggest game changing feature in View 5.x thus far. VSA changes the user workload IO profile, thus reducing the number of IOPS consumed by each user. VSA provides the ability to enable a portion of the host server’s RAM to be used for host caching, largely absorbing read IOs. This reduces the demand of boot storms as well as makes the tier 1 storage in use more efficient. Before VSA there was a much larger disparity between XenDesktop and View users in terms of IOPS, now the gap is greatly diminished.
View can be used with 2 connection protocols, the proprietary PCoIP protocol or native RDP. PCoIP is an optimized protocol intended to provide a greater user experience through richer media handling and interaction. Most users will probably be just fine running RDP, as PCoIP has greater overhead and uses more host CPU cycles. PCoIP is intended to compete head on with the Citrix HDX protocol and there are plenty of videos running side-by-side comparisons if you're curious. Below is the VMware View logical architecture flow:

XenDesktop (XD), while similar in basic function, is very different from View. Let’s face it, Citrix has been doing this for a very long time. Client virtualization is what these guys are known for and through clever innovation and acquisitions over the years they have managed to bolster their portfolio as the most formidable client virtualization player in this space. A key difference between View and XD is the backend architecture. XD is much more complex and requires many more server roles than View which affects the size and scalability of the management tier. This is very complex software so there are a lot of moving parts: SQL, vCenter, license server, web interfaces, desktop delivery controllers, provisioning servers… there are more pieces to account for that all have their own unique scaling elements. XD is not as inextricably tied to vCenter as View is so a single instance should be able to support the published maximum number of sessions per HA cluster.

image

One of the neat things about XD is that you have a choice in desktop delivery mechanisms. Machine Creation Services (MCS) is the default mechanism provided in the DDC. At its core this provides a dead simple method for provisioning desktops and functions very similarly to View in this regard. Citrix recommends using MCS only for 5000 or fewer VDI sessions. For greater than 5000 sessions, Citrix recommends using their secret weapon: Provisioning Server (PVS). PVS provides the ability to stream desktops to compute hosts using gold master vDisks, customizing the placement of VM write-caches, all the while reducing the IO profile of the VDI session. PVS leverages TFTP to boot the VMs from the master vDisk. PVS isn’t just for virtual desktops either, it can also be used for other infrastructure servers in the architecture such as XenApp servers and provides dynamic elasticity should the environment need to grow to meet performance demands. There is no PVS equivalent on the VMware side of things.
With Citrix's recent acquisition and integration of RingCube in XD, there are now new catalog options available for MCS and PVS in XD 5.6: pooled with personal vDisk or streamed with personal vDisk. The personal vDisk (PVD) is disk space that can be dedicated on a per-user basis for personalization information, application data, etc. PVD is intended to provide a degree of end-user experience persistence in an otherwise non-persistent environment. Additional benefits of XD include seamless integration with XenApp for application delivery as well as the long-standing benefits of the ICA protocol: session reliability, encrypted WAN acceleration, NetScaler integration, etc. Below is the Citrix XenDesktop logical architecture flow:

High Availability

HA is provided via several different mechanisms across the solution architecture tiers. In the network tier HA is accomplished through stacking switches whether top of rack (ToR) or chassis-based. Stacking functionally unifies an otherwise segmented group of switches so they can be managed as a single logical unit. Discrete stacks should be configured for each service type, for example a stack for LAN traffic and a stack for iSCSI traffic. Each switch type has its stacking limits so care has been taken to ensure the proper switch type and port count to meet the needs of a given configuration.

Load balancing is provided via native DNS in smaller stacks, especially for file and SQL, and moves into a virtual appliance based model over 1000 users. NetScaler VPX or F5 LTM-VE can be used to load balance larger environments. NetScalers are sized based on required throughput as each appliance can manage millions of concurrent TCP sessions.
Protecting the compute tier differs a bit between the Local and Shared Tier 1 solutions, as well as between View and XenDesktop. In the Local Tier 1 model there is no shared storage in the compute tier, so vSphere HA can't help us here. With XD, PVS can provide HA functionality by controlling the placement of VDI VMs from a failed host to a hot standby.

The solution for View is not quite so elegant in the Local Tier 1 model, as there is no mechanism to automatically move VMs from a failed host. What we can do, though, is mimic HA functionality by manually creating a resource reservation on each compute host. This creates a manual RAID-like model in which there is reserve capacity to host a failed server's VDI sessions.

In the shared tier 1 model, the compute tier has shared storage so we can take full advantage of vSphere HA. This also applies to the management tier in all solution models. There are a few ways to go here when configuring admission control. Thankfully there are now more options than only calculating slot sizes and overhead. The simplest way to go is specifying a hot standby for dedicated failover. The downside is that you will have gear sitting idle. If that doesn’t sit well with you then you could specify a percentage of cluster resources to reserve. This will thin the load running on each host in the cluster but at least won’t waste resources entirely.
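For reference, here is a rough PowerCLI sketch (vCenter and cluster names are hypothetical) that enables HA and admission control with a tolerance of one host failure; the percentage-based and dedicated failover host policies described above are configured through the vSphere API cluster spec rather than these basic cmdlet parameters:

# Sketch only; names are placeholders
Connect-VIServer -Server vcenter.lab.local
Get-Cluster -Name "VDI-Compute" | Set-Cluster -HAEnabled:$true -HAAdmissionControlEnabled:$true -HAFailoverLevel 1 -Confirm:$false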

If the use of DRS is desired, care needs to be taken in large scale scenarios as this technology will functionally limit each HA cluster to 1280 VMs.

Protection for the storage tier is relatively straightforward, as each storage array has its own built-in protections for controllers and RAID groups. In smaller solution stacks (under 1000 users) a file server VM is sufficient to host user data, profiles, etc. For deployments larger than 1000 users, we recommend that NAS be leveraged to provide this service. Our clustered NAS solutions for both EqualLogic and Compellent are high performing and scalable to meet the needs of very large deployments. That said, NAS is available as an HA option at any time, for any solution size.

Validation

The Dell DVS difference is that our solutions are validated and characterized around real-world requirements and scenarios. Everyone who competes in this space plays the marketing game, but we actually put our solutions through their paces. Everything we sell in our core solution stacks has been configured, tested, scrutinized, optimized, and measured for performance at all tiers in the solution. Additional consulting and blueprinting services are available to help customers properly size VDI for their environments by analyzing user workloads to build a custom solution to meet those needs. Professional services are also available to stand up and support the entire solution.

The DVS Enterprise solution is constantly evolving with new features and options coming this summer. Keep an eye out here for more info on the latest DVS offerings as well as discussions on the interesting facets of VDI.

References:

Dell DVS: VMware View Reference Architecture 

Dell DVS: Citrix XenDesktop Reference Architecture

Hot adding an external shelf to a NetApp array

My setup for this scenario is simple: 1 x FAS2020 running HA with 12 x internal SAS disks (ONTAP 7.3.2). I am adding a partially populated external SATA shelf (DS14MK2AT) to provide expansion and an additional tier of storage. The process is relatively straightforward and should apply to most arrays in the NetApp family.

Hardware Installation

NetApp uses an inordinate amount of packing material to ship what ultimately amounts to 3U’s of occupied space in the rack. Better safe than RMA I guess.

If you’ve assembled other storage arrays or servers this part won’t be much of a challenge. One item of note is that the upper shelf controller goes in upside down, which may not be immediately obvious.

Once your shelf is securely installed in the rack with the drives inserted, install your SFPs in the “In” ports on both controllers, keeping in mind the upper SFP will go in upside down. NetApp will ship 2 sets of fiber pairs with SC connectors; you will only need 1 set if you are installing a single shelf. Each pair will be labeled to match “1” and “2” on both ends. If you have additional shelves to install you will need to also install SFPs in the “Out” ports to connect those shelves to the loop. Make sure to properly set your shelf ID, which will be “1” if this is your first shelf.

 

FC Adapter Configuration

Ok, now the fun begins. Because my FAS2020 had no external shelves previously I had both FC ports on each controller connected to my Fiber Channel fabrics providing 4 paths to each storage target. Unfortunately I now need 2 of these ports to connect a loop to my new shelf. Any subsequent shelves added to the stack will attach to a prior shelf via the “Out” ports. The first step is to remove the 2 controller ports from my fabrics, both physically and in the Brocade switch configuration. I will be using the 0B interfaces on both controllers to connect to my shelf. My FC clients, vSphere and Server 2008 R2 clusters running DSM, are incredibly resilient and adjust to the dead paths immediately with no data interruption. Perform an HBA rescan in ESX and check the pathing just to be sure everything is ok.
Before the fiber from the shelf can be connected to the controller ports, we need to change the operation mode of the FC ports. Currently they are in “target” mode, as they were being used to serve data via the FC fabric. To talk to an external drive shelf they need to be in “initiator” mode. This is done using the fcadmin command in the console. Running fcadmin config will display the current state of a controller's FC adapters; notice that they are in target mode. The syntax to change the mode is fcadmin config -t <adapter mode> <adapter>. You must also first offline the adapter to be changed, because ONTAP will not allow the change on an active adapter.
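From memory of the 7-mode syntax (verify against your ONTAP release), the sequence on each controller looks roughly like this, with 0b as the adapter in my case:

fas1> fcadmin config                    (display current adapter modes)
fas1> fcadmin config -d 0b              (offline the adapter first)
fas1> fcadmin config -t initiator 0b    (change from target to initiator mode)
fas1> fcadmin config                    (confirm the pending change; a reboot is still required)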

Once the adapter mode has been changed you will need to reboot the controller before it will take effect. If you are running an HA cluster this can be done easily utilizing the takeover and giveback functions. From the console of the controller that will be taking over the cluster, run cf takeover. This will migrate all operations of the other controller to the node on which you issue the command. As part of this process the node that has been taken over will be rebooted. Very clean.
Fas1 taking over the cluster:

Fas2 being gracefully rebooted:
Once the rebooted node is back up, from the console of the node that is in takeover mode, issue the command cf giveback. This will gracefully return all appropriate functions owned by the taken-over node back into its control. Client connections are completely unaffected by this activity.

The cluster will resume normal operation after the giveback which can be verified by issuing the cf status command, or via System Manager if you’d like a more visually descriptive display.
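Condensed into a console sketch, the whole reboot dance run from the surviving partner looks like this:

fas1> cf status      (confirm the cluster is healthy before starting)
fas1> cf takeover    (fas2 is taken over and rebooted)
fas1> cf giveback    (once fas2 is back up and waiting for giveback, return its resources)
fas1> cf status      (verify normal cluster operation has resumed)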

   

Disk Assignments

Now that Fas2 is back up, you can verify the operation mode of the 0B adapters (fcadmin config) as well as check that the disks in the external shelf can now be seen by the array. Issue the disk show -n command to view any unassigned disks in the array (which should be every disk in the external shelf).

Because I am working with a partially populated shelf (8 out of 14 disks), I will configure a 3:3 split (+ 2 spares) between the controllers and create new aggregates on both. Performance is not a huge concern for me on this external shelf; I'm just looking for reserve capacity. Here is the physical disk design layout I'll be working with:

image

*NOTE: make sure that “disk auto assign” is turned off in the options if you want complete control over disk assignment. Otherwise the filer will likely assign all disks to a single controller for you. It is enabled by default and needs to be disabled on both nodes.

With auto assign turned off, issue the disk assign -n <disk count> -o <filer owner name> command. Or, if you like, you can assign the disks individually by name.
Don’t worry if you goofed and need to reassign disks between controllers as this can be done rather painlessly. This is what it looks like when the filer auto assigns all disks to a single controller:

To fix this, enter advanced privilege mode on the filer and issue the disk remove_ownership <drive name> command for each drive you want to change.

Once the drives have been removed from ownership, run the disk assign command again to get them where they should go. NetApp also recommends that you re-enable auto disk assign. Run a vol status -s on both controllers to verify the newly assigned disks and their pertinent details.
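Pulled together as a console sketch, the assignment steps above look roughly like this; the counts and disk name are placeholders based on my 3:3 + spares layout:

fas1> options disk.auto_assign off    (repeat on the other node)
fas1> disk show -n                    (list the new unowned disks)
fas1> disk assign -n 4 -o fas1        (assign 4 disks to fas1 by count)
fas2> disk assign -n 4 -o fas2        (or assign individual disks by name)
fas1> priv set advanced
fas1*> disk remove_ownership 0b.22    (only if you need to undo a bad assignment)
fas1*> priv set
fas1> vol status -s                   (verify assigned disks and spares on each node)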

Aggregates and Spares

Now that the disks are assigned to their respective controllers, we can create aggregates. If the disk type in the external shelf were the same as the internal disks, we could add them to an existing aggregate, but since I am adding a new disk type to my array I have to create a new aggregate. I’m going to switch over to System Manager for the remaining tasks.
Each controller will need its own aggregate composed of the disks you just assigned to each (save the spare). I will be using the default NetApp naming standard and creating aggr1. This can be performed from the Disks or Aggregates page and is pretty self-explanatory.
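If you'd rather stay in the console, the CLI equivalent is a one-liner per controller; this sketch uses placeholder disk names and matches the RAID 4 choice discussed below:

fas1> aggr create aggr1 -t raid4 -d 0b.17 0b.18 0b.19
fas2> aggr create aggr1 -t raid4 -d 0b.20 0b.21 0b.22
fas1> aggr status -r aggr1    (verify the new RAID group layout)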

 

RAID 4 is the way to go here as I don’t have the spare disks to justify RAID DP + a hot spare. Although I will be married to this decision for the life of this aggregate, it’s a sacrifice I have to make. Repeat this process on the other node. *NOTE make sure to leave at least 1 spare of each disk type, per controller, in the array. NetApp’s recommendation is as follows for ensuring you have the proper number of spares given a common disk type:

There you have it. A new shelf added hot to a NetApp array with no disruption to the connected clients. Now you can create your volumes, LUNs, CIFS/NFS shares, etc. If I add another AT shelf at some point at least I won’t have to sacrifice any more disks to spares!

vSphere Disaster Recovery using NetApp’s SMVI

Disaster recovery (DR) is always a hot topic, yet many companies either do not do it at all for one reason or another, or do it badly. Providing DR for a virtual environment can be a particularly challenging and expensive endeavor. In my enterprise I am running ESX4 U1 on top of Brocade Fiber Channel connected NetApp FAS arrays (1 production, 1 DR) with a 1Gb Metro-Ethernet link between sites. I do not currently have the luxury of using VMware's Site Recovery Manager (SRM) in my environment, so my process will be completely manual. SRM removes many of the monotonous tasks of turning up a virtual DR environment, including testing your plan, but this comes with a heavy price tag. I am fortunate enough, however, to have the full suite of NetApp tools at my disposal.

Snap Manager for Virtual Infrastructure (SMVI) is NetApp's answer to vSphere VM backups for use with their storage arrays. SMVI itself requires a license, as well as some specific array-level licenses, such as SnapRestore, the applicable protocol license (NFS, FCP, iSCSI), SnapMirror (optional), and FlexClone. There are some specific instances in which FlexClone is not required, such as for NFS VM in-place VMDK restores. All by itself SMVI can be used instead of VCB or Ranger type products to back up and restore VMs, volumes, or individual files within the guest VM OS. SnapMirror can be used in conjunction with SMVI to provide DR by sending backed-up VM volumes offsite to another Filer.

Here is the backup process:

  1. Once a backup is initiated, a VMware snapshot is created via vCenter for each powered-on VM in a selected volume, or for each VM that is selected for backup. You can choose either method but volume backups are recommended.
  2. The VM snapshots preserve state and are used during restores. Windows application consistency can be achieved by using VMware’s VSS support for snapshots.
  3. Once all VMs are snapped, SMVI initiates a snapshot of the volume on the Filer.
  4. Once the volume snaps are complete, the VM snapshots are deleted in vCenter.
  5. If you have a SnapMirror in place for your backed up volumes, it is then updated by SMVI.

NetApp fully supports, and recommends, running SMVI on the vCenter server for simplicity. Setup is very straightforward and only requires the vCenter server name and credentials and the storage array names or IPs and credentials. Best practice is to set up a dedicated user in vCenter as well as on the arrays for SMVI. The required vCenter permissions for this service account are detailed in the best practices guide in the references section at the bottom of this post.

SMVI Setup

Once SMVI can communicate with vCenter, you will see the VI datacenters, datastores, and VMs on the inventory tab.

 

Backup configuration is simple. Name the job, choose the volume, specify the desired backup schedule, how long to keep the backups, and where to send alerts. If you’ll be using SnapMirror, make sure to check the “Initiate SnapMirror update” option.

By default the SMVI job alerts include a link to the job log but the listed address may be unreachable. In my case the links sent were to an IPv6 address even though IPv6 was disabled on the server. This can be changed by editing the smvi.override file in \Program Files (x86)\NetApp\SMVI\server\etc and adding the following lines:

smvi.hostname=vcenter.domain.com
smvi.address=10.10.10.10

Once you successfully run a backup job in SMVI you will be able to see the available snapshots for the volume on the source Filer. Note the SMVI snaps vs snaps used by the SnapMirror.

In my scenario, I am backing up the 2 LUNs called VMProd1_FC and VMProd2_FC, which exist in volumes called ESXLUN1 and ESXLUN2, respectively. Both of these volumes have corresponding SnapMirrors configured between the primary and DR Filers.

 

A couple of things to keep in mind about timing:

  • It is a good idea to sync your SnapMirrors outside of SMVI which will reduce the time it takes when SMVI updates the mirrors. Just make sure to do this at a time other than when your SMVI jobs are running!
  • If you are using deduplication on these volumes (you should be), schedule it to run at a time when you are not running SMVI backups or syncing SnapMirrors.

Once SMVI successfully updates the SnapMirror, you will see the replicated snapshots on the destination side of the mirror as well. DR for your ESX environment is now in effect!

Testing the DR Plan

Here comes the fun part and where having SRM would be extremely helpful to automate most of this. Thanks to NetApp’s FlexClone technology we can test our DR plan without breaking the mirrors, so you could test whenever and as often as necessary without affecting production.

First step: create a FlexClone volume of the replicated snapshot you want to be able to mount and test with. Choose the appropriate volume, then select Snapshot—>Clone from the Volumes menu. It is important to note that you must use a File space guarantee or the FlexClone creation will fail! This can be done via System Manager, CLI, or FilerView:

SNAGHTML4f473a3
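For reference, here is a rough 7-mode CLI version of the clone-and-present steps that follow; the volume, snapshot, LUN, and igroup names are placeholders standing in for my environment:

fas_dr> vol clone create ESXLUN1_drtest -s file -b ESXLUN1_mirror smvi_backup_snap
fas_dr> lun online /vol/ESXLUN1_drtest/VMProd1_FC               (the cloned LUN is offline by default)
fas_dr> lun map /vol/ESXLUN1_drtest/VMProd1_FC dr_esx_igroup    (present it to the DR ESX hosts)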

Your new volume will now be visible in the Filer’s volumes list, which also creates a corresponding LUN that you’ll notice is offline by default:

image

 image

Bring the new LUN online and then you will be able to present it to your DR ESX hosts:

SNAGHTML4fab0c2

The LUN should instantly appear on the ESX hosts in your iGroup, but if it does not, run a rescan on the appropriate storage adapter:

image

ESX sees the LUN; now it needs to be added to the cluster. Switch to the Storage page in vCenter and select “Add Storage.” Select the new disk and be sure to select “Assign a new signature” to the disk, or ESX will not be able to write to it! Only a new UUID is written; all existing data will be preserved.

image
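If you prefer to do the resignature from the ESX service console rather than the vSphere Client, it looks roughly like this; check the -l output for the actual label of the snapshot volume first (the label shown here is a placeholder):

# List VMFS volumes that ESX has detected as snapshots/replicas
esxcfg-volume -l
# Resignature the cloned volume by its label
esxcfg-volume -r VMProd1_FC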

The new LUN will now be accessible to all ESX hosts in the cluster. Note the default naming convention used after adding the LUN:

image

This is where things get REALLY manual and you wish you had ponied up that $25k for SRM. Browse the newly mounted datastore and register each VM you need to test, one by one.

image 
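A small PowerCLI sketch can take some of the sting out of the bulk registration; the vCenter, host, datastore, and VM names below are placeholders, so adjust them to match what you see after the resignature:

# Sketch only; names are placeholders
Connect-VIServer -Server vcenter.lab.local
$drHost = Get-VMHost -Name "esxdr01.lab.local"
"FILESRV01","SQL01","APP01" | ForEach-Object {
    New-VM -VMFilePath "[snap-1234-VMProd1_FC] $_/$_.vmx" -VMHost $drHost -RunAsync
}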

Now your organization's policies take over as to how far you need to go to satisfy a successful DR test. Once the VM is registered it can be powered on, but if its counterpart is still running in production, disable its vNIC first. If you need to go further, then shut down the production VM, re-IP the DR clone, and then bring it online. If you need to have users connect to the DR clone then there are other implications to consider such as DNS, DFS, ODBC pointers, etc. When your test is complete, power off all DR VM clones, dismount the datastore from within ESX, delete the FlexClone volume on the DR Filer, bring the production VMs back up, check DNS, done. A beautiful thing, and best of all the prod—>DR mirror is still in sync!

References:

SMVI 2.0 Best Practices

SMVI 2.0 Admin Guide

NetApp KB52524

Masking and Presenting LUNs in NetApp Filers

I'm documenting the easy method for this process, having just gone through the more roundabout backdoor method. The proper process to mask and present LUNs on a NetApp array is through the creation of LUNs and iGroups (the mask), which is wizard-driven inside NetApp System Manager. This can all be done via the CLI, of course, which I have done, but this way is much simpler. This all assumes that you have already properly set up FCP or iSCSI on your array and that all requisite Fiber Channel zoning is complete. My filers are running ONTAP 7.3.2 and I have 2 Fiber Channel fabrics powered by Brocade switches.

Open System Manager and navigate to the controller node that will be hosting your LUN. Expand Storage and click LUNs. Click create under LUN Management. Enter a name for your new LUN along with the size and presentation type.

image

Next allow ONTAP to create a new volume or choose an existing container. I’ll create a new volume:

image

The next step will mask the LUN to a host of your choosing via the creation of an iGroup. Click Add Initiator Host and select the WWPN of the host HBA you wish to connect to the LUN. You can only add a single initiator in this step. Once complete, move your new iGroup from “Known initiator hosts” to “Hosts to connect”.

image image

Review the changes to be made in the summary and click Next to start the process. When complete you will see your new LUN and iGroup with the host's WWPN. By default NetApp likes to put each initiator port in its own iGroup. Because I am building a SQL cluster and need all hosts to see all of the same LUNs, I will be adding all cluster members to the same iGroup. This can be done either by creating a new iGroup, selecting the WWPN you want to add, and then reassigning it to the proper iGroup, or by copying and pasting the port name into the iGroup you just created.

image
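If you prefer the console over System Manager, the equivalent masking flow, including the ALUA and verification steps mentioned below, boils down to a handful of 7-mode commands; the names, size, and WWPNs here are placeholders:

fas1> lun create -s 500g -t windows_2008 /vol/sqlvol/sql_data01        (create the LUN)
fas1> igroup create -f -t windows sql_cluster 50:01:43:80:00:00:00:01  (FCP igroup with the first node's WWPN)
fas1> igroup add sql_cluster 50:01:43:80:00:00:00:02                   (add the remaining cluster nodes)
fas1> igroup set sql_cluster alua yes                                  (enable ALUA)
fas1> lun map /vol/sqlvol/sql_data01 sql_cluster                       (present the LUN to the igroup)
fas1> igroup show                                                      (verify)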

Make sure to enable ALUA if your array supports these features.

image

Verify the iGroup via the CLI if you wish by issuing the igroup show command:

image

Now, assuming you have ONTAP DSM (MPIO) installed, you should see the new LUN on your host.

image