Moving a Nutanix Hyper-V Cluster between Domains


So you have a shiny 4-node Server 2012 R2 Hyper-V failover cluster running on Nutanix, humming along with no problems. Sadly, you have only a single virtual domain controller, hosted somewhere else, that owns the AD domain for your servers and failover cluster. Crap, someone deleted the DC! Well, you wanted to move this cluster to your primary domain anyway, so I guess now you have an excuse. This is probably a very rare scenario, but should you be unlucky enough to have it happen to you, here is how to deal with it. All of this took place on my 4-node Nutanix cluster, which is honestly inconsequential, but it goes to show that the Nutanix storage cluster itself was entirely unaffected and frankly unconcerned by what I was doing. The same basic steps in this post would apply as long as your VMs are on stable, well-connected, shared storage.

In this scenario, there are really 2 ways to go:

Option A: start over, rebuild your hosts
Some might opt for this and as long as you don’t have data that needs preserving, go for it. In my case I have CVMs and VMs on every host, not interested.

Option B: change domains, rebuild the Failover Cluster, migrate VMs
This might seem messy, but since I have data I need to save, this is the route I’ll be going. This scenario assumes that networking was functioning prior to the migration and that it will remain the same. Adding IP, vSwitch or other network changes on top of this could really complicate things and is not recommended.

Core or GUI

If you’re a PowerShell master then staying in Core mode may not be an issue for you. If you’re not, you might want to convert your Server Core instance to full GUI mode first, if it isn’t there already. I wrote about how to do that here. While you’re at it, make sure all nodes are at exactly the same Windows patch level.

Out with the old

I’ll be transitioning 4 active nodes from a dead and gone domain named test1.com to my active domain dvs.com. First, power off all VMs and remove their association from the old cluster. We’re not touching the storage here so there will be no data loss.
image
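If you prefer PowerShell over Failover Cluster Manager for this step, something along these lines should work. This is just a sketch, run from an elevated prompt on each node of the old cluster:

# Power off all VMs on this host
Get-VM | Stop-VM -Force

# Remove the VM roles from the old failover cluster; the VMs stay registered
# in Hyper-V and their files stay put on the Nutanix containers
Get-ClusterGroup | Where-Object { $_.GroupType -eq "VirtualMachine" } | Remove-ClusterGroup -RemoveResources -Force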

Migrate the nodes one at a time, first evicting the node to be converted from the old cluster. Important: do this BEFORE you change domains!
image
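The eviction can also be done with a quick cmdlet while the node is still joined to the old domain (the node name here is just an example):

# Evict the node being migrated from the old cluster
Remove-ClusterNode -Name NODE1 -Force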

If, and only if, this is the last and final node you will be migrating, you can now safely destroy the old failover cluster. Do this only on the very last node!
image
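Or, from PowerShell on that final node. Normally -CleanupAD would tidy up the cluster objects in AD, but with the old domain dead there is nothing left to clean, so a plain forced destroy is enough:

# Destroy the old failover cluster from the last remaining node
Remove-Cluster -Force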

Once a node is ready to make the switch, change the host’s DNS entries to point to the DCs of the domain you will be migrating to, then join the new domain.
image
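In PowerShell terms that step looks roughly like this. The interface alias, DNS server IPs and credentials are examples for my dvs.com domain, so adjust to suit:

# Point the host at the new domain's DNS servers
Set-DnsClientServerAddress -InterfaceAlias "vEthernet (ExternalSwitch)" -ServerAddresses 10.20.0.10,10.20.0.11

# Join the new domain and reboot
Add-Computer -DomainName dvs.com -Credential dvs\administrator -Restart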

In with the new

Once your first node is back up, create a new failover cluster. I’m reusing the same IP that was assigned to the old cluster to keep things simple. Since this is Nutanix, which manages its own storage, there are no disks to be managed by the failover cluster, so don’t import anything. Nothing bad will happen if you do, but Hyper-V doesn’t manage these disks so there’s no point. Also, if you run the cluster suitability checks they will fail on the shared storage piece.
image
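If you prefer PowerShell, the whole thing is one line; -NoStorage keeps the wizard from trying to claim any disks. The cluster name, node name and IP below are examples:

# Create the new cluster on the first migrated node, reusing the old cluster IP, with no storage
New-Cluster -Name HVCLUSTER -Node NODE1 -StaticAddress 10.20.0.50 -NoStorage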

Repeat this process for each node to be migrated, adding each to the new cluster. Next import your pre-existing VMs into the new cluster and configure them for high availability.
image
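A rough PowerShell equivalent for each remaining node and VM; the paths and names are examples, and the VM configuration files live on the Nutanix SMB containers:

# Add the freshly migrated node to the new cluster
Add-ClusterNode -Name NODE2

# Register an existing VM in place from its configuration file on the SMB share...
Import-VM -Path "\\ntnx-cluster\ds_compute\VM1\Virtual Machines\<VM GUID>.xml" -Register

# ...then make it highly available in the new cluster
Add-ClusterVirtualMachineRole -VMName VM1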

In Prism, just for good measure, update the FQDN in cluster details:
image

Let the Distributed Storage Fabric settle. Once the CVMs are happy, upgrade the NOS if desired.
image

Pretty easy if you do things in the right order. Here is a view of Prism with my 4 hosts converted and only CVMs running. Definitely not a busy cluster at the moment but it is a happy cluster ready for tougher tasks!
image

If things go badly

Maybe you pulled the trigger on the domain switch before you evicted nodes and destroyed the cluster? If so, any commands directed at the old cluster in the old domain will likely fail with access denied. You will be prevented from evicting the node or destroying the old cluster.
image

If this happens you’ll need to manually remove the cluster configuration on that node and rebuild it. Since none of the cmdlets are working, it’s time to turn to the registry. Find the “ClusDisk” and “ClusSvc” keys within the following path and delete them both. You’ll see entries reflecting the old cluster and old configuration:

HKLM\System\CurrentControlSet\Services\

image
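If you’d rather not click through regedit, the same cleanup from PowerShell looks like this:

# Remove the leftover cluster service registry keys from the orphaned node
Remove-Item -Path HKLM:\SYSTEM\CurrentControlSet\Services\ClusDisk -Recurse -Force
Remove-Item -Path HKLM:\SYSTEM\CurrentControlSet\Services\ClusSvc -Recurse -Force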

Now you can remove the Failover Clustering feature from the Remove Roles and Features wizard:
image

Reboot the host and install the Failover Clustering feature again. This will set the host back to square one from a clustering perspective, so you can now create a new cluster or join an existing one.
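The PowerShell version of that round trip, for reference:

# Remove Failover Clustering and reboot
Uninstall-WindowsFeature Failover-Clustering -Restart

# After the reboot, put it back along with the management tools
Install-WindowsFeature Failover-Clustering -IncludeManagementTools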

For more information…

Forcibly removing clustering features
Dell XC Web-scale Appliance Architectures for VDI
Dell XC Web-Scale Converged Appliances
nutanix.com

Dell XC730 – Graphics HCI Style

Our latest addition to the wildly popular Dell XC Web-Scale Converged Appliance series powered by Nutanix, is the XC730 for graphics. The XC730 is a 2U node outfitted with 1 or 2 NVIDIA GRID K1 or K2 cards suited for highly scalable and high-performance graphics within VDI deployments.  The platform supports either Hyper-V with RemoteFX or VMware vSphere 6 with NVIDIA GRID vGPU.  Citrix XenDesktop, VMware Horizon or Dell Wyse vWorkspace can be deployed on the platform and configured to run with the vGPU offering. Being a graphics platform, the XC730 currently supports up to 2 x 120w 10, 12, or 14-core Haswell CPUs with 16 or 32GB 2133MHz DIMMs. The platform requires a minimum of 2 SSDs and 4 HDDs with a range of options within each tier. All XC nodes come equipped with a 64GB SATADOM to boot the hypervisor and 128GB total spread across the first 2 SSDs for the Nutanix Home. Don’t know what mix of components to choose or how to size your environment? No problem, we create optimized platforms and validate them to make your life easier.

 

image

Optimized High Performance Graphics

Our validated platform for the XC730 is a high-end “B7” variant that provides the absolute pinnacle of graphics performance in an HCI offering: 14-core CPUs, 256GB RAM, dual K2s, 2 x 400GB SSDs and 4 x 1TB HDDs. This is all about graphics, so we need more wattage, and the XC730 comes equipped with dual 1100w PSUs. iDRAC8 comes standard in the XC Series, as does your choice of 10Gb SFP+ or BaseT Ethernet adapters. I chose this mix of components to provide a very high-end experience supporting the maximum number of users allowed by the K2 vGPU profiles. Watch this space for additional optimized platforms designed to remove the guesswork of sizing graphics in VDI. Even when GPUs are present, the basic laws of VDI still apply: thou shalt exhaust CPU first. Dual 14-core parts will ensure that doesn’t happen. Make sure to check out our Appliance Architectures at the bottom of this post for more detailed info.

 

image

 

NVIDIA GRID vGPU

The cream of the crop in server virtualized graphics comes courtesy of NVIDIA with the K1 and K2 Kepler-based boards. The K2 has 4x the CUDA cores of the K1 board with a slightly lower core clock and less total memory. The K2 is the board you want for fewer higher end users. The K1 is appropriate for a higher overall number of lower end graphical users. vGPU is where the magic happens as graphics are hardware-accelerated to the virtual environment running within vSphere.  Each desktop VM within the vGPU environment runs a native set of the NVIDIA drivers and addresses a real slice of the server grade GPU. The graphic below from NVIDIA portrays this quite well.

 

image

(image courtesy of NVIDIA)

 

The magic of vGPU happens within the profiles that ultimately get assigned. Both K1 and K2, in conjunction with vGPU, have a preset number of profiles that control the amount of graphics memory assigned to each desktop along with the total number of desktops that can be supported by each card. The profile pattern is K2xxx for K2 and K1xxx for K1, with the smaller index identifying a greater user density. The K280Q and K180Q are basically pass-through profiles where an entire physical GPU is passed through to a desktop VM. You’ll notice how the densities change per card and per GPU depending on how these profiles are assigned. Lower performance = greater densities, with up to 64 total users possible on an XC730 with dual K1 cards. In the case of the K2, the maximum number of users possible on a single XC730 with dual cards is 32.
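As a quick sanity check on those maximums, assuming the lowest-end profiles at 8 VMs per physical GPU (K100 on the K1, K200 on the K2): a K1 carries 4 physical GPUs, so 2 cards x 4 GPUs x 8 VMs = 64 users per XC730; a K2 carries 2 physical GPUs, so 2 cards x 2 GPUs x 8 VMs = 32 users.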

 

image

So how does it fit?

When designing a solution using the XC730, all of the normal Nutanix and XC rules apply, including a minimum of 3 nodes in a cluster and a CVM running on every node. Since the XC730 is a graphics appliance, it is recommended to have additional XC630s in the architecture to run the VDI infrastructure management VMs as well as to serve the users not requiring high-performance graphics. Nothing will stop you from running mgmt infra on your XC730 nodes, but financial practicality will probably dictate that you use XC630s for this purpose. Scale of either is unlimited.

 

image

 

It is entirely acceptable (and expected) to mix XC730 and XC630 nodes within the same NDFS cluster. The boundaries you will need to draw will be around your vSphere HA clusters, separating each function into a discrete cluster of up to 64 nodes, as depicted below. This allows each discrete layer to scale independently while providing dedicated N+1 HA for each function. Due to the nature of graphics virtualization, vMotion is not currently supported, nor are automated HA VM restarts, when GPUs are attached. HA in this particular scenario would be cold should a node in the cluster fail. With pooled desktops and mobilized user data, this should equate to only a minor inconvenience in the worst case.

 

image

 

As is the case with any Nutanix deployment, very large NDFS clusters can be created with multiple hypervisor pods created within a singular namespace. Below is an example depiction of a 30K user deployment within a single site. Each compute pod is composed of a theoretical maximum number of users within a VDI farm, serviced by a dedicated pair of mgmt nodes for each farm. Mgmt is separated into a discrete cluster for all farms and compute is separated per the HA maximum. This is a complicated architecture but demonstrates the capabilities of NDFS when designing for a very large scale use case.

image

 

This is just the tip of the iceberg! For more information on the XC series architecture, platforms, Nutanix, test results of our XC730 platform and more, please see our latest Appliance Architectures which include the XC730:

XC Series for Citrix

XC Series for VMware

XC Series for vWorkspace

Dell XC Series 2.0: Product Architectures

Following our launch of the new 13G-based XC series platform, I present our product architectures for the VDI-specific use cases. Of the platforms available, this use case is focused on the extremely powerful 1U XC630 with Haswell CPUs and 2133MHz RAM. We offer these appliances on both Server 2012 R2 Hyper-V and vSphere 5.5 U2 with Citrix XenDesktop, VMware Horizon View, or Dell vWorkspace.  All platform architectures have been optimized, configured for best performance and documented.

Platforms

We have three platforms to choose from, optimized around cost and performance, all ultimately flexible should specific parameters need to change. The A5 model is the most cost effective, leveraging 8-core CPUs, 256GB RAM, 2 x 200GB SSDs for performance and 4 x 1TB HDDs for capacity. For POCs, small deployments or light application virtualization, this platform is well suited. The B5 model steps up the performance by adding four cores per socket, increasing the RAM density to 384GB and doubling the performance tier to 2 x 400GB SSDs. This platform will provide the best bang for the buck on medium density deployments of light or medium level workloads. The B7 is the top of the line, offering 16-core CPUs and a higher capacity tier of 6 x 1TB HDDs. For deployments requiring maximum density of knowledge or power user workloads, this is the platform for you.
image
At 1U with dual CPUs, 24 DIMM slots and 10 drive bays…loads of potential and flexibility!
image

Solution Architecture

Utilizing 3 platform hardware configurations, we are offering 3 VDI solutions on 2 hypervisors – lots of flexibility and many options. A 3-node cluster minimum is required, with every node containing a Nutanix Controller VM (CVM) to handle all IO. The SATADOM is present for boot responsibilities to host the hypervisor as well as the initial setup of the Nutanix Home area. The SSDs and NL SAS disks are passed through directly to each CVM, which straddles the hypervisor and hardware. Every CVM contributes its directly-attached disks to the storage pool, which is stretched across all nodes in the Nutanix Distributed File System (NDFS) cluster. NDFS is not dependent on the hypervisor for HA or clustering: Hyper-V cluster storage pools are presented to the hosts as SMB 3, while vSphere clusters are presented with NFS.
Containers can be created within the storage pool to logically separate VMs based on function. These containers also provide isolation of configurable storage characteristics in the form of dedupe and compression. In other words, you can enable compression on your management VMs within their dedicated container, but not on your VDI desktops, also within their own container. The namespace is presented to the cluster in the form of \\NDFS_Cluster_name\container_name.
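Because the container is just an SMB 3 namespace, pointing Hyper-V at it is nothing exotic. A quick illustrative example (the cluster and container names here are made up):

# Create a VM directly on a Nutanix container via its SMB3 path
New-VM -Name TestVM01 -MemoryStartupBytes 4GB -Generation 2 -Path \\NTNX-Cluster\ds_compute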
The first solution I’ll cover is Dell’s Wyse vWorkspace, which supports either 2012 R2 Hyper-V or vSphere 5.5. For small deployments or POCs we offer this solution in a “floating mgmt” architecture which combines the vWorkspace infrastructure management roles and VDI desktops or shared session VMs. vWorkspace on Hyper-V enables a special technology for non-persistent/ shared image desktops called Hyper-V Catalyst, which includes 2 components: HyperCache and HyperDeploy. Hyper-V Catalyst provides some incredible performance boosts and requires that the vWorkspace infrastructure components communicate directly with the Hyper-V hypervisor. This also means that vWorkspace does not require SCVMM to perform provisioning tasks for non-persistent desktops!

  • HyperCache – Provides virtual desktop performance enhancement by caching parent VHDs in host RAM. Read requests are satisfied from cache including requests from all subsequent child VMs.
  • HyperDeploy – Provides instant cloning of parent VHDs massively diminishing virtual desktop pool deployment times.

You’ll notice the HyperCache components included in the Hyper-V architectures below. The floating management model runs on 3 to 6 hosts, depicted below with management, desktop and RDSH VMs logically separated by function only from a storage container perspective. The recommendation of 3-7 RDSH VMs is based on our work optimizing around NUMA boundaries. I’ll dive deeper into that in an upcoming post. The B7 platform is used in the architectures below.
image
Above ~1000 users we recommend the traditional distributed management architecture to enable more predictable scaling and performance of both the compute and management hosts. The basic architecture remains the same and scales to the full extent supported by the hypervisor, in this case Hyper-V, which supports up to 64 hosts. NDFS does not have a scaling limitation, so several hypervisor clusters can be built within a single contiguous NDFS namespace. Our recommendation is to then build independent failover clusters for compute and management discretely so they can scale up or out independently.
The architecture below depicts a B7 build on Hyper-V applicable to Citrix XenDesktop or Wyse vWorkspace.
image

This architecture is very similar for Wyse vWorkspace or VMware Horizon View on vSphere 5.5 U2, but with fewer total compute hosts per HA cluster (32 total). For vWorkspace, Hyper-V Catalyst is not present in this scenario, so vCenter is required to perform desktop provisioning tasks.
image
For the storage containers, the best practice of less is more still stands. If you don’t need a particular feature don’t enable it, as it will consume additional resources. Deduplication is always recommended on the performance tier since the primary OpLog lives on SSD and will always benefit. Dedupe or compression on the capacity tier is not recommended unless, of course, you absolutely need it. And if you do, prepare to increase each CVM RAM allocation to 32GB.

Container (Purpose) – Replication Factor / Perf Tier Deduplication / Capacity Tier Deduplication / Compression
Ds_compute (Desktop VMs) – 2 / Enabled / Disabled / Disabled
Ds_mgmt (Mgmt Infra VMs) – 2 / Enabled / Disabled / Disabled
Ds_rdsh (RDSH Server VMs) – 2 / Enabled / Disabled / Disabled

Network Architecture

As a hyperconverged appliance, the network architecture leverages the converged model. A pair of 10Gb NICs minimum in each node handles all traffic for the hypervisor, guests and storage operations between CVMs. Remember that the storage of every VM is kept local to the host on which the VM resides, so the only traffic that will traverse the network is LAN and replication. There is no need to isolate storage protocol traffic when using Nutanix.
Hyper-V and vSphere are functionally similar. For Hyper-V there are 2 vSwitches per host: 1 external vSwitch that aggregates all of the services of the host management OS as well as the vNICs for the connected VMs, with the 10Gb NICs connected to an LBFO team configured in Dynamic mode; and a private internal vSwitch to which the CVM alone connects so it can communicate with the hypervisor.
image
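For reference, a hand-built version of that host networking would look something like the following. The Nutanix installer actually lays this down for you, and the team and switch names here are just illustrative:

# Team the two 10Gb NICs, switch independent with the Dynamic load balancing algorithm
New-NetLbfoTeam -Name "Team10Gb" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic

# External vSwitch on the team, shared with the management OS
New-VMSwitch -Name "ExternalSwitch" -NetAdapterName "Team10Gb" -AllowManagementOS $true

# Private internal vSwitch used only by the CVM to talk to the hypervisor
New-VMSwitch -Name "InternalSwitch" -SwitchType Internal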
In vSphere it’s the same story but with the concept of Port Groups and vMotion.
image
We have tested the various configurations per our standard processes and documented the performance results which can be found in the link below. These docs will be updated as we validate additional configurations.

Product Architectures for 13G XC launch:

Resources:

About Wyse vWorkspace HyperCache
About Wyse vWorkspace HyperDeploy

Dell XC Series Web-scale Converged Appliance 2.0

I am pleased to present the Dell XC 2.0 series of web-scale appliances based on the award-winning 13G PowerEdge server line from Dell. There’s lots more in store for the XC, but focusing on just this part of the launch, we are introducing the XC630 and XC730xd appliances.

Flexibility and performance are key tenets of this launch providing not only a choice in 1U or 2U form factors, but an assortment of CPU and disk options. From a solution perspective, specifically around VDI, we are releasing three optimized and validated platforms with accompanied Product Architectures to help you plan and size your Dell XC deployments.

The basic architecture of the Dell XC powered by Nutanix remains the same. Every node is outfitted with a Nutanix Controller VM (CVM) that connects to a mix of SSD and HDD to contribute to a distributed storage pool that has no inherent scaling limitation. Three nodes minimum required and either VMware vSphere or Microsoft Windows Server 2012 R2 Hyper-V are supported hypervisors. Let’s take a look at the new models.

image

XC630

The XC630 is a 1U dual-socket platform that supports 6-core to 16-core CPUs and up to 24 x 2133MHz 16GB RDIMMs or 32GB LRDIMMs. The XC630 can be configured using all flash or using two tiers of storage which can consist of 2 to 4 x SSDs (200GB, 400GB or 800GB) and 4 to 8 x 1TB HDDs (2TB HDDs coming soon). Flexible! All flash nodes must have a minimum of 6 x SSDs while nodes with two storage tiers must have a minimum of two SSDs and four HDDs. All nodes have a minimum of 2 x 1Gb plus 2 x 10Gb SFP+ or BaseT Ethernet that can be augmented via an additional card.

New to the XC 2.0 series is a 64GB SATADOM that is used to boot each node. Each node is also outfitted with a 16GB SD card used for the purposes of initial deployment and recovery. The SSDs and HDDs that comprise the Nutanix Distributed File System (NDFS) storage pool are presented to each CVM via an on-board PERC H730 (1GB cache) set in pass-through mode. Simple, powerful, flexible.

image

 

XC730xd

For deployments requiring a greater amount of cold tier data capacity, the XC730xd can provide up to 32TB raw per node. The XC730xd is a 2U dual-socket platform that supports 6-core to 16-core CPUs and up to 24 x 2133MHz 16GB RDIMMs or 32GB LRDIMMs. The XC730xd is provided with two chassis options: 24 x 2.5” disks or 12 x 3.5” disks. The 24-drive model requires the use of two tiers of storage, consisting of 2 to 4 x SSDs (200GB, 400GB or 800GB) and 4 to 22 x 1TB HDDs. The 12-drive model also requires two tiers of storage, consisting of 2 to 4 x SSDs (200GB, 400GB or 800GB) and up to 10 x 4TB HDDs. All nodes have a minimum of 2 x 1Gb plus 2 x 10Gb SFP+ or BaseT Ethernet that can be augmented via an additional card.

The XC730xd platforms are also outfitted with a 64GB SATADOM that is used to boot the nodes. The 16GB SD card used for the purposes of initial deployment and recovery is present on these models as well. The SSDs and HDDs that comprise the Nutanix Distributed File System (NDFS) storage pool are presented to each CVM via an on-board PERC H730 (1GB cache) set in pass-through mode. Simple, powerful, flexible.

12 drive option, hopefully the overlaps in the image below make sense:

image

 

24 drive option:

image

 

Nutanix Software Editions

All editions of the Nutanix software platform are available with variable lengths for support and maintenance.

image

This is just the beginning. Keep an eye out for additional platforms and offerings from the Dell + Nutanix partnership! Next up is the VDI product architectures based on the XC630. Stay tuned!!

http://www.dell.com/us/business/p/dell-xc-series/pd

Dell XC Series – Product Architectures

Hyperconverged Web-scale Software Defined Storage (SDS) solutions are white hot right now and Nutanix is leading the pack with their ability to support all major hypervisors (vSphere, Hyper-V and KVM) while providing nearly unlimited scale. Dell partnering with Nutanix was an obvious, mutually beneficial choice for the reasons above, plus Dell supplies a much more robust server platform. Dell also provides global reach for services and support and solves other challenges, such as hypervisors installed in the factory.

Nutanix operates below the hypervisor layer and as a result requires a lot of tightly coupled interaction directly with the hardware. Many competing platforms in this space sit above the hypervisor, so they require vSphere, for example, to provide access to storage and HA, but they are also bound by the hypervisor’s limitations (scale). Nutanix uses its own algorithm for clustering and doesn’t rely on a common transactional database, which can cause additional challenges when building solutions that span multiple sites. Because of this, the Nutanix Distributed Filesystem (NDFS) has no known limits of scale. There are current Nutanix installations in the thousands of nodes across a contiguous namespace, and now you can build them on Dell hardware.

Along with the Dell XC720xd appliances, we have released a number of complementary workload Product Architectures to help customers and partners build solutions using these new platforms. I’ll discuss the primary architectural elements below.

Wyse Datacenter Appliance Architecture for Citrix

Wyse Datacenter Appliance Architecture for VMware

Wyse Datacenter Appliance Architecture for vWorkspace

 

Nutanix Architecture

Three nodes minimum are required for NDFS to achieve quorum, so that is the minimum solution buy-in; storage and compute capacity can then be increased incrementally by adding one or more nodes to an existing cluster. The Nutanix architecture uses a Controller VM (CVM) on each host which participates in the NDFS cluster and manages the hard disks local to its own host. Each host requires two tiers of storage: high performance/ SSD and capacity/ HDD. The CVM manages the reads/writes on each host and automatically tiers the IO across these disks. A key value proposition of the Nutanix model is data locality, which means that the data for a given VM running on a given host is stored locally on that host, as opposed to having reads and writes crossing the network. This model scales indefinitely in a linear block manner where you simply buy and add capacity as you need it. Nutanix creates a storage pool that is distributed across all hosts in the cluster and presents this pool back to the hypervisor as NFS or SMB.

You can see from the figure below that the CVM engages directly with the SCSI controller, through which it accesses the disks local to the host on which it resides. Since Nutanix sits below the hypervisor and handles its own clustering and data HA, it is not dependent upon the hypervisor to provide any features, nor is it limited by any related limitations.

image

From a storage management and feature perspective, Nutanix provides two tiers of optional deduplication performed locally on each host (SSD and HDD individually), compression, tunable replication (number of copies of each write spread across disparate nodes in the cluster) and data locality (keeps data local to the node the VM lives on). Within a storage pool, containers are created to logically group VMs stored within the namespace and enable specific storage features such as replication factor and dedupe. Best practice says that a single storage pool spread across all disks is sufficient but multiple containers can be used. The image below shows an example large scale XC-based cluster with a single storage pool and multiple containers.

image

While the Nutanix architecture can theoretically scale indefinitely, practicality might dictate that you design your clusters around the boundaries of the hypervisors: 32 nodes for vSphere, 64 nodes for Hyper-V. The decision to do this will be more financially impactful if you separate your resources along the lines of compute and management in distinct SDS clusters. You could also, optionally, install many maximum-node hypervisor clusters within a single very large, contiguous Nutanix namespace, which is fully supported. I’ll discuss the former option below as part of our recommended pod architecture.

Dell XC720xd platforms

For our phase 1 launch we have five platforms to offer that vary in CPU, RAM and size/ quantity of disks. Each appliance is 2U, based on the 3.5” 12th-gen PowerEdge R720xd, and supports from 5 to 12 total disks, a mix of SSD and HDD. The A5 platform is the smallest with a pair of 6-core CPUs, 200GB SSDs and a recommended 256GB RAM. The B5 and B7 models are almost identical except for the 8-core CPU on the B5 and the 10-core CPU on the B7. The C5 and C7 boast a slightly higher clocked 10-core CPU with doubled SSD densities and 4-5x more in the capacity tier. The suggested workloads are specific, with the first three targeted at VDI customers. If greater capacity is required, the C5 and C7 models work very well for this purpose too.

image

For workload to platform sizing guidance, we make the following recommendations:

Platform – Workload – Special Considerations
A5 – Basic/ light task users, app virt – Be mindful of limited CPU cores and RAM densities
B5 – Medium knowledge workers – Additional 4 cores and greater RAM to host more VMs or sessions
B7 – Heavy power users – 20 cores per node + a recommended 512GB RAM to minimize oversubscription
C5 – Heavy power users – Higher density SSDs + 20TB in the capacity tier for large VMs or amount of user data
C7 – Heavy power users – Increased number of SSDs with larger capacity for greater amount of T1 performance

Here is a view of the 12G-based platform representing the A5-B7 models; the C5 and C7 would add additional disks in the second disk bay. The two disks in the rear flexbay are 160GB SSDs configured in RAID1 via PERC, used to host the hypervisor and CVM; these disks do not participate in the storage pool. The six disks in front are controlled by the CVM directly via the LSI controller and contribute to the distributed storage pool across all nodes.

image

Dell XC Pod Architecture

This being a 10Gb hyperconverged architecture, the leaf/ spine network model is recommended. We do recommend a 1Gb switch stack for iDRAC/ IPMI traffic and build the leaf layer from 10Gb Force10 parts. The S4810, recommended for SFP+ based platforms, is shown in the graphic below; the S4820T can be used for 10GBase-T.

image

In our XC series product architecture, the compute, management and storage layers, typically all separated, are combined here into a single appliance. For solutions based on vWorkspace under 10 nodes, we recommend a “floating management” architecture which allows the server infrastructure VMs to move between hosts also being used for desktop VMs or RDSH sessions. You’ll notice in the graphics below that compute and management are combined into a single hypervisor cluster which hosts both of these functions.

Hyper-V is shown below which means the CVMs present the SMBv3 protocol to the storage pool. We recommend three basic containers to separate infrastructure mgmt, desktop VMs and RDSH VMs. We recommend the following feature attributes based on these three containers (It is not supported to enable compression and deduplication on the same container):

Container (Purpose) – Replication Factor / Perf Tier Deduplication / Capacity Tier Deduplication / Compression
Ds_compute (Desktop VMs) – 2 / Enabled / Enabled / Disabled
Ds_mgmt (Mgmt Infra VMs) – 2 / Enabled / Disabled / Disabled
Ds_rdsh (RDSH Server VMs) – 2 / Enabled / Enabled / Disabled

You’ll notice that I’ve included the resource requirements for the Nutanix CVMs (8 x vCPUs, 32GB vRAM). The vRAM allocation can vary depending on the features you enable within your SDS cluster. 32GB is required, for example, if you intend to enable both SSD and HDD deduplication. If you only require SSD deduplication and leave the HDD tier turned off, you can reduce your CVM vRAM allocation to 16GB. We highly recommend that you disable any features that you do not need or do not intend to use!
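On Hyper-V, bumping the CVM allocation is a quick per-host change; the CVM name pattern below is illustrative, and the CVM has to be powered off, one node at a time, to change it:

# Increase the CVM memory allocation to 32GB to support dedupe on both tiers
Get-VM -Name "NTNX-*-CVM" | Stop-VM
Get-VM -Name "NTNX-*-CVM" | Set-VMMemory -StartupBytes 32GB
Get-VM -Name "NTNX-*-CVM" | Start-VM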

image

For vWorkspace solutions over 1000 users or solutions based on VMware Horizon or Citrix XenDesktop, we recommend separating the infrastructure management in all cases. This allows management infrastructure to run in its own dedicated hypervisor cluster while providing very clear and predictable compute capacity for the compute cluster. The graphic below depicts a 1000-6000 user architecture based on vWorkspace on Hyper-V. Notice that the SMB namespace is stretched across both of the discrete compute and management infrastructure clusters, each scaling independently. You could optionally build dedicated SDS clusters for compute and management if you desire, but remember the three node minimum, which would raise your minimum build to 6 nodes in this scenario.

image

XenDesktop on vSphere, up to 32 nodes max per cluster, supporting around 2500 users in this architecture:

image

Horizon View on vSphere, up to 32 nodes max per cluster, supporting around 1700 users in this architecture:

image

Network Architecture

Following the leaf/ spine model, each node should connect 2 x 10Gb ports to a leaf switch which are then fully mesh connected to an upstream set of spine switches.

image

On each host there are two virtual switches: one for external access to the LAN and internode communication, and one private internal vSwitch used by the CVM alone. On Hyper-V the two NICs are configured in an LBFO team on each host with all management OS vNICs connected to it.

image

vSphere follows the same basic model except for port groups configured for the VM type and VMKernel ports configured for host functions:

image

Performance results

The tables below summarize the user densities observed for each platform during our testing. Please refer to the product architectures linked at the beginning of this post for the detailed performance results for each solution.

image

image

Resources:

http://en.community.dell.com/dell-blogs/direct2dell/b/direct2dell/archive/2014/11/05/dell-world-two-questions-cloud-client-computing

http://blogs.citrix.com/2014/11/07/dell-launches-new-appliance-solution-for-desktop-virtualization/

http://blogs.citrix.com/2014/11/10/xendesktop-technologies-introduced-as-a-new-dell-wyse-datacenter-appliance-architecture/

http://blogs.vmware.com/euc/2014/11/vmware-horizon-6-dell-xc-delivering-new-economics-simplicity-desktop-application-virtualization.html

http://stevenpoitras.com/the-nutanix-bible/

http://www.dell.com/us/business/p/dell-xc-series/pd

Dell XC Web Scale Converged Appliance

That’s a mouthful! Here’s a quick taste of the new Dell + Nutanix appliance we’ve been working on.  Our full solution offering will be released soon, stay tuned. In the meantime, the Dell marketing folks put together a very sexy video:

Nutanix Node Removed from MetaData Store

image

My team at Dell is working feverishly to bring the new Dell and Nutanix partnership to fruition through new and exciting hardware platforms and solutions. Part of the process is learning and playing…a LOT.  A few weeks ago there was a very bad storm in Austin that knocked my team’s datacenter offline. Our four-node Nutanix cluster, along with everything else, went down hard. Fortunately three nodes came back up without issue, the 4th only partially. At this point we’re just doing POC work and functionality testing so there is no alerting configured. We had a failed Controller VM (CVM) on one of our nodes and didn’t know it. This is probably an extraordinary situation which is why I want to document it.

This cluster is running Server 2012 R2 Hyper-V with an SMB3 storage pool. The architecture is simple: all nodes and their disks participate in two layers of clustering, one private cluster controlled by and run for Nutanix, the other a failover cluster for the Hyper-V hypervisors. SMB resources cannot be owned by a cluster in Hyper-V anyway, as there are no disks to add to the cluster; this is simply a namespace that you utilize via UNC pathing. The two clusters operate completely independently of one another. The physical disks are owned by the Nutanix CVMs and are completely obscured from Hyper-V. So even though our 4th node was fine from a Hyper-V perspective, able to run and host VMs, the CVM living on that node was kaput, as were its local disks, from a Nutanix cluster perspective.

In Prism, it was plain to see that this node was having trouble, and the easy remediation steps didn’t work. Rebooting the CVM, rebooting the host and enabling the metadata store all had no effect; neither did trying to start the cluster services manually. I removed the host from the cluster via Prism, hoping I would easily be able to add it back.

image

Once the disks had been completely removed in Prism, the remaining nodes could see that this CVM and its physical resources were gone. Unfortunately, I was unable to expand the cluster and easily add this node back into the mix. I could not open the Prism console or the cluster init page on this CVM. To clean up the metadata and “factory reset” this CVM I ran the following command in the CVM’s console:

cluster -f -s 127.0.0.1 destroy

Once complete, I tried to expand the cluster again in Prism and this time the CVM was discovered. Woot!

image

Cluster expanded successfully and all is well in the world again, right? Not quite. The disks of the 4th node never joined the pool even though Prism now saw four nodes in my cluster. Hmm. Running a “cluster status” on the CVM revealed this:

image

We’re not out of the woods yet. Enter the Nutanix CLI to check the disk tombstone entries. Both of my SSDs and three of my SATA disks had been tombstoned as part of the previous configuration and were therefore being prevented from joining the pool now.

image

One by one these entries needed to be removed so that the CVM and these disk resources could again be free to join the pool.

image

Now, perform a “cluster start” to start the services on this CVM and voila, back in business.

image

Check the current activities in Prism which reflect the node getting added (for real this time). 

image

All disk resources are now present and accounted for.

image

Pretty simple fix for what should be a fairly irregular situation. If you ever run into this issue this is what solved it for us. Big shout to Mike Tao and Rob Tribe at Nutanix for the assist!