Product : Nutanix, ECP [HCI]/5.19, Ultimate
Feature : Node Failure Protection, Reads/Writes, Data Availability
Content Owner:  Herman Rutten
Summary
1-2 Replicas (2N-3N) (primary)
Erasure Coding (N+1/N+2) (secondary)
Details
The Nutanix implementation of replicas is called Replication Factor, or RF for short (RF2 = 2N; RF3 = 3N). Maintaining replicas is the default method for protecting data that is written to the Nutanix cluster. An implementation of erasure coding, called EC-X by Nutanix, can optionally be enabled for protecting data once it is write-cold (not overwritten for a certain amount of time). RF and EC-X are enabled on a per-container basis, but are applied at the individual VM/file level.

Replicas: Before any write is acknowledged to the host, it is synchronously replicated to another node. All nodes in the cluster participate in replication. With 2N, this means that one instance of the written data is stored on the local node and another instance is stored on a different node in the cluster. Replica placement is fully distributed; in other words, there is no dedicated partner node. When a disk fails, it is marked offline and data is read from another instance instead. At the same time, re-replication of the affected replicas is initiated in order to restore the desired replication factor.
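
The placement and re-replication behavior described above can be illustrated with a minimal sketch. This is not Nutanix code: the function names `place_replicas` and `re_replicate` are hypothetical, replica targets are picked at random purely for illustration, and failures are modeled at node level rather than disk level to keep the example short.

```python
import random


def place_replicas(local_node, nodes, rf, rng=random):
    """Return rf distinct nodes: the local node first, the rest chosen
    from the other cluster nodes (no dedicated partner node)."""
    if rf > len(nodes):
        raise ValueError("replication factor exceeds node count")
    others = [n for n in nodes if n != local_node]
    return [local_node] + rng.sample(others, rf - 1)


def re_replicate(placement, failed_node, nodes, rng=random):
    """Restore the replica count after failed_node goes offline by
    copying to healthy nodes that do not already hold an instance."""
    survivors = [n for n in placement if n != failed_node]
    candidates = [n for n in nodes
                  if n != failed_node and n not in survivors]
    missing = len(placement) - len(survivors)
    return survivors + rng.sample(candidates, missing)


if __name__ == "__main__":
    cluster = ["node-A", "node-B", "node-C", "node-D"]
    placement = place_replicas("node-A", cluster, rf=2)
    print("initial placement:", placement)
    # Simulate losing the remote copy; reads fall back to the survivor
    # while a new copy is created elsewhere.
    print("after failure:", re_replicate(placement, placement[1], cluster))
```

Because every write may pick a different remote node, the rebuild work after a failure is spread over the remaining nodes instead of falling on a single partner.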

Erasure Coding: Nutanix EC-X was introduced in AOS 5.0 and is used for protecting write-cold data only. Nutanix EC-X provides more space-efficient protection than Nutanix RF, as it does not keep full copies of data extents. Instead, EC-X uses parity for protecting data extents and distributes both data and parity across all nodes in the cluster. The number of nodes within a Nutanix cluster determines the EC-X stripe size. When a disk fails, it is marked offline and the affected data needs to be rebuilt in-flight from parity as it is being read, incurring a performance penalty. At the same time, a rebuild of the affected data is initiated in order to restore the desired EC-X protection. Nutanix EC-X requires at least a 4-node setup. When Nutanix EC-X is enabled for the write-cold data, it automatically uses the same resiliency level as is already in use for the write-hot data: 1 parity block per stripe for data with RF2 protection enabled and 2 parity blocks per stripe for data with RF3 protection enabled. Nutanix EC-X is enabled at the Storage Container (=vSphere Datastore) level.
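
To make the parity idea concrete, the following sketch shows a single-parity rebuild using plain XOR, plus the raw capacity consumed per unit of usable data. It is an illustration only: the actual EC-X coding scheme is not described here, the function names are hypothetical, and the 4+1 stripe in the usage example is an assumed size, not a documented one. A second parity block (the RF3-equivalent case) requires a different code than plain XOR and is not shown.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR over equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the one missing data block of a single-parity stripe by
    XOR-ing the surviving data blocks with the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])


def capacity_overhead(data_blocks, parity_blocks):
    """Raw capacity consumed per unit of usable data."""
    return (data_blocks + parity_blocks) / data_blocks


if __name__ == "__main__":
    stripe = [b"abcd", b"efgh", b"ijkl", b"mnop"]  # assumed 4 data blocks
    parity = xor_parity(stripe)
    # Lose the second block and rebuild it on read from the rest.
    recovered = rebuild_missing(stripe[:1] + stripe[2:], parity)
    print(recovered == stripe[1])
    print("4+1 stripe overhead:", capacity_overhead(4, 1))
    print("full second copy (2N) overhead:", capacity_overhead(1, 1))
```

The rebuild has to read every surviving block of the stripe, which is why degraded reads on erasure-coded data cost more than reading a second full copy.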


AOS 5.18 introduces EC-X support for Object Storage containers.