Adding Two-Node Fault Tolerance to Ensure Data Integrity for Even the Largest Clusters
Google processes around 6 billion searches per day and just two months ago Gmail reached 1.5 billion users. Google and all the other hyperscalers such as Facebook or AWS have built massive data centers to provide the scale to handle those kinds of workloads. The rise of the hyperscalers introduced a bold new concept: The expectation of failure. What do I mean by that? Before these new web-scale architectures, conventional wisdom was to design systems with five nines availability. However, the new hyperscaler architecture consisted of infrastructure built from low-cost commodity hardware, which is bound and expected to fail. The difference was the software, which was designed to ensure business continuity, and provide uninterrupted availability and fault tolerance when any service, node, or hardware component fails.
Cohesity is built on the same web-scale principles as the hyperscalers. At the core of our solution is a fully distributed architecture where every node in the cluster runs the same software services. We call this distributed file system, SpanFS. There is no single-point-of-failure and a defect in one service or any hardware failure doesn’t affect any of the other services. The software-defined distributed architecture lays a strong foundation to provide the benefits of scalability, resiliency, and data integrity to our enterprise customers.
To further improve resilience and to ensure the integrity of data in even your largest clusters, our Pegasus 6.1.1 release uniquely provides two-node fault tolerance. Similar to the hyperscalers, we see a trend amongst our enterprise customers to increase the number of nodes in a single cluster, rather than installing multiple smaller-sized clusters. With two-node fault tolerance the system can withstand the simultaneous failure of up to two random nodes (in clusters with greater than 4 nodes) and operate without degradation by distributing the workloads to the other surviving nodes.
Each node stores file data and corresponding metadata and in the case of a node failure both file and metadata become unavailable. File data is reconstructed from the surviving nodes that have units from the same stripes. This reconstructed data is placed in nodes that have available (spare) capacity. File metadata is constructed from surviving nodes by copying the replicas from other nodes and placing them in nodes that have the most available SSD capacity. Note that the replacements are made to ensure the same resiliency as at the beginning of the cluster lifetime.
Two-node fault tolerance works hand in hand with the existing two-disk fault tolerance to provide granular control. While two-node fault tolerance is configured at a cluster level, individual storage domains can be configured to survive the loss of two random disks.
In addition to our distributed architecture and two-node / two-disk fault tolerance, other key capabilities that we already deliver resilience include:
Strict consistency: Cohesity uniquely delivers strict consistency at scale, which guarantees that reading the same object on different nodes by different clients always returns the most recently written value. Click here to learn more about how we achieve strict consistency.
Erasure coding (EC): This transforms data fragments into a form that allows the recovery of the original data from a subset of the fragments.
Replication factor (RF): This protects against failures of a unit of data by creating replicas.