Cohesity Cluster Upgrades

By Patrick Lundquist • November 30, 2016

Cohesity innovates fast. With fast innovation comes new releases packed with an ever growing list of new and updated features including dedup, compression, fully hydrated snapshots, software based encryption, mixed workloads, global search, analytics, and node by node cluster expansion.

All this new functionality is great, but our customers also tell us that they don’t want to deal with frequent disruptive upgrades. To provide access to all the latest and greatest features with minimum disruption, Cohesity has developed a smooth and painless upgrade workflow. This same upgrade workflow is used for software upgrades, security updates, firmware upgrades, and even OS upgrades. We keep it simple. One upgrade workflow for everything.

In this post, we’ll go over how Cohesity’s upgrade process works and why designing upgrade right is so important for any enterprise-grade system.

Active-Passive Root Partition Scheme

Linux users may be familiar with the common update workflows using yum or apt-get. Those package managers have been known to cause problems when they move configuration files around, force daemons to use new libraries, or leave the system in an inconsistent state when interrupted. Using these package managers for an enterprise quality system can result in an inconceivable number of permutations of libraries and executables (usually referred to as dependency hell). Recovering from such failure scenarios can be a puzzle. When Cohesity gets a support call, our goal is to address the issue in the earliest possible time. We achieve that by maintaining a consistent system image across all the appliances in the field.

Thankfully, the Cohesity upgrade avoids all those issues. Each node uses an active-passive root partition scheme to make our consistent upgrades possible. One active partition hosts the current OS, while another passive partition is reserved for upgrades. We deploy our upgrades to the passive partition and, in one atomic operation, set the passive partition to become active after a restart. And while the node restarts, there is no disruption to the cluster as we’ll discuss in the next section. The node’s entire OS is upgraded at the same time. That means a Cohesity node is never in a partially upgraded state. It also means that our upgrades are powerful enough to change nearly any part of the system, which makes our software more future proof.

Partition P1 is active while the upgrade is deployed on passive partition P2.
After restart, partition P2 is active and running the new software, while partition P1 is passive.>

Non-disruptive Rolling Upgrades

For some other storage systems, you may discover that upgrades and other maintenance tasks require downtime. With different groups of users and applications using the storage system, it can be difficult if not impossible to negotiate a downtime window that satisfies all stakeholders. Furthermore, scheduling downtime can put the business’s RPO and RTO at risk, and in the case of critical security updates, delaying a