Cohesity Delivers One-Click, Non-Disruptive Upgrades

By Damien Philip • August 19, 2015

In my 16 years as a sysadmin/storage engineer, I have been lucky to always work on bleeding-edge technologies well before they became GA products. Each of these products was an MVP (Minimum Viable Product) and, after launch, became the leader in its own domain.

I work with early technologies because I love working with some of the most brilliant people in Silicon Valley, and I am amazed by the amount of thought and research that goes into developing an enterprise-ready product.

The first MVP I worked on was the Cisco MDS SAN switch line back in 2001, about a year before it became generally available (GA), followed by Cisco UCS servers in 2008 and, most recently, Nutanix in 2012. Working with the core engineers who are developing a product, and being able to provide feedback and help define its direction, is what motivates and excites me.

Here are some examples of the features that were available in a first generation product that enabled its great success:

  • Cisco MDS 1.0 was released with a feature called VSAN that enabled customers to converge workflows of production and test/dev onto the same switch, simplifying the architecture and design
  • UCS introduced the concept of stateless computing
  • Nutanix created a new category of infrastructure, hyper-convergence, which integrated the concepts of distributed storage and computing

One thing these products had in common was at least one mind-blowing feature that was way ahead of the industry at the time of launch. Based on that feature, customers gave these products a chance and made them successful.

Fast forward to today: I have once again started working with a generation-one product, this time at a startup called Cohesity. At GA, Cohesity will bring to market the most mature and most enterprise-ready first-generation product I’ve ever had the chance to work with. I have been working with the product for the last 6 months, and I am amazed by the innovation, design, and effort that a team of over 30 engineers and product managers has put in to make this product stand apart from any other secondary storage solution out there.

Cohesity is solving one of the biggest problems of data growth and management in the world today by converging secondary storage workflows onto a single, infinitely scalable data platform.

The product at GA will have non-disruptive software upgrades, which is something that even established legacy platforms do not have. As an example, one early adopter of the Cohesity Data Platform (as part of our Early Access Program) told us that it takes him about three months of planning and coordination with various teams before he can schedule downtime to upgrade his current secondary storage solutions. Because of the amount of work that goes into planning alone, his team is limited to performing software upgrades just once a year. When this customer began using the Cohesity Data Platform, he was blown away by the fact that Cohesity has already integrated one-click, non-disruptive upgrades into its Early Access Release! Here is what he had to say when he experienced this feature in action:

Here’s how it works.

When the software on the system needs to be upgraded, all the admin has to do is download the code and initiate a one-click upgrade. Then, the admin can sit back and wait for an entire 8-, 16-, or n-node cluster to be upgraded automatically. Gone are the days when extended periods of planning were needed. Gone are the days when applications had to shut down for a software upgrade to take place. Cohesity’s upgrade process is completely non-disruptive for all data and management operations. Virtual IPs (VIPs) carry application-facing ingress and egress storage IO traffic and are not affected by node reboots or binary upgrades: if the node a VIP is bound to reboots during a software upgrade, the VIP is moved to another active node, which then handles all IO.
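The VIP behavior described above can be illustrated with a small sketch. This is not Cohesity's implementation; the function name `reassign_vips`, the data layout, and the "least-loaded survivor" placement rule are all assumptions made for illustration. It only shows the idea that a VIP bound to a rebooting node is re-bound to some other active node.

```python
def reassign_vips(vip_owner, active_nodes, down_node):
    """Return a new VIP-to-node map with every VIP on down_node moved.

    Hypothetical helper: the placement rule (least-loaded surviving node,
    ties broken by name) is an assumption, not documented behavior.
    """
    survivors = sorted(n for n in active_nodes if n != down_node)
    if not survivors:
        raise RuntimeError("no active node left to host VIPs")

    # Count how many VIPs each surviving node already hosts.
    load = {n: 0 for n in survivors}
    for node in vip_owner.values():
        if node in load:
            load[node] += 1

    new_owner = dict(vip_owner)
    for vip in sorted(vip_owner):
        if vip_owner[vip] == down_node:
            target = min(survivors, key=lambda n: (load[n], n))
            new_owner[vip] = target
            load[target] += 1
    return new_owner


vips = {"10.0.0.1": "node1", "10.0.0.2": "node2", "10.0.0.3": "node3"}
after = reassign_vips(vips, {"node1", "node2", "node3"}, down_node="node2")
```

In this example, the VIP that was bound to `node2` ends up on a surviving node while the other two bindings are untouched, so clients that address the cluster through VIPs keep a reachable endpoint during the reboot.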

A distributed lock manager (lockmgr) in the platform is responsible for making sure that the upgrade of every node in the cluster completes. When the upgrade is triggered, the lockmgr issues a token to each node in the cluster, and the nodes then race one another to be the first to upgrade.

All the nodes race to the lockmgr with the tokens they were issued. The lockmgr allows the winner of this race to go through the upgrade process and waits for that node to come back and report its new version. While the node is upgrading, its VIP floats over to another node in the cluster, so IO and other services are not affected. Once this node has upgraded to the new version, the lockmgr lets the remaining nodes upgrade serially, one at a time.
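The race-then-serialize pattern described above can be sketched in a few lines. This is a single-process illustration under stated assumptions, not Cohesity's code: a `threading.Lock` stands in for the distributed lockmgr, threads stand in for nodes, and the class and function names (`UpgradeCoordinator`, `rolling_upgrade`) are hypothetical.

```python
import threading


class UpgradeCoordinator:
    """Toy stand-in for a distributed lock manager (assumption)."""

    def __init__(self, nodes, target_version):
        self._lock = threading.Lock()
        self.target_version = target_version
        self.versions = {n: "old" for n in nodes}
        self.order = []  # order in which nodes won the race

    def upgrade_node(self, node):
        # Every node races for the lock; only the winner proceeds,
        # and the others block until it is released.
        with self._lock:
            # The real upgrade and reboot would happen here, with the
            # node's VIP hosted elsewhere in the meantime.
            self.versions[node] = self.target_version  # node reports new version
            self.order.append(node)


def rolling_upgrade(nodes, target_version):
    coord = UpgradeCoordinator(nodes, target_version)
    threads = [
        threading.Thread(target=coord.upgrade_node, args=(n,)) for n in nodes
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return coord


nodes = ["node1", "node2", "node3", "node4"]
result = rolling_upgrade(nodes, "2.0")
```

The order in `result.order` can differ from run to run, since it depends on which node wins each round of the race, but every node is upgraded exactly once and never at the same time as another node.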

The crux of the matter is that non-disruptive upgrades are a very difficult feature to deliver. Including a feature like this at GA gives an idea of how focused the Cohesity team is on delivering a next-generation secondary storage platform.