
Data Backup and Recovery Challenges With Cassandra Snapshots

By Jay Desai • June 18, 2020

In a previous post, I described some of the storage amplification and management overhead challenges of using Cassandra database snapshots as part of your data backup strategy. In this post, I’ll describe why restoring from Cassandra snapshots is cumbersome and challenging.

Here’s a Cassandra backup and recovery scenario: a developer accidentally deletes an important keyspace in your large production database. The deletion occurred today around 8 a.m., and the most recent snapshot of that Cassandra cluster was taken last night at 10 p.m. That snapshot is the basis of the data recovery, or restore, that needs to happen today. To complicate matters, the topology of your production Cassandra cluster has changed since last night’s backup because two new nodes have been added. When these nodes were added, the token distribution changed.
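To see why adding nodes matters, here is a toy sketch of a token ring. It is not Cassandra's actual partitioner (which uses its own hash function and typically many tokens per node). The node names, the key format, and the MD5-based `token_for` helper are all made up for illustration. The point it demonstrates is only that some keys change owner once new nodes join the ring.

```python
import bisect
import hashlib


def token_for(name: str) -> int:
    """Hash a string onto a toy 32-bit ring (MD5 is used only for illustration)."""
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def build_ring(nodes):
    """Give each node a single token and return the ring sorted by token."""
    return sorted((token_for(node), node) for node in nodes)


def owner(ring, key: str) -> str:
    """Return the node holding the first token at or after the key's token."""
    tokens = [token for token, _ in ring]
    idx = bisect.bisect_left(tokens, token_for(key))
    return ring[idx % len(ring)][1]  # wrap around past the last token


def moved_keys(old_nodes, new_nodes, keys):
    """List the keys whose owner differs between the two topologies."""
    old_ring, new_ring = build_ring(old_nodes), build_ring(new_nodes)
    return [k for k in keys if owner(old_ring, k) != owner(new_ring, k)]


# Hypothetical cluster: four nodes last night, six nodes today.
last_night = ["node1", "node2", "node3", "node4"]
today = last_night + ["node5", "node6"]
keys = [f"user:{i}" for i in range(1000)]
moved = moved_keys(last_night, today, keys)
print(f"{len(moved)} of {len(keys)} keys now belong to a different node")
```

Any key in `moved` sits in a snapshot file on a node that no longer owns it, which is why a file-level copy back into place is not enough after a topology change.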

Because of the topology change, you can no longer just copy files from the snapshot directory to the original storage directory and run nodetool refresh to restore the data. You will need to reshard the data to account for the two new nodes and the change in the token map distribution. This can be done by manually running the sstableloader utility on the corresponding Cassandra nodes to load the data from last night’s snapshot. This is a tedious and time-consuming process that has to be repeated for every table that needs to be recovered.
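The per-table repetition can be sketched roughly as follows. This is a minimal planning helper, not a complete restore tool. It assumes the common on-disk layout of `<data_dir>/<keyspace>/<table>-<id>/snapshots/<snapshot_name>` and relies on sstableloader taking the keyspace and table names from the last two components of the directory it is given. The function name, the staging directory, and the host list are hypothetical. It only builds the list of commands and leaves copying the snapshot files into staging and running the commands to the operator.

```python
from pathlib import Path


def plan_sstableloader(data_dir, keyspace, snapshot, staging_dir, hosts):
    """Return one (source, staging, command) tuple per table in the snapshot."""
    plan = []
    keyspace_dir = Path(data_dir) / keyspace
    for table_dir in sorted(keyspace_dir.iterdir()):
        source = table_dir / "snapshots" / snapshot
        if not source.is_dir():
            continue  # this table has no snapshot with that name
        # Assumed layout: table directories are named <table>-<table id>.
        table = table_dir.name.rsplit("-", 1)[0]
        # Snapshot files must first be copied here so the path ends in
        # <keyspace>/<table>, which is what sstableloader reads the names from.
        staging = Path(staging_dir) / keyspace / table
        command = ["sstableloader", "-d", ",".join(hosts), str(staging)]
        plan.append((source, staging, command))
    return plan
```

Even in this simplified form, the plan has one entry per table on each node that holds snapshot files, which gives a sense of how quickly the manual work adds up.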

Another issue during Cassandra data recovery occurs if the user has changed the replication properties of the table or keyspace, for example by changing the replication factor or strategy. Any change to keyspace replication requires redistributing the data according to the changed property, which involves running nodetool repair, a manual process.

If you have a large Cassandra cluster, both the number of tasks and their complexity grow. The snapshot restore has to be done on every node of the cluster. If you have multiple tables that need to be restored, your operational overhead multiplies, because the per-table steps are repeated on each node. In addition, different tables could have been snapshotted at different intervals, depending on their recovery point or recovery time objectives. This makes finding a suitable snapshot for each table a very involved process.
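To illustrate the bookkeeping involved, here is a small sketch of picking a restore point per table. It assumes you already maintain some catalog of snapshot names and timestamps. Cassandra does not provide this structure, and every table name, snapshot name, and date below is made up.

```python
from datetime import datetime


def latest_snapshot_before(snapshots, incident_time):
    """Pick, per table, the newest snapshot taken at or before the incident.

    snapshots maps a table name to a list of (snapshot_name, taken_at) pairs.
    Tables with no usable snapshot map to None.
    """
    chosen = {}
    for table, taken in snapshots.items():
        usable = [s for s in taken if s[1] <= incident_time]
        chosen[table] = max(usable, key=lambda s: s[1])[0] if usable else None
    return chosen


# Hypothetical catalog with tables snapshotted on different schedules.
catalog = {
    "orders": [("nightly-0616", datetime(2020, 6, 16, 22, 0)),
               ("nightly-0617", datetime(2020, 6, 17, 22, 0))],
    "audit_log": [("weekly-0614", datetime(2020, 6, 14, 22, 0))],
    "sessions": [],  # never snapshotted
}
incident = datetime(2020, 6, 18, 8, 0)
restore_points = latest_snapshot_before(catalog, incident)
```

In this made-up catalog the tables end up with restore points that are days apart, and one table has none at all, so the restored keyspace may not be consistent across tables.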

Is Cassandra snapshot restore possible? Yes. Is it easy? No.

We’ve spent considerable time at Cohesity designing what we believe is the right architectural approach to data protection for Cassandra databases and other modern data sources. Read the earlier blog posts on this topic, or learn about the Cohesity solution for Cassandra backup and recovery.