With the growing popularity of NoSQL databases—such as Cassandra, Couchbase, and MongoDB—organizations are generally comfortable running large-scale, mission-critical applications in production. Most of these applications are key to business success, so data protection issues with these applications can have severe negative consequences.
These impacts can manifest themselves in many ways, including millions of dollars in lost revenue, permanent data loss, customer attrition, negative brand perception, and higher IT costs. Data management—incorporating database backup and recovery, or restore—is a critical infrastructure element that needs to be carefully thought through.
In the SQL database world, robust data management solutions are available and are typically in place by the time an application goes into production. This is not the case in the NoSQL world, where data management tools lag in functionality and do not meet the requirements of mission-critical applications.
In this post, I share several common data management challenges voiced by Cassandra database customers. These situations apply equally to other NoSQL environments.
Real-World NoSQL Data Protection Challenges
Someone accidentally deleted data from a production database using the Truncate command. Can I restore my data?
Fortunately, yes, because auto_snapshot was enabled, so Cassandra took a snapshot of the table before truncating it. Otherwise, the data would have been lost. However, the snapshot folder holds a number of snapshots, and it’s unclear which one(s) to use to recover the data. Locating the right files and recovering the data manually will take hours, and the application will be down in the meantime. Data recovery need not be this cumbersome.
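As a rough illustration of the search problem, the sketch below lists every snapshot directory under a Cassandra data directory, newest first. It assumes the common on-disk layout `<data_dir>/<keyspace>/<table>-<id>/snapshots/<snapshot_name>/`, and it assumes that automatic snapshots taken before a truncate carry a name starting with `truncated`. Both assumptions should be checked against your Cassandra version. The function name `list_snapshots` is hypothetical.

```python
from pathlib import Path


def list_snapshots(data_dir, name_prefix=""):
    """Return (keyspace, table_dir, snapshot_name, mtime) tuples, newest first.

    Assumes the layout <data_dir>/<keyspace>/<table>-<id>/snapshots/<name>/.
    """
    found = []
    for snap in Path(data_dir).glob("*/*/snapshots/*"):
        # Skip stray files and snapshots whose name does not match the prefix.
        if not snap.is_dir() or not snap.name.startswith(name_prefix):
            continue
        table_dir = snap.parent.parent
        found.append((table_dir.parent.name, table_dir.name,
                      snap.name, snap.stat().st_mtime))
    # Newest snapshot first, so the most recent one is at the top.
    found.sort(key=lambda row: row[3], reverse=True)
    return found
```

Calling `list_snapshots("/var/lib/cassandra/data", name_prefix="truncated")` would narrow the list to the assumed truncate snapshots; the path shown is only an example and depends on how the node was installed.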
We have a lot of product quality issues because our QA team is not testing with production data sets.
Currently, the QA team tests with static, fabricated data sets, so its tests do not reflect real-world conditions. Unfortunately, this means that issues can and do arise once the software is deployed into production.
We constantly run out of space on our Cassandra nodes, and every time this happens, it’s a fire drill to add more storage.
It’s convenient to do daily backups of the Cassandra database with snapshots. Initially, snapshots don’t take up a lot of space because they are hard links to existing SSTable files. But every time compaction reorganizes the SSTables and creates new files, the snapshots keep the old files on disk, snapshot storage utilization shoots up significantly, and storage consumption on the Cassandra cluster reaches 100 percent. Fire drill!
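Because a Cassandra snapshot is a set of hard links to SSTable files, one way to see how much space a snapshot is really holding is to count the files that nothing else references any more. The sketch below does that for a single snapshot directory. It assumes a POSIX filesystem that reports hard-link counts, and the function name `bytes_pinned_by_snapshot` is hypothetical. The result is a lower bound: a file shared by several snapshots still has a link count above 1 and is not counted.

```python
import os


def bytes_pinned_by_snapshot(snapshot_dir):
    """Sum the size of files that only this snapshot still references.

    A link count of 1 means the live SSTable is gone (for example, after
    compaction), so this snapshot alone keeps the data on disk.
    """
    pinned = 0
    for entry in os.scandir(snapshot_dir):
        if not entry.is_file(follow_symlinks=False):
            continue
        info = os.lstat(entry.path)
        if info.st_nlink == 1:
            pinned += info.st_size
    return pinned
```

Running this right after a snapshot is taken should report close to zero; the number grows as compaction replaces the SSTables that the snapshot links to.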
We used to keep two weeks’ worth of Cassandra backups in snapshots. But since we added new brands to our application, we can keep only two days’ worth of backups. That doesn’t meet our service-level agreements (SLAs).
The issue here is the same as in the previous challenge. As more data is loaded into Cassandra, less space is available for snapshots, which reduces the number of backups that can be kept and makes the SLAs impossible to meet.
My Cassandra production database is onsite and I’m using Amazon S3 for storing Cassandra backups. Our backups take a long time and my monthly Amazon bills are going up.
Two challenges arise with this deployment. First, since the backups go over a wide-area network, every large backup (full or otherwise) takes a long time to complete. Notably, the volume of backup data after every compaction can also be sizable. This leads to the second challenge: rising Amazon S3 bills. Because periodic full backups and compaction generate a large volume of data, storage requirements keep growing, and so does the Amazon S3 bill.
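To see why compaction inflates backup volume even when only new files are sent, consider a simple file-level comparison between what is on the node and what has already been uploaded. This is a minimal sketch, not a description of any particular backup tool; the function name `files_to_upload` and the file names in the note below are hypothetical.

```python
def files_to_upload(local_sizes, already_uploaded):
    """Return (names, total_bytes) for local files absent from the remote store.

    `local_sizes` maps a file name to its size in bytes;
    `already_uploaded` is a set of file names already in the remote store.
    """
    names = sorted(n for n in local_sizes if n not in already_uploaded)
    return names, sum(local_sizes[n] for n in names)
```

If three small SSTables were uploaded yesterday and compaction has since merged them into one new file, that new file matches nothing in the remote store, so its full size crosses the network again even though little or no new data was written.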
For each of the Big Data stores—Cassandra, MongoDB, and Hadoop—in our environment, the backup and recovery tools and procedures are completely different. This makes our environment operationally difficult to manage.
For organizations like this, Big Data backup and recovery becomes more complex and must be done manually. Although each Big Data technology includes its own command-line interface for backup and recovery, the CLIs alone are insufficient for ensuring automated and error-free backups. Developers must write (and maintain) wrapper scripts for each data store to automate the backup process on each node, manage space on each node, and clean up older backups that are no longer required. Then, for consistent backups and reliable recoveries, operations teams must master each script.
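The cleanup step of such a wrapper script might look something like the sketch below, which only decides which backups have aged out of the retention window. It assumes each backup is identified by a name and a creation time; the function name `expired_backups` and the 14-day default (echoing the two weeks mentioned earlier) are illustrative.

```python
from datetime import datetime, timedelta


def expired_backups(backups, now, keep_days=14):
    """Return names of backups older than the retention window, oldest first.

    `backups` maps a backup name to its creation time (a datetime).
    """
    cutoff = now - timedelta(days=keep_days)
    old = [(created, name) for name, created in backups.items()
           if created < cutoff]
    return [name for created, name in sorted(old)]
```

For Cassandra snapshots, each returned name could then be passed to `nodetool clearsnapshot -t <name>` on every node. The other data stores need their own deletion step, which is exactly the per-store maintenance burden described above.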
Learn more about Data Management for NoSQL Database
Having heard these stories many times, we at Cohesity embarked on building an enterprise-grade backup and recovery solution for Cassandra.
Read more about the Cohesity Solution for Cassandra backup and recovery.