Cohesity – shining light on dark data

By Jerome Joseph • July 8, 2016

Why do we need secondary storage?

With compute, memory and a network that can provide sub-millisecond latency to applications, a storage latency of 5-15 ms becomes the bottleneck when very low latency is required. This situation sparked a revolution in the primary storage business, with the advent of expensive, low-latency All Flash Arrays (AFAs) and hybrid storage (SSD + HDD with an Intelligent Lifecycle Manager, ILM).

Primary Storage

a) All Flash Arrays
All Flash Array (AFA) storage systems are expensive, so using them for any purpose other than reading and writing application data is not cost-effective. If storage architects want to replicate the primary data to a remote site for Disaster Recovery (DR), they need a similarly expensive High Availability (HA) pair at the remote site as well.

b) Hybrid Storage Arrays
Storage architects can use a hybrid storage array instead, but then they have to right-size the SSD tier. The active working set can become unpredictable due to changes in application workload, multiple reads by the backup subsystem, and analytics running on the primary storage.

So, in addition to meeting application performance expectations, primary storage must provide the following features:

  • Disaster recovery solution (at minimum, double the cost of expensive storage arrays)
  • Backup to tape/VTL/cloud using expensive backup software that requires media agents, virtual server agents, etc.
  • Reports on data growth, type of data, etc. (internal or external analytical engine)

Many of these features can be offloaded to a secondary storage system.

Secondary storage

The ability of secondary storage to take on many of the functions of primary storage has created a huge market.

Shortcomings of the existing backup products:

  • The UI is inelegant and unintuitive.
  • The Purpose Built Backup Appliance (PBBA) is expensive and not scalable. It is a hodgepodge of multiple incompatible solutions stuck together (backup software, storage, replication software, and cloud connector from multiple vendors).
  • The backup software and media/VSA agents are complex to set up.

In short, the existing backup solutions are built for backup and cannot do anything else.

In addition to backup, the following features make a secondary storage platform interesting:

  1. Workflow: Easy to deploy, easy to scale, easy to operate with an intuitive self-guided UI
  2. Test and Dev Workloads: Optimized for backup and test/dev workloads at the right price/performance point
  3. Data Reduction Technologies: Higher compression and dedup ratios for a longer retention period
  4. Snapshot and Changed Block Tracking (CBT): Hydrated snapshots for a low RPO/RTO, forever incremental backup and instant recovery
  5. Data Throttling: By controlling the ingest, storage architects can back up during work hours without affecting the application performance
  6. Intuitive and customizable backup scheduler
  7. DR capabilities: Ability to replicate from one secondary storage to another remote secondary storage without affecting the application performance
  8. Analytics Workbench: Analyze the data for trends without affecting application performance, and build custom search capabilities. It uses big data analytics to analyze dark data
  9. Cloud Archival: Easy workflow and support for multiple cloud providers
  10. Automation and customization: Programmable interface (REST API), so customers can automate their backup and replication workflows (data plane) in addition to managing the dashboards (control plane)
  11. Backup and Restore anything: Ability to back up virtual machines and physical machines
  12. Application Aware Backup and Restore: Adapters for various applications to provide app-consistent backups with log file truncation and the ability to do granular restores (SMBR: single mailbox restore; FLR: file level restore)
  13. Reuse the existing backup solution: It should act as a backup target for Commvault, NetBackup, Veeam and other backup solutions
  14. Encryption: Ability to encrypt the data when written locally or to the cloud
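
To make items 3 and 4 above a little more concrete, here is a minimal Python sketch of how block-level dedup and a forever incremental backup can fit together. It is only an illustration under simplifying assumptions, not how Cohesity or any other product implements these features: the names (DedupStore, full_backup, incremental_backup, restore) are hypothetical, blocks are fixed-size, the volume size does not change between backups, and the set of changed blocks is assumed to be handed to us by a change tracker.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, so the example is easy to follow


class DedupStore:
    """Content-addressed block store: identical blocks are kept only once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 hex digest -> block bytes

    def put(self, block: bytes) -> str:
        digest = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault(digest, block)  # no-op if already stored
        return digest

    def get(self, digest: str) -> bytes:
        return self.blocks[digest]


def full_backup(store: DedupStore, data: bytes) -> list:
    """First backup: store every block, return the snapshot (list of digests)."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return [store.put(block) for block in blocks]


def incremental_backup(store: DedupStore, previous: list, changed_blocks: dict) -> list:
    """Later backups: ingest only the changed blocks (index -> new bytes).

    The result is still a complete ("hydrated") snapshot, because unchanged
    positions simply reuse the digests from the previous snapshot.
    """
    snapshot = list(previous)
    for index, block in changed_blocks.items():
        snapshot[index] = store.put(block)
    return snapshot


def restore(store: DedupStore, snapshot: list) -> bytes:
    """Rebuild the volume directly from one snapshot, with no chain to replay."""
    return b"".join(store.get(digest) for digest in snapshot)


if __name__ == "__main__":
    store = DedupStore()
    day1 = b"AAAABBBBAAAACCCC"
    snap1 = full_backup(store, day1)
    # Only block 1 changed on day 2, so only that block is ingested.
    snap2 = incremental_backup(store, snap1, {1: b"DDDD"})
    print(restore(store, snap1))
    print(restore(store, snap2))
    print(len(store.blocks), "unique blocks stored for 8 logical blocks")
```

Because every snapshot is a full list of block references, any point in time can be restored directly, while the store only grows by the blocks that are genuinely new. That is the intuition behind pairing dedup with forever incremental backups for a longer retention period.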

I will blog about these features in the future. I am still learning the ropes of secondary storage and would love your feedback.