Transparent Data Management with Modern Storage Accounting Framework

By Surya Swaminathan • October 22, 2019

The Storage Accounting Framework is Cohesity’s answer to providing enterprise-grade transparency into data management, with a next-generation user interface that presents fine-grained detail in a simpler way. In this post, we introduce new metrics and groupings that enable better chargeback, showback, capacity planning, and forecasting. In addition, we explain global deduplication, its impact, and how to account for it in a shared storage environment. Finally, we discuss our approach to simplifying storage computation for various use cases.

Storage Accounting Framework

In our efforts to modernize and simplify data management, Cohesity introduces a new storage framework powered by a next-generation user interface that allows Cohesity DataPlatform to better report on storage utilization, reduction, and resiliency.

Key benefits of the framework include:

  • More accurate and fine-grained data
  • Data that better informs operational decisions (such as forecasting and capacity planning) and business decisions (such as chargeback and showback)
  • More flexible filters to analyze the storage metrics

The framework introduces new metrics and groupings that enhance operational insights and decision making.

[Figure: A next-generation user interface for data management that presents fine-grained detail in a simple way.]

New Metrics

For convenience and consistency, we have revised the storage metrics. The new metrics are categorized as follows:

  • DataProtect
  • NAS
  • Ratios

The table below defines the terms used in the new storage accounting framework. Understanding these terms helps build a precise and comprehensive picture of data and storage utilization in Cohesity solutions.

| Category    | Metric            | Definition |
|-------------|-------------------|------------|
| DataProtect | Logical           | Size of primary object |
| DataProtect | Data-in           | Data sent from primary to Cohesity DataPlatform |
| DataProtect | Data-written      | Data written post reduction |
| DataProtect | Resiliency Impact | Space consumed by resiliency setting |
| DataProtect | Storage Available | Space available in cluster |
| DataProtect | Storage Consumed  | Data written after honoring resiliency setting |
| NAS         | Logical           | Logical data in view |
| NAS         | Quota             | Logical quota |
| NAS         | Physical Data     | Physical data stored (pre-resiliency) |
| NAS         | Resiliency Impact | Space consumed by resiliency setting |
| NAS         | Storage Consumed  | NAS physical data stored after applying resiliency setting |
| Ratios      | Data Reduction    | Space saved because of deduplication and compression (ratio of data-in to data-written) |
| Ratios      | Storage Reduction | Overall change in data footprint from source data to post-resiliency consumption (ratio of logical data to storage consumed) |
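
As a minimal sketch of how the two ratios in the table relate to the raw metrics, the snippet below computes them from sample values. The function names and the sample numbers are assumptions made for illustration; they are not part of any Cohesity API.

```python
def data_reduction_ratio(data_in: float, data_written: float) -> float:
    """Ratio of data-in to data-written (effect of dedupe and compression)."""
    if data_written <= 0:
        raise ValueError("data_written must be positive")
    return data_in / data_written


def storage_reduction_ratio(logical: float, storage_consumed: float) -> float:
    """Ratio of logical data to storage consumed (post-resiliency)."""
    if storage_consumed <= 0:
        raise ValueError("storage_consumed must be positive")
    return logical / storage_consumed


# Hypothetical sample values in GiB, chosen only for illustration.
logical = 1000.0           # size of the primary object
data_in = 800.0            # data sent from primary to the cluster
data_written = 200.0       # data written after reduction
storage_consumed = 400.0   # data written after honoring the resiliency setting

print(data_reduction_ratio(data_in, data_written))          # 4.0
print(storage_reduction_ratio(logical, storage_consumed))   # 2.5
```

A higher value in either ratio means a smaller footprint relative to the source data.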

New Groupings

The above metrics are now available at a fine-grained level (for example, for each backup task, replication task, or NAS share) as well as at an aggregate level for the following four logical groups:

  1. Cluster – The consumption metrics and the consumption trends for a cluster are captured.
  2. Storage domain – The consumption metrics from all the named storage locations on a cluster are calculated.
  3. Organizations – The consumption per tenant/organization is reported. The metrics are particularly useful to service providers for chargeback.
  4. Consumers – Metrics are now aggregated and reported for each consuming service, such as Backup, Replication, and NAS.

Global Deduplication

Deduplication eliminates redundant copies of data to reduce storage consumption on a Cohesity cluster. It ensures that only unique instances of data are transferred over the network and retained on storage media.

When certain metrics are aggregated (at, for example, a protection task level, a NAS share level, or an organization level), the effects of deduplication must be considered. It is not obvious how to account for shared chunks of data. For example, if there is a chunk of data shared by both backup task #1 and backup task #2, should that data chunk’s “storage consumed” be attributed to task #1 or task #2?

Here’s our approach: the “storage consumed” (i.e. physical bytes of storage used) for a protection task is computed as if there were no other protection tasks in the system. That is, as if the task were in its own private dedupe domain. That’s a key insight. We use the same approach when calculating the “storage consumed” for an individual NAS share (i.e. a Cohesity View), or an organization. This approach has the benefit that such “storage consumed” numbers can be used directly for chargeback/showback.
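
The "private dedupe domain" idea can be sketched as follows. This is a simplified model under stated assumptions: chunks are identified by hypothetical IDs with made-up sizes, resiliency overhead is ignored, and the names `chunk_sizes`, `task_chunks`, and `storage_consumed` are illustrative rather than anything exposed by the product.

```python
# Hypothetical model: every chunk has a physical size in bytes,
# and each protection task references a list of chunk IDs.
chunk_sizes = {"a": 100, "b": 200, "c": 300, "d": 400}

task_chunks = {
    "task1": ["a", "b", "c", "b"],  # "b" is referenced twice within task1
    "task2": ["c", "d"],            # "c" is also shared with task1
}


def storage_consumed(chunk_ids, sizes):
    """Physical bytes for one consumer, deduplicated only against itself.

    Chunks shared with other consumers are counted in full, as if this
    consumer were alone in its own dedupe domain.
    """
    return sum(sizes[chunk] for chunk in set(chunk_ids))


per_task = {
    task: storage_consumed(chunks, chunk_sizes)
    for task, chunks in task_chunks.items()
}
print(per_task)  # {'task1': 600, 'task2': 700}
```

In this model, the shared chunk "c" is attributed in full to both tasks, which sidesteps the question of which task "owns" it.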

Furthermore, because these numbers measure the physical bytes of storage used, they suggest how much capacity could be reclaimed if some elements of the protection task, View, or organization were deleted. In other words, they are useful for capacity management. Strictly speaking, though, deleting a protection task would not free all of its “storage consumed.” The “unique” data chunks would be freed immediately, while the “shared” data chunks would only have their reference counts decremented.
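
To illustrate the difference between unique and shared chunks on deletion, here is a small reference-counting sketch. It assumes a simplified in-memory model with hypothetical chunk IDs and sizes; `delete_task` is an illustrative helper and does not describe how the cluster implements garbage collection.

```python
from collections import Counter

# Hypothetical chunk sizes in bytes and per-task chunk references.
sizes = {"a": 100, "b": 200, "c": 300, "d": 400}
task_chunks = {
    "task1": {"a", "b", "c"},
    "task2": {"c", "d"},  # "c" is shared with task1
}

# Reference count per chunk: how many tasks point at it.
refcounts = Counter(
    chunk for chunks in task_chunks.values() for chunk in chunks
)


def delete_task(task, task_chunks, refcounts, sizes):
    """Delete a task and return the bytes freed immediately."""
    freed = 0
    for chunk in task_chunks.pop(task):
        refcounts[chunk] -= 1
        if refcounts[chunk] == 0:
            # Unique chunk: no other task references it, so it is freed.
            del refcounts[chunk]
            freed += sizes[chunk]
        # Shared chunk: stays on disk with a lower reference count.
    return freed


freed = delete_task("task1", task_chunks, refcounts, sizes)
print(freed)           # 300 (chunks "a" and "b")
print(refcounts["c"])  # 1 (still referenced by task2)
```

In this example, task1's private-domain "storage consumed" would be 600 bytes, but deleting it frees only 300, because chunk "c" is still needed by task2.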

An important side-note when looking over these numbers:

  • Physical consumption metrics (Data Written, Physical Data, and Storage Consumed) should not be summed across consumers, because doing so double-counts the shared data chunks. For example, adding the physical numbers across protection tasks will not reflect the actual storage consumption, because of global deduplication.
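
A short worked example of the double-counting effect, using the same kind of simplified model (hypothetical chunk IDs and sizes, no resiliency overhead; all names are assumptions for illustration):

```python
# Hypothetical chunk sizes in bytes and per-task chunk references.
chunk_sizes = {"a": 100, "b": 200, "c": 300, "d": 400}
task_chunks = {
    "task1": {"a", "b", "c"},
    "task2": {"c", "d"},  # "c" is shared with task1
}

# Per-task physical consumption, each computed in its own private domain.
per_task = {
    task: sum(chunk_sizes[chunk] for chunk in chunks)
    for task, chunks in task_chunks.items()
}

# Naive sum across tasks counts the shared chunk "c" twice.
naive_total = sum(per_task.values())

# Actual physical footprint: each unique chunk is stored once.
all_chunks = set().union(*task_chunks.values())
actual_total = sum(chunk_sizes[chunk] for chunk in all_chunks)

print(naive_total, actual_total)  # 1300 1000
```

The 300-byte gap is exactly the size of the shared chunk, which is why a cluster-level figure needs to be read from the cluster-level metric rather than derived by adding up per-task numbers.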

Because providing a best-in-class user experience is a two-way street, we engage diligently with our customers to understand their expectations. With the Storage Accounting Framework, we aim to give our customers more information and fine-grained insights, grouped in logical categories to address diverse real-world use cases.

Cohesity’s Karandeep Chawla and Yu-shen Ng from Product Management, and Sanjeev Desai from Solutions Marketing contributed to this blog.