Mar 15, 2023|7 min|Products

Cohesity DataHawk: Continuing the AI/ML transformation of data security and management

After early customer access, we are proud to announce that Cohesity DataHawk is now generally available as of March 15, 2023. This is a significant milestone for organizations concerned about cyber resiliency as DataHawk brings AI/ML to data security and management to tackle the never-ending escalation of threats that challenge data recovery.

Helping CISOs and CIOs meet cyber resilience challenges

Is it safe? It’s a simple question posed to CISOs, CIOs, and security and IT leaders on a continuous basis about their operations and data. The answer is probably a qualified one: it depends on the incident, threat, or circumstances, and on whether the organization has adequate protections for its mission-critical data and processes.

IT faces a persistent cadence of threats that are growing in complexity and elusiveness, and responds by implementing countermeasures and tactics. That may be a statement of the obvious, but it is why AI/ML has become a requirement for IT operations today as the rate and complexity of threats rise. We see AI/ML everywhere, from ChatGPT to IBM Watson to Netflix’s recommendations and Google search. AI/ML is also transforming security, especially with SIEM/SOAR solutions, XDR, and threat intelligence. With the ingestion of vast amounts of data, baselines can be established and outliers identified to detect anomalies and unusual or suspicious activity.
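As a rough illustration of the baseline-and-outlier idea, the sketch below flags values that sit far from a historical mean using a simple z-score. It is a minimal example under stated assumptions, not how any particular product works: the function name `find_outliers`, the threshold of 3.0, and the daily changed-file counts are all hypothetical.

```python
from statistics import mean, stdev


def find_outliers(history, recent, threshold=3.0):
    """Return (index, value, z_score) for recent values far from the baseline.

    `history` is the sample used to establish the baseline; `recent` holds
    the new observations to test against it.
    """
    baseline_mean = mean(history)
    baseline_std = stdev(history)
    outliers = []
    for index, value in enumerate(recent):
        # Guard against a perfectly flat baseline (zero deviation).
        z_score = (value - baseline_mean) / baseline_std if baseline_std else 0.0
        if abs(z_score) > threshold:
            outliers.append((index, value, round(z_score, 1)))
    return outliers


# Hypothetical daily counts of changed files seen by a backup job.
history = [1020, 980, 1005, 995, 1010, 990, 1000]
recent = [1003, 987, 9400]

outliers = find_outliers(history, recent)
print(outliers)
```

Here only the third recent value (9400 changed files) is reported, since the first two fall within normal day-to-day variation. Production systems typically use richer models than a single z-score, but the principle of learning a baseline and scoring deviations is the same.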

AI/ML as a foundation for data security and management

Data security and management is no exception: the AI/ML journey began years ago with anomaly detection, scheduling, and optimization. As modern threats now jeopardize the ability of organizations to leverage their backup data for recovery, AI/ML has a critical role in data security and management: identifying recovery threats and vulnerabilities, and helping the organization assess the impact of an incident on sensitive data.

Identify threats with AI/ML

Backup data is foundational to data security and management. Based on criticality, organizations take snapshots of data in case they are needed to recover from ransomware, disasters, or other cyber incidents. These snapshots contain what was present in the production data—unfortunately, that may include elusive malware that may have evaded their cyber defenses.

Threat protection can be used to identify threats in these snapshots in two ways: proactively, and when an incident such as ransomware occurs. Cohesity leverages AI/ML to detect user and data anomalies that could indicate an emerging attack, utilizing threat intelligence to ensure recovery data is malware-free. This automates the arduous, manual threat-hunting tactics that rely on security analysts to create YARA rules from various threat sources and feeds. The manual approach lacks scalability in both depth and breadth: an organization can only search for a few rules across a few data sources.

When faced with a ransomware attack, organizations must scale to ensure that critical data is safe for recovery so that malware does not immediately reinfect the environment and create another crippling encryption event across data stores. Immediate and push-button execution of threat detection in backup snapshots is foundational to maintaining RPO/RTOs that support an organization’s SLAs and business objectives.

So what is required to effectively automate threat scanning for data security and management and why?

  1. The latest threat intelligence powered by AI/ML: Threat actors continually morph their exploitation tools to evade detection. With a threat intelligence aggregator and curator, threats are tested, validated, and vetted by numerous sources. These threat lists are continually updated so that organizations can ensure that recently discovered threats will be detected.
  2. Point-and-click simplicity: When a ransomware attack is suspected, teams need to move quickly in their response and, potentially, their recovery efforts. With the right tools, security and IT teams can run parallel processes so that attacks are confirmed and remediation and recovery operate seamlessly. In data security and management solutions, simplicity allows non-security administrators to proactively validate recovery data so that if a recovery is needed, they stand at the ready with recovery data that will not spawn reinfections.
  3. Scalability: Along with point-and-click simplicity, organizations need to quickly scan their backup snapshots for indicators of compromise with threat detection. They need a solution that can handle their data footprint and support multiple sources in a geographically dispersed architecture, all without violating privacy and residency requirements.
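To make the idea of scanning a snapshot for indicators of compromise more concrete, here is a minimal sketch that walks a directory tree and compares each file's SHA-256 digest against a set of known-bad hashes. This is deliberately simpler than YARA rule matching and is not a description of DataHawk's engine. The names `sha256_of` and `scan_snapshot`, the temporary directory standing in for a mounted snapshot, and the sample "malicious" payload are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large backup files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan_snapshot(root, ioc_hashes):
    """Return relative paths of files whose SHA-256 appears in the IOC set."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and sha256_of(path) in ioc_hashes:
            hits.append(path.relative_to(root).as_posix())
    return hits


# Hypothetical demo: a temporary directory stands in for a mounted snapshot.
with tempfile.TemporaryDirectory() as snapshot_dir:
    snapshot = Path(snapshot_dir)
    (snapshot / "docs").mkdir()
    (snapshot / "docs" / "report.txt").write_bytes(b"quarterly numbers")
    (snapshot / "dropper.bin").write_bytes(b"pretend-malicious-payload")

    # In practice this set would come from a curated threat feed.
    ioc_hashes = {hashlib.sha256(b"pretend-malicious-payload").hexdigest()}

    hits = scan_snapshot(snapshot, ioc_hashes)
    print(hits)
```

Exact-hash matching only catches files that are byte-for-byte identical to a known sample, which is one reason rule-based and ML-based detection are layered on top of it. The sketch also runs on a single machine; the scalability requirement above is about doing this across many sources and locations.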

AI/ML to assess attack exposure

When an attack has occurred, response teams have innumerable responsibilities. Critically, they need to assess what data exposure may have occurred. Data exposure has several implications for an organization. First, what customer and employee data could have been compromised? With that intelligence, organizations can make informed decisions on the privacy and regulatory responses that are needed, such as notifications and remedies to affected parties. Second, have trade secrets or other sensitive information been exposed, and what legal implications should be considered? Third, what operational data was leaked, and how might that affect supply chains and partners? This is not an exhaustive list of considerations, but it represents the various implications of data exposure and shows why organizations must have an accurate accounting of what data may have been impacted.

While organizations track sensitive data with many tools and processes (logical data models, enterprise architecture, data discovery and classification tools, data catalogs), these all share a central weakness: shelf life. What has changed since the last update of the tools used to identify sensitive data? Given the massive rate of data growth and proliferation, it is safe to assume that there is some gap in what organizations know about their sensitive data. Certainly these tools and artifacts should be referenced, but the definitive conclusion about sensitive data exposure should be drawn immediately after an attack.

By examining the backup copies that were targeted in an attack, organizations can have the absolute latest intelligence to make the critical decisions enumerated above. Implicit in this approach is accuracy: the evaluation of data exposure should use the utmost precision to drive the appropriate responses.

So what is needed to drive a high degree of confidence that data exposure is accurately assessed and that the organization takes all appropriate measures?

  1. Extensive predefined data patterns and policies: Supporting national and global searches requires an extensive variety of sensitive data definitions, such as driver’s licenses, national IDs, and phone numbers. To support these requirements, solutions should have extensive patterns that can be combined into policies to ensure that private, health, and financial data is identified, regardless of origin or format.
  2. Accuracy of results driven by AI/ML: Accuracy is paramount to making the right decisions about potential data exposure. Solutions should be able to find sensitive data fragments and sensitive data that has been modified from standard formats. Rigid search techniques such as SQL and regular expressions will not suffice, as they will not find sensitive data outside the bounds of the search term. AI/ML leveraging natural language processing provides the precision to identify data that structured techniques would miss.
  3. Scalability: As with threat scanning, organizations need to quickly scan their backup snapshots for sensitive and confidential information. They need a solution that can handle their data footprint and support multiple sources in a geographically dispersed architecture, all without violating privacy and residency requirements.
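The first two requirements above can be sketched in a few lines: patterns grouped into policies, and a demonstration of how a rigid pattern misses data that has drifted from its standard format. This is an illustrative sketch only. The pattern names, the policy names, and the sample strings are hypothetical, the regular expressions are simplified, and the example does not implement the NLP-based classification the article describes; it only shows the limitation that motivates it.

```python
import re

# Hypothetical pattern library: each name maps to one simplified, rigid regex.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "card_number": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

# Hypothetical policies: each one combines several patterns.
POLICIES = {
    "privacy": ["us_ssn", "email"],
    "payment": ["card_number"],
}


def classify(text, policy):
    """Return {pattern_name: [matches]} for every pattern in the policy that hits."""
    found = {}
    for name in POLICIES[policy]:
        matches = PATTERNS[name].findall(text)
        if matches:
            found[name] = matches
    return found


standard = "Contact jane@example.com, SSN 123-45-6789."
modified = "Contact jane at example dot com, SSN 123 45 6789."

standard_result = classify(standard, "privacy")
modified_result = classify(modified, "privacy")
print(standard_result)
print(modified_result)
```

The first string yields both an SSN and an email match, while the second yields nothing even though a person would recognize the same sensitive values. Loosening each regex helps only until the next variation appears, which is the gap the article argues ML-driven classification is meant to close.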

Cohesity DataHawk: AI/ML for data security and management

As a recap of our DataHawk announcement, here are the critical capabilities organizations can use for their cyber resilience programs:

  • Threat protection that can save the day: DataHawk incorporates a deep learning-based ransomware detection engine. It also provides intelligent threat protection with rapid scanning for anomalies, potential threats, and other indicators of a ransomware attack. DataHawk integrates a set of highly curated and managed IOC (Indicators of Compromise) threat feeds that are updated daily. Organizations can leverage over 100K threat rules that ensure elusive malware is identified.
  • Data classification to quickly assess impact: When under attack, organizations want to rapidly understand any potential impact to their valuable data. DataHawk is leveraging the exceptional classification technology from BigID to accurately discover and classify large sets of data at scale to help minimize risk, improve their security posture, and understand the impact of an attack. Customers can save time chasing false positives and reach resolution faster with more than 200 built-in classifiers and ML-driven algorithms to analyze, tag, categorize, label, and classify data sets. Predefined policies for data privacy and protection regulations like the “General Data Protection Regulation” (GDPR), “Payment Card Industry” (PCI), and “Health Insurance Portability and Accountability Act” (HIPAA) help organizations quickly identify and prioritize these sensitive data sets.

In addition to these AI/ML-driven capabilities, DataHawk includes Cohesity’s award-winning data isolation service, Cohesity FortKnox, to provide fail-safe protection and meet the best practices advised by CISA and government regulators:

  • Cyber vaulting that provides data recovery and resiliency when it’s needed most: Organizations should always keep separate copies of their critical apps and data as part of a 3-2-1 strategy to build cyber resiliency. With Cohesity FortKnox, included in DataHawk, customers can secure an offsite copy of data in a modern cloud-based cyber vaulting service, where data is kept out of the hands of bad actors via a virtual air gap. Stored data can be recovered from this Cohesity-managed cloud vault back to the original source location or alternate targets, including the public cloud.

For more product information and demos, please visit

Written by

Robert Shields

Director Product Marketing, Data Security and Governance
