Copy Fail: What Cohesity Administrators Need to Know About CVE-2026-31431

Overview

Cohesity REDLab analyzed CVE-2026-31431, a high-severity Linux kernel local privilege escalation vulnerability publicly known as Copy Fail. Disclosed on April 29, 2026, by researchers at Theori, this flaw allows any unprivileged local user to gain root access on virtually every major Linux distribution shipped since 2017. The vulnerability carries a CVSS 3.x score of 7.8 (High), and a publicly available proof-of-concept exploit has been confirmed to work reliably across Ubuntu, Red Hat Enterprise Linux, Debian, SUSE, Amazon Linux, and AlmaLinux.

Copy Fail affects production Linux systems across the enterprise: application servers, database hosts, Kubernetes nodes, CI/CD runners, file servers, and cloud compute instances. For Cohesity administrators, the primary concern is keeping the data from those production workloads secure, recoverable, and resilient against cyber threats and operational disruptions. Compromised production systems that remain unpatched are backed up as compromised snapshots inside Cohesity Data Cloud. If an attacker has achieved root on a Linux server through Copy Fail and planted backdoors, implants, or modified system binaries, those artifacts will be faithfully captured in your next backup. A clean recovery depends on clean source systems, and Copy Fail puts that assumption at risk across your entire Linux estate.

This advisory explains how Copy Fail works, why it matters for data protection teams, and what Cohesity administrators should do right now — in coordination with their infrastructure and security colleagues — to help identify affected systems, drive patching, and ensure that compromised workloads do not silently propagate through the backup lifecycle.

Recommended Actions for Cohesity Administrators

Copy Fail is a Linux kernel vulnerability, so the fix itself belongs to the teams that own the affected systems. Cohesity administrators nonetheless occupy a unique vantage point in the organization: you have visibility into what is being protected, how often it changes, and whether something looks wrong. The actions below focus on using that vantage point to support the broader remediation effort and to keep your backup data trustworthy.

1. Partner with Infrastructure and Security Teams to Drive Kernel Patching

The definitive fix for Copy Fail is applying vendor-issued kernel updates to affected production Linux systems. While patching is the responsibility of your infrastructure and platform engineering teams, Cohesity administrators can play a valuable coordinating role:

Share your protection source inventory with the infrastructure team. You know which Linux systems are being backed up, and that list is a useful starting point for identifying systems that need patching.
Advocate for prioritizing systems that run in multi-tenant or container environments. Copy Fail can be exploited from within an unprivileged Kubernetes container to compromise the underlying host node, making Kubernetes worker nodes and CI/CD runners especially urgent targets.
Coordinate patching schedules with backup windows. Kernel updates typically require a reboot. Work with infrastructure teams to align patching with your protection schedules so that a clean, post-patch backup is captured as soon as possible after remediation.

2. Help Identify Unpatched Systems Through Your Backup Inventory

Cohesity administrators often have one of the most complete inventories of Linux systems in the organization, because ideally every production workload flows through the backup platform. Use this visibility to support the patching effort:

Export your list of protected Linux sources and share it with the security operations or vulnerability management team. They may have blind spots in their asset inventory that your protection source list can fill.
Flag any protected systems where backup jobs have recently started failing or behaving unexpectedly. While there are many benign causes for job failures, a sudden change in behavior on a Linux host after the Copy Fail disclosure could warrant investigation.
Identify workloads that are running on older Linux kernels (4.14 through 6.19.12, the affected range) if that information is available through your protection source metadata or through coordination with the infrastructure team.
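
A first-pass triage of kernel versions can be scripted. The Python sketch below assumes you have already collected `uname -r` style release strings for each protected host; how you gather them depends on your own tooling, and nothing here calls a Cohesity API. The function names are illustrative. The check compares only the upstream version numbers against the range stated above. Distribution vendors frequently backport fixes without changing the upstream version, so treat a match as "check the vendor advisory for this host," not as proof that the host is vulnerable.

```python
import re

# Affected upstream range as stated in this advisory (inclusive on both ends).
AFFECTED_MIN = (4, 14, 0)
AFFECTED_MAX = (6, 19, 12)


def parse_kernel_release(release: str) -> tuple:
    """Extract (major, minor, patch) from a `uname -r` style string.

    Distribution suffixes such as "-105-generic" are ignored, and a missing
    patch component is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def in_affected_range(release: str) -> bool:
    """Return True if the upstream version falls inside the affected range.

    A True result means "needs a closer look", because vendor kernels may
    already carry a backported fix under the same version number.
    """
    return AFFECTED_MIN <= parse_kernel_release(release) <= AFFECTED_MAX
```

For example, `in_affected_range("5.15.0-105-generic")` returns True, while `in_affected_range("6.19.13")` returns False.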

3. Scan Existing Snapshots for Signs of Compromise

If Copy Fail has already been exploited on a production system before patching, the evidence of that compromise may be present in your backup snapshots. Use Cohesity's threat detection capabilities to look for indicators:

Rapid Threat Hunt: Run a scan using the Cohesity default threat library across recent backup snapshots of Linux workloads. Look for indicators associated with privilege escalation tooling, post-exploitation frameworks (such as Sliver or Metasploit payloads), cryptocurrency miners, or unexpected files in directories like /tmp, /usr/lib, or /var/run.
Anti-Ransomware scan: Perform a global Anti-Ransomware scan to detect unusual data churn, entropy spikes, or unexpected file modifications within recent snapshots. Attackers who have gained root through Copy Fail may stage additional tooling or begin exfiltrating data before launching a destructive attack.
Anomaly detection: Review anomaly alerts for protection jobs that show sudden changes in data change rates or backup durations. A spike in changed data on a Linux host that has not undergone a planned update could indicate unauthorized modifications.

These scans serve a dual purpose: they help identify already-compromised systems, and they establish a baseline so that you can distinguish between clean pre-exploit snapshots and potentially tainted post-exploit ones during a future recovery scenario.
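
If you also export per-run job statistics for your own review, the idea behind the change-rate check described above can be approximated with a simple heuristic. The Python sketch below is not how Cohesity anomaly detection works internally. It is an illustrative baseline comparison, and the function name, the seven-run window, and the 3x threshold are all assumptions to tune for your environment. It flags any run whose changed data exceeds a multiple of the median of the preceding runs.

```python
from statistics import median


def flag_change_spikes(changed_bytes, window=7, factor=3.0):
    """Return the indexes of backup runs with an unusually large change size.

    changed_bytes: changed-data size per run, oldest first (any consistent unit).
    window:        number of preceding runs used as the baseline (assumed value).
    factor:        multiple of the baseline median that counts as a spike (assumed value).
    """
    flagged = []
    for i in range(window, len(changed_bytes)):
        baseline = median(changed_bytes[i - window:i])
        # Skip runs with a zero baseline to avoid flagging every non-empty run.
        if baseline > 0 and changed_bytes[i] > factor * baseline:
            flagged.append(i)
    return flagged
```

With a history of `[10, 12, 11, 9, 10, 13, 11, 12, 95, 11]`, the function returns `[8]`, the run with 95 units of changed data. A flagged run is only a prompt to ask whether a planned update explains the change.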

4. Ensure Recovery Readiness by Identifying Your Last Known-Good Snapshots

In the event that a production system is confirmed compromised through Copy Fail, the recovery conversation will immediately turn to your backup data. Prepare for that conversation now:

For critical Linux workloads, identify the most recent snapshot taken before the Copy Fail proof-of-concept became publicly available on April 29, 2026. These snapshots represent your most reliable known-good recovery points for systems where compromise timing is uncertain.
Confirm that retention policies on these pre-disclosure snapshots will keep them available long enough for the organization to complete its patching and investigation cycle. If necessary, apply extended retention or legal hold to preserve these recovery points.
Validate that DataLock (WORM) is enabled on critical backup policies so that these known-good snapshots cannot be modified or deleted, even if an attacker has compromised credentials elsewhere in the environment.
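
The selection logic in the steps above can be sketched in a few lines of Python. The `Snapshot` record and both function names are hypothetical and do not correspond to any Cohesity API type; the sketch assumes you have exported snapshot timestamps and expiry times into such records. Using midnight UTC on April 29, 2026 as the cutoff is also an assumption, so choose an earlier cutoff if you want more margin.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Disclosure date from this advisory; midnight UTC is an assumed cutoff.
DISCLOSURE = datetime(2026, 4, 29, tzinfo=timezone.utc)


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical export record, not a Cohesity API type."""
    source: str
    taken_at: datetime
    expires_at: datetime


def last_pre_disclosure(snapshots, cutoff=DISCLOSURE):
    """Map each source to its newest snapshot taken strictly before the cutoff."""
    best = {}
    for snap in snapshots:
        if snap.taken_at >= cutoff:
            continue  # taken on or after disclosure, so not a pre-disclosure point
        current = best.get(snap.source)
        if current is None or snap.taken_at > current.taken_at:
            best[snap.source] = snap
    return best


def expiring_before(candidates, needed_until):
    """List sources whose selected snapshot expires before it is still needed."""
    return sorted(
        snap.source for snap in candidates.values() if snap.expires_at < needed_until
    )
```

Sources returned by `expiring_before` are the ones where extended retention or a legal hold is worth discussing before the recovery point ages out.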

If you are using Cohesity FortKnox for cyber vaulting, verify that recent vault copies are current and that vault access policies are enforced. An isolated copy provides a recovery path even in a worst-case scenario.

5. Establish a Post-Patch Backup Cadence

Once the infrastructure team has patched a production system, it is important to capture a clean backup promptly:

Coordinate with infrastructure teams to trigger an on-demand backup immediately after a system has been patched and rebooted. This ensures that you have a fresh, known-clean snapshot that reflects the remediated state.
Consider running a Rapid Threat Hunt scan against the first post-patch snapshot to confirm that the system was not compromised prior to patching and that no residual attacker artifacts remain.
For systems where compromise is suspected but not confirmed, discuss with the security team whether a clean rebuild followed by a fresh backup is more appropriate than patching in place.

6. Communicate the Backup Team's Role in the Response

Vulnerability response is often led by security operations and infrastructure teams, and the backup team may not be included in early coordination calls. Be proactive:

Reach out to your security operations center or vulnerability management team and offer the visibility that Cohesity provides — asset inventory, anomaly detection, and the ability to scan snapshots for indicators of compromise.
Make clear that the backup team needs to be in the loop on patching timelines so that backup schedules can be aligned, known-good snapshots can be preserved, and post-patch backups can be captured.
If your organization runs tabletop exercises or incident response drills, advocate for including a Copy Fail scenario that tests the recovery workflow from backup, including the process of identifying the correct pre-compromise snapshot and validating its integrity before restoring.

Technical Details of Copy Fail (CVE-2026-31431)

Root Cause

Copy Fail is a deterministic logic flaw in the Linux kernel's cryptographic subsystem. The vulnerability resides in the algif_aead module, which implements the AEAD (Authenticated Encryption with Associated Data) socket interface within the kernel's user-space crypto API, known as AF_ALG.

The flaw resulted from the convergence of three independent kernel changes introduced over several years: the addition of the authencesn algorithm in 2011, AF_ALG gaining AEAD support in 2015, and a performance optimization in 2017 that introduced an in-place operation mode for AEAD encryption. This 2017 optimization caused the source and destination buffers to share a single scatterlist, which meant that page cache pages provided by the splice() system call were improperly placed in a writable position within the destination scatterlist.

During cryptographic operations, the authencesn algorithm uses the caller's destination buffer as a scratch pad to rearrange extended sequence numbers. Because of the shared scatterlist, this scratch-pad operation writes four controlled bytes past the legitimate output region, directly into the kernel's file page cache. Critically, the algorithm fails to restore those bytes afterward.
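
The buffer layout problem can be illustrated with a toy model. The Python sketch below is not kernel code and performs no cryptography or system calls. It represents a scatterlist as a plain list of byte buffers, and every name in it is invented for illustration. It shows only the structural point made above: when a cached page sits directly after the output region in a single writable list, a four-byte scratch write past the output lands in that page, whereas with a separate destination list the same write lands in a private tail buffer.

```python
def sg_write(segments, offset, data):
    """Write data at a logical offset across a list of bytearrays (toy scatterlist)."""
    for byte in data:
        pos = offset
        for seg in segments:
            if pos < len(seg):
                seg[pos] = byte
                break
            pos -= len(seg)
        else:
            raise IndexError("write past end of scatterlist")
        offset += 1


def scratch_step(dst_segments, output_len, seq_low):
    """Toy stand-in for the scratch-pad write: four bytes just past the output
    region, never restored afterward."""
    sg_write(dst_segments, output_len, seq_low)


def demo():
    seq_low = b"\xde\xad\xbe\xef"  # stands in for the four caller-chosen bytes

    # Shared layout: the cached page is part of the writable destination list.
    out = bytearray(16)
    page = bytearray(b"ORIGINAL-PAGE")
    scratch_step([out, page], len(out), seq_low)

    # Separated layout: the destination has its own tail, the page is source only.
    out_fixed = bytearray(16)
    tail = bytearray(4)
    page_fixed = bytearray(b"ORIGINAL-PAGE")
    scratch_step([out_fixed, tail], len(out_fixed), seq_low)

    return bytes(page), bytes(page_fixed)
```

In this model, `demo()` returns a first value whose leading four bytes have been overwritten and a second value that is still `b"ORIGINAL-PAGE"`.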

Exploitation Mechanism

An unprivileged attacker exploits this by opening an AF_ALG socket, using splice() to pass page cache pages of a target file (such as /usr/bin/su) into the crypto subsystem, and then triggering the four-byte overwrite. The attacker controls the exact value written by specifying the low half of the sequence number in the associated data (AAD) supplied with the sendmsg() call. The attacker controls where the overwrite lands by manipulating the splice offset, splice length, and associated data length parameters.

By targeting the page cache of a setuid-root binary, the attacker can inject a small shellcode payload into the binary's in-memory representation. When that binary is subsequently executed, it runs with root privileges and executes the attacker's code.

Why Copy Fail Is Especially Severe

Several characteristics distinguish Copy Fail from prior Linux privilege escalation vulnerabilities:

Deterministic and race-free. Unlike Dirty COW or Dirty Pipe, Copy Fail does not rely on winning a race condition. The exploit fires successfully on the first attempt, every time.
Highly portable. The entire exploit fits in a 732-byte Python script that uses only standard libraries and runs unmodified across all major Linux distributions shipping kernel version 4.14 (2017) or later.
Stealthy. The corruption occurs entirely in the page cache in RAM. The on-disk binary remains unmodified, which means traditional file integrity monitoring tools will not detect the change. Once the page is evicted from the cache or the system reboots, the evidence disappears. However, any persistent changes an attacker makes after gaining root — dropped binaries, modified configurations, planted backdoors — will be written to disk and captured in subsequent backups.
Container-escapable. Because the kernel and its page cache are shared across an entire host, an attacker running inside an unprivileged container can use Copy Fail to modify a setuid binary on the host, effectively escaping the container and gaining root on the node.

Interim Mitigation for Affected Production Systems

The definitive fix is applying vendor-issued kernel updates. If patching is not immediately possible, the infrastructure team can disable the affected algif_aead kernel module. Refer to your distribution vendor's advisory for the exact mitigation steps. CERT-EU has also published mitigation guidance on its website.

Important caveat: On some distributions, the algif_aead module is compiled directly into the kernel (CONFIG_CRYPTO_USER_API_AEAD=y) rather than loaded as a separate module. In that case, modprobe.d rules have no effect and rmmod cannot unload it. Infrastructure teams should verify whether the module is built-in on each distribution in use. For built-in configurations, kernel command-line parameters or a patched kernel are the only reliable mitigations.
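
Checking the build mode is straightforward when the kernel configuration is available as text. The Python sketch below classifies the CONFIG_CRYPTO_USER_API_AEAD setting from config contents. The function names are illustrative, and the /boot/config-<release> location is a common convention rather than a guarantee; some systems expose the configuration elsewhere or not at all, in which case the helper returns None.

```python
import platform
from pathlib import Path
from typing import Optional

OPTION = "CONFIG_CRYPTO_USER_API_AEAD"


def aead_interface_build_mode(config_text: str) -> str:
    """Classify the option as 'built-in' (=y), 'module' (=m), or 'not set'."""
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(OPTION + "="):
            value = line.split("=", 1)[1]
            if value == "y":
                return "built-in"
            if value == "m":
                return "module"
    # Covers both "# CONFIG_... is not set" and an absent option.
    return "not set"


def read_running_kernel_config() -> Optional[str]:
    """Read /boot/config-<release> for the running kernel, if it exists.

    The path is an assumption that holds on many, but not all, distributions.
    """
    path = Path(f"/boot/config-{platform.release()}")
    try:
        return path.read_text()
    except OSError:
        return None
```

A result of 'built-in' means module blocklisting will not help on that host, and a kernel command-line change or a patched kernel is needed instead.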

Disabling algif_aead does not affect dm-crypt/LUKS, kTLS, IPsec/XFRM, OpenSSL, GnuTLS, NSS, or SSH, so it should not disrupt normal application or backup agent operations on production hosts.
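
One way to confirm that the AEAD socket interface is no longer reachable after the module has been disabled is to attempt a harmless bind from an unprivileged account. The Python sketch below uses the standard library socket module, which exposes AF_ALG on Linux. The function name and the choice of gcm(aes) as the probe algorithm are assumptions. On a host where the module has not been blocked, the bind attempt may itself cause the kernel to load algif_aead, so run it only where that side effect is acceptable. A True result means the interface is reachable, not that the kernel is vulnerable, because patched kernels still expose it.

```python
import socket
from typing import Optional


def aead_socket_available(algorithm: str = "gcm(aes)") -> Optional[bool]:
    """Probe whether an AF_ALG socket of type "aead" can be bound.

    Returns True if the bind succeeds, False if the kernel refuses it, and
    None if this Python build or operating system has no AF_ALG support.
    No data is sent and no cryptographic operation is performed.
    """
    if not hasattr(socket, "AF_ALG"):
        return None
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as sock:
            sock.bind(("aead", algorithm))
        return True
    except OSError:
        # Raised when the address family, socket type, or algorithm is unavailable.
        return False
```

After a working mitigation, the expected result on the host is False.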

The upstream fix (commit a664bf3d603d) resolves the issue by reverting the 2017 optimization, separating the source and destination scatterlists so that page cache pages remain strictly in the read-only source buffer.

Conclusion

Copy Fail is a reminder that a single vulnerability in the Linux kernel can ripple across the entire production environment — and that the consequences extend into the backup lifecycle. Every unpatched Linux system that gets backed up carries the risk of preserving an attacker's foothold alongside the legitimate data your organization depends on for recovery.

Cohesity administrators are uniquely positioned to contribute to the response. You maintain one of the most complete inventories of production Linux systems in the organization. You have tools to scan snapshots for indicators of compromise. You control the retention policies that determine whether a known-good recovery point will still be available when it is needed. And you can coordinate with infrastructure and security teams to ensure that patching and backup schedules work together rather than at cross-purposes.

The work here is collaborative. Patching belongs to the infrastructure team. Threat investigation belongs to the security team. But ensuring that backup data remains clean, trustworthy, and recoverable — that is the data protection team's contribution to the response, and it is essential.

For the latest advisories and technical details, visit Cohesity REDLab.