Ondavox | Cloudflare responded to the "Copy Fail" Linux vulnerability

Ben C (AI-assisted) May 7, 2026 1 min read EN

Based on reporting from Source.

On April 29, 2026, a Linux kernel local privilege escalation vulnerability named "Copy Fail" (CVE-2026-31431) was publicly disclosed. Cloudflare's security and engineering teams responded by deploying a surgical, no-reboot mitigation using eBPF, while also rolling out a patched kernel through normal update cycles. There was no customer impact or service disruption.

The Vulnerability

Copy Fail exploits an out-of-bounds write in the Linux kernel's algif_aead module, part of the AF_ALG socket family used for kernel-level cryptography. The bug was introduced in 2017 when in-place crypto operations were optimized without enforcing write boundaries. An unprivileged attacker can use splice() to chain a target file's page cache pages into the crypto scatterlist. The authencesn wrapper then writes 4 bytes past the legitimate output region, corrupting the in-memory cached copy of the file. By targeting /usr/bin/su (a setuid-root binary present on most distributions), an attacker can inject shellcode and escalate to root privileges.

The upstream fix (commit a664bf3d603d) reverts the 2017 optimization. The vulnerability affects Linux kernel versions from 4.14 to 5.10.
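As a rough aid for the blast radius question, the version range stated above can be turned into a first-pass check. This is a minimal sketch: the helper names are illustrative, the range is treated as inclusive (an assumption), and a version number alone cannot settle exposure, because distribution kernels often carry backported fixes.

```python
import platform
import re

# Range as stated in the text; treated as inclusive here (an assumption).
REPORTED_FIRST = (4, 14)
REPORTED_LAST = (5, 10)


def parse_major_minor(release):
    """Extract (major, minor) from a release string such as '5.4.0-150-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def in_reported_range(release):
    """True if the upstream version number falls inside the reported range."""
    return REPORTED_FIRST <= parse_major_minor(release) <= REPORTED_LAST


def running_kernel_in_reported_range():
    """Apply the check to the kernel this interpreter is running on."""
    return in_reported_range(platform.release())
```

A True result only means the host deserves a closer look; confirming whether the upstream fix is present still requires checking the distribution's changelog or the kernel source.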

Cloudflare's Response

Cloudflare's response involved several parallel workstreams:

Blast radius mapping: Security and kernel engineers determined which kernel versions were vulnerable and assessed exposure.
Detection validation: Existing behavioral detection flagged the exploit pattern within minutes during authorized internal testing, without any signature update or rule change.
Threat hunting: The security team searched fleet-wide logs covering the 48 hours before disclosure for signs of prior exploitation and found none.
Mitigation engineering: Kernel engineers built a runtime mitigation using bpf-lsm (BPF Linux Security Module).
Software updates: A patched kernel was built and rolled out through normal reboot automation.

The bpf-lsm Mitigation

Instead of removing the vulnerable algif_aead module (which would break legitimate kernel crypto API users), Cloudflare deployed an eBPF program that hooks the socket_bind LSM hook. The program:

Checks if the socket family is AF_ALG.
If so, checks the calling binary's path against an allow-list of known legitimate users.
Denies the bind for any binary not on the allow-list.
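The three steps above can be modeled in a few lines of userspace Python. This is only a sketch of the decision logic, not the eBPF program itself, which runs in the kernel attached to the LSM hook. The function name and the allow-list entry are hypothetical, since the source does not name the legitimate AF_ALG user.

```python
import errno
import socket

# AF_ALG is 38 on Linux; fall back to that value where Python lacks the constant.
AF_ALG = getattr(socket, "AF_ALG", 38)

# Hypothetical allow-list entry; the source does not name the legitimate binary.
ALLOWED_BINARIES = frozenset({"/usr/local/bin/example-crypto-daemon"})


def socket_bind_decision(family, exe_path, allowed=ALLOWED_BINARIES):
    """Return 0 to allow the bind, or -EPERM to deny it."""
    if family != AF_ALG:
        return 0  # step 1: other socket families are left alone
    if exe_path in allowed:
        return 0  # step 2: known legitimate AF_ALG user
    return -errno.EPERM  # step 3: everything else is denied
```

Returning a negative errno mirrors the convention LSM hooks use to reject an operation, which is why a denied caller sees "Operation not permitted".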

To verify the mitigation, the Copy Fail write-up provides a one-liner:

python3 -c 'import socket; s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0); s.bind(("aead","authencesn(hmac(sha256),cbc(aes))"));'

On a mitigated machine, the bind fails and Python raises PermissionError: [Errno 1] Operation not permitted.
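For use in a fleet-wide check, the same probe can be wrapped so that it reports a status instead of a traceback. The sketch below makes the same bind call as the one-liner; the function names and status strings are illustrative choices, not part of the original write-up.

```python
import errno
import socket

ALG_TYPE = "aead"
ALG_NAME = "authencesn(hmac(sha256),cbc(aes))"


def try_af_alg_bind():
    """Attempt the bind from the one-liner; return the exception raised, or None."""
    family = getattr(socket, "AF_ALG", None)
    if family is None:
        # Non-Linux platform or a Python build without AF_ALG support.
        return OSError(errno.EAFNOSUPPORT, "AF_ALG is not available")
    try:
        with socket.socket(family, socket.SOCK_SEQPACKET, 0) as sock:
            sock.bind((ALG_TYPE, ALG_NAME))
    except OSError as exc:
        return exc
    return None


def classify(exc):
    """Map the probe outcome to a short status string."""
    if exc is None:
        return "allowed"
    if exc.errno == errno.EPERM:
        return "denied"
    return "inconclusive"  # e.g. the algorithm or address family is unavailable


if __name__ == "__main__":
    print(classify(try_af_alg_bind()))
```

A "denied" result matches the mitigated behavior described above. An "allowed" result does not by itself prove a host is exploitable: the kernel may already be patched, or the probing binary may be on the allow-list.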

Rollout Timeline

April 29, 16:00 UTC: Copy Fail publicly disclosed.
April 29, evening: First mitigation attempt (module removal) failed in staging due to a dependency conflict; rolled back safely.
April 29, overnight: bpf-lsm program drafted.
April 30, morning: bpf-lsm program tested and made production-ready.
April 30, afternoon: Visibility pipeline (eBPF tracing of AF_ALG socket usage) deployed fleet-wide, confirming only one legitimate AF_ALG user.
April 30, evening: bpf-lsm mitigation rolled out behind a separate enforcement gate. End-to-end verification confirmed the exploit no longer worked.
May 4 onward: Patched kernel rolled out through normal reboot automation.

Key Takeaways

Cloudflare's response highlights several practical lessons:

Behavioral detection works: Existing endpoint detection flagged the exploit without vulnerability-specific rules.
bpf-lsm is a powerful mitigation primitive: It allows surgical, no-reboot fixes for kernel vulnerabilities.
Visibility matters: eBPF-based tracing of kernel subsystem usage (AF_ALG sockets) confirmed the blast radius and allowed safe mitigation.
Staged rollouts prevent outages: The mitigation was deployed in two phases, with visibility first and enforcement second.
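The visibility-then-enforcement pattern can be illustrated with a toy model. This sketch folds both phases into one hypothetical class for brevity; in the response described here, visibility came from a separate eBPF tracing pipeline and enforcement sat behind its own gate. The class name, method names, and paths are all illustrative.

```python
from collections import Counter


class AfAlgGate:
    """Toy model of a two-phase rollout: observe first, enforce later."""

    def __init__(self, allowed=(), enforce=False):
        self.allowed = set(allowed)
        self.enforce = enforce
        self.seen = Counter()  # exe_path -> number of AF_ALG binds observed

    def on_bind(self, exe_path):
        """Record the caller and return True if the bind may proceed."""
        self.seen[exe_path] += 1
        if not self.enforce:
            return True  # observe-only phase: never block, just count
        return exe_path in self.allowed

    def promote(self, reviewed):
        """Switch to enforcement once the observed callers have been reviewed."""
        self.allowed = set(reviewed)
        self.enforce = True
```

Running in observe mode first produces the list of real callers (`gate.seen`), so the allow-list passed to `promote` is built from evidence rather than guesswork, and enforcement cannot break a caller nobody knew about.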

Cloudflare identified areas for improvement: better visibility into kernel-API dependencies, improved bpf-lsm tooling (faster deployments, better playbooks, better logging), and proactive reduction of kernel attack surface by auditing and removing unused modules.

Bottom Line

Copy Fail was a serious local privilege escalation vulnerability, but Cloudflare's response demonstrates that a combination of behavioral detection, eBPF-based runtime mitigation, and disciplined patch management can contain such threats without customer impact. The bpf-lsm approach, which denies the vulnerable code path to all but allow-listed binaries, is a template for handling similar kernel vulnerabilities in the future.
