On April 29, 2026, a Linux kernel local privilege escalation vulnerability named "Copy Fail" (CVE-2026-31431) was publicly disclosed. Cloudflare's security and engineering teams responded by deploying a surgical, no-reboot mitigation using eBPF, while also rolling out a patched kernel through normal update cycles. There was no customer impact or service disruption.
The Vulnerability
Copy Fail exploits an out-of-bounds write in the Linux kernel's algif_aead module, part of the AF_ALG socket family used for kernel-level cryptography. The bug was introduced in 2017 when in-place crypto operations were optimized without enforcing write boundaries. An unprivileged attacker can use splice() to chain a target file's page cache pages into the crypto scatterlist. The authencesn wrapper then writes 4 bytes past the legitimate output region, tainting the cached file. By targeting /usr/bin/su (a setuid-root binary present on most distributions), an attacker can inject shellcode and escalate to root privileges.
The upstream fix (commit a664bf3d603d) reverts the 2017 optimization. The vulnerability affects Linux kernel versions from 4.14 to 5.10.
Cloudflare's Response
Cloudflare's response involved several parallel workstreams:
- Blast radius mapping: Security and kernel engineers determined which kernel versions were vulnerable and assessed exposure.
- Detection validation: Existing behavioral detection flagged the exploit pattern within minutes during authorized internal testing, without any signature update or rule change.
- Threat hunting: The security team searched fleet-wide logs covering the 48 hours before disclosure for signs of prior exploitation and found none.
- Mitigation engineering: Kernel engineers built a runtime mitigation using bpf-lsm (BPF Linux Security Module).
- Software updates: A patched kernel was built and rolled out through normal reboot automation.
The bpf-lsm Mitigation
Instead of removing the vulnerable algif_aead module (which would break legitimate users of the kernel crypto API), Cloudflare deployed an eBPF program attached to the socket_bind LSM hook. The program:
- Checks if the socket family is AF_ALG.
- If so, checks the calling binary's path against an allow-list of known legitimate users.
- Denies the bind for any binary not on the allow-list.
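The three checks above reduce to a small decision function. The production mitigation is an eBPF program attached through bpf-lsm; the Python below is only a model of its decision logic, not that program. The allow-list entry /usr/sbin/example-crypto-daemon is a hypothetical placeholder (the source does not name the real entries), and 38 is the value of AF_ALG on Linux, used as a fallback where Python does not expose the constant.

```python
import errno
import socket

# AF_ALG is 38 on Linux; fall back to that value where Python does not expose it.
AF_ALG = getattr(socket, "AF_ALG", 38)

# Hypothetical allow-list; the source does not name the real entries.
ALLOWED_BINARIES = frozenset({"/usr/sbin/example-crypto-daemon"})


def decide_bind(family: int, exe_path: str, allowed=ALLOWED_BINARIES) -> int:
    """Return 0 to allow the bind or -EPERM to deny it, as an LSM hook would."""
    if family != AF_ALG:
        return 0  # Not an AF_ALG socket: outside the mitigation's scope.
    if exe_path in allowed:
        return 0  # Known legitimate AF_ALG user.
    return -errno.EPERM  # Any other binary is denied.
```

Returning early for every other socket family keeps the check narrow: only AF_ALG binds are ever evaluated against the allow-list.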
To verify the mitigation, the Copy Fail write-up provides a one-liner:
python3 -c 'import socket; s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0); s.bind(("aead","authencesn(hmac(sha256),cbc(aes))"));'
On a mitigated machine, the bind fails and Python raises PermissionError: [Errno 1] Operation not permitted.
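For fleet checks it can help to separate a denied bind from other failures, since a host without AF_ALG support or without the requested algorithm also raises an error that says nothing about the mitigation. The following is a minimal sketch wrapping the same bind call; the function names and the three status strings are illustrative choices, not part of the original write-up.

```python
import socket


def try_af_alg_bind():
    """Attempt the same AF_ALG bind as the one-liner, then close the socket."""
    with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as s:
        s.bind(("aead", "authencesn(hmac(sha256),cbc(aes))"))


def classify_bind_result(attempt):
    """Run attempt() and map its outcome to a status string."""
    try:
        attempt()
    except PermissionError:
        # EPERM/EACCES: the bind was denied, as expected on a mitigated host.
        return "blocked"
    except (OSError, AttributeError):
        # AF_ALG or the algorithm is unavailable: tells us nothing either way.
        return "inconclusive"
    return "allowed"
```

Calling classify_bind_result(try_af_alg_bind) from a binary that is not on the allow-list should yield "blocked" on a mitigated host; "allowed" means the bind succeeded and the mitigation is not in effect for that caller.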
Rollout Timeline
- April 29, 16:00 UTC: Copy Fail publicly disclosed.
- April 29, evening: First mitigation attempt (module removal) failed in staging due to a dependency conflict; rolled back safely.
- April 29, overnight: bpf-lsm program drafted.
- April 30, morning: bpf-lsm program tested and made production-ready.
- April 30, afternoon: Visibility pipeline (eBPF tracing of AF_ALG socket usage) deployed fleet-wide, confirming only one legitimate AF_ALG user.
- April 30, evening: bpf-lsm mitigation rolled out behind a separate enforcement gate. End-to-end verification confirmed the exploit no longer worked.
- May 4 onward: Patched kernel rolled out through normal reboot automation.
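The visibility step on April 30 can be pictured as an aggregation over trace events, done before any enforcement is switched on. A minimal sketch, assuming each trace event is a dict with an "exe" key holding the path of the binary that opened an AF_ALG socket. This record format is hypothetical, as are the example paths; the source does not describe the pipeline's output.

```python
from collections import Counter


def summarize_af_alg_users(events):
    """Count observed AF_ALG socket uses per executable path."""
    return Counter(event["exe"] for event in events)


def unexpected_users(counts, allow_list):
    """Return observed binaries that enforcement would block, sorted by path."""
    return sorted(exe for exe in counts if exe not in allow_list)


# Hypothetical sample of trace events gathered during the visibility phase.
sample_events = [
    {"exe": "/usr/sbin/example-crypto-daemon"},
    {"exe": "/usr/sbin/example-crypto-daemon"},
    {"exe": "/usr/bin/python3"},
]
sample_counts = summarize_af_alg_users(sample_events)
```

An empty result from unexpected_users for the proposed allow-list is the signal that enforcement can be enabled without breaking a legitimate user.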
Key Takeaways
Cloudflare's response highlights several practical lessons:
- Behavioral detection works: Existing endpoint detection flagged the exploit without vulnerability-specific rules.
- bpf-lsm is a powerful mitigation primitive: It allows surgical, no-reboot fixes for kernel vulnerabilities.
- Visibility matters: eBPF-based tracing of kernel subsystem usage (AF_ALG sockets) confirmed the blast radius and allowed safe mitigation.
- Staged rollouts prevent outages: The mitigation was deployed in two phases, visibility first and enforcement second.
Cloudflare identified areas for improvement: better visibility into kernel-API dependencies, improved bpf-lsm tooling (faster deployments, better playbooks, better logging), and proactive reduction of kernel attack surface by auditing and removing unused modules.
Bottom Line
Copy Fail was a serious local privilege escalation vulnerability, but Cloudflare's response demonstrates that a combination of behavioral detection, eBPF-based runtime mitigation, and disciplined patch management can contain such threats without customer impact. The bpf-lsm approach, which denies the vulnerable code path to all but allow-listed binaries, is a template for handling similar kernel vulnerabilities in the future.