Hardening Firefox with Claude Mythos Preview

Mozilla's Firefox team paired agentic AI harnesses, including Anthropic's Claude Mythos Preview, with its existing fuzzing infrastructure to surface latent security bugs at scale, and has now described how that pipeline was built and what it found.

Mozilla has published a detailed technical post explaining how it used AI agentic harnesses, including Anthropic's Claude Mythos Preview, to find and fix 271 security bugs in Firefox 150. The post, written by Firefox engineers Brian Grinstead, Christian Holler, and Frederik Braun, describes the evolution from early, high-false-positive LLM code audits to a scalable pipeline that dynamically tests hypotheses and produces reproducible test cases.

Overview

Two weeks ago, Mozilla announced it had fixed an unprecedented number of latent security bugs with the help of Claude Mythos Preview and other AI models. The new post provides the technical background: how the models were harnessed, what bugs were found, and what the project learned about scaling AI-assisted security work.

How the pipeline works

Mozilla's early experiments with models like GPT-4 and Sonnet 3.5 showed promise but suffered from a high rate of false positives. The breakthrough came with agentic harnesses that can create and run reproducible test cases to dynamically test hypotheses about bugs in code.
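The idea of dynamically testing a hypothesis can be illustrated with a minimal sketch. This is not Mozilla's harness: the `Hypothesis` shape, the use of a Python snippet as a stand-in testcase, and the "non-zero exit means reproduced" rule are all assumptions made for illustration. The point is only that a claim without a reproducing testcase is dropped, which is how this style of harness cuts false positives.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """A model-proposed bug plus the testcase meant to trigger it (hypothetical shape)."""
    summary: str
    testcase_source: str  # stand-in: a Python snippet that exits non-zero if the bug is real


def reproduces(hypothesis: Hypothesis, timeout_s: float = 10.0) -> bool:
    """Run the testcase in a child process; a non-zero exit counts as a reproduction."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", hypothesis.testcase_source],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # assumption: hangs are not treated as confirmed bugs here
    return result.returncode != 0


def confirmed(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Keep only the hypotheses whose testcase actually reproduces."""
    return [h for h in hypotheses if reproduces(h)]
```

A real harness would run the browser build under a sanitizer rather than a Python snippet, but the filtering step would have the same shape.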

After receiving an initial set of issues from Anthropic in February, Mozilla built its own harness atop existing fuzzing infrastructure. Initial small-scale experiments used Claude Opus 4.6 to hunt for sandbox escapes. The team supervised the process in the terminal, tuning prompts and logic, then parallelized jobs across multiple ephemeral VMs.
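As a rough sketch of the fan-out step, the snippet below runs one job per target concurrently and collects the results. Threads stand in for the ephemeral VMs mentioned above, and `run_jobs`, `job`, and the target names are hypothetical; the source does not describe how Mozilla schedules its jobs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def run_jobs(
    targets: Iterable[str],
    job: Callable[[str], list[str]],
    max_workers: int = 4,
) -> dict[str, list[str]]:
    """Run `job` once per target in parallel and map each target to its findings."""
    target_list = list(targets)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with target_list
        results = list(pool.map(job, target_list))
    return dict(zip(target_list, results))
```

Each `job` call would, in practice, provision an isolated environment, run the harness against one target, and return whatever findings survived validation.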

The full pipeline integrates with Mozilla's security bug lifecycle: determining what to look for, where to look, deduplicating against known issues, tracking bugs, triaging, and shipping fixes. While the harness may be reusable across projects, the pipeline is project-specific, reflecting each codebase's semantics, tooling, and processes.
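The deduplication stage named above might look something like the following sketch. The `Finding` fields and the choice of (component, crash signature) as the identity key are assumptions; the source does not say how Mozilla matches new reports against known issues.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A candidate bug report (hypothetical fields)."""
    component: str
    crash_signature: str
    summary: str


def dedupe(findings: list[Finding], known: set[tuple[str, str]]) -> list[Finding]:
    """Drop findings that match a known issue or an earlier finding in the same batch."""
    seen = set(known)  # copy so the caller's set is not mutated
    fresh = []
    for finding in findings:
        key = (finding.component, finding.crash_signature)
        if key in seen:
            continue
        seen.add(key)
        fresh.append(finding)
    return fresh
```

Only the findings that survive this step would move on to tracking and triage.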

What the models found

Mozilla made a calculated decision to unhide a small sample of the bug reports behind the fixes. The bugs span a range of browser subsystems:

  • An incorrect equality check in the JIT that could optimize away initialization of a live WebAssembly GC struct, creating a fake-object primitive with potential arbitrary read/write.
  • A 15-year-old bug in the <legend> element triggered by orchestration of edge cases across recursion stack depth limits, expando properties, and cycle collection.
  • A race condition over IPC allowing a compromised content process to manipulate IndexedDB refcounts in the parent, triggering a use-after-free (UAF) and potential sandbox escape.
  • A raw NaN crossing an IPC boundary masquerading as a tagged JS object pointer, turning double deserialization into a parent-process fake-object primitive.
  • A 20-year-old XSLT bug involving reentrant key() calls causing a hash table rehash that frees its backing store while a raw entry pointer is still in use.
  • An escape from Mozilla's in-process sandboxing technology for third-party libraries (RLBox) by leveraging a gap in verification logic.
  • An extremely small testcase exploiting special rowspan=0 semantics in HTML tables by appending >65535 rows to bypass clamping and overflow a 16-bit layout bitfield.
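The last item in the list turns on a general property of fixed-width fields: a value that skips the clamp wraps around instead of saturating. The sketch below illustrates that arithmetic only; it is not Firefox's layout code, and the function names are made up for the example.

```python
UINT16_MAX = 0xFFFF  # largest value a 16-bit unsigned field can hold (65535)


def store_in_16_bit_field(value: int) -> int:
    """Mimic writing to a 16-bit unsigned bitfield: higher bits are silently dropped."""
    return value & UINT16_MAX


def clamp_to_field(value: int) -> int:
    """The guard that is supposed to run before the store."""
    return min(value, UINT16_MAX)


# With the clamp, an oversized count saturates at the maximum.
safe = store_in_16_bit_field(clamp_to_field(65536))

# If some code path bypasses the clamp, the same count wraps to a small number.
wrapped = store_in_16_bit_field(65536)
```

Here `safe` is 65535 while `wrapped` is 0, so any later logic that trusts the stored count would disagree with the real number of rows.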

Many of these bugs are sandbox escapes, which would need to be combined with other exploits for a full-chain Firefox compromise. To search for them, the model was permitted to patch Firefox source code, but only code running in the sandboxed process, which approximates an attacker who has already compromised that process.
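The raw-NaN item in the list above relies on the fact that many distinct 64-bit patterns all count as NaN, so an engine that packs tags into NaN payloads must not accept arbitrary NaN bits from an untrusted sender. The sketch below shows the bit-level check and one common defensive pattern, canonicalizing every NaN to a single pattern. The example payload is hypothetical and does not reflect Firefox's actual value encoding, and the source does not say how Mozilla fixed the bug.

```python
import math
import struct

CANONICAL_NAN_BITS = 0x7FF8000000000000  # the usual quiet-NaN bit pattern
EXPONENT_MASK = 0x7FF0000000000000
MANTISSA_MASK = 0x000FFFFFFFFFFFFF


def is_nan_bits(bits: int) -> bool:
    """IEEE 754 double: exponent all ones and a non-zero mantissa means NaN."""
    return (bits & EXPONENT_MASK) == EXPONENT_MASK and (bits & MANTISSA_MASK) != 0


def canonicalize(bits: int) -> int:
    """Collapse every NaN pattern to one canonical value; leave other doubles alone."""
    return CANONICAL_NAN_BITS if is_nan_bits(bits) else bits


def bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


# Hypothetical attacker-chosen pattern: still a NaN, but carrying a payload.
suspicious_bits = 0xFFFE000012345678
```

`bits_to_float(suspicious_bits)` is an ordinary NaN as far as floating-point code is concerned, which is why a receiver has to inspect or normalize the raw bits rather than rely on the value's type.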

What the models didn't find

Mozilla notes that the models were unable to circumvent Firefox's layered defenses in certain areas. For example, the team saw many attempts to pursue prototype pollution in the privileged parent process, but these were thwarted by an architectural change to freeze those prototypes by default.

Scaling and model upgrades

Once the end-to-end pipeline was in place, swapping in different models became trivial. Mozilla found that model upgrades increase effectiveness across the board: newer models are better at finding potential bugs, creating proof-of-concept test cases, and articulating each bug's pathology and impact.

In addition to the 271 bugs identified by Claude Mythos Preview in Firefox 150, fixes have shipped in 149.0.2, 150.0.1, and 150.0.2. Over 100 people contributed code to the effort. The total number of security bugs fixed in April was 423: 271 from the Claude Mythos Preview pipeline, 41 externally reported, and 111 discovered internally through other means.

Bug severity breakdown

Of the 271 bugs announced for Firefox 150: 180 were rated sec-high, 80 sec-moderate, and 11 sec-low. Mozilla applies security severity ratings from critical to low. Sec-critical and sec-high are assigned to vulnerabilities that can be triggered with normal user behavior like browsing to a web page. Sec-moderate requires unusual and complex steps from the victim.
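The figures quoted in this article can be cross-checked with a few lines of arithmetic. The snippet uses only the numbers stated above; the variable names are illustrative.

```python
# Security bugs fixed in April, by source, as reported above.
april_fixes = {
    "mythos_pipeline": 271,
    "external_reports": 41,
    "internal_other": 111,
}
april_total = sum(april_fixes.values())  # 423

# Severity ratings for the 271 bugs announced for Firefox 150.
severity = {"sec-high": 180, "sec-moderate": 80, "sec-low": 11}
severity_total = sum(severity.values())  # 271

# Share of each rating, as a percentage rounded to one decimal place.
severity_share = {
    rating: round(100 * count / severity_total, 1)
    for rating, count in severity.items()
}

# Share of April's fixes that came from the pipeline.
pipeline_share = round(100 * april_fixes["mythos_pipeline"] / april_total, 1)
```

Both sums match the stated totals. About two thirds of the 271 bugs (66.4%) were rated sec-high, and the pipeline accounts for roughly 64% of April's security fixes.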

Takeaways for other projects

Mozilla recommends that anyone building software start using a harness with a modern model to find bugs today. The initial prompts can be simple: "there is a bug in this part of the code, please find it and build a testcase." Through iteration, orchestration and tooling can be built out to optimize and scale the pipeline.

In the near future, Mozilla intends to integrate this analysis into its continuous integration system to scan patches as they land in the tree.
