Async Rust never left the MVP state

Rust's async runtime has not progressed beyond a minimum viable product (MVP) state, failing to deliver scalable concurrency despite years of development. The ecosystem remains fragmented, with competing libraries like async-std and tokio, and the lack of a unified async API has stalled adoption in high-performance systems programming.

Overview

Rust's async model was designed to provide zero-cost abstractions for concurrent I/O, but in practice, it introduces significant binary bloat, particularly on resource-constrained platforms like microcontrollers. The compiler generates overly complex state machines for async functions, leading to inefficiencies that are less noticeable on desktops or servers but critical in embedded systems. These issues stem from fundamental design choices in how futures and state machines are implemented.

Key Problems and Optimizations

  1. State Machine Bloat. Every async function in Rust compiles into a state machine with at least three default states: Unresumed, Returned, and Panicked. Polling a future that has already returned panics, and that panic path adds overhead. A proposed optimization replaces the panic with returning Poll::Pending in release builds, reducing binary size by 2-5% in embedded firmware.
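
    As a rough, hand-written illustration (a sketch only: the names are invented and rustc's real lowering is anonymous and more involved), here is the shape of such a state machine for a hypothetical async function with a single await, including the panic-on-repoll that the proposal would relax in release builds:

    use std::future::{ready, Future, Ready};
    use std::mem;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    // Hand-written stand-in for the state machine of a hypothetical
    // async fn with one `.await` in its body. Illustrative only.
    enum NextValue {
        Unresumed,                    // created but never polled
        Suspended { fut: Ready<u8> }, // parked at the single `.await`
        Returned,                     // finished; polling again panics today
        Panicked,                     // the body unwound during an earlier poll
    }

    impl Future for NextValue {
        type Output = u8;

        fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u8> {
            let this = self.get_mut(); // fine here: NextValue is Unpin
            loop {
                // Take the state out, leaving Panicked behind so an unwind
                // mid-poll poisons the future.
                match mem::replace(this, NextValue::Panicked) {
                    NextValue::Unresumed => {
                        *this = NextValue::Suspended { fut: ready(7) };
                    }
                    NextValue::Suspended { mut fut } => match Pin::new(&mut fut).poll(cx) {
                        Poll::Ready(byte) => {
                            *this = NextValue::Returned;
                            return Poll::Ready(byte + 1);
                        }
                        Poll::Pending => {
                            *this = NextValue::Suspended { fut };
                            return Poll::Pending;
                        }
                    },
                    // Today this arm panics, and the panic machinery contributes
                    // to binary size; the proposal would return Poll::Pending
                    // here in release builds instead.
                    NextValue::Returned => panic!("`NextValue` polled after completion"),
                    NextValue::Panicked => panic!("`NextValue` polled after a panic"),
                }
            }
        }
    }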

  2. Unnecessary States for Simple Futures. Async blocks that contain no await still generate full state machines, even though they could simply return Poll::Ready on every poll. This adds roughly 0.2% binary size overhead. A compiler optimization that eliminates these states for trivial futures could yield modest but worthwhile improvements.
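
    A minimal hand-written sketch of what the proposed lowering for an await-free async block amounts to (illustrative only, not the compiler's actual output): no state tracking at all, just Poll::Ready on every poll.

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};

    // Stand-in for a trivial `async { 42 }` under the proposed optimization.
    struct AlwaysReady;

    impl Future for AlwaysReady {
        type Output = i32;

        fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<i32> {
            // No Unresumed/Returned/Panicked bookkeeping and no panic path:
            // every poll simply reports the result again.
            Poll::Ready(42)
        }
    }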

  3. Lack of Future Inlining. Futures are never inlined by the compiler, leading to nested state machines that degrade performance. For example:

    async fn foo(blah: bool) -> i32 { /* ... */ }
    async fn bar(input: u32) -> i32 {
        let blah = input > 10;
        foo(blah).await * 2
    }
    

    The compiler currently generates separate state machines for foo and bar, even though bar could absorb foo's states into its own machine. Hand-written implementations of the same logic show that flattening this nesting reduces complexity significantly.
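
    A rough structural sketch of the difference (the type names here are invented, the real generated types are anonymous, and foo is assumed to suspend once internally purely for illustration):

    // Today, conceptually: bar's machine wraps foo's whole machine inside the
    // variant for `foo(blah).await`.
    enum FooState {
        Unresumed { blah: bool },
        SuspendedInFoo,                // foo parked at its own await point
        Returned,
        Panicked,
    }

    enum BarState {
        Unresumed { input: u32 },
        AwaitingFoo { foo: FooState }, // a second, nested state machine
        Returned,
        Panicked,
    }

    // With inlining, foo's states would be folded directly into bar's machine,
    // so a single flat enum drives the whole computation.
    enum BarStateInlined {
        Unresumed { input: u32 },
        SuspendedInFoo { blah: bool }, // foo's suspension point, now a bar state
        Returned,
        Panicked,
    }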

  4. Duplicate States in Match Arms. Code like this:

    match get_command() {
        CommandId::A => send_response(123).await,
        CommandId::B => send_response(456).await,
    }
    

    generates a separate but essentially identical state for each awaited arm. Refactoring the code to compute the response first and then await once collapses these states, reducing the MIR (Mid-Level IR) output from 456 to 302 lines.
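
    One way to express that refactor (a sketch reusing the helpers from the snippet above):

    // Pick the response value first, then await once: the match itself now
    // contains no suspension points, so only a single await state is generated.
    let response = match get_command() {
        CommandId::A => 123,
        CommandId::B => 456,
    };
    send_response(response).await;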

  5. LLVM Limitations. While LLVM can optimize simple futures at opt-level=3, it struggles with complex or deeply nested async code, especially when optimizing for size (e.g., in embedded or WASM targets). Relying on LLVM to clean up inefficient MIR is therefore unreliable.

Proposed Compiler Improvements

The author has submitted a Project Goal to address these issues, including:

  • Removing panics in the Returned state in release builds.
  • Eliminating state machines for async blocks without await.
  • Implementing future inlining for single-await futures.
  • Collapsing duplicate states in match arms.

Early tests show these changes could improve performance by ~3% on x86 and reduce binary size by 2-5% on embedded systems. However, real-world benchmarks are needed to validate the impact.

Tradeoffs

  • Debug vs. Release Builds: Some optimizations (e.g., removing panics) would only apply to release builds to preserve debuggability.
  • Executor Compliance: Optimizations like always returning Poll::Ready for trivial futures could break non-compliant executors, though such cases are rare.
  • Funding: The proposed work requires €30k in funding, with flexible scope for partial implementation.

When to Use It

These optimizations are most relevant for:

  • Embedded systems or WASM targets where binary size is critical.
  • High-performance applications where nested async code degrades performance.
  • Projects using async Rust for abstraction-heavy patterns (e.g., trait implementations).

For now, developers can mitigate bloat by manually refactoring code to avoid duplicate states or unnecessary await points, but compiler-level fixes are needed for systemic improvements.

Bottom Line

Rust's async runtime remains a work in progress, with significant room for optimization. While the current MVP state suffices for many use cases, addressing these inefficiencies could unlock Rust's potential in performance-critical domains. The proposed compiler improvements offer a path forward, but require community and financial support to materialize.
