Tracking issue for improving std::sync::{Mutex, RwLock, Condvar}

A few weeks ago, on January 12th, the @rust-lang/libs team discussed a plan to improve `std::sync::Mutex`, `std::sync::Condvar`, and `std::sync::RwLock`.

On most platforms, these structures are currently wrappers around their `pthread` equivalent, such as `pthread_mutex_t`. These types are not movable, however, forcing us to wrap them in a `Box`, resulting in an allocation and indirection for our lock types. This also gets in the way of a `const` constructor for these types, which makes `static` locks more complicated than necessary.

@Amanieu presented his library, `parking_lot`, which implements [the 'parking lot' structure from WebKit](https://webkit.org/blog/6161/locking-in-webkit/). Rather than letting the operating system handle any waiting queues, this manages the queues in user space in a global hash table indexed by the address of the lock. This allows for flexible 1-byte mutexes with almost no restrictions.

There has been [an attempt at merging `parking_lot` into std](https://github.com/rust-lang/rust/pull/56410), as a replacement for the implementation of std's locking primitives. However, many different blockers came up and that PR did not get merged. (See also [my RustConf talk](https://www.youtube.com/watch?v=DnYQKWs_7EA) for some of this history.) Many of these blockers have been resolved since.

One of the problems with replacing std's lock implementations by `parking_lot` is that `parking_lot` allocates memory for its global hash table. A Rust program can define its own custom allocator, and such a custom allocator will likely use the standard library's locks, creating a cyclic dependency problem where you can't allocate memory without locking, but you can't lock without first allocating the hash table.

A possible solution is to use a `static` table with a fixed number of entries, which does not scale with the number of threads. This could work well in many situations, but can become problematic in highly parallel programs and systems. Another possible solution is to skip the global allocator and directly allocate this hash table with a native system API such as `mmap` on Unix. (See [this thread](https://github.com/Amanieu/parking_lot/pull/305) for some discussion.)

In our meeting, we discussed what's natively available on various operating systems/platforms:

- On Windows, we have already switched to Windows SRW locks, which do not require allocation as they can be moved. They are small (32-bit), efficient, and `const` constructible. Just for Windows, there does not seem much advantage to switching to `parking_lot`, although there might be some (small) performance difference.

- On both Linux and some BSDs there's a native `futex()` syscall, which provides functionality similar (but more primitive) than a parking lot. Nearly equivalent functionality [is available in Wasm](https://github.com/WebAssembly/threads/blob/main/proposals/threads/Overview.md#wait-and-notify-operators). While not trivial, it's possible to implement our synchronization primitives directly based on such a futex. This makes all the primitives small (32-bit), efficient, and `const` constructible.

- On macOS and some other Unixes there doesn't seem to be anything that's similar futex syscall. ([`parking_lot` uses `pthread` for its thread parking operations on those platforms](https://github.com/Amanieu/parking_lot/blob/a75875b0bf904287a9749e8eabea919b5e9dd8a9/core/src/thread_parker/unix.rs#L30-L31).)

- [Some smaller platforms](https://github.com/rust-lang/rust/tree/master/library/std/src/sys) that are (partially) supported by the standard library have [their own non-trivial implementation](https://github.com/rust-lang/rust/tree/master/library/std/src/sys/sgx/waitqueue) of the standard locks, which are hard to maintain and have not gotten much validation.

After some discussion, the consensus was to providing the locks as 'thinnest possible wrapper' around the native lock APIs as long as they are still small, efficient, and `const` constructible. This means [SRW locks](https://docs.microsoft.com/en-us/windows/win32/sync/slim-reader-writer--srw--locks) on Windows, and futex-based locks on Linux, some BSDs, and Wasm. And for other platforms without suitable native APIs, a `parking_lot`-based implementation using one of the possible workarounds for the allocation problem.

This means that on platforms like Linux and Windows, the operating system will be responsible for managing the waiting queues of the locks, such that any kernel improvements and features like debugging facilities in this area are directly available for Rust programs.

It also means that porting the standard library to a new platform will be easier, and maintainance of the std implementation for the less common platforms will become easier. These platforms will now only have to implement [a thread parker](https://github.com/Amanieu/parking_lot/blob/a75875b0bf904287a9749e8eabea919b5e9dd8a9/core/src/thread_parker/mod.rs#L4) and can rely on a performant and correct implementation of the standard locks on top of that.

Update: We've decided to not directly use parking lot's one-byte (two bit) locks, but instead use the equivalent of its internal `WordLock` or Windows' SRW locks, which are one pointer in size and require no global state in the program. That solves the allocation problem.

Update 2: To maintain priority inheritance of the native locks (e.g. on macOS), we've kept using pthread locks on non-futex unix platforms, at least for now. Using https://github.com/rust-lang/rust/pull/97647, we've made it possible for the locks to have `const fn new`.

---

Primary goals:

- [x] Efficient non-allocating locks
    - [x] Windows
    - [x] Linux
    - [ ] ~~macOS~~
      - We can't avoid pthread if we want to keep features of the OS' native locks, like priority iniheritance.
    - [x] {Free, Open}BSD
- [x] Turn `std::sync::{Mutex, Condvar, RwLock}::new` into `const` functions: https://github.com/rust-lang/rust/pull/97791
- [x] Resolve some bugs around pthreads
    - [x] https://github.com/rust-lang/rust/issues/85434
    - [x] https://github.com/rust-lang/rust/issues/94564#issuecomment-1098942913

Possible later goals:

- Add a `Condvar::wait_rwlock` to make `Condvar` usable with `RwLock`s?
- Allow `Send`ing `MutexGuard`s to other threads?

---

To do:

- [x] Relax `Condvar` requirements to allow for unboxed mutexes. https://github.com/rust-lang/rust/pull/76932
- [x] Use unboxed SRW locks on Windows.
    - [x] Make Microsoft promise that SRW locks are indeed movable. https://github.com/MicrosoftDocs/sdk-api/pull/447
    - [x] https://github.com/rust-lang/rust/pull/76645
    - [x] Refactor `sys_common::Mutex` to have a separate `MovableMutex` type to allow unboxing on some platforms. https://github.com/rust-lang/rust/pull/77147
    - [x] Remove the `Box`es in `Mutex` and `Condvar` on Windows and Wasm. https://github.com/rust-lang/rust/pull/77380
    - [x] Remove the `Box` from `RwLock` on Windows and Wasm. https://github.com/rust-lang/rust/pull/84687
    - [x] (Start using `WaitOnAddress` and `NtWaitForKeyedEvent` in the standard library.) https://github.com/rust-lang/rust/pull/77618
        - This unblocks usage of these APIs in `parking_lot` if we want to use that on Windows too. But if we just use SRW locks, this was not necessary to unblock the lock improvements.
- [x] Use futex-based locks on Linux.
    - [x] Start using the futex syscall to get some experience with it in the standard library. https://github.com/rust-lang/rust/pull/76919
    - [x] Implement the `futex()` syscall in Miri to allow Miri to run standard library tests. https://github.com/rust-lang/miri/pull/1568
        - [x] Also implement the bitset futex operations in Miri. https://github.com/rust-lang/miri/pull/2054
    - [x] Implement `Mutex` and `Condvar` using `futex()`.
        - [x] Design these.
            - [x] Experimentation: https://github.com/m-ou-se/futex-lock-experiment/
            - [x] Investigate other implementations (glibc, musl, Windows, etc.)
                - [x] musl: https://github.com/rust-lang/rust/issues/93740#issuecomment-1041391284
                - [x] glibc
                    - [x] Mutexes: https://github.com/rust-lang/rust/issues/93740#issuecomment-1041572651
                    - [x] Condition variables: https://github.com/rust-lang/rust/issues/93740#issuecomment-1048886597
                    - [x] Reader-writer locks: https://github.com/rust-lang/rust/issues/93740#issuecomment-1055354913
                - [x] Boost: https://github.com/rust-lang/rust/issues/93740#issuecomment-1064286563
                - [x] Windows' SRW locks: https://github.com/rust-lang/rust/issues/93740#issuecomment-1064139337
                - [x] Wine's SRW locks: https://github.com/rust-lang/rust/issues/93740#issuecomment-1055672041
                - [x] Apple Darwin libpthread: https://github.com/rust-lang/rust/issues/93740#issuecomment-1069240178
                - [x] FreeBSD's libpthread: https://github.com/rust-lang/rust/issues/93740#issuecomment-1069265943
            - [x] Make some design decisions: https://github.com/rust-lang/rust/issues/93740#issuecomment-1070696128
        - [x] Implementation: https://github.com/rust-lang/rust/pull/95035
    - [x] Implement `ReentrantMutex` using `futex()`: https://github.com/rust-lang/rust/pull/95727
    - [x] Implement `RwLock` using `futex()`: https://github.com/rust-lang/rust/pull/95801
- [x] Use the same `ReentrantMutex` on all platforms: https://github.com/rust-lang/rust/pull/96042
- [x] Use futex-based locks on \*BSD.
    - [x] Add NetBSD's FUTEX_* constants to the libc crate. https://github.com/rust-lang/libc/pull/2762
    - [x] Add OpenBSD's futex() to the libc crate. https://github.com/rust-lang/libc/pull/2761
    - [x] Add FreeBSD umtx functions and constants to the libc crate. https://github.com/rust-lang/libc/pull/2770
    - [x] Add DragonFlyBSD's umtx_* functions to the libc crate. https://github.com/rust-lang/libc/pull/2763
    - [x] Switch *BSD over to the same futex locks as on Linux. https://github.com/rust-lang/rust/pull/96510
- [x] Use futex based locks on all other platforms that support a futex-like API.
    - [x] Use futex based locks on Emscripten. https://github.com/rust-lang/rust/pull/96205
    - [x] Use `atomic.wait`/`atomic.notify` based locks on Wasm. https://github.com/rust-lang/rust/pull/96206
    - [ ] Use [zx_futex_wait](https://fuchsia.dev/fuchsia-src/reference/syscalls/futex_wait) based locks on Fuchsia.
       - (Tier 2 platform.)
    - [ ] Use [syscall::call::futex](https://docs.rs/redox_syscall/latest/syscall/call/fn.futex.html) based locks on Redox.
       - (Tier 3 platform.)
    - [x] Use futexes on Hermit. https://github.com/rust-lang/rust/pull/101475
- [x] Lazily allocate on platforms where we still use pthread: https://github.com/rust-lang/rust/pull/97647
    - [x] macOS
    - [x] NetBSD, other unix
- [x] Make the `new` functions `const`. https://github.com/rust-lang/rust/pull/97791
- [x] [future / nice to have] Use 'WordLock' (aka SRW lock, aka 'user space linked-list queue lock') on other platforms.
    - [x] Remove locks from `Instant::now()`, to avoid a cyclic dependency. 
    - [x] Stop using `std::sync::{Mutex, Condvar}` in std::sys's thread parker, to avoid a cyclic dependency. https://github.com/rust-lang/rust/pull/96393
    - [x] Implement this type of locks, and replace the lock implementations of non-futex platforms by it. (https://github.com/rust-lang/rust/pull/110211)
- [x] Mark this issue as fixed: https://github.com/rust-lang/rust/issues/85434
- [ ] ~~Write some blog post about all this.~~ (https://github.com/rust-lang/rust/issues/93740#issuecomment-1398913130)
- [x] Celebrate. :tada:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Tracking issue for improving std::sync::{Mutex, RwLock, Condvar} #93740

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Tracking issue for improving std::sync::{Mutex, RwLock, Condvar} #93740

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions