You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A few weeks ago, on January 12th, the @rust-lang/libs team discussed a plan to improve std::sync::Mutex, std::sync::Condvar, and std::sync::RwLock.
On most platforms, these structures are currently wrappers around their pthread equivalent, such as pthread_mutex_t. These types are not movable, however, forcing us to wrap them in a Box, resulting in an allocation and indirection for our lock types. This also gets in the way of a const constructor for these types, which makes static locks more complicated than necessary.
@Amanieu presented his library, parking_lot, which implements the 'parking lot' structure from WebKit. Rather than letting the operating system handle any waiting queues, this manages the queues in user space in a global hash table indexed by the address of the lock. This allows for flexible 1-byte mutexes with almost no restrictions.
There has been an attempt at merging parking_lot into std, as a replacement for the implementation of std's locking primitives. However, many different blockers came up and that PR did not get merged. (See also my RustConf talk for some of this history.) Many of these blockers have been resolved since.
One of the problems with replacing std's lock implementations by parking_lot is that parking_lot allocates memory for its global hash table. A Rust program can define its own custom allocator, and such a custom allocator will likely use the standard library's locks, creating a cyclic dependency problem where you can't allocate memory without locking, but you can't lock without first allocating the hash table.
A possible solution is to use a static table with a fixed number of entries, which does not scale with the number of threads. This could work well in many situations, but can become problematic in highly parallel programs and systems. Another possible solution is to skip the global allocator and directly allocate this hash table with a native system API such as mmap on Unix. (See this thread for some discussion.)
In our meeting, we discussed what's natively available on various operating systems/platforms:
On Windows, we have already switched to Windows SRW locks, which do not require allocation as they can be moved. They are small (32-bit), efficient, and const constructible. Just for Windows, there does not seem much advantage to switching to parking_lot, although there might be some (small) performance difference.
On both Linux and some BSDs there's a native futex() syscall, which provides functionality similar (but more primitive) than a parking lot. Nearly equivalent functionality is available in Wasm. While not trivial, it's possible to implement our synchronization primitives directly based on such a futex. This makes all the primitives small (32-bit), efficient, and const constructible.
After some discussion, the consensus was to providing the locks as 'thinnest possible wrapper' around the native lock APIs as long as they are still small, efficient, and const constructible. This means SRW locks on Windows, and futex-based locks on Linux, some BSDs, and Wasm. And for other platforms without suitable native APIs, a parking_lot-based implementation using one of the possible workarounds for the allocation problem.
This means that on platforms like Linux and Windows, the operating system will be responsible for managing the waiting queues of the locks, such that any kernel improvements and features like debugging facilities in this area are directly available for Rust programs.
It also means that porting the standard library to a new platform will be easier, and maintainance of the std implementation for the less common platforms will become easier. These platforms will now only have to implement a thread parker and can rely on a performant and correct implementation of the standard locks on top of that.
Update: We've decided to not directly use parking lot's one-byte (two bit) locks, but instead use the equivalent of its internal WordLock or Windows' SRW locks, which are one pointer in size and require no global state in the program. That solves the allocation problem.
Update 2: To maintain priority inheritance of the native locks (e.g. on macOS), we've kept using pthread locks on non-futex unix platforms, at least for now. Using #97647, we've made it possible for the locks to have const fn new.
Primary goals:
Efficient non-allocating locks
Windows
Linux
macOS
We can't avoid pthread if we want to keep features of the OS' native locks, like priority iniheritance.
This unblocks usage of these APIs in parking_lot if we want to use that on Windows too. But if we just use SRW locks, this was not necessary to unblock the lock improvements.
A few weeks ago, on January 12th, the @rust-lang/libs team discussed a plan to improve
std::sync::Mutex,std::sync::Condvar, andstd::sync::RwLock.On most platforms, these structures are currently wrappers around their
pthreadequivalent, such aspthread_mutex_t. These types are not movable, however, forcing us to wrap them in aBox, resulting in an allocation and indirection for our lock types. This also gets in the way of aconstconstructor for these types, which makesstaticlocks more complicated than necessary.@Amanieu presented his library,
parking_lot, which implements the 'parking lot' structure from WebKit. Rather than letting the operating system handle any waiting queues, this manages the queues in user space in a global hash table indexed by the address of the lock. This allows for flexible 1-byte mutexes with almost no restrictions.There has been an attempt at merging
parking_lotinto std, as a replacement for the implementation of std's locking primitives. However, many different blockers came up and that PR did not get merged. (See also my RustConf talk for some of this history.) Many of these blockers have been resolved since.One of the problems with replacing std's lock implementations by
parking_lotis thatparking_lotallocates memory for its global hash table. A Rust program can define its own custom allocator, and such a custom allocator will likely use the standard library's locks, creating a cyclic dependency problem where you can't allocate memory without locking, but you can't lock without first allocating the hash table.A possible solution is to use a
statictable with a fixed number of entries, which does not scale with the number of threads. This could work well in many situations, but can become problematic in highly parallel programs and systems. Another possible solution is to skip the global allocator and directly allocate this hash table with a native system API such asmmapon Unix. (See this thread for some discussion.)In our meeting, we discussed what's natively available on various operating systems/platforms:
On Windows, we have already switched to Windows SRW locks, which do not require allocation as they can be moved. They are small (32-bit), efficient, and
constconstructible. Just for Windows, there does not seem much advantage to switching toparking_lot, although there might be some (small) performance difference.On both Linux and some BSDs there's a native
futex()syscall, which provides functionality similar (but more primitive) than a parking lot. Nearly equivalent functionality is available in Wasm. While not trivial, it's possible to implement our synchronization primitives directly based on such a futex. This makes all the primitives small (32-bit), efficient, andconstconstructible.On macOS and some other Unixes there doesn't seem to be anything that's similar futex syscall. (
parking_lotusespthreadfor its thread parking operations on those platforms.)Some smaller platforms that are (partially) supported by the standard library have their own non-trivial implementation of the standard locks, which are hard to maintain and have not gotten much validation.
After some discussion, the consensus was to providing the locks as 'thinnest possible wrapper' around the native lock APIs as long as they are still small, efficient, and
constconstructible. This means SRW locks on Windows, and futex-based locks on Linux, some BSDs, and Wasm. And for other platforms without suitable native APIs, aparking_lot-based implementation using one of the possible workarounds for the allocation problem.This means that on platforms like Linux and Windows, the operating system will be responsible for managing the waiting queues of the locks, such that any kernel improvements and features like debugging facilities in this area are directly available for Rust programs.
It also means that porting the standard library to a new platform will be easier, and maintainance of the std implementation for the less common platforms will become easier. These platforms will now only have to implement a thread parker and can rely on a performant and correct implementation of the standard locks on top of that.
Update: We've decided to not directly use parking lot's one-byte (two bit) locks, but instead use the equivalent of its internal
WordLockor Windows' SRW locks, which are one pointer in size and require no global state in the program. That solves the allocation problem.Update 2: To maintain priority inheritance of the native locks (e.g. on macOS), we've kept using pthread locks on non-futex unix platforms, at least for now. Using #97647, we've made it possible for the locks to have
const fn new.Primary goals:
macOSstd::sync::{Mutex, Condvar, RwLock}::newintoconstfunctions: Make {Mutex, Condvar, RwLock}::new() const. #97791Possible later goals:
Condvar::wait_rwlockto makeCondvarusable withRwLocks?SendingMutexGuards to other threads?To do:
Condvarrequirements to allow for unboxed mutexes. Relax promises about condition variable. #76932sys_common::Mutexto have a separateMovableMutextype to allow unboxing on some platforms. Split sys_common::Mutex in StaticMutex and MovableMutex. #77147Boxes inMutexandCondvaron Windows and Wasm. Unbox mutexes and condvars on some platforms #77380BoxfromRwLockon Windows and Wasm. Multiple improvements to RwLocks #84687WaitOnAddressandNtWaitForKeyedEventin the standard library.) Add fast WaitOnAddress-based thread parker for Windows. #77618parking_lotif we want to use that on Windows too. But if we just use SRW locks, this was not necessary to unblock the lock improvements.futex()syscall in Miri to allow Miri to run standard library tests. Implement futex_wait and futex_wake. miri#1568MutexandCondvarusingfutex().ReentrantMutexusingfutex(): Replace ReentrantMutex by a futex-based one on Linux. #95727RwLockusingfutex(): Replace RwLock by a futex based one on Linux #95801ReentrantMutexon all platforms: Use a single ReentrantMutex implementation on all platforms. #96042atomic.wait/atomic.notifybased locks on Wasm. Use sys::unix::locks::futex* on wasm+atomics. #96206newfunctionsconst. Make {Mutex, Condvar, RwLock}::new() const. #97791Instant::now(), to avoid a cyclic dependency.std::sync::{Mutex, Condvar}in std::sys's thread parker, to avoid a cyclic dependency. std: directly use pthread in UNIX parker implementation #96393RwLockwith custom implementation #110211)Write some blog post about all this.(Tracking issue for improving std::sync::{Mutex, RwLock, Condvar} #93740 (comment))