Overview
The KnownLayout trait encodes enough information about a type's layout to be able to synthesize pointers to values of that type at runtime. While we can already do this for sized types and slice types, KnownLayout generalizes this to "slice DSTs" - types with leading fixed-sized fields followed by a trailing slice.
Progress
Motivation
Many users use zerocopy by defining types whose layouts match the layouts of data that needs to be parsed, serialized, or otherwise interacted with. Currently, zerocopy supports types with fixed size ("sized" types) and slices (a repeated structure of 0 or more instances of a sized type). However, many of our users have data which is not shaped like either of these.
In particular, many of our users have data which is shaped like a "slice-based dynamically-sized type" or "slice DST". A slice DST is a type with some number of fixed-sized fields followed by a trailing slice field. For example:
#[repr(C)]
struct UdpHeader {
src_port: U16,
dst_port: U16,
length: U16,
checksum: U16,
}
#[repr(C)]
struct UdpPacket {
header: UdpHeader,
body: [u8],
}
In this example, all UdpPackets have the same fixed UdpHeader, but may have a variable number of bytes in the trailing body field. The layout of UdpPacket exactly matches the layout of a UDP packet on the wire.
This design aims to support users whose data can be modeled by slice DSTs.
Design
What KnownLayout needs to encode
KnownLayout needs to provide any information required in order to perform a conversion from &[u8] to &T where T is a sized type, a slice type, or a slice DST. This conversion may be fallible due to size or alignment.
In order to support these conversions, zerocopy must know the following:
- Alignment
- For sized types, size
- For slices and slice DSTs:
- The offset to the trailing slice field (0 for slice types)
- The size of the element type of the trailing slice field
This is encoded in the DstLayout type:
struct DstLayout {
align: NonZeroUsize,
size_info: SizeInfo,
}
enum SizeInfo {
Sized { size: usize },
SliceDst(TrailingSliceLayout),
}
struct TrailingSliceLayout {
offset: usize,
elem_size: usize,
}
trait KnownLayout {
#[doc(hidden)]
const LAYOUT: DstLayout;
}
Computing layout
In order to compute KnownLayout's associated LAYOUT constant, we use a recursive approach:
- For sized types, we query the alignment and size via
align_of and size_of
- For slice types, we rely on
align_of and size_of to get the alignment and size of the element type; the slice's layout can be computed from these values
For composite types (namely in the context of #[derive(KnownLayout)]), we need to extract a few pieces of information:
- The alignment of the overall type
- The offset of the trailing slice field (if any)
From these, we can compute the LAYOUT. However, computing these in a generic context is highly non-trivial.
There are multiple avenues we could approach, but they all have pros and cons, and many are subject to timelines beyond our control since they require stabilizing currently-unstable Rust features. Our plan is to "race" them - try our best to drive progress on all fronts and see which approach gets to the finish line first. Since stabilizing unstable Rust features is a low-bandwidth/high-latency process, we expect this to be a reasonable use of our time.
Option 1: offset_of! and unsized align_of
The offset_of! macro provides the ability to directly query for the offset of a field within a type. It is currently unstable, and only supports sized types. If we could stabilize offset_of! and add support for unsized types, we could use it to query the offset of a type's trailing slice field.
We also need to support alignment. For this, we'd need align_of to drop its T: Sized bound.
Option 2: size_of_val_raw and align_of_val_raw
TODO
Option N: Manually implement Rust's repr(C) layout algorithm
TODO
Validating and computing casts
TODO: validate_cast_and_convert_metadata
Relationship to other traits
Ideally, we'd like KnownLayout to be a super-trait of FromZeroes (or TryFromBytes (#5) once FromZeroes: TryFromBytes). Unfortunately, FromZeroes supports types which KnownLayout cannot support: unsized types without a fixed representation (ie, repr(rust)). These types do not provide sufficient guarantees about their layout in order to satisfy KnownLayout's safety conditions. We have two options:
- Restrict
FromZeroes to only support types which KnownLayout can support, and then have FromZeroes: KnownLayout
- Use
KnownLayout as a bound on individual functions and methods
We intend for zerocopy to be a general-purpose library which supports use cases that do not require known representations (any use case which doesn't require a type's layout to correspond to a fixed specification, but only to be consistent within the context of a program's execution). The first option would preclude us from supporting these use cases, so we opt for the latter: we will use KnownLayout as a bound on individual functions and methods.
Deriving KnownLayout
Computing alignment
Use align_of_val_raw
Blocked on rust-lang/rust#69835
TODO
Use a #[repr(C)] type and field offset
Blocked on either rust-lang/rust#106655 or rust-lang/rust#69835
This design is prototyped in #576:
macro_rules! align_of {
($t:ty) => {{
#[repr(C)]
struct OffsetOfTrailingIsAlignment {
_byte: u8,
_trailing: $t,
}
trailing_field_offset!(OffsetOfTrailingIsAlignment, _trailing)
}};
}
This design relies on the trailing_field_offset! macro added in #540 and described below. This macro relies on stabilizing rust-lang/rust#69835. Alternatively, if we can stabilize offset_of! (rust-lang/rust#106655) and add support for using offset_of! with unsized types, then we can replace trailing_field_offset! here with offset_of!.
Computing trailing field offset
The DstLayout type needs to know the offset of the trailing slice within a DST. For example, given Bar:
#[repr(C)]
struct Foo {
u: u16,
tail: [u8],
}
#[repr(C)]
struct Bar {
u: u16,
foo: Foo,
}
...the trailing tail: [u8] is at offset 4 (after 2 bytes for Bar.u and 2 bytes for Foo.u).
In order to compute trailing slice offset in the context of a custom derive, it's necessary to compute recursively. In particular, given the offset of the trailing field within the outermost type (in this case, Bar.foo) and the offset of the trailing slice within the inner type (in this case, Foo.tail), it's possible to compute the offset of the trailing slice within the outer type as the sum of these two values. In all cases, this is a recursive computation that bottoms out at an actual slice type, [T], whose trailing slice offset is 0.
Thus the challenge is, given the AST of an arbitrary, possibly dynamically-sized type, to produce code which computes the byte offset of the trailing field. We have a few options.
offset_of!
Blocked on rust-lang/rust#106655
We could simply wait for the standard library's offset_of! macro to stabilize and add support for unsized types.
addr_of!
Blocked on rust-lang/rust#69835
Another option is to rely on the standard library's addr_of! macro. Given a raw pointer to the beginning of a type, the expression addr_of!((*ptr).trailing_field) will compute the address of trailing field, and the raw pointer method offset_from can be used to compute the byte offset between these two pointers, effectively computing the trailing field offset.
Unfortunately, place projection the addr_of! macro performs a place projection, and even if the pointers are never dereferenced, place projections may only happen inside the bounds of a valid allocation. Per The Reference, the following is undefined behavior:
Performing a place projection that violates the requirements of in-bounds pointer arithmetic. A place projection is a field expression, a tuple index expression, or an array/slice index expression.
Thus, in order to use addr_of!, we need a valid allocation which is large enough that the field projection will not go out-of-bounds (the allocation does not need to be properly aligned). It is fairly trivial to produce a large allocation, even at const time:
const LARGE_ALLOCATION: NonNull<[u8]> = {
const REF: &[u8; 1 << 16] = &[0; 1 << 16];
let ptr: *const [u8; 1 << 16] = REF;
let ptr: *const [u8] = ptr::slice_from_raw_parts(ptr.cast(), 1 << 16);
unsafe { NonNull::new_unchecked(ptr.cast_mut()) }
};
This, however, isn't enough. We also need to bounds check our allocation to make sure that the field projection won't go out-of-bounds. While it's unlikely we'd ever encounter a type with a trailing field offset of 2^16 bytes, it's not impossible, and we can't exhibit undefined behavior if we encounter such a type.
In order to perform the bounds check, we need some way of obtaining an upper bound for the trailing field offset before we've actually computed it. The only way of doing this (to my knowledge) is to calculate the size of the smallest possible value of the type (ie, the value with 0 trailing slice elements). We can't construct such an instance in the general case, but we can construct a raw pointer to such an instance:
// A `*const [()]` with 0 elements.
let slc = core::ptr::slice_from_raw_parts(&() as *const _, 0);
let t = slc as *const T;
This relies on behavior which is currently not well-defined, but is in review as of this writing.
However, once we have this raw pointer, we need to know its size. The only way to do this is with the currently-unstable size_of_val_raw. Once we've computed the size of the smallest possible value for our type (ie, size_of_val_raw(t)), we have an upper bound for the offset of the trailing field; so long as this value is not larger than the size of our allocation, the field projection is guaranteed to be in-bounds, allowing us to soundly compute the trailing field offset.
Old text
Add a replacement for MaybeUninit which supports unsized types. Add support for conversions on custom dynamically sized-types (DSTs).
Status
Phase 1 is in review in #312.
Motivation
This proposal aims to solve two problems using a single, unified design.
Unsized MaybeUninit
The standard library's MaybeUninit type don't support wrapping unsized types. MaybeUninit is a core building block of important designs like TryFromBytes. So long as it doesn't support unsized types, designs like TryFromBytes also can't support unsized types.
Custom DSTs
While zerocopy supports conversions on slice types, it doesn't support conversions on custom dynamically-sized types (DSTs) like:
#[derive(FromZeroes, FromBytes, AsBytes)]
#[repr(C)]
struct UdpHeader { ... }
#[derive(FromZeroes, FromBytes, AsBytes)]
#[repr(C)]
struct UdpPacket {
header: UdpHeader,
body: [u8], // Unsized field makes this type a "custom DST"
}
Currently, users such as packet-formats instead write code like:
#[derive(FromZeroes, FromBytes, AsBytes)]
#[repr(C)]
struct UdpHeader { ... }
struct UdpPacket<B> {
header: Ref<B, UdpHeader>,
body: B,
}
The latter code is more cumbersome, and requires storing an extra pointer in order to refer to a UDP packet in memory.
The ability to support custom DSTs is a frequent request from our users.
Design
This design comes in two phases; the first phase can be implemented without the second phase, and will provide value on its own.
In Phase 1, support is added for a MaybeUninit-like type that supports wrapping both T: Sized and [T] where T: Sized. This will unlock the TryFromBytes design as described above. In the Phase 2, support is added for a KnownLayout trait which provides a superset of the functionality from Phase 1, and supports zero-copy conversion of arbitrary custom DSTs. A custom derive is provided KnownLayout.
Phase 1: MaybeUninit
The standard library's MaybeUninit<T> type has the same layout as T, but it has no bit validity constraints - any byte value, including an uninitialized byte, may be written at any byte offset in a MaybeUninit<T>. A replacement for this type just needs to have these semantics, and also needs to support unsized types.
Our design builds upon the fact that MaybeUninit exists and works for sized types. We define the following trait:
pub unsafe trait AsMaybeUninit {
/// A type which has the same layout as `Self`, but which has no validity
/// constraints.
///
/// Roughly speaking, this type is equivalent to what the standard library's
/// [`MaybeUninit<Self>`] would be if `MaybeUninit` supported unsized types.
type MaybeUninit: ?Sized;
}
For Sized types, the implementation is trivial:
unsafe impl<T: Sized> AsMaybeUninit for T {
type MaybeUninit = core::mem::MaybeUninit<T>;
}
For all other types, we use the standard library's MaybeUninit type as a building block to build up a type whose layout is the same as what MaybeUninit's would be if it supported unsized types:
unsafe impl<T: Sized> AsMaybeUninit for [T] {
type MaybeUninit = [core::mem::MaybeUninit<T>];
}
unsafe impl AsMaybeUninit for str {
type MaybeUninit = <[u8] as AsMaybeUninit>::MaybeUninit;
}
Finally, we add our own MaybeUninit type, which is simply a convenience wrapper:
#[repr(transparent)]
pub struct MaybeUninit<T: AsMaybeUninit + ?Sized> {
inner: T::MaybeUninit,
}
// The equivalent impl for `MaybeUninit<T>` is already covered by the blanket impl for `T: Sized`.
unsafe impl<T: Sized> AsMaybeUninit for MaybeUninit<[T]> {
type MaybeUninit = [<T as AsMaybeUninit>::MaybeUninit];
}
In Phase 1, these are the only supported unsized types. In Phase 2, we allow deriving AsMaybeUninit on arbitrary types, which adds support for custom DSTs.
Safety invariants
The safety invariants on AsMaybeUninit are somewhat complex. This is mostly a result of needing to support unsized types. For a more in-depth explanation of why supporting unsized types introduces a lot of complexity, see here.
pub unsafe trait AsMaybeUninit {
/// A type which has the same layout as `Self`, but which has no validity
/// constraints.
///
/// Roughly speaking, this type is equivalent to what the standard library's
/// [`MaybeUninit<Self>`] would be if `MaybeUninit` supported unsized types.
///
/// # Safety
///
/// For `T: AsMaybeUninit`, the following must hold:
/// - Given `m: T::MaybeUninit`, it is sound to write any byte value,
/// including an uninitialized byte, at any byte offset in `m`
/// - `T` and `T::MaybeUninit` have the same alignment requirement
/// - It is valid to use an `as` cast to convert a `t: *const T` to a `m:
/// *const T::MaybeUninit` and vice-versa (and likewise for `*mut T`/`*mut
/// T::MaybeUninit`). Regardless of which direction the conversion was
/// performed, the sizes of the pointers' referents are always equal (in
/// terms of an API which is not yet stable, `size_of_val_raw(t) ==
/// size_of_val_raw(m)`).
/// - `T::MaybeUninit` contains [`UnsafeCell`]s at exactly the same byte
/// ranges that `T` does.
///
/// [`MaybeUninit<Self>`]: core::mem::MaybeUninit
/// [`UnsafeCell`]: core::cell::UnsafeCell
type MaybeUninit: ?Sized;
}
Let's walk through these:
- The reason that we describe bit validity in terms of writing to an existing value (rather than producing a new value) is that it allows us to sidestep a lot of the complexity of defining size equivalence between unsized types. This requirement is just a generalization of what you'd write for sized types: something like, "it must be valid to produce a
T::MaybeUninit containing any bit pattern, including uninitialized bytes."
- We speak of
T and T::MaybeUninit's alignment in prose rather than by referring to core::mem::align_of because that function requires that its type argument is sized.
- The
as cast requirement ensures that T and T::MaybeUninit are either both sized or have compatible unsized types. In particular, Rust prohibits sized-to-unsized casts; requiring both directions to be valid ensures that casts are sized-to-sized or unsized-to-unsized. The size equality constraint ensures that:
- For sized types, sizes are equal
- For custom DSTs, the trailing slice element's sizes are equal; thus, a
T and a T::MaybeUninit whose trailing slices are of the same length have the same size
- It is intended to be sound to synthesize two references to the same memory region where only one reference treats that region as containing an
UnsafeCell. In other words, interior mutability soundness is a runtime property that is only violated when an UnsafeCell is used, not merely when a reference to it is constructed. Unfortunately, this is not actually currently guaranteed, and is unsound under the stacked borrows memory model. Thus, we need to ensure that UnsafeCells in T and T::MaybeUninit line up perfectly.
Pointer conversion
We want to be able to provide unsized equivalents of the assume_init_ref and assume_init_mut methods. However, the naive implementation doesn't work:
impl<T: AsMaybeUninit + ?Sized> MaybeUninit<T> {
pub unsafe fn assume_init_ref(&self) -> &T {
let ptr = (&self.inner) as *const T::MaybeUninit as *const T;
unsafe { &*ptr }
}
}
The *const T::MaybeUninit as *mut T cast isn't valid in a generic context where T: ?Sized because Rust doesn't know what type of pointers these are (thin, fat, what kind of fat pointer, etc). In order to get around this problem, we add the following methods to AsMaybeUninit:
fn raw_from_maybe_uninit(maybe_uninit: *const Self::MaybeUninit) -> *const Self;
fn raw_mut_from_maybe_uninit(maybe_uninit: *mut Self::MaybeUninit) -> *mut Self;
This allows us to get assume_init_ref and assume_init_mut to work:
impl<T: AsMaybeUninit + ?Sized> MaybeUninit<T> {
pub unsafe fn assume_init_ref(&self) -> &T {
let ptr = T::raw_from_maybe_uninit(&self.inner);
unsafe { &*ptr }
}
}
Phase 2: KnownLayout
TODO
TODO: Mention that this will require removing the blanket impl of AsMaybeUninit for T: Sized, which will be a breaking change. We need to not publish in between Phases 1 and 2.
Alternatives
-
For MaybeUninit, we could propose a change to Rust to add limited support for unsized unions, which would allow MaybeUninit (which is a union type) to support unsized types. Even if such a proposal were accepted, it would likely take months or years to be stabilized and thus available for our use.
-
For custom DSTs, we could wait until the ptr_metadata and layout_for_ptr features stabilize, which would allow us to rewrite Ref like this:
use core::ptr::Pointee;
pub struct Ref<B, T: ?Sized> {
data: *mut (),
meta: <T as Pointee>::Metadata,
_marker: PhantomData<(B, T)>,
}
For Sized types, T::Metadata would be (), and so Ref would consume only a single word (rather than two, as it does today).
On its own, we could still support slice types (T = [U]) as we do today using from_raw_parts and from_raw_parts_mut. However, using layout_for_ptr, we could also determine a DST's prefix and element sizes and thus support arbitrary DSTs rather than just slices.
This design would work, but it's unlikely that either of these features will stabilize soon, and we need something in the meantime. The design presented also unblocks unsized MaybeUninit, which these features wouldn't help with.
Overview
The
KnownLayouttrait encodes enough information about a type's layout to be able to synthesize pointers to values of that type at runtime. While we can already do this for sized types and slice types,KnownLayoutgeneralizes this to "slice DSTs" - types with leading fixed-sized fields followed by a trailing slice.Progress
constcodeKnownLayoutfor sized typesKnownLayoutfor unsized types (to de-risk, see which of the following three approaches finishes first)T: Sizedbound fromalign_of<T>()offset_of!and expand it to support unsized typesalign_of_val_rawoffset_of!and expand it to support unsized typessize_of_val_rawrepr(C)layout algorithmDstLayout::extend#633DstLayout::pad_to_align#638derive(KnownLayout)on DSTs #643KnownLayoutinRef, and use this to supportKnownLayoutin other places#[doc(hidden)]Refconstructor which takesT: ?Sized + KnownLayout, and is named something likenew_known_layout_name_to_be_bikesheddedKnownLayout, removeSizedfrom some APIs #967Refmethod named something likederef_known_layout_name_to_be_bikeshedded(and similar forderef_mut). This method should not have aT: FromBytesbound, and should instead beunsafeand require that the containedTis valid as a safety precondition (this allows it to be used inTryFromBytes).KnownLayout, removeSizedfrom some APIs #967TryFromBytes::try_from_ref) to internally construct aRefand then use the appropriate private deref methodKnownLayout, removeSizedfrom some APIs #967T: KnownLayoutrequiresT's fields to beKnownLayout#1162RequireKnownLayoutto apply recursively to all fields to remain forwards-compatible with #494size_of::<T>())repr(transparent)types (and maybe other reprs?)KnownLayouton unsizedrepr(transparent)types #1477FromBytesmethods and/orRefconstructors which operate on the whole buffer (rather than just the prefix/suffix) and take an explicit trailing element countwith_elemsRefconstructors #1258TODO(#29)commentsKnownLayoutviolates private-in-public rules in some cases #1292KnownLayoutpublic (as of this writing, this will be in 0.8 per #671):#[doc(hidden)]from all itemsT: Sizedbound withT: ?Sized + KnownLayout; merge these APIs with any private APIs which were added during development (such asRef::new_known_layout_name_to_be_bikeshedded)FromBytes:: ref_from_prefixandref_from_suffixhave stale doc comments that refer tosize_of::<Self>()Ref::new,new_from_prefix, andnew_from_suffixhave stale doc comments that refer tosize_of::<T>()Motivation
Many users use zerocopy by defining types whose layouts match the layouts of data that needs to be parsed, serialized, or otherwise interacted with. Currently, zerocopy supports types with fixed size ("sized" types) and slices (a repeated structure of 0 or more instances of a sized type). However, many of our users have data which is not shaped like either of these.
In particular, many of our users have data which is shaped like a "slice-based dynamically-sized type" or "slice DST". A slice DST is a type with some number of fixed-sized fields followed by a trailing slice field. For example:
In this example, all
UdpPackets have the same fixedUdpHeader, but may have a variable number of bytes in the trailingbodyfield. The layout ofUdpPacketexactly matches the layout of a UDP packet on the wire.This design aims to support users whose data can be modeled by slice DSTs.
Design
What
KnownLayoutneeds to encodeKnownLayoutneeds to provide any information required in order to perform a conversion from&[u8]to&TwhereTis a sized type, a slice type, or a slice DST. This conversion may be fallible due to size or alignment.In order to support these conversions, zerocopy must know the following:
This is encoded in the
DstLayouttype:Computing layout
In order to compute
KnownLayout's associatedLAYOUTconstant, we use a recursive approach:align_ofandsize_ofalign_ofandsize_ofto get the alignment and size of the element type; the slice's layout can be computed from these valuesFor composite types (namely in the context of
#[derive(KnownLayout)]), we need to extract a few pieces of information:From these, we can compute the
LAYOUT. However, computing these in a generic context is highly non-trivial.There are multiple avenues we could approach, but they all have pros and cons, and many are subject to timelines beyond our control since they require stabilizing currently-unstable Rust features. Our plan is to "race" them - try our best to drive progress on all fronts and see which approach gets to the finish line first. Since stabilizing unstable Rust features is a low-bandwidth/high-latency process, we expect this to be a reasonable use of our time.
Option 1:
offset_of!and unsizedalign_ofThe
offset_of!macro provides the ability to directly query for the offset of a field within a type. It is currently unstable, and only supports sized types. If we could stabilizeoffset_of!and add support for unsized types, we could use it to query the offset of a type's trailing slice field.We also need to support alignment. For this, we'd need
align_ofto drop itsT: Sizedbound.Option 2:
size_of_val_rawandalign_of_val_rawTODO
Option N: Manually implement Rust's
repr(C)layout algorithmTODO
Validating and computing casts
TODO:
validate_cast_and_convert_metadataRelationship to other traits
Ideally, we'd like
KnownLayoutto be a super-trait ofFromZeroes(orTryFromBytes(#5) onceFromZeroes: TryFromBytes). Unfortunately,FromZeroessupports types whichKnownLayoutcannot support: unsized types without a fixed representation (ie,repr(rust)). These types do not provide sufficient guarantees about their layout in order to satisfyKnownLayout's safety conditions. We have two options:FromZeroesto only support types whichKnownLayoutcan support, and then haveFromZeroes: KnownLayoutKnownLayoutas a bound on individual functions and methodsWe intend for zerocopy to be a general-purpose library which supports use cases that do not require known representations (any use case which doesn't require a type's layout to correspond to a fixed specification, but only to be consistent within the context of a program's execution). The first option would preclude us from supporting these use cases, so we opt for the latter: we will use
KnownLayoutas a bound on individual functions and methods.Deriving
KnownLayoutComputing alignment
Use
align_of_val_rawBlocked on rust-lang/rust#69835
TODO
Use a
#[repr(C)]type and field offsetBlocked on either rust-lang/rust#106655 or rust-lang/rust#69835
This design is prototyped in #576:
This design relies on the
trailing_field_offset!macro added in #540 and described below. This macro relies on stabilizing rust-lang/rust#69835. Alternatively, if we can stabilizeoffset_of!(rust-lang/rust#106655) and add support for usingoffset_of!with unsized types, then we can replacetrailing_field_offset!here withoffset_of!.Computing trailing field offset
The
DstLayouttype needs to know the offset of the trailing slice within a DST. For example, givenBar:...the trailing
tail: [u8]is at offset 4 (after 2 bytes forBar.uand 2 bytes forFoo.u).In order to compute trailing slice offset in the context of a custom derive, it's necessary to compute recursively. In particular, given the offset of the trailing field within the outermost type (in this case,
Bar.foo) and the offset of the trailing slice within the inner type (in this case,Foo.tail), it's possible to compute the offset of the trailing slice within the outer type as the sum of these two values. In all cases, this is a recursive computation that bottoms out at an actual slice type,[T], whose trailing slice offset is 0.Thus the challenge is, given the AST of an arbitrary, possibly dynamically-sized type, to produce code which computes the byte offset of the trailing field. We have a few options.
offset_of!Blocked on rust-lang/rust#106655
We could simply wait for the standard library's
offset_of!macro to stabilize and add support for unsized types.addr_of!Blocked on rust-lang/rust#69835
Another option is to rely on the standard library's
addr_of!macro. Given a raw pointer to the beginning of a type, the expressionaddr_of!((*ptr).trailing_field)will compute the address of trailing field, and the raw pointer methodoffset_fromcan be used to compute the byte offset between these two pointers, effectively computing the trailing field offset.Unfortunately, place projection the
addr_of!macro performs a place projection, and even if the pointers are never dereferenced, place projections may only happen inside the bounds of a valid allocation. Per The Reference, the following is undefined behavior:Thus, in order to use
addr_of!, we need a valid allocation which is large enough that the field projection will not go out-of-bounds (the allocation does not need to be properly aligned). It is fairly trivial to produce a large allocation, even at const time:This, however, isn't enough. We also need to bounds check our allocation to make sure that the field projection won't go out-of-bounds. While it's unlikely we'd ever encounter a type with a trailing field offset of 2^16 bytes, it's not impossible, and we can't exhibit undefined behavior if we encounter such a type.
In order to perform the bounds check, we need some way of obtaining an upper bound for the trailing field offset before we've actually computed it. The only way of doing this (to my knowledge) is to calculate the size of the smallest possible value of the type (ie, the value with 0 trailing slice elements). We can't construct such an instance in the general case, but we can construct a raw pointer to such an instance:
This relies on behavior which is currently not well-defined, but is in review as of this writing.
However, once we have this raw pointer, we need to know its size. The only way to do this is with the currently-unstable
size_of_val_raw. Once we've computed the size of the smallest possible value for our type (ie,size_of_val_raw(t)), we have an upper bound for the offset of the trailing field; so long as this value is not larger than the size of our allocation, the field projection is guaranteed to be in-bounds, allowing us to soundly compute the trailing field offset.Old text
Add a replacement for
MaybeUninitwhich supports unsized types. Add support for conversions on custom dynamically sized-types (DSTs).Status
Phase 1 is in review in #312.
Motivation
This proposal aims to solve two problems using a single, unified design.
Unsized
MaybeUninitThe standard library's
MaybeUninittype don't support wrapping unsized types.MaybeUninitis a core building block of important designs likeTryFromBytes. So long as it doesn't support unsized types, designs likeTryFromBytesalso can't support unsized types.Custom DSTs
While zerocopy supports conversions on slice types, it doesn't support conversions on custom dynamically-sized types (DSTs) like:
Currently, users such as packet-formats instead write code like:
The latter code is more cumbersome, and requires storing an extra pointer in order to refer to a UDP packet in memory.
The ability to support custom DSTs is a frequent request from our users.
Design
This design comes in two phases; the first phase can be implemented without the second phase, and will provide value on its own.
In Phase 1, support is added for a
MaybeUninit-like type that supports wrapping bothT: Sizedand[T] where T: Sized. This will unlock theTryFromBytesdesign as described above. In the Phase 2, support is added for aKnownLayouttrait which provides a superset of the functionality from Phase 1, and supports zero-copy conversion of arbitrary custom DSTs. A custom derive is providedKnownLayout.Phase 1:
MaybeUninitThe standard library's
MaybeUninit<T>type has the same layout asT, but it has no bit validity constraints - any byte value, including an uninitialized byte, may be written at any byte offset in aMaybeUninit<T>. A replacement for this type just needs to have these semantics, and also needs to support unsized types.Our design builds upon the fact that
MaybeUninitexists and works for sized types. We define the following trait:For
Sizedtypes, the implementation is trivial:For all other types, we use the standard library's
MaybeUninittype as a building block to build up a type whose layout is the same as whatMaybeUninit's would be if it supported unsized types:Finally, we add our own
MaybeUninittype, which is simply a convenience wrapper:In Phase 1, these are the only supported unsized types. In Phase 2, we allow deriving
AsMaybeUniniton arbitrary types, which adds support for custom DSTs.Safety invariants
The safety invariants on
AsMaybeUninitare somewhat complex. This is mostly a result of needing to support unsized types. For a more in-depth explanation of why supporting unsized types introduces a lot of complexity, see here.Let's walk through these:
T::MaybeUninitcontaining any bit pattern, including uninitialized bytes."TandT::MaybeUninit's alignment in prose rather than by referring tocore::mem::align_ofbecause that function requires that its type argument is sized.ascast requirement ensures thatTandT::MaybeUninitare either both sized or have compatible unsized types. In particular, Rust prohibits sized-to-unsized casts; requiring both directions to be valid ensures that casts are sized-to-sized or unsized-to-unsized. The size equality constraint ensures that:Tand aT::MaybeUninitwhose trailing slices are of the same length have the same sizeUnsafeCell. In other words, interior mutability soundness is a runtime property that is only violated when anUnsafeCellis used, not merely when a reference to it is constructed. Unfortunately, this is not actually currently guaranteed, and is unsound under the stacked borrows memory model. Thus, we need to ensure thatUnsafeCells inTandT::MaybeUninitline up perfectly.Pointer conversion
We want to be able to provide unsized equivalents of the
assume_init_refandassume_init_mutmethods. However, the naive implementation doesn't work:The
*const T::MaybeUninit as *mut Tcast isn't valid in a generic context whereT: ?Sizedbecause Rust doesn't know what type of pointers these are (thin, fat, what kind of fat pointer, etc). In order to get around this problem, we add the following methods toAsMaybeUninit:This allows us to get
assume_init_refandassume_init_mutto work:Phase 2:
KnownLayoutTODO
TODO: Mention that this will require removing the blanket impl of
AsMaybeUninitforT: Sized, which will be a breaking change. We need to not publish in between Phases 1 and 2.Alternatives
For
MaybeUninit, we could propose a change to Rust to add limited support for unsized unions, which would allowMaybeUninit(which is a union type) to support unsized types. Even if such a proposal were accepted, it would likely take months or years to be stabilized and thus available for our use.For custom DSTs, we could wait until the
ptr_metadataandlayout_for_ptrfeatures stabilize, which would allow us to rewriteReflike this:For
Sizedtypes,T::Metadatawould be(), and soRefwould consume only a single word (rather than two, as it does today).On its own, we could still support slice types (
T = [U]) as we do today usingfrom_raw_partsandfrom_raw_parts_mut. However, usinglayout_for_ptr, we could also determine a DST's prefix and element sizes and thus support arbitrary DSTs rather than just slices.This design would work, but it's unlikely that either of these features will stabilize soon, and we need something in the meantime. The design presented also unblocks unsized
MaybeUninit, which these features wouldn't help with.