Background and Motivation
Rounding a number up to the smallest power of two greater than or equal to it is a relatively common operation, especially for developers implementing custom collection types that require a power-of-two size for one of their internal data structures (for reference, CoreCLR uses this method in the ConcurrentQueue type). This proposal has two main points:
- Expose this API from the BitOperations class
- Optimize it using intrinsics
Proposed API
namespace System.Numerics
{
    public static class BitOperations
    {
+       public static uint RoundUpToPowerOf2(uint i);
+       public static ulong RoundUpToPowerOf2(ulong i);
    }
}
Usage Examples
I'm using this same method in a couple places in the Microsoft.Toolkit.HighPerformance package:
- In the StringPool type, to initialize the internal buckets for the cached strings (to get a fast % operation)
- In the ArrayPoolBufferWriter<T> type, to round up the size requested from ArrayPool<T> and avoid repeated new[] allocations
Within CoreCLR, there's also that ConcurrentQueue usage example I mentioned above.
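To illustrate the "fast % op" point, here is a minimal sketch of why a power-of-two bucket count helps: the remainder can be computed with a single bitwise AND. The BucketIndexSketch class, the GetBucketIndex helper and the local RoundUpToPowerOf2 stand-in are hypothetical names used only for this example; they are not taken from StringPool or from the proposed API.

```csharp
using System;

public static class BucketIndexSketch
{
    // Hypothetical stand-in for the proposed API, so this sample compiles on its own.
    public static uint RoundUpToPowerOf2(uint value)
    {
        unchecked
        {
            --value;
            value |= value >> 1;
            value |= value >> 2;
            value |= value >> 4;
            value |= value >> 8;
            value |= value >> 16;
            return value + 1;
        }
    }

    // When bucketCount is a power of two, (hash % bucketCount) equals
    // (hash & (bucketCount - 1)), which avoids an integer division.
    public static uint GetBucketIndex(uint hash, uint bucketCount)
    {
        return hash & (bucketCount - 1);
    }

    public static void Demo()
    {
        uint bucketCount = RoundUpToPowerOf2(100);    // 128
        uint index = GetBucketIndex(1000, bucketCount);

        Console.WriteLine(bucketCount);               // prints 128
        Console.WriteLine(index == 1000 % bucketCount); // prints True
    }
}
```

The mask trick is only valid when the bucket count is a power of two, which is why the requested size gets rounded up first.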
Details
The implementation would include hardware-accelerated paths like LeadingZeroCount does, checking for Lzcnt, ArmBase and then X86Base, and would otherwise fall back to the software implementation that ConcurrentQueue currently (always) uses.
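As a rough sketch of the two approaches for the uint overload: the first method is the classic bit-smearing fallback, and the second derives the result from a leading-zero count. This is only an illustration under my own assumptions, not necessarily the code that would ship. The RoundUpSketch class and both method names are hypothetical, and the second method simply calls the existing BitOperations.LeadingZeroCount (which already picks an intrinsic when one is available) rather than checking Lzcnt, ArmBase and X86Base by hand.

```csharp
using System;
using System.Numerics;

public static class RoundUpSketch
{
    // Software fallback: copy the highest set bit of (value - 1) into every
    // lower bit, then add one to reach the next power of two.
    public static uint RoundUpToPowerOf2Fallback(uint value)
    {
        unchecked
        {
            --value;
            value |= value >> 1;
            value |= value >> 2;
            value |= value >> 4;
            value |= value >> 8;
            value |= value >> 16;
            return value + 1;
        }
    }

    // Leading-zero-count variant: shift 2^32 right by the number of leading
    // zeros in (value - 1). The 64-bit constant keeps the shift well defined
    // when the count is 0 or 32; the cast back to uint truncates 2^32 to 0.
    public static uint RoundUpToPowerOf2Lzcnt(uint value)
    {
        unchecked
        {
            return (uint)(0x1_0000_0000UL >> BitOperations.LeadingZeroCount(value - 1));
        }
    }
}
```

In this sketch both variants return 0 for an input of 0 and for inputs above 2^31, since the result does not fit in 32 bits. Whether the final API should keep that behavior is a design question for the review.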
Notes
If the API proposal is approved I'd be happy to help out and make a PR for this 😄