x86_64: Replace rej_uniform intrinsics with assembly by jakemas · Pull Request #1014 · pq-code-package/mldsa-native

jakemas · 2026-04-03T04:11:25Z

Summary

Resolves #926 and possibly #418.

The HOL Light proof needs the instructions added in awslabs/s2n-bignum#387.

Replace the AVX2 intrinsics implementation of rej_uniform with hand-written x86_64 assembly
Pass the table as a parameter (consistent with the aarch64 approach), avoiding external symbol references for simpasm compatibility
Construct all constants from immediates (no .rodata section), enabling future HOL Light formal verification
Use register-name #defines with #undef cleanup for SCU builds (following the mlkem-native pattern)
Add poly_uniform to the component benchmark
Include HOL Light proof infrastructure (bytecode, table definition, proof skeleton, Makefile)

ML-DSA's 23-bit coefficients require 32-bit lanes, so 8 coefficients exactly fill a 256-bit YMM register per iteration. This motivated the choice of AVX2 over SSE: with SSE's 128-bit registers and 32-bit lanes, we'd only get 4 coefficients per iteration instead of 8.
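
For readers less familiar with the sampling rule, here is a minimal scalar sketch in C of what each 32-bit lane has to compute. It is not the assembly from this PR: the function name `rej_uniform_scalar` and its signature are hypothetical, chosen only for illustration. Each candidate is built from 3 input bytes, masked to 23 bits, and accepted only if it is below the ML-DSA modulus q = 8380417. A vectorized version would presumably process 8 such candidates (24 input bytes) per YMM register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MLDSA_Q 8380417 /* ML-DSA modulus: 2^23 - 2^13 + 1 */

/* Hypothetical scalar helper (not the PR's API): fills r with up to `len`
 * accepted coefficients sampled from buf, and returns how many were written. */
static unsigned rej_uniform_scalar(int32_t *r, unsigned len,
                                   const uint8_t *buf, size_t buflen)
{
  unsigned ctr = 0;
  size_t pos = 0;

  assert(r != NULL && buf != NULL);

  while (ctr < len && pos + 3 <= buflen)
  {
    /* Assemble a little-endian 24-bit value from 3 bytes. */
    uint32_t t = (uint32_t)buf[pos] | ((uint32_t)buf[pos + 1] << 8) |
                 ((uint32_t)buf[pos + 2] << 16);
    pos += 3;

    /* Keep the low 23 bits: the candidate needs more than 16 bits,
     * so it has to live in a 32-bit lane. */
    t &= 0x7FFFFF;

    /* Reject candidates that are not a valid residue mod q. */
    if (t < MLDSA_Q)
    {
      r[ctr++] = (int32_t)t;
    }
  }
  return ctr;
}
```

Because acceptance depends on the data, the number of outputs per block of input varies, which is why a vectorized implementation needs some way to compact the accepted lanes (the table mentioned in the summary above is presumably used for that).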

Performance

AMD EPYC 3rd gen (c6a) — opt

Benchmark	Before (cycles)	After (cycles)	Change
ML-DSA-44 keypair	68,874	66,828	-3%
ML-DSA-44 sign	187,594	184,181	-2%
ML-DSA-44 verify	68,993	65,665	-5%
ML-DSA-65 keypair	119,089	112,640	-5%
ML-DSA-65 sign	299,488	294,836	-2%
ML-DSA-65 verify	115,385	108,494	-6%
ML-DSA-87 keypair	203,754	185,518	-9%
ML-DSA-87 sign	396,462	378,579	-5%
ML-DSA-87 verify	196,231	177,157	-10%

Proof

Includes HOL Light and CBMC proofs, written by Claude Opus 4.7.

HOL-Light / x86_64 HOL Light proof for mldsa_rej_uniform.S (pull_request) Successful in 12m

There is no constant-time/SAFE proof yet. I will continue to work on it as the instruction PR lands in s2n-bignum.

github-actions

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)


Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`113118` cycles	`113013` cycles	`1.00`
`ML-DSA-44 sign`	`355649` cycles	`355605` cycles	`1.00`
`ML-DSA-44 verify`	`117801` cycles	`117682` cycles	`1.00`
`ML-DSA-65 keypair`	`196381` cycles	`196214` cycles	`1.00`
`ML-DSA-65 sign`	`589557` cycles	`588943` cycles	`1.00`
`ML-DSA-65 verify`	`194604` cycles	`194375` cycles	`1.00`
`ML-DSA-87 keypair`	`322210` cycles	`322148` cycles	`1.00`
`ML-DSA-87 sign`	`752493` cycles	`752763` cycles	`1.00`
`ML-DSA-87 verify`	`320055` cycles	`319900` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

github-actions

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)


Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`212361` cycles	`212622` cycles	`1.00`
`ML-DSA-44 sign`	`760716` cycles	`760066` cycles	`1.00`
`ML-DSA-44 verify`	`228743` cycles	`228987` cycles	`1.00`
`ML-DSA-65 keypair`	`379384` cycles	`379665` cycles	`1.00`
`ML-DSA-65 sign`	`1250617` cycles	`1249827` cycles	`1.00`
`ML-DSA-65 verify`	`371531` cycles	`372045` cycles	`1.00`
`ML-DSA-87 keypair`	`604335` cycles	`605426` cycles	`1.00`
`ML-DSA-87 sign`	`1593243` cycles	`1591413` cycles	`1.00`
`ML-DSA-87 verify`	`618270` cycles	`617375` cycles	`1.00`


oqs-bot

AMD EPYC 3rd gen (c6a)


Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`66830` cycles	`68874` cycles	`0.97`
`ML-DSA-44 sign`	`184077` cycles	`187594` cycles	`0.98`
`ML-DSA-44 verify`	`65562` cycles	`68993` cycles	`0.95`
`ML-DSA-65 keypair`	`111959` cycles	`119089` cycles	`0.94`
`ML-DSA-65 sign`	`292002` cycles	`299488` cycles	`0.98`
`ML-DSA-65 verify`	`108472` cycles	`115385` cycles	`0.94`
`ML-DSA-87 keypair`	`185520` cycles	`203754` cycles	`0.91`
`ML-DSA-87 sign`	`379630` cycles	`396462` cycles	`0.96`
`ML-DSA-87 verify`	`177291` cycles	`196231` cycles	`0.90`


oqs-bot

Graviton4


Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`68316` cycles	`68121` cycles	`1.00`
`ML-DSA-44 sign`	`202487` cycles	`202429` cycles	`1.00`
`ML-DSA-44 verify`	`70722` cycles	`70691` cycles	`1.00`
`ML-DSA-65 keypair`	`121061` cycles	`121050` cycles	`1.00`
`ML-DSA-65 sign`	`331574` cycles	`332242` cycles	`1.00`
`ML-DSA-65 verify`	`117810` cycles	`118169` cycles	`1.00`
`ML-DSA-87 keypair`	`198140` cycles	`198283` cycles	`1.00`
`ML-DSA-87 sign`	`427941` cycles	`428124` cycles	`1.00`
`ML-DSA-87 verify`	`194637` cycles	`194645` cycles	`1.00`


oqs-bot

AMD EPYC 3rd gen (c6a) (no-opt)


Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`134578` cycles	`135123` cycles	`1.00`
`ML-DSA-44 sign`	`523923` cycles	`523989` cycles	`1.00`
`ML-DSA-44 verify`	`147640` cycles	`147421` cycles	`1.00`
`ML-DSA-65 keypair`	`228634` cycles	`227032` cycles	`1.01`
`ML-DSA-65 sign`	`864042` cycles	`860343` cycles	`1.00`
`ML-DSA-65 verify`	`236700` cycles	`234883` cycles	`1.01`
`ML-DSA-87 keypair`	`371955` cycles	`371568` cycles	`1.00`
`ML-DSA-87 sign`	`1080535` cycles	`1079389` cycles	`1.00`
`ML-DSA-87 verify`	`383811` cycles	`383403` cycles	`1.00`


oqs-bot

Intel Xeon 3rd gen (c6i)


Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`56863` cycles	`56287` cycles	`1.01`
`ML-DSA-44 sign`	`181063` cycles	`181562` cycles	`1.00`
`ML-DSA-44 verify`	`61140` cycles	`61061` cycles	`1.00`
`ML-DSA-65 keypair`	`98291` cycles	`98770` cycles	`1.00`
`ML-DSA-65 sign`	`298368` cycles	`299116` cycles	`1.00`
`ML-DSA-65 verify`	`100343` cycles	`100251` cycles	`1.00`
`ML-DSA-87 keypair`	`152430` cycles	`153265` cycles	`0.99`
`ML-DSA-87 sign`	`354719` cycles	`355417` cycles	`1.00`
`ML-DSA-87 verify`	`153124` cycles	`153884` cycles	`1.00`


oqs-bot

Graviton4 (no-opt)


Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`128315` cycles	`128272` cycles	`1.00`
`ML-DSA-44 sign`	`447513` cycles	`447600` cycles	`1.00`
`ML-DSA-44 verify`	`138123` cycles	`144678` cycles	`0.95`
`ML-DSA-65 keypair`	`220541` cycles	`220481` cycles	`1.00`
`ML-DSA-65 sign`	`726484` cycles	`726951` cycles	`1.00`
`ML-DSA-65 verify`	`222926` cycles	`223461` cycles	`1.00`
`ML-DSA-87 keypair`	`366142` cycles	`366604` cycles	`1.00`
`ML-DSA-87 sign`	`927541` cycles	`927414` cycles	`1.00`
`ML-DSA-87 verify`	`374016` cycles	`373875` cycles	`1.00`


oqs-bot

Graviton3


Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`72353` cycles	`72235` cycles	`1.00`
`ML-DSA-44 sign`	`212424` cycles	`212375` cycles	`1.00`
`ML-DSA-44 verify`	`75754` cycles	`75714` cycles	`1.00`
`ML-DSA-65 keypair`	`127646` cycles	`127612` cycles	`1.00`
`ML-DSA-65 sign`	`351030` cycles	`350845` cycles	`1.00`
`ML-DSA-65 verify`	`125627` cycles	`125755` cycles	`1.00`
`ML-DSA-87 keypair`	`205980` cycles	`208476` cycles	`0.99`
`ML-DSA-87 sign`	`444778` cycles	`450018` cycles	`0.99`
`ML-DSA-87 verify`	`205601` cycles	`205843` cycles	`1.00`


oqs-bot

Intel Xeon 3rd gen (c6i) (no-opt)


Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`157499` cycles	`157541` cycles	`1.00`
`ML-DSA-44 sign`	`549244` cycles	`549413` cycles	`1.00`
`ML-DSA-44 verify`	`169448` cycles	`168865` cycles	`1.00`
`ML-DSA-65 keypair`	`268437` cycles	`268818` cycles	`1.00`
`ML-DSA-65 sign`	`903422` cycles	`903672` cycles	`1.00`
`ML-DSA-65 verify`	`275283` cycles	`274680` cycles	`1.00`
`ML-DSA-87 keypair`	`448241` cycles	`448464` cycles	`1.00`
`ML-DSA-87 sign`	`1158654` cycles	`1157970` cycles	`1.00`
`ML-DSA-87 verify`	`458704` cycles	`458043` cycles	`1.00`


oqs-bot

AMD EPYC 4th gen (c7a)


Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`42142` cycles	`40662` cycles	`1.04`
`ML-DSA-44 sign`	`134317` cycles	`132808` cycles	`1.01`
`ML-DSA-44 verify`	`44844` cycles	`43607` cycles	`1.03`
`ML-DSA-65 keypair`	`72940` cycles	`71859` cycles	`1.02`
`ML-DSA-65 sign`	`213861` cycles	`213367` cycles	`1.00`
`ML-DSA-65 verify`	`73729` cycles	`72847` cycles	`1.01`
`ML-DSA-87 keypair`	`107003` cycles	`109237` cycles	`0.98`
`ML-DSA-87 sign`	`250851` cycles	`254550` cycles	`0.99`
`ML-DSA-87 verify`	`107681` cycles	`109371` cycles	`0.98`


oqs-bot

AMD EPYC 4th gen (c7a) (no-opt)


Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`120754` cycles	`120325` cycles	`1.00`
`ML-DSA-44 sign`	`447570` cycles	`447576` cycles	`1.00`
`ML-DSA-44 verify`	`130511` cycles	`130561` cycles	`1.00`
`ML-DSA-65 keypair`	`205040` cycles	`205018` cycles	`1.00`
`ML-DSA-65 sign`	`728790` cycles	`729474` cycles	`1.00`
`ML-DSA-65 verify`	`210029` cycles	`209605` cycles	`1.00`
`ML-DSA-87 keypair`	`337610` cycles	`336678` cycles	`1.00`
`ML-DSA-87 sign`	`925517` cycles	`924223` cycles	`1.00`
`ML-DSA-87 verify`	`347563` cycles	`347399` cycles	`1.00`


oqs-bot

Graviton3 (no-opt)


Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`138744` cycles	`138561` cycles	`1.00`
`ML-DSA-44 sign`	`483982` cycles	`484140` cycles	`1.00`
`ML-DSA-44 verify`	`148574` cycles	`162388` cycles	`0.91`
`ML-DSA-65 keypair`	`241921` cycles	`241950` cycles	`1.00`
`ML-DSA-65 sign`	`792702` cycles	`792591` cycles	`1.00`
`ML-DSA-65 verify`	`240763` cycles	`241288` cycles	`1.00`
`ML-DSA-87 keypair`	`396106` cycles	`397138` cycles	`1.00`
`ML-DSA-87 sign`	`1013453` cycles	`1013569` cycles	`1.00`
`ML-DSA-87 verify`	`403446` cycles	`403178` cycles	`1.00`


oqs-bot

Graviton2


Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`113189` cycles	`113255` cycles	`1.00`
`ML-DSA-44 sign`	`355791` cycles	`356042` cycles	`1.00`
`ML-DSA-44 verify`	`117978` cycles	`117969` cycles	`1.00`
`ML-DSA-65 keypair`	`196342` cycles	`196623` cycles	`1.00`
`ML-DSA-65 sign`	`589183` cycles	`589242` cycles	`1.00`
`ML-DSA-65 verify`	`194553` cycles	`194559` cycles	`1.00`
`ML-DSA-87 keypair`	`322537` cycles	`322281` cycles	`1.00`
`ML-DSA-87 sign`	`753613` cycles	`753546` cycles	`1.00`
`ML-DSA-87 verify`	`320115` cycles	`320070` cycles	`1.00`


oqs-bot

Graviton2 (no-opt)


Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`213219` cycles	`212521` cycles	`1.00`
`ML-DSA-44 sign`	`761553` cycles	`760970` cycles	`1.00`
`ML-DSA-44 verify`	`241351` cycles	`234237` cycles	`1.03`
`ML-DSA-65 keypair`	`380573` cycles	`379762` cycles	`1.00`
`ML-DSA-65 sign`	`1252452` cycles	`1252199` cycles	`1.00`
`ML-DSA-65 verify`	`372839` cycles	`371797` cycles	`1.00`
`ML-DSA-87 keypair`	`607341` cycles	`604584` cycles	`1.00`
`ML-DSA-87 sign`	`1596680` cycles	`1595561` cycles	`1.00`
`ML-DSA-87 verify`	`619175` cycles	`618927` cycles	`1.00`


oqs-bot

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Graviton2 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 verify`	`241351` cycles	`234237` cycles	`1.03`
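
To make the alert rule concrete, the following C sketch recomputes the ratio from the cycle counts in the row above and compares it with the 1.03 threshold. The helper names are hypothetical, and the strict `>` comparison is an assumption about how the benchmark workflow decides; only the numbers come from this page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ratio as reported in the tables: current cycles divided by previous cycles. */
static double bench_ratio(uint64_t current, uint64_t previous)
{
  assert(previous != 0);
  return (double)current / (double)previous;
}

/* Hypothetical check: flag a regression when the ratio exceeds the threshold.
 * Whether the real workflow uses > or >= is an assumption here. */
static bool exceeds_threshold(uint64_t current, uint64_t previous,
                              double threshold)
{
  return bench_ratio(current, previous) > threshold;
}
```

With the values above, 241351 / 234237 is roughly 1.0304, which is just over 1.03. Since the alert is for a Graviton2 no-opt build and this PR only changes x86_64 code, a margin this small may well be measurement noise.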


oqs-bot · 2026-04-03T04:29:59Z

CBMC Results (ML-DSA-44)

Full Results (194 proofs)

Proof	Status	Current	Previous	Change
`TOTAL`	✅	1724s	1629s	+5.8%
`polyvecl_pointwise_acc_montgomery_c`	✅	256s	245s	+4%
`rej_uniform_native`	✅	128s	120s	+7%
`mld_invntt_layer`	✅	99s	88s	+12%
`poly_pointwise_montgomery_c`	✅	97s	93s	+4%
`mld_ct_memcmp`	✅	75s	73s	+3%
`mld_attempt_signature_generation`	✅	44s	45s	-2%
`mld_ntt_layer`	✅	42s	42s	+0%
`sign_verify_internal`	✅	33s	28s	+18%
`rej_uniform_native_x86_64`	✅	31s	-	new
`fqmul`	✅	30s	28s	+7%
`polyvec_matrix_expand`	✅	30s	28s	+7%
`sign_signature_internal`	✅	30s	28s	+7%
`keccakf1600x4_permute_native`	✅	24s	23s	+4%
`rej_uniform`	✅	19s	16s	+19%
`rej_uniform_c`	✅	19s	18s	+6%
`polyvecl_chknorm`	✅	17s	18s	-6%
`mld_ntt_butterfly_block`	✅	16s	15s	+7%
`poly_chknorm_c`	✅	15s	17s	-12%
`polyeta_unpack`	✅	15s	13s	+15%
`polyt0_unpack`	✅	15s	15s	+0%
`compute_pack_t0_t1`	✅	14s	13s	+8%
`mld_check_pct`	✅	14s	15s	-7%
`poly_uniform_4x`	✅	14s	11s	+27%
`polyvec_matrix_pointwise_montgomery_yvec`	✅	14s	15s	-7%
`polyz_unpack_c`	✅	13s	11s	+18%
`poly_add`	✅	12s	10s	+20%
`poly_uniform_eta_4x`	✅	12s	14s	-14%
`keccak_absorb_once_x4`	✅	10s	9s	+11%
`mld_compute_pack_z`	✅	10s	10s	+0%
`poly_invntt_tomont_c`	✅	10s	9s	+11%
`poly_power2round`	✅	9s	9s	+0%
`polyvec_matrix_expand_serial`	✅	9s	9s	+0%
`sign`	✅	9s	7s	+29%
`poly_decompose_c`	✅	8s	7s	+14%
`polyveck_decompose`	✅	8s	8s	+0%
`pointwise_acc_native_x86_64`	✅	7s	5s	+40%
`polyveck_invntt_tomont`	✅	7s	3s	+133%
`sign_verify_extmu`	✅	7s	6s	+17%
`keccakf1600_permute_native`	✅	6s	8s	-25%
`pointwise_acc_native_aarch64`	✅	6s	4s	+50%
`polyt0_pack`	✅	6s	7s	-14%
`sign_keypair_internal`	✅	6s	3s	+100%
`sign_signature_pre_hash_internal`	✅	6s	4s	+50%
`keccak_absorb`	✅	5s	6s	-17%
`keccakf1600_extract_bytes (big endian)`	✅	5s	1s	+400%
`keccakf1600_permute`	✅	5s	8s	-38%
`mld_ct_get_optblocker_i64`	✅	5s	2s	+150%
`mld_prepare_domain_separation_prefix`	✅	5s	6s	-17%
`ntt_native_aarch64`	✅	5s	3s	+67%
`pack_sig_c`	✅	5s	2s	+150%
`poly_challenge`	✅	5s	4s	+25%
`poly_invntt_tomont_native`	✅	5s	4s	+25%
`poly_permute_bitrev_to_custom_optional_native`	✅	5s	1s	+400%
`poly_use_hint_native_aarch64`	✅	5s	3s	+67%
`polyt1_pack`	✅	5s	2s	+150%
`polyveck_pack_eta`	✅	5s	3s	+67%
`polyvecl_pack_eta`	✅	5s	2s	+150%
`polyvecl_pointwise_acc_montgomery_native`	✅	5s	6s	-17%
`polyvecl_uniform_gamma1`	✅	5s	3s	+67%
`polyvecl_uniform_gamma1_serial`	✅	5s	3s	+67%
`shake256_squeeze`	✅	5s	2s	+150%
`sign_open`	✅	5s	3s	+67%
`sign_verify`	✅	5s	3s	+67%
`unpack_sk_t0hat`	✅	5s	4s	+25%
`decompose`	✅	4s	3s	+33%
`intt_native_x86_64`	✅	4s	3s	+33%
`keccak_f1600_x4_native_aarch64_v84a`	✅	4s	3s	+33%
`keccakf1600_xor_bytes (big endian)`	✅	4s	3s	+33%
`keccakf1600x4_xor_bytes`	✅	4s	2s	+100%
`mld_ct_cmask_neg_i32`	✅	4s	3s	+33%
`mld_ct_cmask_nonzero_u8`	✅	4s	4s	+0%
`mld_h`	✅	4s	4s	+0%
`mld_polymat_expand_entry`	✅	4s	2s	+100%
`mld_sample_s1_s2`	✅	4s	4s	+0%
`mld_value_barrier_u8`	✅	4s	3s	+33%
`pack_sk_rho_key_tr_s2`	✅	4s	3s	+33%
`poly_caddq_c`	✅	4s	2s	+100%
`poly_decompose_native`	✅	4s	7s	-43%
`poly_permute_bitrev_to_custom_optional`	✅	4s	3s	+33%
`poly_pointwise_montgomery`	✅	4s	2s	+100%
`poly_pointwise_montgomery_native`	✅	4s	1s	+300%
`poly_shiftl`	✅	4s	3s	+33%
`poly_sub`	✅	4s	3s	+33%
`poly_uniform`	✅	4s	4s	+0%
`poly_uniform_gamma1_4x`	✅	4s	4s	+0%
`poly_use_hint_c`	✅	4s	5s	-20%
`poly_use_hint_native`	✅	4s	2s	+100%
`polyt1_unpack`	✅	4s	2s	+100%
`polyveck_chknorm`	✅	4s	5s	-20%
`polyveck_ntt`	✅	4s	5s	-20%
`polyvecl_unpack_eta`	✅	4s	2s	+100%
`polyz_pack`	✅	4s	3s	+33%
`rej_eta`	✅	4s	4s	+0%
`rej_eta_c`	✅	4s	4s	+0%
`rej_eta_native`	✅	4s	5s	-20%
`sign_verify_pre_hash_internal`	✅	4s	4s	+0%
`sk_s1hat_get_poly`	✅	4s	6s	-33%
`yvec_init`	✅	4s	2s	+100%
`caddq`	✅	3s	3s	+0%
`intt_native_aarch64`	✅	3s	5s	-40%
`keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid`	✅	3s	2s	+50%
`keccak_f1600_x4_native_avx2`	✅	3s	2s	+50%
`keccak_finalize`	✅	3s	3s	+0%
`keccak_squeezeblocks_x4`	✅	3s	4s	-25%
`make_hint`	✅	3s	3s	+0%
`mld_ct_cmask_nonzero_u32`	✅	3s	2s	+50%
`mld_ct_get_optblocker_u8`	✅	3s	2s	+50%
`mld_ct_sel_int32`	✅	3s	2s	+50%
`montgomery_reduce`	✅	3s	2s	+50%
`ntt_native_x86_64`	✅	3s	3s	+0%
`nttunpack_native_x86_64`	✅	3s	3s	+0%
`pack_sig_h`	✅	3s	3s	+0%
`pack_sig_z`	✅	3s	2s	+50%
`pack_sk_s1`	✅	3s	4s	-25%
`pointwise_native_aarch64`	✅	3s	1s	+200%
`poly_caddq_native`	✅	3s	7s	-57%
`poly_caddq_native_aarch64`	✅	3s	3s	+0%
`poly_chknorm_native`	✅	3s	3s	+0%
`poly_chknorm_native_aarch64`	✅	3s	4s	-25%
`poly_decompose_32_native_aarch64`	✅	3s	3s	+0%
`poly_ntt`	✅	3s	4s	-25%
`poly_ntt_native`	✅	3s	3s	+0%
`poly_uniform_eta`	✅	3s	3s	+0%
`polyeta_pack`	✅	3s	1s	+200%
`polyvec_matrix_pointwise_montgomery_row`	✅	3s	2s	+50%
`polyveck_caddq`	✅	3s	4s	-25%
`polyveck_pack_w1`	✅	3s	4s	-25%
`polyvecl_ntt`	✅	3s	7s	-57%
`polyw1_pack`	✅	3s	4s	-25%
`polyz_unpack`	✅	3s	2s	+50%
`polyz_unpack_17_native_aarch64`	✅	3s	3s	+0%
`polyz_unpack_native`	✅	3s	2s	+50%
`shake128_absorb`	✅	3s	3s	+0%
`shake128_squeeze`	✅	3s	1s	+200%
`shake256_init`	✅	3s	3s	+0%
`sig_unpack_hints`	✅	3s	3s	+0%
`sign_keypair`	✅	3s	3s	+0%
`sign_pk_from_sk`	✅	3s	6s	-50%
`sign_signature`	✅	3s	3s	+0%
`sign_verify_pre_hash_shake256`	✅	3s	2s	+50%
`sk_s2hat_get_poly`	✅	3s	3s	+0%
`sys_check_capability`	✅	3s	3s	+0%
`unpack_sk_s1hat`	✅	3s	3s	+0%
`yvec_get_poly`	✅	3s	4s	-25%
`fqscale`	✅	2s	2s	+0%
`keccak_f1600_x1_native_aarch64`	✅	2s	4s	-50%
`keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid`	✅	2s	2s	+0%
`keccak_squeeze`	✅	2s	2s	+0%
`keccakf1600_xor_bytes`	✅	2s	2s	+0%
`keccakf1600x4_extract_bytes`	✅	2s	3s	-33%
`keccakf1600x4_permute`	✅	2s	2s	+0%
`mld_ct_get_optblocker_u32`	✅	2s	2s	+0%
`mld_keccakf1600_extract_bytes`	✅	2s	3s	-33%
`mld_sample_s1_s2_serial`	✅	2s	2s	+0%
`mld_value_barrier_i64`	✅	2s	3s	-33%
`mld_value_barrier_u32`	✅	2s	2s	+0%
`pointwise_native_x86_64`	✅	2s	3s	-33%
`poly_caddq`	✅	2s	1s	+100%
`poly_chknorm`	✅	2s	3s	-33%
`poly_decompose`	✅	2s	4s	-50%
`poly_decompose_88_native_aarch64`	✅	2s	5s	-60%
`poly_invntt_tomont`	✅	2s	4s	-50%
`poly_reduce`	✅	2s	3s	-33%
`poly_uniform_gamma1`	✅	2s	4s	-50%
`poly_use_hint`	✅	2s	3s	-33%
`polyveck_reduce`	✅	2s	2s	+0%
`polyveck_unpack_eta`	✅	2s	3s	-33%
`polyvecl_pointwise_acc_montgomery`	✅	2s	3s	-33%
`polyvecl_unpack_z`	✅	2s	3s	-33%
`polyz_unpack_19_native_aarch64`	✅	2s	4s	-50%
`power2round`	✅	2s	3s	-33%
`reduce32`	✅	2s	2s	+0%
`shake128_init`	✅	2s	3s	-33%
`shake128_release`	✅	2s	2s	+0%
`shake128x4_absorb_once`	✅	2s	5s	-60%
`shake256`	✅	2s	4s	-50%
`shake256_absorb`	✅	2s	4s	-50%
`shake256_finalize`	✅	2s	1s	+100%
`shake256_release`	✅	2s	2s	+0%
`shake256x4_absorb_once`	✅	2s	2s	+0%
`shake256x4_squeezeblocks`	✅	2s	3s	-33%
`sign_signature_extmu`	✅	2s	4s	-50%
`sign_signature_pre_hash_shake256`	✅	2s	5s	-60%
`sk_t0hat_get_poly`	✅	2s	3s	-33%
`unpack_sk`	✅	2s	4s	-50%
`unpack_sk_s2hat`	✅	2s	4s	-50%
`use_hint`	✅	2s	3s	-33%
`keccak_f1600_x1_native_aarch64_v84a`	✅	1s	2s	-50%
`keccak_init`	✅	1s	3s	-67%
`mld_ct_abs_i32`	✅	1s	3s	-67%
`poly_ntt_c`	✅	1s	5s	-80%
`shake128_finalize`	✅	1s	3s	-67%
`shake128x4_squeezeblocks`	✅	1s	4s	-75%
`unpack_pk_t1`	✅	1s	3s	-67%

oqs-bot · 2026-04-03T04:30:34Z

CBMC Results (ML-DSA-87)

Full Results (194 proofs)

Proof	Status	Current	Previous	Change
`TOTAL`	✅	2140s	2059s	+3.9%
`polyvecl_pointwise_acc_montgomery_c`	✅	326s	289s	+13%
`polyvec_matrix_expand`	✅	196s	184s	+7%
`rej_uniform_native`	✅	132s	127s	+4%
`poly_pointwise_montgomery_c`	✅	106s	104s	+2%
`mld_invntt_layer`	✅	99s	98s	+1%
`mld_ct_memcmp`	✅	86s	83s	+4%
`sign_verify_internal`	✅	61s	63s	-3%
`mld_attempt_signature_generation`	✅	58s	62s	-6%
`sign_signature_internal`	✅	56s	57s	-2%
`mld_ntt_layer`	✅	48s	46s	+4%
`polyvec_matrix_expand_serial`	✅	40s	40s	+0%
`rej_uniform_native_x86_64`	✅	31s	-	new
`fqmul`	✅	30s	29s	+3%
`compute_pack_t0_t1`	✅	29s	28s	+4%
`keccakf1600x4_permute_native`	✅	23s	23s	+0%
`polyvec_matrix_pointwise_montgomery_yvec`	✅	23s	23s	+0%
`rej_uniform_c`	✅	19s	18s	+6%
`mld_check_pct`	✅	17s	17s	+0%
`rej_uniform`	✅	17s	16s	+6%
`polyt0_unpack`	✅	16s	16s	+0%
`mld_ntt_butterfly_block`	✅	15s	18s	-17%
`poly_chknorm_c`	✅	15s	15s	+0%
`poly_uniform_eta_4x`	✅	14s	13s	+8%
`poly_add`	✅	12s	12s	+0%
`poly_invntt_tomont_c`	✅	12s	9s	+33%
`poly_uniform_4x`	✅	12s	11s	+9%
`polyveck_decompose`	✅	12s	11s	+9%
`polyeta_unpack`	✅	11s	10s	+10%
`polyveck_ntt`	✅	11s	10s	+10%
`keccak_absorb_once_x4`	✅	10s	9s	+11%
`polyveck_invntt_tomont`	✅	10s	9s	+11%
`pointwise_acc_native_x86_64`	✅	9s	6s	+50%
`sign`	✅	9s	7s	+29%
`keccakf1600_permute_native`	✅	8s	7s	+14%
`mld_compute_pack_z`	✅	8s	9s	-11%
`poly_power2round`	✅	8s	8s	+0%
`polyveck_caddq`	✅	8s	7s	+14%
`polyz_unpack_c`	✅	8s	7s	+14%
`keccakf1600_permute`	✅	7s	7s	+0%
`pointwise_acc_native_aarch64`	✅	7s	7s	+0%
`sign_pk_from_sk`	✅	7s	6s	+17%
`sign_verify`	✅	7s	5s	+40%
`keccak_absorb`	✅	6s	6s	+0%
`keccak_squeezeblocks_x4`	✅	6s	5s	+20%
`mld_sample_s1_s2`	✅	6s	6s	+0%
`poly_caddq`	✅	6s	2s	+200%
`poly_uniform`	✅	6s	2s	+200%
`polyvecl_chknorm`	✅	6s	4s	+50%
`polyvecl_ntt`	✅	6s	5s	+20%
`unpack_sk_t0hat`	✅	6s	5s	+20%
`keccak_finalize`	✅	5s	3s	+67%
`keccakf1600_extract_bytes (big endian)`	✅	5s	3s	+67%
`keccakf1600_xor_bytes`	✅	5s	4s	+25%
`mld_ct_abs_i32`	✅	5s	2s	+150%
`poly_caddq_c`	✅	5s	3s	+67%
`poly_caddq_native`	✅	5s	3s	+67%
`poly_chknorm_native`	✅	5s	2s	+150%
`poly_decompose_32_native_aarch64`	✅	5s	3s	+67%
`poly_invntt_tomont_native`	✅	5s	3s	+67%
`poly_shiftl`	✅	5s	5s	+0%
`poly_use_hint_c`	✅	5s	5s	+0%
`polyt0_pack`	✅	5s	5s	+0%
`polyvec_matrix_pointwise_montgomery_row`	✅	5s	2s	+150%
`polyveck_chknorm`	✅	5s	5s	+0%
`polyveck_pack_eta`	✅	5s	5s	+0%
`polyveck_unpack_eta`	✅	5s	2s	+150%
`polyvecl_uniform_gamma1`	✅	5s	4s	+25%
`polyz_unpack`	✅	5s	3s	+67%
`sign_signature`	✅	5s	6s	-17%
`sign_verify_extmu`	✅	5s	4s	+25%
`caddq`	✅	4s	2s	+100%
`intt_native_aarch64`	✅	4s	2s	+100%
`keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid`	✅	4s	2s	+100%
`keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid`	✅	4s	3s	+33%
`keccak_init`	✅	4s	2s	+100%
`keccakf1600x4_xor_bytes`	✅	4s	3s	+33%
`mld_ct_get_optblocker_i64`	✅	4s	1s	+300%
`mld_prepare_domain_separation_prefix`	✅	4s	6s	-33%
`mld_sample_s1_s2_serial`	✅	4s	7s	-43%
`montgomery_reduce`	✅	4s	4s	+0%
`ntt_native_x86_64`	✅	4s	3s	+33%
`nttunpack_native_x86_64`	✅	4s	3s	+33%
`pack_sig_z`	✅	4s	3s	+33%
`pointwise_native_x86_64`	✅	4s	1s	+300%
`poly_challenge`	✅	4s	6s	-33%
`poly_chknorm_native_aarch64`	✅	4s	3s	+33%
`poly_decompose_88_native_aarch64`	✅	4s	3s	+33%
`poly_decompose_c`	✅	4s	3s	+33%
`poly_decompose_native`	✅	4s	4s	+0%
`poly_ntt_native`	✅	4s	4s	+0%
`poly_pointwise_montgomery_native`	✅	4s	2s	+100%
`poly_sub`	✅	4s	5s	-20%
`poly_uniform_eta`	✅	4s	7s	-43%
`poly_use_hint_native`	✅	4s	5s	-20%
`poly_use_hint_native_aarch64`	✅	4s	2s	+100%
`polyveck_pack_w1`	✅	4s	3s	+33%
`polyvecl_unpack_eta`	✅	4s	4s	+0%
`polyz_unpack_native`	✅	4s	4s	+0%
`rej_eta_c`	✅	4s	5s	-20%
`rej_eta_native`	✅	4s	5s	-20%
`shake128_absorb`	✅	4s	1s	+300%
`shake256`	✅	4s	2s	+100%
`shake256_init`	✅	4s	2s	+100%
`shake256_release`	✅	4s	2s	+100%
`shake256x4_absorb_once`	✅	4s	2s	+100%
`sign_keypair_internal`	✅	4s	11s	-64%
`sign_open`	✅	4s	3s	+33%
`sign_signature_extmu`	✅	4s	4s	+0%
`sign_signature_pre_hash_internal`	✅	4s	5s	-20%
`sign_signature_pre_hash_shake256`	✅	4s	4s	+0%
`decompose`	✅	3s	3s	+0%
`intt_native_x86_64`	✅	3s	4s	-25%
`keccak_f1600_x1_native_aarch64_v84a`	✅	3s	3s	+0%
`keccak_f1600_x4_native_aarch64_v84a`	✅	3s	2s	+50%
`keccakf1600_xor_bytes (big endian)`	✅	3s	3s	+0%
`keccakf1600x4_permute`	✅	3s	2s	+50%
`mld_ct_cmask_neg_i32`	✅	3s	4s	-25%
`mld_ct_cmask_nonzero_u32`	✅	3s	4s	-25%
`mld_ct_get_optblocker_u8`	✅	3s	3s	+0%
`mld_h`	✅	3s	4s	-25%
`mld_keccakf1600_extract_bytes`	✅	3s	1s	+200%
`mld_polymat_expand_entry`	✅	3s	4s	-25%
`pack_sig_c`	✅	3s	3s	+0%
`pack_sk_rho_key_tr_s2`	✅	3s	3s	+0%
`pack_sk_s1`	✅	3s	4s	-25%
`pointwise_native_aarch64`	✅	3s	3s	+0%
`poly_decompose`	✅	3s	2s	+50%
`poly_ntt`	✅	3s	5s	-40%
`poly_reduce`	✅	3s	3s	+0%
`poly_uniform_gamma1`	✅	3s	4s	-25%
`poly_uniform_gamma1_4x`	✅	3s	5s	-40%
`polyeta_pack`	✅	3s	4s	-25%
`polyt1_pack`	✅	3s	6s	-50%
`polyt1_unpack`	✅	3s	4s	-25%
`polyvecl_pointwise_acc_montgomery_native`	✅	3s	5s	-40%
`polyw1_pack`	✅	3s	3s	+0%
`polyz_unpack_19_native_aarch64`	✅	3s	4s	-25%
`reduce32`	✅	3s	3s	+0%
`rej_eta`	✅	3s	3s	+0%
`shake128_finalize`	✅	3s	3s	+0%
`sig_unpack_hints`	✅	3s	3s	+0%
`sign_keypair`	✅	3s	9s	-67%
`sk_s1hat_get_poly`	✅	3s	2s	+50%
`sk_s2hat_get_poly`	✅	3s	4s	-25%
`sk_t0hat_get_poly`	✅	3s	2s	+50%
`unpack_pk_t1`	✅	3s	3s	+0%
`unpack_sk`	✅	3s	5s	-40%
`unpack_sk_s2hat`	✅	3s	3s	+0%
`use_hint`	✅	3s	4s	-25%
`yvec_get_poly`	✅	3s	3s	+0%
`yvec_init`	✅	3s	4s	-25%
`fqscale`	✅	2s	2s	+0%
`keccak_squeeze`	✅	2s	2s	+0%
`keccakf1600x4_extract_bytes`	✅	2s	4s	-50%
`make_hint`	✅	2s	4s	-50%
`mld_ct_cmask_nonzero_u8`	✅	2s	3s	-33%
`mld_ct_get_optblocker_u32`	✅	2s	1s	+100%
`mld_value_barrier_i64`	✅	2s	3s	-33%
`mld_value_barrier_u32`	✅	2s	1s	+100%
`mld_value_barrier_u8`	✅	2s	4s	-50%
`ntt_native_aarch64`	✅	2s	5s	-60%
`pack_sig_h`	✅	2s	4s	-50%
`poly_caddq_native_aarch64`	✅	2s	6s	-67%
`poly_invntt_tomont`	✅	2s	6s	-67%
`poly_ntt_c`	✅	2s	3s	-33%
`poly_permute_bitrev_to_custom_optional`	✅	2s	3s	-33%
`poly_permute_bitrev_to_custom_optional_native`	✅	2s	3s	-33%
`poly_use_hint`	✅	2s	3s	-33%
`polyveck_reduce`	✅	2s	2s	+0%
`polyvecl_pack_eta`	✅	2s	4s	-50%
`polyvecl_pointwise_acc_montgomery`	✅	2s	4s	-50%
`polyvecl_uniform_gamma1_serial`	✅	2s	2s	+0%
`polyvecl_unpack_z`	✅	2s	1s	+100%
`polyz_pack`	✅	2s	5s	-60%
`polyz_unpack_17_native_aarch64`	✅	2s	3s	-33%
`shake128_init`	✅	2s	1s	+100%
`shake128x4_absorb_once`	✅	2s	3s	-33%
`shake256_finalize`	✅	2s	3s	-33%
`shake256_squeeze`	✅	2s	2s	+0%
`shake256x4_squeezeblocks`	✅	2s	3s	-33%
`sign_verify_pre_hash_internal`	✅	2s	3s	-33%
`sign_verify_pre_hash_shake256`	✅	2s	8s	-75%
`sys_check_capability`	✅	2s	5s	-60%
`unpack_sk_s1hat`	✅	2s	3s	-33%
`keccak_f1600_x1_native_aarch64`	✅	1s	3s	-67%
`keccak_f1600_x4_native_avx2`	✅	1s	3s	-67%
`mld_ct_sel_int32`	✅	1s	2s	-50%
`poly_chknorm`	✅	1s	4s	-75%
`poly_pointwise_montgomery`	✅	1s	1s	+0%
`power2round`	✅	1s	2s	-50%
`shake128_release`	✅	1s	2s	-50%
`shake128_squeeze`	✅	1s	2s	-50%
`shake128x4_squeezeblocks`	✅	1s	5s	-80%
`shake256_absorb`	✅	1s	4s	-75%

oqs-bot · 2026-04-03T04:31:33Z

CBMC Results (ML-DSA-65)

Full Results (194 proofs)

Proof	Status	Current	Previous	Change
`TOTAL`	✅	1800s	2033s	-11.5%
`polyvecl_pointwise_acc_montgomery_c`	✅	257s	332s	-23%
`polyvec_matrix_expand`	✅	142s	155s	-8%
`rej_uniform_native`	✅	120s	134s	-10%
`mld_invntt_layer`	✅	92s	100s	-8%
`poly_pointwise_montgomery_c`	✅	89s	111s	-20%
`mld_ct_memcmp`	✅	73s	87s	-16%
`sign_verify_internal`	✅	53s	56s	-5%
`sign_signature_internal`	✅	47s	49s	-4%
`mld_ntt_layer`	✅	39s	47s	-17%
`mld_attempt_signature_generation`	✅	38s	41s	-7%
`fqmul`	✅	29s	30s	-3%
`rej_uniform_native_x86_64`	✅	28s	-	new
`polyvec_matrix_pointwise_montgomery_yvec`	✅	27s	30s	-10%
`keccakf1600x4_permute_native`	✅	23s	25s	-8%
`polyvec_matrix_expand_serial`	✅	23s	25s	-8%
`polyt0_unpack`	✅	16s	16s	+0%
`rej_uniform_c`	✅	16s	19s	-16%
`mld_ntt_butterfly_block`	✅	15s	17s	-12%
`poly_chknorm_c`	✅	15s	17s	-12%
`rej_uniform`	✅	15s	19s	-21%
`poly_uniform_eta_4x`	✅	13s	13s	+0%
`polyveck_decompose`	✅	13s	16s	-19%
`compute_pack_t0_t1`	✅	12s	17s	-29%
`poly_uniform_4x`	✅	12s	14s	-14%
`poly_add`	✅	11s	12s	-8%
`mld_check_pct`	✅	10s	12s	-17%
`sign`	✅	10s	8s	+25%
`keccak_absorb_once_x4`	✅	9s	11s	-18%
`keccakf1600_permute_native`	✅	8s	9s	-11%
`pointwise_acc_native_x86_64`	✅	8s	5s	+60%
`poly_power2round`	✅	8s	10s	-20%
`polyveck_caddq`	✅	8s	7s	+14%
`polyveck_chknorm`	✅	8s	9s	-11%
`polyvecl_ntt`	✅	8s	5s	+60%
`mld_compute_pack_z`	✅	7s	7s	+0%
`pointwise_acc_native_aarch64`	✅	7s	7s	+0%
`poly_invntt_tomont_c`	✅	7s	9s	-22%
`polyveck_ntt`	✅	7s	11s	-36%
`intt_native_aarch64`	✅	6s	2s	+200%
`keccak_absorb`	✅	6s	6s	+0%
`keccakf1600_permute`	✅	6s	9s	-33%
`poly_caddq_c`	✅	6s	5s	+20%
`poly_decompose_c`	✅	6s	9s	-33%
`polyveck_invntt_tomont`	✅	6s	8s	-25%
`sign_open`	✅	6s	3s	+100%
`sign_pk_from_sk`	✅	6s	5s	+20%
`keccak_squeezeblocks_x4`	✅	5s	5s	+0%
`mld_sample_s1_s2_serial`	✅	5s	5s	+0%
`mld_value_barrier_i64`	✅	5s	3s	+67%
`poly_challenge`	✅	5s	6s	-17%
`poly_shiftl`	✅	5s	3s	+67%
`poly_uniform_gamma1_4x`	✅	5s	4s	+25%
`sign_keypair_internal`	✅	5s	4s	+25%
`sign_verify`	✅	5s	4s	+25%
`yvec_init`	✅	5s	3s	+67%
`keccak_f1600_x1_native_aarch64`	✅	4s	2s	+100%
`keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid`	✅	4s	2s	+100%
`keccak_squeeze`	✅	4s	4s	+0%
`mld_ct_get_optblocker_u32`	✅	4s	2s	+100%
`mld_ct_get_optblocker_u8`	✅	4s	6s	-33%
`mld_keccakf1600_extract_bytes`	✅	4s	2s	+100%
`mld_sample_s1_s2`	✅	4s	4s	+0%
`mld_value_barrier_u32`	✅	4s	2s	+100%
`mld_value_barrier_u8`	✅	4s	3s	+33%
`montgomery_reduce`	✅	4s	3s	+33%
`ntt_native_aarch64`	✅	4s	2s	+100%
`nttunpack_native_x86_64`	✅	4s	1s	+300%
`pack_sk_rho_key_tr_s2`	✅	4s	2s	+100%
`pack_sk_s1`	✅	4s	3s	+33%
`pointwise_native_aarch64`	✅	4s	3s	+33%
`pointwise_native_x86_64`	✅	4s	4s	+0%
`poly_caddq_native_aarch64`	✅	4s	3s	+33%
`poly_chknorm_native`	✅	4s	5s	-20%
`poly_decompose_88_native_aarch64`	✅	4s	4s	+0%
`poly_permute_bitrev_to_custom_optional`	✅	4s	1s	+300%
`poly_uniform`	✅	4s	5s	-20%
`poly_uniform_eta`	✅	4s	6s	-33%
`poly_use_hint_c`	✅	4s	2s	+100%
`poly_use_hint_native_aarch64`	✅	4s	3s	+33%
`polyeta_unpack`	✅	4s	3s	+33%
`polyvec_matrix_pointwise_montgomery_row`	✅	4s	4s	+0%
`polyvecl_chknorm`	✅	4s	4s	+0%
`polyz_unpack_17_native_aarch64`	✅	4s	5s	-20%
`polyz_unpack_c`	✅	4s	5s	-20%
`rej_eta_c`	✅	4s	5s	-20%
`shake128_finalize`	✅	4s	5s	-20%
`shake256_absorb`	✅	4s	3s	+33%
`shake256_init`	✅	4s	2s	+100%
`sign_keypair`	✅	4s	4s	+0%
`sign_signature_extmu`	✅	4s	4s	+0%
`sign_signature_pre_hash_internal`	✅	4s	3s	+33%
`sign_verify_pre_hash_shake256`	✅	4s	5s	-20%
`sk_t0hat_get_poly`	✅	4s	3s	+33%
`unpack_sk_t0hat`	✅	4s	4s	+0%
`use_hint`	✅	4s	3s	+33%
`caddq`	✅	3s	2s	+50%
`fqscale`	✅	3s	3s	+0%
`keccak_f1600_x4_native_aarch64_v84a`	✅	3s	3s	+0%
`keccak_init`	✅	3s	2s	+50%
`mld_ct_cmask_nonzero_u32`	✅	3s	4s	-25%
`mld_polymat_expand_entry`	✅	3s	2s	+50%
`mld_prepare_domain_separation_prefix`	✅	3s	4s	-25%
`pack_sig_c`	✅	3s	1s	+200%
`pack_sig_z`	✅	3s	3s	+0%
`poly_caddq`	✅	3s	4s	-25%
`poly_decompose`	✅	3s	4s	-25%
`poly_decompose_32_native_aarch64`	✅	3s	3s	+0%
`poly_invntt_tomont`	✅	3s	3s	+0%
`poly_ntt_c`	✅	3s	4s	-25%
`poly_permute_bitrev_to_custom_optional_native`	✅	3s	4s	-25%
`poly_pointwise_montgomery_native`	✅	3s	4s	-25%
`poly_reduce`	✅	3s	3s	+0%
`polyeta_pack`	✅	3s	4s	-25%
`polyt0_pack`	✅	3s	6s	-50%
`polyt1_unpack`	✅	3s	3s	+0%
`polyveck_pack_w1`	✅	3s	3s	+0%
`polyveck_reduce`	✅	3s	3s	+0%
`polyvecl_pack_eta`	✅	3s	5s	-40%
`polyvecl_pointwise_acc_montgomery`	✅	3s	3s	+0%
`polyvecl_pointwise_acc_montgomery_native`	✅	3s	3s	+0%
`polyvecl_uniform_gamma1`	✅	3s	5s	-40%
`polyvecl_unpack_eta`	✅	3s	6s	-50%
`polyvecl_unpack_z`	✅	3s	3s	+0%
`polyz_unpack_19_native_aarch64`	✅	3s	2s	+50%
`reduce32`	✅	3s	3s	+0%
`rej_eta`	✅	3s	3s	+0%
`rej_eta_native`	✅	3s	4s	-25%
`shake128_absorb`	✅	3s	2s	+50%
`shake128_release`	✅	3s	2s	+50%
`shake128x4_squeezeblocks`	✅	3s	4s	-25%
`shake256x4_absorb_once`	✅	3s	3s	+0%
`sign_signature`	✅	3s	2s	+50%
`sign_verify_extmu`	✅	3s	3s	+0%
`sign_verify_pre_hash_internal`	✅	3s	4s	-25%
`sk_s1hat_get_poly`	✅	3s	5s	-40%
`sk_s2hat_get_poly`	✅	3s	5s	-40%
`unpack_pk_t1`	✅	3s	6s	-50%
`intt_native_x86_64`	✅	2s	4s	-50%
`keccak_f1600_x1_native_aarch64_v84a`	✅	2s	4s	-50%
`keccakf1600_xor_bytes`	✅	2s	2s	+0%
`keccakf1600_xor_bytes (big endian)`	✅	2s	5s	-60%
`keccakf1600x4_xor_bytes`	✅	2s	4s	-50%
`mld_ct_cmask_neg_i32`	✅	2s	4s	-50%
`mld_ct_cmask_nonzero_u8`	✅	2s	2s	+0%
`mld_ct_get_optblocker_i64`	✅	2s	4s	-50%
`mld_ct_sel_int32`	✅	2s	4s	-50%
`mld_h`	✅	2s	2s	+0%
`ntt_native_x86_64`	✅	2s	3s	-33%
`pack_sig_h`	✅	2s	2s	+0%
`poly_caddq_native`	✅	2s	3s	-33%
`poly_chknorm`	✅	2s	2s	+0%
`poly_chknorm_native_aarch64`	✅	2s	3s	-33%
`poly_decompose_native`	✅	2s	5s	-60%
`poly_ntt`	✅	2s	2s	+0%
`poly_pointwise_montgomery`	✅	2s	4s	-50%
`poly_sub`	✅	2s	3s	-33%
`poly_uniform_gamma1`	✅	2s	4s	-50%
`poly_use_hint`	✅	2s	4s	-50%
`poly_use_hint_native`	✅	2s	4s	-50%
`polyveck_pack_eta`	✅	2s	3s	-33%
`polyveck_unpack_eta`	✅	2s	5s	-60%
`polyvecl_uniform_gamma1_serial`	✅	2s	2s	+0%
`polyw1_pack`	✅	2s	3s	-33%
`polyz_pack`	✅	2s	3s	-33%
`polyz_unpack`	✅	2s	2s	+0%
`power2round`	✅	2s	3s	-33%
`shake128_init`	✅	2s	2s	+0%
`shake256`	✅	2s	3s	-33%
`shake256_finalize`	✅	2s	3s	-33%
`shake256_release`	✅	2s	3s	-33%
`shake256_squeeze`	✅	2s	1s	+100%
`shake256x4_squeezeblocks`	✅	2s	4s	-50%
`sig_unpack_hints`	✅	2s	2s	+0%
`sign_signature_pre_hash_shake256`	✅	2s	5s	-60%
`sys_check_capability`	✅	2s	2s	+0%
`unpack_sk`	✅	2s	5s	-60%
`unpack_sk_s1hat`	✅	2s	3s	-33%
`unpack_sk_s2hat`	✅	2s	2s	+0%
`decompose`	✅	1s	4s	-75%
`keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid`	✅	1s	2s	-50%
`keccak_f1600_x4_native_avx2`	✅	1s	3s	-67%
`keccak_finalize`	✅	1s	2s	-50%
`keccakf1600_extract_bytes (big endian)`	✅	1s	2s	-50%
`keccakf1600x4_extract_bytes`	✅	1s	3s	-67%
`keccakf1600x4_permute`	✅	1s	2s	-50%
`make_hint`	✅	1s	4s	-75%
`mld_ct_abs_i32`	✅	1s	1s	+0%
`poly_invntt_tomont_native`	✅	1s	2s	-50%
`poly_ntt_native`	✅	1s	5s	-80%
`polyt1_pack`	✅	1s	4s	-75%
`polyz_unpack_native`	✅	1s	4s	-75%
`shake128_squeeze`	✅	1s	3s	-67%
`shake128x4_absorb_once`	✅	1s	1s	+0%
`yvec_get_poly`	✅	1s	4s	-75%

oqs-bot

Intel Xeon 4th gen (c7i)

Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`34764` cycles	`34374` cycles	`1.01`
`ML-DSA-44 sign`	`120113` cycles	`120132` cycles	`1.00`
`ML-DSA-44 verify`	`38092` cycles	`38166` cycles	`1.00`
`ML-DSA-65 keypair`	`61138` cycles	`60500` cycles	`1.01`
`ML-DSA-65 sign`	`201844` cycles	`199945` cycles	`1.01`
`ML-DSA-65 verify`	`62783` cycles	`62429` cycles	`1.01`
`ML-DSA-87 keypair`	`93501` cycles	`94486` cycles	`0.99`
`ML-DSA-87 sign`	`236815` cycles	`239500` cycles	`0.99`
`ML-DSA-87 verify`	`95619` cycles	`96894` cycles	`0.99`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`93930` cycles	`93842` cycles	`1.00`
`ML-DSA-44 sign`	`333310` cycles	`333119` cycles	`1.00`
`ML-DSA-44 verify`	`100022` cycles	`100025` cycles	`1.00`
`ML-DSA-65 keypair`	`159902` cycles	`160115` cycles	`1.00`
`ML-DSA-65 sign`	`543114` cycles	`543227` cycles	`1.00`
`ML-DSA-65 verify`	`160989` cycles	`161060` cycles	`1.00`
`ML-DSA-87 keypair`	`266666` cycles	`266874` cycles	`1.00`
`ML-DSA-87 sign`	`704974` cycles	`706010` cycles	`1.00`
`ML-DSA-87 verify`	`270510` cycles	`269779` cycles	`1.00`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite	Current: `6539a79`	Previous: `9ee2f35`	Ratio
`ML-DSA-44 keypair`	`42142` cycles	`40662` cycles	`1.04`

This comment was automatically generated by workflow using github-action-benchmark.

oqs-bot · 2026-05-06T01:17:52Z

CBMC Results (ML-DSA-65, REDUCE-RAM)

Full Results (194 proofs)

Proof	Status	Current	Previous	Change
`TOTAL`	✅	1534s	1504s	+2.0%
`poly_pointwise_montgomery_c`	✅	168s	165s	+2%
`polyvec_matrix_pointwise_montgomery_yvec`	✅	154s	149s	+3%
`rej_uniform_native`	✅	109s	105s	+4%
`mld_invntt_layer`	✅	105s	101s	+4%
`mld_ct_memcmp`	✅	74s	73s	+1%
`rej_uniform_native_x86_64`	✅	58s	-	new
`mld_ntt_layer`	✅	43s	41s	+5%
`fqmul`	✅	28s	26s	+8%
`mld_attempt_signature_generation`	✅	23s	26s	-12%
`keccakf1600x4_permute_native`	✅	21s	23s	-9%
`rej_uniform`	✅	20s	20s	+0%
`polyvecl_chknorm`	✅	19s	20s	-5%
`rej_uniform_c`	✅	19s	17s	+12%
`sign_verify_internal`	✅	18s	17s	+6%
`mld_ntt_butterfly_block`	✅	17s	15s	+13%
`mld_check_pct`	✅	16s	14s	+14%
`poly_chknorm_c`	✅	14s	14s	+0%
`poly_add`	✅	12s	11s	+9%
`poly_uniform_eta_4x`	✅	12s	12s	+0%
`polyveck_decompose`	✅	12s	13s	-8%
`polyt0_unpack`	✅	11s	13s	-15%
`keccak_absorb_once_x4`	✅	10s	9s	+11%
`keccakf1600_permute_native`	✅	9s	7s	+29%
`poly_caddq_c`	✅	9s	8s	+12%
`poly_invntt_tomont_c`	✅	9s	11s	-18%
`polyvec_matrix_pointwise_montgomery_row`	✅	9s	8s	+12%
`compute_pack_t0_t1`	✅	8s	10s	-20%
`polyveck_caddq`	✅	8s	7s	+14%
`keccakf1600_permute`	✅	7s	7s	+0%
`poly_power2round`	✅	7s	9s	-22%
`polyveck_reduce`	✅	7s	6s	+17%
`polyvecl_ntt`	✅	7s	11s	-36%
`sign`	✅	7s	8s	-12%
`sign_pk_from_sk`	✅	7s	6s	+17%
`sign_verify_extmu`	✅	7s	4s	+75%
`caddq`	✅	6s	3s	+100%
`keccak_absorb`	✅	6s	6s	+0%
`mld_compute_pack_z`	✅	6s	6s	+0%
`pointwise_acc_native_aarch64`	✅	6s	8s	-25%
`poly_shiftl`	✅	6s	6s	+0%
`poly_uniform`	✅	6s	4s	+50%
`polyveck_invntt_tomont`	✅	6s	6s	+0%
`polyz_unpack_c`	✅	6s	6s	+0%
`intt_native_aarch64`	✅	5s	3s	+67%
`keccak_finalize`	✅	5s	4s	+25%
`mld_sample_s1_s2_serial`	✅	5s	5s	+0%
`pack_sig_h`	✅	5s	3s	+67%
`pointwise_acc_native_x86_64`	✅	5s	6s	-17%
`poly_decompose`	✅	5s	3s	+67%
`poly_decompose_c`	✅	5s	3s	+67%
`polyvecl_pointwise_acc_montgomery`	✅	5s	4s	+25%
`keccak_f1600_x4_native_avx2`	✅	4s	6s	-33%
`keccak_squeezeblocks_x4`	✅	4s	4s	+0%
`make_hint`	✅	4s	4s	+0%
`ntt_native_x86_64`	✅	4s	5s	-20%
`poly_caddq_native_aarch64`	✅	4s	3s	+33%
`poly_challenge`	✅	4s	6s	-33%
`poly_permute_bitrev_to_custom_optional`	✅	4s	3s	+33%
`poly_uniform_gamma1`	✅	4s	3s	+33%
`poly_uniform_gamma1_4x`	✅	4s	2s	+100%
`poly_use_hint_c`	✅	4s	2s	+100%
`polyt1_unpack`	✅	4s	4s	+0%
`polyvecl_pointwise_acc_montgomery_native`	✅	4s	2s	+100%
`polyz_unpack`	✅	4s	4s	+0%
`rej_eta_c`	✅	4s	3s	+33%
`rej_eta_native`	✅	4s	4s	+0%
`shake128_absorb`	✅	4s	2s	+100%
`shake256`	✅	4s	2s	+100%
`shake256x4_absorb_once`	✅	4s	4s	+0%
`sign_keypair_internal`	✅	4s	3s	+33%
`sign_open`	✅	4s	5s	-20%
`sign_signature`	✅	4s	5s	-20%
`sign_verify`	✅	4s	5s	-20%
`sk_s2hat_get_poly`	✅	4s	3s	+33%
`unpack_sk`	✅	4s	2s	+100%
`use_hint`	✅	4s	5s	-20%
`fqscale`	✅	3s	6s	-50%
`keccak_squeeze`	✅	3s	3s	+0%
`keccakf1600_extract_bytes (big endian)`	✅	3s	4s	-25%
`keccakf1600_xor_bytes`	✅	3s	2s	+50%
`keccakf1600x4_permute`	✅	3s	2s	+50%
`keccakf1600x4_xor_bytes`	✅	3s	2s	+50%
`mld_ct_cmask_nonzero_u32`	✅	3s	1s	+200%
`mld_ct_cmask_nonzero_u8`	✅	3s	3s	+0%
`mld_prepare_domain_separation_prefix`	✅	3s	2s	+50%
`mld_sample_s1_s2`	✅	3s	4s	-25%
`mld_value_barrier_i64`	✅	3s	2s	+50%
`pack_sig_c`	✅	3s	4s	-25%
`pack_sig_z`	✅	3s	2s	+50%
`pack_sk_rho_key_tr_s2`	✅	3s	4s	-25%
`pointwise_native_aarch64`	✅	3s	5s	-40%
`pointwise_native_x86_64`	✅	3s	3s	+0%
`poly_caddq`	✅	3s	4s	-25%
`poly_chknorm_native_aarch64`	✅	3s	2s	+50%
`poly_decompose_32_native_aarch64`	✅	3s	4s	-25%
`poly_invntt_tomont`	✅	3s	4s	-25%
`poly_ntt_native`	✅	3s	2s	+50%
`poly_permute_bitrev_to_custom_optional_native`	✅	3s	2s	+50%
`poly_uniform_eta`	✅	3s	2s	+50%
`poly_use_hint_native`	✅	3s	4s	-25%
`poly_use_hint_native_aarch64`	✅	3s	5s	-40%
`polyeta_pack`	✅	3s	2s	+50%
`polyeta_unpack`	✅	3s	4s	-25%
`polyt0_pack`	✅	3s	2s	+50%
`polyt1_pack`	✅	3s	3s	+0%
`polyvec_matrix_expand_serial`	✅	3s	4s	-25%
`polyveck_chknorm`	✅	3s	3s	+0%
`polyveck_ntt`	✅	3s	2s	+50%
`polyveck_pack_eta`	✅	3s	2s	+50%
`polyvecl_pack_eta`	✅	3s	4s	-25%
`polyvecl_pointwise_acc_montgomery_c`	✅	3s	3s	+0%
`polyvecl_uniform_gamma1_serial`	✅	3s	2s	+50%
`polyvecl_unpack_eta`	✅	3s	3s	+0%
`reduce32`	✅	3s	1s	+200%
`shake128_init`	✅	3s	3s	+0%
`shake128x4_absorb_once`	✅	3s	2s	+50%
`shake256_release`	✅	3s	2s	+50%
`shake256_squeeze`	✅	3s	2s	+50%
`shake256x4_squeezeblocks`	✅	3s	2s	+50%
`sign_keypair`	✅	3s	4s	-25%
`sign_signature_extmu`	✅	3s	3s	+0%
`sign_signature_internal`	✅	3s	4s	-25%
`sign_signature_pre_hash_internal`	✅	3s	4s	-25%
`sign_signature_pre_hash_shake256`	✅	3s	3s	+0%
`sign_verify_pre_hash_internal`	✅	3s	5s	-40%
`sign_verify_pre_hash_shake256`	✅	3s	5s	-40%
`sk_s1hat_get_poly`	✅	3s	3s	+0%
`unpack_sk_s1hat`	✅	3s	2s	+50%
`unpack_sk_s2hat`	✅	3s	3s	+0%
`unpack_sk_t0hat`	✅	3s	2s	+50%
`decompose`	✅	2s	4s	-50%
`intt_native_x86_64`	✅	2s	3s	-33%
`keccak_f1600_x1_native_aarch64`	✅	2s	2s	+0%
`keccak_f1600_x1_native_aarch64_v84a`	✅	2s	1s	+100%
`keccak_f1600_x4_native_aarch64_v84a`	✅	2s	3s	-33%
`keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid`	✅	2s	2s	+0%
`keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid`	✅	2s	2s	+0%
`keccak_init`	✅	2s	3s	-33%
`keccakf1600_xor_bytes (big endian)`	✅	2s	1s	+100%
`keccakf1600x4_extract_bytes`	✅	2s	2s	+0%
`mld_ct_cmask_neg_i32`	✅	2s	3s	-33%
`mld_ct_get_optblocker_i64`	✅	2s	1s	+100%
`mld_ct_get_optblocker_u32`	✅	2s	2s	+0%
`mld_ct_get_optblocker_u8`	✅	2s	1s	+100%
`mld_h`	✅	2s	5s	-60%
`mld_value_barrier_u32`	✅	2s	3s	-33%
`mld_value_barrier_u8`	✅	2s	3s	-33%
`montgomery_reduce`	✅	2s	4s	-50%
`ntt_native_aarch64`	✅	2s	3s	-33%
`pack_sk_s1`	✅	2s	2s	+0%
`poly_caddq_native`	✅	2s	3s	-33%
`poly_chknorm`	✅	2s	2s	+0%
`poly_chknorm_native`	✅	2s	3s	-33%
`poly_decompose_88_native_aarch64`	✅	2s	6s	-67%
`poly_decompose_native`	✅	2s	3s	-33%
`poly_invntt_tomont_native`	✅	2s	3s	-33%
`poly_ntt`	✅	2s	3s	-33%
`poly_ntt_c`	✅	2s	1s	+100%
`poly_pointwise_montgomery`	✅	2s	3s	-33%
`poly_pointwise_montgomery_native`	✅	2s	3s	-33%
`poly_reduce`	✅	2s	4s	-50%
`poly_sub`	✅	2s	4s	-50%
`poly_uniform_4x`	✅	2s	3s	-33%
`poly_use_hint`	✅	2s	4s	-50%
`polyvec_matrix_expand`	✅	2s	3s	-33%
`polyveck_pack_w1`	✅	2s	7s	-71%
`polyveck_unpack_eta`	✅	2s	3s	-33%
`polyvecl_uniform_gamma1`	✅	2s	2s	+0%
`polyvecl_unpack_z`	✅	2s	4s	-50%
`polyw1_pack`	✅	2s	4s	-50%
`polyz_pack`	✅	2s	4s	-50%
`polyz_unpack_17_native_aarch64`	✅	2s	3s	-33%
`polyz_unpack_19_native_aarch64`	✅	2s	2s	+0%
`polyz_unpack_native`	✅	2s	4s	-50%
`power2round`	✅	2s	3s	-33%
`shake128_finalize`	✅	2s	2s	+0%
`shake128_squeeze`	✅	2s	2s	+0%
`shake128x4_squeezeblocks`	✅	2s	4s	-50%
`shake256_absorb`	✅	2s	1s	+100%
`shake256_finalize`	✅	2s	4s	-50%
`shake256_init`	✅	2s	4s	-50%
`sig_unpack_hints`	✅	2s	3s	-33%
`sys_check_capability`	✅	2s	3s	-33%
`unpack_pk_t1`	✅	2s	3s	-33%
`yvec_get_poly`	✅	2s	2s	+0%
`yvec_init`	✅	2s	2s	+0%
`mld_ct_abs_i32`	✅	1s	5s	-80%
`mld_ct_sel_int32`	✅	1s	1s	+0%
`mld_keccakf1600_extract_bytes`	✅	1s	2s	-50%
`mld_polymat_expand_entry`	✅	1s	2s	-50%
`nttunpack_native_x86_64`	✅	1s	4s	-75%
`rej_eta`	✅	1s	5s	-80%
`shake128_release`	✅	1s	3s	-67%
`sk_t0hat_get_poly`	✅	1s	3s	-67%

oqs-bot · 2026-05-06T01:18:16Z

CBMC Results (ML-DSA-87, REDUCE-RAM)

Full Results (194 proofs)

Proof	Status	Current	Previous	Change
`TOTAL`	✅	1544s	1481s	+4.3%
`poly_pointwise_montgomery_c`	✅	166s	158s	+5%
`polyvec_matrix_pointwise_montgomery_yvec`	✅	127s	119s	+7%
`rej_uniform_native`	✅	105s	104s	+1%
`mld_invntt_layer`	✅	100s	100s	+0%
`mld_ct_memcmp`	✅	67s	71s	-6%
`rej_uniform_native_x86_64`	✅	57s	-	new
`mld_ntt_layer`	✅	42s	40s	+5%
`sign_verify_internal`	✅	40s	41s	-2%
`fqmul`	✅	28s	29s	-3%
`mld_attempt_signature_generation`	✅	27s	26s	+4%
`keccakf1600x4_permute_native`	✅	22s	23s	-4%
`rej_uniform`	✅	20s	21s	-5%
`rej_uniform_c`	✅	18s	18s	+0%
`polyeta_unpack`	✅	17s	16s	+6%
`mld_ntt_butterfly_block`	✅	16s	15s	+7%
`polyveck_decompose`	✅	16s	14s	+14%
`mld_check_pct`	✅	14s	16s	-12%
`poly_add`	✅	13s	10s	+30%
`poly_chknorm_c`	✅	13s	14s	-7%
`polyt0_unpack`	✅	11s	11s	+0%
`keccak_absorb_once_x4`	✅	10s	9s	+11%
`poly_uniform_eta_4x`	✅	10s	14s	-29%
`poly_caddq_c`	✅	9s	7s	+29%
`polyvec_matrix_pointwise_montgomery_row`	✅	9s	8s	+12%
`sign_pk_from_sk`	✅	9s	4s	+125%
`poly_invntt_tomont_c`	✅	8s	8s	+0%
`polyveck_invntt_tomont`	✅	8s	5s	+60%
`sign`	✅	8s	8s	+0%
`compute_pack_t0_t1`	✅	7s	8s	-12%
`keccak_absorb`	✅	7s	8s	-12%
`keccakf1600_permute_native`	✅	7s	6s	+17%
`mld_sample_s1_s2`	✅	7s	7s	+0%
`pointwise_acc_native_aarch64`	✅	7s	7s	+0%
`pointwise_acc_native_x86_64`	✅	7s	10s	-30%
`poly_power2round`	✅	7s	6s	+17%
`polyveck_caddq`	✅	7s	6s	+17%
`polyz_unpack_c`	✅	7s	8s	-12%
`rej_eta_native`	✅	7s	4s	+75%
`keccakf1600_permute`	✅	6s	6s	+0%
`mld_compute_pack_z`	✅	6s	4s	+50%
`mld_sample_s1_s2_serial`	✅	6s	5s	+20%
`ntt_native_x86_64`	✅	6s	3s	+100%
`poly_shiftl`	✅	6s	5s	+20%
`polyvecl_chknorm`	✅	6s	4s	+50%
`polyvecl_ntt`	✅	6s	8s	-25%
`sign_keypair`	✅	6s	5s	+20%
`sign_open`	✅	6s	5s	+20%
`sign_signature_pre_hash_internal`	✅	6s	3s	+100%
`pointwise_native_x86_64`	✅	5s	4s	+25%
`poly_caddq`	✅	5s	4s	+25%
`poly_use_hint_native`	✅	5s	3s	+67%
`polyeta_pack`	✅	5s	2s	+150%
`polyt0_pack`	✅	5s	3s	+67%
`polyveck_reduce`	✅	5s	6s	-17%
`shake128x4_absorb_once`	✅	5s	4s	+25%
`sign_signature_pre_hash_shake256`	✅	5s	4s	+25%
`keccak_f1600_x1_native_aarch64_v84a`	✅	4s	4s	+0%
`keccak_squeeze`	✅	4s	2s	+100%
`keccakf1600x4_xor_bytes`	✅	4s	2s	+100%
`make_hint`	✅	4s	2s	+100%
`ntt_native_aarch64`	✅	4s	2s	+100%
`pack_sk_s1`	✅	4s	5s	-20%
`pointwise_native_aarch64`	✅	4s	3s	+33%
`poly_caddq_native_aarch64`	✅	4s	5s	-20%
`poly_challenge`	✅	4s	5s	-20%
`poly_decompose_32_native_aarch64`	✅	4s	1s	+300%
`poly_invntt_tomont`	✅	4s	2s	+100%
`poly_ntt_c`	✅	4s	3s	+33%
`poly_pointwise_montgomery_native`	✅	4s	6s	-33%
`poly_sub`	✅	4s	4s	+0%
`poly_uniform_4x`	✅	4s	2s	+100%
`poly_uniform_eta`	✅	4s	4s	+0%
`polyt1_pack`	✅	4s	5s	-20%
`polyt1_unpack`	✅	4s	6s	-33%
`polyveck_chknorm`	✅	4s	4s	+0%
`polyveck_pack_eta`	✅	4s	3s	+33%
`polyvecl_uniform_gamma1_serial`	✅	4s	2s	+100%
`polyvecl_unpack_eta`	✅	4s	3s	+33%
`polyz_pack`	✅	4s	4s	+0%
`reduce32`	✅	4s	4s	+0%
`shake256_init`	✅	4s	4s	+0%
`sign_signature`	✅	4s	2s	+100%
`sign_signature_extmu`	✅	4s	3s	+33%
`sign_signature_internal`	✅	4s	6s	-33%
`sign_verify_extmu`	✅	4s	5s	-20%
`sk_s2hat_get_poly`	✅	4s	2s	+100%
`unpack_sk`	✅	4s	3s	+33%
`caddq`	✅	3s	3s	+0%
`intt_native_aarch64`	✅	3s	5s	-40%
`keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid`	✅	3s	2s	+50%
`keccak_f1600_x4_native_avx2`	✅	3s	2s	+50%
`keccak_finalize`	✅	3s	2s	+50%
`keccak_squeezeblocks_x4`	✅	3s	5s	-40%
`keccakf1600_extract_bytes (big endian)`	✅	3s	2s	+50%
`keccakf1600_xor_bytes (big endian)`	✅	3s	2s	+50%
`mld_ct_cmask_nonzero_u8`	✅	3s	3s	+0%
`mld_h`	✅	3s	3s	+0%
`mld_prepare_domain_separation_prefix`	✅	3s	3s	+0%
`montgomery_reduce`	✅	3s	3s	+0%
`nttunpack_native_x86_64`	✅	3s	2s	+50%
`pack_sig_c`	✅	3s	2s	+50%
`pack_sig_h`	✅	3s	2s	+50%
`pack_sig_z`	✅	3s	4s	-25%
`pack_sk_rho_key_tr_s2`	✅	3s	2s	+50%
`poly_caddq_native`	✅	3s	4s	-25%
`poly_chknorm`	✅	3s	2s	+50%
`poly_chknorm_native`	✅	3s	4s	-25%
`poly_chknorm_native_aarch64`	✅	3s	3s	+0%
`poly_decompose`	✅	3s	3s	+0%
`poly_decompose_c`	✅	3s	7s	-57%
`poly_invntt_tomont_native`	✅	3s	3s	+0%
`poly_pointwise_montgomery`	✅	3s	3s	+0%
`poly_reduce`	✅	3s	5s	-40%
`poly_uniform`	✅	3s	5s	-40%
`poly_uniform_gamma1`	✅	3s	4s	-25%
`poly_uniform_gamma1_4x`	✅	3s	4s	-25%
`poly_use_hint`	✅	3s	2s	+50%
`poly_use_hint_native_aarch64`	✅	3s	4s	-25%
`polyvec_matrix_expand`	✅	3s	2s	+50%
`polyveck_ntt`	✅	3s	3s	+0%
`polyvecl_unpack_z`	✅	3s	1s	+200%
`polyz_unpack`	✅	3s	3s	+0%
`polyz_unpack_19_native_aarch64`	✅	3s	3s	+0%
`power2round`	✅	3s	2s	+50%
`shake128_init`	✅	3s	3s	+0%
`shake128_squeeze`	✅	3s	2s	+50%
`shake256`	✅	3s	2s	+50%
`shake256_absorb`	✅	3s	2s	+50%
`shake256_release`	✅	3s	4s	-25%
`shake256x4_absorb_once`	✅	3s	4s	-25%
`shake256x4_squeezeblocks`	✅	3s	3s	+0%
`sig_unpack_hints`	✅	3s	2s	+50%
`sign_keypair_internal`	✅	3s	6s	-50%
`sign_verify_pre_hash_internal`	✅	3s	3s	+0%
`sign_verify_pre_hash_shake256`	✅	3s	4s	-25%
`sk_s1hat_get_poly`	✅	3s	3s	+0%
`sys_check_capability`	✅	3s	4s	-25%
`unpack_pk_t1`	✅	3s	3s	+0%
`unpack_sk_s1hat`	✅	3s	4s	-25%
`unpack_sk_s2hat`	✅	3s	2s	+50%
`use_hint`	✅	3s	2s	+50%
`yvec_get_poly`	✅	3s	3s	+0%
`decompose`	✅	2s	4s	-50%
`keccak_f1600_x1_native_aarch64`	✅	2s	2s	+0%
`keccak_f1600_x4_native_aarch64_v84a`	✅	2s	3s	-33%
`keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid`	✅	2s	2s	+0%
`keccak_init`	✅	2s	3s	-33%
`keccakf1600_xor_bytes`	✅	2s	3s	-33%
`keccakf1600x4_permute`	✅	2s	3s	-33%
`mld_ct_abs_i32`	✅	2s	2s	+0%
`mld_ct_cmask_neg_i32`	✅	2s	2s	+0%
`mld_ct_cmask_nonzero_u32`	✅	2s	1s	+100%
`mld_ct_get_optblocker_i64`	✅	2s	1s	+100%
`mld_ct_get_optblocker_u8`	✅	2s	1s	+100%
`mld_ct_sel_int32`	✅	2s	2s	+0%
`mld_keccakf1600_extract_bytes`	✅	2s	4s	-50%
`mld_value_barrier_i64`	✅	2s	1s	+100%
`mld_value_barrier_u8`	✅	2s	3s	-33%
`poly_decompose_88_native_aarch64`	✅	2s	4s	-50%
`poly_decompose_native`	✅	2s	5s	-60%
`poly_ntt`	✅	2s	5s	-60%
`poly_ntt_native`	✅	2s	2s	+0%
`poly_permute_bitrev_to_custom_optional`	✅	2s	3s	-33%
`poly_permute_bitrev_to_custom_optional_native`	✅	2s	3s	-33%
`poly_use_hint_c`	✅	2s	3s	-33%
`polyvec_matrix_expand_serial`	✅	2s	4s	-50%
`polyveck_pack_w1`	✅	2s	3s	-33%
`polyveck_unpack_eta`	✅	2s	4s	-50%
`polyvecl_pack_eta`	✅	2s	4s	-50%
`polyvecl_pointwise_acc_montgomery`	✅	2s	3s	-33%
`polyvecl_pointwise_acc_montgomery_c`	✅	2s	2s	+0%
`polyvecl_pointwise_acc_montgomery_native`	✅	2s	2s	+0%
`polyvecl_uniform_gamma1`	✅	2s	3s	-33%
`polyw1_pack`	✅	2s	3s	-33%
`polyz_unpack_17_native_aarch64`	✅	2s	3s	-33%
`polyz_unpack_native`	✅	2s	4s	-50%
`rej_eta`	✅	2s	2s	+0%
`rej_eta_c`	✅	2s	5s	-60%
`shake128_absorb`	✅	2s	2s	+0%
`shake128_finalize`	✅	2s	2s	+0%
`shake128x4_squeezeblocks`	✅	2s	2s	+0%
`shake256_finalize`	✅	2s	3s	-33%
`sign_verify`	✅	2s	3s	-33%
`sk_t0hat_get_poly`	✅	2s	4s	-50%
`unpack_sk_t0hat`	✅	2s	5s	-60%
`fqscale`	✅	1s	3s	-67%
`intt_native_x86_64`	✅	1s	3s	-67%
`keccakf1600x4_extract_bytes`	✅	1s	2s	-50%
`mld_ct_get_optblocker_u32`	✅	1s	1s	+0%
`mld_polymat_expand_entry`	✅	1s	3s	-67%
`mld_value_barrier_u32`	✅	1s	2s	-50%
`shake128_release`	✅	1s	2s	-50%
`shake256_squeeze`	✅	1s	3s	-67%
`yvec_init`	✅	1s	2s	-50%

oqs-bot · 2026-05-06T01:18:26Z

CBMC Results (ML-DSA-44, REDUCE-RAM)

Full Results (194 proofs)

Proof	Status	Current	Previous	Change
`TOTAL`	✅	1432s	1401s	+2.2%
`poly_pointwise_montgomery_c`	✅	168s	167s	+1%
`rej_uniform_native`	✅	101s	109s	-7%
`mld_invntt_layer`	✅	97s	105s	-8%
`polyvec_matrix_pointwise_montgomery_yvec`	✅	87s	88s	-1%
`mld_ct_memcmp`	✅	69s	74s	-7%
`rej_uniform_native_x86_64`	✅	54s	-	new
`mld_ntt_layer`	✅	40s	42s	-5%
`fqmul`	✅	28s	29s	-3%
`mld_attempt_signature_generation`	✅	25s	25s	+0%
`keccakf1600x4_permute_native`	✅	22s	23s	-4%
`rej_uniform`	✅	20s	20s	+0%
`sign_verify_internal`	✅	20s	19s	+5%
`rej_uniform_c`	✅	18s	21s	-14%
`mld_ntt_butterfly_block`	✅	14s	16s	-12%
`poly_chknorm_c`	✅	14s	15s	-7%
`polyeta_unpack`	✅	14s	17s	-18%
`mld_check_pct`	✅	13s	10s	+30%
`polyz_unpack_c`	✅	13s	11s	+18%
`poly_uniform_eta_4x`	✅	12s	13s	-8%
`polyt0_unpack`	✅	12s	13s	-8%
`poly_add`	✅	11s	13s	-15%
`polyveck_chknorm`	✅	11s	11s	+0%
`compute_pack_t0_t1`	✅	10s	7s	+43%
`poly_caddq_c`	✅	9s	9s	+0%
`keccak_absorb`	✅	7s	6s	+17%
`keccak_absorb_once_x4`	✅	7s	10s	-30%
`poly_invntt_tomont_c`	✅	7s	10s	-30%
`poly_power2round`	✅	7s	6s	+17%
`polyvec_matrix_pointwise_montgomery_row`	✅	7s	8s	-12%
`sign`	✅	7s	8s	-12%
`keccakf1600_permute`	✅	6s	7s	-14%
`keccakf1600_permute_native`	✅	6s	7s	-14%
`keccakf1600_xor_bytes (big endian)`	✅	6s	4s	+50%
`mld_compute_pack_z`	✅	6s	6s	+0%
`mld_h`	✅	6s	3s	+100%
`poly_decompose_88_native_aarch64`	✅	6s	2s	+200%
`poly_decompose_c`	✅	6s	8s	-25%
`poly_shiftl`	✅	6s	5s	+20%
`polyveck_decompose`	✅	6s	5s	+20%
`polyveck_reduce`	✅	6s	4s	+50%
`sign_open`	✅	6s	5s	+20%
`sign_pk_from_sk`	✅	6s	6s	+0%
`sign_verify_pre_hash_internal`	✅	6s	4s	+50%
`pack_sk_rho_key_tr_s2`	✅	5s	2s	+150%
`pointwise_acc_native_aarch64`	✅	5s	4s	+25%
`pointwise_acc_native_x86_64`	✅	5s	6s	-17%
`poly_ntt`	✅	5s	1s	+400%
`poly_permute_bitrev_to_custom_optional`	✅	5s	3s	+67%
`poly_permute_bitrev_to_custom_optional_native`	✅	5s	4s	+25%
`poly_use_hint_native_aarch64`	✅	5s	4s	+25%
`polyvec_matrix_expand`	✅	5s	2s	+150%
`polyveck_unpack_eta`	✅	5s	2s	+150%
`rej_eta_c`	✅	5s	6s	-17%
`shake256x4_absorb_once`	✅	5s	2s	+150%
`sign_keypair`	✅	5s	4s	+25%
`sign_signature_pre_hash_shake256`	✅	5s	4s	+25%
`sign_verify_extmu`	✅	5s	3s	+67%
`unpack_sk_t0hat`	✅	5s	5s	+0%
`use_hint`	✅	5s	3s	+67%
`decompose`	✅	4s	1s	+300%
`intt_native_x86_64`	✅	4s	1s	+300%
`keccak_init`	✅	4s	3s	+33%
`keccak_squeezeblocks_x4`	✅	4s	3s	+33%
`mld_ct_cmask_nonzero_u8`	✅	4s	3s	+33%
`mld_prepare_domain_separation_prefix`	✅	4s	5s	-20%
`poly_caddq`	✅	4s	2s	+100%
`poly_sub`	✅	4s	5s	-20%
`poly_uniform`	✅	4s	2s	+100%
`polyt1_pack`	✅	4s	3s	+33%
`polyveck_caddq`	✅	4s	4s	+0%
`polyvecl_chknorm`	✅	4s	4s	+0%
`polyvecl_pointwise_acc_montgomery`	✅	4s	3s	+33%
`polyvecl_uniform_gamma1`	✅	4s	3s	+33%
`polyvecl_unpack_eta`	✅	4s	3s	+33%
`polyz_pack`	✅	4s	4s	+0%
`polyz_unpack`	✅	4s	4s	+0%
`rej_eta`	✅	4s	4s	+0%
`shake128x4_squeezeblocks`	✅	4s	2s	+100%
`sign_keypair_internal`	✅	4s	5s	-20%
`sign_signature_pre_hash_internal`	✅	4s	6s	-33%
`sign_verify`	✅	4s	5s	-20%
`sign_verify_pre_hash_shake256`	✅	4s	5s	-20%
`sk_t0hat_get_poly`	✅	4s	1s	+300%
`unpack_pk_t1`	✅	4s	3s	+33%
`unpack_sk_s1hat`	✅	4s	2s	+100%
`keccak_f1600_x1_native_aarch64_v84a`	✅	3s	1s	+200%
`make_hint`	✅	3s	2s	+50%
`mld_ct_cmask_neg_i32`	✅	3s	3s	+0%
`mld_ct_cmask_nonzero_u32`	✅	3s	3s	+0%
`mld_ct_get_optblocker_u32`	✅	3s	2s	+50%
`mld_keccakf1600_extract_bytes`	✅	3s	2s	+50%
`mld_polymat_expand_entry`	✅	3s	3s	+0%
`mld_sample_s1_s2`	✅	3s	3s	+0%
`mld_sample_s1_s2_serial`	✅	3s	3s	+0%
`ntt_native_aarch64`	✅	3s	5s	-40%
`nttunpack_native_x86_64`	✅	3s	3s	+0%
`pack_sig_z`	✅	3s	2s	+50%
`poly_invntt_tomont_native`	✅	3s	2s	+50%
`poly_ntt_c`	✅	3s	2s	+50%
`poly_pointwise_montgomery_native`	✅	3s	4s	-25%
`poly_uniform_4x`	✅	3s	3s	+0%
`poly_uniform_gamma1`	✅	3s	3s	+0%
`poly_use_hint_c`	✅	3s	3s	+0%
`polyt0_pack`	✅	3s	2s	+50%
`polyt1_unpack`	✅	3s	3s	+0%
`polyvec_matrix_expand_serial`	✅	3s	4s	-25%
`polyveck_invntt_tomont`	✅	3s	4s	-25%
`polyveck_pack_eta`	✅	3s	4s	-25%
`polyvecl_pack_eta`	✅	3s	2s	+50%
`polyvecl_pointwise_acc_montgomery_c`	✅	3s	2s	+50%
`polyw1_pack`	✅	3s	3s	+0%
`power2round`	✅	3s	2s	+50%
`rej_eta_native`	✅	3s	3s	+0%
`shake128_finalize`	✅	3s	2s	+50%
`shake128_squeeze`	✅	3s	2s	+50%
`shake256`	✅	3s	2s	+50%
`shake256_init`	✅	3s	2s	+50%
`shake256_release`	✅	3s	2s	+50%
`shake256_squeeze`	✅	3s	4s	-25%
`sig_unpack_hints`	✅	3s	4s	-25%
`sign_signature`	✅	3s	6s	-50%
`sign_signature_extmu`	✅	3s	4s	-25%
`sign_signature_internal`	✅	3s	5s	-40%
`unpack_sk_s2hat`	✅	3s	3s	+0%
`yvec_get_poly`	✅	3s	3s	+0%
`caddq`	✅	2s	3s	-33%
`fqscale`	✅	2s	3s	-33%
`intt_native_aarch64`	✅	2s	4s	-50%
`keccak_f1600_x1_native_aarch64`	✅	2s	2s	+0%
`keccak_f1600_x4_native_aarch64_v84a`	✅	2s	4s	-50%
`keccak_f1600_x4_native_avx2`	✅	2s	4s	-50%
`keccak_finalize`	✅	2s	1s	+100%
`keccak_squeeze`	✅	2s	3s	-33%
`keccakf1600_extract_bytes (big endian)`	✅	2s	2s	+0%
`keccakf1600_xor_bytes`	✅	2s	1s	+100%
`keccakf1600x4_xor_bytes`	✅	2s	1s	+100%
`mld_ct_abs_i32`	✅	2s	2s	+0%
`mld_ct_get_optblocker_i64`	✅	2s	2s	+0%
`mld_ct_get_optblocker_u8`	✅	2s	1s	+100%
`mld_ct_sel_int32`	✅	2s	2s	+0%
`mld_value_barrier_i64`	✅	2s	1s	+100%
`mld_value_barrier_u32`	✅	2s	3s	-33%
`mld_value_barrier_u8`	✅	2s	2s	+0%
`ntt_native_x86_64`	✅	2s	2s	+0%
`pack_sig_h`	✅	2s	3s	-33%
`pointwise_native_aarch64`	✅	2s	4s	-50%
`pointwise_native_x86_64`	✅	2s	4s	-50%
`poly_caddq_native`	✅	2s	3s	-33%
`poly_caddq_native_aarch64`	✅	2s	2s	+0%
`poly_challenge`	✅	2s	4s	-50%
`poly_chknorm`	✅	2s	2s	+0%
`poly_chknorm_native`	✅	2s	3s	-33%
`poly_chknorm_native_aarch64`	✅	2s	2s	+0%
`poly_decompose_32_native_aarch64`	✅	2s	1s	+100%
`poly_invntt_tomont`	✅	2s	4s	-50%
`poly_ntt_native`	✅	2s	4s	-50%
`poly_pointwise_montgomery`	✅	2s	3s	-33%
`poly_reduce`	✅	2s	3s	-33%
`poly_uniform_eta`	✅	2s	2s	+0%
`poly_uniform_gamma1_4x`	✅	2s	4s	-50%
`poly_use_hint`	✅	2s	3s	-33%
`poly_use_hint_native`	✅	2s	3s	-33%
`polyeta_pack`	✅	2s	2s	+0%
`polyveck_ntt`	✅	2s	2s	+0%
`polyveck_pack_w1`	✅	2s	2s	+0%
`polyvecl_ntt`	✅	2s	4s	-50%
`polyvecl_pointwise_acc_montgomery_native`	✅	2s	2s	+0%
`polyvecl_unpack_z`	✅	2s	2s	+0%
`polyz_unpack_17_native_aarch64`	✅	2s	1s	+100%
`polyz_unpack_19_native_aarch64`	✅	2s	3s	-33%
`polyz_unpack_native`	✅	2s	1s	+100%
`reduce32`	✅	2s	2s	+0%
`shake128_absorb`	✅	2s	3s	-33%
`shake128_init`	✅	2s	1s	+100%
`shake128_release`	✅	2s	4s	-50%
`shake128x4_absorb_once`	✅	2s	2s	+0%
`shake256_absorb`	✅	2s	2s	+0%
`shake256x4_squeezeblocks`	✅	2s	2s	+0%
`sk_s1hat_get_poly`	✅	2s	2s	+0%
`sk_s2hat_get_poly`	✅	2s	4s	-50%
`sys_check_capability`	✅	2s	3s	-33%
`yvec_init`	✅	2s	3s	-33%
`keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid`	✅	1s	2s	-50%
`keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid`	✅	1s	2s	-50%
`keccakf1600x4_extract_bytes`	✅	1s	4s	-75%
`keccakf1600x4_permute`	✅	1s	2s	-50%
`montgomery_reduce`	✅	1s	4s	-75%
`pack_sig_c`	✅	1s	2s	-50%
`pack_sk_s1`	✅	1s	2s	-50%
`poly_decompose`	✅	1s	1s	+0%
`poly_decompose_native`	✅	1s	3s	-67%
`polyvecl_uniform_gamma1_serial`	✅	1s	3s	-67%
`shake256_finalize`	✅	1s	1s	+0%
`unpack_sk`	✅	1s	2s	-50%

mkannwischer · 2026-05-07T05:14:04Z

@jakemas, thanks for getting this into shape! ~~How far are you with the proof of it? Would it be an option to merge it together with the correctness proof?~~ Never mind, this already has the proof. Sorry.

jakemas · 2026-05-07T05:20:19Z

@mkannwischer OK, ready for review. The instruction PR will need to land in s2n-bignum first; while we wait, I'll try the constant-time proof. I'm really happy to be able to PR the conversion with a HOL Light proof. Let me know if anything is missing.

jakemas · 2026-05-07T05:20:48Z

Ahh, just saw your comment! Yes, got the proof in; it runs in ~12 min!

jakemas · 2026-05-07T05:22:44Z

I left this name generic, as it could be shared between the arm and x86 proofs.

mkannwischer · 2026-05-07T05:31:47Z

> @mkannwischer ok, ready for review. The instruction PR will need to land in s2n-bignum first, while we wait I'll try the constant time proof -- but I'm really happy to be able to PR the conversion with a hol-light proof. Let me know if anything is missing.

Thanks! Yes, I agree that it's great to get the conversion and the proof in at the same time. Would be great if we can do the same for the remaining proofs.
I'll review later today.

Note that a constant-time proof is not needed for this function (all inputs are public), but we do want a memory safety proof like here: https://github.com/pq-code-package/mlkem-native/blob/2bf8e59f4330697b3924c572924136c96eb96960/proofs/hol_light/x86_64/proofs/rej_uniform_avx2_asm.ml#L1562

mkannwischer

Thanks @jakemas. Here is a first set of comments.

mkannwischer · 2026-05-07T07:33:24Z

+__contract__(
+  requires(memory_no_alias(r, sizeof(int32_t) * MLDSA_N))
+  requires(memory_no_alias(buf, MLD_AVX2_REJ_UNIFORM_BUFLEN))
+  requires(memory_no_alias(table, 256 * sizeof(uint64_t)))


the table actually needs to be == mld_rej_uniform_table

Fixed — requires(table == (const uint8_t *)mld_rej_uniform_table) in both dev/ and mldsa/src/native/ copies.

mkannwischer · 2026-05-07T07:35:28Z

+        jmp     rej_uniform_avx2_asm_scalar
+
+rej_uniform_avx2_asm_done:
+        vzeroupper


We don't have vzeroupper in any other routine. I don't know enough about x86_64 to know how important it is, but we should either have it everywhere or nowhere.

Fixed — dropped vzeroupper from both dev/ and mldsa/src/native/ copies of the .S for consistency with the other ML-DSA routines. Proof adjusted accordingly.

mkannwischer · 2026-05-07T07:36:11Z

-  version = "f3c5acff6948d559194245237f6aaa7ebf7fcae8";
+  # Pinned to https://github.com/awslabs/s2n-bignum/pull/387 head,
+  # which adds VMOVMSKPS, VPMOVZXBD, and VZEROUPPER instruction models
+  # required by the x86_64 rej_uniform proof.
+  version = "4c4fe1dfc8b79720013517a7b4dec9014c85fcf2";
  src = fetchFromGitHub {
    owner = "awslabs";
    repo = "s2n-bignum";
    rev = "${version}";
-    hash = "sha256-kfc8X2e+voefttshSUdifDc3Qn+dx0Gq5ENNLhWIdw0=";
+    hash = "sha256-64MJOqoDunpn6fx1j9P4+fDoRNZ8GRTB/d4C2JWvxFA=";


Leaving a comment here to remind us that we still have to change this.

Still pinned to s2n-bignum PR #387 / #401 (the branch that carries VMOVMSKPS, VPMOVZXBD, VZEROUPPER + the mldsa_rej_uniform proof). Will update the pin once those are merged into s2n-bignum main.

mkannwischer · 2026-05-07T07:36:41Z

This file should be autogenerated via autogen

Fixed — added rej_uniform_avx2_asm.S to the x86_64 joblist in scripts/autogen (joblist_x86_64), so proofs/hol_light/x86_64/mldsa/rej_uniform_avx2_asm.S is now regenerated by scripts/autogen.

mkannwischer · 2026-05-07T07:37:02Z

+(* Lookup table for ML-DSA rejection uniform sampling. *)
+(* Each entry is 8 bytes: permutation indices for VPERMD. *)
+
+let mldsa_rej_uniform_table = (REWRITE_RULE[MAP] o define)


This file should be autogenerated via autogen

Fixed — added gen_avx2_hol_light_rej_uniform_table to scripts/autogen, invoked from gen_zeta_tables. proofs/hol_light/x86_64/proofs/mldsa_rej_uniform_table.ml is now regenerated alongside the C/aarch64 lookup tables (mirrors the mlkem-native pattern).
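
For readers unfamiliar with the lookup table, here is a minimal sketch of how a table of this shape can be built. It assumes the usual AVX2 rejection-sampling layout: entry `m` lists, in ascending order, the lane indices whose bit is set in the 8-bit accept mask `m`, and unused slots are padded with zero. The name `build_rej_table` and the zero padding are assumptions for illustration; the real table comes from scripts/autogen.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of mldsa-native): fill a 256 x 8 byte table
 * where row `mask` holds the indices of the set bits of `mask`, lowest
 * first, so that a VPERMD-style permute moves accepted lanes to the front. */
static void build_rej_table(uint8_t table[256][8])
{
  for (unsigned mask = 0; mask < 256; mask++)
  {
    unsigned k = 0;
    for (unsigned lane = 0; lane < 8; lane++)
    {
      if ((mask >> lane) & 1u)
      {
        table[mask][k++] = (uint8_t)lane;
      }
    }
    assert(k <= 8);
    /* Assumption: remaining slots are zero-padded. */
    while (k < 8)
    {
      table[mask][k++] = 0;
    }
  }
}
```

With this layout, mask `0x05` (lanes 0 and 2 accepted) yields the row `{0, 2, 0, 0, 0, 0, 0, 0}`.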

mkannwischer · 2026-05-07T07:42:32Z

+let REJ_SAMPLE = define
+ `REJ_SAMPLE l = FILTER (\x:int32. val x < 8380417)
+    (MAP (\x:24 word. word(val x MOD 2 EXP 23):int32) l)`;;


The spec should be moved to mldsa_specs.ml so we can re-use it for the aarch64 proof.

Fixed — REJ_SAMPLE, REJ_SAMPLE_EMPTY, REJ_SAMPLE_APPEND now live in proofs/hol_light/common/mldsa_specs.ml (shared between arches, matching the shape used by s2n-bignum #378 for aarch64). The x86-only derived lemmas (REJ_SAMPLE_SPLIT, REJ_SAMPLE_PREFIX_256, REJ_SAMPLE_STEP_LE) stay in rej_uniform_avx2_asm.ml since they're only used by the AVX2 scalar-tail analysis.
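
As a reading aid, the `REJ_SAMPLE` definition above can be mirrored by a scalar C reference. This is a sketch only: it assumes the input list is the byte buffer read as little-endian 24-bit words, and the name `rej_sample_ref` and its signature are hypothetical, not the project's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define Q 8380417 /* ML-DSA modulus, the bound used in REJ_SAMPLE */

/* Hypothetical scalar reference: read 24-bit little-endian candidates,
 * reduce each modulo 2^23 (the MAP step), keep those below Q (the FILTER
 * step), and stop after max_out outputs (the SUB_LIST truncation).
 * Returns the number of coefficients written to out. */
static unsigned rej_sample_ref(int32_t *out, unsigned max_out,
                               const uint8_t *buf, size_t buflen)
{
  unsigned n = 0;
  assert(out != NULL && buf != NULL);
  for (size_t pos = 0; pos + 3 <= buflen && n < max_out; pos += 3)
  {
    uint32_t t = (uint32_t)buf[pos] | ((uint32_t)buf[pos + 1] << 8) |
                 ((uint32_t)buf[pos + 2] << 16);
    t &= 0x7FFFFF; /* val x MOD 2 EXP 23 */
    if (t < Q)
    {
      out[n++] = (int32_t)t;
    }
  }
  return n;
}
```

Stopping early once `max_out` values are accepted gives the same result as filtering the whole list and then taking the first `max_out` elements, which is what `SUB_LIST(0,256)` does in the postcondition.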

mkannwischer · 2026-05-07T07:43:41Z

+              (let outlist = SUB_LIST(0,256) (REJ_SAMPLE inlist) in
+               let outlen = LENGTH outlist in
+               C_RETURN s = word outlen /\
+               read(memory :> bytes(res,4 * outlen)) s =


Please add an explicit bound post-condition that all coefficients < q here to match the CBMC spec.

I think I've got this; testing now. Added:

(!i. i < outlen ==> val(read(memory :> bytes32 (word_add res (word(4 * i)))) s) < 8380417)))

mkannwischer · 2026-05-07T07:45:50Z

+    const uint8_t *table)
+__contract__(
+  requires(memory_no_alias(r, sizeof(int32_t) * MLDSA_N))
+  requires(memory_no_alias(buf, MLD_AVX2_REJ_UNIFORM_BUFLEN))


Let's change this to say 840 so it matches the HOL-light spec exactly.

Fixed — requires(memory_no_alias(buf, 840)) (literal 840 matches the HOL-Light spec exactly), in both dev/ and mldsa/src/native/ copies.
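
For context on where the literal comes from, here is a small sketch of the arithmetic. It assumes the buffer holds five SHAKE128 blocks of 168 bytes each; that derivation is not stated in this thread, and the macro names below are hypothetical stand-ins for `MLD_AVX2_REJ_UNIFORM_BUFLEN`.

```c
#include <assert.h>
#include <stddef.h>

#define SHAKE128_RATE 168u /* SHAKE128 block size in bytes */
#define REJ_BLOCKS 5u      /* assumption: number of blocks squeezed */
#define REJ_BUFLEN (REJ_BLOCKS * SHAKE128_RATE)

/* The literal in the contract and the HOL Light spec. */
_Static_assert(REJ_BUFLEN == 840u, "buffer length must match the spec");
/* Each candidate consumes 3 bytes, so the buffer splits evenly. */
_Static_assert(REJ_BUFLEN % 3u == 0u, "buffer is a whole number of candidates");

/* Number of 24-bit candidates available in a buffer of buflen bytes. */
static size_t candidates_in_buffer(size_t buflen)
{
  return buflen / 3;
}
```

Under this assumption the buffer offers 280 candidates for at most 256 outputs, and the `MAYCHANGE` region of 1024 bytes in the HOL Light statement corresponds to 256 coefficients of 4 bytes each.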

mkannwischer · 2026-05-07T07:46:09Z

+unsigned mld_rej_uniform_avx2_asm(
+    int32_t *r, const uint8_t buf[MLD_AVX2_REJ_UNIFORM_BUFLEN],
+    const uint8_t *table)
+__contract__(


Add the comment here that this needs to be kept in sync with the HOL-light spec.

Fixed — added /* This contract must be kept in sync with the HOL-Light specification in proofs/hol_light/x86_64/proofs/rej_uniform_avx2_asm.ml */ above the __contract__ block in both dev/ and mldsa/src/native/ copies.

mkannwischer · 2026-05-07T07:46:43Z

+          MAYCHANGE [memory :> bytes(res,1024)])`,
+  X86_PROMOTE_RETURN_NOSTACK_TAC mldsa_rej_uniform_tmc
+    MLDSA_REJ_UNIFORM_CORRECT);;
+


Add a comment here that this needs to be kept in sync with the CBMC spec.

Fixed — added a comment block above SUBROUTINE_CORRECT variants noting these specifications must be kept in sync with the CBMC contract in dev/x86_64/src/arith_native_x86_64.h / mldsa/src/native/x86_64/src/arith_native_x86_64.h.

Reviewer-requested cleanup for the x86_64 rej_uniform assembly and HOL Light proof:

Contract tightening (dev and mldsa copies of arith_native_x86_64.h):
- requires(memory_no_alias(buf, 840)) instead of memory_no_alias(buf, MLD_AVX2_REJ_UNIFORM_BUFLEN) so the literal matches the HOL Light spec exactly.
- requires(table == (const uint8_t *)mld_rej_uniform_table) pinning the table to the exported rejection-sampling table, replacing the looser memory_no_alias(table, 256 * sizeof(uint64_t)).
- Clarify sync comment.

vzeroupper removal: none of the other asm routines issue vzeroupper; drop it from rej_uniform for consistency. This shifts the function length by 3 bytes, so the HOL Light proof's nonoverlapping 246 / pc+245 references in mldsa_rej_uniform.ml become 243 / pc+242 accordingly, and the two X86_STEPS_TAC invocations that stepped the vzeroupper byte are removed. Bytecode regenerated via autogen --update-hol-light-bytecode.

Autogen plumbing: register rej_uniform_avx2_asm.S in the x86_64 HOL Light asm joblist so the proofs/hol_light/x86_64/mldsa/ copy is regenerated by scripts/autogen. Add gen_avx2_hol_light_rej_uniform_table to regenerate proofs/hol_light/x86_64/proofs/mldsa_rej_uniform_table.ml alongside the C/aarch64 lookup tables (matches mlkem-native's pattern).

Cross-reference comment in proofs/hol_light/x86_64/proofs/rej_uniform_avx2_asm.ml pointing at the CBMC contract.

Proof runtime: ~5-6 min in the CI native build.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the x86_64 AVX2 rej_uniform implementation (previously written in C with intrinsics) with a hand-written assembly routine, and adds a functional correctness proof in HOL Light on top of the s2n-bignum infrastructure.

Highlights:
- dev/x86_64/src/rej_uniform_avx2_asm.S and mldsa/src/native/x86_64/src/rej_uniform_avx2_asm.S: new .S file exposing mld_rej_uniform_avx2_asm (replaces the intrinsics-based rej_uniform_avx2.c).
- proofs/hol_light/x86_64/mldsa/rej_uniform_avx2_asm.S and proofs/hol_light/x86_64/proofs/rej_uniform_avx2_asm.ml: HOL Light proof of MLDSA_REJ_UNIFORM_{,NOIBT_}SUBROUTINE_CORRECT, with no remaining CHEATs.
- proofs/cbmc/rej_uniform_native_x86_64/: CBMC contract proof (249/249 passing).
- CI: hol_light.yml and Makefile updated for the new bytecode dump and autogen instruction-decode format; s2n-bignum pin bumped to include the supporting tactics.

Naming follows the asm-suffix convention introduced on main (eada109 / e810d00): symbol mld_rej_uniform_avx2_asm, label prefix rej_uniform_avx2_asm_.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- proofs/hol_light/README.md: add rej_uniform_avx2_asm.S to the x86_64 arithmetic proofs section.
- proofs/hol_light/common/mldsa_specs.ml: add REJ_SAMPLE, REJ_SAMPLE_EMPTY, REJ_SAMPLE_APPEND. These match what's used in s2n-bignum #378 (aarch64) so the aarch64 rej_uniform proof can share the shape.
- proofs/hol_light/x86_64/proofs/rej_uniform_avx2_asm.ml: needs the new mldsa_specs dependency; drop the duplicate REJ_SAMPLE definition. The x86-only REJ_SAMPLE_SPLIT / REJ_SAMPLE_PREFIX_256 / REJ_SAMPLE_STEP_LE (scalar-tail analysis helpers) stay here.
- .github/workflows/hol_light.yml: add mldsa_specs.ml to the rej_uniform_avx2_asm needs list.

Proof still passes in ~8 min native build.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Prove that every element of SUB_LIST (0,256) (REJ_SAMPLE inlist) has val c < 8380417 directly from the FILTER definition. Provides the coefficient bound property requested in the review; callers can specialize to per-index via EL / MEM_EL.

Kept as a standalone lemma rather than adding a per-index postcondition to MLDSA_REJ_UNIFORM_CORRECT to avoid touching the inner Hoare triple.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Jake Massimo <jakemas@amazon.com>

Strengthen the postcondition of MLDSA_REJ_UNIFORM_CORRECT and MLDSA_REJ_UNIFORM_(NOIBT_)SUBROUTINE_CORRECT to include the per-coefficient bound

    !i. i < outlen ==> val(read(memory :> bytes32 (word_add res (word(4 * i)))) s) < 8380417

matching the CBMC contract ensures(array_bound(buf, 0, len, 0, 8380417)) in arith_native_x86_64.h.

Uses the same layering pattern as poly_use_hint_32_aarch64_asm (ENSURES_STRENGTHEN_POST): introduces ENSURES_STRENGTHEN_POST_X86, a memory->list-element bridge VAL_READ_BYTES32_FROM_WORDLIST, and the combinatorial lemma REJ_SAMPLE_COEFF_BOUND, then derives MLDSA_REJ_UNIFORM_CORRECT_BOUND by showing the old num_of_wordlist-based postcondition implies the new per-index bound.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Jake Massimo <jakemas@amazon.com>
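The per-coefficient bound can be read as an executable check over the output bytes: for each index i below outlen, the 32-bit little-endian word at byte offset 4 * i must be below 8380417. The sketch below is only an illustrative restatement under that reading; read_bytes32 and all_coeffs_below_q are hypothetical names, and the little-endian interpretation of bytes32 is an assumption based on the x86_64 target.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the 32-bit little-endian word at res + 4 * i. */
static uint32_t read_bytes32(const uint8_t *res, size_t i)
{
  const uint8_t *p = res + 4 * i;
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) |
         ((uint32_t)p[3] << 24);
}

/* Returns true iff every one of the first `outlen` words is < 8380417.
 * Vacuously true for outlen == 0, like the universally quantified bound. */
static bool all_coeffs_below_q(const uint8_t *res, size_t outlen)
{
  for (size_t i = 0; i < outlen; i++) {
    if (read_bytes32(res, i) >= 8380417u) {
      return false;
    }
  }
  return true;
}
```

Because the words are compared as unsigned values, a coefficient that passes this check is also non-negative when viewed as int32_t, which lines up with the lower bound of 0 in the array_bound contract.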

- Pin s2n-bignum to awslabs/s2n-bignum@ccef2456 (upstream main with USHLL/MOVI/VPCMPGTD instruction models merged)
- Add HOL Light eta rejection table generation to autogen, matching the pattern from the x86 rej_uniform table in PR #1014

Signed-off-by: Jake Massimo <jakemas@amazon.com>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-31-118.us-west-2.compute.internal>

jakemas requested a review from a team as a code owner April 3, 2026 04:11

jakemas marked this pull request as draft April 3, 2026 04:11

jakemas added the benchmark label Apr 3, 2026

github-actions Bot reviewed Apr 3, 2026

oqs-bot reviewed Apr 3, 2026

jakemas added benchmark and removed benchmark labels Apr 3, 2026

oqs-bot reviewed Apr 3, 2026

jakemas added benchmark and removed benchmark labels Apr 3, 2026

jakemas mentioned this pull request Apr 8, 2026

x86: Add VMOVMSKPS, VPMOVZXBD, VZEROUPPER instruction models awslabs/s2n-bignum#387

Merged

mkannwischer self-assigned this Apr 22, 2026

jakemas force-pushed the jakemas/rej-uniform-asm branch 2 times, most recently from 7951c08 to 5607508 Compare May 5, 2026 22:56

jakemas force-pushed the jakemas/rej-uniform-asm branch 4 times, most recently from 55f0028 to 991e2e9 Compare May 7, 2026 05:10

jakemas marked this pull request as ready for review May 7, 2026 05:10

jakemas commented May 7, 2026

mkannwischer requested changes May 7, 2026

jakemas force-pushed the jakemas/rej-uniform-asm branch from 31b4b4c to 967fa0b Compare May 8, 2026 06:52

jakemas force-pushed the jakemas/rej-uniform-asm branch from 967fa0b to b1b11e9 Compare May 8, 2026 06:58

jakemas and others added 3 commits May 8, 2026 07:44

jakemas force-pushed the jakemas/rej-uniform-asm branch from 2d393ab to b10421a Compare May 9, 2026 04:35

jakemas mentioned this pull request May 14, 2026

HOL-Light: Add HOL Light proof and CBMC contracts for x86 poly_caddq #1068

Merged

jakemas commented Apr 3, 2026 • edited

github-actions Bot left a comment • edited

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

github-actions Bot left a comment • edited

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

oqs-bot left a comment • edited

AMD EPYC 3rd gen (c6a)

oqs-bot left a comment • edited

Graviton4

oqs-bot left a comment • edited

AMD EPYC 3rd gen (c6a) (no-opt)

oqs-bot left a comment • edited

Intel Xeon 3rd gen (c6i)

oqs-bot left a comment • edited

Graviton4 (no-opt)

oqs-bot left a comment • edited

Graviton3

oqs-bot left a comment • edited

Intel Xeon 3rd gen (c6i) (no-opt)

oqs-bot left a comment • edited

AMD EPYC 4th gen (c7a)

oqs-bot left a comment • edited

AMD EPYC 4th gen (c7a) (no-opt)

oqs-bot left a comment • edited

Graviton3 (no-opt)

oqs-bot left a comment • edited

Graviton2

oqs-bot left a comment • edited

Graviton2 (no-opt)

oqs-bot left a comment • edited

oqs-bot commented Apr 3, 2026 • edited

oqs-bot commented Apr 3, 2026 • edited

oqs-bot commented Apr 3, 2026 • edited

oqs-bot left a comment • edited

oqs-bot left a comment • edited

oqs-bot left a comment • edited

oqs-bot commented May 6, 2026 • edited

oqs-bot commented May 6, 2026 • edited

oqs-bot commented May 6, 2026 • edited

mkannwischer commented May 7, 2026 • edited

jakemas commented May 7, 2026 • edited