x86_64: Replace rej_uniform intrinsics with assembly #1014

Open
jakemas wants to merge 5 commits into main from jakemas/rej-uniform-asm

@jakemas jakemas commented Apr 3, 2026

Summary

Resolves #926 and possibly #418.

The HOL-Light proof depends on instruction support from awslabs/s2n-bignum#387.

  • Replaces the AVX2 intrinsics implementation of rej_uniform with hand-written x86_64 assembly
  • Passes the table as a parameter (consistent with the aarch64 approach), which avoids external symbol references and keeps the code compatible with simpasm
  • Constructs all constants from immediates (no .rodata section), which enables future HOL-Light formal verification
  • Uses #defines for register names, with #undef cleanup for single-compilation-unit (SCU) builds (following the mlkem-native pattern)
  • Adds poly_uniform to the component benchmark
  • Includes HOL-Light proof infrastructure (bytecode, table definition, proof skeleton, Makefile)

ML-DSA's 23-bit coefficients require 32-bit lanes, so a 256-bit YMM register holds 8 coefficients per iteration. This is why AVX2 was chosen over SSE: with SSE's 128-bit registers and 32-bit lanes, each iteration would process only 4 coefficients instead of 8.
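For reference, the sampling rule that the vectorized code has to reproduce can be sketched in scalar C. This is a minimal sketch, assuming the standard ML-DSA rule (read 3 bytes, keep the low 23 bits, accept values below q = 8380417); the function name `rej_uniform_ref` and its signature are hypothetical and are not taken from this PR. At 3 bytes per candidate, an 8-lane iteration would consume 24 input bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MLDSA_Q 8380417 /* ML-DSA modulus q = 2^23 - 2^13 + 1 */

/* Hypothetical scalar reference: consume 3 bytes per candidate, keep the
 * low 23 bits, and accept the candidate only if it is below q.
 * Returns the number of coefficients written to out (at most len). */
unsigned rej_uniform_ref(int32_t *out, unsigned len,
                         const uint8_t *buf, size_t buflen)
{
    unsigned ctr = 0;
    size_t pos = 0;

    assert(out != NULL && buf != NULL);

    while (ctr < len && pos + 3 <= buflen) {
        /* Little-endian load of a 24-bit candidate */
        uint32_t t = (uint32_t)buf[pos]
                   | ((uint32_t)buf[pos + 1] << 8)
                   | ((uint32_t)buf[pos + 2] << 16);
        pos += 3;

        t &= 0x7FFFFF; /* drop the top bit: 23-bit candidate */

        if (t < MLDSA_Q) {
            out[ctr++] = (int32_t)t; /* accept */
        }
        /* otherwise reject and move on to the next 3 bytes */
    }
    return ctr;
}
```

A vectorized version would perform the same mask and comparison on 8 lanes at once and then compact the accepted lanes, presumably with the help of the table that this PR passes as a parameter.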

Performance

AMD EPYC 3rd gen (c6a), opt

| Benchmark | Before (cycles) | After (cycles) | Change |
|---|---|---|---|
| ML-DSA-44 keypair | 68,874 | 66,828 | -3% |
| ML-DSA-44 sign | 187,594 | 184,181 | -2% |
| ML-DSA-44 verify | 68,993 | 65,665 | -5% |
| ML-DSA-65 keypair | 119,089 | 112,640 | -5% |
| ML-DSA-65 sign | 299,488 | 294,836 | -2% |
| ML-DSA-65 verify | 115,385 | 108,494 | -6% |
| ML-DSA-87 keypair | 203,754 | 185,518 | -9% |
| ML-DSA-87 sign | 396,462 | 378,579 | -5% |
| ML-DSA-87 verify | 196,231 | 177,157 | -10% |
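The Change column is the relative difference between the two cycle counts, rounded to a whole percent. A minimal sketch of that arithmetic, with a hypothetical helper name `percent_change` that is not part of this PR or its benchmark tooling:

```c
#include <assert.h>

/* Percent change from `before` to `after` cycle counts.
 * A negative result means the new code needs fewer cycles. */
double percent_change(long before, long after)
{
    assert(before > 0);
    return 100.0 * (double)(after - before) / (double)before;
}
```

For example, `percent_change(68874, 66828)` is about -2.97, which the table reports as -3%.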

Proof

Includes HOL-Light and CBMC proofs, written by Claude Opus 4.7.

HOL-Light / x86_64 HOL Light proof for mldsa_rej_uniform.S (pull_request) Successful in 12m

There is no constant-time/SAFE proof yet. I will continue to work on it as the instruction PR lands in s2n-bignum.

@jakemas jakemas requested a review from a team as a code owner April 3, 2026 04:11
@jakemas jakemas marked this pull request as draft April 3, 2026 04:11
@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 keypair 113118 cycles 113013 cycles 1.00
ML-DSA-44 sign 355649 cycles 355605 cycles 1.00
ML-DSA-44 verify 117801 cycles 117682 cycles 1.00
ML-DSA-65 keypair 196381 cycles 196214 cycles 1.00
ML-DSA-65 sign 589557 cycles 588943 cycles 1.00
ML-DSA-65 verify 194604 cycles 194375 cycles 1.00
ML-DSA-87 keypair 322210 cycles 322148 cycles 1.00
ML-DSA-87 sign 752493 cycles 752763 cycles 1.00
ML-DSA-87 verify 320055 cycles 319900 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions Bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 keypair 212361 cycles 212622 cycles 1.00
ML-DSA-44 sign 760716 cycles 760066 cycles 1.00
ML-DSA-44 verify 228743 cycles 228987 cycles 1.00
ML-DSA-65 keypair 379384 cycles 379665 cycles 1.00
ML-DSA-65 sign 1250617 cycles 1249827 cycles 1.00
ML-DSA-65 verify 371531 cycles 372045 cycles 1.00
ML-DSA-87 keypair 604335 cycles 605426 cycles 1.00
ML-DSA-87 sign 1593243 cycles 1591413 cycles 1.00
ML-DSA-87 verify 618270 cycles 617375 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 keypair 66830 cycles 68874 cycles 0.97
ML-DSA-44 sign 184077 cycles 187594 cycles 0.98
ML-DSA-44 verify 65562 cycles 68993 cycles 0.95
ML-DSA-65 keypair 111959 cycles 119089 cycles 0.94
ML-DSA-65 sign 292002 cycles 299488 cycles 0.98
ML-DSA-65 verify 108472 cycles 115385 cycles 0.94
ML-DSA-87 keypair 185520 cycles 203754 cycles 0.91
ML-DSA-87 sign 379630 cycles 396462 cycles 0.96
ML-DSA-87 verify 177291 cycles 196231 cycles 0.90

@oqs-bot oqs-bot left a comment

Graviton4

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 keypair 68316 cycles 68121 cycles 1.00
ML-DSA-44 sign 202487 cycles 202429 cycles 1.00
ML-DSA-44 verify 70722 cycles 70691 cycles 1.00
ML-DSA-65 keypair 121061 cycles 121050 cycles 1.00
ML-DSA-65 sign 331574 cycles 332242 cycles 1.00
ML-DSA-65 verify 117810 cycles 118169 cycles 1.00
ML-DSA-87 keypair 198140 cycles 198283 cycles 1.00
ML-DSA-87 sign 427941 cycles 428124 cycles 1.00
ML-DSA-87 verify 194637 cycles 194645 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 keypair 134578 cycles 135123 cycles 1.00
ML-DSA-44 sign 523923 cycles 523989 cycles 1.00
ML-DSA-44 verify 147640 cycles 147421 cycles 1.00
ML-DSA-65 keypair 228634 cycles 227032 cycles 1.01
ML-DSA-65 sign 864042 cycles 860343 cycles 1.00
ML-DSA-65 verify 236700 cycles 234883 cycles 1.01
ML-DSA-87 keypair 371955 cycles 371568 cycles 1.00
ML-DSA-87 sign 1080535 cycles 1079389 cycles 1.00
ML-DSA-87 verify 383811 cycles 383403 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 keypair 56863 cycles 56287 cycles 1.01
ML-DSA-44 sign 181063 cycles 181562 cycles 1.00
ML-DSA-44 verify 61140 cycles 61061 cycles 1.00
ML-DSA-65 keypair 98291 cycles 98770 cycles 1.00
ML-DSA-65 sign 298368 cycles 299116 cycles 1.00
ML-DSA-65 verify 100343 cycles 100251 cycles 1.00
ML-DSA-87 keypair 152430 cycles 153265 cycles 0.99
ML-DSA-87 sign 354719 cycles 355417 cycles 1.00
ML-DSA-87 verify 153124 cycles 153884 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 keypair 128315 cycles 128272 cycles 1.00
ML-DSA-44 sign 447513 cycles 447600 cycles 1.00
ML-DSA-44 verify 138123 cycles 144678 cycles 0.95
ML-DSA-65 keypair 220541 cycles 220481 cycles 1.00
ML-DSA-65 sign 726484 cycles 726951 cycles 1.00
ML-DSA-65 verify 222926 cycles 223461 cycles 1.00
ML-DSA-87 keypair 366142 cycles 366604 cycles 1.00
ML-DSA-87 sign 927541 cycles 927414 cycles 1.00
ML-DSA-87 verify 374016 cycles 373875 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton3

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 keypair 72353 cycles 72235 cycles 1.00
ML-DSA-44 sign 212424 cycles 212375 cycles 1.00
ML-DSA-44 verify 75754 cycles 75714 cycles 1.00
ML-DSA-65 keypair 127646 cycles 127612 cycles 1.00
ML-DSA-65 sign 351030 cycles 350845 cycles 1.00
ML-DSA-65 verify 125627 cycles 125755 cycles 1.00
ML-DSA-87 keypair 205980 cycles 208476 cycles 0.99
ML-DSA-87 sign 444778 cycles 450018 cycles 0.99
ML-DSA-87 verify 205601 cycles 205843 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 keypair 157499 cycles 157541 cycles 1.00
ML-DSA-44 sign 549244 cycles 549413 cycles 1.00
ML-DSA-44 verify 169448 cycles 168865 cycles 1.00
ML-DSA-65 keypair 268437 cycles 268818 cycles 1.00
ML-DSA-65 sign 903422 cycles 903672 cycles 1.00
ML-DSA-65 verify 275283 cycles 274680 cycles 1.00
ML-DSA-87 keypair 448241 cycles 448464 cycles 1.00
ML-DSA-87 sign 1158654 cycles 1157970 cycles 1.00
ML-DSA-87 verify 458704 cycles 458043 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 keypair 42142 cycles 40662 cycles 1.04
ML-DSA-44 sign 134317 cycles 132808 cycles 1.01
ML-DSA-44 verify 44844 cycles 43607 cycles 1.03
ML-DSA-65 keypair 72940 cycles 71859 cycles 1.02
ML-DSA-65 sign 213861 cycles 213367 cycles 1.00
ML-DSA-65 verify 73729 cycles 72847 cycles 1.01
ML-DSA-87 keypair 107003 cycles 109237 cycles 0.98
ML-DSA-87 sign 250851 cycles 254550 cycles 0.99
ML-DSA-87 verify 107681 cycles 109371 cycles 0.98

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 keypair 120754 cycles 120325 cycles 1.00
ML-DSA-44 sign 447570 cycles 447576 cycles 1.00
ML-DSA-44 verify 130511 cycles 130561 cycles 1.00
ML-DSA-65 keypair 205040 cycles 205018 cycles 1.00
ML-DSA-65 sign 728790 cycles 729474 cycles 1.00
ML-DSA-65 verify 210029 cycles 209605 cycles 1.00
ML-DSA-87 keypair 337610 cycles 336678 cycles 1.00
ML-DSA-87 sign 925517 cycles 924223 cycles 1.00
ML-DSA-87 verify 347563 cycles 347399 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 keypair 138744 cycles 138561 cycles 1.00
ML-DSA-44 sign 483982 cycles 484140 cycles 1.00
ML-DSA-44 verify 148574 cycles 162388 cycles 0.91
ML-DSA-65 keypair 241921 cycles 241950 cycles 1.00
ML-DSA-65 sign 792702 cycles 792591 cycles 1.00
ML-DSA-65 verify 240763 cycles 241288 cycles 1.00
ML-DSA-87 keypair 396106 cycles 397138 cycles 1.00
ML-DSA-87 sign 1013453 cycles 1013569 cycles 1.00
ML-DSA-87 verify 403446 cycles 403178 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 keypair 113189 cycles 113255 cycles 1.00
ML-DSA-44 sign 355791 cycles 356042 cycles 1.00
ML-DSA-44 verify 117978 cycles 117969 cycles 1.00
ML-DSA-65 keypair 196342 cycles 196623 cycles 1.00
ML-DSA-65 sign 589183 cycles 589242 cycles 1.00
ML-DSA-65 verify 194553 cycles 194559 cycles 1.00
ML-DSA-87 keypair 322537 cycles 322281 cycles 1.00
ML-DSA-87 sign 753613 cycles 753546 cycles 1.00
ML-DSA-87 verify 320115 cycles 320070 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 keypair 213219 cycles 212521 cycles 1.00
ML-DSA-44 sign 761553 cycles 760970 cycles 1.00
ML-DSA-44 verify 241351 cycles 234237 cycles 1.03
ML-DSA-65 keypair 380573 cycles 379762 cycles 1.00
ML-DSA-65 sign 1252452 cycles 1252199 cycles 1.00
ML-DSA-65 verify 372839 cycles 371797 cycles 1.00
ML-DSA-87 keypair 607341 cycles 604584 cycles 1.00
ML-DSA-87 sign 1596680 cycles 1595561 cycles 1.00
ML-DSA-87 verify 619175 cycles 618927 cycles 1.00

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Graviton2 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 6539a79 Previous: 9ee2f35 Ratio
ML-DSA-44 verify 241351 cycles 234237 cycles 1.03
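
The reported ratio of 1.03 is rounded, so it can look equal to the threshold rather than above it. A minimal sketch of the comparison, assuming the alert fires when current/previous is strictly greater than the threshold; the helper name `exceeds_threshold` is hypothetical and is not part of github-action-benchmark:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if current/previous is strictly greater than
 * threshold_percent/100. Integer cross-multiplication avoids
 * floating-point rounding (assumed comparison rule, see lead-in). */
bool exceeds_threshold(uint64_t current, uint64_t previous,
                       uint64_t threshold_percent)
{
    assert(previous > 0);
    return current * 100 > previous * threshold_percent;
}
```

With the values above, 241351 / 234237 is roughly 1.0304, so `exceeds_threshold(241351, 234237, 103)` is true even though the table shows 1.03.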

oqs-bot commented Apr 3, 2026

CBMC Results (ML-DSA-44)

Full Results (194 proofs)
Proof Current Previous Change
**TOTAL** 1724s 1629s +5.8%
polyvecl_pointwise_acc_montgomery_c 256s 245s +4%
rej_uniform_native 128s 120s +7%
mld_invntt_layer 99s 88s +12%
poly_pointwise_montgomery_c 97s 93s +4%
mld_ct_memcmp 75s 73s +3%
mld_attempt_signature_generation 44s 45s -2%
mld_ntt_layer 42s 42s +0%
sign_verify_internal 33s 28s +18%
rej_uniform_native_x86_64 31s - new
fqmul 30s 28s +7%
polyvec_matrix_expand 30s 28s +7%
sign_signature_internal 30s 28s +7%
keccakf1600x4_permute_native 24s 23s +4%
rej_uniform 19s 16s +19%
rej_uniform_c 19s 18s +6%
polyvecl_chknorm 17s 18s -6%
mld_ntt_butterfly_block 16s 15s +7%
poly_chknorm_c 15s 17s -12%
polyeta_unpack 15s 13s +15%
polyt0_unpack 15s 15s +0%
compute_pack_t0_t1 14s 13s +8%
mld_check_pct 14s 15s -7%
poly_uniform_4x 14s 11s +27%
polyvec_matrix_pointwise_montgomery_yvec 14s 15s -7%
polyz_unpack_c 13s 11s +18%
poly_add 12s 10s +20%
poly_uniform_eta_4x 12s 14s -14%
keccak_absorb_once_x4 10s 9s +11%
mld_compute_pack_z 10s 10s +0%
poly_invntt_tomont_c 10s 9s +11%
poly_power2round 9s 9s +0%
polyvec_matrix_expand_serial 9s 9s +0%
sign 9s 7s +29%
poly_decompose_c 8s 7s +14%
polyveck_decompose 8s 8s +0%
pointwise_acc_native_x86_64 7s 5s +40%
polyveck_invntt_tomont 7s 3s +133%
sign_verify_extmu 7s 6s +17%
keccakf1600_permute_native 6s 8s -25%
pointwise_acc_native_aarch64 6s 4s +50%
polyt0_pack 6s 7s -14%
sign_keypair_internal 6s 3s +100%
sign_signature_pre_hash_internal 6s 4s +50%
keccak_absorb 5s 6s -17%
keccakf1600_extract_bytes (big endian) 5s 1s +400%
keccakf1600_permute 5s 8s -38%
mld_ct_get_optblocker_i64 5s 2s +150%
mld_prepare_domain_separation_prefix 5s 6s -17%
ntt_native_aarch64 5s 3s +67%
pack_sig_c 5s 2s +150%
poly_challenge 5s 4s +25%
poly_invntt_tomont_native 5s 4s +25%
poly_permute_bitrev_to_custom_optional_native 5s 1s +400%
poly_use_hint_native_aarch64 5s 3s +67%
polyt1_pack 5s 2s +150%
polyveck_pack_eta 5s 3s +67%
polyvecl_pack_eta 5s 2s +150%
polyvecl_pointwise_acc_montgomery_native 5s 6s -17%
polyvecl_uniform_gamma1 5s 3s +67%
polyvecl_uniform_gamma1_serial 5s 3s +67%
shake256_squeeze 5s 2s +150%
sign_open 5s 3s +67%
sign_verify 5s 3s +67%
unpack_sk_t0hat 5s 4s +25%
decompose 4s 3s +33%
intt_native_x86_64 4s 3s +33%
keccak_f1600_x4_native_aarch64_v84a 4s 3s +33%
keccakf1600_xor_bytes (big endian) 4s 3s +33%
keccakf1600x4_xor_bytes 4s 2s +100%
mld_ct_cmask_neg_i32 4s 3s +33%
mld_ct_cmask_nonzero_u8 4s 4s +0%
mld_h 4s 4s +0%
mld_polymat_expand_entry 4s 2s +100%
mld_sample_s1_s2 4s 4s +0%
mld_value_barrier_u8 4s 3s +33%
pack_sk_rho_key_tr_s2 4s 3s +33%
poly_caddq_c 4s 2s +100%
poly_decompose_native 4s 7s -43%
poly_permute_bitrev_to_custom_optional 4s 3s +33%
poly_pointwise_montgomery 4s 2s +100%
poly_pointwise_montgomery_native 4s 1s +300%
poly_shiftl 4s 3s +33%
poly_sub 4s 3s +33%
poly_uniform 4s 4s +0%
poly_uniform_gamma1_4x 4s 4s +0%
poly_use_hint_c 4s 5s -20%
poly_use_hint_native 4s 2s +100%
polyt1_unpack 4s 2s +100%
polyveck_chknorm 4s 5s -20%
polyveck_ntt 4s 5s -20%
polyvecl_unpack_eta 4s 2s +100%
polyz_pack 4s 3s +33%
rej_eta 4s 4s +0%
rej_eta_c 4s 4s +0%
rej_eta_native 4s 5s -20%
sign_verify_pre_hash_internal 4s 4s +0%
sk_s1hat_get_poly 4s 6s -33%
yvec_init 4s 2s +100%
caddq 3s 3s +0%
intt_native_aarch64 3s 5s -40%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 2s +50%
keccak_f1600_x4_native_avx2 3s 2s +50%
keccak_finalize 3s 3s +0%
keccak_squeezeblocks_x4 3s 4s -25%
make_hint 3s 3s +0%
mld_ct_cmask_nonzero_u32 3s 2s +50%
mld_ct_get_optblocker_u8 3s 2s +50%
mld_ct_sel_int32 3s 2s +50%
montgomery_reduce 3s 2s +50%
ntt_native_x86_64 3s 3s +0%
nttunpack_native_x86_64 3s 3s +0%
pack_sig_h 3s 3s +0%
pack_sig_z 3s 2s +50%
pack_sk_s1 3s 4s -25%
pointwise_native_aarch64 3s 1s +200%
poly_caddq_native 3s 7s -57%
poly_caddq_native_aarch64 3s 3s +0%
poly_chknorm_native 3s 3s +0%
poly_chknorm_native_aarch64 3s 4s -25%
poly_decompose_32_native_aarch64 3s 3s +0%
poly_ntt 3s 4s -25%
poly_ntt_native 3s 3s +0%
poly_uniform_eta 3s 3s +0%
polyeta_pack 3s 1s +200%
polyvec_matrix_pointwise_montgomery_row 3s 2s +50%
polyveck_caddq 3s 4s -25%
polyveck_pack_w1 3s 4s -25%
polyvecl_ntt 3s 7s -57%
polyw1_pack 3s 4s -25%
polyz_unpack 3s 2s +50%
polyz_unpack_17_native_aarch64 3s 3s +0%
polyz_unpack_native 3s 2s +50%
shake128_absorb 3s 3s +0%
shake128_squeeze 3s 1s +200%
shake256_init 3s 3s +0%
sig_unpack_hints 3s 3s +0%
sign_keypair 3s 3s +0%
sign_pk_from_sk 3s 6s -50%
sign_signature 3s 3s +0%
sign_verify_pre_hash_shake256 3s 2s +50%
sk_s2hat_get_poly 3s 3s +0%
sys_check_capability 3s 3s +0%
unpack_sk_s1hat 3s 3s +0%
yvec_get_poly 3s 4s -25%
fqscale 2s 2s +0%
keccak_f1600_x1_native_aarch64 2s 4s -50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccak_squeeze 2s 2s +0%
keccakf1600_xor_bytes 2s 2s +0%
keccakf1600x4_extract_bytes 2s 3s -33%
keccakf1600x4_permute 2s 2s +0%
mld_ct_get_optblocker_u32 2s 2s +0%
mld_keccakf1600_extract_bytes 2s 3s -33%
mld_sample_s1_s2_serial 2s 2s +0%
mld_value_barrier_i64 2s 3s -33%
mld_value_barrier_u32 2s 2s +0%
pointwise_native_x86_64 2s 3s -33%
poly_caddq 2s 1s +100%
poly_chknorm 2s 3s -33%
poly_decompose 2s 4s -50%
poly_decompose_88_native_aarch64 2s 5s -60%
poly_invntt_tomont 2s 4s -50%
poly_reduce 2s 3s -33%
poly_uniform_gamma1 2s 4s -50%
poly_use_hint 2s 3s -33%
polyveck_reduce 2s 2s +0%
polyveck_unpack_eta 2s 3s -33%
polyvecl_pointwise_acc_montgomery 2s 3s -33%
polyvecl_unpack_z 2s 3s -33%
polyz_unpack_19_native_aarch64 2s 4s -50%
power2round 2s 3s -33%
reduce32 2s 2s +0%
shake128_init 2s 3s -33%
shake128_release 2s 2s +0%
shake128x4_absorb_once 2s 5s -60%
shake256 2s 4s -50%
shake256_absorb 2s 4s -50%
shake256_finalize 2s 1s +100%
shake256_release 2s 2s +0%
shake256x4_absorb_once 2s 2s +0%
shake256x4_squeezeblocks 2s 3s -33%
sign_signature_extmu 2s 4s -50%
sign_signature_pre_hash_shake256 2s 5s -60%
sk_t0hat_get_poly 2s 3s -33%
unpack_sk 2s 4s -50%
unpack_sk_s2hat 2s 4s -50%
use_hint 2s 3s -33%
keccak_f1600_x1_native_aarch64_v84a 1s 2s -50%
keccak_init 1s 3s -67%
mld_ct_abs_i32 1s 3s -67%
poly_ntt_c 1s 5s -80%
shake128_finalize 1s 3s -67%
shake128x4_squeezeblocks 1s 4s -75%
unpack_pk_t1 1s 3s -67%

oqs-bot commented Apr 3, 2026

CBMC Results (ML-DSA-87)

Full Results (194 proofs)
Proof Current Previous Change
**TOTAL** 2140s 2059s +3.9%
polyvecl_pointwise_acc_montgomery_c 326s 289s +13%
polyvec_matrix_expand 196s 184s +7%
rej_uniform_native 132s 127s +4%
poly_pointwise_montgomery_c 106s 104s +2%
mld_invntt_layer 99s 98s +1%
mld_ct_memcmp 86s 83s +4%
sign_verify_internal 61s 63s -3%
mld_attempt_signature_generation 58s 62s -6%
sign_signature_internal 56s 57s -2%
mld_ntt_layer 48s 46s +4%
polyvec_matrix_expand_serial 40s 40s +0%
rej_uniform_native_x86_64 31s - new
fqmul 30s 29s +3%
compute_pack_t0_t1 29s 28s +4%
keccakf1600x4_permute_native 23s 23s +0%
polyvec_matrix_pointwise_montgomery_yvec 23s 23s +0%
rej_uniform_c 19s 18s +6%
mld_check_pct 17s 17s +0%
rej_uniform 17s 16s +6%
polyt0_unpack 16s 16s +0%
mld_ntt_butterfly_block 15s 18s -17%
poly_chknorm_c 15s 15s +0%
poly_uniform_eta_4x 14s 13s +8%
poly_add 12s 12s +0%
poly_invntt_tomont_c 12s 9s +33%
poly_uniform_4x 12s 11s +9%
polyveck_decompose 12s 11s +9%
polyeta_unpack 11s 10s +10%
polyveck_ntt 11s 10s +10%
keccak_absorb_once_x4 10s 9s +11%
polyveck_invntt_tomont 10s 9s +11%
pointwise_acc_native_x86_64 9s 6s +50%
sign 9s 7s +29%
keccakf1600_permute_native 8s 7s +14%
mld_compute_pack_z 8s 9s -11%
poly_power2round 8s 8s +0%
polyveck_caddq 8s 7s +14%
polyz_unpack_c 8s 7s +14%
keccakf1600_permute 7s 7s +0%
pointwise_acc_native_aarch64 7s 7s +0%
sign_pk_from_sk 7s 6s +17%
sign_verify 7s 5s +40%
keccak_absorb 6s 6s +0%
keccak_squeezeblocks_x4 6s 5s +20%
mld_sample_s1_s2 6s 6s +0%
poly_caddq 6s 2s +200%
poly_uniform 6s 2s +200%
polyvecl_chknorm 6s 4s +50%
polyvecl_ntt 6s 5s +20%
unpack_sk_t0hat 6s 5s +20%
keccak_finalize 5s 3s +67%
keccakf1600_extract_bytes (big endian) 5s 3s +67%
keccakf1600_xor_bytes 5s 4s +25%
mld_ct_abs_i32 5s 2s +150%
poly_caddq_c 5s 3s +67%
poly_caddq_native 5s 3s +67%
poly_chknorm_native 5s 2s +150%
poly_decompose_32_native_aarch64 5s 3s +67%
poly_invntt_tomont_native 5s 3s +67%
poly_shiftl 5s 5s +0%
poly_use_hint_c 5s 5s +0%
polyt0_pack 5s 5s +0%
polyvec_matrix_pointwise_montgomery_row 5s 2s +150%
polyveck_chknorm 5s 5s +0%
polyveck_pack_eta 5s 5s +0%
polyveck_unpack_eta 5s 2s +150%
polyvecl_uniform_gamma1 5s 4s +25%
polyz_unpack 5s 3s +67%
sign_signature 5s 6s -17%
sign_verify_extmu 5s 4s +25%
caddq 4s 2s +100%
intt_native_aarch64 4s 2s +100%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 4s 2s +100%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 4s 3s +33%
keccak_init 4s 2s +100%
keccakf1600x4_xor_bytes 4s 3s +33%
mld_ct_get_optblocker_i64 4s 1s +300%
mld_prepare_domain_separation_prefix 4s 6s -33%
mld_sample_s1_s2_serial 4s 7s -43%
montgomery_reduce 4s 4s +0%
ntt_native_x86_64 4s 3s +33%
nttunpack_native_x86_64 4s 3s +33%
pack_sig_z 4s 3s +33%
pointwise_native_x86_64 4s 1s +300%
poly_challenge 4s 6s -33%
poly_chknorm_native_aarch64 4s 3s +33%
poly_decompose_88_native_aarch64 4s 3s +33%
poly_decompose_c 4s 3s +33%
poly_decompose_native 4s 4s +0%
poly_ntt_native 4s 4s +0%
poly_pointwise_montgomery_native 4s 2s +100%
poly_sub 4s 5s -20%
poly_uniform_eta 4s 7s -43%
poly_use_hint_native 4s 5s -20%
poly_use_hint_native_aarch64 4s 2s +100%
polyveck_pack_w1 4s 3s +33%
polyvecl_unpack_eta 4s 4s +0%
polyz_unpack_native 4s 4s +0%
rej_eta_c 4s 5s -20%
rej_eta_native 4s 5s -20%
shake128_absorb 4s 1s +300%
shake256 4s 2s +100%
shake256_init 4s 2s +100%
shake256_release 4s 2s +100%
shake256x4_absorb_once 4s 2s +100%
sign_keypair_internal 4s 11s -64%
sign_open 4s 3s +33%
sign_signature_extmu 4s 4s +0%
sign_signature_pre_hash_internal 4s 5s -20%
sign_signature_pre_hash_shake256 4s 4s +0%
decompose 3s 3s +0%
intt_native_x86_64 3s 4s -25%
keccak_f1600_x1_native_aarch64_v84a 3s 3s +0%
keccak_f1600_x4_native_aarch64_v84a 3s 2s +50%
keccakf1600_xor_bytes (big endian) 3s 3s +0%
keccakf1600x4_permute 3s 2s +50%
mld_ct_cmask_neg_i32 3s 4s -25%
mld_ct_cmask_nonzero_u32 3s 4s -25%
mld_ct_get_optblocker_u8 3s 3s +0%
mld_h 3s 4s -25%
mld_keccakf1600_extract_bytes 3s 1s +200%
mld_polymat_expand_entry 3s 4s -25%
pack_sig_c 3s 3s +0%
pack_sk_rho_key_tr_s2 3s 3s +0%
pack_sk_s1 3s 4s -25%
pointwise_native_aarch64 3s 3s +0%
poly_decompose 3s 2s +50%
poly_ntt 3s 5s -40%
poly_reduce 3s 3s +0%
poly_uniform_gamma1 3s 4s -25%
poly_uniform_gamma1_4x 3s 5s -40%
polyeta_pack 3s 4s -25%
polyt1_pack 3s 6s -50%
polyt1_unpack 3s 4s -25%
polyvecl_pointwise_acc_montgomery_native 3s 5s -40%
polyw1_pack 3s 3s +0%
polyz_unpack_19_native_aarch64 3s 4s -25%
reduce32 3s 3s +0%
rej_eta 3s 3s +0%
shake128_finalize 3s 3s +0%
sig_unpack_hints 3s 3s +0%
sign_keypair 3s 9s -67%
sk_s1hat_get_poly 3s 2s +50%
sk_s2hat_get_poly 3s 4s -25%
sk_t0hat_get_poly 3s 2s +50%
unpack_pk_t1 3s 3s +0%
unpack_sk 3s 5s -40%
unpack_sk_s2hat 3s 3s +0%
use_hint 3s 4s -25%
yvec_get_poly 3s 3s +0%
yvec_init 3s 4s -25%
fqscale 2s 2s +0%
keccak_squeeze 2s 2s +0%
keccakf1600x4_extract_bytes 2s 4s -50%
make_hint 2s 4s -50%
mld_ct_cmask_nonzero_u8 2s 3s -33%
mld_ct_get_optblocker_u32 2s 1s +100%
mld_value_barrier_i64 2s 3s -33%
mld_value_barrier_u32 2s 1s +100%
mld_value_barrier_u8 2s 4s -50%
ntt_native_aarch64 2s 5s -60%
pack_sig_h 2s 4s -50%
poly_caddq_native_aarch64 2s 6s -67%
poly_invntt_tomont 2s 6s -67%
poly_ntt_c 2s 3s -33%
poly_permute_bitrev_to_custom_optional 2s 3s -33%
poly_permute_bitrev_to_custom_optional_native 2s 3s -33%
poly_use_hint 2s 3s -33%
polyveck_reduce 2s 2s +0%
polyvecl_pack_eta 2s 4s -50%
polyvecl_pointwise_acc_montgomery 2s 4s -50%
polyvecl_uniform_gamma1_serial 2s 2s +0%
polyvecl_unpack_z 2s 1s +100%
polyz_pack 2s 5s -60%
polyz_unpack_17_native_aarch64 2s 3s -33%
shake128_init 2s 1s +100%
shake128x4_absorb_once 2s 3s -33%
shake256_finalize 2s 3s -33%
shake256_squeeze 2s 2s +0%
shake256x4_squeezeblocks 2s 3s -33%
sign_verify_pre_hash_internal 2s 3s -33%
sign_verify_pre_hash_shake256 2s 8s -75%
sys_check_capability 2s 5s -60%
unpack_sk_s1hat 2s 3s -33%
keccak_f1600_x1_native_aarch64 1s 3s -67%
keccak_f1600_x4_native_avx2 1s 3s -67%
mld_ct_sel_int32 1s 2s -50%
poly_chknorm 1s 4s -75%
poly_pointwise_montgomery 1s 1s +0%
power2round 1s 2s -50%
shake128_release 1s 2s -50%
shake128_squeeze 1s 2s -50%
shake128x4_squeezeblocks 1s 5s -80%
shake256_absorb 1s 4s -75%

oqs-bot commented Apr 3, 2026

CBMC Results (ML-DSA-65)

Full Results (194 proofs)
Proof Current Previous Change
**TOTAL** 1800s 2033s -11.5%
polyvecl_pointwise_acc_montgomery_c 257s 332s -23%
polyvec_matrix_expand 142s 155s -8%
rej_uniform_native 120s 134s -10%
mld_invntt_layer 92s 100s -8%
poly_pointwise_montgomery_c 89s 111s -20%
mld_ct_memcmp 73s 87s -16%
sign_verify_internal 53s 56s -5%
sign_signature_internal 47s 49s -4%
mld_ntt_layer 39s 47s -17%
mld_attempt_signature_generation 38s 41s -7%
fqmul 29s 30s -3%
rej_uniform_native_x86_64 28s - new
polyvec_matrix_pointwise_montgomery_yvec 27s 30s -10%
keccakf1600x4_permute_native 23s 25s -8%
polyvec_matrix_expand_serial 23s 25s -8%
polyt0_unpack 16s 16s +0%
rej_uniform_c 16s 19s -16%
mld_ntt_butterfly_block 15s 17s -12%
poly_chknorm_c 15s 17s -12%
rej_uniform 15s 19s -21%
poly_uniform_eta_4x 13s 13s +0%
polyveck_decompose 13s 16s -19%
compute_pack_t0_t1 12s 17s -29%
poly_uniform_4x 12s 14s -14%
poly_add 11s 12s -8%
mld_check_pct 10s 12s -17%
sign 10s 8s +25%
keccak_absorb_once_x4 9s 11s -18%
keccakf1600_permute_native 8s 9s -11%
pointwise_acc_native_x86_64 8s 5s +60%
poly_power2round 8s 10s -20%
polyveck_caddq 8s 7s +14%
polyveck_chknorm 8s 9s -11%
polyvecl_ntt 8s 5s +60%
mld_compute_pack_z 7s 7s +0%
pointwise_acc_native_aarch64 7s 7s +0%
poly_invntt_tomont_c 7s 9s -22%
polyveck_ntt 7s 11s -36%
intt_native_aarch64 6s 2s +200%
keccak_absorb 6s 6s +0%
keccakf1600_permute 6s 9s -33%
poly_caddq_c 6s 5s +20%
poly_decompose_c 6s 9s -33%
polyveck_invntt_tomont 6s 8s -25%
sign_open 6s 3s +100%
sign_pk_from_sk 6s 5s +20%
keccak_squeezeblocks_x4 5s 5s +0%
mld_sample_s1_s2_serial 5s 5s +0%
mld_value_barrier_i64 5s 3s +67%
poly_challenge 5s 6s -17%
poly_shiftl 5s 3s +67%
poly_uniform_gamma1_4x 5s 4s +25%
sign_keypair_internal 5s 4s +25%
sign_verify 5s 4s +25%
yvec_init 5s 3s +67%
keccak_f1600_x1_native_aarch64 4s 2s +100%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 4s 2s +100%
keccak_squeeze 4s 4s +0%
mld_ct_get_optblocker_u32 4s 2s +100%
mld_ct_get_optblocker_u8 4s 6s -33%
mld_keccakf1600_extract_bytes 4s 2s +100%
mld_sample_s1_s2 4s 4s +0%
mld_value_barrier_u32 4s 2s +100%
mld_value_barrier_u8 4s 3s +33%
montgomery_reduce 4s 3s +33%
ntt_native_aarch64 4s 2s +100%
nttunpack_native_x86_64 4s 1s +300%
pack_sk_rho_key_tr_s2 4s 2s +100%
pack_sk_s1 4s 3s +33%
pointwise_native_aarch64 4s 3s +33%
pointwise_native_x86_64 4s 4s +0%
poly_caddq_native_aarch64 4s 3s +33%
poly_chknorm_native 4s 5s -20%
poly_decompose_88_native_aarch64 4s 4s +0%
poly_permute_bitrev_to_custom_optional 4s 1s +300%
poly_uniform 4s 5s -20%
poly_uniform_eta 4s 6s -33%
poly_use_hint_c 4s 2s +100%
poly_use_hint_native_aarch64 4s 3s +33%
polyeta_unpack 4s 3s +33%
polyvec_matrix_pointwise_montgomery_row 4s 4s +0%
polyvecl_chknorm 4s 4s +0%
polyz_unpack_17_native_aarch64 4s 5s -20%
polyz_unpack_c 4s 5s -20%
rej_eta_c 4s 5s -20%
shake128_finalize 4s 5s -20%
shake256_absorb 4s 3s +33%
shake256_init 4s 2s +100%
sign_keypair 4s 4s +0%
sign_signature_extmu 4s 4s +0%
sign_signature_pre_hash_internal 4s 3s +33%
sign_verify_pre_hash_shake256 4s 5s -20%
sk_t0hat_get_poly 4s 3s +33%
unpack_sk_t0hat 4s 4s +0%
use_hint 4s 3s +33%
caddq 3s 2s +50%
fqscale 3s 3s +0%
keccak_f1600_x4_native_aarch64_v84a 3s 3s +0%
keccak_init 3s 2s +50%
mld_ct_cmask_nonzero_u32 3s 4s -25%
mld_polymat_expand_entry 3s 2s +50%
mld_prepare_domain_separation_prefix 3s 4s -25%
pack_sig_c 3s 1s +200%
pack_sig_z 3s 3s +0%
poly_caddq 3s 4s -25%
poly_decompose 3s 4s -25%
poly_decompose_32_native_aarch64 3s 3s +0%
poly_invntt_tomont 3s 3s +0%
poly_ntt_c 3s 4s -25%
poly_permute_bitrev_to_custom_optional_native 3s 4s -25%
poly_pointwise_montgomery_native 3s 4s -25%
poly_reduce 3s 3s +0%
polyeta_pack 3s 4s -25%
polyt0_pack 3s 6s -50%
polyt1_unpack 3s 3s +0%
polyveck_pack_w1 3s 3s +0%
polyveck_reduce 3s 3s +0%
polyvecl_pack_eta 3s 5s -40%
polyvecl_pointwise_acc_montgomery 3s 3s +0%
polyvecl_pointwise_acc_montgomery_native 3s 3s +0%
polyvecl_uniform_gamma1 3s 5s -40%
polyvecl_unpack_eta 3s 6s -50%
polyvecl_unpack_z 3s 3s +0%
polyz_unpack_19_native_aarch64 3s 2s +50%
reduce32 3s 3s +0%
rej_eta 3s 3s +0%
rej_eta_native 3s 4s -25%
shake128_absorb 3s 2s +50%
shake128_release 3s 2s +50%
shake128x4_squeezeblocks 3s 4s -25%
shake256x4_absorb_once 3s 3s +0%
sign_signature 3s 2s +50%
sign_verify_extmu 3s 3s +0%
sign_verify_pre_hash_internal 3s 4s -25%
sk_s1hat_get_poly 3s 5s -40%
sk_s2hat_get_poly 3s 5s -40%
unpack_pk_t1 3s 6s -50%
intt_native_x86_64 2s 4s -50%
keccak_f1600_x1_native_aarch64_v84a 2s 4s -50%
keccakf1600_xor_bytes 2s 2s +0%
keccakf1600_xor_bytes (big endian) 2s 5s -60%
keccakf1600x4_xor_bytes 2s 4s -50%
mld_ct_cmask_neg_i32 2s 4s -50%
mld_ct_cmask_nonzero_u8 2s 2s +0%
mld_ct_get_optblocker_i64 2s 4s -50%
mld_ct_sel_int32 2s 4s -50%
mld_h 2s 2s +0%
ntt_native_x86_64 2s 3s -33%
pack_sig_h 2s 2s +0%
poly_caddq_native 2s 3s -33%
poly_chknorm 2s 2s +0%
poly_chknorm_native_aarch64 2s 3s -33%
poly_decompose_native 2s 5s -60%
poly_ntt 2s 2s +0%
poly_pointwise_montgomery 2s 4s -50%
poly_sub 2s 3s -33%
poly_uniform_gamma1 2s 4s -50%
poly_use_hint 2s 4s -50%
poly_use_hint_native 2s 4s -50%
polyveck_pack_eta 2s 3s -33%
polyveck_unpack_eta 2s 5s -60%
polyvecl_uniform_gamma1_serial 2s 2s +0%
polyw1_pack 2s 3s -33%
polyz_pack 2s 3s -33%
polyz_unpack 2s 2s +0%
power2round 2s 3s -33%
shake128_init 2s 2s +0%
shake256 2s 3s -33%
shake256_finalize 2s 3s -33%
shake256_release 2s 3s -33%
shake256_squeeze 2s 1s +100%
shake256x4_squeezeblocks 2s 4s -50%
sig_unpack_hints 2s 2s +0%
sign_signature_pre_hash_shake256 2s 5s -60%
sys_check_capability 2s 2s +0%
unpack_sk 2s 5s -60%
unpack_sk_s1hat 2s 3s -33%
unpack_sk_s2hat 2s 2s +0%
decompose 1s 4s -75%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 2s -50%
keccak_f1600_x4_native_avx2 1s 3s -67%
keccak_finalize 1s 2s -50%
keccakf1600_extract_bytes (big endian) 1s 2s -50%
keccakf1600x4_extract_bytes 1s 3s -67%
keccakf1600x4_permute 1s 2s -50%
make_hint 1s 4s -75%
mld_ct_abs_i32 1s 1s +0%
poly_invntt_tomont_native 1s 2s -50%
poly_ntt_native 1s 5s -80%
polyt1_pack 1s 4s -75%
polyz_unpack_native 1s 4s -75%
shake128_squeeze 1s 3s -67%
shake128x4_absorb_once 1s 1s +0%
yvec_get_poly 1s 4s -75%

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Details
| Benchmark suite | Current: 6539a79 | Previous: 9ee2f35 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 34764 cycles | 34374 cycles | 1.01 |
| ML-DSA-44 sign | 120113 cycles | 120132 cycles | 1.00 |
| ML-DSA-44 verify | 38092 cycles | 38166 cycles | 1.00 |
| ML-DSA-65 keypair | 61138 cycles | 60500 cycles | 1.01 |
| ML-DSA-65 sign | 201844 cycles | 199945 cycles | 1.01 |
| ML-DSA-65 verify | 62783 cycles | 62429 cycles | 1.01 |
| ML-DSA-87 keypair | 93501 cycles | 94486 cycles | 0.99 |
| ML-DSA-87 sign | 236815 cycles | 239500 cycles | 0.99 |
| ML-DSA-87 verify | 95619 cycles | 96894 cycles | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Details
| Benchmark suite | Current: 6539a79 | Previous: 9ee2f35 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 93930 cycles | 93842 cycles | 1.00 |
| ML-DSA-44 sign | 333310 cycles | 333119 cycles | 1.00 |
| ML-DSA-44 verify | 100022 cycles | 100025 cycles | 1.00 |
| ML-DSA-65 keypair | 159902 cycles | 160115 cycles | 1.00 |
| ML-DSA-65 sign | 543114 cycles | 543227 cycles | 1.00 |
| ML-DSA-65 verify | 160989 cycles | 161060 cycles | 1.00 |
| ML-DSA-87 keypair | 266666 cycles | 266874 cycles | 1.00 |
| ML-DSA-87 sign | 704974 cycles | 706010 cycles | 1.00 |
| ML-DSA-87 verify | 270510 cycles | 269779 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.

Contributor

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: 6539a79 | Previous: 9ee2f35 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 42142 cycles | 40662 cycles | 1.04 |

This comment was automatically generated by workflow using github-action-benchmark.

@jakemas jakemas force-pushed the jakemas/rej-uniform-asm branch 2 times, most recently from 7951c08 to 5607508 Compare May 5, 2026 22:56
@oqs-bot
Contributor

oqs-bot commented May 6, 2026

CBMC Results (ML-DSA-65, REDUCE-RAM)

Full Results (194 proofs)
Proof Status Current Previous Change
**TOTAL** 1534s 1504s +2.0%
poly_pointwise_montgomery_c 168s 165s +2%
polyvec_matrix_pointwise_montgomery_yvec 154s 149s +3%
rej_uniform_native 109s 105s +4%
mld_invntt_layer 105s 101s +4%
mld_ct_memcmp 74s 73s +1%
rej_uniform_native_x86_64 58s - new
mld_ntt_layer 43s 41s +5%
fqmul 28s 26s +8%
mld_attempt_signature_generation 23s 26s -12%
keccakf1600x4_permute_native 21s 23s -9%
rej_uniform 20s 20s +0%
polyvecl_chknorm 19s 20s -5%
rej_uniform_c 19s 17s +12%
sign_verify_internal 18s 17s +6%
mld_ntt_butterfly_block 17s 15s +13%
mld_check_pct 16s 14s +14%
poly_chknorm_c 14s 14s +0%
poly_add 12s 11s +9%
poly_uniform_eta_4x 12s 12s +0%
polyveck_decompose 12s 13s -8%
polyt0_unpack 11s 13s -15%
keccak_absorb_once_x4 10s 9s +11%
keccakf1600_permute_native 9s 7s +29%
poly_caddq_c 9s 8s +12%
poly_invntt_tomont_c 9s 11s -18%
polyvec_matrix_pointwise_montgomery_row 9s 8s +12%
compute_pack_t0_t1 8s 10s -20%
polyveck_caddq 8s 7s +14%
keccakf1600_permute 7s 7s +0%
poly_power2round 7s 9s -22%
polyveck_reduce 7s 6s +17%
polyvecl_ntt 7s 11s -36%
sign 7s 8s -12%
sign_pk_from_sk 7s 6s +17%
sign_verify_extmu 7s 4s +75%
caddq 6s 3s +100%
keccak_absorb 6s 6s +0%
mld_compute_pack_z 6s 6s +0%
pointwise_acc_native_aarch64 6s 8s -25%
poly_shiftl 6s 6s +0%
poly_uniform 6s 4s +50%
polyveck_invntt_tomont 6s 6s +0%
polyz_unpack_c 6s 6s +0%
intt_native_aarch64 5s 3s +67%
keccak_finalize 5s 4s +25%
mld_sample_s1_s2_serial 5s 5s +0%
pack_sig_h 5s 3s +67%
pointwise_acc_native_x86_64 5s 6s -17%
poly_decompose 5s 3s +67%
poly_decompose_c 5s 3s +67%
polyvecl_pointwise_acc_montgomery 5s 4s +25%
keccak_f1600_x4_native_avx2 4s 6s -33%
keccak_squeezeblocks_x4 4s 4s +0%
make_hint 4s 4s +0%
ntt_native_x86_64 4s 5s -20%
poly_caddq_native_aarch64 4s 3s +33%
poly_challenge 4s 6s -33%
poly_permute_bitrev_to_custom_optional 4s 3s +33%
poly_uniform_gamma1 4s 3s +33%
poly_uniform_gamma1_4x 4s 2s +100%
poly_use_hint_c 4s 2s +100%
polyt1_unpack 4s 4s +0%
polyvecl_pointwise_acc_montgomery_native 4s 2s +100%
polyz_unpack 4s 4s +0%
rej_eta_c 4s 3s +33%
rej_eta_native 4s 4s +0%
shake128_absorb 4s 2s +100%
shake256 4s 2s +100%
shake256x4_absorb_once 4s 4s +0%
sign_keypair_internal 4s 3s +33%
sign_open 4s 5s -20%
sign_signature 4s 5s -20%
sign_verify 4s 5s -20%
sk_s2hat_get_poly 4s 3s +33%
unpack_sk 4s 2s +100%
use_hint 4s 5s -20%
fqscale 3s 6s -50%
keccak_squeeze 3s 3s +0%
keccakf1600_extract_bytes (big endian) 3s 4s -25%
keccakf1600_xor_bytes 3s 2s +50%
keccakf1600x4_permute 3s 2s +50%
keccakf1600x4_xor_bytes 3s 2s +50%
mld_ct_cmask_nonzero_u32 3s 1s +200%
mld_ct_cmask_nonzero_u8 3s 3s +0%
mld_prepare_domain_separation_prefix 3s 2s +50%
mld_sample_s1_s2 3s 4s -25%
mld_value_barrier_i64 3s 2s +50%
pack_sig_c 3s 4s -25%
pack_sig_z 3s 2s +50%
pack_sk_rho_key_tr_s2 3s 4s -25%
pointwise_native_aarch64 3s 5s -40%
pointwise_native_x86_64 3s 3s +0%
poly_caddq 3s 4s -25%
poly_chknorm_native_aarch64 3s 2s +50%
poly_decompose_32_native_aarch64 3s 4s -25%
poly_invntt_tomont 3s 4s -25%
poly_ntt_native 3s 2s +50%
poly_permute_bitrev_to_custom_optional_native 3s 2s +50%
poly_uniform_eta 3s 2s +50%
poly_use_hint_native 3s 4s -25%
poly_use_hint_native_aarch64 3s 5s -40%
polyeta_pack 3s 2s +50%
polyeta_unpack 3s 4s -25%
polyt0_pack 3s 2s +50%
polyt1_pack 3s 3s +0%
polyvec_matrix_expand_serial 3s 4s -25%
polyveck_chknorm 3s 3s +0%
polyveck_ntt 3s 2s +50%
polyveck_pack_eta 3s 2s +50%
polyvecl_pack_eta 3s 4s -25%
polyvecl_pointwise_acc_montgomery_c 3s 3s +0%
polyvecl_uniform_gamma1_serial 3s 2s +50%
polyvecl_unpack_eta 3s 3s +0%
reduce32 3s 1s +200%
shake128_init 3s 3s +0%
shake128x4_absorb_once 3s 2s +50%
shake256_release 3s 2s +50%
shake256_squeeze 3s 2s +50%
shake256x4_squeezeblocks 3s 2s +50%
sign_keypair 3s 4s -25%
sign_signature_extmu 3s 3s +0%
sign_signature_internal 3s 4s -25%
sign_signature_pre_hash_internal 3s 4s -25%
sign_signature_pre_hash_shake256 3s 3s +0%
sign_verify_pre_hash_internal 3s 5s -40%
sign_verify_pre_hash_shake256 3s 5s -40%
sk_s1hat_get_poly 3s 3s +0%
unpack_sk_s1hat 3s 2s +50%
unpack_sk_s2hat 3s 3s +0%
unpack_sk_t0hat 3s 2s +50%
decompose 2s 4s -50%
intt_native_x86_64 2s 3s -33%
keccak_f1600_x1_native_aarch64 2s 2s +0%
keccak_f1600_x1_native_aarch64_v84a 2s 1s +100%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 2s 2s +0%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccak_init 2s 3s -33%
keccakf1600_xor_bytes (big endian) 2s 1s +100%
keccakf1600x4_extract_bytes 2s 2s +0%
mld_ct_cmask_neg_i32 2s 3s -33%
mld_ct_get_optblocker_i64 2s 1s +100%
mld_ct_get_optblocker_u32 2s 2s +0%
mld_ct_get_optblocker_u8 2s 1s +100%
mld_h 2s 5s -60%
mld_value_barrier_u32 2s 3s -33%
mld_value_barrier_u8 2s 3s -33%
montgomery_reduce 2s 4s -50%
ntt_native_aarch64 2s 3s -33%
pack_sk_s1 2s 2s +0%
poly_caddq_native 2s 3s -33%
poly_chknorm 2s 2s +0%
poly_chknorm_native 2s 3s -33%
poly_decompose_88_native_aarch64 2s 6s -67%
poly_decompose_native 2s 3s -33%
poly_invntt_tomont_native 2s 3s -33%
poly_ntt 2s 3s -33%
poly_ntt_c 2s 1s +100%
poly_pointwise_montgomery 2s 3s -33%
poly_pointwise_montgomery_native 2s 3s -33%
poly_reduce 2s 4s -50%
poly_sub 2s 4s -50%
poly_uniform_4x 2s 3s -33%
poly_use_hint 2s 4s -50%
polyvec_matrix_expand 2s 3s -33%
polyveck_pack_w1 2s 7s -71%
polyveck_unpack_eta 2s 3s -33%
polyvecl_uniform_gamma1 2s 2s +0%
polyvecl_unpack_z 2s 4s -50%
polyw1_pack 2s 4s -50%
polyz_pack 2s 4s -50%
polyz_unpack_17_native_aarch64 2s 3s -33%
polyz_unpack_19_native_aarch64 2s 2s +0%
polyz_unpack_native 2s 4s -50%
power2round 2s 3s -33%
shake128_finalize 2s 2s +0%
shake128_squeeze 2s 2s +0%
shake128x4_squeezeblocks 2s 4s -50%
shake256_absorb 2s 1s +100%
shake256_finalize 2s 4s -50%
shake256_init 2s 4s -50%
sig_unpack_hints 2s 3s -33%
sys_check_capability 2s 3s -33%
unpack_pk_t1 2s 3s -33%
yvec_get_poly 2s 2s +0%
yvec_init 2s 2s +0%
mld_ct_abs_i32 1s 5s -80%
mld_ct_sel_int32 1s 1s +0%
mld_keccakf1600_extract_bytes 1s 2s -50%
mld_polymat_expand_entry 1s 2s -50%
nttunpack_native_x86_64 1s 4s -75%
rej_eta 1s 5s -80%
shake128_release 1s 3s -67%
sk_t0hat_get_poly 1s 3s -67%

@oqs-bot
Contributor

oqs-bot commented May 6, 2026

CBMC Results (ML-DSA-87, REDUCE-RAM)

Full Results (194 proofs)
Proof Status Current Previous Change
**TOTAL** 1544s 1481s +4.3%
poly_pointwise_montgomery_c 166s 158s +5%
polyvec_matrix_pointwise_montgomery_yvec 127s 119s +7%
rej_uniform_native 105s 104s +1%
mld_invntt_layer 100s 100s +0%
mld_ct_memcmp 67s 71s -6%
rej_uniform_native_x86_64 57s - new
mld_ntt_layer 42s 40s +5%
sign_verify_internal 40s 41s -2%
fqmul 28s 29s -3%
mld_attempt_signature_generation 27s 26s +4%
keccakf1600x4_permute_native 22s 23s -4%
rej_uniform 20s 21s -5%
rej_uniform_c 18s 18s +0%
polyeta_unpack 17s 16s +6%
mld_ntt_butterfly_block 16s 15s +7%
polyveck_decompose 16s 14s +14%
mld_check_pct 14s 16s -12%
poly_add 13s 10s +30%
poly_chknorm_c 13s 14s -7%
polyt0_unpack 11s 11s +0%
keccak_absorb_once_x4 10s 9s +11%
poly_uniform_eta_4x 10s 14s -29%
poly_caddq_c 9s 7s +29%
polyvec_matrix_pointwise_montgomery_row 9s 8s +12%
sign_pk_from_sk 9s 4s +125%
poly_invntt_tomont_c 8s 8s +0%
polyveck_invntt_tomont 8s 5s +60%
sign 8s 8s +0%
compute_pack_t0_t1 7s 8s -12%
keccak_absorb 7s 8s -12%
keccakf1600_permute_native 7s 6s +17%
mld_sample_s1_s2 7s 7s +0%
pointwise_acc_native_aarch64 7s 7s +0%
pointwise_acc_native_x86_64 7s 10s -30%
poly_power2round 7s 6s +17%
polyveck_caddq 7s 6s +17%
polyz_unpack_c 7s 8s -12%
rej_eta_native 7s 4s +75%
keccakf1600_permute 6s 6s +0%
mld_compute_pack_z 6s 4s +50%
mld_sample_s1_s2_serial 6s 5s +20%
ntt_native_x86_64 6s 3s +100%
poly_shiftl 6s 5s +20%
polyvecl_chknorm 6s 4s +50%
polyvecl_ntt 6s 8s -25%
sign_keypair 6s 5s +20%
sign_open 6s 5s +20%
sign_signature_pre_hash_internal 6s 3s +100%
pointwise_native_x86_64 5s 4s +25%
poly_caddq 5s 4s +25%
poly_use_hint_native 5s 3s +67%
polyeta_pack 5s 2s +150%
polyt0_pack 5s 3s +67%
polyveck_reduce 5s 6s -17%
shake128x4_absorb_once 5s 4s +25%
sign_signature_pre_hash_shake256 5s 4s +25%
keccak_f1600_x1_native_aarch64_v84a 4s 4s +0%
keccak_squeeze 4s 2s +100%
keccakf1600x4_xor_bytes 4s 2s +100%
make_hint 4s 2s +100%
ntt_native_aarch64 4s 2s +100%
pack_sk_s1 4s 5s -20%
pointwise_native_aarch64 4s 3s +33%
poly_caddq_native_aarch64 4s 5s -20%
poly_challenge 4s 5s -20%
poly_decompose_32_native_aarch64 4s 1s +300%
poly_invntt_tomont 4s 2s +100%
poly_ntt_c 4s 3s +33%
poly_pointwise_montgomery_native 4s 6s -33%
poly_sub 4s 4s +0%
poly_uniform_4x 4s 2s +100%
poly_uniform_eta 4s 4s +0%
polyt1_pack 4s 5s -20%
polyt1_unpack 4s 6s -33%
polyveck_chknorm 4s 4s +0%
polyveck_pack_eta 4s 3s +33%
polyvecl_uniform_gamma1_serial 4s 2s +100%
polyvecl_unpack_eta 4s 3s +33%
polyz_pack 4s 4s +0%
reduce32 4s 4s +0%
shake256_init 4s 4s +0%
sign_signature 4s 2s +100%
sign_signature_extmu 4s 3s +33%
sign_signature_internal 4s 6s -33%
sign_verify_extmu 4s 5s -20%
sk_s2hat_get_poly 4s 2s +100%
unpack_sk 4s 3s +33%
caddq 3s 3s +0%
intt_native_aarch64 3s 5s -40%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 3s 2s +50%
keccak_f1600_x4_native_avx2 3s 2s +50%
keccak_finalize 3s 2s +50%
keccak_squeezeblocks_x4 3s 5s -40%
keccakf1600_extract_bytes (big endian) 3s 2s +50%
keccakf1600_xor_bytes (big endian) 3s 2s +50%
mld_ct_cmask_nonzero_u8 3s 3s +0%
mld_h 3s 3s +0%
mld_prepare_domain_separation_prefix 3s 3s +0%
montgomery_reduce 3s 3s +0%
nttunpack_native_x86_64 3s 2s +50%
pack_sig_c 3s 2s +50%
pack_sig_h 3s 2s +50%
pack_sig_z 3s 4s -25%
pack_sk_rho_key_tr_s2 3s 2s +50%
poly_caddq_native 3s 4s -25%
poly_chknorm 3s 2s +50%
poly_chknorm_native 3s 4s -25%
poly_chknorm_native_aarch64 3s 3s +0%
poly_decompose 3s 3s +0%
poly_decompose_c 3s 7s -57%
poly_invntt_tomont_native 3s 3s +0%
poly_pointwise_montgomery 3s 3s +0%
poly_reduce 3s 5s -40%
poly_uniform 3s 5s -40%
poly_uniform_gamma1 3s 4s -25%
poly_uniform_gamma1_4x 3s 4s -25%
poly_use_hint 3s 2s +50%
poly_use_hint_native_aarch64 3s 4s -25%
polyvec_matrix_expand 3s 2s +50%
polyveck_ntt 3s 3s +0%
polyvecl_unpack_z 3s 1s +200%
polyz_unpack 3s 3s +0%
polyz_unpack_19_native_aarch64 3s 3s +0%
power2round 3s 2s +50%
shake128_init 3s 3s +0%
shake128_squeeze 3s 2s +50%
shake256 3s 2s +50%
shake256_absorb 3s 2s +50%
shake256_release 3s 4s -25%
shake256x4_absorb_once 3s 4s -25%
shake256x4_squeezeblocks 3s 3s +0%
sig_unpack_hints 3s 2s +50%
sign_keypair_internal 3s 6s -50%
sign_verify_pre_hash_internal 3s 3s +0%
sign_verify_pre_hash_shake256 3s 4s -25%
sk_s1hat_get_poly 3s 3s +0%
sys_check_capability 3s 4s -25%
unpack_pk_t1 3s 3s +0%
unpack_sk_s1hat 3s 4s -25%
unpack_sk_s2hat 3s 2s +50%
use_hint 3s 2s +50%
yvec_get_poly 3s 3s +0%
decompose 2s 4s -50%
keccak_f1600_x1_native_aarch64 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 2s 2s +0%
keccak_init 2s 3s -33%
keccakf1600_xor_bytes 2s 3s -33%
keccakf1600x4_permute 2s 3s -33%
mld_ct_abs_i32 2s 2s +0%
mld_ct_cmask_neg_i32 2s 2s +0%
mld_ct_cmask_nonzero_u32 2s 1s +100%
mld_ct_get_optblocker_i64 2s 1s +100%
mld_ct_get_optblocker_u8 2s 1s +100%
mld_ct_sel_int32 2s 2s +0%
mld_keccakf1600_extract_bytes 2s 4s -50%
mld_value_barrier_i64 2s 1s +100%
mld_value_barrier_u8 2s 3s -33%
poly_decompose_88_native_aarch64 2s 4s -50%
poly_decompose_native 2s 5s -60%
poly_ntt 2s 5s -60%
poly_ntt_native 2s 2s +0%
poly_permute_bitrev_to_custom_optional 2s 3s -33%
poly_permute_bitrev_to_custom_optional_native 2s 3s -33%
poly_use_hint_c 2s 3s -33%
polyvec_matrix_expand_serial 2s 4s -50%
polyveck_pack_w1 2s 3s -33%
polyveck_unpack_eta 2s 4s -50%
polyvecl_pack_eta 2s 4s -50%
polyvecl_pointwise_acc_montgomery 2s 3s -33%
polyvecl_pointwise_acc_montgomery_c 2s 2s +0%
polyvecl_pointwise_acc_montgomery_native 2s 2s +0%
polyvecl_uniform_gamma1 2s 3s -33%
polyw1_pack 2s 3s -33%
polyz_unpack_17_native_aarch64 2s 3s -33%
polyz_unpack_native 2s 4s -50%
rej_eta 2s 2s +0%
rej_eta_c 2s 5s -60%
shake128_absorb 2s 2s +0%
shake128_finalize 2s 2s +0%
shake128x4_squeezeblocks 2s 2s +0%
shake256_finalize 2s 3s -33%
sign_verify 2s 3s -33%
sk_t0hat_get_poly 2s 4s -50%
unpack_sk_t0hat 2s 5s -60%
fqscale 1s 3s -67%
intt_native_x86_64 1s 3s -67%
keccakf1600x4_extract_bytes 1s 2s -50%
mld_ct_get_optblocker_u32 1s 1s +0%
mld_polymat_expand_entry 1s 3s -67%
mld_value_barrier_u32 1s 2s -50%
shake128_release 1s 2s -50%
shake256_squeeze 1s 3s -67%
yvec_init 1s 2s -50%

@oqs-bot
Contributor

oqs-bot commented May 6, 2026

CBMC Results (ML-DSA-44, REDUCE-RAM)

Full Results (194 proofs)
Proof Status Current Previous Change
**TOTAL** 1432s 1401s +2.2%
poly_pointwise_montgomery_c 168s 167s +1%
rej_uniform_native 101s 109s -7%
mld_invntt_layer 97s 105s -8%
polyvec_matrix_pointwise_montgomery_yvec 87s 88s -1%
mld_ct_memcmp 69s 74s -7%
rej_uniform_native_x86_64 54s - new
mld_ntt_layer 40s 42s -5%
fqmul 28s 29s -3%
mld_attempt_signature_generation 25s 25s +0%
keccakf1600x4_permute_native 22s 23s -4%
rej_uniform 20s 20s +0%
sign_verify_internal 20s 19s +5%
rej_uniform_c 18s 21s -14%
mld_ntt_butterfly_block 14s 16s -12%
poly_chknorm_c 14s 15s -7%
polyeta_unpack 14s 17s -18%
mld_check_pct 13s 10s +30%
polyz_unpack_c 13s 11s +18%
poly_uniform_eta_4x 12s 13s -8%
polyt0_unpack 12s 13s -8%
poly_add 11s 13s -15%
polyveck_chknorm 11s 11s +0%
compute_pack_t0_t1 10s 7s +43%
poly_caddq_c 9s 9s +0%
keccak_absorb 7s 6s +17%
keccak_absorb_once_x4 7s 10s -30%
poly_invntt_tomont_c 7s 10s -30%
poly_power2round 7s 6s +17%
polyvec_matrix_pointwise_montgomery_row 7s 8s -12%
sign 7s 8s -12%
keccakf1600_permute 6s 7s -14%
keccakf1600_permute_native 6s 7s -14%
keccakf1600_xor_bytes (big endian) 6s 4s +50%
mld_compute_pack_z 6s 6s +0%
mld_h 6s 3s +100%
poly_decompose_88_native_aarch64 6s 2s +200%
poly_decompose_c 6s 8s -25%
poly_shiftl 6s 5s +20%
polyveck_decompose 6s 5s +20%
polyveck_reduce 6s 4s +50%
sign_open 6s 5s +20%
sign_pk_from_sk 6s 6s +0%
sign_verify_pre_hash_internal 6s 4s +50%
pack_sk_rho_key_tr_s2 5s 2s +150%
pointwise_acc_native_aarch64 5s 4s +25%
pointwise_acc_native_x86_64 5s 6s -17%
poly_ntt 5s 1s +400%
poly_permute_bitrev_to_custom_optional 5s 3s +67%
poly_permute_bitrev_to_custom_optional_native 5s 4s +25%
poly_use_hint_native_aarch64 5s 4s +25%
polyvec_matrix_expand 5s 2s +150%
polyveck_unpack_eta 5s 2s +150%
rej_eta_c 5s 6s -17%
shake256x4_absorb_once 5s 2s +150%
sign_keypair 5s 4s +25%
sign_signature_pre_hash_shake256 5s 4s +25%
sign_verify_extmu 5s 3s +67%
unpack_sk_t0hat 5s 5s +0%
use_hint 5s 3s +67%
decompose 4s 1s +300%
intt_native_x86_64 4s 1s +300%
keccak_init 4s 3s +33%
keccak_squeezeblocks_x4 4s 3s +33%
mld_ct_cmask_nonzero_u8 4s 3s +33%
mld_prepare_domain_separation_prefix 4s 5s -20%
poly_caddq 4s 2s +100%
poly_sub 4s 5s -20%
poly_uniform 4s 2s +100%
polyt1_pack 4s 3s +33%
polyveck_caddq 4s 4s +0%
polyvecl_chknorm 4s 4s +0%
polyvecl_pointwise_acc_montgomery 4s 3s +33%
polyvecl_uniform_gamma1 4s 3s +33%
polyvecl_unpack_eta 4s 3s +33%
polyz_pack 4s 4s +0%
polyz_unpack 4s 4s +0%
rej_eta 4s 4s +0%
shake128x4_squeezeblocks 4s 2s +100%
sign_keypair_internal 4s 5s -20%
sign_signature_pre_hash_internal 4s 6s -33%
sign_verify 4s 5s -20%
sign_verify_pre_hash_shake256 4s 5s -20%
sk_t0hat_get_poly 4s 1s +300%
unpack_pk_t1 4s 3s +33%
unpack_sk_s1hat 4s 2s +100%
keccak_f1600_x1_native_aarch64_v84a 3s 1s +200%
make_hint 3s 2s +50%
mld_ct_cmask_neg_i32 3s 3s +0%
mld_ct_cmask_nonzero_u32 3s 3s +0%
mld_ct_get_optblocker_u32 3s 2s +50%
mld_keccakf1600_extract_bytes 3s 2s +50%
mld_polymat_expand_entry 3s 3s +0%
mld_sample_s1_s2 3s 3s +0%
mld_sample_s1_s2_serial 3s 3s +0%
ntt_native_aarch64 3s 5s -40%
nttunpack_native_x86_64 3s 3s +0%
pack_sig_z 3s 2s +50%
poly_invntt_tomont_native 3s 2s +50%
poly_ntt_c 3s 2s +50%
poly_pointwise_montgomery_native 3s 4s -25%
poly_uniform_4x 3s 3s +0%
poly_uniform_gamma1 3s 3s +0%
poly_use_hint_c 3s 3s +0%
polyt0_pack 3s 2s +50%
polyt1_unpack 3s 3s +0%
polyvec_matrix_expand_serial 3s 4s -25%
polyveck_invntt_tomont 3s 4s -25%
polyveck_pack_eta 3s 4s -25%
polyvecl_pack_eta 3s 2s +50%
polyvecl_pointwise_acc_montgomery_c 3s 2s +50%
polyw1_pack 3s 3s +0%
power2round 3s 2s +50%
rej_eta_native 3s 3s +0%
shake128_finalize 3s 2s +50%
shake128_squeeze 3s 2s +50%
shake256 3s 2s +50%
shake256_init 3s 2s +50%
shake256_release 3s 2s +50%
shake256_squeeze 3s 4s -25%
sig_unpack_hints 3s 4s -25%
sign_signature 3s 6s -50%
sign_signature_extmu 3s 4s -25%
sign_signature_internal 3s 5s -40%
unpack_sk_s2hat 3s 3s +0%
yvec_get_poly 3s 3s +0%
caddq 2s 3s -33%
fqscale 2s 3s -33%
intt_native_aarch64 2s 4s -50%
keccak_f1600_x1_native_aarch64 2s 2s +0%
keccak_f1600_x4_native_aarch64_v84a 2s 4s -50%
keccak_f1600_x4_native_avx2 2s 4s -50%
keccak_finalize 2s 1s +100%
keccak_squeeze 2s 3s -33%
keccakf1600_extract_bytes (big endian) 2s 2s +0%
keccakf1600_xor_bytes 2s 1s +100%
keccakf1600x4_xor_bytes 2s 1s +100%
mld_ct_abs_i32 2s 2s +0%
mld_ct_get_optblocker_i64 2s 2s +0%
mld_ct_get_optblocker_u8 2s 1s +100%
mld_ct_sel_int32 2s 2s +0%
mld_value_barrier_i64 2s 1s +100%
mld_value_barrier_u32 2s 3s -33%
mld_value_barrier_u8 2s 2s +0%
ntt_native_x86_64 2s 2s +0%
pack_sig_h 2s 3s -33%
pointwise_native_aarch64 2s 4s -50%
pointwise_native_x86_64 2s 4s -50%
poly_caddq_native 2s 3s -33%
poly_caddq_native_aarch64 2s 2s +0%
poly_challenge 2s 4s -50%
poly_chknorm 2s 2s +0%
poly_chknorm_native 2s 3s -33%
poly_chknorm_native_aarch64 2s 2s +0%
poly_decompose_32_native_aarch64 2s 1s +100%
poly_invntt_tomont 2s 4s -50%
poly_ntt_native 2s 4s -50%
poly_pointwise_montgomery 2s 3s -33%
poly_reduce 2s 3s -33%
poly_uniform_eta 2s 2s +0%
poly_uniform_gamma1_4x 2s 4s -50%
poly_use_hint 2s 3s -33%
poly_use_hint_native 2s 3s -33%
polyeta_pack 2s 2s +0%
polyveck_ntt 2s 2s +0%
polyveck_pack_w1 2s 2s +0%
polyvecl_ntt 2s 4s -50%
polyvecl_pointwise_acc_montgomery_native 2s 2s +0%
polyvecl_unpack_z 2s 2s +0%
polyz_unpack_17_native_aarch64 2s 1s +100%
polyz_unpack_19_native_aarch64 2s 3s -33%
polyz_unpack_native 2s 1s +100%
reduce32 2s 2s +0%
shake128_absorb 2s 3s -33%
shake128_init 2s 1s +100%
shake128_release 2s 4s -50%
shake128x4_absorb_once 2s 2s +0%
shake256_absorb 2s 2s +0%
shake256x4_squeezeblocks 2s 2s +0%
sk_s1hat_get_poly 2s 2s +0%
sk_s2hat_get_poly 2s 4s -50%
sys_check_capability 2s 3s -33%
yvec_init 2s 3s -33%
keccak_f1600_x4_native_aarch64_v8a_scalar_hybrid 1s 2s -50%
keccak_f1600_x4_native_aarch64_v8a_v84a_scalar_hybrid 1s 2s -50%
keccakf1600x4_extract_bytes 1s 4s -75%
keccakf1600x4_permute 1s 2s -50%
montgomery_reduce 1s 4s -75%
pack_sig_c 1s 2s -50%
pack_sk_s1 1s 2s -50%
poly_decompose 1s 1s +0%
poly_decompose_native 1s 3s -67%
polyvecl_uniform_gamma1_serial 1s 3s -67%
shake256_finalize 1s 1s +0%
unpack_sk 1s 2s -50%

@jakemas jakemas force-pushed the jakemas/rej-uniform-asm branch 4 times, most recently from 55f0028 to 991e2e9 Compare May 7, 2026 05:10
@jakemas jakemas marked this pull request as ready for review May 7, 2026 05:10
@mkannwischer
Contributor

mkannwischer commented May 7, 2026

@jakemas, thanks for getting this into shape! How far along are you with the proof? Would it be an option to merge it together with the correctness proof? Never mind, this already has the proof. Sorry.

@jakemas
Contributor Author

jakemas commented May 7, 2026

@mkannwischer ok, ready for review. The instruction PR will need to land in s2n-bignum first, while we wait I'll try the constant time proof -- but I'm really happy to be able to PR the conversion with a hol-light proof. Let me know if anything is missing.

@jakemas
Contributor Author

jakemas commented May 7, 2026

Ahh, just saw your comment! Yes, got the proof in; it runs in ~12 min!

Contributor Author

I left this name generic, as it could be shared between the arm and x86 proofs.

@mkannwischer
Contributor

@mkannwischer ok, ready for review. The instruction PR will need to land in s2n-bignum first, while we wait I'll try the constant time proof -- but I'm really happy to be able to PR the conversion with a hol-light proof. Let me know if anything is missing.

Thanks! Yes, I agree that it's great to get the conversion and the proof in at the same time. Would be great if we can do the same for the remaining proofs.
I'll review later today.

Note that a constant-time proof is not needed for this function (all inputs are public), but we do want a memory safety proof like here: https://github.com/pq-code-package/mlkem-native/blob/2bf8e59f4330697b3924c572924136c96eb96960/proofs/hol_light/x86_64/proofs/rej_uniform_avx2_asm.ml#L1562

Contributor

@mkannwischer mkannwischer left a comment

Thanks @jakemas. Here is a first set of comments.

Comment thread dev/x86_64/src/arith_native_x86_64.h Outdated
__contract__(
requires(memory_no_alias(r, sizeof(int32_t) * MLDSA_N))
requires(memory_no_alias(buf, MLD_AVX2_REJ_UNIFORM_BUFLEN))
requires(memory_no_alias(table, 256 * sizeof(uint64_t)))
Contributor

The table actually needs to be == mld_rej_uniform_table, not just any non-aliased 256-entry buffer.

Contributor Author

Fixed — requires(table == (const uint8_t *)mld_rej_uniform_table) in both dev/ and mldsa/src/native/ copies.

Comment thread dev/x86_64/src/rej_uniform_avx2_asm.S Outdated
jmp rej_uniform_avx2_asm_scalar

rej_uniform_avx2_asm_done:
vzeroupper
Contributor

We don't have vzeroupper in any other routine. I don't know enough about x86_64 to know how important it is, but we should either have it everywhere or nowhere.

Contributor Author

Fixed — dropped vzeroupper from both dev/ and mldsa/src/native/ copies of the .S for consistency with the other ML-DSA routines. Proof adjusted accordingly.

Comment on lines -7 to +15
- version = "f3c5acff6948d559194245237f6aaa7ebf7fcae8";
+ # Pinned to https://github.com/awslabs/s2n-bignum/pull/387 head,
+ # which adds VMOVMSKPS, VPMOVZXBD, and VZEROUPPER instruction models
+ # required by the x86_64 rej_uniform proof.
+ version = "4c4fe1dfc8b79720013517a7b4dec9014c85fcf2";
  src = fetchFromGitHub {
  owner = "awslabs";
  repo = "s2n-bignum";
  rev = "${version}";
- hash = "sha256-kfc8X2e+voefttshSUdifDc3Qn+dx0Gq5ENNLhWIdw0=";
+ hash = "sha256-64MJOqoDunpn6fx1j9P4+fDoRNZ8GRTB/d4C2JWvxFA=";
Contributor

Leaving a comment here to remind us that we still have to change this.

Contributor Author

Still pinned to s2n-bignum PR #387 / #401 (the branch that carries VMOVMSKPS, VPMOVZXBD, VZEROUPPER + the mldsa_rej_uniform proof). Will update the pin once those are merged into s2n-bignum main.

Contributor

This file should be autogenerated via autogen

Contributor Author

Fixed — added rej_uniform_avx2_asm.S to the x86_64 joblist in scripts/autogen (joblist_x86_64), so proofs/hol_light/x86_64/mldsa/rej_uniform_avx2_asm.S is now regenerated by scripts/autogen.

(* Lookup table for ML-DSA rejection uniform sampling. *)
(* Each entry is 8 bytes: permutation indices for VPERMD. *)

let mldsa_rej_uniform_table = (REWRITE_RULE[MAP] o define)
Contributor

This file should be autogenerated via autogen

Contributor Author

Fixed — added gen_avx2_hol_light_rej_uniform_table to scripts/autogen, invoked from gen_zeta_tables. proofs/hol_light/x86_64/proofs/mldsa_rej_uniform_table.ml is now regenerated alongside the C/aarch64 lookup tables (mirrors the mlkem-native pattern).
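
For readers unfamiliar with the table, here is a minimal C sketch of how a 256-entry, 8-bytes-per-entry permutation table of this kind can be generated. It assumes each entry lists, in ascending order, the lane indices whose bit is set in the 8-bit acceptance mask, with unused trailing bytes set to zero. The function name `build_rej_uniform_table` and the zero padding are illustrative assumptions, not taken from `scripts/autogen`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generator (not the PR's autogen code).
 * Entry `mask` holds the indices of the set bits of `mask`, lowest first,
 * so that a VPERMD-style permute moves accepted lanes to the front.
 * Unused trailing bytes are zero; this padding is an assumption. */
static void build_rej_uniform_table(uint8_t table[256][8])
{
  for (unsigned mask = 0; mask < 256; mask++)
  {
    size_t k = 0;

    /* Clear the entry first so that unused slots are well defined. */
    for (unsigned lane = 0; lane < 8; lane++)
    {
      table[mask][lane] = 0;
    }

    /* Append the index of every accepted lane in ascending order. */
    for (unsigned lane = 0; lane < 8; lane++)
    {
      if ((mask >> lane) & 1u)
      {
        table[mask][k++] = (uint8_t)lane;
      }
    }
  }
}
```

With this layout, the entry for mask `0xA4` (lanes 2, 5 and 7 accepted) starts with the bytes 2, 5, 7, and the number of valid indices equals the popcount of the mask.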

Comment on lines +697 to +699
let REJ_SAMPLE = define
`REJ_SAMPLE l = FILTER (\x:int32. val x < 8380417)
(MAP (\x:24 word. word(val x MOD 2 EXP 23):int32) l)`;;
Contributor

The spec should be moved to mldsa_specs.ml so we can re-use it for the aarch64 proof.

Contributor Author

Fixed — REJ_SAMPLE, REJ_SAMPLE_EMPTY, REJ_SAMPLE_APPEND now live in proofs/hol_light/common/mldsa_specs.ml (shared between arches, matching the shape used by s2n-bignum #378 for aarch64). The x86-only derived lemmas (REJ_SAMPLE_SPLIT, REJ_SAMPLE_PREFIX_256, REJ_SAMPLE_STEP_LE) stay in rej_uniform_avx2_asm.ml since they're only used by the AVX2 scalar-tail analysis.
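
As a reading aid, the REJ_SAMPLE spec quoted above can be modelled in scalar C roughly as follows. This is a sketch, not the PR's code: it assumes each 24-bit word is formed little-endian from three consecutive input bytes, and the name `rej_sample_model` is hypothetical. The `max_out` cap plays the role of `SUB_LIST(0,256)` in the HOL Light post-condition.

```c
#include <stddef.h>
#include <stdint.h>

#define REJ_Q 8380417 /* modulus constant used in the REJ_SAMPLE definition */

/* Hypothetical scalar model of REJ_SAMPLE (not the assembly routine).
 * Each 3-byte group is read little-endian (an assumption), reduced
 * mod 2^23, and kept only if it is below REJ_Q. At most max_out
 * coefficients are written. Returns the number of accepted values. */
static unsigned rej_sample_model(int32_t *r, unsigned max_out,
                                 const uint8_t *buf, size_t buflen)
{
  unsigned count = 0;

  for (size_t pos = 0; pos + 3 <= buflen && count < max_out; pos += 3)
  {
    uint32_t t = (uint32_t)buf[pos] | ((uint32_t)buf[pos + 1] << 8) |
                 ((uint32_t)buf[pos + 2] << 16);

    t &= 0x7FFFFF; /* val x MOD 2 EXP 23 */

    if (t < REJ_Q) /* FILTER (\x. val x < 8380417) */
    {
      r[count++] = (int32_t)t;
    }
  }

  return count;
}
```

Stopping once `max_out` values are accepted yields the same prefix as filtering the whole input and then truncating, which is how the spec phrases it.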

(let outlist = SUB_LIST(0,256) (REJ_SAMPLE inlist) in
let outlen = LENGTH outlist in
C_RETURN s = word outlen /\
read(memory :> bytes(res,4 * outlen)) s =
Contributor

Please add an explicit bound post-condition that all coefficients < q here to match the CBMC spec.

Contributor Author

@jakemas jakemas May 9, 2026

I think I got this; testing now. Added:

(!i. i < outlen
    ==> val(read(memory :> bytes32
                  (word_add res (word(4 * i)))) s) < 8380417)))
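
In C terms, the added conjunct says that every one of the `outlen` output words, read as an unsigned 32-bit value, is below 8380417. A small checker along these lines could be used in a unit test. The name `coeffs_below_q` and the macro `MLDSA_Q_SKETCH` are hypothetical, and the exact form of the CBMC bound is not reproduced here.

```c
#include <stdint.h>

#define MLDSA_Q_SKETCH 8380417 /* same constant as in the HOL Light bound */

/* Hypothetical test helper: returns 1 if the first outlen coefficients
 * all lie in [0, q), and 0 otherwise. For int32_t storage, an unsigned
 * value below q is the same as a signed value in [0, q). */
static int coeffs_below_q(const int32_t *r, unsigned outlen)
{
  for (unsigned i = 0; i < outlen; i++)
  {
    if (r[i] < 0 || r[i] >= MLDSA_Q_SKETCH)
    {
      return 0;
    }
  }
  return 1;
}
```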

const uint8_t *table)
__contract__(
requires(memory_no_alias(r, sizeof(int32_t) * MLDSA_N))
requires(memory_no_alias(buf, MLD_AVX2_REJ_UNIFORM_BUFLEN))
Contributor

Let's change this to say 840 so it matches the HOL-light spec exactly.

Contributor Author

Fixed — requires(memory_no_alias(buf, 840)) (literal 840 matches the HOL-Light spec exactly), in both dev/ and mldsa/src/native/ copies.
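
For context on the literal, 840 bytes is five SHAKE128 blocks of 168 bytes, which gives 280 three-byte candidates. The sketch below spells out that arithmetic; the block-count formula and all enumerator names are assumptions for illustration and may not match how `MLD_AVX2_REJ_UNIFORM_BUFLEN` is actually defined in the tree.

```c
#include <stddef.h>

enum
{
  SKETCH_SHAKE128_RATE = 168, /* SHAKE128 block size in bytes */
  /* Assumed block count: enough blocks to cover 768 bytes. */
  SKETCH_REJ_NBLOCKS =
      (768 + SKETCH_SHAKE128_RATE - 1) / SKETCH_SHAKE128_RATE,
  SKETCH_REJ_BUFLEN = SKETCH_REJ_NBLOCKS * SKETCH_SHAKE128_RATE,
  SKETCH_BYTES_PER_CANDIDATE = 3
};

/* Fails at compile time if the derived length drifts from the literal. */
_Static_assert(SKETCH_REJ_BUFLEN == 840,
               "derived buffer length must equal the literal 840");

/* Number of complete 3-byte candidates available in a buffer. */
static size_t rej_candidates(size_t buflen)
{
  return buflen / SKETCH_BYTES_PER_CANDIDATE;
}
```

A compile-time check of this kind is one way to keep a symbolic constant and a literal in a contract from silently diverging.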

unsigned mld_rej_uniform_avx2_asm(
int32_t *r, const uint8_t buf[MLD_AVX2_REJ_UNIFORM_BUFLEN],
const uint8_t *table)
__contract__(
Contributor

Add a comment here that this needs to be kept in sync with the HOL-light spec.

Contributor Author

Fixed — added /* This contract must be kept in sync with the HOL-Light specification in proofs/hol_light/x86_64/proofs/rej_uniform_avx2_asm.ml */ above the __contract__ block in both dev/ and mldsa/src/native/ copies.

MAYCHANGE [memory :> bytes(res,1024)])`,
X86_PROMOTE_RETURN_NOSTACK_TAC mldsa_rej_uniform_tmc
MLDSA_REJ_UNIFORM_CORRECT);;

Contributor

Add a comment here that this needs to be kept in sync with the CBMC spec.

Contributor Author

Fixed — added a comment block above SUBROUTINE_CORRECT variants noting these specifications must be kept in sync with the CBMC contract in dev/x86_64/src/arith_native_x86_64.h / mldsa/src/native/x86_64/src/arith_native_x86_64.h.

jakemas added a commit that referenced this pull request May 8, 2026
Reviewer-requested cleanup for the x86_64 rej_uniform assembly and
HOL Light proof:

Contract tightening (dev and mldsa copies of arith_native_x86_64.h):
  - requires(memory_no_alias(buf, 840)) instead of
    memory_no_alias(buf, MLD_AVX2_REJ_UNIFORM_BUFLEN) so the literal
    matches the HOL Light spec exactly.
  - requires(table == (const uint8_t *)mld_rej_uniform_table) pinning
    the table to the exported rejection-sampling table, replacing the
    looser memory_no_alias(table, 256 * sizeof(uint64_t)).
  - Clarify sync comment.

vzeroupper removal: none of the other asm routines issue vzeroupper;
drop it from rej_uniform for consistency. This shifts the function
length by 3 bytes, so the HOL Light proof's nonoverlapping 246 / pc+245
references in mldsa_rej_uniform.ml become 243 / pc+242 accordingly, and
the two X86_STEPS_TAC invocations that stepped the vzeroupper byte are
removed. Bytecode regenerated via autogen --update-hol-light-bytecode.

Autogen plumbing: register rej_uniform_avx2_asm.S in the x86_64 HOL
Light asm joblist so the proofs/hol_light/x86_64/mldsa/ copy is
regenerated by scripts/autogen. Add gen_avx2_hol_light_rej_uniform_table
to regenerate proofs/hol_light/x86_64/proofs/mldsa_rej_uniform_table.ml
alongside the C/aarch64 lookup tables (matches mlkem-native's pattern).

Cross-reference comment in proofs/hol_light/x86_64/proofs/
rej_uniform_avx2_asm.ml pointing at the CBMC contract.

Proof runtime: ~5-6 min in the CI native build.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
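
As a rough illustration of what a rejection-sampling lookup table of this size could contain, the sketch below fills 256 entries of 8 bytes each (256 * sizeof(uint64_t) bytes in total, matching the size in the old contract). The layout is an assumption, not taken from this PR: entry `mask` lists the lane indices of the set bits of `mask` in ascending order, padded with zeros, so that a permute by that entry moves accepted lanes to the front. The function name is hypothetical and is not the autogen generator.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical generator for a 256 x 8 byte compaction table.
 * Assumed layout: table[mask] holds the indices of the set bits of
 * `mask` (bit i = lane i accepted), lowest first, zero padded. */
static void sketch_gen_rej_table(uint8_t table[256][8])
{
  unsigned mask, lane, n;

  for (mask = 0; mask < 256; mask++)
  {
    n = 0;
    /* Collect accepted lanes in ascending order. */
    for (lane = 0; lane < 8; lane++)
    {
      if (mask & (1u << lane))
      {
        table[mask][n++] = (uint8_t)lane;
      }
    }
    assert(n <= 8);
    /* Pad the unused tail; the padding value is an assumption. */
    while (n < 8)
    {
      table[mask][n++] = 0;
    }
  }
}
```

Generating the table from a short rule like this, rather than maintaining it by hand, is what makes it practical to emit the same data in several formats (C array, HOL Light definition) and keep them identical.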
Replaces the x86_64 AVX2 rej_uniform implementation (previously written
in C with intrinsics) with a hand-written assembly routine, and adds a
functional correctness proof in HOL Light on top of the s2n-bignum
infrastructure.

Highlights:
  - dev/x86_64/src/rej_uniform_avx2_asm.S and
    mldsa/src/native/x86_64/src/rej_uniform_avx2_asm.S: new .S file
    exposing mld_rej_uniform_avx2_asm (replaces the intrinsics-based
    rej_uniform_avx2.c).
  - proofs/hol_light/x86_64/mldsa/rej_uniform_avx2_asm.S and
    proofs/hol_light/x86_64/proofs/rej_uniform_avx2_asm.ml: HOL Light
    proof of MLDSA_REJ_UNIFORM_{,NOIBT_}SUBROUTINE_CORRECT, with no
    remaining CHEATs.
  - proofs/cbmc/rej_uniform_native_x86_64/: CBMC contract proof (249/249
    passing).
  - CI: hol_light.yml and Makefile updated for the new bytecode dump and
    autogen instruction-decode format; s2n-bignum pin bumped to include
    the supporting tactics.

Naming follows the asm-suffix convention introduced on main
(eada109 / e810d00): symbol mld_rej_uniform_avx2_asm, label prefix
rej_uniform_avx2_asm_.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jakemas jakemas force-pushed the jakemas/rej-uniform-asm branch from 31b4b4c to 967fa0b Compare May 8, 2026 06:52
@jakemas jakemas force-pushed the jakemas/rej-uniform-asm branch from 967fa0b to b1b11e9 Compare May 8, 2026 06:58
jakemas and others added 3 commits May 8, 2026 07:44
- proofs/hol_light/README.md: add rej_uniform_avx2_asm.S to the x86_64
  arithmetic proofs section.
- proofs/hol_light/common/mldsa_specs.ml: add REJ_SAMPLE, REJ_SAMPLE_EMPTY,
  REJ_SAMPLE_APPEND. These match what's used in s2n-bignum #378
  (aarch64) so the aarch64 rej_uniform proof can share the shape.
- proofs/hol_light/x86_64/proofs/rej_uniform_avx2_asm.ml: needs the
  new mldsa_specs dependency; drop the duplicate REJ_SAMPLE definition.
  The x86-only REJ_SAMPLE_SPLIT / REJ_SAMPLE_PREFIX_256 /
  REJ_SAMPLE_STEP_LE (scalar-tail analysis helpers) stay here.
- .github/workflows/hol_light.yml: add mldsa_specs.ml to the
  rej_uniform_avx2_asm needs list.

Proof still passes in ~8 min native build.

Signed-off-by: Jake Massimo <jakemas@amazon.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prove that every element of SUB_LIST (0,256) (REJ_SAMPLE inlist) has
val c < 8380417 directly from the FILTER definition. Provides the
coefficient bound property requested in the review; callers can
specialize to per-index via EL / MEM_EL.

Kept as a standalone lemma rather than adding a per-index postcondition
to MLDSA_REJ_UNIFORM_CORRECT to avoid touching the inner Hoare triple.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Jake Massimo <jakemas@amazon.com>
Strengthen the postcondition of MLDSA_REJ_UNIFORM_CORRECT and
MLDSA_REJ_UNIFORM_(NOIBT_)SUBROUTINE_CORRECT to include the
per-coefficient bound

    !i. i < outlen ==>
        val(read(memory :> bytes32 (word_add res (word(4 * i)))) s) < 8380417

matching the CBMC contract
    ensures(array_bound(buf, 0, len, 0, 8380417))

in arith_native_x86_64.h.

Uses the same layering pattern as poly_use_hint_32_aarch64_asm
(ENSURES_STRENGTHEN_POST): introduces ENSURES_STRENGTHEN_POST_X86,
a memory->list-element bridge VAL_READ_BYTES32_FROM_WORDLIST, and
the combinatorial lemma REJ_SAMPLE_COEFF_BOUND, then derives
MLDSA_REJ_UNIFORM_CORRECT_BOUND by showing the old
num_of_wordlist-based postcondition implies the new per-index bound.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Jake Massimo <jakemas@amazon.com>
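
The per-coefficient bound in the strengthened postcondition follows directly from how rejection sampling filters candidates. The scalar sketch below is a hypothetical reference, not the assembly from this PR: it reads 3 bytes little-endian, masks to 23 bits, and keeps the value only if it is below 8380417, so every stored coefficient satisfies the bound by construction. The function name and the explicit `max_out` / `buflen` parameters are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MLDSA_Q 8380417 /* ML-DSA modulus, the bound in the postcondition */

/* Hypothetical scalar reference for rejection sampling.
 * Writes at most max_out accepted coefficients to r and returns how many
 * were written. Each accepted value t satisfies 0 <= t < SKETCH_MLDSA_Q. */
static unsigned sketch_rej_uniform_ref(int32_t *r, unsigned max_out,
                                       const uint8_t *buf, size_t buflen)
{
  unsigned count = 0;
  size_t pos = 0;

  assert(r != NULL && buf != NULL);

  while (count < max_out && pos + 3 <= buflen)
  {
    /* Assemble a 24-bit little-endian value, then drop the top bit. */
    uint32_t t = (uint32_t)buf[pos] | ((uint32_t)buf[pos + 1] << 8) |
                 ((uint32_t)buf[pos + 2] << 16);
    t &= 0x7FFFFF;
    pos += 3;

    /* Accept only candidates strictly below the modulus. */
    if (t < SKETCH_MLDSA_Q)
    {
      r[count++] = (int32_t)t;
    }
  }
  return count;
}
```

For example, the 3-byte groups `00 00 00` and `00 E0 7F` decode to 0 and 8380416 and are accepted, while `FF FF FF` and `01 E0 7F` decode to 8388607 and 8380417 and are rejected.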
@jakemas jakemas force-pushed the jakemas/rej-uniform-asm branch from 2d393ab to b10421a Compare May 9, 2026 04:35
jakemas added a commit that referenced this pull request May 15, 2026
- Pin s2n-bignum to awslabs/s2n-bignum@ccef2456 (upstream main with
  USHLL/MOVI/VPCMPGTD instruction models merged)
- Add HOL Light eta rejection table generation to autogen, matching
  the pattern from the x86 rej_uniform table in PR #1014

Signed-off-by: Jake Massimo <jakemas@amazon.com>
Signed-off-by: Ubuntu <ubuntu@ip-172-31-31-118.us-west-2.compute.internal>
Development

Successfully merging this pull request may close these issues.

AVX2: Replace intrinsics implementation of rej_uniform with assembly

3 participants