RISC-V: Add RVV vectorized FindMatchLength optimization #233
zhanchangbao-sanechips wants to merge 1 commit into google:main
Why are you artificially restricting the vector length to 16 bytes?
6078470 to 2259a3d
Thanks for the pointer! Updated to use vsetvl_e8m1(len) without the 16-byte restriction and removed the scalar tail loop entirely.
Summary
This PR adds RISC-V Vector (RVV) optimization for the FindMatchLength() function in the Snappy compression library. The optimization leverages RVV instructions to compare 16 bytes in parallel, resulting in improved compression performance on RISC-V platforms.
Motivation
The Snappy compression algorithm spends a significant portion of its time in FindMatchLength() during the compression phase. On RISC-V platforms with RVV support, we can accelerate this critical path by using vector instructions to perform parallel byte comparisons.
Changes Made
FindMatchLength() to process 16-byte blocks in parallel
__riscv_vsetvl_e8m1(), __riscv_vle8_v_u8m1(), __riscv_vmsne_vv_u8m1_b8(), __riscv_vfirst_m_b8()
Implementation Details
The RVV optimization is strategically placed between SNAPPY_PREFETCH and the scalar 8-byte loop.
This layered approach ensures optimal performance across all input sizes while maintaining code clarity.
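The vector comparison described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the function name MatchLengthSketch and its signature are hypothetical, and the loop follows the updated approach mentioned in the conversation (let the vsetvl call choose the vector length from the remaining byte count, with no separate scalar tail). The scalar branch is only there so the sketch also compiles on toolchains without the V extension.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

// Hypothetical helper (not Snappy's FindMatchLength): returns how many
// leading bytes of a[0..n) and b[0..n) are equal.
inline size_t MatchLengthSketch(const uint8_t* a, const uint8_t* b, size_t n) {
  size_t matched = 0;
#if defined(__riscv_vector)
  while (matched < n) {
    // Ask the hardware how many 8-bit elements it will process this pass.
    size_t vl = __riscv_vsetvl_e8m1(n - matched);
    vuint8m1_t va = __riscv_vle8_v_u8m1(a + matched, vl);
    vuint8m1_t vb = __riscv_vle8_v_u8m1(b + matched, vl);
    // Mask bit i is set where the two inputs differ.
    vbool8_t diff = __riscv_vmsne_vv_u8m1_b8(va, vb, vl);
    // Index of the first set mask bit, or -1 if all vl bytes matched.
    long first = __riscv_vfirst_m_b8(diff, vl);
    if (first >= 0) return matched + static_cast<size_t>(first);
    matched += vl;
  }
#else
  // Portable fallback so the sketch builds without RVV.
  while (matched < n && a[matched] == b[matched]) ++matched;
#endif
  return matched;
}
```

Because vsetvl returns a length no larger than the remaining byte count, the final partial block is handled by the same loop body, which is why no scalar tail loop is needed in this sketch.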
Performance Results
Test Environment
ZFlat (Compression) - Key Improvements
Other Operations
Key Observations
Test Repeatability
Three independent test runs confirm consistent and reproducible results:
Stability: 21 out of 24 test cases showed <1% variance across all three runs, indicating high test-retest reliability.
Compatibility and Portability
RISC-V with RVV Support
__riscv && SNAPPY_HAVE_RVV
RISC-V without RVV Support
Non-RISC-V Platforms (x86_64, ARM64, etc.)
Testing
snappy_unittest passes all tests
snappy_benchmark verified on RISC-V hardware (Banana Pi K1)
Checklist
Future Work (Out of Scope for This PR)
MemCopy64 operations (separate PR)
Screenshots
Unit Tests - All Pass
Benchmark - Before Optimization
Benchmark - After Optimization