feat(binary): add git binary patch support #61
Merged
14 commits
cc952d3
feat(binary): binary patch types and parser
weihanglo 7c505f6
refactor(utils): add `Text::as_str_prefix` method
weihanglo 2697352
feat(patch_set): wire binary diff parsing
weihanglo 1c6a9e2
refactor: clippy manual_div_ceil
weihanglo ff60fe7
chore: dependency flate2 behind binary feature
weihanglo d2b9720
feat(binary): base85/delta decode and patch application
weihanglo d43d756
test(compat): binary patch tests
weihanglo b135714
test(replay): wire binary patch into replay tests
weihanglo 3622334
test(binary): add zero control byte delta test
weihanglo 99042b7
fix(binary): reject zero control byte in delta
weihanglo ceb0dfb
test(binary): add literal size mismatch test
weihanglo d506a5e
fix(binary): check decompressed size against declared
weihanglo c5f8447
fix(binary): cap pre-allocation to prevent OOM
weihanglo 54f4e95
binary: convert parse_binary_patch to take &[u8]
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,227 @@ | ||
| //! Base85 encoding and decoding using the character set defined in [RFC 1924]. | ||
| //! | ||
| //! ## References | ||
| //! | ||
| //! * [RFC 1924] | ||
| //! * [Wikipedia: Ascii85 § RFC 1924 version](https://en.wikipedia.org/wiki/Ascii85#RFC_1924_version) | ||
| //! | ||
| //! [RFC 1924]: https://datatracker.ietf.org/doc/html/rfc1924 | ||
|
|
||
| use std::fmt; | ||
|
|
||
| /// Base85 character set (RFC 1924). | ||
| const ALPHABET: &[u8; 85] = b"0123456789\ | ||
| ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\ | ||
| !#$%&()*+-;<=>?@^_`{|}~"; | ||
|
|
||
| /// Pre-computed lookup table for Base85 decoding. | ||
| /// | ||
| /// Maps ASCII byte value → digit value or `0xFF` for invalid characters. | ||
| /// This provides O(1) lookup. | ||
| const TABLE: [u8; 256] = { | ||
| let mut table = [0xFFu8; 256]; | ||
| let mut i = 0usize; | ||
| while i < 85 { | ||
| table[ALPHABET[i] as usize] = i as u8; | ||
| i += 1; | ||
| } | ||
| table | ||
| }; | ||
|
|
||
| /// Error type for Base85 operations. | ||
| #[derive(Debug, Clone, PartialEq, Eq)] | ||
| pub enum Base85Error { | ||
| /// Character that is not in the RFC 1924 alphabet. | ||
| InvalidCharacter(char), | ||
| /// Invalid input length for the operation. | ||
| InvalidLength, | ||
| } | ||
|
|
||
| impl fmt::Display for Base85Error { | ||
| fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { | ||
| match self { | ||
| Base85Error::InvalidCharacter(c) => write!(f, "invalid base85 character: {:?}", c), | ||
| Base85Error::InvalidLength => write!(f, "invalid input length"), | ||
| } | ||
| } | ||
| } | ||
|
|
||
| impl std::error::Error for Base85Error {} | ||
|
|
||
| /// Decodes a Base85 string to the provided output. | ||
| /// | ||
| /// ## Limitations | ||
| /// | ||
| /// The input length must be a multiple of 5. | ||
| /// | ||
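| /// This function does not handle padding for partial chunks. | ||
| /// When the original byte count isn't a multiple of 4, | ||
| /// callers must truncate the output at a higher level, | ||
| /// for example via the length indicator in a Git binary patch line. | ||
| /// | ||
| /// A 5-character group whose value exceeds `u32::MAX` is not rejected | ||
| /// explicitly; the `u32` accumulation overflows instead. | ||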
| pub fn decode_into(input: &[u8], output: &mut impl Extend<u8>) -> Result<(), Base85Error> { | ||
| if input.len() % 5 != 0 { | ||
| return Err(Base85Error::InvalidLength); | ||
| } | ||
|
|
||
| // TODO: Use `as_chunks::<5>()` when MSRV >= 1.88 | ||
| for chunk in input.chunks_exact(5) { | ||
| let mut value: u32 = 0; | ||
| for &byte in chunk { | ||
| let digit = TABLE[byte as usize]; | ||
| if digit == 0xFF { | ||
| return Err(Base85Error::InvalidCharacter(byte as char)); | ||
| } | ||
| value = value * 85 + digit as u32; | ||
| } | ||
|
|
||
| output.extend(value.to_be_bytes()); | ||
| } | ||
|
|
||
| Ok(()) | ||
| } | ||
|
|
||
| /// Encodes bytes in Base85 to the provided output. | ||
| /// | ||
| /// ## Limitations | ||
| /// | ||
| /// The input length must be a multiple of 4. | ||
| /// | ||
| /// This function does not handle padding for partial chunks. | ||
| /// Callers encoding data where the byte count isn't a multiple of 4 | ||
| /// must handle padding at a higher level. | ||
| /// For example, via a length indicator in Git binary patch format. | ||
| #[allow(dead_code)] // will be used for patch formatting | ||
| pub fn encode_into(input: &[u8], output: &mut impl Extend<char>) -> Result<(), Base85Error> { | ||
| if input.len() % 4 != 0 { | ||
| return Err(Base85Error::InvalidLength); | ||
| } | ||
|
|
||
| // TODO: Use `as_chunks::<4>()` when MSRV >= 1.88 | ||
| for chunk in input.chunks_exact(4) { | ||
| let mut value = u32::from_be_bytes(chunk.try_into().unwrap()); | ||
|
|
||
| // Extract 5 base85 digits, least significant first, filling `digits` from the back | ||
| let mut digits = [0u8; 5]; | ||
| for digit in digits.iter_mut().rev() { | ||
| *digit = ALPHABET[(value % 85) as usize]; | ||
| value /= 85; | ||
| } | ||
| output.extend(digits.iter().map(|&b| b as char)); | ||
| } | ||
|
|
||
| Ok(()) | ||
| } | ||
|
|
||
| #[cfg(test)] | ||
| mod tests { | ||
| use super::*; | ||
|
|
||
| fn decode(input: &str) -> Result<Vec<u8>, Base85Error> { | ||
| let mut result = Vec::with_capacity((input.len() / 5) * 4); | ||
| decode_into(input.as_bytes(), &mut result)?; | ||
| Ok(result) | ||
| } | ||
|
|
||
| fn encode(input: &[u8]) -> Result<String, Base85Error> { | ||
| let mut result = String::with_capacity((input.len() / 4) * 5); | ||
| encode_into(input, &mut result)?; | ||
| Ok(result) | ||
| } | ||
|
|
||
| const TEST_VECTORS: &[(&[u8], &str)] = &[ | ||
| (b"", ""), | ||
| (&[0x00, 0x00, 0x00, 0x00], "00000"), | ||
| (&[0xff, 0xff, 0xff, 0xff], "|NsC0"), | ||
| // Rust ecosystem phrases | ||
| (b"Rust", "Qgw55"), | ||
| (b"Fearless concurrency", "MrC1gY-MwEAY*TCV|8+JWo~16"), | ||
| (b"memory safe!", "ZDnn5a(N(gVP<6^"), | ||
| (b"blazing fast", "Vr*f0X>MmAW?^%5"), | ||
| ( | ||
| b"zero-cost abstraction!??", | ||
| "dS!BNEn{zUbRc13b98cHV{~b6ZXrKE", | ||
| ), | ||
| ]; | ||
|
|
||
| #[test] | ||
| fn table_covers_all_alphabet_chars() { | ||
| for (i, &c) in ALPHABET.iter().enumerate() { | ||
| assert_eq!( | ||
| TABLE[c as usize], i as u8, | ||
| "mismatch for char '{}' at index {}", | ||
| c as char, i | ||
| ); | ||
| } | ||
| } | ||
|
|
||
| #[test] | ||
| fn table_rejects_invalid_chars() { | ||
| let invalid_chars = b" \t\n\r\"'\\[],:"; | ||
| for &c in invalid_chars { | ||
| assert_eq!( | ||
| TABLE[c as usize], 0xFF, | ||
| "char '{}' should be invalid", | ||
| c as char | ||
| ); | ||
| } | ||
| } | ||
|
|
||
| #[test] | ||
| fn decode_test_vectors() { | ||
| for (bytes, encoded) in TEST_VECTORS { | ||
| let result = decode(encoded).unwrap(); | ||
| assert_eq!(&result, *bytes, "decode({:?}) failed", encoded); | ||
| } | ||
| } | ||
|
|
||
| #[test] | ||
| fn encode_test_vectors() { | ||
| for (bytes, encoded) in TEST_VECTORS { | ||
| let result = encode(bytes).unwrap(); | ||
| assert_eq!(result, *encoded, "encode({:?}) failed", bytes); | ||
| } | ||
| } | ||
|
|
||
| #[test] | ||
| fn decode_invalid_length() { | ||
| assert!(matches!(decode("0000"), Err(Base85Error::InvalidLength))); | ||
| assert!(matches!(decode("000"), Err(Base85Error::InvalidLength))); | ||
| assert!(matches!(decode("00"), Err(Base85Error::InvalidLength))); | ||
| assert!(matches!(decode("0"), Err(Base85Error::InvalidLength))); | ||
| } | ||
|
|
||
| #[test] | ||
| fn decode_invalid_character() { | ||
| assert!(matches!( | ||
| decode("0000 "), | ||
| Err(Base85Error::InvalidCharacter(' ')) | ||
| )); | ||
| assert!(matches!( | ||
| decode("0000\""), | ||
| Err(Base85Error::InvalidCharacter('"')) | ||
| )); | ||
| } | ||
|
|
||
| #[test] | ||
| fn encode_invalid_length() { | ||
| assert!(matches!(encode(&[0]), Err(Base85Error::InvalidLength))); | ||
| assert!(matches!(encode(&[0, 0]), Err(Base85Error::InvalidLength))); | ||
| assert!(matches!( | ||
| encode(&[0, 0, 0]), | ||
| Err(Base85Error::InvalidLength) | ||
| )); | ||
| assert!(matches!( | ||
| encode(&[0, 0, 0, 0, 0]), | ||
| Err(Base85Error::InvalidLength) | ||
| )); | ||
| } | ||
|
|
||
| #[test] | ||
| fn round_trip() { | ||
| for (bytes, _) in TEST_VECTORS { | ||
| let encoded = encode(bytes).unwrap(); | ||
| let decoded = decode(&encoded).unwrap(); | ||
| assert_eq!(&decoded, *bytes, "round-trip failed for {:?}", bytes); | ||
| } | ||
| } | ||
| } |
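The `decode_into` limitations above leave truncation to the caller. As a rough sketch of what such a caller could look like, assuming the Git binary patch line layout (a length character, `A` to `Z` for 1 to 26 bytes and `a` to `z` for 27 to 52, followed by 5-character groups), the following decodes one line and trims the padding. `LineError`, `digit`, and `decode_line` are hypothetical names for illustration and are not part of this PR. The sketch also accumulates in `u64` so that a group above `u32::MAX` is rejected rather than overflowing.

```rust
/// RFC 1924 alphabet, same character set as the module in the diff.
const ALPHABET: &[u8; 85] = b"0123456789\
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz\
!#$%&()*+-;<=>?@^_`{|}~";

/// Hypothetical error type, used only by this sketch.
#[derive(Debug, PartialEq, Eq)]
enum LineError {
    Empty,
    BadLengthChar(u8),
    BadPayloadLength,
    InvalidCharacter(u8),
    Overflow,
}

/// Maps a byte to its base85 digit by linear search.
/// (The module in the diff uses a 256-entry lookup table instead.)
fn digit(byte: u8) -> Option<u64> {
    ALPHABET.iter().position(|&c| c == byte).map(|i| i as u64)
}

/// Decodes one line (length character + 5-character groups) and appends
/// exactly the declared number of bytes to `out`.
fn decode_line(line: &[u8], out: &mut Vec<u8>) -> Result<(), LineError> {
    let (&len_char, payload) = line.split_first().ok_or(LineError::Empty)?;
    let declared = match len_char {
        b'A'..=b'Z' => (len_char - b'A') as usize + 1,
        b'a'..=b'z' => (len_char - b'a') as usize + 27,
        other => return Err(LineError::BadLengthChar(other)),
    };
    // Assumption: the payload holds exactly ceil(declared / 4) groups.
    if payload.len() != declared.div_ceil(4) * 5 {
        return Err(LineError::BadPayloadLength);
    }
    let start = out.len();
    for chunk in payload.chunks_exact(5) {
        // u64 accumulator: 85^5 does not fit in u32 but fits easily here.
        let mut value: u64 = 0;
        for &byte in chunk {
            value = value * 85 + digit(byte).ok_or(LineError::InvalidCharacter(byte))?;
        }
        let word = u32::try_from(value).map_err(|_| LineError::Overflow)?;
        out.extend_from_slice(&word.to_be_bytes());
    }
    // Drop the padding bytes of the final group.
    out.truncate(start + declared);
    Ok(())
}
```

For instance, `decode_line(b"DQgw55", &mut out)` appends the four bytes of `Rust`, reusing the `Qgw55` test vector from the diff with `D` declaring a length of 4.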
We use our hand-written one. Alternatives:
- `decode_into` to avoid allocation
- `decode -> Option`: not good for error reporting
- `decode_into` to avoid allocation
The hand-written one here is solid, and it is nice to avoid an extra dep if we can
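To illustrate the allocation point raised in the review, here is a small sketch of why an `_into` style signature helps: the caller owns the sink and can reuse it across calls. `words_into` and `lengths_with_reused_buffer` are hypothetical stand-ins, not functions from this PR; the stand-in emits big-endian words and does no Base85 work, and only the `&mut impl Extend<u8>` parameter shape is taken from the diff.

```rust
use std::collections::VecDeque;

/// Stand-in for a decoder with the `decode_into` shape: it appends to a
/// caller-supplied sink instead of returning a fresh `Vec<u8>`.
fn words_into(words: &[u32], output: &mut impl Extend<u8>) {
    for word in words {
        output.extend(word.to_be_bytes());
    }
}

/// Processes several inputs with one scratch buffer and returns the
/// number of bytes produced for each input.
fn lengths_with_reused_buffer(inputs: &[&[u32]]) -> Vec<usize> {
    let mut scratch: Vec<u8> = Vec::new();
    let mut lengths = Vec::with_capacity(inputs.len());
    for input in inputs {
        // `clear` drops the old contents but keeps the allocation,
        // so later iterations usually do not allocate again.
        scratch.clear();
        words_into(input, &mut scratch);
        lengths.push(scratch.len());
    }
    lengths
}

/// Any `Extend<u8>` sink works, not only `Vec<u8>`.
fn into_deque(words: &[u32]) -> VecDeque<u8> {
    let mut deque = VecDeque::new();
    words_into(words, &mut deque);
    deque
}
```

A function returning `Vec<u8>` would instead allocate on every call, which is the cost the `_into` form lets callers avoid.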