feat(patch_set): support git diff application #59

Merged
bmwill merged 5 commits into bmwill:master from weihanglo:gitdiff
Apr 18, 2026

Conversation

Contributor

@weihanglo weihanglo commented Apr 12, 2026

Add git diff output parsing and application support. Some highlights:

  • Binary diffs are always parsed and emit a PatchKind::Binary marker. Callers decide
    how to handle them.
  • The APIs are mostly generic over T: Text, since hunks and filenames may contain non-UTF8 bytes. This shares the same limitation as Patch: if you use PatchSet::parse for Patch<'_, str>, you cannot have a non-UTF8 filename in the diff --git extended header.
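
As a rough illustration of "callers decide how to handle", here is a minimal sketch of a caller-side policy. The `PatchKind` and `FilePatch` types below are hypothetical stand-ins written for this example; only the name `PatchKind::Binary` comes from the description above, and the real types in this PR may look different.

```rust
// Hypothetical stand-ins for illustration; the crate's real types may differ.
#[derive(Debug, PartialEq)]
enum PatchKind {
    /// A textual patch carrying hunks (kept as raw text in this sketch).
    Text(String),
    /// Marker only: the diff contained a binary change.
    Binary,
}

struct FilePatch {
    path: String,
    kind: PatchKind,
}

/// Caller-side policy: keep text patches, and report binary ones by path
/// so that nothing is dropped silently.
fn partition(patches: Vec<FilePatch>) -> (Vec<FilePatch>, Vec<String>) {
    let mut text = Vec::new();
    let mut skipped = Vec::new();
    for p in patches {
        if matches!(p.kind, PatchKind::Binary) {
            skipped.push(p.path);
        } else {
            text.push(p);
        }
    }
    (text, skipped)
}
```

A caller could then apply the first list and log or reject the second, depending on how strict it wants to be.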

Comment thread src/patch_set/parse.rs Outdated
}
};

// FIXME: error spans point at `diff --git` line, not the specific offending line
Contributor Author

Leaving this for the future; I am once again too lazy to write new code 🙇🏾‍♂️.

Owner

Yeah, if there are explicit limitations or things we know will be addressed in the future, then as long as we document the issue I'm fine with avoiding scope bloat on individual PRs.

@weihanglo weihanglo force-pushed the gitdiff branch 2 times, most recently from ffb8730 to 2515dd3 Compare April 12, 2026 21:08
Comment thread tests/replay.rs Outdated
Comment thread src/patch_set/mod.rs Outdated
Comment on lines +116 to +120
/// Skip binary diffs silently.
pub fn skip_binary(mut self) -> Self {
    self.binary = Binary::Skip;
    self
}
Contributor Author

This is not good actually. We may want a binary marker like patch so people explicitly know they are skipping something

Contributor Author

Following YAGNI, I removed the skip and fail options. The new revision always parses binary patches. If users don't want to parse binary patches, they shouldn't have generated them with --binary in the first place. Anyway, this can be a future feature request and would not be too hard to add.

Comment thread src/patch_set/parse.rs
Comment thread src/patch_set/parse.rs Outdated
Comment thread src/patch_set/parse.rs Outdated
fn extract_file_op_gitdiff<'a>(
    header: &GitHeader<'a>,
    patch: &Patch<'a, str>,
) -> Result<FileOperation<'a>, PatchSetParseError> {
Contributor Author

Pre-existing issue: This FileOperation preserves the raw path with the prefix unstripped, e.g., a/src/lib.rs and b/src/lib.rs. I personally think this is the right choice at the syntactic level, since consumers know their patches better than we do. However, the API doc should call this out more explicitly.
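
For consumers who do want the prefix removed, a minimal sketch of component stripping in the spirit of a `-pN` strip level could look like this. `strip_components` is a hypothetical helper written for this example, not part of the crate's API, and it is a simplification: it does not collapse repeated slashes or apply any other rules the reference tools may have.

```rust
/// Remove `n` leading path components from `path`.
/// Returns `None` when the path has fewer than `n` separators or when
/// nothing is left after stripping.
fn strip_components(path: &str, n: usize) -> Option<&str> {
    let mut rest = path;
    for _ in 0..n {
        // Drop everything up to and including the next `/`.
        let (_, tail) = rest.split_once('/')?;
        rest = tail;
    }
    if rest.is_empty() {
        None
    } else {
        Some(rest)
    }
}
```

With a strip level of 1, `a/src/lib.rs` and `b/src/lib.rs` both become `src/lib.rs`, while a strip level of 0 leaves the raw path untouched.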

Contributor Author

Also note that we have not yet supported non-UTF8 paths, even in #64.

Owner

Also note that we have yet supported non-UTF8 path

This was one other thing i was going to mention. How do you want to handle that? punt on that for a follow up PR?

Contributor Author

Can also cherry pick whatever in #64, if that is preferred.

Contributor Author

#65 has addressed this, and #66 fixes #64.

Comment thread src/patch_set/parse.rs Outdated
Comment thread src/patch_set/parse.rs Outdated
}
// Select split with longest common path suffix (matches Git behavior)
if let Some(path) = longest_common_path_suffix(left, right) {
    if path.len() > longest_path.len() {
Owner

If there are multiple splits with the same length what does git do in those situations? as-is this will prefer the first one we encounter.

Contributor Author

This one is interesting. Given diff --git a/x b/x c/x, git-apply fails with:

git diff header lacks filename information
when removing N leading pathname component(s)

Also in https://git-scm.com/docs/diff-format#generate_patch_text_with_p:

The a/ and b/ filenames are the same unless rename/copy is involved.

This suggests that git-apply's path resolution is strip-level-aware, unlike ours, which picks the first candidate. I'll mark this as incompat in our compat tests. We can decide the actual behavior later.

Contributor Author

Added tests

  • compat::git::fail_ambiguous_suffix_tie
  • compat::git::path_ambiguous_suffix

Though this is still not completely compatible with git-apply. Our parser is more lenient.
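
The tie discussed in this thread can be reproduced with a small sketch. It tries every space in the header remainder as a split point and keeps the candidate whose two halves share the longest run of trailing path components, using a strict `>` so the first candidate wins a tie. Both `pick_split` and `common_component_suffix_len` are hypothetical names for this example, and the component-wise scoring is an assumption, not the PR's actual implementation.

```rust
/// Length in bytes of the longest run of identical trailing path components
/// shared by `a` and `b` (0 when the last components differ).
fn common_component_suffix_len(a: &str, b: &str) -> usize {
    a.rsplit('/')
        .zip(b.rsplit('/'))
        .take_while(|(x, y)| x == y)
        // Each matched component contributes its length plus one separator.
        .map(|(x, _)| x.len() + 1)
        .sum::<usize>()
        // The first matched component has no separator in front of it.
        .saturating_sub(1)
}

/// Split the text after `diff --git ` into (old, new) paths.
/// The strict `>` means the first candidate wins when scores tie.
fn pick_split(rest: &str) -> Option<(&str, &str)> {
    let mut best = None;
    let mut best_len = 0;
    for (i, _) in rest.match_indices(' ') {
        let (left, right) = (&rest[..i], &rest[i + 1..]);
        let len = common_component_suffix_len(left, right);
        if len > best_len {
            best_len = len;
            best = Some((left, right));
        }
    }
    best
}
```

For `a/x b/x c/x` both candidate splits score the same, so this sketch returns the first one (`a/x` and `b/x c/x`), which is the lenient behavior described above, whereas git-apply rejects the header.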

Comment thread src/patch_set/parse.rs Outdated
/// Path component boundary means:
///
/// * At `/` character (e.g., `foo/bar.rs` vs `fooo/bar.rs` → `bar.rs`)
/// * Or the entire string is identical
Owner

The "entire string is identical" case takes care of when a and b have no /s, correct?

Contributor Author

The / boundary only matters for partial suffixes. Let me enhance the doc here.
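
To make the boundary rule concrete, here is a minimal sketch written only from the doc comment quoted above. The function name matches the one in the diff excerpt, but the body is an assumption and may differ from the PR's implementation: a shared suffix is accepted as-is when it starts at a component boundary in both strings (which covers identical strings with no `/` at all), and otherwise it is trimmed forward to just after the next `/`.

```rust
/// Longest common suffix of `a` and `b` that starts at a path component
/// boundary in both strings. Sketch only; not the PR's actual code.
fn longest_common_path_suffix<'a>(a: &'a str, b: &str) -> Option<&'a str> {
    let (ab, bb) = (a.as_bytes(), b.as_bytes());
    // Number of identical trailing bytes.
    let common = ab
        .iter()
        .rev()
        .zip(bb.iter().rev())
        .take_while(|(x, y)| x == y)
        .count();
    if common == 0 {
        return None;
    }
    let (a_start, b_start) = (ab.len() - common, bb.len() - common);
    // A suffix starts at a boundary if it is the whole string or follows `/`.
    let at_boundary = |s: &[u8], i: usize| i == 0 || s[i - 1] == b'/';
    let start = if at_boundary(ab, a_start) && at_boundary(bb, b_start) {
        a_start
    } else {
        // Partial component match: move to just after the next `/`.
        let slash = ab[a_start..].iter().position(|&c| c == b'/')?;
        a_start + slash + 1
    };
    if start == ab.len() {
        None
    } else {
        Some(&a[start..])
    }
}
```

Under this sketch, `Makefile` against `Makefile` yields the whole name, while `xfoo` against `foo` yields nothing because the shared bytes do not begin at a boundary.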

Comment thread src/patch/parse.rs Outdated
Comment thread src/patch/parse.rs
Comment thread src/patch_set/parse.rs
Comment on lines 61 to 62
// Strip email preamble once at construction
let input = strip_email_preamble(input);
Owner

Not new in this change, but I was wondering how we would properly handle mbox streams (a concatenation of several email patchsets). Maybe worth adding a comment to come back and address this in a follow-up?

Contributor Author

@weihanglo weihanglo Apr 15, 2026

As I remember, the last time I tried this, GNU patch and git apply both failed on this case. Now that we have compat test infra, we can verify it with some tests.

Contributor Author

I misunderstood your comment. mbox stream concatenation should be fine, as we ignore trailing garbage once a hunk is satisfied.

Contributor Author

Tests added. Working as expected. The behavior matches both reference tools (patches applied)

  • compat::git::format_patch_mbox
  • compat::gnu_patch::format_patch_mbox

Comment thread src/patch_set/parse.rs Outdated
}

/// See [`parse_diff_git_path`].
fn parse_quoted_diff_git_path(line: &str) -> Option<(Cow<'_, str>, Cow<'_, str>)> {
Owner

Can we add a test for quote-within-quote eg diff --git "a/with\"quote"

Contributor Author

Tests added:

  • patch_set::tests::patchset_gitdiff::rename_both_quoted
  • patch_set::tests::patchset_gitdiff::rename_quoted_to_unquoted
  • patch_set::tests::patchset_gitdiff::rename_unquoted_to_quoted
  • patch_set::tests::patchset_gitdiff::path_quoted_with_escaped_quote (different location than rename)
  • compat::git::path_quoted_inner_quote

Comment thread src/patch_set/parse.rs
let end = loop {
    match bytes.get(i)? {
        b'"' => break i + 1,
        b'\\' => i += 2,
Owner

Do we need to be concerned about full octal awareness here? If we don't, then maybe we should add a comment indicating why it's ok?

Contributor Author

Comment added. Thanks for calling it out!
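
The comment that was added is not shown in this thread. One plausible reason the scan does not need octal awareness can be sketched as follows: to find where a quoted token ends, it is enough to skip the single byte after each backslash, because the remaining digits of an escape like `\377` are ordinary digits and can never be mistaken for a `"` or a `\`. `quoted_token_end` is a hypothetical standalone version of the loop in the excerpt above.

```rust
/// Return the index just past the closing `"` of the quoted token that
/// starts at `bytes[start]`, or `None` if the token is unterminated.
fn quoted_token_end(bytes: &[u8], start: usize) -> Option<usize> {
    if *bytes.get(start)? != b'"' {
        return None;
    }
    let mut i = start + 1;
    loop {
        match bytes.get(i)? {
            b'"' => return Some(i + 1),
            // Skip the backslash and the byte it escapes. For an octal
            // escape the two remaining digits are consumed by the `_` arm.
            b'\\' => i += 2,
            _ => i += 1,
        }
    }
}
```

Decoding the escapes is a separate step; this scan only locates the token boundary.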

Comment thread src/utils.rs Outdated
weihanglo added a commit to weihanglo/diffy that referenced this pull request Apr 15, 2026
Octal escape \377 decodes to 0xFF, which is not valid UTF-8.
When parsing into `Patch<'_, str>`, `convert_cow_to_str` panics
via `unwrap()` instead of returning a parse error.

This documents the pre-existing bug that the reviewer flagged:
bmwill#59 (comment)
weihanglo added a commit to weihanglo/diffy that referenced this pull request Apr 15, 2026
Octal escape \377 decodes to 0xFF, which is not valid UTF-8.
When parsing into `Patch<'_, str>`,
`convert_cow_to_str` panics via `unwrap()`
instead of returning a parse error.

See bmwill#59 (comment)
@weihanglo weihanglo marked this pull request as draft April 15, 2026 23:11
@weihanglo
Contributor Author

Just put this on hold and created #65 and #66 for refactoring/improving the existing stuff.

bmwill pushed a commit that referenced this pull request Apr 16, 2026
Octal escape \377 decodes to 0xFF, which is not valid UTF-8.
When parsing into `Patch<'_, str>`,
`convert_cow_to_str` panics via `unwrap()`
instead of returning a parse error.

See #59 (comment)
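
The failure mode in these commit messages can be illustrated with a small sketch: decode three-digit octal escapes into raw bytes, then convert with `String::from_utf8`, which returns an error for `0xFF` rather than panicking. `decode_octal_escapes` and `decode_path` are hypothetical helpers for this example and are not the crate's `convert_cow_to_str`; other escape forms such as `\"` or `\\` are deliberately left untouched here.

```rust
use std::string::FromUtf8Error;

/// Decode `\NNN` octal escapes (three digits, value at most 0o377) into raw
/// bytes. Any other byte sequence is copied through unchanged.
fn decode_octal_escapes(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] == b'\\' {
            if let Some(&[a, b, c]) = input.get(i + 1..i + 4) {
                let is_octal = |d: u8| (b'0'..=b'7').contains(&d);
                if (b'0'..=b'3').contains(&a) && is_octal(b) && is_octal(c) {
                    // Maximum is 3*64 + 7*8 + 7 = 255, so this fits in a u8.
                    out.push((a - b'0') * 64 + (b - b'0') * 8 + (c - b'0'));
                    i += 4;
                    continue;
                }
            }
        }
        out.push(input[i]);
        i += 1;
    }
    out
}

/// Convert a decoded path to `String`, surfacing invalid UTF-8 as an error
/// instead of calling `unwrap()`.
fn decode_path(quoted_body: &[u8]) -> Result<String, FromUtf8Error> {
    String::from_utf8(decode_octal_escapes(quoted_body))
}
```

A parser built this way can map the `FromUtf8Error` into its own parse error type, so a path like `a\377` is rejected cleanly when the target is a UTF-8 string.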
@weihanglo weihanglo force-pushed the gitdiff branch 3 times, most recently from 5ce67af to 0a3c53d Compare April 17, 2026 04:50
@weihanglo
Contributor Author

This is ready for review.

Because we merged #66, this turned into a fairly large rewrite, so I had no choice but to rewrite the history. Every review comment should already be addressed. Most of the code logic didn't change.

@weihanglo weihanglo marked this pull request as ready for review April 17, 2026 05:04
Comment thread tests/compat/git/mod.rs
fn junk_between_hunks() {
    Case::git("junk_between_hunks")
        .strip(1)
        .expect_compat(false)
Contributor Author

This is incompatible with git-apply (diffy being more permissive)

Comment thread tests/compat/git/mod.rs
Case::git("fail_ambiguous_suffix_tie")
    .strip(1)
    .expect_success(true)
    .expect_compat(false)
Contributor Author

We discussed this incompat in #59 (comment)

* Parse `diff --git` extended headers
* Split multi-file git diffs at `diff --git` boundaries
Add compat tests for `git apply` as well.
Unlike unidiff,
gitdiff produces patches for empty file creations/deletions
(`0\t0` in numstat)
because they carry `diff --git` + extended headers even without hunks.

Binary files (`-\t-\t`) are skipped in gitdiff mode for now.
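
For readers unfamiliar with the numstat markers mentioned in these commit messages: `git diff --numstat` prints tab-separated added and deleted line counts followed by the path, and uses `-` for both counts when the file is binary. A minimal sketch of classifying such lines follows; `NumStat` and `parse_numstat_line` are hypothetical names for this example and are not taken from the test harness.

```rust
#[derive(Debug, PartialEq)]
enum NumStat {
    /// Text change with added/deleted line counts (`0\t0` for empty files).
    Text { added: u32, deleted: u32 },
    /// Binary change, printed as `-\t-` by numstat.
    Binary,
}

/// Parse one `added<TAB>deleted<TAB>path` line.
/// Returns `None` for lines that do not have this shape.
fn parse_numstat_line(line: &str) -> Option<(NumStat, &str)> {
    let mut parts = line.splitn(3, '\t');
    let added = parts.next()?;
    let deleted = parts.next()?;
    let path = parts.next()?;
    let stat = if added == "-" && deleted == "-" {
        NumStat::Binary
    } else {
        NumStat::Text {
            added: added.parse().ok()?,
            deleted: deleted.parse().ok()?,
        }
    };
    Some((stat, path))
}
```

An entry with `0\t0` is the empty-file case that still produces a git diff patch, while a `Binary` entry corresponds to the files skipped in gitdiff mode.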
@bmwill
Owner

bmwill commented Apr 18, 2026

Awesome thank you so much!

@bmwill bmwill merged commit 52ee463 into bmwill:master Apr 18, 2026
18 checks passed
@weihanglo weihanglo deleted the gitdiff branch April 18, 2026 14:47