feat(patch_set): support git diff application#59
Conversation
| } | ||
| }; | ||
|
|
||
| // FIXME: error spans point at `diff --git` line, not the specific offending line |
There was a problem hiding this comment.
Leave it for future, I am lazy again to write new code 🙇🏾♂️.
There was a problem hiding this comment.
Yeah if there are explicit limitation or things that we know will be addressed in the future as long as we document the issue i'm fine with avoiding scope bloat on individual PRs
ffb8730 to
2515dd3
Compare
49376b2 to
e5895d9
Compare
| /// Skip binary diffs silently. | ||
| pub fn skip_binary(mut self) -> Self { | ||
| self.binary = Binary::Skip; | ||
| self | ||
| } |
There was a problem hiding this comment.
This is not good actually. We may want a binary marker like patch so people explicitly know they are skipping something
There was a problem hiding this comment.
Following YAGNI, I removed skip and fail options. The new revision will always parse binary patches. If users don't want to parse binary patches, they shouldn't have generated binary patch with --binary in the first place. Anyway, this can be a future feature request and is not too hard.
| fn extract_file_op_gitdiff<'a>( | ||
| header: &GitHeader<'a>, | ||
| patch: &Patch<'a, str>, | ||
| ) -> Result<FileOperation<'a>, PatchSetParseError> { |
There was a problem hiding this comment.
Pre-existing issue: This FileOperation preserves the raw path with prefix unstripped, e.g., a/src/lib.rs and b/src/lib.rs. I personally think this is the right chose on syntactic level and consumer should know their patch better than us. However, the API doc should call this out more explicitly.
There was a problem hiding this comment.
Also note that we have yet supported non-UTF8 path even in #64.
There was a problem hiding this comment.
Also note that we have yet supported non-UTF8 path
This was one other thing i was going to mention. How do you want to handle that? punt on that for a follow up PR?
There was a problem hiding this comment.
Can also cherry pick whatever in #64, if that is preferred.
| } | ||
| // Select split with longest common path suffix (matches Git behavior) | ||
| if let Some(path) = longest_common_path_suffix(left, right) { | ||
| if path.len() > longest_path.len() { |
There was a problem hiding this comment.
If there are multiple splits with the same length what does git do in those situations? as-is this will prefer the first one we encounter.
There was a problem hiding this comment.
This one is interesting. If we have diff --git a/x b/x c/x, git-apply failed to apply
git diff header lacks filename information
when removing N leading pathname component(s)"
Also in https://git-scm.com/docs/diff-format#generate_patch_text_with_p:
The a/ and b/ filenames are the same unless rename/copy is involved.
This kinda tells git-apply's path resolution is strip-level-aware, unlike ours that picks the first one. I'll mark this as incompat in our compat tests. We can decide the actual behavior later.
There was a problem hiding this comment.
Added tests
compat::git::fail_ambiguous_suffix_tiecompat::git::path_ambiguous_suffix
Though this is still not complete compatible with git-apply. Our parser is more lenient.
| /// Path component boundary means: | ||
| /// | ||
| /// * At `/` character (e.g., `foo/bar.rs` vs `fooo/bar.rs` → `bar.rs`) | ||
| /// * Or the entire string is identical |
There was a problem hiding this comment.
The "entire string is identical" case takes care of when a and b have no/s correct?
There was a problem hiding this comment.
The / boundary only matters for partial suffixes. Let me enhance the doc here.
| // Strip email preamble once at construction | ||
| let input = strip_email_preamble(input); |
There was a problem hiding this comment.
Not new in this change but I was just wondering how we would properly handle mbox streams (a concatenation of a bunch of email patchsets). Maybe worth adding a comment to comeback and address in a followup?
There was a problem hiding this comment.
I remembered last time I tried, GNU patch and git apply both failed on this case. We now have compat test infra we can verify with some tests.
There was a problem hiding this comment.
I misunderstood your comment. It should be fine with mbox stream concatenation, as we ignore trailing garbage when hunk is satisfied
There was a problem hiding this comment.
Tests added. Working as expected. The behavior matches both reference tools (patches applied)
compat::git::format_patch_mboxcompat::gnu_patch::format_patch_mbox
| } | ||
|
|
||
| /// See [`parse_diff_git_path`]. | ||
| fn parse_quoted_diff_git_path(line: &str) -> Option<(Cow<'_, str>, Cow<'_, str>)> { |
There was a problem hiding this comment.
Can we add a test for quote-within-quote eg diff --git "a/with\"quote"
There was a problem hiding this comment.
Tested added:
patch_set::tests::patchset_gitdiff::rename_both_quotedpatch_set::tests::patchset_gitdiff::rename_quoted_to_unquotedpatch_set::tests::patchset_gitdiff::rename_unquoted_to_quotedpatch_set::tests::patchset_gitdiff::path_quoted_with_escaped_quote(different location than rename)compat::git::path_quoted_inner_quote
| let end = loop { | ||
| match bytes.get(i)? { | ||
| b'"' => break i + 1, | ||
| b'\\' => i += 2, |
There was a problem hiding this comment.
Do we need to be concerned about full octal awareness here? If we don't then maybe we should add a comment indicating why its ok?
There was a problem hiding this comment.
Comment added. Thanks for calling it out!
Octal escape \377 decodes to 0xFF, which is not valid UTF-8. When parsing into `Patch<'_, str>`, `convert_cow_to_str` panics via `unwrap()` instead of returning a parse error. This documents the pre-existing bug that the reviewer flagged: bmwill#59 (comment)
Octal escape \377 decodes to 0xFF, which is not valid UTF-8. When parsing into `Patch<'_, str>`, `convert_cow_to_str` panics via `unwrap()` instead of returning a parse error. See bmwill#59 (comment)
Octal escape \377 decodes to 0xFF, which is not valid UTF-8. When parsing into `Patch<'_, str>`, `convert_cow_to_str` panics via `unwrap()` instead of returning a parse error. See #59 (comment)
5ce67af to
0a3c53d
Compare
|
This is ready for review. Because we merged #66 so this is kind a huge rewrite, so I had no choice but rewrote the history. Each review comment should be addressed already. Most of the code logic didn't change. |
| fn junk_between_hunks() { | ||
| Case::git("junk_between_hunks") | ||
| .strip(1) | ||
| .expect_compat(false) |
There was a problem hiding this comment.
This is incompatible with git-apply (diffy being more permissive)
| Case::git("fail_ambiguous_suffix_tie") | ||
| .strip(1) | ||
| .expect_success(true) | ||
| .expect_compat(false) |
There was a problem hiding this comment.
We discussed this incompat in #59 (comment)
* Parse `diff --git` extended headers * split multi-file git diffs at `diff --git` boundaries
Compat test for also `git apply`.
Unlike unidiff, gitdiff produces patches for empty file creations/deletions (`0\t0` in numstat) because they carry `diff --git` + extended headers even without hunks. Binary files (`-\t-\t`) are skipped in gitdiff mode for now.
bmwill#62 optimized it
|
Awesome thank you so much! |
Add git diff output parsing and application support. Some highlights:
PatchKind::Binarymarker. Callers decidehow to handle.
T: Text, since hunks and filename may contain non-UTF8 bytes. This shares the same limitation ofPatch— if you use aPatchSet::parseforPatch<'_, str>, you cannot have non-UTF8 filename indiff --gitextended header.