fix(patch)!: stop parsing at garbage after hunk satisfied#51
fix(patch)!: stop parsing at garbage after hunk satisfied#51bmwill merged 4 commits intobmwill:masterfrom
Conversation
| fn hunks<'a, T: Text + ?Sized>(parser: &mut Parser<'a, T>) -> Result<Vec<Hunk<'a, T>>> { | ||
| let mut hunks = Vec::new(); | ||
| while parser.peek().is_some() { | ||
| // Following GNU patch behavior: stop at non-@@ content. |
| if hunk_complete { | ||
| break; | ||
| } |
There was a problem hiding this comment.
This kinda of early return would help us stop caring stripping garbage like email signature, which GNU patch is resilient to that.
There was a problem hiding this comment.
This PR is a mix of extractions of
- fix(patch)!: stop parsing at garbage after hunk satisfied weihanglo/diffy#7
- fix(patch): trailing garbage after no-newline-at-eof marker weihanglo/diffy#21
- fix(patchset): ignore trailing junks when hunk is complete weihanglo/diffy#23
We don't have tool compat tests right now, so it is hard to see the actual compatibility between these changes against GNU patch and Git. You can check tests in those pull requests to figure.
There was a problem hiding this comment.
Your commit message mentions that this behavior matches GNU patch, how does git behave in these cases? Does it match GNU patch?
I suppose weihanglo#23 answers this question, in that git is more strict within a single "file". The question is do we want to be more strict like git or a bit more flexible like GNU patch in these cases? Thoughts, since you seemed to opt to be more flexible?
There was a problem hiding this comment.
Yeah there is a table in that PR:
Junk Between Hunks vs Between Files
Scenario GNU patch git apply diffy Junk between hunks (same file) ✅ Ignores trailing, applies first hunk only ❌ Errors ✅ Ignores trailing, applies first hunk only Junk between files ✅ Treats as preamble ✅ Treats as preamble ✅ Treats as preamble Trailing junk at end ✅ Ignores ✅ Ignores ✅ Ignores diffy matches GNU patch behavior.
git apply is stricter (errors on junk between hunks).
We could also be stricter or make it configurable. I forgot why we chose this. Perhaps because it is easier to implement to support both unidiff and gitdiff mode in multi-file patches. So, maybe configurable is better?
There was a problem hiding this comment.
Attached a more comprehensive comparison table in PR description btw.
There was a problem hiding this comment.
Yeah if its not too much trouble maybe nice to have the option to be more strict configurable? We can have it default to be permissive.
There was a problem hiding this comment.
Added Patches::from_str_strict and Patches::from_bytes_strict to match git apply's stricter parsing rules. Let me know if this is a good API or not.
(We could possibly have a ParseOptions struct though not sure if we will go that far)
There was a problem hiding this comment.
Perfect! Yeah this works for now.
f36ad3c to
48f7258
Compare
After this, we stop parsing at trailing garbage after hunk is satisfied. Hunk is satisfied when line counts from header are satisfied. - `hunk_lines()` now tracks old/new line counts during parsing - Stops at non-hunk line when counts satisfied - Errors if non-hunk line before counts satisfied - Handles trailing garbage after `\ No newline at end of file` marker (pattern first appeared in rust-lang/cargo@b119b891d) This is a preparation for multi-patch parsing where splitting by `---/+++` boundaries may leave trailing `diff --git` lines from the next patch. While this is a breaking change in behavior, it matches GNU patch behavior.
Tests document the currenet behavior (git-apply incompatible). This will be fixed in the next commit.
Adds `from_str_strict`/`from_bytes_strict` that reject orphaned hunk headers hidden behind trailing content This matches `git apply` behavior. Plain trailing junk is still accepted. The default `from_str`/`from_bytes` remain permissive (the GNU patch behavior ).
48f7258 to
7b7c81c
Compare
UnexpectedHunkLineUnexpectedHunkLine\ No newline at end of fileExpectedEndOfHunkAfter this, we stop parsing at trailing garbage after hunk is satisfied.
Hunk is satisfied when line counts from header are satisfied.
hunk_lines()now tracks old/new line counts during parsing\ No newline at end of filemarker(pattern first found in rust-lang/cargo@b119b891d)
This is a preparation for multi-patch parsing where splitting by
---/+++boundaries may leave trailingdiff --gitlines from the next patch.While this is a breaking change in behavior, the default behavior matches GNU patch behavior. It is more resilient. We also add
Patches::from_str_strictandPatches::from_bytes_strictto matchgit apply's stricter parsing rules.