fix(patch)!: stop parsing at garbage after hunk satisfied #51

Merged
bmwill merged 4 commits into bmwill:master from weihanglo:trailing-garbage on Apr 10, 2026
Contributor

@weihanglo commented Apr 9, 2026

‼️ This contains a behavior change!

| Scenario | GNU patch | git apply | diffy (before) | diffy (this) |
|---|---|---|---|---|
| Junk between hunks (same file) | ✅ Ignores trailing, applies first hunk only | ❌ `patch fragment without header` | `UnexpectedHunkLine` | Supports both ignore and reject modes |
| Junk between files | ✅ Treats as preamble | ✅ Treats as preamble | ✅ Treats as preamble | ✅ Treats as preamble |
| Trailing junk at end | ✅ Ignores | ✅ Ignores | `UnexpectedHunkLine` | ✅ Ignores |
| Trailing junk after `\ No newline at end of file` | ✅ Ignores | ✅ Ignores | `ExpectedEndOfHunk` | ✅ Ignores |

After this change, parsing stops at trailing garbage once a hunk is satisfied.

A hunk is satisfied once the old and new line counts declared in its header have been met.

  • hunk_lines() now tracks old/new line counts during parsing
  • Stops at non-hunk line when counts satisfied
  • Errors if non-hunk line before counts satisfied
  • Handles trailing garbage after \ No newline at end of file marker
    (pattern first found in rust-lang/cargo@b119b891d)
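
The counting rule in the list above can be sketched in isolation. This is a minimal illustration, not the code in `src/patch/parse.rs`: `HunkScan` and `scan_hunk` are hypothetical names, the header counts are passed in already parsed, and an empty line is treated as a non-hunk line for simplicity.

```rust
/// Result of scanning the lines that follow a hunk header.
/// Hypothetical type for illustration; not part of diffy's API.
#[derive(Debug, PartialEq)]
enum HunkScan {
    /// The header counts were met; the first `consumed` lines belong to the hunk.
    Satisfied { consumed: usize },
    /// A non-hunk line (or end of input) showed up at index `at` before the counts were met.
    Incomplete { at: usize },
}

/// Scan `body`, the lines after an `@@ -a,old_len +b,new_len @@` header.
fn scan_hunk(body: &[&str], old_len: usize, new_len: usize) -> HunkScan {
    let (mut old_seen, mut new_seen) = (0usize, 0usize);
    for (i, line) in body.iter().enumerate() {
        // The "\ No newline at end of file" marker annotates the previous
        // line, so it is consumed without counting towards either side.
        if i > 0 && line.starts_with('\\') {
            continue;
        }
        // Counts met: whatever comes next is trailing content, so stop here.
        if old_seen == old_len && new_seen == new_len {
            return HunkScan::Satisfied { consumed: i };
        }
        let (old_step, new_step) = match line.as_bytes().first().copied() {
            Some(b' ') => (1, 1), // context line: present in old and new
            Some(b'-') => (1, 0), // removed line: old side only
            Some(b'+') => (0, 1), // added line: new side only
            _ => return HunkScan::Incomplete { at: i },
        };
        // A line that would overshoot the header counts is also an error.
        if old_seen + old_step > old_len || new_seen + new_step > new_len {
            return HunkScan::Incomplete { at: i };
        }
        old_seen += old_step;
        new_seen += new_step;
    }
    if old_seen == old_len && new_seen == new_len {
        HunkScan::Satisfied { consumed: body.len() }
    } else {
        HunkScan::Incomplete { at: body.len() }
    }
}
```

Under these assumptions, an email signature such as `-- ` after a complete hunk is left alone even though it starts with `-`, because the counts are checked before the line is classified.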

This prepares for multi-patch parsing, where splitting by ---/+++ boundaries may leave trailing diff --git lines from the next patch.
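
To see why such a split leaves trailing lines behind, here is a rough sketch. `split_on_file_headers` is a hypothetical helper, not the splitting logic planned for diffy, and it assumes a file starts wherever a `--- ` line is directly followed by a `+++ ` line.

```rust
/// Split patch lines into chunks, starting a new chunk at every `--- ` line
/// that is directly followed by a `+++ ` line.
/// Hypothetical helper for illustration only.
fn split_on_file_headers<'a>(lines: &[&'a str]) -> Vec<Vec<&'a str>> {
    let mut chunks: Vec<Vec<&'a str>> = Vec::new();
    for (i, line) in lines.iter().enumerate() {
        let starts_file = line.starts_with("--- ")
            && lines.get(i + 1).map_or(false, |next| next.starts_with("+++ "));
        // Open a new chunk at a file header, or for any leading preamble.
        if starts_file || chunks.is_empty() {
            chunks.push(Vec::new());
        }
        chunks.last_mut().unwrap().push(*line);
    }
    chunks
}
```

With a two-file git-style patch, the chunk for the first file ends with the `diff --git` line that introduces the second file, since that line precedes the next `---` boundary. A parser that stops once the hunk is satisfied can skip it instead of failing.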

While this is a breaking change in behavior, the new default matches GNU patch and is more resilient to trailing junk. We also add Patches::from_str_strict and Patches::from_bytes_strict to match git apply's stricter parsing rules.
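
The difference between the two modes can be sketched as a check on whatever is left over after the last satisfied hunk. This is an illustration under assumptions, not diffy's implementation: `Mode` and `check_trailing` are hypothetical names, `rest` is assumed to start at the first non-hunk line of a single file, and strict mode is taken to mean rejecting a hunk header that appears behind trailing content while still accepting plain junk.

```rust
/// Parsing mode. Hypothetical; diffy exposes separate constructors instead.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    /// GNU patch style: ignore everything after the last satisfied hunk.
    Permissive,
    /// git apply style: reject a hunk header hidden behind trailing content.
    Strict,
}

/// Check `rest`, the lines left over after the last satisfied hunk of one file.
fn check_trailing(rest: &[&str], mode: Mode) -> Result<(), String> {
    if mode == Mode::Permissive {
        return Ok(());
    }
    // Plain junk is fine; only an orphaned `@@ ` header is an error.
    match rest.iter().position(|line| line.starts_with("@@ ")) {
        Some(offset) => Err(format!(
            "orphaned hunk header {} line(s) into trailing content",
            offset
        )),
        None => Ok(()),
    }
}
```

In this sketch an email signature passes in both modes, whereas junk followed by another `@@` header passes only in permissive mode, where the later hunk is silently dropped.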

Comment thread src/patch/parse.rs
fn hunks<'a, T: Text + ?Sized>(parser: &mut Parser<'a, T>) -> Result<Vec<Hunk<'a, T>>> {
    let mut hunks = Vec::new();
    while parser.peek().is_some() {
        // Following GNU patch behavior: stop at non-@@ content.

Comment thread src/patch/parse.rs
Comment on lines +243 to +245
if hunk_complete {
    break;
}

Contributor Author

This kind of early return means we no longer need to care about stripping garbage such as email signatures, which GNU patch is also resilient to.

weihanglo@7d0acc3

Comment thread src/patch/error.rs

Contributor Author

This PR is a mix of extractions of

We don't have tool compatibility tests right now, so it is hard to see how these changes actually compare with GNU patch and Git. You can check the tests in those pull requests to find out.

Owner

Your commit message mentions that this behavior matches GNU patch. How does git behave in these cases? Does it match GNU patch?

I suppose weihanglo#23 answers this question, in that git is more strict within a single "file". The question is: do we want to be more strict like git, or a bit more flexible like GNU patch, in these cases? Thoughts, since you seem to have opted for the more flexible approach?

Contributor Author

Yeah there is a table in that PR:

Junk Between Hunks vs Between Files

| Scenario | GNU patch | git apply | diffy |
|---|---|---|---|
| Junk between hunks (same file) | ✅ Ignores trailing, applies first hunk only | ❌ Errors | ✅ Ignores trailing, applies first hunk only |
| Junk between files | ✅ Treats as preamble | ✅ Treats as preamble | ✅ Treats as preamble |
| Trailing junk at end | ✅ Ignores | ✅ Ignores | ✅ Ignores |

diffy matches GNU patch behavior.
git apply is stricter (errors on junk between hunks).

We could also be stricter or make it configurable. I forgot why we chose this. Perhaps because it makes supporting both unidiff and gitdiff modes in multi-file patches easier to implement. So, maybe configurable is better?

Contributor Author

Attached a more comprehensive comparison table in the PR description, btw.

Owner

Yeah, if it's not too much trouble, maybe it would be nice to have a configurable option to be more strict? We can have it default to permissive.

Contributor Author

Added Patches::from_str_strict and Patches::from_bytes_strict to match git apply's stricter parsing rules. Let me know if this is a good API or not.

(We could possibly have a ParseOptions struct though not sure if we will go that far)

Owner

Perfect! Yeah this works for now.

After this,
we stop parsing at trailing garbage after hunk is satisfied.

Hunk is satisfied when line counts from header are satisfied.

- `hunk_lines()` now tracks old/new line counts during parsing
- Stops at non-hunk line when counts satisfied
- Errors if non-hunk line before counts satisfied
- Handles trailing garbage after `\ No newline at end of file` marker
  (pattern first appeared in rust-lang/cargo@b119b891d)

This is a preparation for multi-patch parsing
where splitting by `---/+++` boundaries
may leave trailing `diff --git` lines from the next patch.

While this is a breaking change in behavior,
it matches GNU patch behavior.

Tests document the current behavior (git-apply incompatible).
This will be fixed in the next commit.

Adds `from_str_strict`/`from_bytes_strict` that reject
orphaned hunk headers hidden behind trailing content.
This matches `git apply` behavior.
Plain trailing junk is still accepted.

The default `from_str`/`from_bytes` remain permissive
(the GNU patch behavior).

@bmwill bmwill merged commit d005351 into bmwill:master Apr 10, 2026
6 checks passed
@weihanglo weihanglo deleted the trailing-garbage branch April 10, 2026 01:43