fix(patch)!: stop parsing at garbage after hunk satisfied #51
Changes from all commits
e872388
c83b3d2
30352b0
7b7c81c
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -65,6 +65,19 @@ pub fn parse(input: &str) -> Result<Patch<'_, str>> { | |
| )) | ||
| } | ||
|
|
||
| pub fn parse_strict(input: &str) -> Result<Patch<'_, str>> { | ||
| let mut parser = Parser::new(input); | ||
| let header = patch_header(&mut parser)?; | ||
| let hunks = hunks(&mut parser)?; | ||
| reject_orphaned_hunk_headers(&mut parser)?; | ||
|
|
||
| Ok(Patch::new( | ||
| header.0.map(convert_cow_to_str), | ||
| header.1.map(convert_cow_to_str), | ||
| hunks, | ||
| )) | ||
| } | ||
|
|
||
| pub fn parse_bytes(input: &[u8]) -> Result<Patch<'_, [u8]>> { | ||
| let mut parser = Parser::new(input); | ||
| let header = patch_header(&mut parser)?; | ||
|
|
@@ -73,6 +86,15 @@ pub fn parse_bytes(input: &[u8]) -> Result<Patch<'_, [u8]>> { | |
| Ok(Patch::new(header.0, header.1, hunks)) | ||
| } | ||
|
|
||
| pub fn parse_bytes_strict(input: &[u8]) -> Result<Patch<'_, [u8]>> { | ||
| let mut parser = Parser::new(input); | ||
| let header = patch_header(&mut parser)?; | ||
| let hunks = hunks(&mut parser)?; | ||
| reject_orphaned_hunk_headers(&mut parser)?; | ||
|
|
||
| Ok(Patch::new(header.0, header.1, hunks)) | ||
| } | ||
|
|
||
| // This is only used when the type originated as a utf8 string | ||
| fn convert_cow_to_str(cow: Cow<'_, [u8]>) -> Cow<'_, str> { | ||
| match cow { | ||
|
|
@@ -154,9 +176,26 @@ fn verify_hunks_in_order<T: ?Sized>(hunks: &[Hunk<'_, T>]) -> bool { | |
| true | ||
| } | ||
|
|
||
| /// Scans remaining lines for orphaned `@@ ` hunk headers. | ||
| /// | ||
| /// In strict mode (git-apply behavior), trailing junk is allowed but | ||
| /// an `@@ ` line hiding behind that junk indicates a lost hunk. | ||
| fn reject_orphaned_hunk_headers<T: Text + ?Sized>(parser: &mut Parser<'_, T>) -> Result<()> { | ||
| while let Some(line) = parser.peek() { | ||
| if line.starts_with("@@ ") { | ||
| return Err(parser.error(ParsePatchErrorKind::OrphanedHunkHeader)); | ||
| } | ||
| parser.next()?; | ||
| } | ||
| Ok(()) | ||
| } | ||
|
|
||
| fn hunks<'a, T: Text + ?Sized>(parser: &mut Parser<'a, T>) -> Result<Vec<Hunk<'a, T>>> { | ||
| let mut hunks = Vec::new(); | ||
| while parser.peek().is_some() { | ||
| // Following GNU patch behavior: stop at non-@@ content. | ||
|
Review comment (Contributor, Author): Found in weihanglo@02fbd9b |
||
| // Any trailing content (including hidden @@ headers) is silently ignored. | ||
| // This is more permissive than git apply, which errors on junk between hunks. | ||
| while parser.peek().is_some_and(|line| line.starts_with("@@ ")) { | ||
| hunks.push(hunk(parser)?); | ||
| } | ||
|
|
||
|
|
@@ -173,13 +212,7 @@ fn hunk<'a, T: Text + ?Sized>(parser: &mut Parser<'a, T>) -> Result<Hunk<'a, T>> | |
| let header_line = parser.next()?; | ||
| let (range1, range2, function_context) = | ||
| hunk_header(header_line).map_err(|e| parser.error_at(e.kind, hunk_start))?; | ||
| let lines = hunk_lines(parser)?; | ||
|
|
||
| // check counts of lines to see if they match the ranges in the hunk header | ||
| let (len1, len2) = super::hunk_lines_count(&lines); | ||
| if len1 != range1.len || len2 != range2.len { | ||
| return Err(parser.error_at(ParsePatchErrorKind::HunkMismatch, hunk_start)); | ||
| } | ||
| let lines = hunk_lines(parser, range1.len, range2.len, hunk_start)?; | ||
|
|
||
| Ok(Hunk::new(range1, range2, function_context, lines)) | ||
| } | ||
|
|
@@ -223,36 +256,61 @@ fn range<T: Text + ?Sized>(s: &T) -> Result<HunkRange> { | |
| Ok(HunkRange::new(start, len)) | ||
| } | ||
|
|
||
| fn hunk_lines<'a, T: Text + ?Sized>(parser: &mut Parser<'a, T>) -> Result<Vec<Line<'a, T>>> { | ||
| fn hunk_lines<'a, T: Text + ?Sized>( | ||
| parser: &mut Parser<'a, T>, | ||
| expected_old: usize, | ||
| expected_new: usize, | ||
| hunk_start: usize, | ||
| ) -> Result<Vec<Line<'a, T>>> { | ||
| let mut lines: Vec<Line<'a, T>> = Vec::new(); | ||
| let mut no_newline_context = false; | ||
| let mut no_newline_delete = false; | ||
| let mut no_newline_insert = false; | ||
|
|
||
| let mut old_count = 0; | ||
| let mut new_count = 0; | ||
|
|
||
| while let Some(line) = parser.peek() { | ||
| let hunk_complete = old_count >= expected_old && new_count >= expected_new; | ||
|
|
||
| let line = if line.starts_with("@") { | ||
| break; | ||
| } else if no_newline_context { | ||
| if hunk_complete { | ||
| break; | ||
| } | ||
|
Comment on lines +279 to +281 (Contributor, Author): This kind of early return lets us stop worrying about stripping trailing garbage such as email signatures, which GNU patch is already resilient to. |
||
| return Err(parser.error(ParsePatchErrorKind::ExpectedEndOfHunk)); | ||
| } else if let Some(line) = line.strip_prefix(" ") { | ||
| if hunk_complete { | ||
| break; | ||
| } | ||
| Line::Context(line) | ||
| } else if line.starts_with("\n") { | ||
| if hunk_complete { | ||
| break; | ||
| } | ||
| Line::Context(*line) | ||
| } else if let Some(line) = line.strip_prefix("-") { | ||
| if no_newline_delete { | ||
| return Err(parser.error(ParsePatchErrorKind::TooManyDeletedLines)); | ||
| } | ||
| if hunk_complete { | ||
| break; | ||
| } | ||
| Line::Delete(line) | ||
| } else if let Some(line) = line.strip_prefix("+") { | ||
| if no_newline_insert { | ||
| return Err(parser.error(ParsePatchErrorKind::TooManyInsertedLines)); | ||
| } | ||
| if hunk_complete { | ||
| break; | ||
| } | ||
| Line::Insert(line) | ||
| } else if line.starts_with(NO_NEWLINE_AT_EOF) { | ||
| let last_line = lines | ||
| .pop() | ||
| .ok_or_else(|| parser.error(ParsePatchErrorKind::UnexpectedNoNewlineMarker))?; | ||
| match last_line { | ||
| let modified = match last_line { | ||
| Line::Context(line) => { | ||
| no_newline_context = true; | ||
| Line::Context(strip_newline(line)?) | ||
|
|
@@ -265,15 +323,38 @@ fn hunk_lines<'a, T: Text + ?Sized>(parser: &mut Parser<'a, T>) -> Result<Vec<Li | |
| no_newline_insert = true; | ||
| Line::Insert(strip_newline(line)?) | ||
| } | ||
| } | ||
| }; | ||
| lines.push(modified); | ||
| parser.next()?; | ||
| continue; | ||
| } else { | ||
| if hunk_complete { | ||
| break; | ||
| } | ||
| return Err(parser.error(ParsePatchErrorKind::UnexpectedHunkLine)); | ||
| }; | ||
|
|
||
| match &line { | ||
| Line::Context(_) => { | ||
| old_count += 1; | ||
| new_count += 1; | ||
| } | ||
| Line::Delete(_) => { | ||
| old_count += 1; | ||
| } | ||
| Line::Insert(_) => { | ||
| new_count += 1; | ||
| } | ||
| } | ||
|
|
||
| lines.push(line); | ||
| parser.next()?; | ||
| } | ||
|
|
||
| if old_count != expected_old || new_count != expected_new { | ||
| return Err(parser.error_at(ParsePatchErrorKind::HunkMismatch, hunk_start)); | ||
| } | ||
|
|
||
| Ok(lines) | ||
| } | ||
|
|
||
|
|
||
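For readers following the `hunk_lines` change, here is a minimal, self-contained sketch of the count-driven idea: stop reading a hunk body as soon as the old/new line counts from the `@@` header are satisfied, so trailing garbage is never interpreted as hunk lines. The `Line` enum and `read_hunk_body` below are hypothetical stand-ins written for illustration, not the crate's actual types, and the `\ No newline at end of file` handling from the diff is deliberately left out.

```rust
/// One parsed line of a hunk body (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Line<'a> {
    Context(&'a str),
    Delete(&'a str),
    Insert(&'a str),
}

/// Reads hunk body lines until the counts from the hunk header are met.
/// Returns the parsed lines and how many input lines were consumed.
fn read_hunk_body<'a>(
    input: &[&'a str],
    expected_old: usize,
    expected_new: usize,
) -> Result<(Vec<Line<'a>>, usize), String> {
    let mut lines = Vec::new();
    let mut old_count = 0usize;
    let mut new_count = 0usize;

    for &raw in input {
        // Once both counts are satisfied, whatever follows is not ours.
        if old_count >= expected_old && new_count >= expected_new {
            break;
        }
        let line = if raw.starts_with('@') {
            // Start of the next hunk header.
            break;
        } else if let Some(rest) = raw.strip_prefix(' ') {
            old_count += 1;
            new_count += 1;
            Line::Context(rest)
        } else if let Some(rest) = raw.strip_prefix('-') {
            old_count += 1;
            Line::Delete(rest)
        } else if let Some(rest) = raw.strip_prefix('+') {
            new_count += 1;
            Line::Insert(rest)
        } else {
            // Garbage in the middle of an unsatisfied hunk is still an error.
            return Err(format!("unexpected hunk line: {raw:?}"));
        };
        lines.push(line);
    }

    // The final check replaces the old post-hoc count comparison.
    if old_count != expected_old || new_count != expected_new {
        return Err(format!(
            "hunk mismatch: expected -{expected_old}/+{expected_new}, got -{old_count}/+{new_count}"
        ));
    }
    let consumed = lines.len();
    Ok((lines, consumed))
}
```

Note that an email signature delimiter such as `-- ` starts with `-`, so without the count check it would be parsed as a deleted line; with it, a hunk declared as `-2,2 +2,2` stops after its third body line and leaves the signature untouched.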
This PR is a mix of extractions of
We don't have tool compat tests right now, so it is hard to see how compatible these changes actually are with GNU patch and Git. You can check the tests in those pull requests to figure it out.
Your commit message mentions that this behavior matches GNU patch. How does git behave in these cases? Does it match GNU patch?
I suppose weihanglo#23 answers this question, in that git is stricter within a single "file". The question is: do we want to be stricter like git, or a bit more flexible like GNU patch in these cases? Thoughts, since you seem to have opted to be more flexible?
Yeah there is a table in that PR:
We could also be stricter or make it configurable. I forgot why we chose this. Perhaps because it is easier to implement when supporting both unidiff and gitdiff modes in multi-file patches. So, maybe configurable is better?
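To make the strict side of this trade-off concrete, here is a small sketch of the check described in the doc comment on `reject_orphaned_hunk_headers`: trailing junk is tolerated, but an `@@ ` line hiding behind that junk is treated as a lost hunk. The function names and the `String` error type are assumptions made for this example; the PR itself works on a `Parser` and returns `ParsePatchErrorKind::OrphanedHunkHeader`.

```rust
/// Returns the 1-based line number (within the trailing content) of the
/// first `@@ ` hunk header left over after the last complete hunk, if any.
fn find_orphaned_hunk_header(trailing: &str) -> Option<usize> {
    trailing
        .lines()
        .position(|line| line.starts_with("@@ "))
        .map(|idx| idx + 1)
}

/// Strict check: plain junk is accepted, a hidden hunk header is rejected.
/// A permissive parser would simply skip this check.
fn check_trailing_strict(trailing: &str) -> Result<(), String> {
    match find_orphaned_hunk_header(trailing) {
        Some(n) => Err(format!("orphaned hunk header at trailing line {n}")),
        None => Ok(()),
    }
}
```

In the permissive mode the same trailing content would be ignored entirely, which is the behavior the commit message attributes to GNU patch.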
Attached a more comprehensive comparison table in the PR description, btw.
Yeah, if it's not too much trouble, it might be nice to have a configurable option to be stricter? We can have it default to permissive.
Added `Patches::from_str_strict` and `Patches::from_bytes_strict` to match `git apply`'s stricter parsing rules. Let me know if this is a good API or not. (We could possibly have a `ParseOptions` struct, though not sure if we will go that far.)
Perfect! Yeah this works for now.
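For reference, if the `ParseOptions` idea is ever picked up, one possible shape is sketched below. Everything here is hypothetical: the type names, the `TrailingPolicy` enum, and the methods are illustrative assumptions, not an API that exists in the crate. The one property taken from the discussion is that the default stays permissive.

```rust
/// How to treat content left over after the last satisfied hunk
/// (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum TrailingPolicy {
    /// GNU patch-like: silently ignore trailing content.
    #[default]
    Ignore,
    /// git apply-like: reject an `@@ ` header hiding behind trailing junk.
    RejectOrphanedHunkHeaders,
}

/// Hypothetical options struct; defaults to the permissive behavior.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct ParseOptions {
    trailing: TrailingPolicy,
}

impl ParseOptions {
    /// Permissive options, identical to `ParseOptions::default()`.
    pub fn new() -> Self {
        Self::default()
    }

    /// Options matching the stricter parsing rules.
    pub fn strict() -> Self {
        Self {
            trailing: TrailingPolicy::RejectOrphanedHunkHeaders,
        }
    }

    pub fn trailing_policy(&self) -> TrailingPolicy {
        self.trailing
    }

    pub fn is_strict(&self) -> bool {
        self.trailing == TrailingPolicy::RejectOrphanedHunkHeaders
    }
}
```

Under this design the `*_strict` constructors could become thin wrappers that pass `ParseOptions::strict()` to a single options-taking entry point, which keeps room for further knobs without multiplying function names.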