Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions src/patch/error.rs
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is a mix of extractions of

We don't have tool compat tests right now, so it is hard to see the actual compatibility between these changes against GNU patch and Git. You can check tests in those pull requests to figure.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your commit message mentions that this behavior matches GNU patch, how does git behave in these cases? Does it match GNU patch?

I suppose weihanglo#23 answers this question, in that git is more strict within a single "file". The question is do we want to be more strict like git or a bit more flexible like GNU patch in these cases? Thoughts, since you seemed to opt to be more flexible?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah there is a table in that PR:

Junk Between Hunks vs Between Files

Scenario GNU patch git apply diffy
Junk between hunks (same file) ✅ Ignores trailing, applies first hunk only ❌ Errors ✅ Ignores trailing, applies first hunk only
Junk between files ✅ Treats as preamble ✅ Treats as preamble ✅ Treats as preamble
Trailing junk at end ✅ Ignores ✅ Ignores ✅ Ignores

diffy matches GNU patch behavior.
git apply is stricter (errors on junk between hunks).

We could also be stricter or make it configurable. I forgot why we chose this. Perhaps because it is easier to implement to support both unidiff and gitdiff mode in multi-file patches. So, maybe configurable is better?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attached a more comprehensive comparison table in PR description btw.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah if its not too much trouble maybe nice to have the option to be more strict configurable? We can have it default to be permissive.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added Patches::from_str_strict and Patches::from_bytes_strict to match git apply's stricter parsing rules. Let me know if this is a good API or not.

(We could possibly have a ParseOptions struct though not sure if we will go that far)

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect! Yeah this works for now.

Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,9 @@ pub(crate) enum ParsePatchErrorKind {

/// Missing newline at end of line.
MissingNewline,

/// Orphaned hunk header found after trailing content.
OrphanedHunkHeader,
}

impl fmt::Display for ParsePatchErrorKind {
Expand All @@ -132,6 +135,7 @@ impl fmt::Display for ParsePatchErrorKind {
Self::UnexpectedNoNewlineMarker => "unexpected 'No newline at end of file' line",
Self::UnexpectedHunkLine => "unexpected line in hunk body",
Self::MissingNewline => "missing newline",
Self::OrphanedHunkHeader => "orphaned hunk header after trailing content",
};
write!(f, "{msg}")
}
Expand Down
42 changes: 42 additions & 0 deletions src/patch/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,30 @@ use crate::utils::{byte_needs_quoting, fmt_escaped_byte, write_escaped_byte};
const NO_NEWLINE_AT_EOF: &str = "\\ No newline at end of file";

/// Representation of all the differences between two files
///
/// # Parsing modes
///
/// `Patch` provides two parsing modes with different strictness levels,
/// modeled after the behavior of GNU patch and `git apply`:
///
/// | Scenario | GNU patch | git apply | [`from_str`] | [`from_str_strict`] |
/// |-----------------------------------|-------------|-----------|--------------|---------------------|
/// | Junk after all hunks are complete | Ignores | Ignores | Ignores | Ignores |
/// | Junk between hunks | Ignores[^1] | Errors | Ignores[^1] | Errors |
///
/// [^1]: "Ignores" here means silently stopping at the junk.
/// Only hunks before it are parsed; later hunks are dropped.
///
/// [`from_str`] and [`from_bytes`] follow GNU patch behavior,
/// silently ignoring non-patch content after a hunk's line counts are satisfied.
///
/// [`from_str_strict`] and [`from_bytes_strict`] follow `git apply` behavior,
/// additionally rejecting orphaned hunk headers hidden behind trailing content.
///
/// [`from_str`]: Patch::from_str
/// [`from_bytes`]: Patch::from_bytes
/// [`from_str_strict`]: Patch::from_str_strict
/// [`from_bytes_strict`]: Patch::from_bytes_strict
#[derive(PartialEq, Eq)]
pub struct Patch<'a, T: ToOwned + ?Sized> {
// TODO GNU patch is able to parse patches without filename headers.
Expand Down Expand Up @@ -108,13 +132,31 @@ impl<'a> Patch<'a, str> {
pub fn from_str(s: &'a str) -> Result<Patch<'a, str>, ParsePatchError> {
parse::parse(s)
}

/// Parse a `Patch` from a string in strict mode
///
/// Unlike [`Patch::from_str`],
/// this rejects orphaned hunk headers hidden after trailing content,
/// matching `git apply` behavior.
pub fn from_str_strict(s: &'a str) -> Result<Patch<'a, str>, ParsePatchError> {
parse::parse_strict(s)
}
}

impl<'a> Patch<'a, [u8]> {
/// Parse a `Patch` from bytes
pub fn from_bytes(s: &'a [u8]) -> Result<Patch<'a, [u8]>, ParsePatchError> {
parse::parse_bytes(s)
}

/// Parse a `Patch` from bytes in strict mode
///
/// Unlike [`Patch::from_bytes`],
/// this rejects orphaned hunk headers hidden after trailing content,
/// matching `git apply` behavior.
pub fn from_bytes_strict(s: &'a [u8]) -> Result<Patch<'a, [u8]>, ParsePatchError> {
parse::parse_bytes_strict(s)
}
}

impl<T: ToOwned + ?Sized> Clone for Patch<'_, T> {
Expand Down
103 changes: 92 additions & 11 deletions src/patch/parse.rs
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,19 @@ pub fn parse(input: &str) -> Result<Patch<'_, str>> {
))
}

pub fn parse_strict(input: &str) -> Result<Patch<'_, str>> {
let mut parser = Parser::new(input);
let header = patch_header(&mut parser)?;
let hunks = hunks(&mut parser)?;
reject_orphaned_hunk_headers(&mut parser)?;

Ok(Patch::new(
header.0.map(convert_cow_to_str),
header.1.map(convert_cow_to_str),
hunks,
))
}

pub fn parse_bytes(input: &[u8]) -> Result<Patch<'_, [u8]>> {
let mut parser = Parser::new(input);
let header = patch_header(&mut parser)?;
Expand All @@ -73,6 +86,15 @@ pub fn parse_bytes(input: &[u8]) -> Result<Patch<'_, [u8]>> {
Ok(Patch::new(header.0, header.1, hunks))
}

pub fn parse_bytes_strict(input: &[u8]) -> Result<Patch<'_, [u8]>> {
let mut parser = Parser::new(input);
let header = patch_header(&mut parser)?;
let hunks = hunks(&mut parser)?;
reject_orphaned_hunk_headers(&mut parser)?;

Ok(Patch::new(header.0, header.1, hunks))
}

// This is only used when the type originated as a utf8 string
fn convert_cow_to_str(cow: Cow<'_, [u8]>) -> Cow<'_, str> {
match cow {
Expand Down Expand Up @@ -154,9 +176,26 @@ fn verify_hunks_in_order<T: ?Sized>(hunks: &[Hunk<'_, T>]) -> bool {
true
}

/// Scans remaining lines for orphaned `@@ ` hunk headers.
///
/// In strict mode (git-apply behavior), trailing junk is allowed but
/// an `@@ ` line hiding behind that junk indicates a lost hunk.
fn reject_orphaned_hunk_headers<T: Text + ?Sized>(parser: &mut Parser<'_, T>) -> Result<()> {
while let Some(line) = parser.peek() {
if line.starts_with("@@ ") {
return Err(parser.error(ParsePatchErrorKind::OrphanedHunkHeader));
}
parser.next()?;
}
Ok(())
}

fn hunks<'a, T: Text + ?Sized>(parser: &mut Parser<'a, T>) -> Result<Vec<Hunk<'a, T>>> {
let mut hunks = Vec::new();
while parser.peek().is_some() {
// Following GNU patch behavior: stop at non-@@ content.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// Any trailing content (including hidden @@ headers) is silently ignored.
// This is more permissive than git apply, which errors on junk between hunks.
while parser.peek().is_some_and(|line| line.starts_with("@@ ")) {
hunks.push(hunk(parser)?);
}

Expand All @@ -173,13 +212,7 @@ fn hunk<'a, T: Text + ?Sized>(parser: &mut Parser<'a, T>) -> Result<Hunk<'a, T>>
let header_line = parser.next()?;
let (range1, range2, function_context) =
hunk_header(header_line).map_err(|e| parser.error_at(e.kind, hunk_start))?;
let lines = hunk_lines(parser)?;

// check counts of lines to see if they match the ranges in the hunk header
let (len1, len2) = super::hunk_lines_count(&lines);
if len1 != range1.len || len2 != range2.len {
return Err(parser.error_at(ParsePatchErrorKind::HunkMismatch, hunk_start));
}
let lines = hunk_lines(parser, range1.len, range2.len, hunk_start)?;

Ok(Hunk::new(range1, range2, function_context, lines))
}
Expand Down Expand Up @@ -223,36 +256,61 @@ fn range<T: Text + ?Sized>(s: &T) -> Result<HunkRange> {
Ok(HunkRange::new(start, len))
}

fn hunk_lines<'a, T: Text + ?Sized>(parser: &mut Parser<'a, T>) -> Result<Vec<Line<'a, T>>> {
fn hunk_lines<'a, T: Text + ?Sized>(
parser: &mut Parser<'a, T>,
expected_old: usize,
expected_new: usize,
hunk_start: usize,
) -> Result<Vec<Line<'a, T>>> {
let mut lines: Vec<Line<'a, T>> = Vec::new();
let mut no_newline_context = false;
let mut no_newline_delete = false;
let mut no_newline_insert = false;

let mut old_count = 0;
let mut new_count = 0;

while let Some(line) = parser.peek() {
let hunk_complete = old_count >= expected_old && new_count >= expected_new;

let line = if line.starts_with("@") {
break;
} else if no_newline_context {
if hunk_complete {
break;
}
Comment on lines +279 to +281
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kinda of early return would help us stop caring stripping garbage like email signature, which GNU patch is resilient to that.

weihanglo@7d0acc3

return Err(parser.error(ParsePatchErrorKind::ExpectedEndOfHunk));
} else if let Some(line) = line.strip_prefix(" ") {
if hunk_complete {
break;
}
Line::Context(line)
} else if line.starts_with("\n") {
if hunk_complete {
break;
}
Line::Context(*line)
} else if let Some(line) = line.strip_prefix("-") {
if no_newline_delete {
return Err(parser.error(ParsePatchErrorKind::TooManyDeletedLines));
}
if hunk_complete {
break;
}
Line::Delete(line)
} else if let Some(line) = line.strip_prefix("+") {
if no_newline_insert {
return Err(parser.error(ParsePatchErrorKind::TooManyInsertedLines));
}
if hunk_complete {
break;
}
Line::Insert(line)
} else if line.starts_with(NO_NEWLINE_AT_EOF) {
let last_line = lines
.pop()
.ok_or_else(|| parser.error(ParsePatchErrorKind::UnexpectedNoNewlineMarker))?;
match last_line {
let modified = match last_line {
Line::Context(line) => {
no_newline_context = true;
Line::Context(strip_newline(line)?)
Expand All @@ -265,15 +323,38 @@ fn hunk_lines<'a, T: Text + ?Sized>(parser: &mut Parser<'a, T>) -> Result<Vec<Li
no_newline_insert = true;
Line::Insert(strip_newline(line)?)
}
}
};
lines.push(modified);
parser.next()?;
continue;
} else {
if hunk_complete {
break;
}
return Err(parser.error(ParsePatchErrorKind::UnexpectedHunkLine));
};

match &line {
Line::Context(_) => {
old_count += 1;
new_count += 1;
}
Line::Delete(_) => {
old_count += 1;
}
Line::Insert(_) => {
new_count += 1;
}
}

lines.push(line);
parser.next()?;
}

if old_count != expected_old || new_count != expected_new {
return Err(parser.error_at(ParsePatchErrorKind::HunkMismatch, hunk_start));
}

Ok(lines)
}

Expand Down
Loading