Skip to content

Refactor the DiffTreeParser and upgrade to JUnit5#54

Merged
pmbittner merged 17 commits into
developfrom
benjamin/parser-refactoring
Oct 25, 2022
Merged

Refactor the DiffTreeParser and upgrade to JUnit5#54
pmbittner merged 17 commits into
developfrom
benjamin/parser-refactoring

Conversation

@ibbem
Copy link
Copy Markdown
Collaborator

@ibbem ibbem commented Oct 16, 2022

The code for parsing DiffTrees is now smaller (considering all the code removed in other classes). This is mostly due to the introduction of the LogicalLine which did make most of the other multi line macro parsing obsolete. This change also reflects the actual C11 standard more closely in nomenclature and how it should actually work.
By using checked exceptions (and a little help of the failable interfaces of Apache commons) I simplified the error handling and force the caller of the diff tree parser to handle such errors (I found at least one issue where some method called by the parser could return an error which was not handled). I think in this case checked exceptions are still the better alternative than RuntimeExceptions or forcing monadic (Haskell style) error handling onto Java.

Warning: This refactoring does introduce breaking changes, but this should be limited to edge cases which are now parsed conforming to the C11 standard or rejected if they are represent an invalid program (before or after the change). There are also some edge cases (which are hopefully very rare in practice) which are not handled correctly by this parser (the old code didn't either). Examples include:

 # \
    ifdef A
 #endif

 # /*
   Comment
   */ ifdef A
 #endif

/*
  #ifdef Comments are removed before the preprocessor is run.
*/

Most of the time such programs are rejected by the parser, but there are cases where this isn't the case, for example:

 /*
   #ifdef A
      This code is considered as annotated by Feature A, although it isn't according to C11.
   #endif
 */

For easier testing I upgraded JUnit4 to JUnit5. This enables us to use features like parameterized tests which I used to replace loops over different test cases. This makes such test cases independent (if some fail, the others are still run), prints the specific test case which actually caused a test failure and allows the selection of one such specific test case for the next run, for example to debug it.
Furthermore the ExportTest (using Graphviz) is now disabled if Graphviz can't be executed.

During testing I found a bug in the GraphvizExporter: The escaping didn't work correctly (java threw an exception if GraphvizExporter.escape encountered a backshlash). This is fixed by 478c8b4.

@ibbem ibbem requested a review from pmbittner October 16, 2022 12:12
@ibbem ibbem force-pushed the benjamin/parser-refactoring branch 4 times, most recently from 7044124 to 5a542d9 Compare October 16, 2022 13:03
@pmbittner
Copy link
Copy Markdown
Member

I guess this PR builds on top of other PRs? Which PRs should we merge first?

@ibbem
Copy link
Copy Markdown
Collaborator Author

ibbem commented Oct 17, 2022

No #53 and this PR are independent. There are a some merge conflicts in imports and tests but nothing too complicated. If you don't want to deal with these conflicts you can wait until #53 is merged so I can rebase this PR.

@pmbittner
Copy link
Copy Markdown
Member

Ok then I will review this PR soon.

@ibbem ibbem mentioned this pull request Oct 18, 2022
@pmbittner pmbittner added enhancement New feature or request Refactoring labels Oct 19, 2022
Copy link
Copy Markdown
Member

@pmbittner pmbittner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look very good. Thank you! I appreciate the simplifications in the parser and in the tests by upgrading to JUnit 5. The changes in the parser almost replace the parser completely, so it was not possible for me to check on it just from scrolling through big git diffs. The changes made a good impression though and having the test cases should give us more certainty in the correctness of the new parser than the old one. I found some documentation issues. I will also see how the new parser behaves within the ESEC/FSE22 validation soon, just to have a comparison.

Comment thread src/main/java/org/variantsync/diffdetective/analysis/AnalysisResult.java Outdated
Comment thread src/main/java/org/variantsync/diffdetective/diff/GitDiffer.java
Comment thread src/main/java/org/variantsync/diffdetective/diff/difftree/parse/LogicalLine.java Outdated
Comment thread src/test/java/ExportTest.java
@ibbem ibbem force-pushed the benjamin/parser-refactoring branch 3 times, most recently from c1c43cc to 413f0dd Compare October 22, 2022 10:04
@ibbem
Copy link
Copy Markdown
Collaborator Author

ibbem commented Oct 22, 2022

The requested changes are now finished. There are also some new modifications of javadoc which I missed initially. The most notable change is the new commit for parsing variation trees from source files.

@ibbem ibbem requested a review from pmbittner October 22, 2022 10:07
@pmbittner
Copy link
Copy Markdown
Member

Thank you for the changes. Only a few comments are left to discuss / resolve. :)

@ibbem ibbem force-pushed the benjamin/parser-refactoring branch from 413f0dd to 414524d Compare October 25, 2022 10:06
ibbem added 7 commits October 25, 2022 12:11
It's way easier to reason about line numbers as simple numbers which do
not have to be copied explicitly.
It was only needed to invalidate the line numbers according to when a
`DiffNode` exists, which can be done more easily during assignment of
the line number. Note that this invalidation will always be correct
because `DiffNode.diffType` is immutable.
In Java there is no way to ensure that a result value is used, so errors
might be unhandled (there was at least one case). It also required kind
of a weird programming style which doesn't really fit Java (See the
`AtomicReference<DiffResult<DiffTree>>` for example). Since the
widespread use of lambdas are checked exceptions also a bit problematic,
but Apache commons helps here and makes it acceptable.
Line continuations are handled much more consistently by introducing
`LogicalLine` and removing `MultiLineMacroParser` and `MultilineMacro`.
Moreover, by fully merging logical lines before creating a new
`DiffNode`, the `DiffNode` does no longer have to know if its a
multi-line macro. Note that logical line is the actual term used by the
C11 standard.

Some parser functionality is move into the class `DiffTreeParser` to
keep concentrate it and keep it private. This includes
`NodeType.ofDiffLine` the class `DiffNodeParser`. In case of
`DiffNodeParser` this also removes an unnecessary indirection between
the `CPPAnnotationParser` and the `DiffTreeParser`.

The `DiffTreeParser` now keeps some parsing state to accommodate the
splitting of the parsing code into multiple, more coherent functions
without passing the whole state around.

This refactoring uncovered multiple new edge cases which got handled in
one of the following ways:
- The parsing is now aligned with the C11 standard. See the changed test
  case for example (empty lines are ignore in this test).
- These edge cases are now rejected because they are invalid according
  to the C11 standard.
- They are documented.
  This especially applies to macros inside of multi-line comments and
  line continuations or multi-line comments between `#` and a
  conditional macro name. Both issues are currently regarded as WONTFIX.
ibbem added 9 commits October 25, 2022 12:14
Without specifying the TikZ `align` property the new lines are ignored.
Most of the time, is shouldn't be necessary to ignore empty lines
because they are collapsed into the same artifact node. This also
uncovered weird test resources which where not valid diffs (they where
missing the initial whitespace on empty lines).
This makes `maven test` consistent with Intellij.
This change makes it easy to identify failed test cases and rerun only
some of them. Also, all these test cases are run independently while
reducing the boiler plate for adding a new test case.

Note: `mvn test` (the surefire plugin) doesn't really print nice
parameter names if a test case fails. If you don't use an IDE with
JUnit5 integration, it is possible to use the JUnit5 console launcher.
For example, I create the following shell function:
```bash
wget https://repo1.maven.org/maven2/org/junit/platform/junit-platform-console-standalone/1.9.1/junit-platform-console-standalone-1.9.1.jar
junit() {
  mvn test-compile exec:exec -Dexec.classpathScope=test -Dexec.executable=java \
    -Dexec.args="-classpath %classpath:'junit-platform-console-standalone-1.9.1.jar' org.junit.platform.console.ConsoleLauncher -e junit-jupiter -n '.*' ${*:- --scan-classpath %classpath}"
}
```
This enables me to use `junit` to execute all tests and use all the test
selectors provided by JUnit5 (see `junit --help` for details).
This helps in debugging and should reduce memory usage.
@pmbittner pmbittner added the bm_work ibbem is paid for working on this label Oct 25, 2022
@ibbem ibbem force-pushed the benjamin/parser-refactoring branch from 414524d to fcfe954 Compare October 25, 2022 14:57
@ibbem
Copy link
Copy Markdown
Collaborator Author

ibbem commented Oct 25, 2022

Note: My last force push was to rebase this branch onto develop to resolve the conflicts.

This heuristic should catch all diffs with modifications and only fail
for very obscurely formatted source code (given the set of languages
which are commonly parsed using DiffDetective). The warning is also
logged for every line which starts with a plus or minus, resulting in
many lines of the same log output if the given file is actually a diff
which should make it easy to detect.

Currently, this can't be disabled, but there might be a demand to
disable it in the future.
@pmbittner pmbittner merged commit 47712a1 into develop Oct 25, 2022
@pmbittner pmbittner deleted the benjamin/parser-refactoring branch November 2, 2022 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bm_work ibbem is paid for working on this enhancement New feature or request Refactoring

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants