Refactor the DiffTreeParser and upgrade to JUnit5 by ibbem · Pull Request #54 · VariantSync/DiffDetective

ibbem · 2022-10-16T12:12:57Z

The code for parsing DiffTrees is now smaller (considering all the code removed in other classes). This is mostly due to the introduction of the LogicalLine which did make most of the other multi line macro parsing obsolete. This change also reflects the actual C11 standard more closely in nomenclature and how it should actually work.
By using checked exceptions (and a little help of the failable interfaces of Apache commons) I simplified the error handling and force the caller of the diff tree parser to handle such errors (I found at least one issue where some method called by the parser could return an error which was not handled). I think in this case checked exceptions are still the better alternative than RuntimeExceptions or forcing monadic (Haskell style) error handling onto Java.

Warning: This refactoring does introduce breaking changes, but this should be limited to edge cases which are now parsed conforming to the C11 standard or rejected if they are represent an invalid program (before or after the change). There are also some edge cases (which are hopefully very rare in practice) which are not handled correctly by this parser (the old code didn't either). Examples include:

 # \
    ifdef A
 #endif

 # /*
   Comment
   */ ifdef A
 #endif

/*
  #ifdef Comments are removed before the preprocessor is run.
*/

Most of the time such programs are rejected by the parser, but there are cases where this isn't the case, for example:

 /*
   #ifdef A
      This code is considered as annotated by Feature A, although it isn't according to C11.
   #endif
 */

For easier testing I upgraded JUnit4 to JUnit5. This enables us to use features like parameterized tests which I used to replace loops over different test cases. This makes such test cases independent (if some fail, the others are still run), prints the specific test case which actually caused a test failure and allows the selection of one such specific test case for the next run, for example to debug it.
Furthermore the ExportTest (using Graphviz) is now disabled if Graphviz can't be executed.

During testing I found a bug in the GraphvizExporter: The escaping didn't work correctly (java threw an exception if GraphvizExporter.escape encountered a backshlash). This is fixed by 478c8b4.

pmbittner · 2022-10-17T07:41:38Z

I guess this PR builds on top of other PRs? Which PRs should we merge first?

ibbem · 2022-10-17T10:16:51Z

No #53 and this PR are independent. There are a some merge conflicts in imports and tests but nothing too complicated. If you don't want to deal with these conflicts you can wait until #53 is merged so I can rebase this PR.

pmbittner · 2022-10-18T11:46:58Z

Ok then I will review this PR soon.

pmbittner

The changes look very good. Thank you! I appreciate the simplifications in the parser and in the tests by upgrading to JUnit 5. The changes in the parser almost replace the parser completely, so it was not possible for me to check on it just from scrolling through big git diffs. The changes made a good impression though and having the test cases should give us more certainty in the correctness of the new parser than the old one. I found some documentation issues. I will also see how the new parser behaves within the ESEC/FSE22 validation soon, just to have a comparison.

ibbem · 2022-10-22T10:07:22Z

The requested changes are now finished. There are also some new modifications of javadoc which I missed initially. The most notable change is the new commit for parsing variation trees from source files.

pmbittner · 2022-10-24T08:59:15Z

Thank you for the changes. Only a few comments are left to discuss / resolve. :)

It's way easier to reason about line numbers as simple numbers which do not have to be copied explicitly.

It was only needed to invalidate the line numbers according to when a `DiffNode` exists, which can be done more easily during assignment of the line number. Note that this invalidation will always be correct because `DiffNode.diffType` is immutable.

In Java there is no way to ensure that a result value is used, so errors might be unhandled (there was at least one case). It also required kind of a weird programming style which doesn't really fit Java (See the `AtomicReference<DiffResult<DiffTree>>` for example). Since the widespread use of lambdas are checked exceptions also a bit problematic, but Apache commons helps here and makes it acceptable.

Line continuations are handled much more consistently by introducing `LogicalLine` and removing `MultiLineMacroParser` and `MultilineMacro`. Moreover, by fully merging logical lines before creating a new `DiffNode`, the `DiffNode` does no longer have to know if its a multi-line macro. Note that logical line is the actual term used by the C11 standard. Some parser functionality is move into the class `DiffTreeParser` to keep concentrate it and keep it private. This includes `NodeType.ofDiffLine` the class `DiffNodeParser`. In case of `DiffNodeParser` this also removes an unnecessary indirection between the `CPPAnnotationParser` and the `DiffTreeParser`. The `DiffTreeParser` now keeps some parsing state to accommodate the splitting of the parsing code into multiple, more coherent functions without passing the whole state around. This refactoring uncovered multiple new edge cases which got handled in one of the following ways: - The parsing is now aligned with the C11 standard. See the changed test case for example (empty lines are ignore in this test). - These edge cases are now rejected because they are invalid according to the C11 standard. - They are documented. This especially applies to macros inside of multi-line comments and line continuations or multi-line comments between `#` and a conditional macro name. Both issues are currently regarded as WONTFIX.

Without specifying the TikZ `align` property the new lines are ignored.

Most of the time, is shouldn't be necessary to ignore empty lines because they are collapsed into the same artifact node. This also uncovered weird test resources which where not valid diffs (they where missing the initial whitespace on empty lines).

This makes `maven test` consistent with Intellij.

This change makes it easy to identify failed test cases and rerun only some of them. Also, all these test cases are run independently while reducing the boiler plate for adding a new test case. Note: `mvn test` (the surefire plugin) doesn't really print nice parameter names if a test case fails. If you don't use an IDE with JUnit5 integration, it is possible to use the JUnit5 console launcher. For example, I create the following shell function: ```bash wget https://repo1.maven.org/maven2/org/junit/platform/junit-platform-console-standalone/1.9.1/junit-platform-console-standalone-1.9.1.jar junit() { mvn test-compile exec:exec -Dexec.classpathScope=test -Dexec.executable=java \ -Dexec.args="-classpath %classpath:'junit-platform-console-standalone-1.9.1.jar' org.junit.platform.console.ConsoleLauncher -e junit-jupiter -n '.*' ${*:- --scan-classpath %classpath}" } ``` This enables me to use `junit` to execute all tests and use all the test selectors provided by JUnit5 (see `junit --help` for details).

This helps in debugging and should reduce memory usage.

ibbem · 2022-10-25T15:00:08Z

Note: My last force push was to rebase this branch onto develop to resolve the conflicts.

This heuristic should catch all diffs with modifications and only fail for very obscurely formatted source code (given the set of languages which are commonly parsed using DiffDetective). The warning is also logged for every line which starts with a plus or minus, resulting in many lines of the same log output if the given file is actually a diff which should make it easy to detect. Currently, this can't be disabled, but there might be a demand to disable it in the future.

ibbem requested a review from pmbittner October 16, 2022 12:12

ibbem mentioned this pull request Oct 16, 2022

Update Main with New Features and Artifact Camera-Ready Changes for ESEC/FSE'22 #52

Merged

ibbem force-pushed the benjamin/parser-refactoring branch 4 times, most recently from 7044124 to 5a542d9 Compare October 16, 2022 13:03

ibbem mentioned this pull request Oct 17, 2022

Various refactorings mainly focused on the LineGraph exporter #53

Merged

pmbittner assigned ibbem Oct 18, 2022

ibbem mentioned this pull request Oct 18, 2022

Issue 47 #50

Merged

pmbittner added enhancement New feature or request Refactoring labels Oct 19, 2022

pmbittner requested changes Oct 19, 2022

View reviewed changes

ibbem force-pushed the benjamin/parser-refactoring branch 3 times, most recently from c1c43cc to 413f0dd Compare October 22, 2022 10:04

ibbem requested a review from pmbittner October 22, 2022 10:07

pmbittner requested changes Oct 24, 2022

View reviewed changes

Comment thread src/main/java/org/variantsync/diffdetective/diff/difftree/parse/DiffTreeParser.java

ibbem force-pushed the benjamin/parser-refactoring branch from 413f0dd to 414524d Compare October 25, 2022 10:06

ibbem added 7 commits October 25, 2022 12:11

Refactor DiffLineNumber to a record

092cbe5

It's way easier to reason about line numbers as simple numbers which do not have to be copied explicitly.

Transform DiffError to an enum

3ddcab7

Use Path instead of String in TikzExporter

29bd17f

Properly quote the replacement for Graphiz escaping

8dbe834

ibbem added 9 commits October 25, 2022 12:14

Fix multi line labels in TikZ examples

800882c

Without specifying the TikZ `align` property the new lines are ignored.

Do not ignore empty lines in tests

2567dca

Most of the time, is shouldn't be necessary to ignore empty lines because they are collapsed into the same artifact node. This also uncovered weird test resources which where not valid diffs (they where missing the initial whitespace on empty lines).

Create tests for DiffTreeParser

55f9e70

Upgrade from JUnit4 to JUnit5

97782b4

Do not exclude tests by their class name

0d6fefe

This makes `maven test` consistent with Intellij.

Disable tests depending on Graphviz if its not installed

c753661

Keep a copy of failed line graph export tests

e5a8b5c

This helps in debugging and should reduce memory usage.

Create a method for parsing variation trees

fcfe954

pmbittner added the bm_work ibbem is paid for working on this label Oct 25, 2022

ibbem force-pushed the benjamin/parser-refactoring branch from 414524d to fcfe954 Compare October 25, 2022 14:57

ibbem mentioned this pull request Oct 25, 2022

Remove the Fragile* interfaces in favor of the Failable* interfaces of Apache Commons Lang3 VariantSync/Functjonal#4

Closed

ibbem requested a review from pmbittner October 25, 2022 15:58

pmbittner approved these changes Oct 25, 2022

View reviewed changes

pmbittner merged commit 47712a1 into develop Oct 25, 2022

pmbittner deleted the benjamin/parser-refactoring branch November 2, 2022 09:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor the DiffTreeParser and upgrade to JUnit5#54

Refactor the DiffTreeParser and upgrade to JUnit5#54
pmbittner merged 17 commits into
developfrom
benjamin/parser-refactoring

ibbem commented Oct 16, 2022 •

edited

Loading

Uh oh!

pmbittner commented Oct 17, 2022

Uh oh!

ibbem commented Oct 17, 2022

Uh oh!

pmbittner commented Oct 18, 2022

Uh oh!

pmbittner left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ibbem commented Oct 22, 2022

Uh oh!

Uh oh!

pmbittner commented Oct 24, 2022

Uh oh!

ibbem commented Oct 25, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ibbem commented Oct 16, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pmbittner commented Oct 17, 2022

Uh oh!

ibbem commented Oct 17, 2022

Uh oh!

pmbittner commented Oct 18, 2022

Uh oh!

pmbittner left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ibbem commented Oct 22, 2022

Uh oh!

Uh oh!

pmbittner commented Oct 24, 2022

Uh oh!

ibbem commented Oct 25, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ibbem commented Oct 16, 2022 •

edited

Loading