Refactor the DiffTreeParser and upgrade to JUnit5#54
Conversation
7044124 to
5a542d9
Compare
|
I guess this PR builds on top of other PRs? Which PRs should we merge first? |
|
Ok then I will review this PR soon. |
pmbittner
left a comment
There was a problem hiding this comment.
The changes look very good. Thank you! I appreciate the simplifications in the parser and in the tests by upgrading to JUnit 5. The changes in the parser almost replace the parser completely, so it was not possible for me to check on it just from scrolling through big git diffs. The changes made a good impression though and having the test cases should give us more certainty in the correctness of the new parser than the old one. I found some documentation issues. I will also see how the new parser behaves within the ESEC/FSE22 validation soon, just to have a comparison.
c1c43cc to
413f0dd
Compare
|
The requested changes are now finished. There are also some new modifications of javadoc which I missed initially. The most notable change is the new commit for parsing variation trees from source files. |
|
Thank you for the changes. Only a few comments are left to discuss / resolve. :) |
413f0dd to
414524d
Compare
It's way easier to reason about line numbers as simple numbers which do not have to be copied explicitly.
It was only needed to invalidate the line numbers according to when a `DiffNode` exists, which can be done more easily during assignment of the line number. Note that this invalidation will always be correct because `DiffNode.diffType` is immutable.
In Java there is no way to ensure that a result value is used, so errors might be unhandled (there was at least one case). It also required kind of a weird programming style which doesn't really fit Java (See the `AtomicReference<DiffResult<DiffTree>>` for example). Since the widespread use of lambdas are checked exceptions also a bit problematic, but Apache commons helps here and makes it acceptable.
Line continuations are handled much more consistently by introducing `LogicalLine` and removing `MultiLineMacroParser` and `MultilineMacro`. Moreover, by fully merging logical lines before creating a new `DiffNode`, the `DiffNode` does no longer have to know if its a multi-line macro. Note that logical line is the actual term used by the C11 standard. Some parser functionality is move into the class `DiffTreeParser` to keep concentrate it and keep it private. This includes `NodeType.ofDiffLine` the class `DiffNodeParser`. In case of `DiffNodeParser` this also removes an unnecessary indirection between the `CPPAnnotationParser` and the `DiffTreeParser`. The `DiffTreeParser` now keeps some parsing state to accommodate the splitting of the parsing code into multiple, more coherent functions without passing the whole state around. This refactoring uncovered multiple new edge cases which got handled in one of the following ways: - The parsing is now aligned with the C11 standard. See the changed test case for example (empty lines are ignore in this test). - These edge cases are now rejected because they are invalid according to the C11 standard. - They are documented. This especially applies to macros inside of multi-line comments and line continuations or multi-line comments between `#` and a conditional macro name. Both issues are currently regarded as WONTFIX.
Without specifying the TikZ `align` property the new lines are ignored.
Most of the time, is shouldn't be necessary to ignore empty lines because they are collapsed into the same artifact node. This also uncovered weird test resources which where not valid diffs (they where missing the initial whitespace on empty lines).
This makes `maven test` consistent with Intellij.
This change makes it easy to identify failed test cases and rerun only some of them. Also, all these test cases are run independently while reducing the boiler plate for adding a new test case. Note: `mvn test` (the surefire plugin) doesn't really print nice parameter names if a test case fails. If you don't use an IDE with JUnit5 integration, it is possible to use the JUnit5 console launcher. For example, I create the following shell function: ```bash wget https://repo1.maven.org/maven2/org/junit/platform/junit-platform-console-standalone/1.9.1/junit-platform-console-standalone-1.9.1.jar junit() { mvn test-compile exec:exec -Dexec.classpathScope=test -Dexec.executable=java \ -Dexec.args="-classpath %classpath:'junit-platform-console-standalone-1.9.1.jar' org.junit.platform.console.ConsoleLauncher -e junit-jupiter -n '.*' ${*:- --scan-classpath %classpath}" } ``` This enables me to use `junit` to execute all tests and use all the test selectors provided by JUnit5 (see `junit --help` for details).
This helps in debugging and should reduce memory usage.
414524d to
fcfe954
Compare
|
Note: My last force push was to rebase this branch onto develop to resolve the conflicts. |
This heuristic should catch all diffs with modifications and only fail for very obscurely formatted source code (given the set of languages which are commonly parsed using DiffDetective). The warning is also logged for every line which starts with a plus or minus, resulting in many lines of the same log output if the given file is actually a diff which should make it easy to detect. Currently, this can't be disabled, but there might be a demand to disable it in the future.
The code for parsing
DiffTrees is now smaller (considering all the code removed in other classes). This is mostly due to the introduction of theLogicalLinewhich did make most of the other multi line macro parsing obsolete. This change also reflects the actual C11 standard more closely in nomenclature and how it should actually work.By using checked exceptions (and a little help of the failable interfaces of Apache commons) I simplified the error handling and force the caller of the diff tree parser to handle such errors (I found at least one issue where some method called by the parser could return an error which was not handled). I think in this case checked exceptions are still the better alternative than
RuntimeExceptions or forcing monadic (Haskell style) error handling onto Java.Warning: This refactoring does introduce breaking changes, but this should be limited to edge cases which are now parsed conforming to the C11 standard or rejected if they are represent an invalid program (before or after the change). There are also some edge cases (which are hopefully very rare in practice) which are not handled correctly by this parser (the old code didn't either). Examples include:
# \ ifdef A #endif # /* Comment */ ifdef A #endif /* #ifdef Comments are removed before the preprocessor is run. */Most of the time such programs are rejected by the parser, but there are cases where this isn't the case, for example:
/* #ifdef A This code is considered as annotated by Feature A, although it isn't according to C11. #endif */For easier testing I upgraded JUnit4 to JUnit5. This enables us to use features like parameterized tests which I used to replace loops over different test cases. This makes such test cases independent (if some fail, the others are still run), prints the specific test case which actually caused a test failure and allows the selection of one such specific test case for the next run, for example to debug it.
Furthermore the
ExportTest(using Graphviz) is now disabled if Graphviz can't be executed.During testing I found a bug in the
GraphvizExporter: The escaping didn't work correctly (java threw an exception ifGraphvizExporter.escapeencountered a backshlash). This is fixed by 478c8b4.