Integration of LB Thesis#63
node duplication has errors => fromID() seems to have errors
Merged commits from the main repo
fix: remove brackets around single words in boolean abstraction
These functions might be useful for other use cases as well and fit into these classes.
A repository is either processed or it's skipped but not both.
This is a more specific name and enables the use of `Validation` for a common set of actions for validations.
This saves some intermediate data structures by using more of `Stream`s potential, but most changes are only cosmetic.
Usages of the constructors with all arguments should use the default constructor and assign the required values afterwards.
As requested, I merged develop into this branch.
@pmbittner ping for review
There are some behavioral changes:
- The temporary fix for renaming from Unchanged to Untouched has been removed.
- There are two more metadata keys: `exportedCommits` and `exportedTrees`. These were previously called `processedCommits` and `processedTrees` and were used with different meanings in the `DiffTreeMiner` and the validations, which caused a bug that incremented these counters twice. That was probably my mistake, introduced during refactoring or merging.
- The order of metadata snapshots has probably changed.
pmbittner left a comment
Hi @ibbem ,
thanks for the heavy refactoring of the analyses framework. It looks much better now. It was not easy to review it given the extensive amount of changes and the still complicated analyses we perform.
Two major things I noticed are:
1.) I did not review the FeatureSplit parts yet. I have to dig into those in the next weeks. Can you move all the new FeatureSplit related classes to dedicated sub-packages so that we can easily identify them after the merge? What is your opinion on the location of the new code?
2.) Please add some documentation (javadoc) to the new analyses framework. It is still complicated and guidance for developers on how to properly use it is missing.
After that we can merge. Thank you. :)
    while (hook.hasNext()) {
        if (!callHook.apply(hook.next(), this)) {
            return false;
        }
FilterHooks should be side-effect free but they aren't. For example, `LineGraphExportAnalysis::onParsedCommit` has a side effect. When exiting here as soon as some filter says "no", we would omit running the remaining side effects = bug! So I would argue to just run all hooks and make a big AND of their return values.
No. Filter hooks are not required to be side-effect free. I will explain this in the upcoming documentation.
As discussed in our last meeting, the current behaviour is fine but should be documented.
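For the upcoming documentation, the two behaviours discussed in this thread can be contrasted in a minimal sketch. It does not use the project's hook interfaces: `HookRunner`, `runShortCircuit` and `runAll` are hypothetical names, and `Predicate` stands in for a filter hook.

```java
import java.util.Iterator;
import java.util.function.Predicate;

/** Hypothetical helper for illustration only; not part of the analysis framework. */
final class HookRunner {
    /** Current behaviour: stops at the first hook that returns false, so later hooks are never called. */
    static <T> boolean runShortCircuit(Iterable<Predicate<T>> hooks, T input) {
        Iterator<Predicate<T>> hook = hooks.iterator();
        while (hook.hasNext()) {
            if (!hook.next().test(input)) {
                return false;
            }
        }
        return true;
    }

    /** Proposed alternative: calls every hook and combines the results with a logical AND. */
    static <T> boolean runAll(Iterable<Predicate<T>> hooks, T input) {
        boolean result = true;
        for (Predicate<T> hook : hooks) {
            // Non-short-circuit '&': the hook is evaluated even if result is already false.
            result = hook.test(input) & result;
        }
        return result;
    }
}
```

With the short-circuit variant, a hook that has side effects only runs if every hook before it accepted the input, which is the behaviour that needs to be documented.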
    import java.util.stream.Collectors;

    /** Accumulates multiple {@link AnalysisResult}s of several datasets. */
    public class MiningResultAccumulator {
Rename to AnalysesResultAccumulator?
I don't think this is a better name for the current implementation as it doesn't accumulate all results but only ExplainedFilterSummary, EditClassCount and StatisticsAnalysis which are created by the DiffTreeMiner.
We could generalize MiningResultAccumulator by either
- creating a registry for all `Metadata` so it can be created in `AnalysesResultAccumulator`, or
- adding the qualified class names to the serialization and creating the relevant classes at runtime.

Then AnalysesResultAccumulator would be appropriate.
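To make the first option concrete, here is a rough sketch of what such a registry could look like. All names (`SketchMetadata`, `CounterMetadata`, `MetadataRegistry`) are hypothetical and the real `Metadata` interface may well differ; this only illustrates the idea of looking up a factory by a serialized name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical metadata type; the project's real Metadata interface may differ. */
interface SketchMetadata {
    void parse(String serialized);
    void append(SketchMetadata other);
    String describe();
}

/** Hypothetical counter metadata used only for this illustration. */
final class CounterMetadata implements SketchMetadata {
    private int count;

    @Override public void parse(String serialized) { count = Integer.parseInt(serialized.trim()); }
    @Override public void append(SketchMetadata other) { count += ((CounterMetadata) other).count; }
    @Override public String describe() { return Integer.toString(count); }
}

/** Maps a serialized name to a factory, so an accumulator needs no hard-coded list of types. */
final class MetadataRegistry {
    private final Map<String, Supplier<SketchMetadata>> factories = new LinkedHashMap<>();

    void register(String name, Supplier<SketchMetadata> factory) {
        factories.put(name, factory);
    }

    SketchMetadata create(String name) {
        Supplier<SketchMetadata> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No metadata registered for: " + name);
        }
        return factory.get();
    }
}
```

An accumulator could then call `registry.create(name)` for each serialized entry, parse it, and append it to the running result without knowing the concrete classes. The second option would replace the explicit `register` calls with reflection on the stored class names.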
Let's discuss that in a meeting. Putting effort in here only makes sense if we actually profit from such a refactoring.
During discussion you noted that
I integrated the changes into
#65 has been merged. Closing this PR now.