Integration of LB Thesis by pmbittner · Pull Request #63 · VariantSync/DiffDetective

pmbittner · 2023-01-17T09:03:37Z

No description provided.

ibbem · 2023-02-09T22:39:50Z

As requested, I merged develop into this branch.
I continued to work on the analysis/validation refactoring, so sorry for the big diff :).
Of course, there is still cleanup potential but now I am quite happy about the general structure of the analysis/validation.

ibbem · 2023-02-09T22:40:36Z

@pmbittner ping for review

There are some behavioral changes:
- The temporary fix for renaming from Unchanged to Untouched has been removed.
- There are two more metadata keys: `exportedCommits` and `exportedTrees`. These were previously called `processedCommits` and `processedTrees` and were used with different meanings in the `DiffTreeMiner` and the validations, which caused a bug that incremented these counters twice. That was probably my mistake, introduced during refactoring or merging.
- The order of metadata snapshots has probably changed.

pmbittner

Hi @ibbem ,

thanks for the heavy refactoring of the analyses framework. It looks much better now. It was not easy to review it given the extensive amount of changes and the still complicated analyses we perform.

Two major things I noticed are:
1.) I did not review the FeatureSplit parts yet. I have to dig into those in the next weeks. Can you move all the new FeatureSplit related classes to dedicated sub-packages so that we can easily identify them after the merge? What is your opinion on the location of the new code?
2.) Please add some documentation (javadoc) to the new analyses framework. It is still complicated and guidance for developers on how to properly use it is missing.

After that we can merge. Thank you. :)

pmbittner · 2023-02-10T12:06:50Z

+        while (hook.hasNext()) {
+            if (!callHook.apply(hook.next(), this)) {
+                return false;
+            }


FilterHooks should be side-effect free but they aren't. For example, LineGraphExportAnalysis::onParsedCommit has a side effect. When exiting here as soon as some filter says "no", we would skip the remaining side effects, which is a bug. So I would argue to just run all hooks and take a big AND of their return values.

No. Filter hooks are not required to be side-effect free. I will explain this in the upcoming documentation.

As discussed in our last meeting, the current behaviour is fine but should be documented.
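To make the two options from this thread concrete, here is a minimal sketch of both evaluation strategies. It is an illustration only: `FilterHookSketch`, `runShortCircuit`, `runAll` and `exampleHooks` are hypothetical names, and plain `Predicate`s stand in for the real hook type, which is not shown in this conversation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch; none of these names are taken from DiffDetective. */
final class FilterHookSketch {
    /** Stops at the first hook that returns false; later hooks never run. */
    static <T> boolean runShortCircuit(List<Predicate<T>> hooks, T item) {
        Iterator<Predicate<T>> hook = hooks.iterator();
        while (hook.hasNext()) {
            if (!hook.next().test(item)) {
                return false;
            }
        }
        return true;
    }

    /** Runs every hook, then combines the results with a logical AND. */
    static <T> boolean runAll(List<Predicate<T>> hooks, T item) {
        boolean accepted = true;
        for (Predicate<T> hook : hooks) {
            // The hook is always called; '&' does not short-circuit.
            accepted = hook.test(item) & accepted;
        }
        return accepted;
    }

    /** Builds a rejecting filter followed by a hook that records its calls. */
    static List<Predicate<String>> exampleHooks(List<String> log) {
        List<Predicate<String>> hooks = new ArrayList<>();
        hooks.add(commit -> false);           // a filter that says "no"
        hooks.add(commit -> log.add(commit)); // side effect; List.add returns true
        return hooks;
    }
}
```

With `exampleHooks`, `runShortCircuit` rejects the item and leaves the log empty, while `runAll` also rejects it but still records the commit. Which of the two is wanted depends on whether later hooks are expected to observe items that an earlier filter rejected.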

pmbittner · 2023-02-10T12:26:20Z

 import java.util.stream.Collectors;

 /** Accumulates multiple {@link AnalysisResult}s of several datasets. */
 public class MiningResultAccumulator {


Rename to AnalysesResultAccumulator?

I don't think this is a better name for the current implementation as it doesn't accumulate all results but only ExplainedFilterSummary, EditClassCount and StatisticsAnalysis which are created by the DiffTreeMiner.
We could generalize MiningResultAccumulator by either
- creating a registry for all Metadata so it can be created in AnalysesResultAccumulator, or
- adding the qualified class names to the serialization and creating the relevant classes at runtime.

Then AnalysesResultAccumulator would be appropriate.
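As a rough illustration of the first option, a registry could map a serialized key to a factory for the matching result type. Everything below is an assumption made for the sake of the example: `MetadataRegistrySketch`, `Mergeable` and `Counter` are hypothetical and do not mirror the actual Metadata interface.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical registry; the names below are illustrative, not DiffDetective API. */
final class MetadataRegistrySketch {
    /** Minimal stand-in for a result type that can absorb another one. */
    interface Mergeable {
        void merge(Mergeable other);
        int total();
    }

    /** Toy result type that simply sums counts. */
    static final class Counter implements Mergeable {
        private int count;

        Counter(int count) {
            this.count = count;
        }

        @Override
        public void merge(Mergeable other) {
            count += other.total();
        }

        @Override
        public int total() {
            return count;
        }
    }

    private final Map<String, Supplier<Mergeable>> factories = new LinkedHashMap<>();

    /** Associates a serialized key with a factory for fresh, empty results. */
    void register(String key, Supplier<Mergeable> factory) {
        factories.put(key, factory);
    }

    /** Creates an empty result for the key, failing loudly on unknown keys. */
    Mergeable create(String key) {
        Supplier<Mergeable> factory = factories.get(key);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown metadata key: " + key);
        }
        return factory.get();
    }
}
```

An accumulator built on such a registry would not need to hard-code the result types it knows about; it would call `create` with whatever key it reads and merge the parsed values into the returned object.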

Let's discuss that in a meeting. Putting effort in here only makes sense if we actually profit from such a refactoring.

ibbem · 2023-02-20T09:16:46Z

I don't think it's a good idea to move all feature split related operations into a sub-package. To me, it makes more sense to have a package with all transformations we can perform on DiffTrees, as we have now.
I am currently writing documentation. Stay tuned :)

During discussion you noted that thesis_lb should essentially be frozen and reset to its state before the origin/develop merge and all these refactorings. Since we would lose all of the above discussion, I will postpone this action until the discussion is finished.

ibbem · 2023-02-20T14:37:58Z

I integrated the changes into benjamin/analysis (#65) and thesis_lb-refactorings. For comparison I uploaded the state before these changes but after the rebase onto develop to benjamin/old/analysis-refactoring (Note: I made some cosmetic changes during the rebase to make the single commits more readable).

pmbittner · 2023-02-28T09:08:43Z

#65 has been merged. Closing this PR now.

TheBormann added 30 commits July 6, 2022 14:58

feat: init featureSplit operator

26f2abf

feat: added node duplication

f0089b8

node duplication has errors => fromID() seems to have errors

Created and tested shallowClone of a tree node

8f9bf85

feat: implemented deepCloning

88b6a8c

feat: created deepCloning for diffTrees

0adc84d

feat: correct subtrees can now be created

5be940b

fix: improved code quality

5e320e1

feat: Implemented cluster generation

db36ee0

feat: implemented DiffTree comparison

9e80934

feat: implemented DiffTree composition

17a11e8

feat: Added basic testing for each component in feature split

28e7ccc

feat: added feature query generator

d5c2160

fix: fixed several bugs

c929398

feat: added foundation for featureSplitValidation

a21e79b

fix: fixed bugs stackOverflow and wrong function call bug

73e079a

feat: added test

eb72f84

fix: fixed atomic diff composition bugs

63aced1

feat: improved implementation of custom featureSplit validation

86fee50

feat: added custom metadatakeys and custom HistoryAnalysis

797ab0a

feat: added feature query generator test

b3a4177

fix: simplyfied composition of DiffTrees

32278b5

fix: improved evaluation metrics

7f5e89b

Merge pull request #1 from VariantSync/main

2bc42da

Merged commits from the main repo

fix: fixed composition bug by changing the algorithm.

4c6cf8b

fix: improved design of the evaluation

b557ef8

Merge pull request #2 from VariantSync/issue47

5245f86

fix: remove brackets around single words in boolean abstraction

Merge remote-tracking branch 'upstream/main' into FeatureSplit

2edc25c

fixed: added error messages and updated dataset.md

1249b2b

Merge branch 'main' into FeatureSplit

ec336ee

fix: edited dataset

beef991

ibbem added 3 commits February 6, 2023 10:39

Move the duplication code to DiffTree and DiffNode

4d75e12

These functions might be useful for other use cases as well and fit into these classes.

Refactor AtomicDiffComparison

fb73bdf

Simplify an ArrayList creation

a1e8a1b

ibbem force-pushed the thesis_lb branch from 144570e to d9fa173 Compare February 9, 2023 21:31

ibbem added 15 commits February 9, 2023 23:25

Factor duplicate code out of *Analysis and *Result

4420d67

Improve log messages in Analysis.forEachRepository

7aaaa58

A repository is either processed or it's skipped but not both.

Merge FeatureSplitAnalysisTask and CommitHistoryAnalysisTask

c5613e3

Rename Validation to EditClassValidation

4947e74

This is a more specific name and enables the use of `Validation` for a common set of actions for validations.

Factor duplicate code out of the validation package

d54f49d

Remove duplicated test

d2860ee

Refactor FeatureSplit

6c170a0

This saves some intermediate data structures by using more of `Stream`s potential, but most changes are only cosmetic.

Merge branch 'develop' into thesis_lb

45c5fd5

Use more of the potential of the Time abstraction

8667de4

Get rid of FACommitExtractionAnalysisTaskFactory

847c8c9

Move all validations into the validation package

87f6694

Use nice notation for *Result constructors

36d7cda

Usages of the constructors with all arguments should use the default constructor and assign the required values afterwards.

Use hooks for modifying analysis behaviour

75be71c

Unify class names related to analysis and validation

c4d76a7

Merge *Validation with *ValidationAnalysis

bbd142b

ibbem force-pushed the thesis_lb branch from d9fa173 to f8cc2f1 Compare February 9, 2023 22:34

ibbem force-pushed the thesis_lb branch from f8cc2f1 to 1b91456 Compare February 10, 2023 08:34

ibbem mentioned this pull request Feb 20, 2023

Analysis refactoring #65

Merged

pmbittner closed this Feb 28, 2023

3 participants