
[feat](skew & kurt) New aggregate function skew & kurt #40945 #41277

Merged
dataroaring merged 1 commit into apache:branch-3.0 from zhiqiang-hhhh:pick_40945_to_upstream_branch-3.0
Sep 28, 2024

Conversation

@zhiqiang-hhhh
Contributor

Cherry-pick of #40945.

@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

Contributor

@github-actions github-actions bot left a comment


clang-tidy made some suggestions


#pragma once

#include <stddef.h>

warning: inclusion of deprecated C++ header 'stddef.h'; consider using 'cstddef' instead [modernize-deprecated-headers]

Suggested change
#include <stddef.h>
#include <cstddef>

++m[0];
m[1] += x;
m[2] += x * x;
if constexpr (_level >= 3) m[3] += x * x * x;

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
if constexpr (_level >= 3) m[3] += x * x * x;
if constexpr (_level >= 3) { m[3] += x * x * x; }

m[1] += x;
m[2] += x * x;
if constexpr (_level >= 3) m[3] += x * x * x;
if constexpr (_level >= 4) m[4] += x * x * x * x;

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
if constexpr (_level >= 4) m[4] += x * x * x * x;
if constexpr (_level >= 4) { m[4] += x * x * x * x; }
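The diff context in these two comments accumulates raw power sums: `m[0]` counts rows, `m[1]` sums `x`, `m[2]` sums `x²`, and the `_level` template parameter decides whether `m[3]` and `m[4]` also track `x³` and `x⁴`. A minimal standalone sketch of that pattern, written with the braces clang-tidy asks for, might look like this; the name `PowerSums` and its `Level` parameter are hypothetical stand-ins for the PR's own type, which is only partly visible here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical sketch of a power-sum accumulator (not the PR's actual type).
// m[k] holds sum(x^k), so m[0] is the row count.
// Level 3 is enough for skewness, Level 4 for kurtosis.
template <typename T, std::size_t Level>
struct PowerSums {
    static_assert(Level >= 2 && Level <= 4, "this sketch supports levels 2..4");

    std::array<T, Level + 1> m {};

    void add(T x) {
        ++m[0];
        m[1] += x;
        m[2] += x * x;
        // Discarded at compile time when Level < 3, so m[3] is never touched.
        if constexpr (Level >= 3) {
            m[3] += x * x * x;
        }
        if constexpr (Level >= 4) {
            m[4] += x * x * x * x;
        }
    }
};
```

Because the level is a compile-time constant, a variance-only state pays nothing for the higher powers, while the skewness and kurtosis states reuse the same code path.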

m[0] += rhs.m[0];
m[1] += rhs.m[1];
m[2] += rhs.m[2];
if constexpr (_level >= 3) m[3] += rhs.m[3];

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
if constexpr (_level >= 3) m[3] += rhs.m[3];
if constexpr (_level >= 3) { m[3] += rhs.m[3]; }

m[1] += rhs.m[1];
m[2] += rhs.m[2];
if constexpr (_level >= 3) m[3] += rhs.m[3];
if constexpr (_level >= 4) m[4] += rhs.m[4];

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
if constexpr (_level >= 4) m[4] += rhs.m[4];
if constexpr (_level >= 4) { m[4] += rhs.m[4]; }
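Merging two partial states is element-wise addition of the sums, which is why this representation combines cleanly across partial aggregations: the merged state equals the state of a single pass over all rows, up to floating-point rounding. A minimal sketch, using a hypothetical fixed-size `PowerSums4` state (all four powers, as kurtosis needs) rather than the PR's templated type:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size state for kurtosis: m[k] = sum(x^k), k = 0..4.
struct PowerSums4 {
    std::array<double, 5> m {};

    // Adds x^0, x^1, ..., x^4 to the matching sums.
    void add(double x) {
        double power = 1.0;
        for (double& sum : m) {
            sum += power;
            power *= x;
        }
    }

    // Merging partial states is element-wise addition, so it is
    // associative and commutative (up to floating-point rounding).
    void merge(const PowerSums4& rhs) {
        for (std::size_t k = 0; k < m.size(); ++k) {
            m[k] += rhs.m[k];
        }
    }
};
```

For example, a state built from `1, 2` merged with a state built from `3` holds the same sums as a state built from `1, 2, 3`.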

ErrorCode::INTERNAL_ERROR,
"Variation moments should be obtained by 'get_population' method");
} else {
if (m[0] == 0) return std::numeric_limits<T>::quiet_NaN();

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
if (m[0] == 0) return std::numeric_limits<T>::quiet_NaN();
if (m[0] == 0) { return std::numeric_limits<T>::quiet_NaN(); }

} else {
if (m[0] == 0) return std::numeric_limits<T>::quiet_NaN();
// to avoid accuracy problem
if (m[0] == 1) return 0;

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
if (m[0] == 1) return 0;
if (m[0] == 1) { return 0; }

ErrorCode::INTERNAL_ERROR,
"Variation moments should be obtained by 'get_population' method");
} else {
if (m[0] == 0) return std::numeric_limits<T>::quiet_NaN();

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
if (m[0] == 0) return std::numeric_limits<T>::quiet_NaN();
if (m[0] == 0) { return std::numeric_limits<T>::quiet_NaN(); }

} else {
if (m[0] == 0) return std::numeric_limits<T>::quiet_NaN();
// to avoid accuracy problem
if (m[0] == 1) return 0;

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
if (m[0] == 1) return 0;
if (m[0] == 1) { return 0; }
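These guards sit in the functions that turn the raw power sums into central moments: an empty group yields NaN, and a single-row group returns 0 directly, which the diff comment says is done "to avoid accuracy problem" (expanding the formula for one row can leave a small rounding residue instead of an exact 0). The sketch below shows one way to do that expansion. The function names and the `Sums` array layout are hypothetical, and the formulas are the standard power-sum expansions of the population central moments, not code copied from the PR.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical layout: s[k] = sum(x^k) for k = 0..4, so s[0] is the row count n.
using Sums = std::array<double, 5>;

constexpr double kNaN = std::numeric_limits<double>::quiet_NaN();

// Population variance: (s2 - s1^2 / n) / n, clamped at 0 because rounding
// can push it slightly negative.
double population_variance(const Sums& s) {
    if (s[0] == 0) {
        return kNaN;
    }
    return std::max(0.0, (s[2] - s[1] * s[1] / s[0]) / s[0]);
}

// Third central moment: (s3 - (3*s2 - 2*s1*mean) * mean) / n.
double central_moment3(const Sums& s) {
    if (s[0] == 0) {
        return kNaN;  // empty group
    }
    if (s[0] == 1) {
        return 0.0;  // one row: skip the expansion to avoid rounding residue
    }
    const double mean = s[1] / s[0];
    return (s[3] - (3 * s[2] - 2 * s[1] * mean) * mean) / s[0];
}

// Fourth central moment: (s4 - (4*s3 - (6*s2 - 3*s1*mean) * mean) * mean) / n.
double central_moment4(const Sums& s) {
    if (s[0] == 0) {
        return kNaN;
    }
    if (s[0] == 1) {
        return 0.0;
    }
    const double mean = s[1] / s[0];
    return (s[4] - (4 * s[3] - (6 * s[2] - 3 * s[1] * mean) * mean) * mean) / s[0];
}

// Population skewness and excess kurtosis built from the moments above.
double skew_pop(const Sums& s) {
    return central_moment3(s) / std::pow(population_variance(s), 1.5);
}

double kurt_pop(const Sums& s) {
    return central_moment4(s) / std::pow(population_variance(s), 2.0) - 3.0;
}
```

Note that with a single row the third moment is 0 but so is the variance, so the skewness is still 0 / 0 = NaN in this sketch; per the PR description, NaN results are returned as NULL.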

Comment on lines +110 to +111
return;
}

warning: redundant return statement at the end of a function with a void return type [readability-redundant-control-flow]

Suggested change
return;
}
}

@zhiqiang-hhhh force-pushed the pick_40945_to_upstream_branch-3.0 branch from 8c81afd to ab3e14b on September 25, 2024 11:58
@zhiqiang-hhhh
Contributor Author

run buildall

@zhiqiang-hhhh force-pushed the pick_40945_to_upstream_branch-3.0 branch from ab3e14b to 89ca88b on September 26, 2024 15:40
@zhiqiang-hhhh
Contributor Author

run buildall

@zhiqiang-hhhh
Contributor Author

run buildall

`skew`, `skew_pop` and `skewness` are used to calculate the
[skewness](https://en.wikipedia.org/wiki/Skewness#Pearson.27s_moment_coefficient_of_skewness)
of a data distribution.
`kurt`, `kurt_pop` and `kurtosis` are used to calculate the
[kurtosis](https://en.wikipedia.org/wiki/Kurtosis) of a data
distribution.

The implementation is based on
ClickHouse/ClickHouse#5200, with the result type changed
to AlwaysNullable because Doris does not support NaN.

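The formula used to calculate skew is `m3 / variance^{1.5}`, where `m3` is the third central moment.
The formula used to calculate kurt is `m4 / variance^{2} - 3`, where `m4` is the fourth central moment;
subtracting 3 yields the excess kurtosis, which is 0 for a normal distribution.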
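Written out with population moments (an assumption consistent with the `get_population` naming in the review comments), where $n$ is the row count and $S_k = \sum_i x_i^k$ are the raw power sums the aggregate state accumulates, the two formulas expand as follows; here "variance" is $\mu_2$:

```latex
\begin{aligned}
\mu_k &= \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^k, \qquad \bar{x} = \frac{S_1}{n} \\
\mu_2 &= \frac{S_2}{n} - \bar{x}^2 \\
\mu_3 &= \frac{1}{n}\left(S_3 - 3\bar{x}S_2 + 2\bar{x}^2 S_1\right) \\
\mu_4 &= \frac{1}{n}\left(S_4 - 4\bar{x}S_3 + 6\bar{x}^2 S_2 - 3\bar{x}^3 S_1\right) \\
\operatorname{skew} &= \frac{\mu_3}{\mu_2^{3/2}}, \qquad
\operatorname{kurt} = \frac{\mu_4}{\mu_2^{2}} - 3
\end{aligned}
```

For example, the values 0, 0, 3 give $\mu_2 = 2$, $\mu_3 = 2$ and $\mu_4 = 6$, so skew $= 2 / 2^{1.5} \approx 0.707$ and kurt $= 6/4 - 3 = -1.5$.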

When any result value is NaN, Doris returns NULL instead.
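Because the result type is AlwaysNullable, a NaN produced by the moment arithmetic can be reported as NULL. A hypothetical sketch of that last step, using `std::optional<double>` purely as a stand-in for a nullable SQL value (it is not how Doris represents nulls internally); `nan_to_null` and `skew_or_null` are illustrative names, not functions from the PR:

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical finalize step: std::optional stands in for a nullable result;
// an empty optional plays the role of SQL NULL.
std::optional<double> nan_to_null(double value) {
    if (std::isnan(value)) {
        return std::nullopt;
    }
    return value;
}

// Population skewness from precomputed central moments mu2 (variance) and mu3.
// A NaN input (empty group) or 0 / 0 (zero variance and zero third moment)
// yields NaN, which becomes NULL.
std::optional<double> skew_or_null(double mu2, double mu3) {
    return nan_to_null(mu3 / std::pow(mu2, 1.5));
}
```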

doc: apache/doris-website#1127
@zhiqiang-hhhh force-pushed the pick_40945_to_upstream_branch-3.0 branch from 4df774f to 696ad91 on September 27, 2024 12:03
@zhiqiang-hhhh
Contributor Author

run buildall

@dataroaring merged commit da3ab02 into apache:branch-3.0 on Sep 28, 2024
@gavinchou mentioned this pull request on Oct 13, 2024


3 participants