ARROW-13772 binds quantile() to tdigest(), which returns approximate quantiles, and binds median() to approximate_median(), which returns an approximate median. The bindings issue a warning saying that the median/quantile is approximate. Once ARROW-13309 is implemented, modify the bindings to call Arrow functions that return exact quantiles and medians, and remove the warnings.
We should keep the approximate quantile and median bindings but rename them.
When doing this, we should also modify the bindings to accept type and interpolation arguments like we do in the quantile.ArrowDatum method:
quantile.ArrowDatum <- function(x,
                                probs = seq(0, 1, 0.25),
                                na.rm = FALSE,
                                type = 7,
                                interpolation = c("linear", "lower", "higher", "nearest", "midpoint"),
                                ...) {
  if (inherits(x, "Scalar")) x <- Array$create(x)
  assert_is(probs, c("numeric", "integer"))
  assert_that(length(probs) > 0)
  assert_that(all(probs >= 0 & probs <= 1))
  if (!na.rm && x$null_count > 0) {
    stop("Missing values not allowed if 'na.rm' is FALSE", call. = FALSE)
  }
  if (type != 7) {
    stop(
      "Argument `type` not supported in Arrow. To control the quantile ",
      "interpolation algorithm, set argument `interpolation` to one of: ",
      "\"linear\" (the default), \"lower\", \"higher\", \"nearest\", or ",
      "\"midpoint\".",
      call. = FALSE
    )
  }
  interpolation <- QuantileInterpolation[[toupper(match.arg(interpolation))]]
  out <- call_function("quantile", x, options = list(q = probs, interpolation = interpolation))
  if (length(out) == 0) {
    # When there are no non-missing values in the data, the Arrow quantile
    # function returns an empty Array, but for consistency with the R quantile
    # function, we want an Array of NA_real_ with the same length as probs
    out <- Array$create(rep(NA_real_, length(probs)))
  }
  out
}
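To make the interpolation argument concrete, here is a minimal base R sketch of what the five options compute for a single quantile. The helper name interp_quantile is hypothetical and is not part of arrow. It assumes the quantile position is 1 + prob * (n - 1) on the sorted data, which is the rule behind R's type 7, so "linear" here matches that default. How "nearest" breaks a tie exactly halfway between two values is an assumption in this sketch and may differ from Arrow's behavior.

```r
# Hypothetical helper (not part of arrow): one quantile of a numeric vector
# under each of the five interpolation rules, using base R only.
interp_quantile <- function(x, prob,
                            interpolation = c("linear", "lower", "higher",
                                              "nearest", "midpoint")) {
  interpolation <- match.arg(interpolation)
  stopifnot(length(prob) == 1, prob >= 0, prob <= 1)

  x <- sort(x[!is.na(x)])        # drop missing values, then order the data
  n <- length(x)
  if (n == 0) return(NA_real_)   # no non-missing values: mirror R's NA result

  pos <- 1 + prob * (n - 1)      # fractional 1-based position (type 7 rule)
  lo <- x[floor(pos)]            # closest data point at or below the position
  hi <- x[ceiling(pos)]          # closest data point at or above the position
  frac <- pos - floor(pos)       # how far the position sits between lo and hi

  switch(interpolation,
    linear   = lo + frac * (hi - lo),
    lower    = lo,
    higher   = hi,
    # Assumption: an exact tie (frac == 0.5) is resolved upward here
    nearest  = if (frac < 0.5) lo else hi,
    midpoint = (lo + hi) / 2
  )
}
```

For x = c(1, 2, 3, 10) and prob = 0.25 the position is 1.75, so "lower" gives 1, "higher" gives 2, "midpoint" gives 1.5, and "linear" gives 1.75, the same value as quantile(x, 0.25, type = 7).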
Reporter: Ian Cook / @ianmcook
Related issues:
Note: This issue was originally created as ARROW-14021. Please see the migration documentation for further details.
The quantile.ArrowDatum excerpt above is from arrow/r/R/compute.R, lines 156 to 187 in 170a24f.