`type = function()` by vincentarelbundock · Pull Request #168 · grantmcdermott/tinyplot

vincentarelbundock · 2024-07-20T02:46:14Z

In this comment @grantmcdermott says he’d like to support user-supplied functions as a type. This is a WIP proof of concept, just to see what people would like in a “real” implementation.

The idea is simple: type accepts a user-supplied function which takes: x, y. The user-supplied function returns a list with x, y, and also a type value.

I feel like this would be pretty cool for users. But most important, it could serve as a very easy platform to add more geoms for fitting functions.

library(tinyplot)

loess <- function(x, y, ...) {
    dat <- data.frame(x, y)
    fit <- stats::loess(y ~ x, data = dat)
    nd <- data.frame(x = seq(min(x), max(x), length.out = 1000))
    y <- predict(fit, newdata = nd)
    out <- list(y = y, x = nd$x, type = "l")
    return(out)
}

# default
with(iris, plt(Sepal.Length ~ Petal.Length, type = loess))

# by
with(iris, plt(Sepal.Length ~ Petal.Length | Species, type = loess))

# facet
with(iris, plt(Sepal.Length ~ Petal.Length, type = loess, facet = Species))

# Define a new function to draw points instead of lines
loess <- function(x, y, ...) {
    dat <- data.frame(x, y)
    fit <- stats::loess(y ~ x, data = dat)
    nd <- data.frame(x = sort(x))
    y <- predict(fit, newdata = nd)
    out <- list(y = y, x = nd$x, type = "p")
    return(out)
}

with(iris, plt(Sepal.Length ~ Petal.Length, type = loess))

grantmcdermott · 2024-07-20T03:43:29Z

Oh, I like the look of this. I think the current approach of internally converting to type = "l" (instead of type = "p") might prove a little too tricky in the end. But definitely keen to keep iterating.

vincentarelbundock · 2024-07-20T12:25:57Z

Added documentation and a few “type factories”:

library(tinyplot)

plt(am ~ mpg, type = type_glm(family = binomial), data = mtcars)

plt(wt ~ mpg | cyl, type = type_lm(), data = mtcars)

plt(hp ~ mpg, type = type_spline(), data = mtcars)

vincentarelbundock · 2024-07-20T12:59:25Z

@grantmcdermott I think this is ready for a "real" (initial) review.

Two major benefits of this approach to keep in mind:

Code simplification: This allows us to extract a lot of code out of the tinyplot() function, which is starting to get a bit unwieldy.
Arguments: Each new type can have its own set of arguments, much like ggplot2::geom_*(). And those are documented in a nice self-contained spot: ?type_spline. For instance, I might argue that jitter would be better as one of those type factories, since it would allow us to implement a bunch of different arguments like vertical and horizontal noise without polluting the main interface. plt(y ~ x, type = type_jitter(vnoise = .1, hnoise = .1))

grantmcdermott

This is fantastic @vincentarelbundock. I really like the direction and overall design. High level comments are:

I think we need to move the type_fun logic upstream so that it can better handle the extremities of the confidence intervals. I have some pseudo code for this that I'll try to fomalize and share.
I'd prefer to stick to pure base R for the pre-shipped function factories.

grantmcdermott · 2024-07-20T23:55:55Z

      yymax = idata[[ii]]$ymax
-
+
+      if (is.function(type_fun)) {


This would require some additional thought, but can we move this code logic higher up in the function?

Reason: At present, you are only extracting ymin, ymax etc. after the plot window has already been drawn (where, in turn, the axes ranges are only based on the raw values of x and y). This is leading to some confidence bands being clipped, since they extend beyond the raw variable ranges. (See your lm and loess examples here.)

One way to overcome this issue, is to get these function type factories to accept an addition by argument and then use this argument (a) split the data sample upstream and (b) run the prediction functions on the relevant split data samples. That's essentially what we do now for the "histogram" and "density" types. I'll try to share a concrete example shortly.

I don't have a very strong view on this, and understand that clipping is annoying. But the counter argument is: (1) ylim argument is easy to use, and (2) adding a by argument to the type_ functions will significantly increase complexity and barriers to entry for users. One cool thing about the current design is that it is trivial for people to write their own functions. Again, don't have a strong view; just something to consider as you decide.

Here's the sort of thing I could envisage. In brief, we keep the existing type function factories the same, but internally we pass them to an intermediate function (here: called type_fun_by but we can obviously rename it) that correctly maps it across any grouped data. This would allow you to keep the user-facing simplicity of the main type_x functions, and recycle the additional job of correctly mapping across groups to this same intermediate function for all them.

Proof of concept using a slightly simplified version of your type_spline function factory (here, adapted so that it always produces output rather than constructs a function).

# helper function ci = function(estimate, std.error, conf.level, df, backtransform = identity) { crit = stats::qt(1 - (1-conf.level)/2, df) out = list( estimate = backtransform(estimate), conf.low = backtransform(estimate - crit * std.error), conf.high = backtransform(estimate + crit * std.error) ) return(out) } # simplified type_spline function type_spline = function(x, y, conf.level = 0.95, k = -1, fx = FALSE, bs = "tp", ...) { se = if (isFALSE(conf.level)) FALSE else TRUE conf.level = if (isFALSE(conf.level)) 0.95 else conf.level dat = data.frame(x, y) fit = mgcv::gam(y ~ s(x, k = k, fx = fx, bs = bs), data = dat) nd = data.frame(x = seq(min(x), max(x), length.out = 100)) p = stats::predict(fit, newdata = nd, se.fit = TRUE) p = ci( estimate = p$fit, std.error = p$se.fit, conf.level = conf.level, df = fit$df.residual) out = list(x = nd$x, y = p$estimate, ymin = p$conf.low, ymax = p$conf.high) return(out) } # NEW: intermediate mapping function type_fun_by = function(type_fun, x, y, by = NULL, new_data_length = 100, ...) { new_data = data.frame(x = seq(min(x), max(x), length.out = new_data_length)) dots = list(...) if (is.null(by)) { arg_list = list(x = x, y = y, newdata = new_data) arg_list = modifyList(arg_list, dots, keep.null = TRUE) do.call(type_fun, args = arg_list) } else { uby = sort(unique(by)) split_data = lapply(list(x = x, y = y), function(l) split(l, by)) ret = lapply( seq_along(uby), function(i, ...) { x = split_data[["x"]][[i]] y = split_data[["y"]][[i]] minx = min(x, na.rm = TRUE) maxx = max(x, na.rm = TRUE) idx = new_data[["x"]] > minx & new_data[["x"]] < maxx new_data = new_data[idx, , drop = FALSE] new_data = rbind(data.frame(x = minx), new_data) arg_list = list(x = x, y = y, newdata = new_data) arg_list = modifyList(arg_list, dots, keep.null = TRUE) fun_ret = do.call(type_fun, args = arg_list) if (!is.function(fun_ret)) fun_ret[["by"]] = rep(uby[i], length(fun_ret[["x"]])) return(fun_ret) }, ) if (is.function(ret[[1]])) { return(ret[[1]]) } else { return(do.call(function(...) Map("c", ...), ret)) } } } ## Test # direct use a1 = type_spline(x = airquality$Day, y = airquality$Temp) # via intermediate function with no "by" mapping (should be the same as a1) a2 = type_fun_by(type_fun = type_spline, x = airquality$Day, y = airquality$Temp) # via intermediate function with by mapping (will map type function across different groups # and add an extra by column) a3 = type_fun_by(type_fun = type_spline, x = airquality$Day, y = airquality$Temp, by = airquality$Month) lapply(a1, head) #> $x #> [1] 1.000000 1.303030 1.606061 1.909091 2.212121 2.515152 #> #> $y #> 1 2 3 4 5 6 #> 80.65279 80.66687 80.68089 80.69482 80.70861 80.72214 #> #> $ymin #> 1 2 3 4 5 6 #> 75.90010 76.21427 76.50726 76.77525 77.01541 77.22605 #> #> $ymax #> 1 2 3 4 5 6 #> 85.40549 85.11946 84.85452 84.61439 84.40180 84.21823

lapply(a2, head) #> $x #> [1] 1.000000 1.303030 1.606061 1.909091 2.212121 2.515152 #> #> $y #> 1 2 3 4 5 6 #> 80.65279 80.66687 80.68089 80.69482 80.70861 80.72214 #> #> $ymin #> 1 2 3 4 5 6 #> 75.90010 76.21427 76.50726 76.77525 77.01541 77.22605 #> #> $ymax #> 1 2 3 4 5 6 #> 85.40549 85.11946 84.85452 84.61439 84.40180 84.21823

lapply(a3, head) #> $x #> [1] 1.000000 1.303030 1.606061 1.909091 2.212121 2.515152 #> #> $y #> 1 2 3 4 5 6 #> 70.31894 69.77182 69.22524 68.67976 68.13604 67.59610 #> #> $ymin #> 1 2 3 4 5 6 #> 62.04877 62.37023 62.58531 62.66354 62.58437 62.34635 #> #> $ymax #> 1 2 3 4 5 6 #> 78.58911 77.17341 75.86518 74.69598 73.68771 72.84584 #> #> $by #> [1] 5 5 5 5 5 5

^{Created on 2024-07-21 with reprex v2.1.0}

That makes a lot of sense. Let's put this on ice for now. I'm doing some refactoring experiments. Let's see if I manage to simplify things a little before deciding on an implementation of the type functions.

Sgtm thanks

grantmcdermott · 2024-07-20T23:59:57Z

+#' @examples
+#' plt(Sepal.Length ~ Petal.Length, type = type_spline(), data = iris)
+#' @export
+type_spline = function(x, y, conf_level = 0.95, k = -1, fx = FALSE, bs = "tp", ...) {


See my comment above about passing a dedicated by argument to this and other function type factories.

vincentarelbundock · 2024-08-12T04:18:50Z

FYI, I intend to close this and start over based on #198

If/when that dataframe split-apply-combine mechanism is merged, I have some good ideas about how to handle the by/facets gracefully, even in user-supplied type functions.

type = function()

30a920a

vincentarelbundock marked this pull request as draft July 20, 2024 02:46

This comment was marked as outdated.

Sign in to view

vincentarelbundock mentioned this pull request Jul 20, 2024

"jitter" (alias "j"?) #169

Closed

type=function(): ymin, ymax, xmin, xmax

ceb4205

This comment was marked as outdated.

Sign in to view

vincentarelbundock added 2 commits July 20, 2024 08:05

type factories

198ba48

type factories + docs

8e936e0

vincentarelbundock marked this pull request as ready for review July 20, 2024 12:26

vincentarelbundock added 2 commits July 20, 2024 08:29

Merge branch 'main' into typefun

1cd87c3

docs

fc0d61c

vincentarelbundock changed the title ~~[WIP] type = function()~~ type = function() Jul 20, 2024

vincentarelbundock added 2 commits July 20, 2024 08:36

type factory: conf_level + docs

16d6515

type factories: se=FALSE -> conf_level=FALSE

7454fad

grantmcdermott requested changes Jul 21, 2024

View reviewed changes

vincentarelbundock added 2 commits July 21, 2024 08:22

PR grantmcdermott#168 code review

ca06018

docs + namespace

89f0a05

vincentarelbundock mentioned this pull request Jul 21, 2024

Split data at the top #172

Closed

vincentarelbundock marked this pull request as draft July 22, 2024 04:24

vincentarelbundock mentioned this pull request Aug 12, 2024

Avoid type aliases #199

Closed

vincentarelbundock closed this Sep 9, 2024

vincentarelbundock deleted the typefun branch September 9, 2024 00:55

grantmcdermott mentioned this pull request Oct 7, 2024

Support custom types #222

Merged

Conversation

vincentarelbundock commented Jul 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

grantmcdermott commented Jul 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

vincentarelbundock commented Jul 20, 2024

Uh oh!

vincentarelbundock commented Jul 20, 2024

Uh oh!

grantmcdermott left a comment

Choose a reason for hiding this comment

Uh oh!

grantmcdermott Jul 20, 2024

Choose a reason for hiding this comment

Uh oh!

vincentarelbundock Jul 21, 2024

Choose a reason for hiding this comment

Uh oh!

grantmcdermott Jul 22, 2024

Choose a reason for hiding this comment

Uh oh!

vincentarelbundock Jul 22, 2024

Choose a reason for hiding this comment

Uh oh!

grantmcdermott Jul 22, 2024

Choose a reason for hiding this comment

Uh oh!

grantmcdermott Jul 20, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vincentarelbundock commented Aug 12, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vincentarelbundock commented Jul 20, 2024 •

edited

Loading

grantmcdermott commented Jul 20, 2024 •

edited

Loading