Expose df in pool() by munoztd0 · Pull Request #562 · openpharma/rbmi

munoztd0 · 2026-06-01T13:30:57Z

pool() now exposes df (Barnard-Rubin pooled degrees of freedom) across all pooling methods and in as.data.frame.pool(). For methods without a d.f. concept (jackknife, bootstrap, bmlmi), df = NA_real_.
The constant-d.f. assertion remains unchanged.
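
For readers unfamiliar with the quantity being exposed, the Barnard-Rubin adjustment can be sketched in base R roughly as follows. This is a minimal illustration of the published formula, not rbmi's internal code; the function name `barnard_rubin_df` and its arguments are hypothetical.

```r
# Sketch of the Barnard-Rubin (1999) pooled degrees of freedom.
# NOTE: hypothetical helper for illustration only; not rbmi's implementation.
#   ests  : point estimates, one per imputed dataset
#   ses   : standard errors, one per imputed dataset (assumed > 0)
#   v_com : complete-data degrees of freedom (may be Inf)
barnard_rubin_df <- function(ests, ses, v_com) {
    M <- length(ests)
    stopifnot(M >= 2, length(ses) == M, all(ses > 0))

    W <- mean(ses^2)                  # within-imputation variance
    B <- var(ests)                    # between-imputation variance
    total <- W + (1 + 1 / M) * B      # total variance
    lambda <- (1 + 1 / M) * B / total # share of variance due to missingness

    v_old <- (M - 1) / lambda^2       # Rubin (1987) d.f.; Inf when B == 0
    if (is.infinite(v_com)) {
        return(v_old)
    }

    # Observed-data d.f., shrunk from the complete-data d.f.
    v_obs <- (v_com + 1) / (v_com + 3) * v_com * (1 - lambda)
    if (is.infinite(v_old)) {
        return(v_obs)
    }

    v_old * v_obs / (v_old + v_obs)
}
```

With a finite `v_com`, the result never exceeds `v_com`, which is the point of the adjustment in small samples.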

munoztd0 · 2026-06-01T13:32:41Z

In accordance with our previous conversation in #560 with @danielinteractive and @luwidmer.

munoztd0 · 2026-06-01T13:33:27Z

fix johnsonandjohnson/junco#369

danielinteractive · 2026-06-12T09:26:55Z

Hi @luwidmer , what are your thoughts on this one? 😃

luwidmer · 2026-06-16T16:24:31Z

Thank you for flagging this again @danielinteractive, I was out of office. I will take a look over the next few days @munoztd0.

tobiasmuetze · 2026-06-24T11:00:05Z

It would be good to see a reference for the statement "median fallback is the standard pragmatic choice". Such a decision would need to be thoroughly documented and also highlighted in the methods vignette.

luwidmer

@danielinteractive @munoztd0:

I agree with @tobiasmuetze here RE the statement "median fallback is the standard pragmatic choice". I would also like to see references for this added to the documentation.

In addition:

- The old code threw a clear error when dfs varied. Now it silently proceeds with median(dfs). This can silently change results for any user who relied on that error. From a software engineering standpoint, I don't think this is desirable.
- This PR introduces test failures, which need to be addressed.
- If the new median df is indeed desirable, that behavior should have tests as well.
- pool_internal.jackknife(), pool_internal.bootstrap(), and pool_internal.bmlmi() all return the parametric_ci() list without $df. Only pool_internal.rubin() now appends it. This inconsistency means downstream code cannot reliably access $df without first checking which method was used. If df is to be exposed, consider doing this as consistently as possible (and/or as_data_frame_internal() should be updated to include it).
- as.data.frame.pool() won't surface the new df. The as_data_frame_internal() function extracts est, se, ci, pvalue but not df. If the goal is to expose df to downstream callers, it should appear in the data frame representation too, which is the primary user-facing output.
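
The consistency point raised in the last two items can be illustrated with a small base-R sketch. All names here (`with_df`, `result_to_df`, and the shape of the result list) are hypothetical stand-ins rather than rbmi's internals. The idea is that every method returns the same fields, with `df = NA_real_` where no d.f. concept applies, so the data frame conversion needs no method-specific branching.

```r
# Hypothetical stand-ins for illustration; not rbmi's actual internals.

# Attach a df field to a pooled result, defaulting to NA_real_ for methods
# without a degrees-of-freedom concept (e.g. resampling-based ones).
with_df <- function(res, df = NA_real_) {
    stopifnot(is.list(res), is.numeric(df), length(df) == 1)
    res$df <- df
    res
}

# Flatten a named list of per-parameter results into a data frame that
# always carries a df column, whatever method produced each result.
result_to_df <- function(results) {
    data.frame(
        parameter = names(results),
        est = vapply(results, function(x) x$est, numeric(1)),
        se = vapply(results, function(x) x$se, numeric(1)),
        df = vapply(results, function(x) x$df, numeric(1)),
        row.names = NULL,
        stringsAsFactors = FALSE
    )
}

results <- list(
    trt_rubin = with_df(list(est = 1.1, se = 0.5), df = 42.5),
    trt_jackknife = with_df(list(est = 1.0, se = 0.6)) # df stays NA_real_
)
out <- result_to_df(results)
```

Downstream code can then read `out$df` unconditionally and treat `NA` as "not applicable" instead of checking which pooling method was used.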

danielinteractive · 2026-06-24T13:54:05Z

> It would be good to see a reference for the statement "median fallback is the standard pragmatic choice". Such a decision would need to be thoroughly documented and also highlighted in the methods vignette.

Thanks @tobiasmuetze - good point, I don't think there is any literature reference for this statement, because it was just my personal judgement call when I ran into this problem 😄

If we think that it is not a good idea to add this I can double check again first if we still need this feature for our outputs. (I remember one case where we later removed one feature use...)

munoztd0 · 2026-06-29T14:53:16Z

@luwidmer and @tobiasmuetze
First, thanks for the reviews and comments!

We discussed the pros and cons with @danielinteractive and decided to drop the multiple-df-values case from this PR (so I reverted that part).

Then I made sure to expose df across all methods for consistency and updated the tests accordingly.
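
The behaviour that stays in place after the revert, erroring when the d.f. differ across imputed datasets rather than silently summarising them, can be sketched as below. The helper name `assert_constant_df` and its error message are hypothetical; rbmi's actual check may be written differently.

```r
# Hypothetical helper for illustration; not rbmi's actual assertion.
# Returns the common d.f. value, or stops with a clear error if the values
# differ across imputed datasets.
assert_constant_df <- function(dfs) {
    stopifnot(is.numeric(dfs), length(dfs) >= 1, !anyNA(dfs))
    distinct <- unique(dfs)
    if (length(distinct) > 1) {
        stop(
            "Degrees of freedom differ across imputed datasets: ",
            paste(distinct, collapse = ", "),
            call. = FALSE
        )
    }
    dfs[[1]]
}
```

Failing loudly keeps the choice of how to reconcile differing d.f. with the caller, which is the concern raised in the review above.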

One remaining issue: the print.md snapshot shows df = because sysdata.rda predates this PR and the stored pool objects lack the $df field. Re-running data-raw/create_print_test_data.R fixes the df column but also changes the point estimates, due to the RNG (I believe?); see below:

This is an unrelated change that would pollute the diff. Should we accept the snapshot for now and regenerate sysdata.rda in a separate, dedicated PR, or is there a preferred way to handle this?

pool: expose df, median v_com

f7253ab

munoztd0 marked this pull request as draft June 11, 2026 09:50

munoztd0 marked this pull request as ready for review June 11, 2026 09:50

danielinteractive requested a review from luwidmer June 12, 2026 09:26

luwidmer requested changes Jun 24, 2026

munoztd0 added 2 commits June 29, 2026 13:56

revert: support for varying d.f.

de42ffa

pool: expose df across all methods

aa49cbb

munoztd0 changed the title ~~Expose df in Rubin pooling + support varying d.f.~~ Expose df in pool() Jun 29, 2026

update: snapshot update to reflect d.f.

bde7821

munoztd0 wants to merge 4 commits into openpharma:main from munoztd0:junco_rbmi_pool
