
perf: optimized core tabulation and summary functions; added benchmark suite #575

Open: kpagacz wants to merge 8 commits into insightsengineering:main from kpagacz:perf/ard-functions-performance-improvements

kpagacz (Collaborator) commented Jul 1, 2026

The five cards functions listed below accounted for most of the runtime of the following gtsummary benchmark:

    suppressPackageStartupMessages({
      library(profvis)
      library(dplyr)
      library(gtsummary)
    })

    # Replicate the trial dataset 10x to simulate a moderately sized dataset
    data_big <- trial[rep(seq_len(nrow(trial)), 10), ]

    # Run profvis to generate the interactive flame graph
    profvis({
      for (i in 1:2) {  # run twice so the profiler collects more samples
        tbl_strata(
          data_big,
          strata = grade,
          .tbl_fun = ~ .x |> tbl_summary(by = trt, include = c(age, marker))
        )
      }
    })

Changes per function:

  • ard_summary.data.frame: replaced deeply nested purrr::map(), purrr::map2(), and dplyr::bind_rows() iterations within .calculate_stats_as_ard and .lst_results_as_df with lapply() and data.frame() assignments (~36% speedup).

  • ard_tabulate.data.frame: refactored .calculate_tabulation_statistics to remove tidyr::pivot_longer() and dplyr::mutate(across()) (~36% speedup).

  • tidy_ard_column_order and tidy_ard_row_order: replaced tidyselect operations and dplyr::arrange() with base R string matching (grepl()) and subsetting by order() (~28x speedup).

  • ard_tabulate_value.data.frame: replaced pmap() with a single vectorized O(N) base R scan using which() and %in% (~16x speedup).

  • ard_total_n.data.frame: avoided a full-data-frame dplyr::mutate() pass by constructing the one-row total-N ARD tibble directly (~45x speedup).
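The base R patterns named in the list above can be illustrated on toy data. This is a sketch under stated assumptions, not the code from this PR: every object below (stats_df, grade, target, cols, patterns, ard) is hypothetical, and the real cards functions operate on full ARD objects with more columns.

```r
# Toy illustrations of the base R patterns described above.
# All objects here are hypothetical; they are not the cards internals.

## 1. One lapply() pass plus one data.frame(), instead of binding many one-row tibbles
x <- c(10, 12, 15, 20)
stat_fns <- list(mean = mean, median = stats::median)
stats_df <- data.frame(stat_name = names(stat_fns), stringsAsFactors = FALSE)
# Compute every statistic in one pass and store the results as a list-column
stats_df$stat <- lapply(stat_fns, function(f) f(x))

## 2. Vectorized %in% / which() instead of a row-wise, pmap()-style loop
grade <- c("I", "II", "III", "I", "II")
target <- c("I", "III")
# Row-wise version: one function call per element
rowwise_hits <- unlist(Map(function(g) g %in% target, grade), use.names = FALSE)
# Vectorized version: a single O(N) scan over the whole column
vector_hits <- grade %in% target
hit_rows <- which(vector_hits)  # 1 3 4
stopifnot(identical(rowwise_hits, vector_hits))

## 3. grepl() + order() instead of tidyselect helpers and dplyr::arrange()
cols <- c("stat", "variable", "group1_level", "group1",
          "stat_name", "variable_level", "context")
# Hypothetical preferred column order, expressed as regular expressions
patterns <- c("^group[0-9]+$", "^group[0-9]+_level$", "^variable$",
              "^variable_level$", "^context$", "^stat_name$", "^stat$")
rank_col <- rep(length(patterns) + 1L, length(cols))  # unmatched columns go last
for (i in rev(seq_along(patterns))) {
  # Walk the patterns in reverse so the earliest matching pattern wins
  rank_col[grepl(patterns[i], cols)] <- i
}
ordered_cols <- cols[order(rank_col)]

# Row ordering by several keys; method = "radix" sorts strings in the C locale
ard <- data.frame(
  variable = c("marker", "age", "marker", "age"),
  stat_name = c("n", "n", "mean", "mean"),
  stringsAsFactors = FALSE
)
ard_sorted <- ard[order(ard$variable, ard$stat_name, method = "radix"), , drop = FALSE]
```

One design point worth checking in a change like this: since dplyr 1.1.0, arrange() sorts character columns in the C locale by default, whereas base order() uses the session locale unless method = "radix" is given, so a drop-in replacement may need that argument to keep row order identical across machines.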

CI additions:

  • Added .github/workflows/benchmark.yaml to trigger automated profiling workflows on PRs prefixed with perf:.

  • Unified the benchmark code into .github/scripts/benchmark.R, which runs bench::mark() on core functions and automatically posts a Markdown table of itr/sec and memory allocation as a comment on the PR.
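A minimal sketch of the kind of logic such a benchmark script might contain, assuming the bench package is installed. The compared expressions are hypothetical stand-ins rather than the cards functions, to_markdown_table() is a hypothetical helper that only builds the comment body, and posting to the PR thread (handled by the workflow) is not shown.

```r
# Hypothetical helper: render benchmark results as a Markdown table string
to_markdown_table <- function(label, itr_per_sec, mem_bytes) {
  header <- c(
    "| expression | itr/sec | mem_alloc (bytes) |",
    "|---|---:|---:|"
  )
  rows <- sprintf("| %s | %.1f | %.0f |", label, itr_per_sec, mem_bytes)
  paste(c(header, rows), collapse = "\n")
}

# Run the comparison only when bench is available
if (requireNamespace("bench", quietly = TRUE)) {
  grade <- rep(c("I", "II", "III"), times = 1000)
  target <- c("I", "III")
  res <- bench::mark(
    rowwise = unlist(Map(function(g) g %in% target, grade), use.names = FALSE),
    vectorized = grade %in% target,
    iterations = 50
  )
  # bench::mark() returns one row per expression, in argument order
  cat(to_markdown_table(
    label = c("rowwise", "vectorized"),
    itr_per_sec = res$`itr/sec`,
    mem_bytes = as.numeric(res$mem_alloc)
  ), "\n")
}
```

Passing the labels explicitly avoids formatting bench's expression column; whether the actual benchmark.R does it this way is not stated in the PR.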

Related to insightsengineering/nestdevs-tasks#118


Pre-review Checklist (if an item does not apply, mark it as complete)

  • All GitHub Action workflows pass with a ✅
  • PR branch has pulled the most recent updates from the main branch: usethis::pr_merge_main()
  • If a bug was fixed, a unit test was added.
  • Code coverage is suitable for any new functions/features (generally, 100% coverage for new code): devtools::test_coverage()
  • Request a reviewer

Reviewer Checklist (if an item does not apply, mark it as complete)

  • If a bug was fixed, a unit test was added.
  • Run pkgdown::build_site(). Check the R console for errors, and review the rendered website.
  • Code coverage is suitable for any new functions/features: devtools::test_coverage()

When the branch is ready to be merged:

  • Update NEWS.md with the changes from this pull request under the heading "# cards (development version)". If there is an issue associated with the pull request, reference it in parentheses at the end of the update (see NEWS.md for examples).
  • All GitHub Action workflows pass with a ✅
  • Approve Pull Request
  • Merge the PR. Please use "Squash and merge" or "Rebase and merge".

Optional Reverse Dependency Checks:

Install checked with pak::pak("Genentech/checked") or pak::pak("checked")

    # Check dev versions of `cardx`, `gtsummary`, and `tfrmt`, which are in the `ddsjoberg` R Universe
    Rscript -e "options(checked.check_envvars = c(NOT_CRAN = TRUE)); checked::check_rev_deps(path = '.', n = parallel::detectCores() - 2L, repos = c('https://ddsjoberg.r-universe.dev', 'https://cloud.r-project.org'))"

    # Check CRAN reverse dependencies but run tests skipped on CRAN
    Rscript -e "options(checked.check_envvars = c(NOT_CRAN = TRUE)); checked::check_rev_deps(path = '.', n = parallel::detectCores() - 2, repos = 'https://cloud.r-project.org')"

    # Check CRAN reverse dependencies in a CRAN-like environment
    Rscript -e "options(checked.check_envvars = c(NOT_CRAN = FALSE), checked.check_build_args = '--as-cran'); checked::check_rev_deps(path = '.', n = parallel::detectCores() - 2, repos = 'https://cloud.r-project.org')"

github-actions (bot) commented Jul 1, 2026

✅ All contributors have signed the CLA
Posted by the CLA Assistant Lite bot.

kpagacz (Collaborator, Author) commented Jul 1, 2026

I have read the CLA Document and I hereby sign the CLA

