Skip to content

melt(measure.vars=list) returns indices in variable column#5247

Merged
tdhock merged 17 commits intomasterfrom
fix5209
Apr 8, 2024
Merged

melt(measure.vars=list) returns indices in variable column#5247
tdhock merged 17 commits intomasterfrom
fix5209

Conversation

@tdhock
Copy link
Member

@tdhock tdhock commented Nov 2, 2021

Closes #5209

This was not a big problem, but it would be nice for consistency: whenever user specifies measure.vars=list, variable column should contains indices rather than column names. Increased consistency may help avoid issues such as #5201 in which a user thought that measure.vars=list(...) and measure.vars=c(...) should yield same result (docs say otherwise).

Previous behavior: (master/CRAN)

> melt(data.table(a=10, b=20), measure.vars=list("a"))
    b variable value
1: 20        a    10

New behavior:

> melt(data.table(a=10, b=20), measure.vars=list("a"))
    b variable value
1: 20        1    10

Main fix involves checking if measure.vars is list in fmelt C code.
To make some old tests keep passing, I had to change

  • patterns: when one argument/pattern input, used to output a list with one element (integer vector of match indices), now returns just the integer vector (not in a list). When used like melt(DT, measure.vars=patterns("^a") we therefore still get the same result as before this PR (variable is column name, not index).
  • .SDcols=patterns("^a") again patterns returns vector instead of list so that needed to be special cased in [.data.table -- just use that vector instead of trying to pass it through Reduce(intersect, ...) -- again same results as before this PR.

@codecov
Copy link

codecov bot commented Nov 2, 2021

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.53%. Comparing base (53df7e5) to head (548d94c).
Report is 3 commits behind head on master.

❗ Current head 548d94c differs from pull request most recent head d738a12. Consider uploading reports for the commit d738a12 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5247      +/-   ##
==========================================
+ Coverage   97.51%   97.53%   +0.02%     
==========================================
  Files          80       80              
  Lines       14920    14920              
==========================================
+ Hits        14549    14552       +3     
+ Misses        371      368       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tdhock tdhock requested a review from mattdowle November 2, 2021 06:22
@mnazarov
Copy link
Contributor

mnazarov commented Mar 1, 2023

Another inconsistency with length-1 list in measure.vars, is that names are discarded, i.e.

# CRAN version
> melt(data.table(a=10, b=20), measure.vars=list("n" = "a"))
    b variable value
1: 20        a    10

And this PR seems to fix that too (even if it seems that the names were returned when PR was created in 2021):

# with this PR
> melt(data.table(a=10, b=20), measure.vars=list("n" = "a"))
    b variable  n
1: 20        1 10

Maybe adding an extra test with names could be useful.

@tdhock tdhock added this to the 1.16.0 milestone Jan 5, 2024
@tdhock
Copy link
Member Author

tdhock commented Feb 15, 2024

hi @jangorecki @MichaelChirico I would like to include this fix in next release, and I believe that it is ready to merge into master, so could you please review and/or merge if you agree that it is ready? Thanks!

test(2182.4, melt(DTid, measure.vars=list(a=c(NA,"a2"), b=c("b1","b2")), id.vars="id"), exid)
test(2182.5, melt(DT.wide, measure.vars=list(a=c(NA,1), b=2:3), na.rm=TRUE), data.table(variable=factor(2), a=2, b=2))
test(2182.6, melt(DT.wide, measure.vars=list(b=c("b1","b2"))), data.table(a2=2, variable=factor(c("b1","b2")), b=c(1,2))) # measure.vars named list length=1, #5065
test(2182.6, melt(DT.wide, measure.vars=list(b=c("b1","b2"))), data.table(a2=2, variable=factor(c("1","2")), b=c(1,2))) # measure.vars named list length=1, #5065
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why change the test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the new result using this PR

> melt(DT.wide, measure.vars=list(b=c("b1","b2")))
      a2 variable     b
   <num>   <fctr> <num>
1:     2        1     1
2:     2        2     2
> melt(DT.wide, measure.vars=c("b1","b2"))
      a2 variable value
   <num>   <fctr> <num>
1:     2       b1     1
2:     2       b2     2

@MichaelChirico
Copy link
Member

Minor feedback only. It does look like it could be a breaking change, though? We can merge and see what revdep testing tells us.

tdhock and others added 4 commits March 26, 2024 09:43
Co-authored-by: Michael Chirico <michaelchirico4@gmail.com>
Co-authored-by: Michael Chirico <michaelchirico4@gmail.com>
@tdhock
Copy link
Member Author

tdhock commented Apr 5, 2024

yes it could be a breaking change. I can follow up and suggest fixes for any revdeps that show up.

@MichaelChirico
Copy link
Member

yes it could be a breaking change. I can follow up and suggest fixes for any revdeps that show up.

Here are some places to look beyond CRAN packages:

https://github.com/search?q=lang%3AR+%2Fmeasure%5B.%5Dvars%5Cs*%3D%5Cs*list%5B%28%5D%5B%5E%2C%29%5D%2B%5B%29%5D%2F&type=code

Copy link
Member

@MichaelChirico MichaelChirico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to merge after addressing latest round of review. Thanks!

@jangorecki
Copy link
Member

I feel we are much less strict about backward compatibility than we used to be. Checking revdeps is not sufficient IMO, it is not sufficient by a huge factor. First, most of pkgs on CRAN does not have good test coverage. Second, we don't know how many revdeps there are which are not o CRAN or github, I would assume at least 3-4 times more than CRAN. Third, there are thousands of plain scripts. Current process of looking at cran/bioc revdeps, and checking github for the usage is as much as we can do, but IMO not enough to jump in with breaking changes. Let's try to give at least 1 release cycle for transition. We could have extra section in NEWS file "breaking changes in next release" which described how to switch to new behavior.

@MichaelChirico
Copy link
Member

MichaelChirico commented Apr 8, 2024

While you make a very good point in general, in this case, we are aligning implementation with documented behavior. I am not sure we should be as deferential to users relying on bugs/undocumented behavior.

Revdeps/GH search can help us in this case -- if many users wound up relying on this, we should proceed more carefully.

tdhock and others added 3 commits April 8, 2024 09:02
Co-authored-by: Michael Chirico <michaelchirico4@gmail.com>
@tdhock
Copy link
Member Author

tdhock commented Apr 8, 2024

We could have extra section in NEWS file "breaking changes in next release" which described how to switch to new behavior.

great idea Jan, I added a sentence about that:

  1. melt returns an integer column for variable when measure.vars is a list of length=1, consistent with the documented behavior, #5209. Thanks to @tdhock for reporting and fixing. Any users who were relying on this behavior can change measure.vars=list("col_name") (output variable was column name, now is column index/integer) to measure.vars="col_name" (variable still is column name).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

melt with measure.vars=list of length=1 should return integer in variable column

4 participants