Skip to content

Add drop_duplicates for dims#5239

Merged
max-sixty merged 15 commits intopydata:masterfrom
ahuang11:drop_duplicates_dim
May 15, 2021
Merged

Add drop_duplicates for dims#5239
max-sixty merged 15 commits intopydata:masterfrom
ahuang11:drop_duplicates_dim

Conversation

@ahuang11
Copy link
Copy Markdown
Contributor

@ahuang11 ahuang11 commented May 1, 2021

Ruined #5089 with reverting so remaking the PR for just dims

  • Closes #xxxx
  • Tests added
  • Passes pre-commit run --all-files
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@pep8speaks
Copy link
Copy Markdown

pep8speaks commented May 1, 2021

Hello @ahuang11! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-13 17:47:30 UTC

@ahuang11 ahuang11 mentioned this pull request May 1, 2021
5 tasks
Copy link
Copy Markdown
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually -- there is still one unresolved design concern here.

If you drop duplicates over multiple variables at once, what dimensions should the result have? In particular -- should it have the original dimensions, or should all dimensions involved be combined into one?

The latter might seem a little crazy now, but would make more sense once we allow dropping over multi-dimensional variables.

If we really want to avoid any possible controversy here, it might be best to stick to only supporting one variable for now, too.

@max-sixty
Copy link
Copy Markdown
Collaborator

The docs are failing with:

/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/5239/xarray/core/dataarray.py:docstring of xarray.core.dataarray.DataArray.drop_duplicates:3: WARNING: Unexpected indentation.
/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/5239/xarray/core/dataarray.py:docstring of xarray.core.dataarray.DataArray.drop_duplicates:4: WARNING: Block quote ends without a blank line; unexpected unindent.
looking for now-outdated files... none found

@max-sixty
Copy link
Copy Markdown
Collaborator

If you drop duplicates over multiple variables at once, what dimensions should the result have? In particular -- should it have the original dimensions, or should all dimensions involved be combined into one?

The latter might seem a little crazy now, but would make more sense once we allow dropping over multi-dimensional variables.

If we really want to avoid any possible controversy here, it might be best to stick to only supporting one variable for now, too.

For me this case supports consistent behavior over any number of dims! Which would mean rejecting the "stack only if you supply exactly ndims" proposal.

Completely fine with restricting to one dim for the moment and seeing how that goes (it's always possible to pass .drop_duplicates_coords(dim1).drop_duplicates_coords(dim2).

Were we going with drop_duplicates or drop_duplicates_coords? No strong view from me at all.

Comment thread xarray/core/dataarray.py Outdated
Co-authored-by: keewis <keewis@users.noreply.github.com>
@keewis
Copy link
Copy Markdown
Collaborator

keewis commented May 1, 2021

the docs build can probably be fixed by adding newlines before the section headers ("Parameters" / "Returns")

@max-sixty
Copy link
Copy Markdown
Collaborator

@ahuang11 we discussed this on the core team call. People are excited to merge this, and appreciate you bearing with the changes.

I suggested that we narrow this even further to one dimension and merge, so we can benefit from this now and consider additions from there. Would that be OK with you?

@ahuang11
Copy link
Copy Markdown
Contributor Author

ahuang11 commented May 12, 2021 via email

@max-sixty
Copy link
Copy Markdown
Collaborator

@shoyer could I confirm that limiting to a single dimension satisfies this? You say "multiple variables" here. I mentioned a single dimension on the call but maybe wasn't clear.

If you drop duplicates over multiple variables at once, what dimensions should the result have? In particular -- should it have the original dimensions, or should all dimensions involved be combined into one?

The latter might seem a little crazy now, but would make more sense once we allow dropping over multi-dimensional variables.

If we really want to avoid any possible controversy here, it might be best to stick to only supporting one variable for now, too.

@ahuang11
Copy link
Copy Markdown
Contributor Author

Dont understand this

xarray/core/dataarray.py:4605: error: Argument 1 to "isel" of "DataArray" has incompatible type "Dict[str, Any]"; expected "Optional[Mapping[Hashable, Any]]"  [arg-type]
Found 1 error in 1 file (checked 140 source files)

self.isel(indexes)

Comment thread xarray/core/dataarray.py Outdated
Comment thread xarray/core/dataarray.py Outdated
Comment thread xarray/core/dataarray.py Outdated
ahuang11 and others added 3 commits May 12, 2021 23:06
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com>
@max-sixty max-sixty added the plan to merge Final call for comments label May 13, 2021
* upstream/master:
  Update release guide (pydata#5274)
  combine keep_attrs and combine_attrs in apply_ufunc (pydata#5041)
  Explained what a deprecation cycle is (pydata#5289)
  Code cleanup (pydata#5234)
  FacetGrid docstrings (pydata#5293)
  Add whats new for dataset interpolation with non-numerics (pydata#5297)
  Allow dataset interpolation with different datatypes (pydata#5008)
@dcherian
Copy link
Copy Markdown
Contributor

@shoyer could I confirm that limiting to a single dimension satisfies this? You say "multiple variables" here. I mentioned a single dimension on the call but maybe wasn't clear.

@shoyer can you kindly merge this if you're happy with it?

@dcherian dcherian mentioned this pull request May 13, 2021
9 tasks
@max-sixty
Copy link
Copy Markdown
Collaborator

Merging but I or @ahuang11 can do any follow-ups.

Thank you v much @ahuang11 — appreciate this was a lot of work and happy to have you contribute such a good feature.

@max-sixty max-sixty merged commit b1bd6c8 into pydata:master May 15, 2021
@keewis keewis mentioned this pull request May 16, 2021
1 task
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

plan to merge Final call for comments

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants