Skip to content

Review (re)set_index#6992

Merged
benbovy merged 9 commits intopydata:mainfrom
benbovy:review-set-reset-index
Sep 27, 2022
Merged

Review (re)set_index#6992
benbovy merged 9 commits intopydata:mainfrom
benbovy:review-set-reset-index

Conversation

@benbovy
Copy link
Copy Markdown
Member

@benbovy benbovy commented Sep 5, 2022

Restore behavior prior to the explicit indexes refactor (i.e., refactored but without breaking changes).

TODO:

  • review set_index
  • review reset_index

For reset_index, the only behavior that is not restored here is the coordinate renamed with a _ suffix when dropping a single index. This was originally to prevent any coordinate with no index matching a dimension name, which is now irrelevant. That is a quite dirty workaround and I don't know who is relying on it (no complaints yet), but I'm open to restore it if needed (esp. considering that we may later deprecate reset_index completely in favor of drop_indexes #6971).

Restore old behavior, i.e.,

- drop the multi-index dimension name (even if
  drop=False) unless reset_index still returns a multi-index
- rename the level coordinate to the dimension name if the multi-index
  is reduced to a single index
- drop the whole multi-index if its dimension coordinate is given as
  argument

Fix IndexVariable -> Variable conversion
@benbovy benbovy changed the title Review (re)set_index + tests Review (re)set_index Sep 5, 2022
@benbovy benbovy mentioned this pull request Sep 23, 2022
@benbovy
Copy link
Copy Markdown
Member Author

benbovy commented Sep 27, 2022

Let's merge this. The failing flaky tests are not related to this PR.

@benbovy benbovy merged commit a042ae6 into pydata:main Sep 27, 2022
@benbovy benbovy deleted the review-set-reset-index branch August 30, 2023 09:05
piyushhhxyz added a commit to piyushhhxyz/xarray that referenced this pull request Apr 11, 2026
…bles

After the index refactor, _coord_names can contain names that are not in
_variables (e.g. after set_index + reset_index with drop=True). This caused
__len__ to return a negative number since it computed:
  len(_variables) - len(_coord_names)

Fix by only counting coord names that are actually present in _variables:
  len(_variables) - len(_variables.keys() & _coord_names)

This makes __len__ consistent with __iter__, which already correctly filters
by checking both conditions.

Fixes pydata#6992
piyushhhxyz added a commit to piyushhhxyz/xarray that referenced this pull request Apr 11, 2026
…bles

After the index refactor, _coord_names can contain names that are not in
_variables (e.g. after set_index + reset_index with drop=True). This caused
__len__ to return a negative value, raising ValueError.

Fix by intersecting _coord_names with _variables.keys() before computing
the length difference.

Fixes pydata#6992
piyushhhxyz pushed a commit to piyushhhxyz/xarray that referenced this pull request Apr 11, 2026
When reset_index is called with drop=True, variables are removed from
_variables but their names were not removed from _coord_names. This caused
_coord_names to contain names not present in _variables, which made
DataVariables.__len__ return a negative value (triggering ValueError).

Fix: intersect coord_names with the final variables dict in reset_index
so stale names are excluded. Also harden DataVariables.__len__ to only
count coord names that are actually present in _variables, as a defensive
measure against similar mismatches from other code paths.

Fixes pydataGH-6992
piyushhhxyz added a commit to piyushhhxyz/xarray that referenced this pull request Apr 11, 2026
…(drop=True)

In Dataset.reset_index(), when drop=True, variables were removed from
_variables but their names were not removed from _coord_names. This caused
_coord_names to have more entries than _variables, making
DataVariables.__len__() return a negative number and breaking repr.

Fix by intersecting coord_names with the variables dict keys in reset_index,
ensuring _coord_names is always a subset of _variables.

Fixes pydata#6992
piyushhhxyz pushed a commit to piyushhhxyz/xarray that referenced this pull request Apr 12, 2026
After set_index + reset_index(drop=True), _coord_names could contain
entries not present in _variables (e.g., the dropped multi-index dimension
name). This caused DataVariables.__len__ to return a negative value,
triggering 'ValueError: __len__() should return >= 0'.

Two fixes:
1. In reset_index(), intersect coord_names with variables.keys() to
   ensure coord_names is always a subset of variables.
2. In DataVariables.__len__(), only count coord names that actually
   exist in _variables, making it robust against any future inconsistency.

Fixes pydata#6992
piyushhhxyz added a commit to piyushhhxyz/xarray that referenced this pull request Apr 12, 2026
After set_index + reset_index(drop=True), _coord_names could contain
entries not present in _variables (e.g., the dropped multi-index dimension
name). This caused DataVariables.__len__ to return a negative value,
triggering 'ValueError: __len__() should return >= 0'.

Two fixes:
1. In reset_index(), intersect coord_names with variables.keys() to
   ensure coord_names is always a subset of variables.
2. In DataVariables.__len__(), only count coord names that actually
   exist in _variables, keeping it consistent with __iter__().

Added regression test for the exact MVCE from the issue.

Fixes pydata#6992
piyushhhxyz added a commit to piyushhhxyz/xarray that referenced this pull request Apr 12, 2026
…ex with drop=True

After set_index/reset_index(drop=True), _coord_names could contain names
of variables that were dropped, leading to more _coord_names than _variables.
This caused DataVariables.__len__ to return a negative number, breaking repr
and other operations.

Fix:
1. In reset_index: intersect coord_names with variables keys so dropped
   variable names are removed from coord_names.
2. In DataVariables.__len__: defensively count only coord_names that are
   actually in _variables, preventing negative lengths if this invariant
   is violated elsewhere.

Fixes pydata#6992
piyushhhxyz pushed a commit to piyushhhxyz/xarray that referenced this pull request Apr 14, 2026
After set_index + reset_index(drop=True), _coord_names could contain
names not present in _variables, causing DataVariables.__len__ to return
a negative value and breaking repr/iteration.

Two fixes:
1. Root cause: In reset_index(), intersect coord_names with variables
   so dropped variables are also removed from coord_names.
2. Defensive: In DataVariables.__len__(), count only coord names that
   are actually present in _variables, consistent with __iter__.

Fixes pydata#6992
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment