Review (re)set_index#6992
Merged
benbovy merged 9 commits intopydata:mainfrom Sep 27, 2022
Merged
Conversation
Restore old behavior, i.e., - drop the multi-index dimension name (even if drop=False) unless reset_index still returns a multi-index - rename the level coordinate to the dimension name if the multi-index is reduced to a single index - drop the whole multi-index if its dimension coordinate is given as argument Fix IndexVariable -> Variable conversion
- Convert the coordinates left unindexed from IndexVariable to Variable - Keep multi-index coordinates next to each other in a consistent order
This was referenced Sep 8, 2022
Closed
Member
Author
|
Let's merge this. The failing flaky tests are not related to this PR. |
piyushhhxyz
added a commit
to piyushhhxyz/xarray
that referenced
this pull request
Apr 11, 2026
…bles After the index refactor, _coord_names can contain names that are not in _variables (e.g. after set_index + reset_index with drop=True). This caused __len__ to return a negative number since it computed: len(_variables) - len(_coord_names) Fix by only counting coord names that are actually present in _variables: len(_variables) - len(_variables.keys() & _coord_names) This makes __len__ consistent with __iter__, which already correctly filters by checking both conditions. Fixes pydata#6992
piyushhhxyz
added a commit
to piyushhhxyz/xarray
that referenced
this pull request
Apr 11, 2026
…bles After the index refactor, _coord_names can contain names that are not in _variables (e.g. after set_index + reset_index with drop=True). This caused __len__ to return a negative value, raising ValueError. Fix by intersecting _coord_names with _variables.keys() before computing the length difference. Fixes pydata#6992
piyushhhxyz
pushed a commit
to piyushhhxyz/xarray
that referenced
this pull request
Apr 11, 2026
When reset_index is called with drop=True, variables are removed from _variables but their names were not removed from _coord_names. This caused _coord_names to contain names not present in _variables, which made DataVariables.__len__ return a negative value (triggering ValueError). Fix: intersect coord_names with the final variables dict in reset_index so stale names are excluded. Also harden DataVariables.__len__ to only count coord names that are actually present in _variables, as a defensive measure against similar mismatches from other code paths. Fixes pydataGH-6992
piyushhhxyz
added a commit
to piyushhhxyz/xarray
that referenced
this pull request
Apr 11, 2026
…(drop=True) In Dataset.reset_index(), when drop=True, variables were removed from _variables but their names were not removed from _coord_names. This caused _coord_names to have more entries than _variables, making DataVariables.__len__() return a negative number and breaking repr. Fix by intersecting coord_names with the variables dict keys in reset_index, ensuring _coord_names is always a subset of _variables. Fixes pydata#6992
piyushhhxyz
pushed a commit
to piyushhhxyz/xarray
that referenced
this pull request
Apr 12, 2026
After set_index + reset_index(drop=True), _coord_names could contain entries not present in _variables (e.g., the dropped multi-index dimension name). This caused DataVariables.__len__ to return a negative value, triggering 'ValueError: __len__() should return >= 0'. Two fixes: 1. In reset_index(), intersect coord_names with variables.keys() to ensure coord_names is always a subset of variables. 2. In DataVariables.__len__(), only count coord names that actually exist in _variables, making it robust against any future inconsistency. Fixes pydata#6992
piyushhhxyz
added a commit
to piyushhhxyz/xarray
that referenced
this pull request
Apr 12, 2026
After set_index + reset_index(drop=True), _coord_names could contain entries not present in _variables (e.g., the dropped multi-index dimension name). This caused DataVariables.__len__ to return a negative value, triggering 'ValueError: __len__() should return >= 0'. Two fixes: 1. In reset_index(), intersect coord_names with variables.keys() to ensure coord_names is always a subset of variables. 2. In DataVariables.__len__(), only count coord names that actually exist in _variables, keeping it consistent with __iter__(). Added regression test for the exact MVCE from the issue. Fixes pydata#6992
piyushhhxyz
added a commit
to piyushhhxyz/xarray
that referenced
this pull request
Apr 12, 2026
…ex with drop=True After set_index/reset_index(drop=True), _coord_names could contain names of variables that were dropped, leading to more _coord_names than _variables. This caused DataVariables.__len__ to return a negative number, breaking repr and other operations. Fix: 1. In reset_index: intersect coord_names with variables keys so dropped variable names are removed from coord_names. 2. In DataVariables.__len__: defensively count only coord_names that are actually in _variables, preventing negative lengths if this invariant is violated elsewhere. Fixes pydata#6992
piyushhhxyz
pushed a commit
to piyushhhxyz/xarray
that referenced
this pull request
Apr 14, 2026
After set_index + reset_index(drop=True), _coord_names could contain names not present in _variables, causing DataVariables.__len__ to return a negative value and breaking repr/iteration. Two fixes: 1. Root cause: In reset_index(), intersect coord_names with variables so dropped variables are also removed from coord_names. 2. Defensive: In DataVariables.__len__(), count only coord names that are actually present in _variables, consistent with __iter__. Fixes pydata#6992
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
_coord_namesthan_variableson Dataset #7036whats-new.rstRestore behavior prior to the explicit indexes refactor (i.e., refactored but without breaking changes).
TODO:
set_indexreset_indexFor
reset_index, the only behavior that is not restored here is the coordinate renamed with a_suffix when dropping a single index. This was originally to prevent any coordinate with no index matching a dimension name, which is now irrelevant. That is a quite dirty workaround and I don't know who is relying on it (no complaints yet), but I'm open to restore it if needed (esp. considering that we may later deprecatereset_indexcompletely in favor ofdrop_indexes#6971).