Skip to content

(feat): use numpy nan-able string type for anndata.experimental.MaskedArray#2011

Merged
ilan-gold merged 7 commits intomainfrom
ig/numpy_nan_types
Jun 18, 2025
Merged

(feat): use numpy nan-able string type for anndata.experimental.MaskedArray#2011
ilan-gold merged 7 commits intomainfrom
ig/numpy_nan_types

Conversation

@ilan-gold
Copy link
Copy Markdown
Contributor

@ilan-gold ilan-gold commented Jun 17, 2025

Using direct nan-able types is much better than object dtypes. In theory, this functionality should be usable independent of the new zarr dtype business and give us stronger typing going forward.

@ilan-gold ilan-gold added this to the 0.12.0 milestone Jun 17, 2025
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.46%. Comparing base (ee4ba21) to head (e368910).
⚠️ Report is 46 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2011      +/-   ##
==========================================
- Coverage   87.47%   85.46%   -2.02%     
==========================================
  Files          46       46              
  Lines        7056     7057       +1     
==========================================
- Hits         6172     6031     -141     
- Misses        884     1026     +142     
Files with missing lines Coverage Δ
src/anndata/_io/specs/lazy_methods.py 96.12% <ø> (ø)
src/anndata/compat/__init__.py 80.73% <100.00%> (+0.08%) ⬆️
src/anndata/experimental/backed/_lazy_arrays.py 94.28% <100.00%> (ø)

... and 7 files with indirect coverage changes

@flying-sheep
Copy link
Copy Markdown
Member

Didn't we discuss this before? I must have suggested this when this came out.

Do we have an issue?

@ilan-gold
Copy link
Copy Markdown
Contributor Author

@flying-sheep I didn't even think to look. I wasn't aware of this until it came up in the linked PR. In any case I don't see anything.

@flying-sheep
Copy link
Copy Markdown
Member

I mentioned numpy's string type here: #679 (comment)

I think this needs more testing, e.g. what when someone sticks in a masked array based on a regular numpy array?

@ilan-gold
Copy link
Copy Markdown
Contributor Author

ilan-gold commented Jun 17, 2025

I think this needs more testing, e.g. what when someone sticks in a masked array based on a regular numpy array?

Could you be a bit more specific? You're saying if some does something like adata.obs.iloc[[1, 2, 3]] = np.array(["a", "b", pd.NA], dtype="object")?

@flying-sheep
Copy link
Copy Markdown
Member

OK, so is this only an implementation detail and the user-facing data type is always a pandas Series, or is there a way to see a numpy array with that dtype as a user?

@ilan-gold
Copy link
Copy Markdown
Contributor Author

OK, so is this only an implementation detail and the user-facing data type is always a pandas Series, or is there a way to see a numpy array with that dtype as a user?

Right, this will be hidden behind pandas.arrays.StringArray for non-lazy cases, and for the new lazy API we haven't released anything yet so no worries there. And pd.array(np.array(["a", pd.NA], dtype=np.dtypes.StringDType(na_object=pd.NA))).to_numpy() returns an object-typed array so there shouldn't be any loss there either. So I think then we can probably safely remove https://github.com/scverse/anndata/pull/2011/files#diff-c0bf6e23503a6b9cc2420a02c6d6a727a9d035d8fc2ebc8aebf5aa657cca0ecb without worry, hopefully.

@ilan-gold ilan-gold changed the title (feat): use numpy nan-able string type if possible (feat): use numpy nan-able string type for anndata.experimental.MaskedArray Jun 17, 2025
@ilan-gold ilan-gold enabled auto-merge (squash) June 18, 2025 10:19
@ilan-gold ilan-gold merged commit 3ed0329 into main Jun 18, 2025
15 checks passed
@ilan-gold ilan-gold deleted the ig/numpy_nan_types branch June 18, 2025 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants