feat:get_ranges by d-v-b · Pull Request #3925 · zarr-developers/zarr-python

d-v-b · 2026-04-24T19:06:38Z

this PR adds a get_ranges protocol for stores. the protocol defines the shape of a function that fetches multiple byte ranges within the same stored object. The purpose is to define a method stores can opt into if they offer an efficient way to fetch multiple byte ranges from the same object, which would be immediately useful for the sharding codec. The protocol looks like this:

async def get_ranges(
    key: str,
    byte_ranges: Iterable[ByteRequest | None],
    *,
    prototype: BufferPrototype,
) -> AsyncIterator[Sequence[tuple[int, Buffer | None]]]:

the return type is an async iterator over sequences, where each sequence is the result of an IO operation the store implemented. this provides some observability to the caller about the actual coalescing, if any, that occurred. Results are returned in computed order, so the inner result type is tuple[int, Buffer | None], where the int is the index into the input byte_ranges for that result.

Only byte range requests that declare an explicit interval (RangeByteRequest) are coalesced. Any other byte range, or None, results in no coalescing and so the ranges will be fetched separately. I assume here that we do not care about coalescing overlapping suffix or prefix range requests, but we could add support for that if we need to.

In addition to this protocol, there's a freestanding function that takes:

a f(byte range) -> Awaitable[Buffer] function (which we would generate by combining Store.get with functools.partial)
an iterable of byte range requests
options (max gap bytes, max total bytes, etc)

this function contains basic byte range coalescing logic, and it can be re-used for multiple stores. This is a non-abstract-base-class alternative to a default implementation on an abc.

That freestanding function is used to implement get_ranges on the FsspecStore. This is probably not useful for local- or memory-backed storage, but is useful for remote storage. The actual implementation is lightweight:

    async def get_ranges(
        self,
        key: str,
        byte_ranges: Iterable[ByteRequest | None],
        *,
        prototype: BufferPrototype,
    ) -> AsyncIterator[Sequence[tuple[int, Buffer | None]]]:
        """Read many byte ranges from ``key``, coalescing nearby ranges and fetching concurrently.

        See :class:`zarr.storage._protocols.SupportsGetRanges` for the contract and
        :func:`zarr.core._coalesce.coalesced_get` for the full semantics.
        """
        fetch = partial(self.get, key, prototype)
        async for group in coalesced_get(fetch, byte_ranges, options=self.coalesce_options):
            yield group

cc @aldenks, the idea here is to build a basis for your range coalescing work for the sharding codec
cc @kylebarron, would love your feedback on this design.

related issues/PRs:

#1758
#3004

…sced_get

…ed_get

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Split the overlong first line into a short numpydoc summary plus an extended description. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

After the input split at the top of coalesced_get, merged groups only ever contain RangeByteRequest members. Replace the per-element isinstance filters (and the defensive ``else 0`` sort-key branch) with a single assertion at the top of the merged-group block and direct attribute access. Also remove the unreachable ``if total == 0: return`` guard (``indexed`` is non-empty by construction once we pass the earlier guard). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Exercise the ``kind == "missing"`` branch in the uncoalescable single-fetch arm for Offset/Suffix/None inputs, which was not hit by existing tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two related correctness issues in coalesced_get's drain loop: 1. When the consumer breaks out of the async-for (early exit), the generator's finally block only awaited in-flight tasks rather than cancelling them. That wasted I/O. Cancel first, then gather. 2. The drain loop waited on completion_queue for ``total`` entries, but after a "missing" or "error" we cancel pending tasks -- and cancelled tasks never enqueue a completion. With max_concurrency > 1 this could hang. Rework the drain loop to break out immediately on the first miss/error; the finally block handles cleanup. The new structure also collapses the redundant miss/error branches and removes the now-unused ``total``/``drained``/``stopped`` bookkeeping. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Exercises the concurrent path where a missing key is observed while other fetches are still in flight. Uses an asyncio.Event to gate late arrivals until after the miss has been processed, giving the drain loop an opportunity to observe and discard post-stop completions, and verifies the iterator terminates cleanly without hanging or raising. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drives many slow ranges with a small max_concurrency, breaks out of the async-for after the first yield, and verifies that at least one still-running fetch was cancelled rather than being left to run to completion. Cancellation is observed via a counter in the fetch's CancelledError branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coalesced_get is implemented as an async generator (uses yield) and callers need access to aclose() to drive its finally block deterministically. Declaring the return type as AsyncGenerator instead of AsyncIterator exposes aclose()/asend()/athrow() through the type system, matches the runtime object, and lets consumers (e.g. the consumer-break test) avoid type-ignore escape hatches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

pyproject asyncio_mode=auto already covers async test dispatch; the explicit pytestmark was a vestige. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Used 0000.feature.md as a placeholder; rename to {pr-number}.feature.md once the PR is opened. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The SupportsGetRanges protocol is private; a user-facing release note shouldn't advertise it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

d-v-b · 2026-04-24T19:17:52Z

for context, we do already have a get_partial_values method, but it returns eagerly (no streaming), and it handles fetching from multiple keys. get_ranges only targets a single key, and it supports streaming. This is IMO a more targeted and performance-friendly design, and it's narrowly what the sharding codec needs for IO against a single shard.

The min_deps CI job pins fsspec to 2023.10.0, which predates AsyncFileSystemWrapper. Wrapping a sync MemoryFileSystem fails there at fixture setup. Guard the affected tests with the same skipif pattern already used in test_fsspec.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov · 2026-04-24T19:29:08Z

Codecov Report

❌ Patch coverage is 96.29630% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 93.30%. Comparing base (7c2f372) to head (5eb25bc).

Files with missing lines	Patch %	Lines
src/zarr/core/_coalesce.py	95.78%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3925      +/-   ##
==========================================
+ Coverage   93.26%   93.30%   +0.03%     
==========================================
  Files          87       89       +2     
  Lines       11721    11829     +108     
==========================================
+ Hits        10932    11037     +105     
- Misses        789      792       +3

Files with missing lines	Coverage Δ
src/zarr/storage/_fsspec.py	`91.71% <100.00%> (+0.38%)`	⬆️
src/zarr/storage/_protocols.py	`100.00% <100.00%> (ø)`
src/zarr/core/_coalesce.py	`95.78% <95.78%> (ø)`

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

kylebarron · 2026-04-24T19:55:48Z

I think making it iterable adds complexity and adds confusion to whether request coalescing is expected to be applied here or not.

If you have requests:

0-1000
1000-2000
2000-3000
100_000 - 100_005

Then of course we should coalesce all the first 3 requests into one. But then the async iterable implies that we might want to use one of the first responses before the last one arrives... but the last request would've arrived first.

Either, if you really want the response type to be async iterable, the responses should probably have an index for which of the input requests it's associated to.

But I think it would be much simpler to take in a sequence of byte ranges and return a sequence of results. Just like object-store/obstore/obspec.

d-v-b · 2026-04-24T20:03:58Z

If you have requests:

* 0-1000

* 1000-2000

* 2000-3000

* 100_000 - 100_005

In the design in this PR we are iterating over IO calls the reader actually did, which is less than or equal to the number of byte ranges requested. So, assuming the first three are fused, we would either see:

yields ((0, <payload>), (1, <payload>), (2, <payload>))
yields ((3, <payload>),)

or

yields ((3, <payload>),)
yields ((0, <payload>), (1, <payload>), (2, <payload>))

Which, for sharding is useful -- you can start decoding chunks immediately while you wait for the rest of the sub-chunks to come in. Does that make sense? or am I misunderstanding something.

ilan-gold

Nice!

At an architecture level, it's not immediately clear to me how this will be exposed to users. But maybe that is out of the scope of this PR? I think we don't want to turn this on by default without some rough estimates.

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

d-v-b · 2026-05-07T20:14:45Z

@ilan-gold have another look, I cleaned things up here a bit. one notable change that I plan on re-using: define subfunctions that explicitly depend on a shared state, instead of relying implicitly on variables defined in the outer scope (like the queue and the semaphore)

chuckwondo

Is there any reason you chose not to implement a range coalescing function independent of fetching? Perhaps I'm missing something, particularly since I'm still very much getting my feet wet with this codebase, but it seems to me that having an independent range coalescing function would allow for far simpler and easier testing.

As currently written, your tests must include fetching since you have coupled coalescing and fetching, which seems to have added quite a bit of verbosity/complexity to your tests that you might otherwise be able to avoid, since existing tests should already be covering the fetching.

d-v-b · 2026-05-07T23:52:56Z

No good reason! I think factoring the range coalescing into a separate routine is a great idea for the reasons you mention. I'll implement that

chuckwondo · 2026-05-08T00:20:22Z

I would also argue that OffsetByteRequests are coalescable.

Although you may not know how many total bytes are available, you certainly know, by definition, where the range starts (the offset), so as you coalesce earlier RangeByteRequests, when you bump into your first OffsetByteRequest, and you still haven't hit your max_coalesced_bytes limit, you can pull as many bytes from the front of the OffsetByteRequest as you need to reach the limit, and shift the offset by the corresponding number of bytes.

You can also coalesce beyond that point. For example, all RangeByteRequests and OffsetByteRequests starting after the first OffsetByteRequest are subsumed by that first OffsetByteRequest.

This, of course, would seem to add a bit of complexity to the coalescing logic, so I suspect you could exclude such logic from this PR, and separately consider whether or not it would be worth the additional complexity at a later date (or never).

d-v-b · 2026-05-08T00:24:26Z

Agree, but the only consumer of this method is the sharding codec which exclusively requests explicit byte ranges. So if we implemented coalescing for other byte range requests, it would never be used.

chuckwondo · 2026-05-08T00:40:07Z

I'm a bit confused then. Why do you accept all types of byte requests as input if you expect only byte range requests?

d-v-b · 2026-05-08T00:48:59Z

The coalescing is a performance optimization, and we don't want to push branching on the request type to the caller. A non-coalesced request falls through to a regular fetch, which is submitted in the same batch as the coalesced fetches. The alternative requires two batches.

And we can't predict the requirements of future codecs. We know today what sharding needs, so we optimize for that. If a future caller needs to coalesce offset range requests, we can implement it inside this function without changing the signature.

chuckwondo · 2026-05-08T00:56:32Z

That clarifies a bit for me, but I suspect I might need more time getting familiar with the library before that will be much more clear to me.

After looking at the PR code a bit more, I see that you group the range requests, but don't actually collapse them (except when you actually fetch the bytes) so that once the bytes are fetched, you can slice them up according to the original range requests that you preserved.

ilan-gold · 2026-05-08T10:07:20Z

+DEFAULT_COALESCE_OPTIONS: CoalesceOptions = {
+    "max_gap_bytes": 1 << 20,  # 1 MiB
+    "max_coalesced_bytes": 16 << 20,  # 16 MiB
+    "max_concurrency": 10,


Why does this need its own maximal concurrency? How will this interact with the global setting?

this needs a max concurrency because implementations may launch concurrent range reads. Right now this setting doesn't interact at all with the global setting, and that's hard to support given our current concurrency limit design -- codecs like sharding call concurrent_map recursively, so the global concurrency limit is alrady on shaky ground for sharding. We should do some testing with this parameter to see if there are any pathologies possible with too much concurrency.

d-v-b and others added 24 commits April 24, 2026 20:19

feat(core): add _coalesce module skeleton with CoalesceOptions and stub

d007c64

test(core): add failing tests for coalesced_get basic cases

abac6d3

feat(core): implement coalesced_get for basic sequential cases

f65018a

test(core): cover Offset/Suffix/None and mixed-cluster cases in coale…

9e1f1d2

…sced_get

feat(core): run coalesced fetches concurrently under max_concurrency

cd5097b

test(core): cover key-missing (start/mid) and fetch-raises in coalesc…

3a85488

…ed_get

test(core): cover max_coalesced_bytes cap in coalesced_get

4553523

test(core): add coverage-invariant property test for coalesced_get

162dd6d

test(core): drop unused HEAVY_MERGE/NO_MERGE constants

401e28b

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs(core): shorten coalesced_get docstring summary line

913928c

Split the overlong first line into a short numpydoc summary plus an extended description. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

test(core): cover key-missing on uncoalescable input

b2ec638

Exercise the ``kind == "missing"`` branch in the uncoalescable single-fetch arm for Offset/Suffix/None inputs, which was not hit by existing tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

refactor(test): drop redundant pytestmark asyncio

865baf0

pyproject asyncio_mode=auto already covers async test dispatch; the explicit pytestmark was a vestige. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat(storage): add private SupportsGetRanges protocol

17d9f75

test(storage): add failing tests for FsspecStore.get_ranges

3ab711d

feat(storage): add FsspecStore.get_ranges and coalesce_options kwarg

913be10

test(storage): add SupportsGetRanges conformance tests

0328e01

chore: add towncrier fragment for get_ranges

e7432c5

Used 0000.feature.md as a placeholder; rename to {pr-number}.feature.md once the PR is opened. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

docs: drop private-symbol mention from get_ranges changelog

349bd9c

The SupportsGetRanges protocol is private; a user-facing release note shouldn't advertise it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

test: refactor tests

79e9927

d-v-b mentioned this pull request Apr 24, 2026

Optimize partial shard reads #3004

Open

6 tasks

d-v-b requested review from TomAugspurger and jhamman May 1, 2026 12:52

Merge branch 'main' into feat/get-many

1607420

d-v-b requested review from ilan-gold May 6, 2026 15:21

d-v-b added 2 commits May 6, 2026 15:46

Merge branch 'main' into feat/get-many

ba6fef2

Merge branch 'main' into feat/get-many

49289de

ilan-gold reviewed May 7, 2026

View reviewed changes

Comment thread src/zarr/core/_coalesce.py

Comment thread src/zarr/core/_coalesce.py Outdated

Comment thread src/zarr/core/_coalesce.py Outdated

aldenks mentioned this pull request May 7, 2026

Performance: fetch inner chunks concurrently within a shard #3953

Open

7 tasks

d-v-b and others added 6 commits May 7, 2026 15:30

Update src/zarr/core/_coalesce.py

f158af3

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update src/zarr/core/_coalesce.py

ef65b65

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Update src/zarr/core/_coalesce.py

3e1f22c

Co-authored-by: Ilan Gold <ilanbassgold@gmail.com>

Merge branch 'main' into feat/get-many

a4906cc

chore: lint

2d556d4

refactor: better function design with explicit context

0e5b2c5

chuckwondo reviewed May 7, 2026

View reviewed changes

Merge branch 'main' into feat/get-many

6e6a1ad

ilan-gold reviewed May 8, 2026

View reviewed changes

chuckwondo reviewed May 8, 2026

View reviewed changes

Comment thread src/zarr/storage/_protocols.py Outdated

d-v-b added 3 commits May 8, 2026 13:31

Merge branch 'main' into feat/get-many

a997792

refactor(coalesce): split coalescing from coalesced get

17f8d64

refactor(coalesce): use sequence instead of iterable

5eb25bc

Uh oh!

Conversation

d-v-b commented Apr 24, 2026

Uh oh!

d-v-b commented Apr 24, 2026

Uh oh!

codecov Bot commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

kylebarron commented Apr 24, 2026

Uh oh!

d-v-b commented Apr 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ilan-gold left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

d-v-b commented May 7, 2026

Uh oh!

chuckwondo left a comment

Choose a reason for hiding this comment

Uh oh!

d-v-b commented May 7, 2026

Uh oh!

chuckwondo commented May 8, 2026

Uh oh!

d-v-b commented May 8, 2026

Uh oh!

chuckwondo commented May 8, 2026

Uh oh!

d-v-b commented May 8, 2026

Uh oh!

chuckwondo commented May 8, 2026

Uh oh!

ilan-gold May 8, 2026

Choose a reason for hiding this comment

Uh oh!

d-v-b May 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov Bot commented Apr 24, 2026 •

edited

Loading

d-v-b commented Apr 24, 2026 •

edited

Loading