fix(storage): preserve leading slashes in FsspecStore.path#3926
fix(storage): preserve leading slashes in FsspecStore.path#3926d-v-b merged 5 commits intozarr-developers:mainfrom
Conversation
zarr-developers#3924 ran the constructor's `path` argument through `normalize_path`, which is intended for zarr logical keys and strips leading slashes. Applied to a filesystem-side root, this turned absolute paths like /home/foo/data.zarr into the relative home/foo/data.zarr, breaking LocalFileSystem-backed FsspecStore for any caller that passed an absolute path. Downstream impact: titiler-xarray's test-upstream job fails on every dataset_3d.zarr fixture access. The original zarr-developers#3922 issue (path="/" producing "//key" via _join_paths) is still resolved: rstrip("/") collapses "/" to "", so the join filter drops it. Trailing slashes are also still stripped. Updates the existing test_fsspec_store_path_normalization parametrization with the new (correct) expectations and adds two absolute-path cases. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3926 +/- ##
=======================================
Coverage 93.11% 93.11%
=======================================
Files 85 85
Lines 11365 11369 +4
=======================================
+ Hits 10582 10586 +4
Misses 783 783
🚀 New features to boost your workflow:
|
maxrjones
left a comment
There was a problem hiding this comment.
Is the changelog entry necessary? This regression wasn't released, which makes me wonder if it adds confusion rather than clarity. It could be better as an in-line comment to prevent future mistakes with this fragile part of the codebase.
that's a good point, I will remove the changelog entry and add the comment as you suggest |
|
@maxrjones I'd appreciate another round of review. I realized that the root cause of the bugged fsspec path behavior was an off-target effect of removing a private path normalization helper in #3679, combined with low test coverage for fsspec backends that were sensitive to path normalization details (reference file system). The latest changes in this PR restore that path normalization helper, and add regression tests that use the reference file system in a way that should reveal recurrence of this bug. |
maxrjones
left a comment
There was a problem hiding this comment.
There are some potential issues about the interactions between this PR and fsspec's internals to consider
Loss of normalization invariants
Looking at normalize_path (src/zarr/storage/_utils.py:105-139), it does four things, only one of which (slash stripping) was problematic:
result.replace("\\", "/")— backslash → forward slash conversionresult.strip("/")— leading/trailing slash stripping (the broken behavior)re.sub(r"//+", "/")— collapse repeated internal slashes- Reject
./..segments (raisesValueError)
The PR removes all four. Three of them were doing useful work:
Backslash conversion (Windows). A Windows user constructing FsspecStore.from_upath or passing a pathlib.Path rendered to a string can produce C:\\Users\\foo\\data.zarr. Pre-PR, that became C:/Users/foo/data.zarr. Post-PR, the backslashes flow into _cat_file, _info, etc. fsspec's LocalFileSystem on Windows is generally tolerant, but several other backends (memory, reference, ZIP) treat backslash as a literal byte in keys.
./.. rejection. This was a defense-in-depth check against path traversal. With LocalFileSystem, post-PR, FsspecStore(local_fs, path="/srv/data/../../etc") no longer raises at construction; the OS resolves the parent traversal. Probably not exploitable in normal zarr workflows (keys are generated, not user-supplied), but worth noting that the safety net is gone.
Internal // collapse. A user passing path="bucket//prefix" (or producing it from string concatenation) now sends "bucket//prefix/zarr.json" to the backend. Many fsspec backends tolerate this via their own _strip_protocol, but ReferenceFileSystem (the very backend this PR targets) does not collapse internal slashes — so a malformed root passes through.
A more conservative fix would be to keep normalize_path but also strip the leading-slash behavior, or introduce a separate _normalize_store_root that does everything except lstrip("/").
__eq__ / __hash__ sensitivity to surface form
FsspecStore.__eq__ (line 268-274) compares self.path textually. Pre-PR:
FsspecStore(fs, "foo") == FsspecStore(fs, "foo/") # True
FsspecStore(fs, "foo/bar") == FsspecStore(fs, "foo//bar") # TruePost-PR, both pairs are unequal. This affects code that uses stores as dict keys or for cache deduplication. Combined with the _dereference_path final rstrip("/"), two unequal stores can dispatch to identical fsspec calls — a confusing failure mode.
_strip_protocol interaction is now load-bearing
Different fsspec backends normalize input paths differently inside their internal methods:
S3FileSystem:_strip_protocol("/bucket/key")returns"bucket/key"— leading slash silently dropped. SoFsspecStore.pathand the path actually used by S3 differ.LocalFileSystem: leading slash is meaningful (absolute vs relative-to-cwd). The PR is correct here — that's why this fix exists.MemoryFileSystem:_strip_protocolprepends/if missing, so internal keys always start with/. Path equality betweenFsspecStore(mem, "foo")andFsspecStore(mem, "/foo")is broken at the zarr layer even though they address the same bytes.ReferenceFileSystem: bare keys, no leading slash. The whole motivation for the PR.
Verbatim storage means the caller has to know backend conventions. Pre-PR's normalize_path papered over this; post-PR, the user-visible repr/eq reflects whatever surface form was passed in, not the effective key.
Hazards in methods not touched by the PR
These were already present, but the PR makes them more visible because users will now construct stores with path="/" or absolute paths more freely:
clear()(line 256-263) callsself.fs._find(self.path, withdirs=True). Withpath="/"onLocalFileSystem, this attempts a recursive traversal of the actual filesystem root before deleting. A misconfiguredFsspecStore(LocalFileSystem(asynchronous=True), path="/").clear()is a footgun — and the PR's "absolute paths now work" message arguably encourages this pattern.list/list_dir/list_prefix: still use raw f-string concatenation, so they retain the//keybug forpath="/". The PR's own end-to-endReferenceFileSystemtests don't exercise these paths.
Tighter from_url / url_to_fs coupling
from_url calls fsspec.url_to_fs(url) and passes the returned path verbatim. fsspec's url_to_fs semantics for the path component vary by version and protocol — this PR's correctness now depends on url_to_fs returning a path that is already in the form each backend expects for direct method calls. That's the contract fsspec advertises, but it's worth noting that any future fsspec change to url_to_fs path conventions becomes a FsspecStore regression with no zarr-side cushion.
I'm not convinced this is the case. The use of |
|
(prompt your agent to look at the change history of the affected files, not just the changes in this PR 😄 ) |
|
sorry for the misdirected review @d-v-b. I think this restoration of the previous behavior makes sense. I've been working on:
|
#3924 ran the constructor's
pathargument throughnormalize_path,which is intended for zarr logical keys and strips leading slashes.
Applied to a filesystem-side root, this turned absolute paths like
/home/foo/data.zarr into the relative home/foo/data.zarr, breaking
LocalFileSystem-backed FsspecStore for any caller that passed an
absolute path. Downstream impact: titiler-xarray's test-upstream job
fails on every dataset_3d.zarr fixture access.
The original #3922 issue (path="/" producing "//key" via _join_paths)
is still resolved: rstrip("/") collapses "/" to "", so the join filter
drops it. Trailing slashes are also still stripped.
Updates the existing test_fsspec_store_path_normalization
parametrization with the new (correct) expectations and adds two
absolute-path cases.
Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com
cc @vincentsarago