Arrow: Suppress warning and cache bucket location #1709
kevinjqliu merged 3 commits into apache:main from …
Attempt to remove the unnecessary warning and cache the location of the bucket independently of the FileIO.
Fixes apache#1705
Fixes apache#1708
pyiceberg/io/pyarrow.py
```diff
 try:
-    bucket_region = resolve_s3_region(bucket=netloc)
+    bucket_region = _cached_resolve_s3_region(bucket=netloc)
 except (OSError, TypeError):
     bucket_region = None
     logger.warning(f"Unable to resolve region for bucket {netloc}, using default region {provided_region}")
```
How about moving try/except to _cached_resolve_s3_region and caching even the None value? Does the result change when we retry to resolve the same netloc? Or do we need to warn the user every time it fails to resolve the netloc?
Hey @hussein-awala, thanks for jumping in here; I really like your suggestion. I just pushed it, LMKWYT (let me know what you think).
> Or do we need to warn the user every time it fails to resolve the netloc?
I was surprised to hear that folks are seeing multiple warnings, since Python only warns once by design:
> Repetitions of a particular warning for the same source location are typically suppressed.
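That once-per-location suppression belongs to the `warnings` module, keyed on (message text, category, line). The code under review calls `logger.warning`, and, assuming `logger` is a standard `logging.Logger`, the `logging` module applies no such deduplication, which may explain the repeated messages. A small comparison sketch (the helper names and the logger name are hypothetical):

```python
import logging
import warnings


def _warn_unresolved(bucket: str) -> None:
    # Same message text from the same source line on every call.
    warnings.warn(f"Unable to resolve region for bucket {bucket}", UserWarning)


def count_via_warnings(n: int) -> int:
    """Call warnings.warn n times from one location; count what is shown."""
    with warnings.catch_warnings(record=True) as caught:
        # "default": show the first occurrence per (message, category, line).
        warnings.simplefilter("default")
        for _ in range(n):
            _warn_unresolved("my-bucket")
    return len(caught)


class _CountingHandler(logging.Handler):
    """Collects log records instead of printing them."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def count_via_logging(n: int) -> int:
    """Call logger.warning n times from one location; count what is emitted."""
    logger = logging.getLogger("region_demo")  # hypothetical logger name
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # keep the demo output off the root logger
    handler = _CountingHandler()
    logger.addHandler(handler)
    try:
        for _ in range(n):
            logger.warning("Unable to resolve region for bucket %s", "my-bucket")
    finally:
        logger.removeHandler(handler)
    return len(handler.records)


print(count_via_warnings(3), count_via_logging(3))  # 1 3
```

Note that the `warnings` suppression is per message text: a warning naming a different bucket is shown again.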
Thanks for the suggestion @hussein-awala, I like this approach. My only question before proceeding is whether we want the … We can also revisit this at a later time.
LGTM! I like this behavior:
- When `s3.region` is set, use it to initialize the `S3FileSystem` (`s3.resolve-region` is `False` by default).
- When `s3.region` is set and `s3.resolve-region` is `True`, resolve the region by calling `resolve_s3_region` and use the resolved region for `S3FileSystem`. If the resolved region differs from `s3.region`, emit a warning to the user.
- When `s3.region` is not set and `s3.resolve-region` is `False` (default), pass the `None` value of `s3.region` to `S3FileSystem`, which does its own resolution.
- When `s3.region` is not set and `s3.resolve-region` is `True`, resolve the region and use it for `S3FileSystem`, but do not emit a warning to the user.
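The four cases above can be condensed into one decision function. This is a hedged sketch, not pyiceberg's implementation: `choose_region` and its `resolve` callback are hypothetical names, reading `s3.resolve-region` as a `"true"`/`"false"` string is an assumption about how the property is stored, and falling back to `s3.region` when resolution fails is an assumption the list does not cover.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)


def choose_region(
    properties: Dict[str, str],
    bucket: str,
    resolve: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Pick the region to pass to S3FileSystem, following the four cases."""
    provided = properties.get("s3.region")
    # Assumption: the flag arrives as a string property such as "true"/"false".
    resolve_enabled = properties.get("s3.resolve-region", "false").strip().lower() == "true"

    if not resolve_enabled:
        # Cases 1 and 3: use s3.region as-is. When it is None,
        # S3FileSystem performs its own region resolution.
        return provided

    resolved = resolve(bucket)
    if resolved is None:
        # Assumption (not in the list): fall back to s3.region on failure.
        return provided
    if provided is not None and resolved != provided:
        # Case 2: the resolved region wins, but report the mismatch.
        logger.warning(
            "s3.region is %s, but bucket %s resolves to %s; using %s",
            provided, bucket, resolved, resolved,
        )
    # Cases 2 and 4: use the resolved region (silently when s3.region is unset).
    return resolved


props = {"s3.region": "eu-west-1", "s3.resolve-region": "true"}
print(choose_region(props, "warehouse", lambda b: "us-east-1"))  # us-east-1
```

Passing the resolver as a callback keeps the decision logic testable without network access; in practice it could be a cached resolver so the lookup happens once per bucket.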