[opt](load) S3 Load and TVF support access without AKSK #53592
dataroaring merged 4 commits into apache:master
Force-pushed from d794ed9 to 94c1a8e.
run buildall
TPC-H: Total hot run time: 34320 ms
TPC-DS: Total hot run time: 189628 ms
ClickBench: Total hot run time: 33.27 s
```java
//should add connectivity test
boolean connectivityTest = FileSystemFactory.get(brokerDesc.getStorageProperties())
        .connectivityTest(filePaths);
if (!connectivityTest) {
    throw new UserException("Failed to access object storage, message=connectivity test failed");
}
```
Why skip connectivity checks? It's always better to fail fast when user credentials are invalid. Public access is the least commonly used authentication method in production.
The headBucket interface used for the connectivity check requires the user to have either headBucket or listBucket permission, but real users do not necessarily have it. For example, a user who only accesses a specific file does not need to be granted any bucket-level permissions.
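The point of the reply above can be illustrated with a toy model. This is only a sketch: `Action`, `bucketLevelProbe`, and `objectLevelProbe` are hypothetical names, not Doris or AWS SDK APIs, and the permission rule simply encodes the statement in the comment (a bucket-level probe needs headBucket or listBucket permission).

```java
import java.util.EnumSet;
import java.util.Set;

// Toy model only: these names are hypothetical and do not exist in Doris or the AWS SDK.
final class ConnectivityProbe {
    enum Action { HEAD_BUCKET, LIST_BUCKET, GET_OBJECT }

    // Bucket-level probe: passes only if the user holds a bucket-level permission.
    static boolean bucketLevelProbe(Set<Action> granted) {
        return granted.contains(Action.HEAD_BUCKET) || granted.contains(Action.LIST_BUCKET);
    }

    // Object-level probe: passes if the user may read the object itself.
    static boolean objectLevelProbe(Set<Action> granted) {
        return granted.contains(Action.GET_OBJECT);
    }

    public static void main(String[] args) {
        // A user who was granted read access to one file and nothing else.
        Set<Action> fileOnly = EnumSet.of(Action.GET_OBJECT);
        System.out.println("bucket-level probe passes: " + bucketLevelProbe(fileOnly));
        System.out.println("object-level probe passes: " + objectLevelProbe(fileOnly));
    }
}
```

In this model the file-only user fails the bucket-level probe even though the actual read would succeed, which is the false negative the reply describes.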
```java
// For anonymous access (no credentials required)
if (StringUtils.isBlank(getAccessKey()) && StringUtils.isBlank(getSecretKey())) {
    return AnonymousCredentialsProvider.create();
}
```
This setup allows COS/OBS/OSS to be accessed without credentials. Have you tested whether they actually permit this kind of access?
I moved the code into each provider's own implementation. COS/OBS/OSS also support anonymous access.
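The selection rule in the snippet under discussion (fall back to anonymous access only when both keys are blank) can be sketched without any SDK dependency. `CredentialSelection`, `Mode`, and `select` are hypothetical names used for illustration. Rejecting a half-configured pair is an assumption of this sketch and is not confirmed behavior of the PR.

```java
// Dependency-free sketch; names are hypothetical, not Doris APIs. Requires Java 11+ for String.isBlank().
final class CredentialSelection {
    enum Mode { ANONYMOUS, STATIC_KEYS }

    static Mode select(String accessKey, String secretKey) {
        boolean noAk = accessKey == null || accessKey.isBlank();
        boolean noSk = secretKey == null || secretKey.isBlank();
        if (noAk && noSk) {
            // Neither key supplied: treat the request as anonymous (public bucket).
            return Mode.ANONYMOUS;
        }
        if (noAk || noSk) {
            // Assumption of this sketch: a half-configured pair is a user error.
            throw new IllegalArgumentException("access key and secret key must be set together");
        }
        return Mode.STATIC_KEYS;
    }

    public static void main(String[] args) {
        System.out.println(select(null, ""));
        System.out.println(select("ak", "sk"));
    }
}
```

Keeping the decision in one small function per provider would match the reply above, where each storage type carries its own implementation of the anonymous fallback.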
Could you please add some context and design details to the PR description? tbh, this approach feels a bit too blunt. You need to make sure the corresponding bucket/key is publicly accessible. Besides that, I really find it hard to imagine that database-related applications in a production environment would actually access public S3 data.
It is for users who run tests against public datasets.
Force-pushed from 94c1a8e to c82baa2.
run buildall
I tested in my environment, and accessing publicly available S3 buckets without AK and SK works.
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
TPC-H: Total hot run time: 34511 ms
TPC-DS: Total hot run time: 191853 ms
ClickBench: Total hot run time: 32.74 s
TPC-H: Total hot run time: 33722 ms
TPC-DS: Total hot run time: 186587 ms
ClickBench: Total hot run time: 32.13 s
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Some users test against open datasets hosted in publicly readable buckets, which have to be accessed without AK/SK credentials. This PR adds support for anonymous access to data on S3 in S3 Load and the TVF.
Release note
None