GH-44308: [C++][FS][Azure] Implement SAS token authentication #45021
kou merged 22 commits into apache:main
…n. This avoids cheating by using the account key again to generate SAS tokens in tests
cpp/src/arrow/filesystem/azurefs.cc
Outdated
// Assume these are part of a SAS token. Its not ideal to make such an assumption
// but given that a SAS token is a complex set of URI parameters, that could be
// tricky to exhaustively list I think its the best option.
credential_kind = CredentialKind::kSasToken;
We don't have the SAS token specification that includes parameter names used by a SAS token, right?
Yeah, I had a quick search and couldn't find what we need. If you think it's important I can try a bit harder. The closest I found seemed to be unabbreviated versions of what actually appears in the SAS token.
There are many parameters but can we check them...?
I think that doc is only for user delegation SAS tokens, so I unioned it with the parameters for account and service SAS tokens. Hopefully the spec only changes slowly.
I wasn't really confident on the best way to define a constant set of strings to do contains checks against in C++. Since there are only 27 values, I ended up with a constexpr array and std::find but please let me know if this is not a good option.
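For readers following along, here is a minimal sketch of the constexpr array plus std::find approach described above. It is not the code from this PR: the array name `kExampleSasParams` and the helper `IsSasParam` are hypothetical, and the six entries are only the parameter names visible in the example URI in the PR description, not the full list of 27.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Hypothetical, illustrative subset: the parameter names that appear in the
// example URI in this PR's description. The list in the PR has 27 entries.
constexpr std::array<std::string_view, 6> kExampleSasParams = {
    "se", "sig", "sp", "spr", "sr", "sv"};

// Returns true if `key` is one of the listed SAS query parameter names.
// A linear scan is fine for a list this small.
inline bool IsSasParam(std::string_view key) {
  return std::find(kExampleSasParams.begin(), kExampleSasParams.end(), key) !=
         kExampleSasParams.end();
}
```

With this sketch, `IsSasParam("sig")` is true and `IsSasParam("background_writes")` is false, so a recognised option name would not be mistaken for part of a SAS token.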
Wait... I might have just accidentally worked out how to avoid any of the special authentication stuff for copying...
d30e6ce to 7576c4c
cpp/src/arrow/filesystem/azurefs.cc
Outdated
} else if (kv.first == "background_writes") {
  ARROW_ASSIGN_OR_RAISE(background_writes,
                        ::arrow::internal::ParseBoolean(kv.second));
} else if (std::find(sas_token_query_parameters.begin(),
Can we use std::binary_search() with sorted sas_token_query_parameters?
Could do, but I would be a bit concerned about keeping sas_token_query_parameters sorted. It looks like C++20 allows using std::sort with constexpr, but I believe Arrow currently uses C++17. If you are concerned about the complexity of the lookup, I think my preference would be to use a std::set and forget about trying to make it a constexpr.
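One possible way to address the sortedness concern while staying in C++17 is to keep the array sorted by hand and let a static_assert reject any edit that breaks the order. This is only a sketch under that assumption, not what the PR does: `kSortedSasParams`, `IsStrictlySorted` and `IsSasParamSorted` are hypothetical names, and the entries are again just the parameters from the example URI in the PR description.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string_view>

// Hypothetical, illustrative subset, kept in sorted order by hand.
constexpr std::array<std::string_view, 6> kSortedSasParams = {
    "se", "sig", "sp", "spr", "sr", "sv"};

// Hand-written check, because std::is_sorted only becomes constexpr in C++20.
template <std::size_t N>
constexpr bool IsStrictlySorted(const std::array<std::string_view, N>& values) {
  for (std::size_t i = 1; i < N; ++i) {
    if (!(values[i - 1] < values[i])) return false;
  }
  return true;
}

// Compilation fails if someone adds an entry out of order or a duplicate.
static_assert(IsStrictlySorted(kSortedSasParams),
              "kSortedSasParams must be sorted and free of duplicates");

// The lookup runs at runtime; std::binary_search needs no constexpr support.
inline bool IsSasParamSorted(std::string_view key) {
  return std::binary_search(kSortedSasParams.begin(), kSortedSasParams.end(),
                            key);
}
```

The trade-off against a std::set is that the table stays a compile-time constant with no allocation, at the cost of the extra helper. For a few dozen entries either choice, or the plain linear std::find, is unlikely to matter for performance.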
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit ba2b9e5. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 16 possible false positives for unstable benchmarks that are known to sometimes produce them.
Rationale for this change
SAS token auth is sometimes useful and it is the last one we haven't implemented.
What changes are included in this PR?
- Add ConfigureSasCredential
- Update AzureOptions::FromUri so that simply appending a SAS token to a blob storage URI works, e.g. AzureOptions::FromUri("abfs://file_system@account.dfs.core.windows.net/?se=2024-12-12T18:57:47Z&sig=pAs7qEBdI6sjUhqX1nrhNAKsTY%2B1SqLxPK%2BbAxLiopw%3D&sp=racwdxylti&spr=https,http&sr=c&sv=2024-08-04")
- Update CopyFile to use StartCopyFromUri instead of CopyFromUri
Are these changes tested?
Yes
- CopyFile
- AzureOptions::FromUri with a SAS token.
I also made sure to run the tests which connect to real blob storage.
Are there any user-facing changes?
AzureOptions::FromUri instead of failing fast. IMO this is a regression, but it is still the best option to support SAS tokens.